Repository metrics

Stars: (1,217 stars)
PR merge metrics: (Avg merge 2d 12h) (4 merged PRs in 30d)

Description

Brief Description

In general, when you create dummy variables, it is a good idea to drop one of the resulting columns, because it can be written as a linear combination of the other columns and is therefore redundant. See https://datascience.stackexchange.com/questions/27957/why-do-we-need-to-discard-one-dummy-variable

pandas provides the drop_first option in get_dummies for this purpose.

I would like to propose that drop_first be added as a parameter to expand_column.
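
For reference, here is a small sketch of how drop_first behaves in pandas' get_dummies. The `colors` series is made-up illustration data, not something from this issue. Note that pandas drops the first level in sorted order, not the first value that appears in the data.

```python
import pandas as pd

# Hypothetical example data, used only to illustrate the pandas behaviour.
colors = pd.Series(["red", "green", "blue", "green"], name="color")

# Default: one indicator column per level, in sorted order.
full = pd.get_dummies(colors)

# drop_first=True removes the first level ("blue"), leaving k - 1 columns.
reduced = pd.get_dummies(colors, drop_first=True)

print(list(full.columns))     # ['blue', 'green', 'red']
print(list(reduced.columns))  # ['green', 'red']
```

The dropped level stays recoverable: a row with zeros in every remaining column belongs to the dropped level.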

Example API

>>> import pandas as pd
>>> import janitor as jn
>>> X_cat2 = pd.DataFrame({'A': [1, None, 3],
...     'names': ['Fred,George', 'George', 'John,Paul']})
>>> jn.expand_column(X_cat2, 'names', sep=',')
     A        names  Fred  George  John  Paul
0  1.0  Fred,George     1       1     0     0
1  NaN       George     0       1     0     0
2  3.0    John,Paul     0       0     1     1

>>> X_cat2 = pd.DataFrame({'A': [1, None, 3],
...     'names': ['Fred,George', 'George', 'John,Paul']})
>>> jn.expand_column(X_cat2, 'names', sep=',', drop_first=True)
     A        names  George  John  Paul
0  1.0  Fred,George       1     0     0
1  NaN       George       1     0     0
2  3.0    John,Paul       0     1     1

Contributor guide

Research direction: Add a `drop_first` parameter to `expand_column`. Implement the drop logic by removing the first dummy column after creation. Update existing tests and add new tests for the parameter. Consult the pandas `get_dummies` documentation for reference.
Tech stack: Python, pandas
Domain: data
Issue type: Feature
Difficulty: 2
Estimated time: 1-3 hours
Activity status: Active
Clarity: Clear
Prerequisites: Python, pandas
Newbie friendliness: 75
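
The research direction above could be approached roughly as follows. This is only a sketch under assumptions: `expand_column_sketch` is a hypothetical stand-in written for this illustration, not the library's actual implementation, and it assumes the dummy columns are built with pandas' `Series.str.get_dummies`. The `drop_first` parameter is the one proposed in this issue and does not exist in the library yet.

```python
import pandas as pd


def expand_column_sketch(df, column_name, sep="|", drop_first=False):
    """Hypothetical stand-in for expand_column with the proposed drop_first."""
    # Split each cell on `sep` and build one indicator column per distinct value.
    # Series.str.get_dummies returns the columns in sorted order.
    dummies = df[column_name].str.get_dummies(sep=sep)

    # Proposed behaviour: remove the first dummy column after creation.
    if drop_first:
        dummies = dummies.iloc[:, 1:]

    # Append the indicator columns to a copy of the original frame.
    return pd.concat([df, dummies], axis=1)


X_cat2 = pd.DataFrame({'A': [1, None, 3],
                       'names': ['Fred,George', 'George', 'John,Paul']})
result = expand_column_sketch(X_cat2, 'names', sep=',', drop_first=True)
print(list(result.columns))  # ['A', 'names', 'George', 'John', 'Paul']
```

Slicing with `iloc[:, 1:]` mirrors the "remove the first dummy column after creation" suggestion. A real patch would also need to decide how the option interacts with the function's other parameters and add tests for both settings.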
