scikit-learn-contrib/category_encoders

Add argument for alternative indexing of OrdinalEncoder

Closed

#291 opened on February 11, 2021

Labels: enhancement, good first issue

Description

Expected Behavior

There are a variety of applications in which zero-indexing would be preferred for the OrdinalEncoder. One example is preparing features for a PyTorch model with categorical embeddings, in which case the ordinal label is used to index rows of an embedding matrix. Note also that the sklearn OrdinalEncoder is zero-indexed.
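To illustrate the embedding case without depending on PyTorch, here is a minimal sketch that uses a plain list of lists as a stand-in for an embedding matrix. The category names, the table values, and the `lookup` helper are all hypothetical and chosen only for illustration.

```python
# Hypothetical toy data: three categories and a 3-row "embedding table"
# (a stand-in for an embedding matrix with one row per category).
categories = ["red", "green", "blue"]
embedding_table = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

# Two candidate encodings of the same categories.
zero_indexed = {cat: i for i, cat in enumerate(categories)}           # codes 0, 1, 2
one_indexed = {cat: i for i, cat in enumerate(categories, start=1)}   # codes 1, 2, 3


def lookup(code):
    """Return the embedding row addressed by an ordinal code."""
    return embedding_table[code]


# Zero-indexed codes address every row of the table directly.
rows = [lookup(zero_indexed[cat]) for cat in categories]

# One-indexed codes never reach row 0 and overrun the table
# on the last category unless the table is padded by one row.
try:
    lookup(one_indexed["blue"])
    overran = False
except IndexError:
    overran = True
```

With one-indexed codes, the caller has to either subtract one before every lookup or allocate an extra, unused row, which is the inconvenience this issue is about.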

One option is to add an argument to __init__() that sets the starting index (e.g., stored as self.index_start), so that the ordinal_encoding() method can do something like:

data = pd.Series(index=index, data=range(self.index_start, len(index) + self.index_start))
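As a rough, standalone sketch of what the line above would produce, the snippet below wraps it in a hypothetical helper. `build_ordinal_mapping` and its `index_start` parameter are assumptions for illustration and are not part of the category_encoders API. The default of 1 is meant to mirror the current one-indexed behavior.

```python
import pandas as pd


def build_ordinal_mapping(categories, index_start=1):
    """Map each category to consecutive integers starting at index_start.

    Hypothetical helper: mirrors the proposed pd.Series line, not the
    library's actual implementation.
    """
    index = list(categories)
    return pd.Series(
        index=index,
        data=range(index_start, len(index) + index_start),
    )


# Default keeps the existing one-indexed codes.
one_based = build_ordinal_mapping(["a", "b", "c"])

# Opting in to zero-indexing shifts every code down by one.
zero_based = build_ordinal_mapping(["a", "b", "c"], index_start=0)
```

Defaulting to 1 would leave existing users unaffected, while `index_start=0` would match the sklearn convention mentioned above.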

Actual Behavior

The ordinal_encoding() method imposes one-indexing in this line: data = pd.Series(index=index, data=range(1, len(index) + 1))

Specifications

  • Version: 2.2.2
