scikit-learn-contrib/category_encoders

Add argument for alternative indexing of OrdinalEncoder

Closed

#291 opened on February 11, 2021

Labels: enhancement, good first issue

Description

Expected Behavior

There are a variety of applications in which zero-indexing would be preferred for the OrdinalEncoder. One example is preparing features for a PyTorch model with categorical embeddings, where the ordinal label is used to index rows of an embedding matrix. Note also that scikit-learn's own OrdinalEncoder is zero-indexed.
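To illustrate the embedding use case, here is a minimal sketch in plain Python. A nested list stands in for an embedding matrix, and the table, categories, and `lookup` helper are all hypothetical. With zero-indexed codes each category lands on its own row. With one-indexed codes every lookup is shifted by one row, and the last category falls off the end of an n-row table.

```python
# Hypothetical embedding table: one row per category (3 categories, dim 2).
embedding = [
    [0.1, 0.2],  # row 0
    [0.3, 0.4],  # row 1
    [0.5, 0.6],  # row 2
]
categories = ["a", "b", "c"]

# Zero-indexed codes: a -> 0, b -> 1, c -> 2
zero_indexed = {c: i for i, c in enumerate(categories)}
# One-indexed codes (current OrdinalEncoder behaviour): a -> 1, b -> 2, c -> 3
one_indexed = {c: i + 1 for i, c in enumerate(categories)}


def lookup(table, code):
    """Return the embedding row for an integer code (hypothetical helper)."""
    return table[code]


print(lookup(embedding, zero_indexed["c"]))  # [0.5, 0.6], the row for "c"
print(lookup(embedding, one_indexed["a"]))   # [0.3, 0.4], silently the row meant for "b"
try:
    lookup(embedding, one_indexed["c"])      # code 3 does not exist in a 3-row table
except IndexError:
    print("one-indexed code for 'c' is out of range")
```

The usual workaround with one-indexed codes is to allocate n + 1 rows and leave row 0 unused, which is what a configurable starting index would avoid.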

One option would be to add an argument to __init__() that specifies the starting index (e.g., self.index_start), so that the ordinal_encoding() method could do something like:

data = pd.Series(index=index, data=range(self.index_start, len(index) + self.index_start))
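A minimal standalone sketch of the proposed mapping, assuming pandas is installed. The function `build_ordinal_mapping` is hypothetical and not part of category_encoders, and `index_start` is the parameter name proposed in this issue, not an existing library option. With `index_start=1` it reproduces the current one-indexed mapping; with `index_start=0` it gives zero-indexed codes.

```python
import pandas as pd


def build_ordinal_mapping(categories, index_start=1):
    """Map each category to consecutive integers beginning at index_start.

    Hypothetical helper mirroring the line proposed above; index_start=1
    matches the current one-indexed behaviour.
    """
    index = pd.Index(categories)
    return pd.Series(index=index, data=range(index_start, len(index) + index_start))


cats = ["low", "medium", "high"]
print(build_ordinal_mapping(cats).tolist())                 # [1, 2, 3]
print(build_ordinal_mapping(cats, index_start=0).tolist())  # [0, 1, 2]

# Encoding a column by looking each value up in the mapping's index.
col = pd.Series(["high", "low", "medium"])
print(col.map(build_ordinal_mapping(cats, index_start=0)).tolist())  # [2, 0, 1]
```

Defaulting `index_start` to 1 would keep existing users' encodings unchanged while allowing zero-indexing on request.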

Actual Behavior

The ordinal_encoding() method imposes one-indexing in this line: data = pd.Series(index=index, data=range(1, len(index) + 1))

Specifications

  • Version: 2.2.2
