[ASK] <xDeepFM> help on how to set FEATURE_COUNT values for unseen data for training and for inference. · recommenders-team/recommenders#1830

Repository metrics

Stars: 17,706
PR merge metrics: (average time to merge 6d 16h) (10 merged PRs in 30d)

Description

Hi all, I am trying to understand the FEATURE_COUNT value in the xDeepFM model. From the code, I understand that FEATURE_COUNT is determined by the number of features generated when creating the dictionary mapping with LibffmConverter. However, when I try to predict on a new dataset, it throws an out-of-bounds or indices-not-found error. If I am not mistaken, according to section 2.1 of the paper (xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems), FEATURE_COUNT is used to determine the size of the embedding layer.

If this is the case, do we arbitrarily increase FEATURE_COUNT to cater for unseen data during inference? Won't that enlarge the embedding layer with entries that are never used during training, just to accommodate unseen data at inference time? Or am I mistaken altogether, and there is a proper way to handle unseen data and calculate FEATURE_COUNT?
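
To make the failure described above concrete, here is a minimal sketch in plain Python. It does not use the recommenders API: `build_feature_index`, `embedding_table`, and the toy rows are hypothetical names and data chosen only for illustration. It assumes, as the question does, that the embedding table has exactly `feature_count` rows, so any feature index first created from new data falls outside the table.

```python
import random


def build_feature_index(rows):
    """Assign a consecutive integer id to each distinct (field, value) pair."""
    index = {}
    for row in rows:
        for field, value in row.items():
            index.setdefault((field, value), len(index))
    return index


# Toy training data (hypothetical).
train_rows = [
    {"city": "paris", "device": "ios"},
    {"city": "tokyo", "device": "android"},
]
feature_index = build_feature_index(train_rows)
feature_count = len(feature_index)  # 4 distinct features seen in training

# One 3-dimensional embedding vector per feature known at training time.
rng = random.Random(0)
embedding_table = [[rng.random() for _ in range(3)] for _ in range(feature_count)]

# Rebuilding the mapping on data that contains a new value creates a new id.
new_rows = train_rows + [{"city": "lima", "device": "ios"}]
new_index = build_feature_index(new_rows)
unseen_id = new_index[("city", "lima")]  # id 4, but valid rows are 0..3

try:
    embedding_table[unseen_id]
    lookup_failed = False
except IndexError:
    # The unseen feature has no row in the embedding table.
    lookup_failed = True
```

In this sketch the lookup fails because the mapping was rebuilt on the new data; the table itself was sized from the training mapping only.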

Contributor guide

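Investigation approach: Investigate how FEATURE_COUNT is computed during training and how unknown features are handled during inference. Consider strategies such as feature hashing or using a default embedding for unknown features.
Tech stack: python
Area: machine learning
Issue type: investigation
Difficulty: 3
Estimated time: half a day
Activity: new
Clarity: clear
Prerequisites: Python, Deep Learning
Beginner-friendliness: 60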
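
The two strategies named in the investigation approach (a default embedding for unknown features, and feature hashing) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the recommenders or LibffmConverter API: `fit_vocab`, `encode_with_unknown`, `encode_with_hashing`, `UNKNOWN_ID`, and `NUM_BUCKETS` are hypothetical names. In both variants the feature count is fixed before training and does not depend on the data seen at inference.

```python
import zlib

UNKNOWN_ID = 0    # hypothetical: reserved row shared by all unseen features
NUM_BUCKETS = 16  # hypothetical: fixed table size for the hashing variant


def fit_vocab(rows):
    """Map each (field, value) pair seen in training to an id starting at 1."""
    vocab = {}
    for row in rows:
        for field, value in row.items():
            vocab.setdefault((field, value), len(vocab) + 1)
    return vocab


def encode_with_unknown(row, vocab):
    """Strategy 1: send any feature missing from the vocabulary to UNKNOWN_ID."""
    return [vocab.get((field, value), UNKNOWN_ID) for field, value in row.items()]


def encode_with_hashing(row, num_buckets):
    """Strategy 2: hash every feature into a fixed number of buckets."""
    return [
        zlib.crc32(f"{field}={value}".encode("utf-8")) % num_buckets
        for field, value in row.items()
    ]


train_rows = [
    {"city": "paris", "device": "ios"},
    {"city": "tokyo", "device": "android"},
]
vocab = fit_vocab(train_rows)
feature_count = len(vocab) + 1  # +1 for the reserved unknown row

unseen_row = {"city": "lima", "device": "ios"}
ids = encode_with_unknown(unseen_row, vocab)            # "lima" maps to UNKNOWN_ID
hashed_ids = encode_with_hashing(unseen_row, NUM_BUCKETS)  # always < NUM_BUCKETS
```

One design consideration: with a reserved unknown row, that row only learns something useful if some training examples are also mapped to it (for example, by sending rare features there). With hashing, the table size is a free choice, at the cost of collisions between unrelated features.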
