recommenders-team/recommenders

[ASK] The encoding of features within a field

Open

#1,114 opened on Jun 1, 2020

help wanted

Description

Hi,

I'm a little confused by the class that encodes the features within each field (LibffmConverter). Here is the example I am looking at:

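import pandas as pd
# Import path in current releases; older releases used reco_utils.dataset.pandas_df_utils
from recommenders.datasets.pandas_df_utils import LibffmConverter

df_feature_original = pd.DataFrame({
    'rating': [1, 0, 0, 1, 1],
    'field1': ['xxx1', 'xxx2', 'xxx4', 'xxx4', 'xxx4'],
    'field2': [3, 4, 5, 6, 7],
    'field3': [1.0, 2.0, 3.0, 4.0, 5.0],
    'field4': ['1', '2', '3', '4', '5']
})
# Fit on the feature columns, then convert each cell to field:feature:value
converter = LibffmConverter().fit(df_feature_original, col_rating='rating')
df_out = converter.transform(df_feature_original)
df_out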

  | rating | field1 | field2 | field3  | field4
0 | 1      | 1:1:1  | 2:4:3  | 3:5:1.0 | 4:6:1
1 | 0      | 1:2:1  | 2:4:4  | 3:5:2.0 | 4:7:1
2 | 0      | 1:3:1  | 2:4:5  | 3:5:3.0 | 4:8:1
3 | 1      | 1:3:1  | 2:4:6  | 3:5:4.0 | 4:9:1
4 | 1      | 1:3:1  | 2:4:7  | 3:5:5.0 | 4:10:1
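
To make the numbering in this table concrete, here is a minimal sketch in plain pandas that reproduces it. It is not the library's code: `encode_field_wise` is a hypothetical helper, and its rules (a numeric column shares one feature index and keeps its value, a string column gets one index per distinct value in order of first appearance, indices start at 1) are assumptions inferred from the output above.

```python
import pandas as pd


def encode_field_wise(df, col_rating="rating"):
    """Hypothetical helper: number features one field (column) at a time."""
    out = pd.DataFrame({col_rating: df[col_rating]})
    next_index = 1  # assumption: feature indices start at 1, as in the table
    fields = [c for c in df.columns if c != col_rating]
    for field_index, col in enumerate(fields, start=1):
        if pd.api.types.is_numeric_dtype(df[col]):
            # Numeric field: one shared feature index, the value is kept.
            out[col] = [f"{field_index}:{next_index}:{v}" for v in df[col]]
            next_index += 1
        else:
            # Categorical field: one index per distinct value, value is 1.
            mapping = {}
            for v in df[col]:
                if v not in mapping:
                    mapping[v] = next_index
                    next_index += 1
            out[col] = [f"{field_index}:{mapping[v]}:1" for v in df[col]]
    return out


df = pd.DataFrame({
    "rating": [1, 0, 0, 1, 1],
    "field1": ["xxx1", "xxx2", "xxx4", "xxx4", "xxx4"],
    "field2": [3, 4, 5, 6, 7],
    "field3": [1.0, 2.0, 3.0, 4.0, 5.0],
    "field4": ["1", "2", "3", "4", "5"],
})
out = encode_field_wise(df)
print(out)
```

The counter only moves on to the next column once every value in the current column has an index, which is why field4 ends up with indices 6 to 10.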

I noticed that feature indices are assigned field by field: every feature in one field is numbered first, and the numbering then continues into the next field. That is not the same as the example given by the libffm author, shown below:

Click  Advertiser  Publisher
=====  ==========  =========
    0        Nike        CNN
    1        ESPN        BBC
Then, you can generate FFM format data:
    0 0:0:1 1:1:1
    1 0:2:1 1:3:1
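
For comparison, the two FFM lines above can be reproduced with a short sketch that assigns indices row by row. `encode_row_wise` is a hypothetical helper written for this illustration, not part of libffm; it assumes zero-based indices and that each (field, value) pair receives the next free index the first time it is seen.

```python
def encode_row_wise(rows, fields):
    """Hypothetical helper: assign feature indices while scanning row by row."""
    field_ids = {name: i for i, name in enumerate(fields)}
    feature_ids = {}  # (field, value) -> feature index, in order of first use
    lines = []
    for label, values in rows:
        tokens = [str(label)]
        for name, value in zip(fields, values):
            key = (name, value)
            if key not in feature_ids:
                feature_ids[key] = len(feature_ids)
            # Categorical features carry the value 1.
            tokens.append(f"{field_ids[name]}:{feature_ids[key]}:1")
        lines.append(" ".join(tokens))
    return lines


rows = [(0, ("Nike", "CNN")), (1, ("ESPN", "BBC"))]
lines = encode_row_wise(rows, ["Advertiser", "Publisher"])
for line in lines:
    print(line)
```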

He numbers the features of one example (row) across all of its fields and then moves on to the next example, so the method in this repo is different. Does this difference affect the results?
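
One way to examine the question is to apply both numbering orders to the same toy data and compare the indices they assign. The sketch below uses two hypothetical helpers (`row_wise_ids`, `field_wise_ids`) on the Advertiser/Publisher example. It only checks that the two schemes differ by a one-to-one relabeling of feature indices; it does not train a model, so it cannot by itself confirm that any particular trainer gives identical results.

```python
rows = [("Nike", "CNN"), ("ESPN", "BBC")]
fields = ["Advertiser", "Publisher"]


def row_wise_ids(rows, fields):
    """Index each (field, value) pair in the order met scanning row by row."""
    ids = {}
    for row in rows:
        for field, value in zip(fields, row):
            ids.setdefault((field, value), len(ids))
    return ids


def field_wise_ids(rows, fields):
    """Index each (field, value) pair in the order met scanning column by column."""
    ids = {}
    for col, field in enumerate(fields):
        for row in rows:
            ids.setdefault((field, row[col]), len(ids))
    return ids


row_ids = row_wise_ids(rows, fields)
field_ids = field_wise_ids(rows, fields)

# Map each row-wise index to the field-wise index of the same feature.
relabel = {row_ids[key]: field_ids[key] for key in row_ids}
is_bijection = sorted(relabel.values()) == sorted(relabel.keys())
print(relabel, is_bijection)
```

In this example every feature keeps a unique index under both schemes, and only the labels are permuted.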

Other Comments

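In case my English is hard to follow, here is the same question again (translated from Chinese). The encoding rule in this repo: first encode the features within one field, then encode the features within the next field. The libffm encoding: encode the features in all fields of one data row, then move on to the next row. Do these two feature-encoding rules affect the final result?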

Many thanks!

Looking forward to your reply.
