[ASK] Encoding of features within fields
#1114 opened on Jun 1, 2020
Description
Hi,
I'm a little confused about how the class encodes the features within a field. Here is the example I looked at:
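```python
import pandas as pd
# Import path as of 2020; newer releases use recommenders.datasets.pandas_df_utils
from reco_utils.dataset.pandas_df_utils import LibffmConverter

df_feature_original = pd.DataFrame({
    'rating': [1, 0, 0, 1, 1],
    'field1': ['xxx1', 'xxx2', 'xxx4', 'xxx4', 'xxx4'],
    'field2': [3, 4, 5, 6, 7],
    'field3': [1.0, 2.0, 3.0, 4.0, 5.0],
    'field4': ['1', '2', '3', '4', '5']
})

converter = LibffmConverter().fit(df_feature_original, col_rating='rating')
df_out = converter.transform(df_feature_original)
print(df_out)
```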
|   | rating | field1 | field2 | field3  | field4 |
|---|--------|--------|--------|---------|--------|
| 0 | 1      | 1:1:1  | 2:4:3  | 3:5:1.0 | 4:6:1  |
| 1 | 0      | 1:2:1  | 2:4:4  | 3:5:2.0 | 4:7:1  |
| 2 | 0      | 1:3:1  | 2:4:5  | 3:5:3.0 | 4:8:1  |
| 3 | 1      | 1:3:1  | 2:4:6  | 3:5:4.0 | 4:9:1  |
| 4 | 1      | 1:3:1  | 2:4:7  | 3:5:5.0 | 4:10:1 |
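The index pattern in the output above can be reproduced with a short sketch. It assumes, from reading the output rather than the converter's source, that a categorical (string) column gets one index per distinct value, a numeric column gets a single index and keeps its raw value, and that indices are handed out one field at a time starting from 1. `field_wise_encode` is a hypothetical name, not the repo's API.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def field_wise_encode(df, col_rating):
    """Hypothetical sketch: assign feature indices one field at a time."""
    feature_cols = [c for c in df.columns if c != col_rating]
    lookup = {}  # (field name, value) -> feature index; value is None for numeric fields
    next_index = 1  # indices start at 1, matching the output above
    for col in feature_cols:
        if is_numeric_dtype(df[col]):
            # Numeric field: one shared index for the whole field.
            lookup[(col, None)] = next_index
            next_index += 1
        else:
            # Categorical field: one index per distinct value, in order of first appearance.
            for value in df[col].unique():
                lookup[(col, value)] = next_index
                next_index += 1

    out = pd.DataFrame({col_rating: df[col_rating]})
    for field_idx, col in enumerate(feature_cols, start=1):
        if is_numeric_dtype(df[col]):
            # field:index:raw value
            out[col] = [f"{field_idx}:{lookup[(col, None)]}:{v}" for v in df[col]]
        else:
            # field:index:1 (one-hot)
            out[col] = [f"{field_idx}:{lookup[(col, v)]}:1" for v in df[col]]
    return out


df_example = pd.DataFrame({
    'rating': [1, 0, 0, 1, 1],
    'field1': ['xxx1', 'xxx2', 'xxx4', 'xxx4', 'xxx4'],
    'field2': [3, 4, 5, 6, 7],
    'field3': [1.0, 2.0, 3.0, 4.0, 5.0],
    'field4': ['1', '2', '3', '4', '5'],
})
df_sketch = field_wise_encode(df_example, "rating")
print(df_sketch)
```

Because every value of `field1` is numbered before `field2` is touched, the indices grow field by field (1 to 3, then 4, 5, then 6 to 10).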
It seems the converter numbers all the features within one field first, then continues numbering the features of the next field. That is not the same as the example given by the FFM (libffm) author:
| Click | Advertiser | Publisher |
|-------|------------|-----------|
| 0     | Nike       | CNN       |
| 1     | ESPN       | BBC       |
Then, you can generate FFM format data:
```
0 0:0:1 1:1:1
1 0:2:1 1:3:1
```
The libffm author assigns indices example by example: all the features of the first example, then those of the next example. The method in this repo is different, so does this difference affect the results?
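One way to reason about the question is to check whether the two orderings describe the same set of (field, value) features under different labels. The sketch below assumes the standard FFM pairwise term, where feature j1 in field f1 and feature j2 in field f2 contribute ⟨W[j1, f2], W[j2, f1]⟩·x1·x2 (bias and linear terms omitted), and uses hypothetical helpers `assign_indices` and `ffm_score`; it is neither the repo's nor libffm's code. It relabels a random weight table from one ordering to the other and compares the scores.

```python
import numpy as np


def assign_indices(rows, order):
    """Hypothetical helper: map each (field, value) pair to an integer index.

    order="field": number every value of field 0 first, then field 1, ...
    order="row":   number pairs in order of first appearance, row by row.
    """
    n_fields = len(rows[0])
    if order == "field":
        cells = [(f, row[f]) for f in range(n_fields) for row in rows]
    elif order == "row":
        cells = [(f, row[f]) for row in rows for f in range(n_fields)]
    else:
        raise ValueError(f"unknown order: {order}")
    index = {}
    for cell in cells:
        index.setdefault(cell, len(index))  # keep the first index a pair receives
    return index


def ffm_score(instance, W):
    """Pairwise FFM term for one instance (bias and linear terms omitted).

    instance: list of (field, feature_index, value) triples.
    W: array of shape (n_features, n_fields, k); W[j, f] is the latent vector
       feature j uses when it interacts with a feature from field f.
    """
    score = 0.0
    for a in range(len(instance)):
        f1, j1, x1 = instance[a]
        for b in range(a + 1, len(instance)):
            f2, j2, x2 = instance[b]
            score += np.dot(W[j1, f2], W[j2, f1]) * x1 * x2
    return score


rows = [("Nike", "CNN"), ("ESPN", "BBC"), ("Nike", "BBC")]
field_wise = assign_indices(rows, "field")  # Nike=0, ESPN=1, CNN=2, BBC=3
row_wise = assign_indices(rows, "row")      # Nike=0, CNN=1, ESPN=2, BBC=3

rng = np.random.default_rng(0)
n_features, n_fields, k = len(field_wise), 2, 4
W_field = rng.normal(size=(n_features, n_fields, k))

# Give each feature the same latent vectors under its row-wise index.
W_row = np.empty_like(W_field)
for pair, j in field_wise.items():
    W_row[row_wise[pair]] = W_field[j]

matches = []
for row in rows:
    inst_field = [(f, field_wise[(f, v)], 1.0) for f, v in enumerate(row)]
    inst_row = [(f, row_wise[(f, v)], 1.0) for f, v in enumerate(row)]
    matches.append(bool(np.isclose(ffm_score(inst_field, W_field), ffm_score(inst_row, W_row))))
print(matches)
```

Under these assumptions the scores match once the weights are relabeled, which suggests the index ordering itself carries no information. Two actual training runs could still differ slightly, for example because random initialization gives a particular index different starting vectors.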
Other Comments
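In case my English is unclear, here is the same question restated (originally written in Chinese): the encoding rule in this repo first encodes the features within one field, then the features within another field. libffm's encoding encodes the features in all fields of one record, then moves on to the next record. Will these two feature-encoding rules affect the final results?
Many thanks, and looking forward to your kind reply!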