facebookresearch/fairseq

Invalid suffix of raw dataset when preprocessing without language

Open

#1426 opened on November 25, 2019

Labels: bug, help wanted


Description

When preprocessing with --dataset-impl raw and without specifying source and target languages, the output files get the suffix None-None (for example, train.None-None) because of this line:

https://github.com/pytorch/fairseq/blob/5349052aae4ec1350822c894fbb6be350dff61a0/preprocess.py#L218

Is this expected behavior, or can the suffix be removed when no languages are given?
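
To illustrate, here is a minimal sketch of how such a suffix can arise. It assumes the linked line builds the output name by formatting the source and target language values with str.format, in which case an unset language (None) is rendered as the literal string "None". The function names raw_output_prefix and raw_output_prefix_checked are hypothetical and not part of fairseq; the second function shows one possible way to skip the suffix, not a tested patch.

```python
from typing import Optional


def raw_output_prefix(
    output_prefix: str,
    source_lang: Optional[str],
    target_lang: Optional[str],
) -> str:
    # Assumed behavior: both language values are formatted unconditionally,
    # so None is converted to the string "None".
    return output_prefix + ".{}-{}".format(source_lang, target_lang)


def raw_output_prefix_checked(
    output_prefix: str,
    source_lang: Optional[str],
    target_lang: Optional[str],
) -> str:
    # Hypothetical alternative: append the language-pair suffix only
    # when both languages are actually set.
    if source_lang is None or target_lang is None:
        return output_prefix
    return output_prefix + ".{}-{}".format(source_lang, target_lang)


if __name__ == "__main__":
    print(raw_output_prefix("train", None, None))          # train.None-None
    print(raw_output_prefix_checked("train", None, None))  # train
    print(raw_output_prefix_checked("train", "de", "en"))  # train.de-en
```

Whether dropping the suffix is safe depends on how the raw dataset loader looks up files at training time, which this sketch does not cover.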
