facebookresearch/fairseq
Invalid suffix of raw dataset when preprocessing without language
Open
#1426 opened on Nov 25, 2019
bug, help wanted
Description
When preprocessing with --dataset-impl raw and no source or target language specified, the datasets are stored under train.None-None because of this line:
https://github.com/pytorch/fairseq/blob/5349052aae4ec1350822c894fbb6be350dff61a0/preprocess.py#L218
Is this expected behavior or can we remove this suffix?
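
For illustration, here is a minimal sketch of how a suffix like this can arise, assuming the output name is built by formatting the two language arguments into a string. The helper name `raw_output_prefix` is hypothetical and is not fairseq's actual code; it only mimics the pattern that would produce the observed file name.

```python
def raw_output_prefix(output_prefix, source_lang=None, target_lang=None):
    # Hypothetical helper (an assumption for illustration, not fairseq's API).
    # str.format renders None as the literal text "None", so unset
    # languages end up in the file name.
    return output_prefix + ".{}-{}".format(source_lang, target_lang)


print(raw_output_prefix("train"))              # train.None-None
print(raw_output_prefix("train", "de", "en"))  # train.de-en
```

If the suffix is not intended, one option would be to append it only when both languages are set, but whether other parts of the pipeline rely on the suffix would need to be checked first.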