facebookresearch/fairseq

Invalid suffix of raw dataset when preprocessing without language

Open

#1426 opened on Nov 25, 2019

Labels: bug, help wanted


Description

When preprocessing with --dataset-impl raw without specifying source and target languages, the output files are written with the suffix train.None-None because of this line:

https://github.com/pytorch/fairseq/blob/5349052aae4ec1350822c894fbb6be350dff61a0/preprocess.py#L218

Is this expected behavior, or can this suffix be removed?
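A minimal sketch of how the suffix can arise follows. It assumes the linked line builds the output name by formatting both language arguments into a "{}-{}" pair unconditionally. The helper names (file_name, raw_output_name_current, raw_output_name_proposed) are hypothetical and are not fairseq's actual API. The "proposed" variant only appends the pair when both languages are set. Any code that reads these files back would need to apply the same rule.

```python
from typing import Optional


def file_name(prefix: str, lang: Optional[str]) -> str:
    # Append ".<lang>" only when a language is given.
    return prefix if lang is None else f"{prefix}.{lang}"


def raw_output_name_current(prefix: str, src: Optional[str],
                            tgt: Optional[str], lang: Optional[str]) -> str:
    # Hypothetical reproduction of the reported behavior: the pair suffix is
    # always formatted, so missing languages turn into the literal "None".
    return file_name(prefix + ".{}-{}".format(src, tgt), lang)


def raw_output_name_proposed(prefix: str, src: Optional[str],
                             tgt: Optional[str], lang: Optional[str]) -> str:
    # Hypothetical fix: add the "<src>-<tgt>" pair only when both are set.
    if src is not None and tgt is not None:
        prefix = f"{prefix}.{src}-{tgt}"
    return file_name(prefix, lang)


if __name__ == "__main__":
    print(raw_output_name_current("train", None, None, None))   # train.None-None
    print(raw_output_name_proposed("train", None, None, None))  # train
    print(raw_output_name_proposed("train", "de", "en", "de"))  # train.de-en.de
```

With languages given, the proposed variant produces the same name as the current one. It only differs in the monolingual case.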
