facebookresearch/fairseq

Invalid suffix of raw dataset when preprocessing without language

Open

#1426 opened on 25 Nov 2019

2 comments, 0 reactions, 0 assignees. Language: Python (6224 forks)
Labels: bug, help wanted


Description

When preprocessing with --dataset-impl raw and without specifying source and target languages, the datasets are stored under train.None-None because of this line:

https://github.com/pytorch/fairseq/blob/5349052aae4ec1350822c894fbb6be350dff61a0/preprocess.py#L218

Is this expected behavior or can we remove this suffix?
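
To illustrate, here is a minimal sketch of how such a suffix can arise, assuming the linked line formats the language pair into the file name unconditionally. The function names below are hypothetical and are not fairseq's API; the second function shows one possible guard, not a confirmed fix.

```python
def raw_output_prefix(output_prefix, source_lang=None, target_lang=None):
    # Assumed behaviour: the "<src>-<tgt>" suffix is always appended,
    # so unset languages are rendered as the string "None".
    return "{}.{}-{}".format(output_prefix, source_lang, target_lang)


def raw_output_prefix_guarded(output_prefix, source_lang=None, target_lang=None):
    # Hypothetical alternative: skip the suffix when either language is unset.
    if source_lang is None or target_lang is None:
        return output_prefix
    return "{}.{}-{}".format(output_prefix, source_lang, target_lang)


if __name__ == "__main__":
    print(raw_output_prefix("train"))                      # train.None-None
    print(raw_output_prefix_guarded("train"))              # train
    print(raw_output_prefix_guarded("train", "de", "en"))  # train.de-en
```

Whether dropping the suffix is safe depends on how the raw dataset is later located at load time, which this sketch does not cover.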
