facebookresearch/fairseq

Invalid suffix of raw dataset when preprocessing without language

Open

#1426 opened on Nov 25, 2019

bug, help wanted

Description

When preprocessing with --dataset-impl raw and without specifying source and target languages, the datasets are stored under train.None-None because of this line:

https://github.com/pytorch/fairseq/blob/5349052aae4ec1350822c894fbb6be350dff61a0/preprocess.py#L218

Is this expected behavior, or can we remove this suffix?
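
To illustrate how a name like train.None-None can arise, here is a minimal sketch. It is not the code from preprocess.py: the function names `naive_raw_prefix` and `guarded_raw_prefix` are hypothetical, and the guarded variant is only one possible fix, assuming the suffix should be dropped when either language is missing.

```python
from typing import Optional


def naive_raw_prefix(output_prefix: str,
                     source_lang: Optional[str],
                     target_lang: Optional[str]) -> str:
    # Hypothetical helper: formats the language pair unconditionally.
    # str.format renders None as the literal string "None".
    return output_prefix + ".{}-{}".format(source_lang, target_lang)


def guarded_raw_prefix(output_prefix: str,
                       source_lang: Optional[str],
                       target_lang: Optional[str]) -> str:
    # Hypothetical alternative (an assumption, not fairseq's behavior):
    # only append the language-pair suffix when both languages are set.
    if source_lang is None or target_lang is None:
        return output_prefix
    return output_prefix + ".{}-{}".format(source_lang, target_lang)


if __name__ == "__main__":
    print(naive_raw_prefix("train", None, None))
    print(guarded_raw_prefix("train", None, None))
    print(guarded_raw_prefix("train", "de", "en"))
```

Whether dropping the suffix is safe depends on how the files are looked up again at load time, which this sketch does not cover.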
