facebookresearch/fairseq

Invalid suffix of raw dataset when preprocessing without language

Open

#1426 opened on Nov 25, 2019

Labels: bug, help wanted

Description

When preprocessing with --dataset-impl raw and neither a source nor a target language is specified, the datasets are stored under train.None-None because of this line:

https://github.com/pytorch/fairseq/blob/5349052aae4ec1350822c894fbb6be350dff61a0/preprocess.py#L218

Is this expected behavior or can we remove this suffix?
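
To illustrate the question, here is a minimal sketch of how such a suffix can arise and of one possible guard. It assumes the linked line builds the file name by formatting the source and target languages into the output prefix. The function names `raw_output_name` and `raw_output_name_guarded` are hypothetical and are not part of fairseq.

```python
def raw_output_name(output_prefix, source_lang=None, target_lang=None):
    """Hypothetical stand-in for the naming step; not fairseq's actual code."""
    # str.format renders None as the literal text "None",
    # so unset languages end up in the file name.
    return output_prefix + ".{}-{}".format(source_lang, target_lang)


def raw_output_name_guarded(output_prefix, source_lang=None, target_lang=None):
    """One possible guard (an assumption): add the suffix only if both languages are set."""
    if source_lang is None or target_lang is None:
        return output_prefix
    return "{}.{}-{}".format(output_prefix, source_lang, target_lang)


print(raw_output_name("train"))                      # train.None-None
print(raw_output_name_guarded("train"))              # train
print(raw_output_name_guarded("train", "de", "en"))  # train.de-en
```

Whether dropping the suffix is safe depends on how the raw dataset is located again at training time, which this sketch does not cover.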
