Labels: data, help wanted
Repository metrics
- Stars: 5587
- PR merge metrics: no PRs merged in the last 30 days
Description
We should prepare datasets for all WMT'17 language pairs. This is also a chance to try out google/sentencepiece as a preprocessor.
Each dataset should come in several configurations, i.e. with different vocabulary sizes, plus a character-level version.
Along with the raw data files, we also need the script that was used to produce them.
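A rough sketch of how the configuration grid could be generated, assuming the WMT'17 news-translation pairs (Czech, German, Finnish, Latvian, Russian, Turkish and Chinese, each paired with English), hypothetical subword vocabulary sizes of 8k/16k/32k, and one joint source+target training file (plain text, one sentence per line) per pair. `SpmConfig`, `all_configs` and `train` are illustrative names, not existing code. Only `sentencepiece.SentencePieceTrainer.train` is the real library entry point, and the option names (`model_type`, `vocab_size`, `use_all_vocab`) follow my reading of the sentencepiece trainer options, so double-check them before relying on this.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumption: WMT'17 news-translation pairs, each language paired with English.
WMT17_PAIRS = [("cs", "en"), ("de", "en"), ("fi", "en"), ("lv", "en"),
               ("ru", "en"), ("tr", "en"), ("zh", "en")]

# Hypothetical subword vocabulary sizes; the issue only asks for "different" sizes.
VOCAB_SIZES = [8000, 16000, 32000]


@dataclass(frozen=True)
class SpmConfig:
    pair: Tuple[str, str]
    model_type: str              # "unigram" for subword variants, "char" for character-level
    vocab_size: Optional[int]    # None for the character-level variant

    @property
    def name(self) -> str:
        src, tgt = self.pair
        suffix = "char" if self.vocab_size is None else f"{self.model_type}{self.vocab_size}"
        return f"wmt17_{src}{tgt}_{suffix}"

    def trainer_kwargs(self, train_file: str) -> dict:
        """Keyword arguments intended for SentencePieceTrainer.train."""
        kwargs = {"input": train_file, "model_prefix": self.name,
                  "model_type": self.model_type}
        if self.vocab_size is None:
            # Character model: keep every character seen instead of capping the vocab.
            kwargs["use_all_vocab"] = True
        else:
            kwargs["vocab_size"] = self.vocab_size
        return kwargs


def all_configs():
    """One config per (pair, vocab size), plus one character-level config per pair."""
    configs = []
    for pair in WMT17_PAIRS:
        for size in VOCAB_SIZES:
            configs.append(SpmConfig(pair, "unigram", size))
        configs.append(SpmConfig(pair, "char", None))
    return configs


def train(config: SpmConfig, train_file: str) -> None:
    # Imported lazily so that building the config grid does not require sentencepiece.
    import sentencepiece as spm
    spm.SentencePieceTrainer.train(**config.trainer_kwargs(train_file))
```

Usage would be a loop such as `for cfg in all_configs(): train(cfg, path_for(cfg.pair))`, where `path_for` is whatever maps a pair to its (hypothetical) joint training file. Committing a script like this next to the released data would also cover the request to ship the preprocessing script.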