[TTS] Try to train a universal GAN Vocoder using CSMSC + LJSpeech + AISHELL3 + VCTK · PaddlePaddle/PaddleSpeech#2803

0 comments · 1 reaction · 1 assignee · Python · 1702 forks

Labels: T2S, feature request, good first issue

Repository metrics

Stars: 9453
PR merge metrics: no PRs merged in the last 30 days

Description

A universal GAN vocoder may work well for the acoustic models (AMs) of different datasets. For example, CSMSC is a single-speaker female dataset, so a vocoder trained only on it may generate poor wavs from the mels of male speakers, because different genders have different distributions of speech features.

Please try to train a universal GAN vocoder using CSMSC + LJSpeech + AISHELL3 + VCTK + some other TTS datasets (if you want) with the config of CSMSC (24 kHz).

LJSpeech is 22.05 kHz, but you don't need to resample it yourself, because the preprocessing stage resamples the wavs to the sample rate set in the config file.
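To make the sample-rate point concrete, here is a minimal, standard-library-only sketch of what resampling from 22.05 kHz to 24 kHz does to a signal. It uses plain linear interpolation purely for illustration; it is not the resampler PaddleSpeech uses in its preprocessing stage, and the function name `resample_linear` is hypothetical.

```python
import math


def resample_linear(samples, src_sr, dst_sr):
    """Resample a mono signal by linear interpolation.

    Illustrative only: real pipelines use band-limited resamplers,
    which avoid the aliasing that linear interpolation introduces.
    """
    if src_sr == dst_sr or not samples:
        return list(samples)
    # The duration stays the same, so the sample count scales with the rate.
    n_out = int(round(len(samples) * dst_sr / src_sr))
    out = []
    for i in range(n_out):
        pos = i * src_sr / dst_sr            # fractional index into the source
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(samples) - 1)   # clamp at the last sample
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


# One second of silence at 22.05 kHz becomes one second at 24 kHz.
one_second = [0.0] * 22050
resampled = resample_linear(one_second, 22050, 24000)
print(len(resampled))
```

The practical consequence is that, after preprocessing, every dataset yields features at the same rate, so the frame shift and FFT settings of the 24 kHz config apply uniformly.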

Contributor guide

Research direction: This issue asks for a universal GAN vocoder trained on multiple datasets (CSMSC, LJSpeech, AISHELL3, VCTK) with the CSMSC configuration (24 kHz). The assignee is probably already working on it. To get started, review the existing GAN vocoder implementation in PaddleSpeech, such as the configuration file in examples/csmsc/tts3/conf/default.yaml. Be prepared to handle different sample rates (LJSpeech is 22.05 kHz, but the preprocessing stage resamples it). Combine the datasets and modify the training pipeline to support multi-dataset training. Check whether any open PRs or branches already exist for this task.
Tech stack: Python
Domain: machine learning, AI
Issue type: Research
Difficulty: 4
Estimated time: More than 1 week
Activity status: Blocked
Clarity: Clear
Prerequisites: Knowledge of TTS, GAN vocoder concepts, Python, PaddleSpeech
Beginner friendliness: 20
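
The "combine the datasets" step in the research direction above could be sketched roughly as follows. This assumes each dataset has already been preprocessed into a list of per-utterance metadata records (for example, loaded from a JSON Lines file). The field names `utt_id` and `num_frames`, the added `dataset` field, and the helpers `merge_metadata` and `to_jsonl` are all hypothetical and are not taken from the PaddleSpeech code base.

```python
import json


def merge_metadata(records_by_dataset):
    """Merge per-dataset metadata records into one training list.

    Utterance IDs are prefixed with the dataset name, because different
    corpora may reuse the same IDs (e.g. "000001").
    """
    merged = []
    seen = set()
    for name, records in records_by_dataset.items():
        for rec in records:
            new_rec = dict(rec)  # copy so the input records are not mutated
            new_rec["utt_id"] = f"{name}_{rec['utt_id']}"
            new_rec["dataset"] = name
            if new_rec["utt_id"] in seen:
                raise ValueError(f"duplicate utterance id: {new_rec['utt_id']}")
            seen.add(new_rec["utt_id"])
            merged.append(new_rec)
    return merged


def to_jsonl(records):
    """Serialize records as JSON Lines text, one record per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


# Hypothetical records from two datasets that share an utterance ID.
merged = merge_metadata({
    "csmsc": [{"utt_id": "000001", "num_frames": 120}],
    "ljspeech": [{"utt_id": "000001", "num_frames": 95}],
})
print(to_jsonl(merged))
```

Keeping the dataset name on each record makes it easy to check later how the vocoder performs per corpus, for instance when comparing female and male speakers.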
