[TTS] Try to train a universal GAN Vocoder using CSMSC + LJSpeech + AISHELL3 + VCTK · PaddlePaddle/PaddleSpeech#2803

0 comments · 1 reaction · 1 assignee · Python · 9,453 stars · 1,702 forks

Labels: T2S, feature request, good first issue

Description

A universal GAN vocoder may work well with the acoustic models (AMs) of different datasets. For example, CSMSC is a single-speaker female dataset, so a vocoder trained on it may generate poor waveforms from the mel spectrograms of male speakers, because speech features are distributed differently across genders.

Please try to train a universal GAN vocoder using CSMSC + LJSpeech + AISHELL3 + VCTK, plus other TTS datasets if you want, with the CSMSC config (24 kHz).

LJSpeech is sampled at 22.05 kHz, but you don't need to resample it yourself, because the preprocessing stage resamples the wavs to the sample rate set in the config file.
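To make the sample-rate change concrete, here is a minimal sketch of what resampling from 22.05 kHz to 24 kHz does to a signal's length. The function `resample_linear` is hypothetical and is not PaddleSpeech code. It uses plain linear interpolation purely for illustration, whereas a real preprocessing step would presumably rely on a proper band-limited resampler.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample a mono signal by linear interpolation (illustration only)."""
    if src_rate == dst_rate or len(samples) < 2:
        return [float(s) for s in samples]

    # The output covers the same duration at the new rate.
    n_out = int(round(len(samples) * dst_rate / src_rate))
    out = []
    for i in range(n_out):
        # Position of output sample i on the source sample grid.
        pos = i * src_rate / dst_rate
        left = int(pos)
        if left >= len(samples) - 1:
            # Past the last source interval: hold the final sample.
            out.append(float(samples[-1]))
            continue
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[left + 1] * frac)
    return out


# One second of LJSpeech-rate audio becomes one second at the CSMSC rate.
one_second = [0.0] * 22050
resampled = resample_linear(one_second, 22050, 24000)
print(len(one_second), len(resampled))
```

The duration is unchanged; only the number of samples per second differs, so mel features extracted afterwards share the same frame settings across datasets.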

Contributor guide

Tech stack: Python
Domain: machine learning, AI
Issue type: research
Difficulty: 4
Estimated time: over 1 week
Activity status: blocked
Clarity: clear
Prerequisites: knowledge of TTS, GAN vocoder concepts, Python, PaddleSpeech
Beginner-friendliness: 20
Investigation approach: This issue asks for a universal GAN vocoder trained on multiple datasets (CSMSC, LJSpeech, AISHELL3, VCTK) with the CSMSC config (24 kHz). The assignee is likely already working on it. To start, review the existing GAN vocoder implementation in PaddleSpeech and the CSMSC example configs, such as examples/csmsc/tts3/conf/default.yaml. Account for the different sample rates (LJSpeech is 22.05 kHz, but preprocessing resamples it). Combine the datasets and modify the training pipeline to support multi-dataset training. Check whether any open PRs or branches already exist for this task.
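The dataset-combining step can be sketched as below. This is only an illustration under stated assumptions: `merge_datasets`, the `utt_id` and `wav` fields, and the sample file names are all hypothetical and do not reflect PaddleSpeech's actual metadata format. The idea is to prefix each utterance ID with its dataset name so IDs from different corpora cannot collide, then shuffle so every training batch mixes speakers and languages.

```python
import random


def merge_datasets(datasets, seed=0):
    """Merge per-dataset utterance lists into one shuffled training list.

    `datasets` maps a dataset name to a list of dicts with the
    hypothetical keys "utt_id" and "wav".
    """
    merged = []
    for name, utterances in datasets.items():
        for utt in utterances:
            merged.append({
                # Prefix with the dataset name to keep IDs unique.
                "utt_id": f"{name}_{utt['utt_id']}",
                "dataset": name,
                "wav": utt["wav"],
            })
    # A fixed seed keeps the shuffled order reproducible between runs.
    random.Random(seed).shuffle(merged)
    return merged


# Hypothetical entries; real IDs and paths come from each corpus.
datasets = {
    "csmsc": [{"utt_id": "0001", "wav": "csmsc/0001.wav"}],
    "ljspeech": [{"utt_id": "0001", "wav": "ljspeech/0001.wav"}],
    "vctk": [{"utt_id": "0001", "wav": "vctk/0001.wav"}],
}
merged = merge_datasets(datasets)
print(sorted(item["utt_id"] for item in merged))
```

Keeping the `dataset` field on each entry also makes it easy to check later whether vocoder quality differs by source corpus.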