facebookresearch/fairseq

Is the same base model used for MMS-ASR and MMS-TTS (like AudioPaLM)?

Open

#5221 opened on Jun 27, 2023

Labels: enhancement, help wanted, needs triage

Description

Google introduced AudioPaLM, a large language model for speech understanding and generation. AudioPaLM fuses text-based and speech-based language models, PaLM-2 [Anil et al., 2023] and AudioLM [Borsos et al., 2022], into a unified multimodal architecture that can process and generate text and speech, with applications including speech recognition and speech-to-speech translation.

What about MMS? I found instructions for fine-tuning MMS ASR from the pretrained base model mms-1b, but I cannot find anything equivalent for TTS. Is the same base model, mms-1b, used for MMS-TTS? How can I fine-tune MMS-TTS or add a new language to it?
