Is the same base model used for MMS-ASR and MMS-TTS (like AudioPaLM)? · facebookresearch/fairseq#5221

1 comment, 0 reactions, 0 assignees, Python, 6,224 forks

Labels: enhancement, help wanted, needs triage

Repository metrics

Stars: 29,107
PR merge metrics: no PRs merged in the last 30 days

Description

Google introduced AudioPaLM, a large language model for speech understanding and generation. AudioPaLM fuses text-based and speech-based language models, PaLM-2 [Anil et al., 2023] and AudioLM [Borsos et al., 2022], into a unified multimodal architecture that can process and generate text and speech with applications including speech recognition and speech-to-speech translation.

What about MMS? I found how to fine-tune MMS ASR from the pretrained base model mms-1b, but I cannot find anything equivalent for TTS. Is the same base model mms-1b used for MMS-TTS? How can I fine-tune MMS-TTS or add a new language to it?

Contributor guide

Research direction: The issue asks whether the same base model (mms-1b) is used for both ASR and TTS in MMS. To answer, review the MMS paper (https://arxiv.org/abs/2305.13516) and the model release in the repository. Check the fairseq documentation for TTS fine-tuning scripts. Look for any separate TTS checkpoints or configurations. If none exist, note that TTS support may not be publicly available. Compare with the AudioPaLM architecture mentioned in the issue.
Tech stack: Python, PyTorch
Domain: AI, machine learning
Issue type: Research
Difficulty: 3
Estimated time: Half a day
Activity status: Active
Clarity: Clear
Prerequisites: Python, PyTorch
Beginner-friendly: 45
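
The research direction above suggests looking for separate TTS checkpoints. One possible way to test whether two checkpoints share a base model is to compare their parameter names and shapes. The following is only a sketch: the function shared_parameter_fraction and all parameter names and shapes in it are hypothetical and not taken from the MMS release. It assumes each checkpoint has already been reduced to a mapping from parameter name to shape, for example with {k: tuple(v.shape) for k, v in state_dict.items()} on a loaded PyTorch state dict. How the state dict is obtained depends on the checkpoint format.

```python
from typing import Dict, Tuple

Shape = Tuple[int, ...]


def shared_parameter_fraction(a: Dict[str, Shape], b: Dict[str, Shape]) -> float:
    """Return the fraction of parameters that appear in both checkpoints
    with the same name and shape, relative to the smaller checkpoint."""
    if not a or not b:
        return 0.0
    matches = sum(1 for name, shape in a.items() if b.get(name) == shape)
    return matches / min(len(a), len(b))


# Hypothetical parameter names and shapes, for illustration only.
checkpoint_a = {
    "encoder.layers.0.self_attn.k_proj.weight": (1280, 1280),
    "encoder.layers.0.fc1.weight": (5120, 1280),
    "proj.weight": (154, 1280),
}
checkpoint_b = {
    "text_encoder.layers.0.attn.weight": (192, 192),
    "decoder.upsample.0.weight": (512, 256, 16),
}

print(shared_parameter_fraction(checkpoint_a, checkpoint_a))
print(shared_parameter_fraction(checkpoint_a, checkpoint_b))
```

A fraction near 1 would suggest that the two checkpoints share a backbone, and a fraction near 0 would suggest unrelated architectures. Matching names and shapes is a heuristic, not proof, so the paper and the released configurations remain the authoritative source.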
