Is same base model used for MMS-ASR and MMS-TTS (like AudioPaLM)? · facebookresearch/fairseq#5221

Comments: 1, Reactions: 0, Assignees: 0, Language: Python, Forks: 6,224

Labels: enhancement, help wanted, needs triage

Repository metrics

Stars: 29,107
PR merge metrics: No merged PRs in the last 30 days

Description

Google introduced AudioPaLM, a large language model for speech understanding and generation. AudioPaLM fuses two language models, the text-based PaLM-2 [Anil et al., 2023] and the speech-based AudioLM [Borsos et al., 2022], into a unified multimodal architecture that can process and generate both text and speech, with applications including speech recognition and speech-to-speech translation.

What about MMS? I found instructions for fine-tuning MMS ASR from the pretrained base model mms-1b, but I cannot find the equivalent for TTS. Is the same base model, mms-1b, used for MMS-TTS? How can I fine-tune MMS-TTS or add a new language to it?

Contributor guide

Research direction: Check the MMS documentation and the fairseq codebase to determine whether the same base model (mms-1b) is used for TTS. Review the model registry and the TTS fine-tuning scripts, especially for how to add new languages.
Tech stack: Python, PyTorch
Domain: AI, machine learning
Issue type: Research
Difficulty: 3
Estimated time: Half day
Activity status: Active
Clarity: Clear
Prerequisites: Python, PyTorch
Newbie friendliness: 45
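One quick first check for the research direction above is to compare the `model_type` each checkpoint declares in its `config.json`. The sketch below assumes both checkpoints have already been downloaded locally in the Hugging Face Transformers layout (for example `facebook/mms-1b` and `facebook/mms-tts-eng`); the helper names `read_model_type` and `same_architecture_family` are hypothetical. The stand-in values in the demo (`wav2vec2` for mms-1b, `vits` for the TTS checkpoint) are assumptions to verify against the real files, not confirmed output.

```python
import json
import tempfile
from pathlib import Path


def read_model_type(checkpoint_dir: str) -> str:
    """Return the `model_type` field from a checkpoint's config.json."""
    config_path = Path(checkpoint_dir) / "config.json"
    with config_path.open(encoding="utf-8") as f:
        return json.load(f)["model_type"]


def same_architecture_family(asr_dir: str, tts_dir: str) -> bool:
    """True if both checkpoints declare the same `model_type`.

    A match is necessary but not sufficient for shared base weights;
    a mismatch rules out the TTS model being built on the ASR base model.
    """
    return read_model_type(asr_dir) == read_model_type(tts_dir)


if __name__ == "__main__":
    # Offline demo with stand-in config.json files (assumed values, see lead-in).
    with tempfile.TemporaryDirectory() as root:
        asr = Path(root) / "mms-1b"
        tts = Path(root) / "mms-tts-eng"
        for d, mtype in ((asr, "wav2vec2"), (tts, "vits")):
            d.mkdir()
            (d / "config.json").write_text(json.dumps({"model_type": mtype}))
        print(read_model_type(str(asr)), read_model_type(str(tts)))
        print(same_architecture_family(str(asr), str(tts)))
```

Comparing only `model_type` is deliberately cheap: it answers "same architecture family?" without loading any weights. Confirming actual weight sharing would still require reading the MMS documentation or training scripts.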
