facebookresearch/fairseq

Is the same base model used for MMS-ASR and MMS-TTS (like AudioPaLM)?

Open

#5221 opened on June 27, 2023

(1 comment) (0 reactions) (0 assignees) Python (6,224 forks) batch import
enhancement, help wanted, needs triage

Description

Google introduced AudioPaLM, a large language model for speech understanding and generation. AudioPaLM fuses text-based and speech-based language models, PaLM-2 [Anil et al., 2023] and AudioLM [Borsos et al., 2022], into a unified multimodal architecture that can process and generate text and speech, with applications including speech recognition and speech-to-speech translation.

How about MMS? I found how to fine-tune MMS ASR from the pretrained base model mms-1b, but I cannot find anything equivalent for TTS. Is the same base model, mms-1b, used for MMS-TTS? How can I fine-tune MMS-TTS or add a new language to it?