Is same base model used for MMS-ASR and MMS-TTS (like AudioPaLM)? · facebookresearch/fairseq#5221

(1 comment) (0 reactions) (0 assignees) Python (6,224 forks)

enhancement, help wanted, needs triage

Repository metrics

Stars: 29,107
PR merge metrics: no PRs merged in the last 30 days

Description

Google introduced AudioPaLM, a large language model for speech understanding and generation. AudioPaLM fuses text-based and speech-based language models, PaLM-2 [Anil et al., 2023] and AudioLM [Borsos et al., 2022], into a unified multimodal architecture that can process and generate text and speech, with applications including speech recognition and speech-to-speech translation.

What about MMS? I found instructions for fine-tuning MMS ASR from the pretrained base model mms-1b, but I cannot find anything equivalent for TTS. Is the same base model, mms-1b, used for MMS-TTS? How can I fine-tune MMS-TTS or add a new language to it?

Contributor guide

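Research direction: This issue asks whether ASR and TTS in MMS use the same base model (mms 1b). To answer it, consult the MMS paper (https://arxiv.org/abs/2305.13516) and the model releases in the repository. Check the fairseq documentation for a section on TTS fine-tuning scripts. Look for a separate TTS checkpoint or configuration file. If none exists, note that TTS support may not yet be publicly available. Compare the result with the AudioPaLM architecture mentioned in the issue.
Tech stack: python, pytorch
Domain: ai, machine learning
Issue type: investigation
Difficulty: 3
Estimated time: half a day
Activity status: active
Clarity: clear
Prerequisites: Python, PyTorch
Beginner friendliness: 45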
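
One way to carry out the checkpoint comparison suggested in the research direction is to compare parameter names and shapes between two checkpoints. The sketch below is a minimal, hypothetical illustration: the function name `shared_parameter_ratio` and all parameter names and shapes in the toy dictionaries are made up for the example and are not taken from fairseq or any MMS release. With real checkpoints, the name-to-shape mappings would have to be built from the loaded weights (for example via `torch.load(path, map_location="cpu")`), and where the weights sit inside the loaded object depends on how the checkpoint was saved.

```python
from typing import Mapping, Tuple

Shape = Tuple[int, ...]


def shared_parameter_ratio(a: Mapping[str, Shape], b: Mapping[str, Shape]) -> float:
    """Fraction of parameters in `a` that also appear in `b` with the same shape."""
    if not a:
        return 0.0
    # A parameter counts as shared only if both its name and its shape match.
    shared = sum(1 for name, shape in a.items() if b.get(name) == shape)
    return shared / len(a)


# Toy, made-up parameter names and shapes (NOT from any real MMS checkpoint).
asr_like = {
    "encoder.layers.0.self_attn.k_proj.weight": (1280, 1280),
    "encoder.layers.0.fc1.weight": (5120, 1280),
    "proj.weight": (154, 1280),
}
finetuned_asr_like = {
    "encoder.layers.0.self_attn.k_proj.weight": (1280, 1280),
    "encoder.layers.0.fc1.weight": (5120, 1280),
    "proj.weight": (80, 1280),  # output layer resized for a new vocabulary
}
tts_like = {
    "text_encoder.emb.weight": (40, 192),
    "decoder.conv_pre.weight": (512, 192, 7),
}

print(shared_parameter_ratio(asr_like, finetuned_asr_like))
print(shared_parameter_ratio(asr_like, tts_like))
```

A high ratio would suggest that two checkpoints share a backbone, while a ratio near zero would suggest separately trained architectures. The check only compares names and shapes, not weight values, so it is a first-pass heuristic rather than proof.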
