F5-TTS Integration · huggingface/diffusers#10043

(11 comments) (0 reactions) (0 assignees) Python (4,562 forks)

contributions-welcome, help wanted

Repository metrics

Stars: 22,190
PR merge metrics: average merge time 14d 10h; 101 PRs merged in 30d

Description

Model/Pipeline/Scheduler description

F5-TTS is a fully non-autoregressive text-to-speech system based on flow matching with a Diffusion Transformer (DiT). It offers strong voice-cloning capabilities and generates high-quality audio.
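For readers new to flow matching, here is a minimal sketch of what sampling looks like: start from noise and integrate a velocity field from t=0 to t=1. It assumes a fixed-step Euler solver and uses a hypothetical `toy_velocity` function in place of the trained DiT; the names `euler_sample`, `toy_velocity`, and `target` are illustrative and are not taken from the F5-TTS code base.

```python
import random

def euler_sample(velocity_fn, x0, num_steps=32):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# Hypothetical "data" point; in a TTS model this would be a mel spectrogram.
target = [0.5, -1.0, 2.0]

def toy_velocity(x, t):
    # Stand-in for the learned network: the exact velocity of the
    # straight-line path from the current point to `target`.
    return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]

rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in target]  # x at t=0
sample = euler_sample(toy_velocity, noise, num_steps=32)
```

In a real model the velocity field is predicted by the network and conditioned on text and reference audio, so the number of solver steps trades speed against quality.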

Open source status

The model implementation is available.
The model weights are available (Only relevant if addition is not a scheduler).

Provide useful links for the implementation

Paper - https://arxiv.org/abs/2410.06885
Code - https://github.com/SWivid/F5-TTS?tab=readme-ov-file
Weights - https://huggingface.co/SWivid/F5-TTS

Author - @SWivid

Contributor guide

Research direction: Study the F5 TTS paper and code repository, understand the diffusers pipeline architecture, and identify the components needed to integrate a new TTS pipeline.
Tech stack: Python, PyTorch
Domain: machine learning, AI
Issue type: Feature
Difficulty: 4
Estimated time: 1-2 days
Activity status: Active
Clarity: Mostly clear
Prerequisites: Python, PyTorch, Git
Newbie friendliness: 60
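
As a starting point for the research direction above, the following sketch shows one plausible way to split a TTS pipeline into components. It is plain Python with stub components, not diffusers code: `ToyTTSPipeline`, `char_tokenizer`, `dummy_transformer`, and `dummy_vocoder` are all hypothetical names. An actual integration would presumably subclass `DiffusionPipeline` and register its components with `register_modules`, and the real component boundaries should be taken from the F5-TTS repository.

```python
from dataclasses import dataclass
from typing import Callable, List

Mel = List[List[float]]  # frames x mel bins

@dataclass
class ToyTTSPipeline:
    """Hypothetical stand-in for a diffusers-style pipeline (not a diffusers class)."""
    tokenizer: Callable[[str], List[int]]         # text -> token ids
    transformer: Callable[[List[int], int], Mel]  # token ids, steps -> mel frames
    vocoder: Callable[[Mel], List[float]]         # mel frames -> waveform samples

    def __call__(self, text: str, num_inference_steps: int = 32) -> List[float]:
        token_ids = self.tokenizer(text)
        mel = self.transformer(token_ids, num_inference_steps)
        return self.vocoder(mel)

# Stub components so the skeleton runs without any model weights.
def char_tokenizer(text: str) -> List[int]:
    return [ord(c) for c in text]

def dummy_transformer(token_ids: List[int], steps: int) -> Mel:
    # One silent 4-bin frame per token; a real model would run the sampler here.
    return [[0.0] * 4 for _ in token_ids]

def dummy_vocoder(mel: Mel) -> List[float]:
    # One output sample per frame; a real vocoder upsamples to audio rate.
    return [sum(frame) for frame in mel]

pipe = ToyTTSPipeline(char_tokenizer, dummy_transformer, dummy_vocoder)
waveform = pipe("hello")
```

Keeping each stage behind its own callable mirrors how diffusers pipelines hold separately loadable modules, which makes it easier to swap the vocoder or scheduler later.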