huggingface/diffusers

F5-TTS Integration

Open

#10043 opened on Nov 28, 2024

Labels: contributions-welcome, help wanted

Description

Model/Pipeline/Scheduler description

F5-TTS is a fully non-autoregressive text-to-speech system based on flow matching with a Diffusion Transformer (DiT). It offers strong voice cloning capabilities and generates high-quality audio.
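For readers unfamiliar with the term, flow-matching models generally generate a sample by integrating a velocity field from noise (t = 0) to data (t = 1), updating every position of the output in parallel at each step rather than token by token. The snippet below is a minimal toy sketch of that sampling loop only; it is not the F5-TTS implementation. The names `toy_velocity` and `euler_sample` are hypothetical, and the closed-form velocity stands in for what would, in a real system, be a network prediction conditioned on text and reference audio.

```python
import random


def toy_velocity(x, t, target):
    """Hypothetical stand-in for a learned velocity field v(x, t).

    For a straight-line path x_t = (1 - t) * x0 + t * x1, the velocity that
    carries the current point x to a known target x1 is (x1 - x) / (1 - t).
    A real model would predict this quantity instead of computing it.
    """
    return [(x1 - xi) / (1.0 - t) for xi, x1 in zip(x, target)]


def euler_sample(x0, target, num_steps=8):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so (1 - t) is never zero
        v = toy_velocity(x, t, target)
        # Every element is updated at once: no left-to-right dependency.
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(4)]  # starting point at t = 0
target = [0.5, -1.0, 2.0, 0.0]                      # assumed "data" point
sample = euler_sample(noise, target)
```

Because the toy velocity is constant along a straight path, the Euler integration lands on `target` up to floating-point error; with a learned field, the number of steps and the choice of solver would trade speed against quality.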

Open source status

  • The model implementation is available.
  • The model weights are available (Only relevant if addition is not a scheduler).

Provide useful links for the implementation

  • Paper - https://arxiv.org/abs/2410.06885
  • Code - https://github.com/SWivid/F5-TTS?tab=readme-ov-file
  • Weights - https://huggingface.co/SWivid/F5-TTS

Author - @SWivid
