Joint CTC-S2S Phoneme-level ASR for Voice Conversion and TTS (Text-Mel Alignment)
Repository
Repositories by yl4579
HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform
Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions
Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with PyTorch
Deep Neural Pitch Extractor for Voice Conversion and TTS Training
SLMGAN: Exploiting Speech Language Model Representations for Unsupervised Zero-Shot Voice Conversion in GANs
StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion
Official Implementation of StyleTTS
Official Implementation of StyleTTS-VC
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Python libraries for Google Colaboratory
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
StarGAN v2 - Official PyTorch Implementation (CVPR 2020)