Source code of APNet2, a vocoder
Repositories
w-okada's repositories
Jointly training a speaker encoder with a consistency loss to achieve cross-lingual voice conversion and expressive voice conversion
One-shot learning-based hotword detection.
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Even 1 minute of voice data can be used to train a good TTS model! (few-shot voice cloning)
Enhanced ChatGPT Clone: Features Agents, DeepSeek, Anthropic, AWS, OpenAI, Assistants API, Azure, Groq, o1, GPT-4o, Mistral, OpenRouter, Vertex AI, Gemini, Artifacts, AI model switching, message search, Code Interpreter, langchain, DALL-E-3, OpenAPI Actions, Functions, Secure Multi-User Auth, Presets, open-source for self-hosting. Active project.
Real-time voice changer using AI (client)
Real-time voice changer using AI (Trainer)
Sample that displays a window captured with the WinRT GraphicsCaptureAPI as a virtual camera
A repository for storing models that have been inter-converted between various frameworks. Supported frameworks are TensorFlow, PyTorch, ONNX, OpenVINO, TFJS, TFTRT, TensorFlowLite (Float32/16/INT8), EdgeTPU, CoreML.
Demo showcasing a ~real-time Latent Consistency Model pipeline with Diffusers and an MJPEG stream server
Multilingual Voice Understanding Model
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Inverse kinematics for three.js