VoxCPM2: Tokenizer-Free TTS

Revolutionary Multilingual Speech Generation with Creative Voice Design and True-to-Life Cloning

✨ Powerful Features

🌍

30-Language Multilingual

Enter text in any of the 30 supported languages and synthesize it directly; no language tag is needed

🎨

Voice Design

Create brand-new voices from a natural-language description alone: gender, age, tone, emotion

🎛️

Controllable Cloning

Clone voices from short reference clips with optional style guidance for emotion and expression

🎙️

Ultimate Cloning

Reproduce every vocal nuance with reference audio and transcript for seamless continuation

🔊

48kHz High-Quality Audio

Studio-quality output with built-in super-resolution, no external upsampler needed

Real-Time Streaming

Real-time factor (RTF) as low as ~0.13 on an NVIDIA RTX 4090, with Nano-vLLM and vLLM-Omni support
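
To make the RTF figure concrete, here is a small sketch of the arithmetic. It assumes the standard definition of real-time factor (synthesis time divided by the duration of the audio produced, so lower is faster); the function names are illustrative and are not part of the VoxCPM API.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def synthesis_time(audio_seconds: float, rtf: float = 0.13) -> float:
    """Estimated wall-clock time needed to synthesize audio at a given RTF.

    The default of 0.13 is the figure quoted above for an RTX 4090;
    actual values depend on hardware and serving setup.
    """
    return audio_seconds * rtf


if __name__ == "__main__":
    # 10 seconds of speech at the quoted RTF
    print(f"{synthesis_time(10.0):.2f} s")  # prints "1.30 s"
```

An RTF below 1.0 means audio is produced faster than it plays back, which is the precondition for streaming without stalls.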

📊 Exceptional Performance

2B

Model Parameters

2M+

Hours of Training Data

30

Supported Languages

0.13

Real-Time Factor

🚀 Quick Start

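# Install VoxCPM (run in a shell)
pip install voxcpm

# Basic text-to-speech (Python)
from voxcpm import VoxCPM
import soundfile as sf

# Load the pretrained VoxCPM2 model
model = VoxCPM.from_pretrained("openbmb/VoxCPM2")
# Synthesize speech from text; wav holds the generated waveform
wav = model.generate("Hello from VoxCPM2!")
# Write the waveform to disk at the model's native sample rate
sf.write("output.wav", wav, model.tts_model.sample_rate)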
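
After running the Quick Start, it can be useful to confirm what was actually written to disk. The sketch below reads the header of a WAV file with Python's standard-library `wave` module and reports its sample rate, channel count, and duration. The helper name `wav_info` is hypothetical, and the sketch assumes the file is standard integer PCM, which is the only encoding `wave` can read.

```python
import wave


def wav_info(path: str) -> dict:
    """Return basic header information for a PCM WAV file."""
    with wave.open(path, "rb") as f:
        frames = f.getnframes()
        rate = f.getframerate()
        return {
            "sample_rate": rate,              # samples per second, e.g. 48000
            "channels": f.getnchannels(),     # 1 = mono, 2 = stereo
            "duration_s": frames / rate,      # length of the clip in seconds
        }


# Example (assumes output.wav exists in the working directory):
# print(wav_info("output.wav"))
```

For VoxCPM2 output, the reported sample rate should match the 48kHz figure listed above; a different value would suggest the file was resampled or written with the wrong rate.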

📦 Models & Versions

| Feature | VoxCPM2 | VoxCPM1.5 | VoxCPM-0.5B |
| --- | --- | --- | --- |
| Status | ✅ Latest | Stable | Legacy |
| Parameters | 2B | 0.6B | 0.5B |
| Audio Quality | 48kHz | 44.1kHz | 16kHz |
| Languages | 30 | 2 | 2 |
| Voice Design | ✅ | | |
| Controllable Cloning | ✅ | | |
| Commercial Use | ✅ Apache-2.0 | ✅ Apache-2.0 | ✅ Apache-2.0 |

🌟 Join Our Community