Revolutionary Multilingual Speech Generation with Creative Voice Design and True-to-Life Cloning
Input text in any of 30 supported languages and synthesize directly, no language tag needed
Create brand-new voices from natural-language descriptions alone: gender, age, tone, and emotion
Clone voices from short reference clips with optional style guidance for emotion and expression
Reproduce every vocal nuance with reference audio and transcript for seamless continuation
Studio-quality output with built-in super-resolution, no external upsampler needed
Real-time factor (RTF) as low as ~0.13 on an NVIDIA RTX 4090, with Nano-vLLM and vLLM-Omni support
2B Model Parameters
Hours Training Data
30 Supported Languages
~0.13 Real-Time Factor
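The real-time factor quoted above is conventionally the time spent synthesizing divided by the duration of the audio produced, so values below 1 mean faster than real time. A minimal sketch of that arithmetic follows; the function names `real_time_factor` and `timed_rtf` are hypothetical helpers for illustration, not part of VoxCPM.

```python
import time


def real_time_factor(synthesis_seconds: float, num_samples: int, sample_rate: int) -> float:
    """Return synthesis time divided by the duration of the generated audio."""
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("num_samples and sample_rate must be positive")
    audio_seconds = num_samples / sample_rate
    return synthesis_seconds / audio_seconds


def timed_rtf(generate, text: str, sample_rate: int):
    """Time one call to `generate(text)` and return (rtf, waveform).

    `generate` is any callable returning a sequence of samples
    (hypothetical interface, assumed here for illustration).
    """
    start = time.perf_counter()
    wav = generate(text)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(wav), sample_rate), wav
```

For instance, producing 10 seconds of 48 kHz audio (480,000 samples) in 1.3 seconds corresponds to an RTF of about 0.13.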
```shell
# Install VoxCPM
pip install voxcpm
```

```python
# Basic Text-to-Speech
from voxcpm import VoxCPM
import soundfile as sf

# Load the pretrained VoxCPM2 checkpoint
model = VoxCPM.from_pretrained("openbmb/VoxCPM2")
wav = model.generate("Hello from VoxCPM2!")
# Write the waveform at the model's own sample rate
sf.write("output.wav", wav, model.tts_model.sample_rate)
```
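The same generate-then-save flow can be wrapped in a small helper. The sketch below assumes `model.generate` returns a one-dimensional sequence of float samples in [-1, 1], which the source does not confirm. `save_speech` and `FakeModel` are hypothetical names, and the standard-library `wave` module stands in for `soundfile` so the example runs without downloading a model.

```python
import array
import math
import wave
from types import SimpleNamespace


def save_speech(model, text: str, path: str) -> float:
    """Synthesize `text`, write 16-bit mono PCM to `path`, return duration in seconds.

    Assumes `model.generate(text)` yields floats in [-1, 1] (an assumption)
    and that `model.tts_model.sample_rate` holds the output rate.
    """
    wav = model.generate(text)
    sample_rate = model.tts_model.sample_rate
    # Clamp each sample to [-1, 1] and scale to signed 16-bit integers.
    pcm = array.array("h", (int(max(-1.0, min(1.0, float(s))) * 32767) for s in wav))
    with wave.open(path, "wb") as f:
        f.setnchannels(1)   # mono
        f.setsampwidth(2)   # 2 bytes = 16 bits per sample
        f.setframerate(sample_rate)
        f.writeframes(pcm.tobytes())
    return len(pcm) / sample_rate


class FakeModel:
    """Hypothetical stand-in for a loaded model: emits a 440 Hz tone, not speech."""

    tts_model = SimpleNamespace(sample_rate=48_000)

    def generate(self, text: str):
        rate = self.tts_model.sample_rate
        n = rate // 10  # 0.1 seconds of audio
        return [0.5 * math.sin(2 * math.pi * 440 * i / rate) for i in range(n)]
```

Calling `save_speech(FakeModel(), "Hello from VoxCPM2!", "output.wav")` exercises the helper end to end; with a real model, the object returned by `VoxCPM.from_pretrained` would take the place of `FakeModel()`, provided the return-type assumption holds.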
| Feature | VoxCPM2 | VoxCPM1.5 | VoxCPM-0.5B |
|---|---|---|---|
| Status | ✅ Latest | Stable | Legacy |
| Parameters | 2B | 0.6B | 0.5B |
| Audio Sample Rate | 48 kHz | 44.1 kHz | 16 kHz |
| Languages | 30 | 2 | 2 |
| Voice Design | ✅ | — | — |
| Controllable Cloning | ✅ | — | — |
| Commercial Use | ✅ Apache-2.0 | ✅ Apache-2.0 | ✅ Apache-2.0 |
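The comparison above can also be encoded as data when a script needs to choose a model by capability. This is a sketch only: the values are copied from the table, while the dictionary keys and the `pick_models` helper are hypothetical and not part of any VoxCPM API.

```python
# Values transcribed from the comparison table; key names are illustrative.
MODELS = {
    "VoxCPM2": {
        "parameters": "2B", "sample_rate_hz": 48_000, "languages": 30,
        "voice_design": True, "controllable_cloning": True,
    },
    "VoxCPM1.5": {
        "parameters": "0.6B", "sample_rate_hz": 44_100, "languages": 2,
        "voice_design": False, "controllable_cloning": False,
    },
    "VoxCPM-0.5B": {
        "parameters": "0.5B", "sample_rate_hz": 16_000, "languages": 2,
        "voice_design": False, "controllable_cloning": False,
    },
}


def pick_models(min_sample_rate_hz: int = 0, min_languages: int = 1,
                need_voice_design: bool = False) -> list[str]:
    """Return model names (in table order) that meet every requirement."""
    return [
        name
        for name, spec in MODELS.items()
        if spec["sample_rate_hz"] >= min_sample_rate_hz
        and spec["languages"] >= min_languages
        and (spec["voice_design"] or not need_voice_design)
    ]
```

For example, `pick_models(need_voice_design=True)` keeps only VoxCPM2, while `pick_models(min_sample_rate_hz=44_100)` keeps VoxCPM2 and VoxCPM1.5.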