VoxCPM2: Tokenizer-Free TTS

Revolutionary Multilingual Speech Generation with Creative Voice Design and True-to-Life Cloning


✨ Powerful Features

🌍

30-Language Multilingual

Enter text in any of the 30 supported languages and synthesize it directly; no language tag is needed

🎨

Voice Design

Create brand-new voices from natural-language descriptions alone: gender, age, tone, emotion

🎛️

Controllable Cloning

Clone voices from short reference clips with optional style guidance for emotion and expression

🎙️

Ultimate Cloning

Reproduce every vocal nuance with reference audio and transcript for seamless continuation

🔊

48kHz High-Quality Audio

Studio-quality output with built-in super-resolution, no external upsampler needed

⚡

Real-Time Streaming

Real-time factor (RTF) as low as about 0.13 on an NVIDIA RTX 4090, with Nano-vLLM and vLLM-Omni support
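To make the RTF figure concrete: real-time factor is conventionally the time spent synthesizing divided by the duration of the audio produced, so any value below 1 means faster than real time. The sketch below is plain arithmetic, not part of the VoxCPM API; the function names `real_time_factor` and `synthesis_time` are hypothetical and exist only for illustration.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def synthesis_time(audio_seconds: float, rtf: float) -> float:
    """Estimated wall-clock time needed to generate audio_seconds of speech."""
    return audio_seconds * rtf


# At an RTF of 0.13, ten seconds of speech takes roughly 1.3 seconds to generate.
print(f"{synthesis_time(10.0, 0.13):.2f} s")
```

Actual RTF depends on hardware, text length, and the serving backend, so the 0.13 figure is best read as a lower bound reported for the RTX 4090 rather than a guarantee.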

📊 Exceptional Performance

Model Parameters: 2B

Hours of Training Data: 2M+

Supported Languages: 30

Real-Time Factor: 0.13

🚀 Quick Start

                
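# Install VoxCPM
pip install voxcpm

# Basic Text-to-Speech
from voxcpm import VoxCPM
import soundfile as sf

# Load the pretrained VoxCPM2 checkpoint
model = VoxCPM.from_pretrained("openbmb/VoxCPM2")

# Synthesize speech; the result is the generated waveform
wav = model.generate("Hello from VoxCPM2!")

# Save the waveform at the model's own sample rate
sf.write("output.wav", wav, model.tts_model.sample_rate)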
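To check the real-time factor on your own hardware, you can time a synthesis call and divide by the duration of the returned audio. The following is a minimal sketch under stated assumptions: `measure_rtf` and `fake_synthesize` are hypothetical helpers, not part of the `voxcpm` package, and the stand-in synthesizer just returns one second of silence so the example runs without a GPU or model download.

```python
import time
from typing import Callable, Sequence, Tuple


def measure_rtf(
    synthesize: Callable[[str], Sequence[float]],
    text: str,
    sample_rate: int,
) -> Tuple[float, float]:
    """Time one synthesis call and return (rtf, audio_seconds)."""
    start = time.perf_counter()
    wav = synthesize(text)
    elapsed = time.perf_counter() - start

    # Duration of the generated audio, assuming a 1-D mono waveform.
    audio_seconds = len(wav) / sample_rate
    if audio_seconds <= 0:
        raise ValueError("synthesizer returned no audio")
    return elapsed / audio_seconds, audio_seconds


def fake_synthesize(text: str) -> Sequence[float]:
    """Hypothetical stand-in for a TTS call: one second of silence at 48 kHz."""
    return [0.0] * 48_000


rtf, audio_seconds = measure_rtf(fake_synthesize, "Hello from VoxCPM2!", 48_000)
print(f"audio: {audio_seconds:.2f} s, RTF: {rtf:.4f}")
```

With the quick-start objects, the call would presumably look like `measure_rtf(model.generate, "Hello from VoxCPM2!", model.tts_model.sample_rate)`, assuming `generate` returns a one-dimensional waveform. The first call to a model often includes one-time warm-up cost, so timing a second call is likely to give a more representative figure.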


📦 Models & Versions

Feature	VoxCPM2	VoxCPM1.5	VoxCPM-0.5B
Status	✅ Latest	Stable	Legacy
Parameters	2B	0.6B	0.5B
Output Sample Rate	48 kHz	44.1 kHz	16 kHz
Languages	30	2	2
Voice Design	✅	—	—
Controllable Cloning	✅	—	—
Commercial Use	✅ Apache-2.0	✅ Apache-2.0	✅ Apache-2.0