Use case · Content creation

Clone a voice in 30 seconds.
Synthesise for hours.

Zero-shot voice cloning on Apple Silicon. IndexTTS2 adds native MLX cloning with emotion, tempo, and pause controls; CosyVoice 3 covers transcript-conditioned multilingual cloning; Chatterbox Flash adds a CoreML T3 + S3Gen path with an MLX reference encoder. No fine-tuning, no per-character pricing, no audio ever leaving the device.


What you can build

Five voice-cloning recipes.

Each recipe picks the synthesis engine that fits the product: IndexTTS2 for reference-only MLX cloning with style controls, CosyVoice 3 for transcript-conditioned multilingual zero-shot cloning, Chatterbox Flash for CoreML Flash T3 cloning, and Qwen3-TTS ICL when you only have audio. Speaker embeddings and denoising support all of them.

Audiobook narration

Clone the author or a chosen voice once, render hours of consistent narration.

Dubbing & localisation

Keep a presenter's voice across translated tracks, in nine languages.

Character voices

Two to four custom voices per scene via inline speaker tags.

Personal-voice TTS

Restore a familiar voice for users who can no longer speak naturally.

Brand voice

A single consistent narrator across an entire product line.
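
For the character-voices recipe, a script with inline speaker tags has to be split into per-speaker segments before each one is sent to a cloned voice. The sketch below shows one way to do that split. The `[name]` tag syntax, the `split_by_speaker` helper, and the `narrator` default are all assumptions made for illustration, not the format any of the engines above is confirmed to use.

```python
import re
from typing import List, Tuple

# Hypothetical tag syntax: "[name]" marks the speaker for the text that follows.
TAG = re.compile(r"\[([A-Za-z0-9_]+)\]")


def split_by_speaker(script: str, default: str = "narrator") -> List[Tuple[str, str]]:
    """Split a script into (speaker, text) segments at each inline tag.

    Text before the first tag is assigned to `default`.
    """
    segments: List[Tuple[str, str]] = []
    speaker = default
    pos = 0
    for match in TAG.finditer(script):
        text = script[pos:match.start()].strip()
        if text:
            segments.append((speaker, text))
        speaker = match.group(1)  # switch voice for the text that follows
        pos = match.end()
    tail = script[pos:].strip()
    if tail:
        segments.append((speaker, tail))
    return segments


scene = "The door opened. [anna] Who is there? [ben] Only me. [anna] Come in."
segments = split_by_speaker(scene)
voices = {speaker for speaker, _ in segments}  # distinct voices the scene needs
```

Each `(speaker, text)` pair can then be rendered with whichever cloned voice is mapped to that speaker name, and the resulting clips concatenated in order.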

Deeper reading