Clone a voice in 30 seconds.
Synthesise for hours.
Zero-shot voice cloning on Apple Silicon. Provide a 5–30 second reference clip and its transcript; CosyVoice 3 generates speech in that voice across nine languages, fully offline. No fine-tuning, no per-character pricing, no audio ever leaving the device.
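As a rough illustration of the input requirement above, the sketch below checks that a reference clip falls inside the 5–30 second window and that a transcript accompanies it. It is not part of CosyVoice 3: the function names and the WAV-only assumption are hypothetical, and it relies only on Python's standard `wave` module.

```python
import wave

# Bounds taken from the text above: a 5-30 second reference clip.
MIN_SECONDS = 5.0
MAX_SECONDS = 30.0


def clip_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference(path, transcript):
    """Hypothetical pre-flight check for a reference clip and its transcript.

    Returns a list of human-readable problems; an empty list means the
    pair looks usable as a cloning reference.
    """
    problems = []
    duration = clip_duration_seconds(path)
    if duration < MIN_SECONDS:
        problems.append(f"clip is {duration:.1f}s, shorter than {MIN_SECONDS:.0f}s")
    elif duration > MAX_SECONDS:
        problems.append(f"clip is {duration:.1f}s, longer than {MAX_SECONDS:.0f}s")
    if not transcript.strip():
        problems.append("transcript is empty")
    return problems
```

Calling `check_reference("reference.wav", "The words spoken in the clip.")` before synthesis would surface an out-of-range clip or a missing transcript early; compressed formats such as MP3 would need a different decoder.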
Five voice-cloning recipes.
Each recipe uses CosyVoice 3 for the synthesis itself and adds different pre- and post-processing components: speaker embeddings for voice matching, denoising to clean up the reference clip, and Qwen3-TTS ICL when you only have audio.
Clone the author's voice or any chosen voice once, then render hours of consistent narration.
Keep a presenter's voice across translated tracks, in nine languages.
Two to four custom voices per scene, assigned via inline speaker tags.
Restore a familiar voice for users who can no longer speak naturally.
A single consistent narrator across an entire product line.
