Use case · Audiobooks · Podcasts

Hours of audio.
One consistent voice.

Audiobook chapters, podcast episodes, training narration — rendered on-device on Apple Silicon, Android, Windows, or embedded Linux. Automatic segmentation keeps the voice stable across hours; multi-speaker mode handles dialogue between named characters.

Get started · CosyVoice 3 guide · VibeVoice guide

What you can build

Five long-form shapes.

Each engine has a sweet spot. Audiobooks lean on CosyVoice 3 for narrator fidelity. Multi-speaker podcasts lean on VibeVoice for episode-length context. Real-time and streaming use cases run on the smaller VibeVoice Realtime.

Audiobook chapters

Full-chapter passes with one consistent narrator voice. Automatic sentence-level segmentation, no manual stitching.
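
To make the idea of sentence-level segmentation concrete, here is a minimal Python sketch. It is an illustration only, not the engine's actual segmenter: the function name `segment_sentences`, the `max_chars` budget, and the naive punctuation-based splitter are all assumptions made for this example.

```python
import re


def segment_sentences(text: str, max_chars: int = 200) -> list[str]:
    """Group sentences into chunks that stay under max_chars where possible.

    Hypothetical helper for illustration; a real segmenter would also
    handle abbreviations, quotes, and non-Latin punctuation.
    """
    # Naive split: break after '.', '!' or '?' when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk ends on a sentence boundary, so a chapter can be rendered chunk by chunk with the same narrator voice and concatenated in order. A single sentence longer than `max_chars` is kept whole rather than cut mid-sentence.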

Multi-speaker podcasts

Inline speaker tags drive turn-taking. Cast two to four voices for an episode-length scripted show.
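
As a rough sketch of how inline speaker tags can drive turn-taking, the Python below parses a script into ordered turns. The `[Name]` tag syntax, the `Turn` type, and `parse_script` are hypothetical and chosen for this example; the tag format the engine actually accepts may differ, so check the VibeVoice guide.

```python
import re
from typing import NamedTuple


class Turn(NamedTuple):
    speaker: str
    text: str


# Assumed tag format for this sketch: a line starting with "[Name]".
TAG = re.compile(r"^\[(?P<speaker>[^\]]+)\]\s*(?P<text>.*)$")


def parse_script(script: str) -> list[Turn]:
    """Turn a tagged script into an ordered list of (speaker, text) turns."""
    turns: list[Turn] = []
    for raw_line in script.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        match = TAG.match(line)
        if match:
            turns.append(Turn(match["speaker"].strip(), match["text"].strip()))
        elif turns:
            # Untagged line: treat it as a continuation of the previous turn.
            previous = turns[-1]
            turns[-1] = Turn(previous.speaker, f"{previous.text} {line}".strip())
        else:
            raise ValueError("script must begin with a speaker tag")
    return turns
```

The set of distinct `speaker` values gives the cast for the episode, and each one can then be mapped to a voice before rendering.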

Live podcast / streaming

Generate as the listener listens. VibeVoice Realtime keeps the latency low enough for live conversations.

Article TTS

Newsletter-length articles, blog posts, internal docs — rendered as natural narration without screen-reader pacing.

Accessibility narration

Long-form content access for users with print disabilities or visual impairments, fully offline.

Deeper reading

Component guides.
