Hours of audio.
One consistent voice.
Audiobook chapters, podcast episodes, training narration — rendered on-device on Apple Silicon, Android, or embedded Linux. Automatic segmentation keeps the voice stable across hours; multi-speaker mode handles dialogue between named characters.
Five long-form shapes.
Each engine has a sweet spot. Audiobooks lean on CosyVoice 3 for narrator fidelity. Multi-speaker podcasts lean on VibeVoice for episode-length context. Real-time streaming runs on the smaller VibeVoice Realtime.
Full-chapter passes with one consistent narrator voice. Automatic sentence-level segmentation, no manual stitching.
Inline speaker tags drive turn-taking. Cast two to four voices for an episode-length scripted show.
Generate audio as the listener listens. VibeVoice Realtime keeps latency low enough for live conversations.
Newsletter-length articles, blog posts, internal docs — rendered as natural narration without screen-reader pacing.
Long-form content access for users with print or visual impairments, fully offline.
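For readers curious what sentence-level segmentation and inline speaker tags might look like in practice, here is a minimal Python sketch. It is an illustration under stated assumptions, not the product's implementation: the `[Name]: text` tag format, the function names, and the `max_chars` limit are all hypothetical, and the sentence splitter is deliberately naive.

```python
import re
from typing import List, Tuple

# Hypothetical tag format "[Name]: text". The real tag syntax is not
# specified in the copy above; this is an assumption for illustration.
SPEAKER_TAG = re.compile(r"^\[(?P<name>[^\]]+)\]:\s*(?P<text>.*)$")


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def pack_segments(sentences: List[str], max_chars: int = 200) -> List[str]:
    """Greedily pack whole sentences into segments of at most max_chars.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    segments: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            segments.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        segments.append(current)
    return segments


def parse_script(script: str, default_speaker: str = "Narrator") -> List[Tuple[str, str]]:
    """Turn a tagged script into (speaker, text) turns."""
    turns: List[Tuple[str, str]] = []
    for raw_line in script.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        match = SPEAKER_TAG.match(line)
        if match:
            turns.append((match.group("name"), match.group("text")))
        elif turns:
            # An untagged line continues the previous speaker's turn.
            speaker, text = turns[-1]
            turns[-1] = (speaker, f"{text} {line}")
        else:
            turns.append((default_speaker, line))
    return turns
```

In a sketch like this, each turn would be split into sentences, packed into segments, and rendered with the same voice setting for every segment of a given speaker, so no manual stitching is needed between chunks.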
