On-device speech.
For real products.
Diarized transcription, zero-shot voice cloning, long-form speech synthesis — running on Apple Silicon, Android, and embedded Linux. No cloud APIs, no per-minute pricing, no data leaving the device.
brew install soniqo/tap/speech
implementation("audio.soniqo:speech:0.0.5")
Local Speech AI on a MacBook
A four-minute open-source library tour: realtime transcription with Nemotron Streaming, local speech-to-speech with PersonaPlex, and 48 kHz voice cloning with VoxCPM2 — every demo runs on the laptop.
Which voice is real?
A 30-second blind comparison of a real voice, the same voice cloned locally by Speech Studio on a MacBook, and the same voice cloned by ElevenLabs in the cloud. Speech Studio is the open-source Mac app — Apache 2.0, github.com/soniqo/speech-studio.
Three on-device use-case groups.
Each group spans several sub-use-cases stitched from Soniqo components. Drop in your audio, get conversation, transcripts, or generated speech back — locally, in real time.
Voice Agents
Build voice-first interfaces — from full-duplex speech-to-speech to wake-word-driven compositional pipelines, all running locally.
Transcription
Turn audio into structured text: realtime streaming for live captions and dictation, high-accuracy batch processing for archives, and diarization to attribute each passage to its speaker.
Speech Generation
Synthesize speech in any voice — clone a voice in seconds, narrate audiobooks for hours, or cast multi-speaker podcasts, fully offline.
Twenty-plus models. One stack.
The use-case pipelines above are stitched from these models. Pick a component to read its architecture, CLI, Swift API, and benchmarks. All run on Apple Silicon; most also run on Android and Linux.
Speech-to-Text
Text-to-Speech
9 langs, zero-shot cloning, 4-bit → bf16
12 Hz codec LM, faster than real-time
48 kHz, 30 langs, voice design + cloning
50 voices, ~45 ms inference
90-min podcasts / audiobooks
9 langs, 5 baked voices, streaming
CosyVoice, Qwen3-TTS ICL, CAM++
