Voice in.
Voice out.
Three shapes of voice-first interfaces — a single full-duplex speech-to-speech model, a compositional wake → VAD → ASR → LLM → TTS pipeline you fully control, and wake-word activation for hands-free entry. All on-device, no cloud APIs, no audio leaving the device.
Pick the shape that fits your product.
Drop-in dialogue model, compositional pipeline with per-stage control, or a thin wake-word trigger. Each runs entirely on-device.
A single model takes mic input and produces voice output. Drop-in OpenAI-Realtime-compatible WebSocket; minimal code, opaque internals.
Wake-word → VAD → streaming ASR → on-device LLM → TTS. Per-stage control, transcript visibility, swap engines freely. Build your own Siri.
Hands-free trigger for any voice flow. Custom keywords with per-phrase thresholds, under 5 MB on-device, running at 26× real time.
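To make the compositional shape concrete, here is a minimal sketch of how the wake-word → VAD → ASR → LLM → TTS stages could be wired together, with a per-phrase threshold gate in front. Every name in it (`WakeWordGate`, `VoicePipeline`, `run_turn`, the example phrases and threshold values) is a hypothetical placeholder rather than this product's actual API, and the stages are stubbed with toy lambdas where real on-device engines would go.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# NOTE: all class and function names here are illustrative assumptions,
# not the API of any real library.


@dataclass
class WakeWordGate:
    # Each phrase carries its own detection threshold.
    thresholds: Dict[str, float]

    def triggered(self, scores: Dict[str, float]) -> Optional[str]:
        """Return the best-scoring phrase that clears its own threshold, else None."""
        hits = [
            (score, phrase)
            for phrase, score in scores.items()
            if phrase in self.thresholds and score >= self.thresholds[phrase]
        ]
        return max(hits)[1] if hits else None


@dataclass
class VoicePipeline:
    gate: WakeWordGate
    vad: Callable[[bytes], bool]        # True if the frame contains speech
    asr: Callable[[List[bytes]], str]   # speech frames -> transcript
    llm: Callable[[str], str]           # transcript -> reply text
    tts: Callable[[str], bytes]         # reply text -> audio bytes
    transcript: List[str] = field(default_factory=list)

    def run_turn(
        self, wake_scores: Dict[str, float], frames: List[bytes]
    ) -> Optional[bytes]:
        """Run one turn; return reply audio, or None if nothing was triggered."""
        if self.gate.triggered(wake_scores) is None:
            return None                 # no wake word: stay idle
        speech = [f for f in frames if self.vad(f)]
        if not speech:
            return None                 # woke up, but heard only silence
        text = self.asr(speech)
        reply = self.llm(text)
        # Intermediate text stays visible, unlike in a single opaque model.
        self.transcript += [f"user: {text}", f"assistant: {reply}"]
        return self.tts(reply)


# Toy stand-ins for real engines, so the sketch runs end to end.
pipeline = VoicePipeline(
    gate=WakeWordGate({"hey device": 0.6, "stop": 0.85}),
    vad=lambda frame: any(frame),       # any non-zero byte counts as speech
    asr=lambda frames: f"{len(frames)} speech frames",
    llm=lambda text: f"heard {text}",
    tts=lambda text: text.encode("utf-8"),
)

audio = pipeline.run_turn(
    {"hey device": 0.7, "stop": 0.5},
    [b"\x00\x00", b"\x01\x02", b"\x03\x00"],
)
```

Because each stage is just a callable, swapping an engine means replacing one argument, and the `transcript` list shows the text that passes between stages.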
