Engineering notes.
On-device speech, the long form.
Architecture deep-dives, model walkthroughs, and lessons from shipping speech models that run on your laptop. English only for now.
Related deep-dives.
Longer-form writeups from the personal blog on the engines that sit behind Soniqo — backends, ASR, speech-to-speech, diarization.
A practical guide to picking the right backend on Apple Silicon — when MLX wins, when CoreML wins, and why the choice matters for speech workloads.
Porting Qwen3-ASR to native Swift with MLX: architecture choices, the encoder/decoder split, and benchmark numbers on M-series hardware.
Getting NVIDIA’s 7B full-duplex speech-to-speech model running natively on a Mac in Swift with MLX — what it took and how it sounds.
Why a 600M-parameter model running entirely on your Mac can outperform Whisper Large v3 on accuracy, latency, and the everyday-laptop test.
Pyannote diarization and Silero VAD ported to native Swift with MLX — the pipeline, the gotchas, and why both belong on-device.
