Soniqo’s Blog

Engineering notes.
On-device speech, the long form.

Architecture deep-dives, model walkthroughs, and lessons from shipping speech models that run on your laptop.

July 7, 2026

Running a voice agent on-device: one pipeline, three memory budgets.

The same VAD → STT → LLM → TTS loop on iPhone, Galaxy S23, and Mac, with measured on-device memory of ~1.2 GB on iPhone, ~1.5 GB on the S23, and under ~4 GB for the whole desktop loop (Gemma 4 brain included).

Benchmarks

July 2, 2026

Voice cloning models, measured across five languages.

Ten FLEURS reference/target pairs per language across English, German, Arabic, Spanish, and Chinese, with generated samples and scores for speaker similarity, WER/CER, UTMOS, and speed.

Voice cloning

May 17, 2026

Cloning a voice at 48 kHz with VoxCPM2

A new TTS model lands in Soniqo: four things you can build with it, three ways it lets you clone a voice, and a friendly tour of the model's internals, alongside the architecture diagram and the original paper.

From Ivan’s Blog

Related deep-dives.

Longer-form writeups from Ivan's personal blog on the engines behind Soniqo: backends, ASR, speech-to-speech, and diarization.

Apple Silicon

Ivan’s Blog

MLX vs CoreML on Apple Silicon

A practical guide to picking the right backend on Apple Silicon — when MLX wins, when CoreML wins, and why the choice matters for speech workloads.

ASR

Ivan’s Blog

Qwen3-ASR Swift — on-device ASR for Apple Silicon

Porting Qwen3-ASR to native Swift with MLX: architecture choices, the encoder/decoder split, and benchmark numbers on M-series hardware.

Speech-to-speech

Ivan’s Blog

NVIDIA PersonaPlex 7B on Apple Silicon

Getting NVIDIA’s 7B full-duplex speech-to-speech model running natively on a Mac in Swift with MLX — what it took and how it sounds.

Transcription

Ivan’s Blog

We beat Whisper Large v3 with a 600M model

Why a 600M-parameter model running entirely on your Mac can outperform Whisper Large v3 on accuracy, latency, and the everyday-laptop test.

Diarization

Ivan’s Blog

Speaker diarization and VAD on Apple Silicon

Pyannote diarization and Silero VAD ported to native Swift with MLX — the pipeline, the gotchas, and why both belong on-device.

All posts on Ivan’s Blog
