Apple — Homebrew:
brew tap soniqo/speech https://github.com/soniqo/speech-swift && brew install speech
Android — Gradle:
implementation("audio.soniqo:speech:0.0.5")
Qwen3-ASR
Multilingual transcription, 4-bit/8-bit quantized, RTF ~0.06
MLX, CoreML

Parakeet TDT
NVIDIA FastConformer on Neural Engine, ~32x real-time
CoreML, ONNX

Omnilingual ASR
Meta wav2vec2 + CTC, 1,672 languages, 300M / 1B / 3B / 7B
CoreML, MLX

Forced Alignment
Word-level timestamps via CTC, 80 ms resolution
MLX, CoreML

Voice Activity Detection
Pyannote (offline) + Silero v5 (streaming, 23x real-time)
MLX, CoreML, ONNX

Wake-Word / KWS
KWS Zipformer (3.49M params) — on-device keyword spotting, 26x real-time
CoreML

Speaker Diarization
Who spoke when — Pyannote pipeline or end-to-end Sortformer
MLX, CoreML

Speaker Embeddings
WeSpeaker ResNet34 — 256-dim vectors for speaker ID
MLX, CoreML

Speech Enhancement
DeepFilterNet3 — real-time noise suppression at 48kHz
CoreML, ONNX

Source Separation
Open-Unmix — split music into vocals, drums, bass, other. 4x real-time
MLX

Parakeet TDT v3
114 languages, INT8 quantized, TDT greedy decoder, RTF 0.12
ONNX Runtime, NNAPI

Kokoro-82M
50 voices, 7 languages, dictionary-based phonemizer, 24 kHz output
ONNX Runtime

Silero VAD v5
Streaming voice activity detection, 32 ms chunks, sub-ms latency
ONNX Runtime

DeepFilterNet3
Real-time noise cancellation, STFT/ERB processing, RTF ~0.15
ONNX Runtime
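
The figures above mix two ways of stating speed: a real-time factor (RTF) and a multiple of real time. The sketch below converts between them so the entries can be compared. It assumes the conventional definition, RTF = processing time / audio duration, which this page does not spell out. The helper names are hypothetical and are not part of the speech library. The 16 kHz sample rate in the chunk-size example is also an assumption, since the page gives only the 32 ms chunk length.

```python
def speed_from_rtf(rtf: float) -> float:
    """Convert a real-time factor into an 'N x real-time' multiple.

    Assumes RTF = processing_time / audio_duration, so lower is faster.
    """
    if rtf <= 0:
        raise ValueError("RTF must be positive")
    return 1.0 / rtf


def rtf_from_speed(speed: float) -> float:
    """Convert an 'N x real-time' multiple back into a real-time factor."""
    if speed <= 0:
        raise ValueError("speed multiple must be positive")
    return 1.0 / speed


def processing_seconds(audio_seconds: float, rtf: float) -> float:
    """Estimated compute time for a clip of the given length."""
    return audio_seconds * rtf


def chunk_samples(chunk_ms: float, sample_rate_hz: int) -> int:
    """Number of samples in one streaming chunk."""
    return round(sample_rate_hz * chunk_ms / 1000)


if __name__ == "__main__":
    # RTF ~0.06 (Qwen3-ASR entry) expressed as a real-time multiple.
    print(f"RTF 0.06 is about {speed_from_rtf(0.06):.1f}x real-time")
    # ~32x real-time (Parakeet TDT entry) expressed as an RTF.
    print(f"32x real-time is RTF {rtf_from_speed(32):.5f}")
    # Estimated time to process one minute of audio at RTF 0.12.
    print(f"60 s at RTF 0.12 takes about {processing_seconds(60, 0.12):.1f} s")
    # 32 ms chunks at an ASSUMED 16 kHz sample rate.
    print(f"32 ms at 16 kHz is {chunk_samples(32, 16000)} samples")
```

On this reading, an RTF of 0.06 corresponds to roughly 17x real-time, and 32x real-time corresponds to an RTF of about 0.03. Published figures depend on hardware and quantization, so treat the conversions as a way to compare entries rather than as benchmarks.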