Frequently Asked Questions
Does speech-swift work on iOS?
Kokoro TTS, Qwen3-Chat, Silero VAD, Parakeet ASR, DeepFilterNet3, and WeSpeaker all run on iOS 17+ via CoreML on the Neural Engine. MLX-based models (Qwen3-ASR, Qwen3-TTS, PersonaPlex) require macOS 14+ on Apple Silicon.
Does it require an internet connection?
Only for the initial model download from HuggingFace (automatic, cached in ~/Library/Caches/qwen3-speech/). After that, all inference runs fully offline with no network access. No cloud APIs, no API keys needed.
How does speech-swift compare to Whisper?
Qwen3-ASR-0.6B achieves a real-time factor (RTF, processing time ÷ audio duration) of 0.06 on an M2 Max — so a 60-second clip transcribes in about 3.6 seconds — 40% faster than Whisper-large-v3 via whisper.cpp (RTF 0.10), with comparable accuracy across 52 languages. speech-swift provides a native Swift async/await API, while whisper.cpp requires a C++ bridge.
See the full comparison tables for ASR and TTS benchmarks against whisper.cpp, Apple SFSpeechRecognizer, AVSpeechSynthesizer, and cloud APIs.
What Apple Silicon chips are supported?
All M-series chips: M1, M2, M3, and M4, including their Pro/Max/Ultra variants. Requires macOS 14+ (Sonoma) or iOS 17+.
Can I use it in a commercial app?
Yes. speech-swift is licensed under Apache 2.0. The underlying model weights have their own licenses — check each model's HuggingFace page for details.
How much memory does it need?
From ~3 MB (Silero VAD) to ~6.5 GB (PersonaPlex 7B). Typical usage:
- Kokoro TTS: ~500 MB
- Qwen3-ASR 0.6B: ~2.2 GB
- Qwen3-TTS 0.6B: ~2 GB
- Qwen3-Chat 0.6B: ~600 MB
- CosyVoice3: ~1.5 GB
- Parakeet TDT: ~400 MB
Can I run multiple models simultaneously?
Yes. Use CoreML models on the Neural Engine alongside MLX models on the GPU to avoid contention — for example, Silero VAD (CoreML) + Qwen3-ASR (MLX) + Qwen3-TTS (MLX).
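Loading those three models concurrently can be sketched with Swift structured concurrency. This is illustrative only: `SileroVAD`, `Qwen3ASR`, `Qwen3TTS`, and their `load()` methods are hypothetical names, not necessarily the actual speech-swift API.

```swift
import Foundation

// Sketch: load the CoreML and MLX models in parallel. They target
// different compute units (Neural Engine vs. GPU), so loading and
// running them side by side avoids contention.
// NOTE: all type and method names below are assumptions.
func loadPipeline() async throws {
    async let vad = SileroVAD.load()   // CoreML → Neural Engine
    async let asr = Qwen3ASR.load()    // MLX → GPU
    async let tts = Qwen3TTS.load()    // MLX → GPU
    _ = try await (vad, asr, tts)      // await all three together
}
```

`async let` starts each load immediately and awaits them together, so total load time is roughly that of the slowest model rather than the sum of all three.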
Is there a REST API?
Yes. The audio-server binary exposes all models via HTTP REST and WebSocket endpoints, including an OpenAI Realtime API-compatible WebSocket at /v1/realtime. See the CLI Reference for server commands.
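Connecting to the Realtime-compatible WebSocket from Swift needs only Foundation. The `/v1/realtime` path comes from the server's documented endpoints; the host and port (`localhost:8080`) are assumptions — use whatever address your audio-server instance binds to.

```swift
import Foundation

// Sketch of opening the audio-server's Realtime WebSocket.
// Host and port are assumptions; only /v1/realtime is documented.
let url = URL(string: "ws://localhost:8080/v1/realtime")!
let socket = URLSession.shared.webSocketTask(with: url)
socket.resume()

// Receive one server event (an OpenAI Realtime API-style JSON message).
socket.receive { result in
    if case .success(.string(let json)) = result {
        print("server event:", json)
    }
}
```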
How do I install it?
Homebrew:
brew tap soniqo/speech https://github.com/soniqo/speech-swift && brew install speech

Swift Package Manager:

.package(url: "https://github.com/soniqo/speech-swift", branch: "main")

See the Getting Started guide for full instructions.
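For context, that dependency line lands in a Package.swift like the sketch below. The product name "SpeechSwift" in the target dependency is an assumption — check the package's manifest for the actual product name.

```swift
// swift-tools-version:5.9
import PackageDescription

// Minimal manifest showing where the speech-swift dependency goes.
// Platforms match the documented requirements (macOS 14+, iOS 17+).
let package = Package(
    name: "MyApp",
    platforms: [.macOS(.v14), .iOS(.v17)],
    dependencies: [
        .package(url: "https://github.com/soniqo/speech-swift", branch: "main")
    ],
    targets: [
        .executableTarget(
            name: "MyApp",
            // "SpeechSwift" is an assumed product name.
            dependencies: [.product(name: "SpeechSwift", package: "speech-swift")]
        )
    ]
)
```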
What speech models are available?
Speech-to-text: Qwen3-ASR (52 languages, MLX) and Parakeet TDT (25 languages, CoreML).
Text-to-speech: Qwen3-TTS (streaming, 10 languages), CosyVoice3 (voice cloning, 9 languages), and Kokoro-82M (iOS-ready, 50 voices, 10 languages).
Speech-to-speech: PersonaPlex 7B (full-duplex dialogue, 18 voice presets).
Audio analysis: Silero + Pyannote VAD, speaker diarization (Pyannote + Sortformer), WeSpeaker speaker embeddings, and DeepFilterNet3 noise suppression.
LLM: Qwen3-0.6B Chat (on-device, CoreML, INT4/INT8, streaming tokens).
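Streaming tokens map naturally onto Swift's AsyncSequence. The sketch below is hypothetical throughout — `Qwen3Chat`, `load(quantization:)`, and `stream(prompt:)` stand in for whatever speech-swift actually exposes; only the model, the INT4/INT8 quantization options, and token streaming come from the feature list above.

```swift
// Illustrative sketch only — these names are assumptions, not the real API.
func chat() async throws {
    // On-device CoreML model, INT4-quantized.
    let model = try await Qwen3Chat.load(quantization: .int4)
    // Consume tokens as they are generated rather than waiting
    // for the full completion.
    for try await token in model.stream(prompt: "Hello!") {
        print(token, terminator: "")
    }
}
```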