Frequently Asked Questions

Does speech-swift work on iOS?

Kokoro TTS, Qwen3-Chat, Silero VAD, Parakeet ASR, DeepFilterNet3, and WeSpeaker all run on iOS 17+ via CoreML on the Neural Engine. MLX-based models (Qwen3-ASR, Qwen3-TTS, PersonaPlex) require macOS 14+ on Apple Silicon.

Does it require an internet connection?

Only for the initial model download from HuggingFace (automatic, cached in ~/Library/Caches/qwen3-speech/). After that, all inference runs fully offline with no network access. No cloud APIs, no API keys needed.

How does speech-swift compare to Whisper?

Qwen3-ASR-0.6B achieves a real-time factor (RTF) of 0.06 on M2 Max, about 40% faster than Whisper-large-v3 via whisper.cpp (RTF 0.10), with comparable accuracy across 52 languages. speech-swift provides a native Swift async/await API, while whisper.cpp requires a C++ bridge.

See the full comparison tables for ASR and TTS benchmarks against whisper.cpp, Apple SFSpeechRecognizer, AVSpeechSynthesizer, and cloud APIs.

What Apple Silicon chips are supported?

All M-series chips: M1, M2, M3, and M4, including their Pro/Max/Ultra variants. Requires macOS 14+ (Sonoma) or iOS 17+.

Can I use it in a commercial app?

Yes. speech-swift is licensed under Apache 2.0. The underlying model weights have their own licenses — check each model's HuggingFace page for details.

How much memory does it need?

From ~3 MB (Silero VAD) to ~6.5 GB (PersonaPlex 7B), depending on which models you load.

Can I run multiple models simultaneously?

Yes. Use CoreML models on the Neural Engine alongside MLX models on the GPU to avoid contention — for example, Silero VAD (CoreML) + Qwen3-ASR (MLX) + Qwen3-TTS (MLX).
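As a rough sketch of that pattern, Swift structured concurrency can drive both engines in parallel. The type and method names below (`SileroVAD`, `Qwen3ASR`, and their calls) are assumptions for illustration, not the library's confirmed API:

```swift
import Foundation

// Hypothetical API names -- check the speech-swift docs for the real ones.
func processUtterance(_ audio: [Float]) async throws {
    // The CoreML-backed VAD runs on the Neural Engine while the
    // MLX-backed ASR runs on the GPU, so the two don't contend.
    async let regions    = SileroVAD.shared.detect(in: audio)    // assumed API
    async let transcript = Qwen3ASR.shared.transcribe(audio)     // assumed API
    let (speech, text) = try await (regions, transcript)
    print("Detected \(speech.count) speech regions: \(text)")
}
```

Because `async let` starts both child tasks immediately, the VAD and ASR passes overlap rather than run back to back.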

Is there a REST API?

Yes. The audio-server binary exposes all models via HTTP REST and WebSocket endpoints, including an OpenAI Realtime API-compatible WebSocket at /v1/realtime. See the CLI Reference for server commands.
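A minimal client sketch for that WebSocket, assuming the server is running locally. The `/v1/realtime` path comes from the server's OpenAI-compatible endpoint; the host and port here are placeholders (see the CLI Reference for the actual defaults):

```swift
import Foundation

// Host and port are assumptions -- check the CLI Reference for the defaults.
let url = URL(string: "ws://localhost:8080/v1/realtime")!
let socket = URLSession.shared.webSocketTask(with: url)
socket.resume()

// Realtime API events arrive as JSON text frames; receive one and print it.
socket.receive { result in
    if case .success(.string(let json)) = result {
        print("event: \(json)")
    }
}
```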

How do I install it?

Homebrew:

brew tap soniqo/speech https://github.com/soniqo/speech-swift && brew install speech

Swift Package Manager:

.package(url: "https://github.com/soniqo/speech-swift", branch: "main")
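In context, that dependency line sits in a `Package.swift` manifest like the following sketch. The app name and the `"Speech"` product name are placeholders, not confirmed by this FAQ; check the repository's README for the real product name:

```swift
// swift-tools-version: 5.9
import PackageDescription

let package = Package(
    name: "MyApp",
    platforms: [.macOS(.v14), .iOS(.v17)],  // minimums stated in this FAQ
    dependencies: [
        .package(url: "https://github.com/soniqo/speech-swift", branch: "main"),
    ],
    targets: [
        .executableTarget(
            name: "MyApp",
            // "Speech" is a placeholder product name -- use the real one.
            dependencies: [.product(name: "Speech", package: "speech-swift")]
        ),
    ]
)
```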

See the Getting Started guide for full instructions.

What speech models are available?

Speech-to-text: Qwen3-ASR (52 languages, MLX) and Parakeet TDT (25 languages, CoreML).

Text-to-speech: Qwen3-TTS (streaming, 10 languages), CosyVoice3 (voice cloning, 9 languages), and Kokoro-82M (iOS-ready, 50 voices, 10 languages).

Speech-to-speech: PersonaPlex 7B (full-duplex dialogue, 18 voice presets).

Audio analysis: Silero + Pyannote VAD, speaker diarization (Pyannote + Sortformer), WeSpeaker speaker embeddings, and DeepFilterNet3 noise suppression.

LLM: Qwen3-0.6B Chat (on-device, CoreML, INT4/INT8, streaming tokens).