Speech Core

Open-source C++17 speech engine for voice agents — voice activity detection, batch and real-time streaming speech-to-text, speaker diarization, and text-to-speech, all running on-device on Linux, Windows, and Android. Apache 2.0.

What it is

Speech Core is a small orchestration core (state machine, turn detection, interruption handling, and audio utilities, with zero ML dependencies) plus a set of abstract interfaces for speech models. Inference runs locally, on CPU or on one of the accelerators listed under Platforms & backends; audio never leaves the machine, and there is no Python at inference time. Model inference is opt-in through two interchangeable backends that you can enable independently, or you can bring your own implementations of the interfaces.
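
To make the orchestration role more concrete, here is a minimal sketch of a turn-taking state machine with interruption (barge-in) handling. Every name in it (AgentState, Event, TurnStateMachine) is hypothetical and chosen for illustration; it is not Speech Core's actual API, and the real state machine may use different states and transitions.

```cpp
// Illustrative only: hypothetical types, not Speech Core's real classes.
enum class AgentState { Idle, Listening, Thinking, Speaking };

// Events that a VAD, a response generator, and an audio player might emit.
enum class Event { SpeechStart, SpeechEnd, ResponseReady, PlaybackDone };

class TurnStateMachine {
public:
    AgentState state() const { return state_; }

    // True only if the most recent event cut off the agent while it was speaking.
    bool interrupted() const { return interrupted_; }

    // Apply one event and return the resulting state.
    // Events that make no sense in the current state are ignored.
    AgentState on_event(Event e) {
        interrupted_ = false;
        switch (state_) {
        case AgentState::Idle:
            if (e == Event::SpeechStart) state_ = AgentState::Listening;
            break;
        case AgentState::Listening:
            // End of user speech closes the turn.
            if (e == Event::SpeechEnd) state_ = AgentState::Thinking;
            break;
        case AgentState::Thinking:
            if (e == Event::ResponseReady) {
                state_ = AgentState::Speaking;
            } else if (e == Event::SpeechStart) {
                // The user resumed talking before a response was ready.
                state_ = AgentState::Listening;
            }
            break;
        case AgentState::Speaking:
            if (e == Event::PlaybackDone) {
                state_ = AgentState::Idle;
            } else if (e == Event::SpeechStart) {
                // Barge-in: flag the interruption and listen again.
                interrupted_ = true;
                state_ = AgentState::Listening;
            }
            break;
        }
        return state_;
    }

private:
    AgentState state_ = AgentState::Idle;
    bool interrupted_ = false;
};
```

A caller would feed SpeechStart and SpeechEnd from a voice activity detector and check interrupted() after each event to decide whether playback should stop. Logic of this kind needs no ML runtime, which is consistent with the core having zero ML dependencies.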

Platforms & backends

| Backend | Platforms | Hardware acceleration |
| --- | --- | --- |
| ONNX Runtime (SPEECH_CORE_WITH_ONNX) | Linux, macOS, Windows, Android | NNAPI on Android, QNN on Qualcomm Linux, optional NVIDIA CUDA / TensorRT (-DSPEECH_CORE_WITH_CUDA=ON) |
| LiteRT (SPEECH_CORE_WITH_LITERT) | Linux x86_64, Windows x86_64, Android, macOS arm64 | CPU today |

Enable either backend, both, or neither — the orchestration core builds with no ML runtime at all.
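
With neither backend enabled, models have to come from your own implementations of the abstract interfaces. The sketch below shows the general shape of that approach. The interface name SttInterface, the struct TranscriptionResult, and the float-pointer signature are assumptions made for illustration; only the field names text, language, and confidence are taken from the quick start further down, and the real abstract classes in Speech Core may look different.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type; the three field names mirror the quick start.
struct TranscriptionResult {
    std::string text;
    std::string language;
    float confidence = 0.0f;
};

// Hypothetical abstract interface, NOT the real Speech Core header.
class SttInterface {
public:
    virtual ~SttInterface() = default;
    virtual TranscriptionResult transcribe(const float* audio,
                                           std::size_t n_samples,
                                           int sample_rate) = 0;
};

// Stand-in implementation with no ML runtime: it reports the clip duration
// instead of transcribing, which is enough to exercise the plumbing.
class DurationStubStt : public SttInterface {
public:
    TranscriptionResult transcribe(const float* audio,
                                   std::size_t n_samples,
                                   int sample_rate) override {
        (void)audio;  // the stub never reads the samples
        TranscriptionResult r;
        if (sample_rate <= 0) return r;  // empty result for an invalid rate
        const double seconds = static_cast<double>(n_samples) / sample_rate;
        r.text = "[" + std::to_string(static_cast<int>(seconds * 1000)) + " ms of audio]";
        r.language = "und";  // ISO 639 code for "undetermined"
        return r;
    }
};
```

A stub like this can stand in for a real model while the rest of a pipeline is wired up and tested, and be swapped for a backend-provided class later without changing the calling code.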

Supported models

| Model | Task | ONNX | LiteRT |
| --- | --- | --- | --- |
| Silero VAD v5 | Voice activity detection | | |
| Parakeet TDT v3 (0.6B) | Speech-to-text (25 European languages) | | |
| Nemotron Speech Streaming (0.6B) | Streaming speech-to-text (English) | | |
| Nemotron-3.5 ASR Streaming Multilingual (0.6B) | Streaming speech-to-text (multilingual, prompt-conditioned) | | |
| Omnilingual ASR CTC (300M) | Speech-to-text (multilingual) | | |
| Pyannote Segmentation 3.0 | Diarization (segmentation) | | |
| WeSpeaker ResNet34-LM | Speaker embedding | | |
| VoxCPM2 (2B) | Text-to-speech (48 kHz, voice cloning) | | |
| Kokoro 82M | Text-to-speech | | |
| DeepFilterNet3 | Speech enhancement | | |
| PersonaPlex 7B | Full-duplex speech-to-speech (CUDA) | | |

Quick start

Build the core plus the LiteRT backend (the runtime library is extracted from the ai-edge-litert wheel — no TensorFlow build):

git clone https://github.com/soniqo/speech-core && cd speech-core
scripts/fetch_litert.sh build/litert
cmake -B build -DCMAKE_BUILD_TYPE=Release \
    -DSPEECH_CORE_WITH_LITERT=ON -DLITERT_DIR=$PWD/build/litert
cmake --build build

Then link the targets you need:

target_link_libraries(my_app PRIVATE speech_core)                            # orchestration only
target_link_libraries(my_app PRIVATE speech_core speech_core_models)         # + ONNX models
target_link_libraries(my_app PRIVATE speech_core speech_core_models_litert)  # + LiteRT models

Transcribing an audio buffer is a few lines:

#include <speech_core/models/litert_parakeet_stt.h>

speech_core::LiteRTParakeetStt stt(
    "parakeet-encoder.tflite", "parakeet-decoder-joint.tflite", "vocab.json");

auto r = stt.transcribe(audio, n_samples, 16000);   // r.text / r.language / r.confidence
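
Capture APIs often deliver signed 16-bit PCM rather than floating-point samples. The snippet above does not state the element type of audio, so the helper below rests on an assumption: that transcribe expects mono float samples normalized to the range [-1, 1). The function name pcm16_to_float is hypothetical and not part of Speech Core.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert signed 16-bit PCM to float samples in [-1.0, 1.0).
// Assumption: the STT call takes normalized floats; check the real header.
std::vector<float> pcm16_to_float(const std::int16_t* pcm, std::size_t n_samples) {
    std::vector<float> out(n_samples);
    for (std::size_t i = 0; i < n_samples; ++i) {
        // Dividing by 32768 maps INT16_MIN to exactly -1.0.
        out[i] = static_cast<float>(pcm[i]) / 32768.0f;
    }
    return out;
}
```

If that assumption holds, the returned vector's data() and size() would take the place of audio and n_samples in the call above, with the sample rate still passed as 16000.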

Embedded & automotive Linux

A reference Linux build — libspeech.so with a small C ABI, an ALSA demo CLI, and transcribe/synthesize/phonemize tools — lives at examples/linux. It targets embedded ARM64 (Yocto, Qualcomm SA8295P / SA8255P) and any Linux dev box. Setup steps are in the Linux getting-started guide.

Building for Android or Apple?

On Android, use speech-android — a Kotlin SDK that packages Speech Core behind a JNI bridge (implementation("audio.soniqo:speech:0.0.9")). On macOS and iOS, use speech-swift, which runs the models on CoreML, MLX, and the Apple Neural Engine.

Feedback

Open an issue at github.com/soniqo/speech-core/issues, or join the Discord.