Getting Started — Windows


speech-core builds natively on Windows x86_64, using the same C++17 engine that powers Android and Linux. Both inference backends work on Windows: LiteRT (tested in CI on every change) and ONNX Runtime. Between them they cover streaming speech-to-text (Nemotron, Parakeet), voice activity detection, speaker diarization, speaker embeddings, and VoxCPM2 text-to-speech with voice cloning, all running locally. See the full model matrix on the Speech Core page for which models run on which backend.

Requirements

Windows 10/11, x86_64
Visual Studio 2022 or the Build Tools (MSVC C++ workload)
CMake 3.16+ (3.20+ if you use ctest --test-dir, as in the test step below)
Python 3.11+ (the LiteRT setup script extracts Google’s ai-edge-litert wheel)
Git, including Git Bash (the setup script is a shell script)
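Before building, it can help to confirm from Git Bash that these tools are actually reachable. The snippet below is a minimal sketch using a hypothetical helper, check_tools, which is not part of speech-core. It only tests whether each command is on PATH and does not check versions.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper (not part of speech-core).
# Prints "ok: <tool>" or "missing: <tool>" for each argument and returns
# non-zero if any tool cannot be found on PATH. Versions are NOT checked.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf 'ok: %s\n' "$tool"
    else
      printf 'missing: %s\n' "$tool"
      missing=$((missing + 1))
    fi
  done
  # Succeed only when every tool was found.
  [ "$missing" -eq 0 ]
}
```

For example, run check_tools git cmake python dumpbin lib. Expect dumpbin and lib to show up as missing unless Git Bash was started from an MSVC developer environment, as described in the next section. On some installs Python is reachable through the py launcher rather than as python.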

Build with the LiteRT backend

Run the commands below from an MSVC developer environment so that dumpbin and lib are on PATH. For example, launch Git Bash from the x64 Native Tools Command Prompt. The setup script uses these tools to generate the import library (libLiteRt.lib) from the runtime DLL, because Google’s wheel doesn’t ship one:

git clone https://github.com/soniqo/speech-core.git
cd speech-core
./scripts/fetch_litert.sh "$PWD/litert"

cmake -B build \
    -DCMAKE_BUILD_TYPE=Release \
    -DSPEECH_CORE_WITH_LITERT=ON \
    -DLITERT_DIR="$PWD/litert"
cmake --build build --parallel --config Release
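fetch_litert.sh takes care of the import library for you. The underlying technique is a standard MSVC one. First, list the DLL's exports with dumpbin -exports. Next, write those exports to a module-definition (.def) file. Finally, have lib -def: turn the .def file into a .lib. The sketch below illustrates that technique only. The helper names exports_to_def and make_import_lib are hypothetical, and this is not the actual contents of the speech-core script.

```shell
#!/usr/bin/env bash
# Sketch of the standard "import library from a DLL" technique.
# Hypothetical helpers; NOT the actual contents of scripts/fetch_litert.sh.

# Convert `dumpbin -exports` output (read from stdin) into a .def file.
# $1 is the DLL file name written on the LIBRARY line.
exports_to_def() {
  awk -v dll="$1" '
    BEGIN { print "LIBRARY " dll; print "EXPORTS" }
    # The export table starts after the "ordinal hint RVA name" header...
    /ordinal +hint +RVA +name/ { in_table = 1; next }
    # ...and ends at the "Summary" section.
    /^ *Summary/ { in_table = 0 }
    # Rows look like "1    0 00001000 SomeFunction". Forwarded exports have no
    # RVA column, so their 4th field starts with "(" and they are skipped.
    in_table && NF >= 4 && $1 ~ /^[0-9]+$/ && $4 !~ /^\(/ { print "    " $4 }
  '
}

# Write <name>.def and <name>.lib next to <name>.dll. Tool options use "-"
# instead of "/" and file names are relative, so Git Bash does not rewrite
# them as POSIX paths.
make_import_lib() {
  local dir name
  dir="$(dirname "$1")"
  name="$(basename "$1" .dll)"
  (
    cd "$dir" &&
      dumpbin -nologo -exports "$name.dll" | exports_to_def "$name.dll" > "$name.def" &&
      lib -nologo -def:"$name.def" -out:"$name.lib" -machine:x64
  )
}
```

From the MSVC-enabled Git Bash, make_import_lib litert/<runtime>.dll would write <runtime>.def and <runtime>.lib next to the DLL. Substitute the DLL name the wheel actually ships.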

Windows resolves DLLs via PATH, so prepend the litert directory before running anything that loads the backend:

export PATH="$PWD/litert:$PATH"
ctest --test-dir build --output-on-failure -C Release
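The export line changes PATH for the rest of the shell session, and re-running it keeps adding duplicate entries. Two lighter options are sketched here, assuming Git Bash. You can scope the change to one command, as in PATH="$PWD/litert:$PATH" ctest --test-dir build --output-on-failure -C Release. Alternatively, you can use a small idempotent helper like the hypothetical path_prepend_once below, which is not part of speech-core.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of speech-core): prepend a directory to PATH
# only if it is not already listed, so re-running setup does not grow PATH.
path_prepend_once() {
  case ":$PATH:" in
    *":$1:"*) ;;              # already present: leave PATH unchanged
    *) PATH="$1:$PATH" ;;     # otherwise put it first so its DLLs win
  esac
  export PATH
}
```

Usage: path_prepend_once "$PWD/litert", then run ctest as above.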

Build with the ONNX Runtime backend

Alternatively, or in addition, enable the ONNX Runtime backend by pointing ORT_DIR at an extracted onnxruntime-win-x64 release. Optional NVIDIA CUDA / TensorRT acceleration is available via -DSPEECH_CORE_WITH_CUDA=ON. At runtime it is gated by SPEECH_CORE_ORT_PROVIDER, with a silent fallback to CPU, and it is the target for PersonaPlex 7B full-duplex speech-to-speech. The basic ONNX Runtime build:

cmake -B build \
    -DCMAKE_BUILD_TYPE=Release \
    -DSPEECH_CORE_WITH_ONNX=ON \
    -DORT_DIR=C:/path/to/onnxruntime-win-x64
cmake --build build --parallel --config Release
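Because the two backends can be enabled together, a single configure step can turn both on. The wrapper below is a hypothetical convenience function, build_both_backends. It combines exactly the cmake flags shown in this guide, adds no new options, and does not enable CUDA.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of speech-core): configure and build with
# BOTH backends enabled, using only the cmake flags documented in this guide.
#   $1 = LiteRT directory produced by scripts/fetch_litert.sh
#   $2 = extracted onnxruntime-win-x64 release
build_both_backends() {
  local litert_dir="$1" ort_dir="$2"
  cmake -B build \
    -DCMAKE_BUILD_TYPE=Release \
    -DSPEECH_CORE_WITH_LITERT=ON \
    -DLITERT_DIR="$litert_dir" \
    -DSPEECH_CORE_WITH_ONNX=ON \
    -DORT_DIR="$ort_dir" &&
  cmake --build build --parallel --config Release
}
```

For example, run build_both_backends "$PWD/litert" C:/path/to/onnxruntime-win-x64. At run time the ONNX Runtime DLL must be findable too. In the official onnxruntime-win-x64 archives, onnxruntime.dll sits in the lib/ subdirectory, so a line such as export PATH="$PWD/litert:/c/path/to/onnxruntime-win-x64/lib:$PATH" should cover both backends. Verify the layout of your extracted copy before relying on it.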

Voice cloning out of the box

A voice-cloning CLI (speech_voxcpm2_clone) is built automatically whenever SPEECH_CORE_WITH_LITERT=ON; see examples/litert. Prefer a GUI? Speech Studio ships a Windows installer with the same VoxCPM2 engine.

Next steps

Speech Core — the full model matrix (ONNX / LiteRT columns) and quick-start API examples
docs/pipeline.md — the VoicePipeline voice-agent loop
huggingface.co/soniqo — converted model weights
Discord — questions and support