Getting Started — Windows
speech-core builds natively on Windows x86_64 — the same C++17 engine that powers Android and Linux. Both inference backends work on Windows: LiteRT (tested in CI on every change) and ONNX Runtime. That covers streaming speech-to-text (Nemotron, Parakeet), voice activity detection, speaker diarization, speaker embeddings, and VoxCPM2 text-to-speech with voice cloning — all running locally. See the full model matrix on the Speech Core page.
Requirements
- Windows 10/11, x86_64
- Visual Studio 2022 or the Build Tools (MSVC C++ workload)
- CMake 3.16+
- Python 3.11+ (the LiteRT setup script extracts Google’s ai-edge-litert wheel)
- Git, including Git Bash (the setup script is a shell script)
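The requirements above can be checked from Git Bash before building. The following is a minimal sketch, not part of the repository: the function name `missing_tools` is hypothetical, and `dumpbin` and `lib` are included because the LiteRT build needs them on PATH. It assumes the Python launcher is exposed as `python`; adjust the name if your install differs.

```shell
#!/usr/bin/env bash
# Print every tool name from the argument list that cannot be resolved on PATH.
# missing_tools is a hypothetical helper, not something speech-core ships.
missing_tools() {
  local tool
  for tool in "$@"; do
    # command -v succeeds only when the shell can resolve the name
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing=$(missing_tools cmake git python dumpbin lib)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line
  echo "Not found on PATH:" $missing
else
  echo "All required tools found."
fi
```

If `dumpbin` or `lib` is reported missing, the shell was most likely not started from an MSVC developer environment.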
Build with the LiteRT backend
Run from an MSVC developer environment (for example, launch Git Bash from the x64 Native Tools Command Prompt) so dumpbin and lib are on PATH — the setup script uses them to generate the import library (libLiteRt.lib) from the runtime DLL, which Google’s wheel doesn’t ship:
git clone https://github.com/soniqo/speech-core.git
cd speech-core
./scripts/fetch_litert.sh "$PWD/litert"
cmake -B build \
  -DCMAKE_BUILD_TYPE=Release \
  -DSPEECH_CORE_WITH_LITERT=ON \
  -DLITERT_DIR="$PWD/litert"
cmake --build build --parallel --config Release
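For readers curious why `dumpbin` and `lib` are required, the general technique for deriving an import library from a DLL can be sketched as follows. This illustrates the standard MSVC workflow under stated assumptions and is not a copy of scripts/fetch_litert.sh, which may do it differently. The function names are hypothetical, and the DLL name in the usage comment is assumed from the import library name.

```shell
#!/usr/bin/env bash
# Turn `dumpbin -exports` output (read from stdin) into module-definition text.
# $1 is the library name to write into the LIBRARY statement.
exports_to_def() {
  awk -v lib="$1" '
    BEGIN { print "LIBRARY " lib; print "EXPORTS" }
    # Export rows look like: ordinal hint RVA name
    NF >= 4 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9A-Fa-f]+$/ && $3 ~ /^[0-9A-Fa-f]+$/ {
      print "    " $4
    }'
}

# Hypothetical wrapper: DLL path in, <name>.def and <name>.lib out.
# Options use "-" instead of "/" so Git Bash does not rewrite them as paths.
make_import_lib() {
  local dll="$1" name
  name=$(basename "$dll" .dll)
  # tr strips the carriage returns that Windows tools emit
  dumpbin -exports "$dll" | tr -d '\r' | exports_to_def "$name" > "$name.def"
  lib -def:"$name.def" -out:"$name.lib" -machine:x64
}

# Usage (assumed DLL name): make_import_lib litert/libLiteRt.dll
```

The parsing step is kept separate from the tool invocation so it can be checked against saved `dumpbin` output without an MSVC environment.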
Windows resolves DLLs via PATH, so prepend the litert directory before running anything that loads the backend:
export PATH="$PWD/litert:$PATH"
ctest --test-dir build --output-on-failure -C Release
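Running the export line repeatedly in the same shell keeps growing PATH. A small guard, sketched below with a hypothetical function name, prepends the directory only when it is not already present.

```shell
#!/usr/bin/env bash
# Prepend a directory to PATH unless it is already listed.
# prepend_path_once is a hypothetical helper, not part of speech-core.
prepend_path_once() {
  case ":$PATH:" in
    *":$1:"*) ;;              # already present, leave PATH unchanged
    *) PATH="$1:$PATH" ;;
  esac
  export PATH
}

prepend_path_once "$PWD/litert"
```

This changes PATH for the current shell session only, the same as the plain export.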
Build with the ONNX Runtime backend
Alternatively (or additionally), enable the ONNX backend with ORT_DIR pointing at an extracted onnxruntime-win-x64 release. Optional NVIDIA CUDA / TensorRT acceleration is available via -DSPEECH_CORE_WITH_CUDA=ON. The execution provider is selected at runtime through SPEECH_CORE_ORT_PROVIDER and falls back to CPU silently if the requested provider is unavailable. This accelerated path is the target for PersonaPlex 7B full-duplex speech-to-speech:
cmake -B build \
  -DCMAKE_BUILD_TYPE=Release \
  -DSPEECH_CORE_WITH_ONNX=ON \
  -DORT_DIR=C:/path/to/onnxruntime-win-x64
cmake --build build --parallel --config Release
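A wrong ORT_DIR usually surfaces as a configure or link error. The sketch below checks a directory before CMake runs. It assumes the layout of the official ONNX Runtime Windows release archives (an include directory with the C API header and a lib directory with onnxruntime.lib and onnxruntime.dll). Whether speech-core's CMake files look for exactly these paths is an assumption, and `check_ort_dir` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Report which expected files are absent from an extracted ONNX Runtime release.
# Returns 0 when all are present, 1 otherwise.
check_ort_dir() {
  local dir="$1" rel status=0
  for rel in include/onnxruntime_c_api.h lib/onnxruntime.lib lib/onnxruntime.dll; do
    if [ ! -f "$dir/$rel" ]; then
      echo "missing: $dir/$rel" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage: check_ort_dir C:/path/to/onnxruntime-win-x64 && echo "ORT_DIR looks complete"
```

As with the LiteRT backend, Windows has to find onnxruntime.dll at run time, so the directory that contains it may need to be on PATH unless the build copies the DLL next to the executables.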
A voice-cloning CLI (speech_voxcpm2_clone) is built automatically whenever SPEECH_CORE_WITH_LITERT=ON — see examples/litert. Prefer a GUI? Speech Studio ships a Windows installer with the same VoxCPM2 engine.
Next steps
- Speech Core — the full model matrix (ONNX / LiteRT columns) and quick-start API examples
- docs/pipeline.md — the VoicePipeline voice-agent loop
- huggingface.co/soniqo — converted model weights
- Discord — questions and support