Silero VAD — Android

Silero VAD v5 runs on Android and embedded Linux via ONNX Runtime, providing streaming voice activity detection (VAD) with sub-millisecond latency per 32 ms chunk. It serves as the pipeline's speech trigger: speech-to-text (STT) runs only when speech is detected, which saves compute.

Model

| Model | Backend | Size | HuggingFace |
|---|---|---|---|
| Silero-VAD-v5 | ONNX Runtime | ~2 MB | aufklarer/Silero-VAD-v5-ONNX |

Performance

| Metric | Value |
|---|---|
| Chunk size | 32 ms (512 samples at 16 kHz) |
| Latency | Sub-millisecond per chunk |
| RTF | < 0.01 |
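
The figures above are related by simple arithmetic: 512 samples at 16 kHz span 32 ms, and a real-time factor (RTF, processing time divided by the duration of the audio processed) below 0.01 leaves under 0.32 ms of compute per chunk. The Java sketch below illustrates that arithmetic and shows one way to cut a PCM buffer into fixed-size chunks. The class and method names are hypothetical and are not part of speech-android.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative helper; all names here are hypothetical, not the speech-android API. */
final class VadChunkMath {
    static final int SAMPLE_RATE_HZ = 16_000;
    static final int CHUNK_SAMPLES = 512;

    /** Duration of a chunk in milliseconds: samples / rate * 1000. */
    static double chunkMillis(int samples, int sampleRateHz) {
        return samples * 1000.0 / sampleRateHz;
    }

    /** Real-time factor: processing time divided by the audio duration it covers. */
    static double realTimeFactor(double processingMillis, double audioMillis) {
        return processingMillis / audioMillis;
    }

    /**
     * Splits 16-bit PCM into full chunks. A trailing partial chunk is dropped
     * in this sketch; a streaming caller would buffer it until more audio arrives.
     */
    static List<short[]> frame(short[] pcm, int chunkSamples) {
        List<short[]> chunks = new ArrayList<>();
        for (int start = 0; start + chunkSamples <= pcm.length; start += chunkSamples) {
            short[] chunk = new short[chunkSamples];
            System.arraycopy(pcm, start, chunk, 0, chunkSamples);
            chunks.add(chunk);
        }
        return chunks;
    }
}
```

For example, `VadChunkMath.frame(pcm, VadChunkMath.CHUNK_SAMPLES)` on a 1100-sample buffer yields two full chunks and leaves 76 samples unprocessed.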

Configuration

| Parameter | Default | Description |
|---|---|---|
| min_silence_duration | 0.5 s | Silence duration required to end a speech segment |
| min_speech_duration | 0.15 s | Minimum speech duration to trigger detection |
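
To illustrate how the two parameters interact, here is a minimal Java sketch of a segmenter that turns per-chunk speech decisions into segment start and end events. It assumes one boolean decision per 32 ms chunk and rounds each duration up to a whole number of chunks. The class, its methods, and the exact debouncing logic are hypothetical and may differ from what SpeechPipeline does internally.

```java
/** Illustrative segmenter; names and logic are assumptions, not the speech-android API. */
final class SpeechSegmenter {
    enum Event { NONE, SPEECH_START, SPEECH_END }

    private final int minSpeechChunks;
    private final int minSilenceChunks;
    private boolean inSpeech = false;
    private int speechRun = 0;   // consecutive speech chunks seen while idle
    private int silenceRun = 0;  // consecutive silent chunks seen while in speech

    SpeechSegmenter(double minSpeechSec, double minSilenceSec, double chunkSec) {
        // Round up so a segment never opens or closes earlier than configured.
        this.minSpeechChunks = (int) Math.ceil(minSpeechSec / chunkSec);
        this.minSilenceChunks = (int) Math.ceil(minSilenceSec / chunkSec);
    }

    int minSpeechChunks() { return minSpeechChunks; }
    int minSilenceChunks() { return minSilenceChunks; }

    /** Feeds one per-chunk VAD decision and reports whether a segment boundary was crossed. */
    Event accept(boolean isSpeech) {
        if (!inSpeech) {
            // Any silent chunk resets the run, so short blips never open a segment.
            speechRun = isSpeech ? speechRun + 1 : 0;
            if (speechRun >= minSpeechChunks) {
                inSpeech = true;
                silenceRun = 0;
                return Event.SPEECH_START;
            }
            return Event.NONE;
        }
        // Any speech chunk resets the run, so short pauses do not end the segment.
        silenceRun = isSpeech ? 0 : silenceRun + 1;
        if (silenceRun >= minSilenceChunks) {
            inSpeech = false;
            speechRun = 0;
            return Event.SPEECH_END;
        }
        return Event.NONE;
    }
}
```

With the defaults and 32 ms chunks, `new SpeechSegmenter(0.15, 0.5, 0.032)` needs 5 consecutive speech chunks (160 ms) to open a segment and 16 consecutive silent chunks (512 ms) to close it, because the durations are rounded up to whole chunks.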

Important

On Android, VAD is part of the SpeechPipeline and is not used standalone. The pipeline automatically handles the VAD → STT → TTS flow. See speech-android on GitHub for integration details.

Source code: github.com/soniqo/speech-android