Silero VAD — Android

Silero VAD v5 runs on Android, embedded Linux, and Windows via ONNX Runtime and provides streaming voice activity detection with sub-millisecond latency per audio chunk. It acts as the pipeline's speech trigger: speech-to-text (STT) runs only when speech is detected, which saves compute.

Model

Model	Backend	Size	Hugging Face
Silero-VAD-v5	ONNX Runtime	~2 MB	soniqo/Silero-VAD-v5-ONNX

Performance

Metric	Value
Chunk size	32 ms (512 samples at 16 kHz)
Latency	Sub-millisecond per chunk
RTF (real-time factor)	< 0.01
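The 32 ms figure follows from the chunk size: 512 samples / 16,000 samples per second = 0.032 s. Microphone buffers on Android (for example from AudioRecord) are not necessarily 512 samples long, so audio has to be re-framed before it reaches the model. The Java sketch below shows one way to do that. It assumes 16-bit mono PCM at 16 kHz and float model input scaled to [-1, 1). The class `VadFramer` is hypothetical and not part of speech-android, and the pipeline's own framing code may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of speech-android): re-frames arbitrary-size
// 16-bit PCM reads into fixed 512-sample float chunks (32 ms at 16 kHz).
public class VadFramer {
    public static final int SAMPLE_RATE = 16_000;  // Hz, assumed mono
    public static final int CHUNK_SAMPLES = 512;   // samples per VAD chunk

    private final float[] pending = new float[CHUNK_SAMPLES];
    private int filled = 0;

    // 512 samples / 16000 Hz = 0.032 s = 32 ms
    public static double chunkDurationMs() {
        return 1000.0 * CHUNK_SAMPLES / SAMPLE_RATE;
    }

    // Appends `count` samples and returns every completed chunk;
    // a partial chunk stays buffered until the next call.
    public List<float[]> push(short[] pcm, int count) {
        List<float[]> chunks = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            pending[filled++] = pcm[i] / 32768f;  // assumed scaling: int16 -> [-1, 1)
            if (filled == CHUNK_SAMPLES) {
                chunks.add(pending.clone());
                filled = 0;
            }
        }
        return chunks;
    }

    // Number of samples waiting to complete the next chunk.
    public int buffered() {
        return filled;
    }

    public static void main(String[] args) {
        VadFramer framer = new VadFramer();
        short[] read = new short[1000];  // stand-in for one microphone buffer
        List<float[]> chunks = framer.push(read, read.length);
        // prints: 32.0 ms per chunk, 1 chunk(s), 488 samples pending
        System.out.println(chunkDurationMs() + " ms per chunk, "
                + chunks.size() + " chunk(s), " + framer.buffered() + " samples pending");
    }
}
```

At RTF < 0.01, processing one 32 ms chunk takes under 0.32 ms, which is consistent with the sub-millisecond latency listed above.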

Configuration

Parameter	Default	Description
`min_silence_duration`	0.5 s	Silence duration required to end a speech segment
`min_speech_duration`	0.15 s	Speech duration required to start a speech segment
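The two durations act as debouncing: a short click should not open a segment, and a brief pause between words should not close one. One way to express this is a two-state machine over per-chunk speech probabilities, sketched below in Java. This is a hypothetical illustration, not SpeechPipeline's implementation: the 0.5 probability threshold, rounding durations up to whole 32 ms chunks, and the names `SpeechSegmenter` and `accept` are all assumptions.

```java
// Hypothetical sketch (not the SpeechPipeline implementation): turns per-chunk
// speech probabilities into segments using min speech / min silence durations.
public class SpeechSegmenter {
    public static final class Segment {
        public final double startMs;
        public final double endMs;
        Segment(double startMs, double endMs) { this.startMs = startMs; this.endMs = endMs; }
        @Override public String toString() { return "[" + startMs + " ms, " + endMs + " ms)"; }
    }

    private final float threshold;        // probability at or above which a chunk counts as speech
    private final double chunkMs;
    private final int minSpeechChunks;    // speech chunks needed to open a segment
    private final int minSilenceChunks;   // silence chunks needed to close it

    private boolean inSpeech = false;
    private int run = 0;                  // consecutive chunks disagreeing with the current state
    private long chunkIndex = 0;
    private long startChunk = 0;

    public SpeechSegmenter(float threshold, double minSpeechSec, double minSilenceSec, double chunkMs) {
        this.threshold = threshold;
        this.chunkMs = chunkMs;
        // Round up to whole chunks: 0.15 s -> 5 chunks, 0.5 s -> 16 chunks at 32 ms.
        this.minSpeechChunks = (int) Math.ceil(minSpeechSec * 1000.0 / chunkMs);
        this.minSilenceChunks = (int) Math.ceil(minSilenceSec * 1000.0 / chunkMs);
    }

    // Durations from the table above; the 0.5 threshold is an assumption.
    public SpeechSegmenter() {
        this(0.5f, 0.15, 0.5, 32.0);
    }

    // Feeds one chunk's speech probability; returns the segment it closes, or null.
    public Segment accept(float prob) {
        Segment closed = null;
        boolean speech = prob >= threshold;
        if (speech == inSpeech) {
            run = 0;                      // current state confirmed; drop any partial run
        } else {
            run++;                        // one more chunk arguing for a state change
            int needed = inSpeech ? minSilenceChunks : minSpeechChunks;
            if (run >= needed) {
                long runStart = chunkIndex - run + 1;  // first chunk of the flipping run
                if (inSpeech) {
                    closed = new Segment(startChunk * chunkMs, runStart * chunkMs);
                } else {
                    startChunk = runStart;
                }
                inSpeech = !inSpeech;
                run = 0;
            }
        }
        chunkIndex++;
        return closed;
    }

    public static void main(String[] args) {
        SpeechSegmenter vad = new SpeechSegmenter();
        // 10 speech chunks, a 5-chunk (160 ms) pause, 10 speech chunks, 20 silent chunks
        float[] probs = new float[45];
        for (int i = 0; i < probs.length; i++) {
            probs[i] = (i < 10 || (i >= 15 && i < 25)) ? 0.9f : 0.1f;
        }
        for (float p : probs) {
            Segment s = vad.accept(p);
            if (s != null) System.out.println("speech " + s);  // prints: speech [0.0 ms, 800.0 ms)
        }
    }
}
```

The 160 ms pause is shorter than `min_silence_duration`, so both bursts land in one segment. A real pipeline would also need to flush a segment that is still open when the stream ends, which this sketch omits.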

Important

On Android, VAD is part of the SpeechPipeline and is not used standalone. The pipeline automatically handles the VAD → STT → TTS flow. See speech-android on GitHub for integration details.
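The pipeline handles this gating internally. To illustrate why STT only costs compute during speech, here is a hypothetical Java sketch of the VAD → STT step: chunks are buffered while the (already debounced) VAD decision says speech, and the transcriber is called once when the utterance ends. `SttGate` and the `Function<float[], String>` transcriber are illustrative stand-ins, not speech-android APIs, and the TTS stage is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch (not a speech-android API): STT is invoked only when a
// VAD-detected utterance ends; silent chunks never reach the transcriber.
public class SttGate {
    private final Function<float[], String> transcriber;  // stand-in for the STT model
    private final List<float[]> utterance = new ArrayList<>();
    private int sttCalls = 0;

    public SttGate(Function<float[], String> transcriber) {
        this.transcriber = transcriber;
    }

    // vadSpeech: the debounced VAD decision for this chunk.
    // Returns a transcript when an utterance ends, otherwise null.
    public String onChunk(float[] chunk, boolean vadSpeech) {
        if (vadSpeech) {
            utterance.add(chunk);          // keep buffering while speech continues
            return null;
        }
        if (utterance.isEmpty()) {
            return null;                   // silence with nothing pending: STT is skipped
        }
        int total = 0;
        for (float[] c : utterance) total += c.length;
        float[] audio = new float[total];  // concatenate the utterance into one buffer
        int pos = 0;
        for (float[] c : utterance) {
            System.arraycopy(c, 0, audio, pos, c.length);
            pos += c.length;
        }
        utterance.clear();
        sttCalls++;
        return transcriber.apply(audio);
    }

    public int sttCalls() {
        return sttCalls;
    }

    public static void main(String[] args) {
        SttGate gate = new SttGate(audio -> "<" + audio.length + " samples>");
        boolean[] vad = {false, false, true, true, true, false, false};
        for (boolean speech : vad) {
            String text = gate.onChunk(new float[512], speech);
            if (text != null) System.out.println("STT: " + text);  // prints: STT: <1536 samples>
        }
        System.out.println("STT calls: " + gate.sttCalls());      // prints: STT calls: 1
    }
}
```

Seven chunks arrive, but the transcriber runs once, on the three speech chunks only.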

Source code: github.com/soniqo/speech-android