Silero VAD — Android
Silero VAD v5 runs on Android and embedded Linux via ONNX Runtime and provides streaming voice activity detection with sub-millisecond latency per chunk. It acts as the pipeline's speech trigger: STT runs only when speech is detected, which saves compute.
Model
| Model | Backend | Size | HuggingFace |
|---|---|---|---|
| Silero-VAD-v5 | ONNX Runtime | ~2 MB | aufklarer/Silero-VAD-v5-ONNX |
Performance
| Metric | Value |
|---|---|
| Chunk size | 32 ms (512 samples at 16 kHz) |
| Latency | Sub-millisecond per chunk |
| Real-time factor (RTF) | < 0.01 |
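
The figures in the table are linked by simple arithmetic, sketched below. The class and method names are hypothetical and are not part of speech-android, and the 0.3 ms processing time is an illustrative value rather than a measurement. The real-time factor is taken here as processing time divided by audio duration, so an RTF below 0.01 corresponds to less than 0.32 ms of processing per 32 ms chunk.

```java
/** Hypothetical helper for the chunk arithmetic; not part of speech-android. */
final class VadChunkMath {

    /** Duration of one audio chunk in milliseconds. */
    public static double chunkMillis(int samples, int sampleRateHz) {
        return 1000.0 * samples / sampleRateHz;
    }

    /** Real-time factor: processing time divided by the audio duration processed. */
    public static double realTimeFactor(double processingMillis, double audioMillis) {
        return processingMillis / audioMillis;
    }

    public static void main(String[] args) {
        // 512 samples at 16 kHz, as listed in the table above.
        double chunkMs = chunkMillis(512, 16_000);

        // Assumed processing time per chunk, for illustration only.
        double rtf = realTimeFactor(0.3, chunkMs);

        System.out.println("chunk = " + chunkMs + " ms, RTF = " + rtf);
    }
}
```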
Configuration
| Parameter | Default | Description |
|---|---|---|
| min_silence_duration | 0.5 s | Silence duration required to end a speech segment |
| min_speech_duration | 0.15 s | Minimum speech duration to trigger detection |
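
To illustrate how the two durations could shape segment boundaries, here is a minimal sketch of a chunk-level state machine. It is not the speech-android implementation: the `VadSegmenter` class, its `segment` method, and the 0.5 probability threshold are assumptions made for the example. With 32 ms chunks, 0.15 s rounds up to 5 consecutive speech chunks to open a segment, and 0.5 s rounds up to 16 consecutive silent chunks to close it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical segmenter that illustrates the two parameters; not the speech-android API. */
final class VadSegmenter {
    // One chunk is 512 samples at 16 kHz, i.e. 0.032 s.
    static final double CHUNK_SECONDS = 512 / 16_000.0;

    /**
     * Turns per-chunk speech probabilities into segments.
     * Each result is {startChunk, endChunkExclusive}.
     */
    public static List<int[]> segment(double[] probs, double threshold,
                                      double minSilenceSeconds, double minSpeechSeconds) {
        int minSpeechChunks = (int) Math.ceil(minSpeechSeconds / CHUNK_SECONDS);
        int minSilenceChunks = (int) Math.ceil(minSilenceSeconds / CHUNK_SECONDS);

        List<int[]> segments = new ArrayList<>();
        boolean inSpeech = false;
        int speechRun = 0;   // consecutive speech chunks seen while idle
        int silenceRun = 0;  // consecutive silent chunks seen while in speech
        int start = -1;

        for (int i = 0; i < probs.length; i++) {
            boolean speech = probs[i] >= threshold;
            if (!inSpeech) {
                speechRun = speech ? speechRun + 1 : 0;
                if (speechRun >= minSpeechChunks) {
                    // Enough continuous speech: open a segment at the start of the run.
                    inSpeech = true;
                    start = i - speechRun + 1;
                    silenceRun = 0;
                }
            } else {
                silenceRun = speech ? 0 : silenceRun + 1;
                if (silenceRun >= minSilenceChunks) {
                    // Enough continuous silence: close the segment where silence began.
                    segments.add(new int[] {start, i - silenceRun + 1});
                    inSpeech = false;
                    speechRun = 0;
                    silenceRun = 0;
                }
            }
        }
        if (inSpeech) {
            // Input ended mid-segment: close it, excluding any trailing silence.
            segments.add(new int[] {start, probs.length - silenceRun});
        }
        return segments;
    }

    public static void main(String[] args) {
        double[] probs = new double[40];                 // all silence by default
        for (int i = 3; i < 13; i++) probs[i] = 0.9;     // 10 chunks of speech
        for (int[] s : segment(probs, 0.5, 0.5, 0.15)) {
            System.out.println(String.format("speech from %.3f s to %.3f s",
                    s[0] * CHUNK_SECONDS, s[1] * CHUNK_SECONDS));
        }
    }
}
```

In this sketch a burst shorter than `min_speech_duration` never opens a segment, and pauses shorter than `min_silence_duration` stay inside the current segment, which is the behavior the table describes.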
Important
On Android, VAD is part of the SpeechPipeline and is not used standalone. The pipeline automatically handles the VAD → STT → TTS flow. See speech-android on GitHub for integration details.
Source code: github.com/soniqo/speech-android