Getting Started — Android
speech-android provides on-device speech processing for Android using ONNX Runtime. The pipeline combines voice activity detection (VAD), speech-to-text (STT), and text-to-speech (TTS) with barge-in support, and runs fully offline once the models have been downloaded.
Requirements
- Android 8.0 or later (API level 26+)
- arm64-v8a architecture
A pre-built demo app is available if you want to try the pipeline immediately.
Gradle Dependency
Add the SDK to the dependencies block of your module-level build.gradle.kts:
implementation("audio.soniqo:speech:0.0.9")
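For context, the fragment below sketches how the dependency line might sit next to the platform requirements listed above. The minSdk and abiFilters settings are assumptions derived from the Requirements section, not settings the SDK is documented to need; merge them into your existing blocks only if they fit your project.

```kotlin
// Module-level build.gradle.kts (illustrative sketch, not the SDK's official template).
android {
    defaultConfig {
        // Android 8.0 (API 26) is the minimum listed under Requirements.
        minSdk = 26
        ndk {
            // Assumption: package native libraries only for the supported ABI.
            abiFilters.add("arm64-v8a")
        }
    }
}

dependencies {
    implementation("audio.soniqo:speech:0.0.9")
}
```

Restricting the ABI keeps other architectures out of the APK; whether the SDK already does this for you is not stated here.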
Quick Start
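// Download the models on first run (cached afterwards) and build the pipeline.
val modelDir = ModelManager.ensureModels(context)
val pipeline = SpeechPipeline(SpeechConfig(modelDir = modelDir))

// Assuming events is a Kotlin Flow, collect suspends, so run it in its own coroutine.
// scope is any CoroutineScope you own (for example lifecycleScope).
scope.launch {
    pipeline.events.collect { event ->
        when (event) {
            is SpeechEvent.TranscriptionCompleted -> println(event.text)
            is SpeechEvent.ResponseDone -> pipeline.resumeListening()
            else -> {}
        }
    }
}

pipeline.start()
pipeline.pushAudio(samples) // 16 kHz mono float32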
Models auto-download from HuggingFace on first use (~1.2 GB total). After the initial download, all inference runs fully offline.
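The Quick Start expects 16 kHz mono float32 samples, while Android's AudioRecord most commonly delivers signed 16-bit PCM. The helper below is a minimal sketch of that conversion; pcm16ToFloat32 is a hypothetical name and not part of the SDK, and it assumes pushAudio accepts a FloatArray scaled to the range -1.0 to 1.0.

```kotlin
// Hypothetical helper (not part of the SDK): converts signed 16-bit PCM,
// as read from AudioRecord, to float32 samples in [-1.0, 1.0).
fun pcm16ToFloat32(pcm: ShortArray, length: Int = pcm.size): FloatArray {
    require(length in 0..pcm.size) { "length must be within the buffer" }
    // Dividing by 32768 maps Short.MIN_VALUE to exactly -1.0.
    return FloatArray(length) { i -> pcm[i] / 32768f }
}
```

With a buffer filled by AudioRecord.read(buffer, 0, buffer.size), the call would look like pipeline.pushAudio(pcm16ToFloat32(buffer, read)), where read is the number of samples actually returned.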
System Voice Input (RecognitionService)
The SDK ships a ready-made SpeechRecognitionService that plugs into Android's framework SpeechRecognizer API, so there is no recognizer code to write. Once your app is selected as the default voice recognizer, any third-party app that calls SpeechRecognizer.createSpeechRecognizer(context) without a ComponentName gets fully on-device STT through your pipeline.
1. Declare RECORD_AUDIO and the service in AndroidManifest.xml:
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<application>
    <service
        android:name="audio.soniqo.speech.service.SpeechRecognitionService"
        android:exported="true"
        android:permission="android.permission.RECORD_AUDIO">
        <intent-filter>
            <action android:name="android.speech.RecognitionService" />
        </intent-filter>
        <meta-data
            android:name="android.speech"
            android:resource="@xml/recognition_service" />
    </service>
</application>
2. Add app/src/main/res/xml/recognition_service.xml:
<?xml version="1.0" encoding="utf-8"?>
<recognition-service xmlns:android="http://schemas.android.com/apk/res/android" />
3. Set the service as the system default (Settings → System → Languages & input → Voice input picker on stock Android, or via adb):
adb shell settings put secure voice_recognition_service \
    your.package/audio.soniqo.speech.service.SpeechRecognitionService
4. Verify by running the demo app's Recognizer test screen, which calls SpeechRecognizer.createSpeechRecognizer(ctx) (no component) and logs every framework callback. This confirms the binder round-trip without needing logcat.
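To illustrate what a client of the system default recognizer looks like, here is a minimal sketch that uses only the framework SpeechRecognizer API. It is not the demo app's code; the function name startDefaultRecognition and the log tag are hypothetical. It must be called on the main thread, and the calling app needs the RECORD_AUDIO runtime permission.

```kotlin
import android.content.Context
import android.content.Intent
import android.os.Bundle
import android.speech.RecognitionListener
import android.speech.RecognizerIntent
import android.speech.SpeechRecognizer
import android.util.Log

private const val TAG = "RecognizerTest" // hypothetical log tag

// Hypothetical helper: starts one recognition session against the system default service.
fun startDefaultRecognition(context: Context): SpeechRecognizer {
    // No ComponentName is passed, so the framework binds to the default recognizer.
    val recognizer = SpeechRecognizer.createSpeechRecognizer(context)
    recognizer.setRecognitionListener(object : RecognitionListener {
        override fun onReadyForSpeech(params: Bundle?) {}
        override fun onBeginningOfSpeech() {}
        override fun onRmsChanged(rmsdB: Float) {}
        override fun onBufferReceived(buffer: ByteArray?) {}
        override fun onEndOfSpeech() {}
        override fun onPartialResults(partialResults: Bundle?) {}
        override fun onEvent(eventType: Int, params: Bundle?) {}

        override fun onError(error: Int) {
            Log.w(TAG, "recognition error code $error")
        }

        override fun onResults(results: Bundle?) {
            // Candidate transcripts, ordered by the recognizer from most to least likely.
            val texts = results?.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION)
            Log.i(TAG, "result: ${texts?.firstOrNull()}")
        }
    })

    val intent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH).apply {
        putExtra(
            RecognizerIntent.EXTRA_LANGUAGE_MODEL,
            RecognizerIntent.LANGUAGE_MODEL_FREE_FORM
        )
    }
    recognizer.startListening(intent)
    return recognizer // the caller should invoke destroy() when finished
}
```

If the service is registered as the default, the transcript logged in onResults should come from the on-device pipeline rather than from a cloud recognizer.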
The service implements onCheckRecognitionSupport (API 33+) and reports the 27 BCP-47 languages that Parakeet TDT v3 covers. Each language is marked installedOnDeviceLanguage once the models are present, or pendingOnDeviceLanguage while they are still downloading. Audio focus is acquired with AUDIOFOCUS_GAIN_TRANSIENT for the duration of a session.
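From the client side, the language report can be queried with the framework's checkRecognitionSupport call. The sketch below uses only Android framework APIs; the function name logSupportedLanguages and the log tag are hypothetical, and the exact language lists returned depend on the service.

```kotlin
import android.content.Context
import android.content.Intent
import android.os.Build
import android.speech.RecognitionSupport
import android.speech.RecognitionSupportCallback
import android.speech.RecognizerIntent
import android.speech.SpeechRecognizer
import android.util.Log

private const val SUPPORT_TAG = "RecognitionSupport" // hypothetical log tag

// Hypothetical helper: logs which languages the bound recognizer reports as
// installed or still downloading. Does nothing below API 33.
fun logSupportedLanguages(context: Context, recognizer: SpeechRecognizer) {
    if (Build.VERSION.SDK_INT < Build.VERSION_CODES.TIRAMISU) return

    val intent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH)
    recognizer.checkRecognitionSupport(
        intent,
        context.mainExecutor, // callbacks are delivered on the main thread
        object : RecognitionSupportCallback {
            override fun onSupportResult(support: RecognitionSupport) {
                Log.i(SUPPORT_TAG, "installed: ${support.installedOnDeviceLanguages}")
                Log.i(SUPPORT_TAG, "pending: ${support.pendingOnDeviceLanguages}")
            }

            override fun onError(error: Int) {
                Log.w(SUPPORT_TAG, "support check failed with code $error")
            }
        }
    )
}
```

While models are still downloading, the languages would be expected under the pending list and move to the installed list afterwards.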
Gboard, Samsung Keyboard, and Google Assistant bundle their own recognizers and skip the system default. Apps that explicitly call the framework SpeechRecognizer API (or build their own UI on top of it) are the ones that flow through your service.
Models
All models run via ONNX Runtime with NNAPI acceleration. The STT and TTS models are INT8-quantized by default; the precision of each model is listed in the table.
| Model | Task | Size |
|---|---|---|
| Parakeet TDT v3 (INT8) | Speech-to-Text (27 languages) | 490 MB |
| Kokoro-82M (INT8) | Text-to-Speech (7 languages) | 89 MB |
| Silero VAD v5 | Voice Activity Detection | 1.2 MB |
| DeepFilterNet3 (FP16) | Noise Cancellation | 4.2 MB |
Source code: github.com/soniqo/speech-android
Next Steps
- Benchmarks — Android inference performance
- Linux C API — embedded Linux setup