Getting Started — Android


speech-android provides on-device speech processing for Android using ONNX Runtime. The pipeline chains voice activity detection (VAD), speech-to-text (STT), and text-to-speech (TTS) with barge-in support, and runs fully offline once the models are downloaded.

Requirements

Android 8.0+ (API level 26)
arm64-v8a architecture

Download the pre-built demo app to try it immediately:

app-release.apk

Gradle Dependency

Add the SDK to your build.gradle.kts:

implementation("audio.soniqo:speech:0.0.9")

Quick Start

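import kotlinx.coroutines.launch

val modelDir = ModelManager.ensureModels(context)
val pipeline = SpeechPipeline(SpeechConfig(modelDir = modelDir))

// Assuming events is a kotlinx.coroutines Flow, collect suspends for as long as
// events keep arriving, so run it in its own coroutine; calling it inline would
// keep start() from being reached. `scope` is any CoroutineScope you own
// (for example lifecycleScope).
scope.launch {
    pipeline.events.collect { event ->
        when (event) {
            is SpeechEvent.TranscriptionCompleted -> println(event.text)
            is SpeechEvent.ResponseDone -> pipeline.resumeListening()
            else -> {}
        }
    }
}
pipeline.start()
pipeline.pushAudio(samples) // 16 kHz mono float32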
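
Microphone capture on Android usually yields 16-bit PCM, while pushAudio expects float32 samples. The helper below is a minimal sketch of that conversion; pcm16ToFloat32 is a hypothetical name and not part of the SDK.

```kotlin
// Hypothetical helper, not part of the SDK: converts 16-bit PCM samples
// (the format AudioRecord typically delivers) to float32 in [-1.0, 1.0).
fun pcm16ToFloat32(pcm: ShortArray, count: Int = pcm.size): FloatArray {
    require(count in 0..pcm.size) { "count must be within the buffer" }
    // Dividing by 32768 maps Short.MIN_VALUE to exactly -1.0f.
    return FloatArray(count) { i -> pcm[i] / 32768f }
}
```

With an AudioRecord configured for 16 kHz mono and ENCODING_PCM_16BIT, each buffer read could then be forwarded as pipeline.pushAudio(pcm16ToFloat32(buffer, read)), assuming pushAudio accepts a FloatArray.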

Important

Models download automatically from Hugging Face on first use (~1.2 GB total). After the initial download, all inference runs fully offline.
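
Because the first run pulls roughly 1.2 GB, it can help to check free storage before triggering the download. The following is a sketch only: hasRoomForModels and the 200 MB margin are illustrative assumptions, and the size constant is simply the sum of the sizes listed in the model table on this page.

```kotlin
import java.io.File

// Approximate combined model size from this page's model table (891 + 330 + 2 + 8 MB).
val MODEL_BYTES: Long = 1_231L * 1024 * 1024

// Hypothetical helper, not part of the SDK: true if the volume holding `dir` has
// room for the models plus a safety margin (the margin is an arbitrary example value).
fun hasRoomForModels(
    dir: File,
    requiredBytes: Long = MODEL_BYTES,
    marginBytes: Long = 200L * 1024 * 1024
): Boolean = dir.usableSpace >= requiredBytes + marginBytes
```

On Android this might be called as hasRoomForModels(context.filesDir) before ModelManager.ensureModels(context). Where the SDK actually stores its models is not stated here, so context.filesDir is an assumption.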

System Voice Input (RecognitionService)

The SDK ships a ready-made SpeechRecognitionService that plugs into Android’s framework SpeechRecognizer API — no code to write. Once your app is selected as the default voice recognizer, any third-party app calling SpeechRecognizer.createSpeechRecognizer(context) (with no ComponentName) gets fully on-device STT through your pipeline.

1. Declare RECORD_AUDIO and the service in AndroidManifest.xml:

<uses-permission android:name="android.permission.RECORD_AUDIO" />

<application>
    <service
        android:name="audio.soniqo.speech.service.SpeechRecognitionService"
        android:exported="true"
        android:permission="android.permission.RECORD_AUDIO">
        <intent-filter>
            <action android:name="android.speech.RecognitionService" />
        </intent-filter>
        <meta-data
            android:name="android.speech"
            android:resource="@xml/recognition_service" />
    </service>
</application>

2. Add app/src/main/res/xml/recognition_service.xml:

<?xml version="1.0" encoding="utf-8"?>
<recognition-service xmlns:android="http://schemas.android.com/apk/res/android" />

3. Set the service as the system default (Settings → System → Languages & input → Voice input picker on stock Android, or via adb):

adb shell settings put secure voice_recognition_service \
  your.package/audio.soniqo.speech.service.SpeechRecognitionService

4. Verify by running the demo app’s Recognizer test screen, which calls SpeechRecognizer.createSpeechRecognizer(ctx) (no component) and logs every framework callback — useful for confirming the binder round-trip without needing logcat.
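
For a check from your own code, a client can go through the same framework path as the demo screen. The sketch below uses only standard android.speech APIs; the function name and log tag are hypothetical, and it logs to logcat rather than to an on-screen view.

```kotlin
import android.content.Context
import android.content.Intent
import android.os.Bundle
import android.speech.RecognitionListener
import android.speech.RecognizerIntent
import android.speech.SpeechRecognizer
import android.util.Log

private const val TAG = "RecognizerCheck" // hypothetical log tag

// Must be called on the main thread; the calling app needs RECORD_AUDIO granted.
fun startDefaultRecognizer(context: Context): SpeechRecognizer {
    // No ComponentName: the framework binds to the system default recognizer.
    val recognizer = SpeechRecognizer.createSpeechRecognizer(context)
    recognizer.setRecognitionListener(object : RecognitionListener {
        override fun onReadyForSpeech(params: Bundle?) { Log.d(TAG, "ready") }
        override fun onBeginningOfSpeech() { Log.d(TAG, "speech started") }
        override fun onRmsChanged(rmsdB: Float) {}
        override fun onBufferReceived(buffer: ByteArray?) {}
        override fun onEndOfSpeech() { Log.d(TAG, "speech ended") }
        override fun onError(error: Int) { Log.w(TAG, "error code $error") }
        override fun onResults(results: Bundle?) {
            val texts = results?.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION)
            Log.d(TAG, "results: $texts")
        }
        override fun onPartialResults(partialResults: Bundle?) {}
        override fun onEvent(eventType: Int, params: Bundle?) {}
    })
    val intent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH)
        .putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM)
    recognizer.startListening(intent)
    return recognizer // call destroy() on it when finished
}
```

Client apps targeting Android 11 (API 30) or later also need a queries element for the android.speech.RecognitionService action in their manifest, otherwise package visibility rules can hide the recognizer from them.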

The service implements onCheckRecognitionSupport (API 33+) returning the 27 BCP-47 languages Parakeet TDT v3 covers, marked installedOnDeviceLanguage once models are present (or pendingOnDeviceLanguage while they’re downloading). Audio focus is acquired with AUDIOFOCUS_GAIN_TRANSIENT for the duration of a session.
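
From the client side, the language report can be read back with SpeechRecognizer.checkRecognitionSupport on API 33+. This is a minimal sketch using the framework API only; the function name and log tag are hypothetical, and it assumes a recognizer already created with createSpeechRecognizer.

```kotlin
import android.content.Context
import android.content.Intent
import android.os.Build
import android.speech.RecognitionSupport
import android.speech.RecognitionSupportCallback
import android.speech.RecognizerIntent
import android.speech.SpeechRecognizer
import android.util.Log

// Hypothetical helper: logs which languages the bound recognizer reports as
// installed or still downloading. Does nothing below API 33.
fun logRecognitionSupport(context: Context, recognizer: SpeechRecognizer) {
    if (Build.VERSION.SDK_INT < Build.VERSION_CODES.TIRAMISU) return
    val intent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH)
    recognizer.checkRecognitionSupport(
        intent,
        context.mainExecutor, // callbacks are delivered on the main thread
        object : RecognitionSupportCallback {
            override fun onSupportResult(support: RecognitionSupport) {
                Log.d("RecognizerCheck", "installed: ${support.installedOnDeviceLanguages}")
                Log.d("RecognizerCheck", "pending: ${support.pendingOnDeviceLanguages}")
            }
            override fun onError(error: Int) {
                Log.w("RecognizerCheck", "support check failed: $error")
            }
        }
    )
}
```

If the models are still downloading, the languages would be expected under the pending list and move to the installed list afterwards, per the behavior described above.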

Caveat

Gboard, Samsung Keyboard, and Google Assistant bundle their own recognizers and skip the system default. Apps that explicitly call the framework SpeechRecognizer API (or build their own UI on top of it) are the ones that flow through your service.

Models

All models run via ONNX Runtime with NNAPI acceleration and are INT8-quantized by default.

Model	Task	Size
Parakeet TDT v3 (INT8)	Speech-to-Text (114 languages)	891 MB
Kokoro-82M (INT8)	Text-to-Speech (8 languages)	330 MB
Silero VAD v5	Voice Activity Detection	2 MB
DeepFilterNet3	Noise Cancellation	8 MB

Source code: github.com/soniqo/speech-android

Next Steps

Benchmarks — Android inference performance
Linux C API — embedded Linux setup