Kokoro TTS — Android
Kokoro-82M is a lightweight, non-autoregressive text-to-speech model that runs on Android via ONNX Runtime. It produces natural-sounding 24 kHz speech with 50 preset voices across 8 languages.
Supported Languages
| Language | Code | Example Voices |
|---|---|---|
| English (US) | en | af_heart, am_adam, af_sky |
| English (UK) | en | bf_emma, bm_george |
| Spanish | es | ef_dora |
| French | fr | ff_siwis |
| Hindi | hi | hf_alpha, hm_omega |
| Italian | it | if_sara |
| Japanese | ja | jf_alpha, jm_omega |
| Portuguese | pt | pf_dora |
| Chinese | zh | zf_xiaobei, zm_yunjian |
50 preset voices total. Voice naming convention: [language][gender]_[name] — e.g., af_heart = American Female "Heart".
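The naming convention can be decoded mechanically. The following is a minimal sketch, not part of the speech-android API: `VoiceId`, `languageByPrefix`, and `parseVoiceId` are hypothetical names, and the prefix-to-language mapping is inferred only from the example voices in the table above (it assumes `f` means female and `m` means male).

```kotlin
// Hypothetical helper for illustration; not part of the speech-android API.
data class VoiceId(val languageCode: String, val gender: String, val name: String)

// Prefix letters inferred from the example voices listed in the table.
val languageByPrefix = mapOf(
    'a' to "en", 'b' to "en", 'e' to "es", 'f' to "fr", 'h' to "hi",
    'i' to "it", 'j' to "ja", 'p' to "pt", 'z' to "zh"
)

fun parseVoiceId(id: String): VoiceId? {
    // Expect: one language letter, one gender letter (f or m), underscore, name.
    val match = Regex("^([a-z])([fm])_(\\w+)$").matchEntire(id) ?: return null
    val (prefix, gender, name) = match.destructured
    // Unknown language prefixes are rejected rather than guessed.
    val languageCode = languageByPrefix[prefix[0]] ?: return null
    return VoiceId(languageCode, if (gender == "f") "female" else "male", name)
}
```

For example, `parseVoiceId("af_heart")` yields a `VoiceId` with language code `en`, gender `female`, and name `heart`, while an ID that does not match the pattern returns `null`.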
Model Files
| File | Size / Contents |
|---|---|
| kokoro-model-int8.onnx | ~89 MB |
| voices.bin | Voice embeddings |
| Phoneme dictionaries | Language-specific pronunciation data |
HuggingFace: aufklarer/Kokoro-82M-ONNX
Performance
| Metric | Value |
|---|---|
| Parameters | 82M |
| Inference backend | ONNX Runtime |
| Output sample rate | 24 kHz |
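The 24 kHz output rate determines how sample counts map to playback time. As a rough sketch, assuming the synthesized audio arrives as float samples in the range -1.0 to 1.0 (this page does not state the output format), the helpers below compute clip duration and convert to 16-bit PCM. The names `KOKORO_SAMPLE_RATE`, `durationSeconds`, and `floatToPcm16` are hypothetical.

```kotlin
// Output sample rate as listed in the Performance table.
const val KOKORO_SAMPLE_RATE = 24_000

// Duration of a mono clip in seconds.
fun durationSeconds(sampleCount: Int, sampleRate: Int = KOKORO_SAMPLE_RATE): Double =
    sampleCount.toDouble() / sampleRate

// Convert float samples (assumed to lie in [-1, 1]) to signed 16-bit PCM.
// Out-of-range values are clamped before scaling.
fun floatToPcm16(samples: FloatArray): ShortArray =
    ShortArray(samples.size) { i ->
        (samples[i].coerceIn(-1f, 1f) * Short.MAX_VALUE).toInt().toShort()
    }
```

With these definitions, a buffer of 48,000 samples corresponds to 2 seconds of audio at 24 kHz.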
Phonemizer
Text is converted to phoneme tokens using a dictionary-based phonemizer with language-specific support. The Android implementation includes phonemizers for English, French, Spanish, Italian, Portuguese, Hindi, Japanese, and Chinese.
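To illustrate the dictionary-based approach in general terms, here is a toy sketch. It is not the Kokoro phonemizer: the dictionary entries, the `<unk>` fallback, and the function name `phonemize` are assumptions made for illustration, and a real phonemizer would also handle punctuation, numbers, and out-of-vocabulary words.

```kotlin
// Toy pronunciation dictionary; entries are illustrative and not taken
// from the Kokoro data files.
val toyDictionary = mapOf(
    "hello" to "həlˈoʊ",
    "world" to "wˈɜːld"
)

// Look up each word in the dictionary. Unknown words map to a placeholder
// here; a production phonemizer would need a real fallback strategy.
fun phonemize(text: String, dictionary: Map<String, String>): List<String> =
    text.lowercase()
        .split(Regex("[^\\p{L}']+"))   // split on anything that is not a letter or apostrophe
        .filter { it.isNotEmpty() }
        .map { word -> dictionary[word] ?: "<unk>" }
```

Calling `phonemize("Hello, world!", toyDictionary)` returns the two dictionary pronunciations in order; a word missing from the dictionary comes back as `<unk>`.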
Pipeline Integration
On Android, Kokoro TTS is part of the SpeechPipeline. After STT transcribes speech, the text is phonemized and synthesized back into audio. The pipeline manages the full VAD → STT → TTS flow automatically.
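```kotlin
import kotlinx.coroutines.launch

// Assumes `scope` is a CoroutineScope (e.g. lifecycleScope) and that
// `pipeline.events` is a Flow.
val modelDir = ModelManager.ensureModels(context)
val pipeline = SpeechPipeline(SpeechConfig(modelDir = modelDir))

// Collect events in their own coroutine: collect() suspends, so calling it
// inline would keep start() below from ever being reached.
scope.launch {
    pipeline.events.collect { event ->
        when (event) {
            is SpeechEvent.TranscriptionCompleted -> println(event.text)
            else -> {}
        }
    }
}
pipeline.start()
pipeline.pushAudio(samples) // 16 kHz mono float32
```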
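Since `pushAudio` expects 16 kHz mono float32 samples, audio captured as 16-bit PCM needs converting first. A minimal sketch of that conversion follows; `pcm16ToFloat` is a hypothetical helper, not part of the library, and it assumes the input is already 16 kHz mono.

```kotlin
// Convert signed 16-bit PCM samples to float32 in the range [-1.0, 1.0).
// Dividing by 32768 maps Short.MIN_VALUE to exactly -1.0.
fun pcm16ToFloat(pcm: ShortArray): FloatArray =
    FloatArray(pcm.size) { i -> pcm[i] / 32768f }
```

The resulting `FloatArray` can then be passed as the `samples` argument shown above.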
Source code: github.com/soniqo/speech-android