FunctionGemma 270M
FunctionGemma 270M is a Gemma 3 derivative fine-tuned for structured tool and function calls. Instead of free-form text, it emits a strict <start_function_call>call:NAME{...}<end_function_call> grammar that the SDK parses into typed FunctionCall values. At roughly 283 MB on disk, it is small enough to load alongside an ASR + TTS pipeline on phone-class hardware and serve as the “router” that turns a user’s utterance into a tool invocation.
FunctionGemma slots into voice agents wherever you would otherwise call a hosted LLM for tool routing. Because the output grammar is fixed, the SDK hands back parsed FunctionCall objects directly, with no JSON repair and no schema-mode prompting on your side.
Platforms
| Platform | Format | Size | HuggingFace |
|---|---|---|---|
| Apple (macOS / iOS) | CoreML | ~283 MB | aufklarer/FunctionGemma-270M-CoreML |
| Android (and Linux / Windows via Speech Core) | LiteRT-LM | ~283 MB | soniqo/FunctionGemma-270M-LiteRT-LM |
Grammar
The model is trained to emit one or more calls, each wrapped in a pair of sentinel tokens:
<start_function_call>call:set_timer{"minutes": 5, "label": "tea"}<end_function_call>
The SDK parses each call into a typed FunctionCall(name:, arguments:) value. Arguments are decoded as JSON so you can map them straight onto a Swift Codable or a Kotlin @Serializable data class.
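To make the grammar concrete, here is a minimal sketch of how raw output in this format could be split into a name and a JSON argument object. It is not the SDK's parser: `ParsedCall` and `parseFunctionCalls` are hypothetical names invented for this illustration, and the sketch assumes the arguments are always a single JSON object that begins at the first `{`.

```swift
import Foundation

// Hypothetical stand-in for the SDK's FunctionCall type; not the real definition.
struct ParsedCall {
    let name: String
    let arguments: [String: Any]
}

/// Extracts every `<start_function_call>call:NAME{...}<end_function_call>` span
/// from raw model output and decodes the argument object as JSON.
/// Spans that do not match the expected shape are skipped.
func parseFunctionCalls(_ output: String) -> [ParsedCall] {
    let open = "<start_function_call>"
    let close = "<end_function_call>"
    var calls: [ParsedCall] = []
    var rest = Substring(output)

    while let start = rest.range(of: open),
          let end = rest.range(of: close, range: start.upperBound..<rest.endIndex) {
        let body = rest[start.upperBound..<end.lowerBound]
        rest = rest[end.upperBound...]

        // The body must look like `call:NAME{...}`.
        guard body.hasPrefix("call:"),
              let brace = body.firstIndex(of: "{") else { continue }
        let nameStart = body.index(body.startIndex, offsetBy: 5)
        let name = body[nameStart..<brace].trimmingCharacters(in: .whitespaces)
        let json = Data(body[brace...].utf8)

        // Skip calls whose arguments are not a valid JSON object.
        guard !name.isEmpty,
              let object = try? JSONSerialization.jsonObject(with: json),
              let arguments = object as? [String: Any] else { continue }
        calls.append(ParsedCall(name: name, arguments: arguments))
    }
    return calls
}

// Usage: parse the example string from the grammar description.
let raw = #"<start_function_call>call:set_timer{"minutes": 5, "label": "tea"}<end_function_call>"#
for call in parseFunctionCalls(raw) {
    print(call.name, call.arguments)
}
```

Malformed spans are skipped rather than raising an error. That is a design choice of this sketch, not documented SDK behavior.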
Swift (Apple, CoreML)
On Apple platforms FunctionGemma is exposed through speech-swift as the FunctionGemma class. It loads the CoreML model from HuggingFace on first use and runs on the Neural Engine.
import FunctionGemma

// Downloads the CoreML model from HuggingFace on first use.
let model = try await FunctionGemma.fromPretrained()

// Tool signatures the model may choose from.
let tools = """
- set_timer(minutes: Int, label: String)
- get_weather(city: String)
"""

let calls = try model.callFunctions(
    tools: tools,
    userMessage: "Set a 5 minute tea timer"
)

for call in calls {
    print(call.name)       // "set_timer"
    print(call.arguments)  // ["minutes": 5, "label": "tea"]
}
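Once a call is parsed, the app still has to map its name and arguments onto real code. The sketch below shows one way to do that with Codable payloads matching the two tools listed above. It assumes the arguments are available as a `[String: Any]` dictionary, which the source does not confirm for the SDK's `FunctionCall` type. `SetTimerArgs`, `GetWeatherArgs`, `ToolInvocation`, `RoutingError` and `route` are all hypothetical names for this illustration.

```swift
import Foundation

// Hypothetical argument payloads mirroring the tool list above.
struct SetTimerArgs: Codable, Equatable {
    let minutes: Int
    let label: String
}

struct GetWeatherArgs: Codable, Equatable {
    let city: String
}

// One case per tool the app knows how to execute.
enum ToolInvocation: Equatable {
    case setTimer(SetTimerArgs)
    case getWeather(GetWeatherArgs)
}

enum RoutingError: Error {
    case unknownTool(String)
    case invalidArguments
}

/// Re-encodes the argument dictionary as JSON, then decodes it into the
/// Codable payload that matches the tool name.
/// Assumption: arguments arrive as `[String: Any]`; adapt to the SDK's actual type.
func route(name: String, arguments: [String: Any]) throws -> ToolInvocation {
    // data(withJSONObject:) raises an exception on non-JSON values, so check first.
    guard JSONSerialization.isValidJSONObject(arguments) else {
        throw RoutingError.invalidArguments
    }
    let data = try JSONSerialization.data(withJSONObject: arguments)
    let decoder = JSONDecoder()

    switch name {
    case "set_timer":
        return .setTimer(try decoder.decode(SetTimerArgs.self, from: data))
    case "get_weather":
        return .getWeather(try decoder.decode(GetWeatherArgs.self, from: data))
    default:
        throw RoutingError.unknownTool(name)
    }
}
```

Decoding into a closed enum means an unknown tool name or a missing argument surfaces as a thrown error instead of a silently ignored call, which is usually what you want before triggering a side effect such as starting a timer.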
Kotlin (Android, LiteRT-LM)
On Android the model is exposed through speech-android as audio.soniqo.speech.llm.FunctionGemma. It is a bring-your-own-runtime adapter: you provide a LiteRtLmRuntime instance (the SDK ships a default one), and FunctionGemma handles the prompt template and grammar parsing.
import audio.soniqo.speech.llm.FunctionGemma
import audio.soniqo.speech.llm.LiteRtLmRuntime

// Runtime instance that FunctionGemma drives; the adapter adds the prompt template and grammar parsing.
val runtime = LiteRtLmRuntime.fromPretrained(context)
val model = FunctionGemma(runtime)

// Tool signatures the model may choose from.
val tools = """
- set_timer(minutes: Int, label: String)
- get_weather(city: String)
""".trimIndent()

val calls = model.callFunctions(
    tools = tools,
    userMessage = "Set a 5 minute tea timer",
)

for (call in calls) {
    println(call.name)       // "set_timer"
    println(call.arguments)  // {"minutes": 5, "label": "tea"}
}
Further Reading
- Speech Core: C++ engine that hosts the LiteRT-LM runtime and the function-calling loop on Linux, Windows and Android.
- github.com/soniqo/speech-core: orchestration core and LiteRT-LM glue, including the VoicePipeline tool-call loop.
- github.com/soniqo/speech-swift: Apple SDK with the FunctionGemma Swift class.
- github.com/soniqo/speech-android: Android SDK with audio.soniqo.speech.llm.FunctionGemma.
- google/gemma-3-270m: upstream Gemma 3 270M base model on HuggingFace.