FunctionGemma 270M

FunctionGemma 270M is a Gemma 3 derivative fine-tuned for structured tool and function calls. Instead of free-form text, it emits a strict <start_function_call>call:NAME{...}<end_function_call> grammar that the SDK parses into typed FunctionCall values. At roughly 283 MB on disk, it is small enough to load alongside an ASR + TTS pipeline on phone-class hardware and serve as the “router” that turns a user’s utterance into a tool invocation.

Function-Calling, On-Device

FunctionGemma slots into voice agents wherever you would otherwise call a hosted LLM for tool routing. The grammar is strict by construction, so you get back parsed FunctionCall objects directly — no JSON repair, no schema-mode prompting.

Platforms

| Platform | Format | Size | HuggingFace |
| --- | --- | --- | --- |
| Apple (macOS / iOS) | CoreML | ~283 MB | aufklarer/FunctionGemma-270M-CoreML |
| Android (and Linux / Windows via Speech Core) | LiteRT-LM | ~283 MB | soniqo/FunctionGemma-270M-LiteRT-LM |

Grammar

The model is trained to emit one or more calls, each wrapped in a pair of sentinel tokens:

<start_function_call>call:set_timer{"minutes": 5, "label": "tea"}<end_function_call>
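
To make the grammar concrete, here is a minimal sketch of how such output could be split into calls by hand. The SDK already does this for you; RawCall, extractRawCalls, and the character set allowed in function names are assumptions made for illustration, not part of the SDK.

```swift
import Foundation

// Hypothetical stand-in for one call before its arguments are decoded.
struct RawCall: Equatable {
    let name: String
    let argumentsJSON: String
}

/// Extracts every `<start_function_call>call:NAME{...}<end_function_call>` span from `text`.
/// Assumes function names are identifier-like (letters, digits, underscores).
func extractRawCalls(from text: String) -> [RawCall] {
    // The lazy `.*?` stops at the first `}` that is immediately followed by the end sentinel,
    // so nested JSON objects inside the arguments are still captured whole.
    let pattern = #"<start_function_call>call:([A-Za-z_][A-Za-z0-9_]*)(\{.*?\})<end_function_call>"#
    guard let regex = try? NSRegularExpression(pattern: pattern,
                                               options: [.dotMatchesLineSeparators]) else {
        return []
    }
    let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)
    return regex.matches(in: text, options: [], range: fullRange).compactMap { match in
        guard let nameRange = Range(match.range(at: 1), in: text),
              let argsRange = Range(match.range(at: 2), in: text) else { return nil }
        return RawCall(name: String(text[nameRange]),
                       argumentsJSON: String(text[argsRange]))
    }
}

let sampleOutput = #"<start_function_call>call:set_timer{"minutes": 5, "label": "tea"}<end_function_call>"#
let rawCalls = extractRawCalls(from: sampleOutput)
print(rawCalls.map { $0.name })
```

Text outside the sentinels is ignored, so a sequence of calls yields one RawCall per sentinel pair, in output order.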

The SDK parses each call into a typed FunctionCall(name:, arguments:) value. Arguments are decoded as JSON so you can map them straight onto a Swift Codable or a Kotlin @Serializable data class.
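
As a sketch of the Codable mapping mentioned above, the following assumes the arguments are available as a [String: Any] dictionary of JSON-compatible values; the SDK's actual argument type may differ. SetTimerArguments, ArgumentDecodingError, and decodeArguments are hypothetical names introduced for this example.

```swift
import Foundation

// Hypothetical argument type for the set_timer tool used in this page's examples.
struct SetTimerArguments: Codable, Equatable {
    let minutes: Int
    let label: String
}

enum ArgumentDecodingError: Error {
    case notJSONCompatible
}

/// Decodes a dictionary of JSON-compatible values into any Decodable type.
/// Assumption: `arguments` contains only strings, numbers, booleans, arrays,
/// dictionaries, or NSNull.
func decodeArguments<T: Decodable>(_ arguments: [String: Any], as type: T.Type) throws -> T {
    guard JSONSerialization.isValidJSONObject(arguments) else {
        throw ArgumentDecodingError.notJSONCompatible
    }
    // Round-trip through JSON data so JSONDecoder can apply the Codable conformance.
    let data = try JSONSerialization.data(withJSONObject: arguments)
    return try JSONDecoder().decode(T.self, from: data)
}

let timerArguments = try decodeArguments(["minutes": 5, "label": "tea"],
                                         as: SetTimerArguments.self)
print(timerArguments.minutes, timerArguments.label)
```

Decoding into a concrete type also acts as validation: a missing key or a value of the wrong type throws a DecodingError instead of reaching your tool code.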

Swift (Apple, CoreML)

On Apple platforms FunctionGemma is exposed through speech-swift as the FunctionGemma class. It loads the CoreML model from HuggingFace on first use and runs on the Neural Engine.

import FunctionGemma

let model = try await FunctionGemma.fromPretrained()

let tools = """
- set_timer(minutes: Int, label: String)
- get_weather(city: String)
"""

let calls = try model.callFunctions(
    tools: tools,
    userMessage: "Set a 5 minute tea timer"
)

for call in calls {
    print(call.name)       // "set_timer"
    print(call.arguments)  // ["minutes": 5, "label": "tea"]
}
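
Once calls are parsed, the remaining work is routing each one to application code. The sketch below shows one possible dispatch table; ParsedCall is a hypothetical stand-in that mirrors the name and arguments shape printed above (it is not the SDK's FunctionCall type), and the handlers and their return strings are illustrative only.

```swift
import Foundation

// Hypothetical stand-in mirroring the name/arguments shape shown above; not the SDK's type.
struct ParsedCall {
    let name: String
    let arguments: [String: Any]
}

typealias ToolHandler = ([String: Any]) -> String

// Registry mapping tool names to handlers. Each handler validates its own arguments.
let handlers: [String: ToolHandler] = [
    "set_timer": { args in
        guard let minutes = args["minutes"] as? Int else {
            return "set_timer: missing or invalid 'minutes'"
        }
        let label = (args["label"] as? String) ?? "timer"
        return "Started \(minutes) minute timer '\(label)'"
    },
    "get_weather": { args in
        guard let city = args["city"] as? String else {
            return "get_weather: missing 'city'"
        }
        return "Looking up weather for \(city)"
    },
]

/// Routes a parsed call to its handler, or reports an unknown tool name.
func dispatch(_ call: ParsedCall) -> String {
    guard let handler = handlers[call.name] else {
        return "Unknown tool: \(call.name)"
    }
    return handler(call.arguments)
}

print(dispatch(ParsedCall(name: "set_timer", arguments: ["minutes": 5, "label": "tea"])))
```

The grammar constrains the shape of the output, not the model's choice of tool or argument values, so the handlers still check names and types before acting.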

Kotlin (Android, LiteRT-LM)

On Android the model is exposed through speech-android as audio.soniqo.speech.llm.FunctionGemma. It is a bring-your-own-runtime adapter: you provide a LiteRtLmRuntime instance (the SDK ships a default one), and FunctionGemma handles the prompt template and grammar parsing.

import audio.soniqo.speech.llm.FunctionGemma
import audio.soniqo.speech.llm.LiteRtLmRuntime

val runtime = LiteRtLmRuntime.fromPretrained(context)
val model = FunctionGemma(runtime)

val tools = """
- set_timer(minutes: Int, label: String)
- get_weather(city: String)
""".trimIndent()

val calls = model.callFunctions(
    tools = tools,
    userMessage = "Set a 5 minute tea timer",
)

for (call in calls) {
    println(call.name)       // "set_timer"
    println(call.arguments)  // {"minutes": 5, "label": "tea"}
}

Further Reading