+---
+license: gemma
+base_model:
+- google/functiongemma-270m-it
+library_name: transformers.js
+---
+# FunctionGemma model card
+**Model Page**: [FunctionGemma](https://ai.google.dev/gemma/docs/functiongemma)
+**Resources and Technical Documentation**:
+-   [Responsible Generative AI Toolkit](https://ai.google.dev/responsible)
+-   [FunctionGemma on Kaggle](https://www.kaggle.com/models/google/functiongemma/)
+-   [FunctionGemma on Vertex Model Garden](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/functiongemma)
+**Terms of Use**: [Terms](https://ai.google.dev/gemma/terms)\
+**Authors**: Google DeepMind
+## Model Information
+Summary description and brief definition of inputs and outputs.
+### Description
+> [!Note]
+> FunctionGemma is intended to be fine-tuned for your specific function-calling task, including multi-turn use cases.
+FunctionGemma is a lightweight, open model from Google, built as a foundation
+for creating your own specialized function-calling models. FunctionGemma is not
+intended for use as a direct dialogue model, and is designed to be highly
+performant after further fine-tuning, as is typical of models this size. Built
+on the Gemma 3 270M model and with the same research and technology used to
+create the Gemini models, FunctionGemma has been trained specifically for
+function calling. The model has the same architecture as Gemma 3, but uses a
+different chat format. The model is well suited for text-only function calling.
+Its uniquely small size makes it possible to deploy in environments with limited
+resources such as laptops, desktops, or your own cloud infrastructure,
+democratizing access to state-of-the-art AI models and helping foster innovation
+for everyone. Furthermore, like the base Gemma 3 270M, the model has been
+optimized to be extremely versatile and performant on a variety of hardware in
+single-turn scenarios, but it should be fine-tuned on single-turn or multi-turn,
+task-specific data to achieve the best accuracy in specific domains.
+To demonstrate how specializing the 270M-parameter model can achieve high
+performance on specific agentic workflows, we have highlighted two use cases in
+the
+[Google AI Edge Gallery app](https://play.google.com/store/apps/details?id=com.google.ai.edge.gallery&pcampaignid=web_share).
+-   **Tiny Garden:** A model fine-tuned to power a voice-controlled
+    interactive game. It handles game logic to manage a virtual plot of land,
+    decomposing commands like "Plant sunflowers in the top row" and "Water the
+    flowers in plots 1 and 2" into app-specific functions (e.g., plant_seed,
+    water_plots) and coordinate targets. This demonstrates the model's capacity
+    to drive custom app mechanics without server connectivity.
+-   **Mobile Actions:** To empower developers to build their own expert
+    agents, we have published [a
+    dataset](https://huggingface.co/datasets/google/mobile-actions) and
+    a [fine-tuning recipe](https://github.com/google-gemini/gemma-cookbook/blob/main/FunctionGemma/%5BFunctionGemma%5DFinetune_FunctionGemma_270M_for_Mobile_Actions_with_Hugging_Face.ipynb)
+    to demonstrate fine-tuning FunctionGemma. The resulting model translates user inputs (e.g.,
+    "Create a calendar event for lunch," "Turn on the flashlight") into
+    function calls that trigger Android OS system tools. The recipe's interactive
+    notebook demonstrates how to take the base FunctionGemma model and build a
+    "Mobile Actions" fine-tune from scratch for use in the
+    [Google AI Edge Gallery app](https://play.google.com/store/apps/details?id=com.google.ai.edge.gallery&pcampaignid=web_share).
+    This use case demonstrates the model's ability to act as an offline,
+    private agent for personal device tasks.
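App-specific functions such as the Tiny Garden ones are given to the model as tool definitions, in the same shape as the `weather_function_schema` object in Basic Usage. The following is a minimal sketch under stated assumptions. The names `plant_seed` and `water_plots` come from the description above. Their parameters (`seed_type`, `plots`) and the `makeTool` helper are hypothetical illustrations, not the schemas the Gallery app actually uses.

```js
// Hypothetical helper: wraps JSON Schema-style parameters in the
// { type: "function", function: { ... } } shape used in Basic Usage.
function makeTool(name, description, properties, required = Object.keys(properties)) {
  return {
    type: "function",
    function: {
      name,
      description,
      parameters: { type: "object", properties, required },
    },
  };
}

// Illustrative Tiny Garden-style tools; the parameters are invented for this sketch.
const gardenTools = [
  makeTool("plant_seed", "Plants seeds of the given type in the given plots.", {
    seed_type: { type: "string", description: "The plant to sow, e.g. sunflower" },
    plots: { type: "array", items: { type: "integer" }, description: "Plot numbers to plant" },
  }),
  makeTool("water_plots", "Waters the given plots.", {
    plots: { type: "array", items: { type: "integer" }, description: "Plot numbers to water" },
  }),
];

console.log(gardenTools.map((tool) => tool.function.name)); // [ 'plant_seed', 'water_plots' ]
```

You could pass these tools as `tools: gardenTools` to `tokenizer.apply_chat_template`, the same way Basic Usage passes `[weather_function_schema]`.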
+### Inputs and outputs
+-   **Input:**
+    -   Text string, such as a question, a prompt, or a document to be
+        summarized
+    -   Total input context of 32K tokens
+-   **Output:**
+    -   Generated text in response to the input, such as an answer to a
+        question, or a summary of a document
+    -   Total output of up to 32K tokens per request, minus the number of
+        input tokens in the request
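The input and the output share one context window, so the room left for generation shrinks as the prompt grows. Here is a minimal sketch of that arithmetic. It assumes "32K" means 32,768 tokens. `maxNewTokens` is a hypothetical helper, not part of Transformers.js.

```js
// Assumes "32K" means a 32,768-token window shared by input and output.
const CONTEXT_WINDOW = 32768;

// Returns how many new tokens can still be generated for a prompt of
// `inputTokens`, optionally capped at `requested` (e.g. a max_new_tokens value).
function maxNewTokens(inputTokens, requested = Infinity, contextWindow = CONTEXT_WINDOW) {
  if (!Number.isInteger(inputTokens) || inputTokens < 0) {
    throw new RangeError("inputTokens must be a non-negative integer");
  }
  const remaining = Math.max(0, contextWindow - inputTokens);
  return Math.min(remaining, requested);
}

console.log(maxNewTokens(1200));       // 31568
console.log(maxNewTokens(1200, 512));  // 512
console.log(maxNewTokens(32700, 512)); // 68
```

In Basic Usage the prompt length is available as `inputs.input_ids.dims[1]`, so `maxNewTokens(inputs.input_ids.dims[1], 512)` could supply `max_new_tokens`.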
+### Basic Usage
+The following is a code example of how to use FunctionGemma to generate a function call from a JSON definition using the Hugging Face Transformers.js library.
+If you haven't already, you can install the [Transformers.js](https://huggingface.co/docs/transformers.js) JavaScript library from [NPM](https://www.npmjs.com/package/@huggingface/transformers) using:
+```bash
+npm i @huggingface/transformers
+```
+You can then use the model as follows:
+```js
+import { AutoModelForCausalLM, AutoTokenizer } from "@huggingface/transformers";
+// Load the model and tokenizer
+const model_id = "onnx-community/functiongemma-270m-it-ONNX-GQA";
+const tokenizer = await AutoTokenizer.from_pretrained(model_id);
+const model = await AutoModelForCausalLM.from_pretrained(model_id);
+// Describe the available tool as a JSON Schema-style function definition
+const weather_function_schema = {
+  type: "function",
+  function: {
+    name: "get_current_temperature",
+    description: "Gets the current temperature for a given location.",
+    parameters: {
+      type: "object",
+      properties: {
+        location: {
+          type: "string",
+          description: "The city name, e.g. San Francisco",
+        },
+      },
+      required: ["location"],
+    },
+  },
+};
+// The developer turn sets up function calling; the user turn carries the request
+const messages = [
+  {
+    role: "developer",
+    content: "You are a model that can do function calling with the following functions",
+  },
+  {
+    role: "user",
+    content: "What's the temperature in London?",
+  },
+];
+// Render the chat template with the tool definitions and tokenize the result
+const inputs = tokenizer.apply_chat_template(messages, {
+  tools: [weather_function_schema],
+  tokenize: true,
+  add_generation_prompt: true,
+  return_dict: true,
+});
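+// Generate, then decode only the new tokens (everything after the prompt).
+// Special tokens are kept so the function-call markers stay visible.
+const output = await model.generate({ ...inputs, max_new_tokens: 512 });
+const decoded = tokenizer.decode(output.slice(0, [inputs.input_ids.dims[1], null]), { skip_special_tokens: false });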
+console.log(decoded);
+// <start_function_call>call:get_current_temperature{location:<escape>London<escape>}<end_function_call><start_function_response>
+```
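The decoded output above wraps each call in `<start_function_call>`/`<end_function_call>` and wraps string arguments in `<escape>` markers. Below is a minimal parser sketch that turns that text into `{ name, arguments }` objects. Beyond the single example shown, it assumes two things: multiple arguments are comma-separated, and non-string values (numbers, booleans) appear without `<escape>` markers. It does not handle nested objects or arrays. `parseFunctionCalls` is a hypothetical helper, not a Transformers.js API.

```js
const ESCAPE = "<escape>";

// Converts an unescaped value such as 3, 2.5, true, or false to a JS value;
// anything else is returned as the raw string.
function parseBareValue(raw) {
  if (raw === "true") return true;
  if (raw === "false") return false;
  const n = Number(raw);
  return raw !== "" && !Number.isNaN(n) ? n : raw;
}

// Parses an argument body such as `location:<escape>London<escape>,days:3`.
function parseArgs(body) {
  const args = {};
  let i = 0;
  while (i < body.length) {
    const colon = body.indexOf(":", i);
    if (colon === -1) throw new SyntaxError(`Expected "key:value" at index ${i}`);
    const key = body.slice(i, colon).trim();
    let next;
    if (body.startsWith(ESCAPE, colon + 1)) {
      // String value: everything up to the closing <escape> marker, commas included.
      const start = colon + 1 + ESCAPE.length;
      const end = body.indexOf(ESCAPE, start);
      if (end === -1) throw new SyntaxError(`Unterminated string for "${key}"`);
      args[key] = body.slice(start, end);
      next = end + ESCAPE.length;
    } else {
      // Bare value: runs to the next comma or the end of the body.
      const comma = body.indexOf(",", colon + 1);
      next = comma === -1 ? body.length : comma;
      args[key] = parseBareValue(body.slice(colon + 1, next).trim());
    }
    i = body[next] === "," ? next + 1 : next; // skip the separator, if any
  }
  return args;
}

// Extracts every call of the form <start_function_call>call:NAME{ARGS}<end_function_call>.
function parseFunctionCalls(text) {
  const pattern = /<start_function_call>call:([\w.-]+)\{([\s\S]*?)\}<end_function_call>/g;
  return [...text.matchAll(pattern)].map(([, name, body]) => ({
    name,
    arguments: parseArgs(body),
  }));
}

const sample =
  "<start_function_call>call:get_current_temperature{location:<escape>London<escape>}" +
  "<end_function_call><start_function_response>";
console.log(JSON.stringify(parseFunctionCalls(sample)));
// [{"name":"get_current_temperature","arguments":{"location":"London"}}]
```

With Basic Usage, `parseFunctionCalls(decoded)` would return the same structure for the generated text. Parallel calls come back as multiple array entries.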
+For more detailed examples see the [Gemma documentation](https://ai.google.dev/gemma/docs/functiongemma).
+## Model Data
+Data used for model training and how the data was processed.
+### Training Dataset
+These models were trained on a dataset of text data that includes a wide
+variety of sources. The model was trained with 6T tokens. The knowledge cutoff
+date for the training data was August 2024. These are the key components:
+-   Public Tool Definitions - Common APIs found on the web
+-   Tool Use Interactions - These are a mix of prompts, function calls,
+    function responses, and natural language responses from the model to
+    summarize the function call response, or request clarifications when the
+    prompt is ambiguous or incomplete.
+### Data Preprocessing
+Here are the key data cleaning and filtering methods applied to the training
+data:
+-   CSAM Filtering: Rigorous CSAM (Child Sexual Abuse Material) filtering
+    was applied at multiple stages in the data preparation process to ensure
+    the exclusion of harmful and illegal content.
+-   Sensitive Data Filtering: As part of making Gemma pre-trained models
+    safe and reliable, automated techniques were used to filter out certain
+    personal information and other sensitive data from training sets.
+-   Additional methods: Filtering based on content quality and safety in
+    line with
+    [our policies](https://ai.google/static/documents/ai-responsibility-update-published-february-2025.pdf).
+## Implementation Information
+Details about the model internals.
+### Hardware
+Gemma was trained using [Tensor Processing Unit
+(TPU)](https://cloud.google.com/tpu/docs/intro-to-tpu) hardware (TPUv4p, TPUv5p
+and TPUv5e). Training large models requires significant
+computational power. TPUs, designed specifically for matrix operations common in
+machine learning, offer several advantages in this domain:
+-   Performance: TPUs are specifically designed to handle the massive
+    computations involved in training large models. They can speed up training
+    considerably compared to CPUs.
+-   Memory: TPUs often come with large amounts of high-bandwidth memory,
+    allowing for the handling of large models and batch sizes during training.
+    This can lead to better model quality.
+-   Scalability: TPU Pods (large clusters of TPUs) provide a scalable
+    solution for handling the growing complexity of large foundation models.
+    You can distribute training across multiple TPU devices for faster and more
+    efficient processing.
+-   Cost-effectiveness: In many scenarios, TPUs can provide a more
+    cost-effective solution for training large models compared to CPU-based
+    infrastructure, especially when considering the time and resources saved
+    due to faster training.
+-   These advantages are aligned with
+    [Google's commitments to operate sustainably](https://sustainability.google/operating-sustainably/).
+### Software
+Training was done using [JAX](https://github.com/jax-ml/jax) and
+[ML Pathways](https://blog.google/technology/ai/introducing-pathways-next-generation-ai-architecture/).
+JAX allows researchers to take advantage of the latest generation of hardware,
+including TPUs, for faster and more efficient training of large models. ML
+Pathways is Google's latest effort to build artificially intelligent systems
+capable of generalizing across multiple tasks. This is especially suitable for
+foundation models, including large language models like these.\
+Together, JAX and ML Pathways are used as described in the [paper about the
+Gemini family of models](https://goo.gle/gemma2report); *"the 'single
+controller' programming model of Jax and Pathways allows a single Python process
+to orchestrate the entire training run, dramatically simplifying the development
+workflow."*
+## Evaluation
+Model evaluation metrics and results.
+### Benchmark Results
+<table>
+  <thead>
+    <tr>
+      <th><strong>Benchmark</strong></th>
+      <th><strong>n-shot</strong></th>
+      <th><strong>FunctionGemma 270M</strong></th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>BFCL Simple</td>
+      <td>0-shot</td>
+      <td>61.6</td>
+    </tr>
+    <tr>
+      <td>BFCL Parallel</td>
+      <td>0-shot</td>
+      <td>63.5</td>
+    </tr>
+    <tr>
+      <td>BFCL Multiple</td>
+      <td>0-shot</td>
+      <td>39</td>
+    </tr>
+    <tr>
+      <td>BFCL Parallel Multiple</td>
+      <td>0-shot</td>
+      <td>29.5</td>
+    </tr>
+    <tr>
+      <td>BFCL Live Simple</td>
+      <td>0-shot</td>
+      <td>36.2</td>
+    </tr>
+    <tr>
+      <td>BFCL Live Parallel</td>
+      <td>0-shot</td>
+      <td>25.7</td>
+    </tr>
+    <tr>
+      <td>BFCL Live Multiple</td>
+      <td>0-shot</td>
+      <td>22.9</td>
+    </tr>
+    <tr>
+      <td>BFCL Live Parallel Multiple</td>
+      <td>0-shot</td>
+      <td>20.8</td>
+    </tr>
+    <tr>
+      <td>BFCL Relevance</td>
+      <td>0-shot</td>
+      <td>61.1</td>
+    </tr>
+    <tr>
+      <td>BFCL Irrelevance</td>
+      <td>0-shot</td>
+      <td>70.6</td>
+    </tr>
+  </tbody>
+</table>
+**Impact on Performance after Fine-tuning on Mobile Actions Dataset**\
+To demonstrate the value of specialization for small language models, we
+compared the base FunctionGemma model against the fine-tuned model using the
+"Mobile Actions"
+[recipe](https://github.com/google-gemini/gemma-cookbook/blob/main/FunctionGemma/%5BFunctionGemma%5DFinetune_FunctionGemma_270M_for_Mobile_Actions_with_Hugging_Face.ipynb).
+Fine-tuning significantly improved the base FunctionGemma model's ability to
+correctly identify and format mobile system calls.
+<table>
+  <thead>
+    <tr>
+      <th>Model</th>
+      <th>Eval results for Mobile Actions</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>Base FunctionGemma model</td>
+      <td>58%</td>
+    </tr>
+    <tr>
+      <td>Mobile Actions Fine-Tune</td>
+      <td>85%</td>
+    </tr>
+  </tbody>
+</table>
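The sketch below puts the two scores in perspective. It computes the absolute gain, the relative gain, and the share of the base model's failures that fine-tuning removed. The last figure assumes the eval result is an accuracy-style score where 100% is a perfect result. `compareScores` is a hypothetical helper.

```js
// Compares two scores on a 0-100 scale, assuming higher is better and 100 is perfect.
function compareScores(baseScore, tunedScore) {
  const gain = tunedScore - baseScore;
  return {
    absoluteGain: gain,                       // percentage points
    relativeGain: gain / baseScore,           // fraction of the base score
    errorReduction: gain / (100 - baseScore), // fraction of base failures removed
  };
}

// Mobile Actions eval results from the table above.
const { absoluteGain, relativeGain, errorReduction } = compareScores(58, 85);
console.log(absoluteGain);                      // 27
console.log((relativeGain * 100).toFixed(1));   // 46.6
console.log((errorReduction * 100).toFixed(1)); // 64.3
```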
+**On-Device Performance of the Fine-tuned FunctionGemma 270M Use Cases**\
+We evaluated the fine-tuned use cases on a Samsung Galaxy S25 Ultra to assess on-device
+latency and memory footprint.
+-   **Context:** 512 prefill tokens and 32 decode tokens.
+-   **Hardware:** S25 Ultra CPU using LiteRT XNNPACK delegate with 4 threads.
+Mobile Actions On-Device Performance
+<table>
+  <thead>
+    <tr>
+      <th>Backend</th>
+      <th>Quantization scheme</th>
+      <th>Context length (tokens)</th>
+      <th>Prefill (tokens per second)</th>
+      <th>Decode (tokens per second)</th>
+      <th>Time-to-first-token (seconds)</th>
+      <th>Model Size (MB)</th>
+      <th>Peak RSS Memory (MB)</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>CPU</td>
+      <td>dynamic_int8</td>
+      <td>1024</td>
+      <td>1718</td>
+      <td>125.9</td>
+      <td>0.3</td>
+      <td>288</td>
+      <td>551</td>
+    </tr>
+  </tbody>
+</table>
+Tiny Garden On-Device Performance
+<table>
+  <thead>
+    <tr>
+      <th>Backend</th>
+      <th>Quantization scheme</th>
+      <th>Context length (tokens)</th>
+      <th>Prefill (tokens per second)</th>
+      <th>Decode (tokens per second)</th>
+      <th>Time-to-first-token (seconds)</th>
+      <th>Model Size (MB)</th>
+      <th>Peak RSS Memory (MB)</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>CPU</td>
+      <td>dynamic_int8</td>
+      <td>1024</td>
+      <td>1743</td>
+      <td>125.7</td>
+      <td>0.3</td>
+      <td>288</td>
+      <td>549</td>
+    </tr>
+  </tbody>
+</table>
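The throughput columns relate directly to the reported time-to-first-token: prefilling 512 tokens at roughly 1,700 tokens per second takes about 0.3 seconds. Here is a rough sketch of that arithmetic for the benchmark setup (512 prefill tokens, 32 decode tokens). It assumes constant throughput and ignores model loading, tokenization, and sampling overhead. `estimateLatency` is a hypothetical helper.

```js
// Splits a request into prefill and decode time from throughput figures.
// Assumes constant throughput; ignores model loading, tokenization, and sampling.
function estimateLatency({ prefillTokens, decodeTokens, prefillTps, decodeTps }) {
  const prefillSeconds = prefillTokens / prefillTps; // close to time-to-first-token
  const decodeSeconds = decodeTokens / decodeTps;
  return { prefillSeconds, decodeSeconds, totalSeconds: prefillSeconds + decodeSeconds };
}

// Benchmark setup from above: 512 prefill tokens and 32 decode tokens.
const mobileActions = estimateLatency({ prefillTokens: 512, decodeTokens: 32, prefillTps: 1718, decodeTps: 125.9 });
const tinyGarden = estimateLatency({ prefillTokens: 512, decodeTokens: 32, prefillTps: 1743, decodeTps: 125.7 });

for (const [label, r] of [["Mobile Actions", mobileActions], ["Tiny Garden", tinyGarden]]) {
  console.log(label, r.prefillSeconds.toFixed(2), r.decodeSeconds.toFixed(2), r.totalSeconds.toFixed(2));
}
// Mobile Actions 0.30 0.25 0.55
// Tiny Garden 0.29 0.25 0.55
```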
+## Ethics and Safety
+Ethics and safety evaluation approach and results.
+### Evaluation Approach
+Our evaluation methods include structured evaluations and internal red-teaming
+testing of relevant content policies. Red-teaming was conducted by a number of
+different teams, each with different goals and human evaluation metrics. These
+models were evaluated against a number of different categories relevant to
+ethics and safety, including:
+-   **Child Safety**: Evaluation of text-to-text and image-to-text prompts
+    covering child safety policies, including child sexual abuse and exploitation.
+-   **Content Safety**: Evaluation of text-to-text and image-to-text prompts
+    covering safety policies, including harassment, violence and gore, and hate
+    speech.
+-   **Representational Harms**: Evaluation of text-to-text and image-to-text
+    prompts covering safety policies including bias, stereotyping, and harmful
+    associations or inaccuracies.
+### Evaluation Results
+For all areas of safety testing, we saw major improvements in the categories of
+child safety, content safety, and representational harms relative to previous
+Gemma models. All testing was conducted without safety filters to evaluate the
+model capabilities and behaviors. The model produced minimal policy violations,
+and showed significant improvements over previous Gemma models' performance
+with respect to ungrounded inferences. A limitation of our evaluations was that
+they included only English-language prompts.
+## Usage and Limitations
+These models have certain limitations that users should be aware of.
+### Intended Usage
+This model is not intended for use as a direct dialogue model.\
+Open Large Language Models (LLMs) have a wide range of applications across
+various industries and domains. The following list of potential uses is not
+comprehensive. The purpose of this list is to provide contextual information
+about the possible use cases that the model creators considered as part of model
+training and development.
+-   Content Creation and Communication
+    -   Text Generation: These models can be used to generate creative
+        text formats such as poems, scripts, code, marketing copy, and email drafts.
+    -   Chatbots and Conversational AI: Power conversational interfaces
+        for customer service, virtual assistants, or interactive applications.
+    -   Text Summarization: Generate concise summaries of a text corpus,
+        research papers, or reports.
+-   Research and Education
+    -   Natural Language Processing (NLP) Research: These models can
+        serve as a foundation for researchers to experiment with NLP
+        techniques, develop algorithms, and contribute to the advancement of the field.
+    -   Language Learning Tools: Support interactive language learning
+        experiences, aiding in grammar correction or providing writing practice.
+    -   Knowledge Exploration: Assist researchers in exploring large
+        bodies of text by generating summaries or answering questions about
+        specific topics.
+### Limitations
+-   Training Data
+    -   The quality and diversity of the training data significantly
+        influence the model's capabilities. Biases or gaps in the training data
+        can lead to limitations in the model's responses.
+    -   The scope of the training dataset determines the subject areas
+        the model can handle effectively.
+-   Context and Task Complexity
+    -   Models are better at tasks that can be framed with clear
+        prompts and instructions. Open-ended or highly complex tasks might be
+        challenging.
+    -   A model's performance can be influenced by the amount of context
+        provided (longer context generally leads to better outputs, up to a
+        certain point).
+-   Language Ambiguity and Nuance
+    -   Natural language is inherently complex. Models might struggle
+        to grasp subtle nuances, sarcasm, or figurative language.
+-   Factual Accuracy
+    -   Models generate responses based on information they learned
+        from their training datasets, but they are not knowledge bases. They
+        may generate incorrect or outdated factual statements.
+-   Common Sense
+    -   Models rely on statistical patterns in language. They might
+        lack the ability to apply common sense reasoning in certain situations.
+### Ethical Considerations and Risks
+The development of large language models (LLMs) raises several ethical
+concerns. In creating an open model, we have carefully considered the
+following:
+-   Bias and Fairness
+    -   LLMs trained on large-scale, real-world text data can reflect
+        socio-cultural biases embedded in the training material. These models
+        underwent careful scrutiny, including the input data pre-processing
+        and posterior evaluations described in this card.
+-   Misinformation and Misuse
+    -   LLMs can be misused to generate text that is false, misleading,
+        or harmful.
+    -   Guidelines are provided for responsible use with the model, see
+        the [Responsible Generative AI Toolkit](https://ai.google.dev/responsible).
+-   Transparency and Accountability
+    -   This model card summarizes details on the models' architecture,
+        capabilities, limitations, and evaluation processes.
+    -   A responsibly developed open model offers the opportunity to
+        share innovation by making LLM technology accessible to developers and
+        researchers across the AI ecosystem.
+Risks identified and mitigations:
+-   Perpetuation of biases: Continuous monitoring (using evaluation metrics
+    and human review) and the exploration of de-biasing techniques are
+    encouraged during model training, fine-tuning, and other use cases.
+-   Generation of harmful content: Mechanisms and guidelines for content
+    safety are essential. Developers are encouraged to exercise caution and
+    implement appropriate content safety safeguards based on their specific
+    product policies and application use cases.
+-   Misuse for malicious purposes: Technical limitations and developer and
+    end-user education can help mitigate malicious applications of
+    LLMs. Educational resources and reporting mechanisms for users to flag
+    misuse are provided. Prohibited uses of Gemma models are outlined in the
+    [Gemma Prohibited Use Policy](https://ai.google.dev/gemma/prohibited_use_policy).
+-   Privacy violations: Models were trained on data filtered for removal of
+    PII (Personally Identifiable Information). Developers are encouraged to
+    adhere to privacy regulations with privacy-preserving techniques.
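For function-calling applications, one concrete safeguard in the spirit of the mitigations above is to check each generated call against the tool definitions you supplied before executing it. The following is a minimal sketch with two assumptions: calls have already been parsed into `{ name, arguments }` objects, and tools use the JSON Schema-style shape from Basic Usage. The hypothetical `validateCall` checks only the function name, required arguments, and primitive types. It is not a substitute for product-specific policy checks.

```js
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Checks a value against a JSON Schema primitive type name; unknown types are not checked.
function matchesType(value, type) {
  switch (type) {
    case "string": return typeof value === "string";
    case "number": return typeof value === "number" && Number.isFinite(value);
    case "integer": return Number.isInteger(value);
    case "boolean": return typeof value === "boolean";
    case "array": return Array.isArray(value);
    case "object": return value !== null && typeof value === "object" && !Array.isArray(value);
    default: return true;
  }
}

// Returns a list of problems with a parsed call; an empty list means it passed these checks.
function validateCall(call, tools) {
  const tool = tools.find((t) => t.function.name === call.name);
  if (!tool) return [`unknown function "${call.name}"`];
  const { properties = {}, required = [] } = tool.function.parameters ?? {};
  const problems = [];
  for (const key of required) {
    if (!has(call.arguments, key)) problems.push(`missing required argument "${key}"`);
  }
  for (const [key, value] of Object.entries(call.arguments)) {
    if (!has(properties, key)) {
      problems.push(`unexpected argument "${key}"`);
    } else if (!matchesType(value, properties[key].type)) {
      problems.push(`argument "${key}" should be of type ${properties[key].type}`);
    }
  }
  return problems;
}

// The tool definition from Basic Usage, repeated so this sketch is self-contained.
const tools = [
  {
    type: "function",
    function: {
      name: "get_current_temperature",
      description: "Gets the current temperature for a given location.",
      parameters: {
        type: "object",
        properties: { location: { type: "string", description: "The city name, e.g. San Francisco" } },
        required: ["location"],
      },
    },
  },
];

console.log(validateCall({ name: "get_current_temperature", arguments: { location: "London" } }, tools)); // []
console.log(validateCall({ name: "delete_files", arguments: {} }, tools)); // [ 'unknown function "delete_files"' ]
```

Under this approach, only calls that return an empty list would be executed. Anything else could be rejected or returned to the user for clarification.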
+### Benefits
+At the time of release, this family of models provides open large language model implementations that are high-performing compared to similarly sized models and designed from the ground up for Responsible AI development.