Function Calling in Google Gemma3

8 min read

David Muraya Gemma Function Calling Header Image

Gemma3 is a new open model from Google, designed to be lightweight and run on all sorts of devices - phones, laptops, workstations, you name it. It's part of the Gemma family, which has already seen over 100 million downloads and a community that's built more than 60,000 variants. One thing that makes Gemma3 stand out is its function calling feature. This lets the model connect to external tools and execute real-world actions based on what users ask it to do, going beyond just spitting out text.

What Gemma3 Can Do

Gemma3 comes with many features that developers can tap into:

It handles over 35 languages right away and can be pretrained for over 140.
It processes text, images, and short videos, so you can build apps that understand multiple types of input.
It has a 128k-token context window, meaning it can keep track of a lot of information at once.
It supports function calling, which lets it automate tasks and create interactive AI experiences.
There are quantized versions available, which make it run faster on devices with less power.

These capabilities make it a flexible tool for building all kinds of applications.

How Function Calling Works in Gemma3

Function calling in Gemma3 is about giving the model the ability to talk to external tools and APIs. Normally, a language model just generates text. But with function calling, Gemma3 can figure out when it needs to call a specific function and pass along the right parameters to get something done - like fetching live data or controlling a device. This is a big deal for making AI assistants that can actually help with practical tasks.

Here's where it gets interesting: Gemma3 uses Python function calling instead of the JSON schema approach you might see in models like OpenAI's. In this setup, the model generates Python code that can be run to call functions, rather than handing you a structured JSON output.

The Debate Around Python Function Calling

Not everyone's thrilled about this choice. Some developers wonder why Gemma3 didn't stick with the JSON schema standard, which is widely used and easy to parse. They argue it's a step backward - why mess with something that works? They're also nervous about security, since running generated code can be risky if it's not carefully checked.

But others defend the Python approach, and they've got some solid points:

Language models like Gemma3 are trained on tons of code, including Python, so generating Python feels natural to them.
Python lets you write more expressive and flexible interactions. You can chain multiple actions together without needing a back-and-forth conversation.
Some code-based agents, like Hugging Face's smolagent, have shown better performance compared to JSON-based tool callers.

Gemma3 will generate the code but you have to execute yourself - it can't run it on its own - you need to be cautious. That means validating the code and setting up safeguards to avoid surprises.

A Real Example: Building a Conversational Agent

To show how this works, here's a Python script that sets up a conversational agent using Gemma3 (specifically the 4B version). This agent chats with users, detects when it needs to call a function, runs the code, and weaves the results back into the conversation. It's got functions for things like getting the current time, converting currencies, and checking exchange rates.

import asyncio
import io
import re
from contextlib import redirect_stdout
from datetime import datetime, timedelta

import requests
from ollama import AsyncClient

from config import get_settings

settings = get_settings()

MODEL = "gemma3:4b"


# extract the tool call from the response
def extract_tool_call(text):
    pattern = r"```tool_code\s*(.*?)\s*```"
    match = re.search(pattern, text, re.DOTALL)
    if match:
        code = match.group(1).strip()
        # Capture stdout in a string buffer
        f = io.StringIO()
        with redirect_stdout(f):
            result = eval(code)
        output = f.getvalue()
        r = result if output == "" else output
        return f"```tool_output\n{str(r).strip()}\n```"
    return None


def get_current_date_time() -> str:
    """
    Gets the current system time and formats it as a string.

    Returns:
        str: The current system time formatted as Weekday, Month Day, YYYY HH:MM:SS.
    """
    now = datetime.now()
    return now.strftime("%A, %B %d, %Y %H:%M:%S")


def convert(amount: float, currency: str, new_currency: str) -> None | float:
    # default ask:
    ask: float = 1.0

    # date today:
    date_today = datetime.now().strftime("%Y-%m-%d")

    # date yesterday:
    date_yesterday = (datetime.now() - timedelta(days=1)).strftime("%Y-%m-%d")

    # generate the url:
    url = f"https://{settings.EXCHANGE_RATE_SITE}/cc-api/currencies?base={currency}&quote={new_currency}&data_type=general_currency_pair&start_date={date_yesterday}&end_date={date_today}"

    response = requests.request("GET", url)

    # convert to json:
    rates = response.json()

    if not rates:
        return None

    if rates:
        if "error" in rates:
            return None

    for rate in rates["response"]:
        ask = rate["average_ask"]
        break

    return float(ask) * amount


def get_current_exchange_rate(currency: str, new_currency: str) -> None | float:
    # default ask:
    ask: float = 1.0

    # date today:
    date_today = datetime.now().strftime("%Y-%m-%d")

    # date yesterday:
    date_yesterday = (datetime.now() - timedelta(days=1)).strftime("%Y-%m-%d")

    # generate the url:
    url = f"https://{settings.EXCHANGE_RATE_SITE}/cc-api/currencies?base={currency}&quote={new_currency}&data_type=general_currency_pair&start_date={date_yesterday}&end_date={date_today}"

    response = requests.request("GET", url)

    # convert to json:
    rates = response.json()

    if not rates:
        return None

    if rates:
        if "error" in rates:
            return None

    for rate in rates["response"]:
        ask = rate["average_ask"]
        break

    return float(ask)


def get_historical_exchange_rate(
    currency: str, new_currency: str, date: str
) -> None | float:
    # default ask:
    ask: float = 1.0

    # conversion date:
    d = datetime.strptime(date, "%Y-%m-%d")

    # date yesterday:
    previous_day = (d - timedelta(days=1)).strftime("%Y-%m-%d")

    # generate the url:
    url = f"https://{settings.EXCHANGE_RATE_SITE}/cc-api/currencies?base={currency}&quote={new_currency}&data_type=general_currency_pair&start_date={previous_day}&end_date={date}"

    response = requests.request("GET", url)

    # convert to json:
    rates = response.json()

    if not rates:
        return None

    if rates:
        if "error" in rates:
            return None

    for rate in rates["response"]:
        ask = rate["average_ask"]
        break

    return float(ask)


instruction_prompt = '''You are a helpful conversational AI assistant.
At each turn, if you decide to invoke any of the function(s), it should be wrapped with ```tool_code```.
The python methods described below are imported and available, you can only use defined methods.
ONLY use the ```tool_code``` format when absolutely necessary to answer the user's question.
The generated code should be readable and efficient.

For questions that don't require any specific tools, just respond normally without tool calls.

The response to a method will be wrapped in ```tool_output``` use it to call more tools or generate a helpful, friendly response.
When using a ```tool_call``` think step by step why and how it should be used.

The following Python methods are available:

```python
def get_current_date_time() -> str:
    """Gets the current system time and formats it as a string

    Args:
        None
    """

def convert(amount: float, currency: str, new_currency: str) -> float:
    """Convert the currency with the latest exchange rate

    Args:
      amount: The amount of currency to convert
      currency: The currency to convert from
      new_currency: The currency to convert to
    """

def get_current_exchange_rate(currency: str, new_currency: str) -> float:
    """Get the latest exchange rate for the currency pair

    Args:
      currency: The currency to convert from
      new_currency: The currency to convert to
    """

def get_historical_exchange_rate(currency: str, new_currency: str, date: str) -> float:
    """Get the historical exchange rate for the currency pair on a specific date

    Args:
      currency: The currency to convert from
      new_currency: The currency to convert to
      date: The target date (in 'YYYY-MM-DD' format) for which to fetch the rate
    """

The full code is available on my GitHub Page.

I used the guidelines from Philipp Schmid who is an engineer at Google DeepMind.

Here's how it works:

The agent listens to what you say and uses Gemma3 to figure out a response.
If it spots a tool_code block in the response, it pulls out the Python code and runs it.
The result gets wrapped in a tool_output block and sent back to the model, which then gives you a final answer.

This setup lets the agent handle tasks that need real-time info or calculations, making it more than just a chatbot.

Sample Conversation

Wrapping Up

Function calling in Gemma3 opens the door to building AI assistants that can do real stuff - like accessing live data or interacting with APIs. The Python approach has its pros and cons, sparking some lively debate among developers. It's more expressive and aligns with what the model's good at, but it also means you've got to be careful with the code it generates. As AI keeps growing, this kind of capability is going to be key for making tools that actually help people in the real world.

Contact Me

Have a project in mind? Send me an email at hello@davidmuraya.com and let's bring your ideas to life. I am always available for exciting discussions.

Twitter

GitHub