# Gemma

This folder is organized into several categories, each focusing on a specific aspect of working with Gemma models:

* [Inference and serving](#inference-and-serving) : How to load, run and deploy Gemma models for inference
* [Prompting](#prompting) : Explore various prompting techniques
* [RAG (Retrieval Augmented Generation)](#rag) : How to build RAG systems with Gemma
* [Finetuning](#finetuning) : Dive into finetuning Gemma models for specific tasks and domains
* [Alignment](#alignment) : Techniques for aligning Gemma models
* [Evaluation](#evaluation) : How to evaluate Gemma models
* [Agentic AI](#agentic-ai) : How to build intelligent agents using Gemma models

## Inference and serving

| Notebook Name | Description |
:------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [[Gemma_1]Basics_with_HF.ipynb]([Gemma_1]Basics_with_HF.ipynb)                                                       | Load, run, finetune and deploy Gemma using [Hugging Face](https://huggingface.co/).                                                                                                     |
| [[Gemma_1]Common_use_cases.ipynb]([Gemma_1]Common_use_cases.ipynb)                                                       | Illustrate some common use cases for Gemma.                                                                                                    |
| [[Gemma_1]Inference with Flax/NNX](https://flax.readthedocs.io/en/latest/guides/gemma.html)                             | Gemma 1 inference with Flax/NNX framework (linking to Flax documentation)                                                                                                               |
| [[Gemma_1]Inference_on_TPU.ipynb]([Gemma_1]Inference_on_TPU.ipynb)                                                   | Basic inference of Gemma with JAX/Flax on TPU.                                                                                                                                          |
| [[Gemma_1]Using_with_Ollama.ipynb]([Gemma_1]Using_with_Ollama.ipynb)                                                                 | Run Gemma models using [Ollama](https://www.ollama.com/).  |
| [[Gemma_1]Using_with_OneTwo.ipynb]([Gemma_1]Using_with_OneTwo.ipynb)                                                     | Integrate Gemma with [Google OneTwo](https://github.com/google-deepmind/onetwo).                                                                                                        |
| [[Gemma_1]data_parallel_inference_in_jax_tpu.ipynb]([Gemma_1]data_parallel_inference_in_jax_tpu.ipynb)               | Parallel inference of Gemma with JAX/Flax on TPU.                                                                                                                                       |
| [[Gemma_2]Constrained_generation.ipynb]([Gemma_2]Constrained_generation.ipynb)                             | Constrained generation with Gemma models using [LlamaCpp](https://github.com/abetlen/llama-cpp-python/) and [Guidance](https://github.com/guidance-ai/guidance/tree/main/).             |
| [[Gemma_2]Deploy_in_Vertex_AI.ipynb]([Gemma_2]Deploy_in_Vertex_AI.ipynb)                                             | Deploy a Gemma model using [Vertex AI](https://cloud.google.com/vertex-ai).                                                                                                             |
| [[Gemma_2]Deploy_with_vLLM.ipynb]([Gemma_2]Deploy_with_vLLM.ipynb)                                                               | Deploy a Gemma model using [vLLM](https://github.com/vllm-project/vllm).                                                                                                                |
| [[Gemma_2]Game_Design_Brainstorming.ipynb]([Gemma_2]Game_Design_Brainstorming.ipynb)                                             | Use Gemma to brainstorm ideas during game design using Keras.                                                                                                                           |
| [[Gemma_2]Gradio_Chatbot.ipynb]([Gemma_2]Gradio_Chatbot.ipynb)                                                       | Building a Chatbot with Gemma and Gradio                                                                                                                                                |
| [[Gemma_2]Guess_the_word.ipynb]([Gemma_2]Guess_the_word.ipynb)                                                                   | Play a word guessing game with Gemma using Keras.                                                                                                                                       |
| [[Gemma_2]Keras_Quickstart.ipynb]([Gemma_2]Keras_Quickstart.ipynb)                                               | Gemma 2 pre-trained 9B model quickstart tutorial with Keras.                                                                                                                            |
| [[Gemma_2]Keras_Quickstart_Chat.ipynb]([Gemma_2]Keras_Quickstart_Chat.ipynb)                                     | Gemma 2 instruction-tuned 9B model quickstart tutorial with Keras. Referenced in this [blog](https://developers.googleblog.com/en/fine-tuning-gemma-2-with-keras-hugging-face-update/). |
| [[Gemma_2]Smart_Contract_Auditing.ipynb]([Gemma_2]Smart_Contract_Auditing.ipynb) | Audit and refactor Solidity smart contracts using Gemma 2. |
| [[Gemma_2]Synthetic_data_generation.ipynb]([Gemma_2]Synthetic_data_generation.ipynb)                   | Synthetic data generation with Gemma 2                                                                                                                                                  |
| [[Gemma_2]Using_Gemini_and_Gemma_with_RouteLLM.ipynb]([Gemma_2]Using_Gemini_and_Gemma_with_RouteLLM.ipynb)                       | Route Gemma and Gemini models using [RouteLLM](https://github.com/lm-sys/RouteLLM/).                                                                                                    |
| [[Gemma_2]Using_with_LLM_Comparator.ipynb]([Gemma_2]Using_with_LLM_Comparator.ipynb) | Compare Gemma with another LLM using [LLM Comparator](https://github.com/pair-code/llm-comparator/). |
| [[Gemma_2]Using_with_Langfun_and_LlamaCpp.ipynb]([Gemma_2]Using_with_Langfun_and_LlamaCpp.ipynb)                                 | Leverage [Langfun](https://github.com/google/langfun) to seamlessly integrate natural language with programming using Gemma 2 and [LlamaCpp](https://github.com/ggerganov/llama.cpp).   |
| [[Gemma_2]Using_with_Langfun_and_LlamaCpp_Python_Bindings.ipynb]([Gemma_2]Using_with_Langfun_and_LlamaCpp_Python_Bindings.ipynb) | Leverage [Langfun](https://github.com/google/langfun) for smooth language-program interaction with Gemma 2 and [llama-cpp-python](https://github.com/abetlen/llama-cpp-python).         |
| [[Gemma_2]Using_with_LlamaCpp.ipynb]([Gemma_2]Using_with_LlamaCpp.ipynb)                                             | Run Gemma models using [LlamaCpp](https://github.com/abetlen/llama-cpp-python/).                                                                                                        |
| [[Gemma_2]Using_with_Llamafile.ipynb]([Gemma_2]Using_with_Llamafile.ipynb)                                           | Run Gemma models using [Llamafile](https://github.com/Mozilla-Ocho/llamafile/).                                                                                                         |
| [[Gemma_2]Using_with_LocalGemma.ipynb]([Gemma_2]Using_with_LocalGemma.ipynb)                                         | Run Gemma models using [Local Gemma](https://github.com/huggingface/local-gemma/).                                                                                                      |
| [[Gemma_2]Using_with_Mesop.ipynb]([Gemma_2]Using_with_Mesop.ipynb)                                                       | Integrate Gemma with [Google Mesop](https://google.github.io/mesop/).                                                                                                                   |
| [[Gemma_2]Using_with_Ollama_Python.ipynb]([Gemma_2]Using_with_Ollama_Python.ipynb)                                                   | Run Gemma models using [Ollama Python library](https://github.com/ollama/ollama-python).                                                                                                                              |
| [[Gemma_2]Using_with_SGLang.ipynb]([Gemma_2]Using_with_SGLang.ipynb)                                                 | Run Gemma models using [SGLang](https://github.com/sgl-project/sglang/).                                                                                                                |
| [[Gemma_2]Using_with_Xinference.ipynb]([Gemma_2]Using_with_Xinference.ipynb)                                         | Run Gemma models using [Xinference](https://github.com/xorbitsai/inference/).                                                                                                           |
| [[Gemma_2]Using_with_mistral_rs.ipynb]([Gemma_2]Using_with_mistral_rs.ipynb)                                         | Run Gemma models using [mistral.rs](https://github.com/EricLBuehler/mistral.rs/).                                                                                                       |
| [[Gemma_2]for_Japan_using_Transformers_and_PyTorch.ipynb]([Gemma_2]for_Japan_using_Transformers_and_PyTorch.ipynb) | [Gemma 2 for Japan](https://blog.google/intl/ja-jp/company-news/technology/gemma-2-2b/) |
| [[Gemma_2]on_Groq.ipynb]([Gemma_2]on_Groq.ipynb)                                                                   | Leverage the free Gemma 2 9B IT model hosted on [Groq](https://groq.com/) (super fast speed).                                                                                           |
| [[Gemma_3]Inference_images_and_videos.ipynb]([Gemma_3]Inference_images_and_videos.ipynb)                                                                                    | Inference on images and videos using Gemma 3 4B IT model.                                                                                                                                                 |
| [[Gemma_3]Using_with_Ollama_Python_Inference_with_Images.ipynb]([Gemma_3]Using_with_Ollama_Python_Inference_with_Images.ipynb)                                                                 | Run inference with images on Gemma 3 using [Ollama Python library](https://github.com/ollama/ollama-python).  |
| [[Gemma_3]Using_with_Transformersjs.ipynb]([Gemma_3]Using_with_Transformersjs.ipynb)                                                         | Run Gemma 3 with [Transformers.js](https://github.com/huggingface/transformers.js).                                                                                                                                      |
| [[Gemma_3]Activation_Hacking.ipynb]([Gemma_3]Activation_Hacking.ipynb)                                                         | Examine and modify internal states, including the residual stream, MLP activations, and attention mechanisms. |
| [[Gemma_3]Chess.ipynb]([Gemma_3]Chess.ipynb)                                                         | Gemma \| Chess: Learn, Analyze, and Discover a New Dimension! |
| [[Gemma_3]Gradio_LlamaCpp_Chatbot.ipynb]([Gemma_3]Gradio_LlamaCpp_Chatbot.ipynb)                                                         | Building a Chatbot with Gemma 3 QAT text model using Llama.cpp and Gradio. |
| [[Gemma_3]Speculative_Decoding.ipynb]([Gemma_3]Speculative_Decoding.ipynb)                                                         | Achieve 2-3x inference speedup for Gemma models using speculative decoding. |
| [[Gemma_3]Visual_Document_Extraction_to_JSON.ipynb]([Gemma_3]Visual_Document_Extraction_to_JSON.ipynb) | Demonstrate zero-shot OCR and structured JSON data extraction from images using the natively multimodal Gemma 3 4B-IT model. |
| [[Gemma_3n]Audio_understanding_with_HF.ipynb]([Gemma_3n]Audio_understanding_with_HF.ipynb)                           | Run Gemma 3n with audio input |
| [[Gemma_3n]Multimodal_understanding_with_HF.ipynb]([Gemma_3n]Multimodal_understanding_with_HF.ipynb)                 | Run Gemma 3n with image + audio input |
| [[Gemma_3n]MatFormer_Lab.ipynb]([Gemma_3n]MatFormer_Lab.ipynb)                                                       | Run Gemma 3n with MatFormers and Mix-n-Match |
| [[Gemma_3n]Using_with_Transformersjs.ipynb]([Gemma_3n]Using_with_Transformersjs.ipynb)                                                       | Run Gemma 3n with [Transformers.js](https://github.com/huggingface/transformers.js). |

## Prompting
| Notebook Name | Description |
| :------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [[Gemma_1]Advanced_Prompting_Techniques.ipynb]([Gemma_1]Advanced_Prompting_Techniques.ipynb)                                     | Illustrate advanced prompting techniques with Gemma.                                                                                                                                    |
| [[Gemma_2]LangChain_chaining.ipynb]([Gemma_2]LangChain_chaining.ipynb)                                                           | Illustrate LangChain chaining  with Gemma.                                                                                                                                              |
| [[Gemma_2]Prompt_chaining.ipynb]([Gemma_2]Prompt_chaining.ipynb)                                                                 | Illustrate prompt chaining and iterative generation with Gemma.                                                                                                                         |
| [[Gemma_3]In-context_Learning.ipynb]([Gemma_3]In-context_Learning.ipynb)                                                         | Demonstrate in-context learning with Gemma 3 long context window |

## RAG
| Notebook Name | Description |
| :------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [[Gemma_1]Minimal_RAG.ipynb]([Gemma_1]Minimal_RAG.ipynb)                                                                         | Minimal example of building a RAG system with Gemma using [Google UniSim](https://github.com/google/unisim) and [Hugging Face](https://huggingface.co/).                                |
| [[Gemma_1]RAG_with_ChromaDB.ipynb]([Gemma_1]RAG_with_ChromaDB.ipynb)                                                             | Build a Retrieval Augmented Generation (RAG) system with Gemma using [ChromaDB](https://www.trychroma.com/) and [Hugging Face](https://huggingface.co/).                                |
| [[Gemma_2]RAG_LlamaIndex.ipynb]([Gemma_2]RAG_LlamaIndex.ipynb)                                                       | RAG example with [LlamaIndex](https://www.llamaindex.ai/) using Gemma.                                                                                                                  |
| [[Gemma_2]RAG_PDF_Search_in_multiple_documents_on_Colab.ipynb]([Gemma_2]RAG_PDF_Search_in_multiple_documents_on_Colab.ipynb)     | RAG PDF Search in multiple documents using Gemma 2 2B on Google Colab.                                                                                                                  |
| [[Gemma_2]Using_with_Elasticsearch_and_LangChain.ipynb]([Gemma_2]Using_with_Elasticsearch_and_LangChain.ipynb)       | Example to demonstrate using Gemma with [Elasticsearch](https://www.elastic.co/elasticsearch/), [Ollama](https://www.ollama.com/) and [LangChain](https://www.langchain.com/).          |
| [[Gemma_2]Using_with_Firebase_Genkit_and_Ollama.ipynb]([Gemma_2]Using_with_Firebase_Genkit_and_Ollama.ipynb)                     | Example to demonstrate using Gemma with [Firebase Genkit](https://firebase.google.com/docs/genkit/) and [Ollama](https://www.ollama.com/)                                               |
| [[Gemma_2]Using_with_LangChain.ipynb]([Gemma_2]Using_with_LangChain.ipynb)                                           | Examples to demonstrate using Gemma with [LangChain](https://www.langchain.com/).                                                                                                       |
| [[Gemma_3]Local_Agentic_RAG.ipynb]([Gemma_3]Local_Agentic_RAG.ipynb)                                           | Build a local agentic RAG system without any external APIs using [FastEmbed](https://github.com/qdrant/fastembed), [Ollama (Gemma 3)](https://ollama.com/models), and the [Qdrant vector database](https://cloud.qdrant.io). |
| [[Gemma_3]RAG_with_EmbeddingGemma.ipynb]([Gemma_3]RAG_with_EmbeddingGemma.ipynb)                               | Build simple RAG with [EmbeddingGemma](https://ai.google.dev/gemma/docs/embeddinggemma) |


## Finetuning
| Notebook Name                                                                                                                      | Description |
|:-----------------------------------------------------------------------------------------------------------------------------------| --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [[Gemma_1]Finetune_distributed.ipynb]([Gemma_1]Finetune_distributed.ipynb)                                                         | Chat with Gemma 7B and finetune it so that it responds in a pirate's tone.                                                                                                              |
| [[Gemma_1]Finetune_with_LLaMA_Factory.ipynb]([Gemma_1]Finetune_with_LLaMA_Factory.ipynb)                                           | Finetune Gemma using [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory).                                                                                                         |
| [[Gemma_1]Finetune_with_XTuner.ipynb]([Gemma_1]Finetune_with_XTuner.ipynb)                                                         | Finetune Gemma using [XTuner](https://github.com/InternLM/xtuner).                                                                                                                      |
| [[Gemma_2]Custom_Vocabulary.ipynb]([Gemma_2]Custom_Vocabulary.ipynb)                                                               | Demonstrate how to use the custom vocabulary tokens "&lt;unused[0-98]&gt;" in Gemma.                                                                                                    |
| [[Gemma_2]Finetune_with_Axolotl.ipynb]([Gemma_2]Finetune_with_Axolotl.ipynb)                                                       | Finetune Gemma using [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl).                                                                                                    |
| [[Gemma_2]Finetune_with_CALM.ipynb]([Gemma_2]Finetune_with_CALM.ipynb)                                                             | Finetune Gemma using [CALM](https://github.com/google-deepmind/calm).                                                                                                                   |
| [[Gemma_2]Finetune_with_Function_Calling.ipynb]([Gemma_2]Finetune_with_Function_Calling.ipynb)                                     | Finetuning Gemma for Function Calling using [PyTorch/XLA](https://github.com/pytorch/xla).                                                                                              |
| [[Gemma_2]Finetune_with_JORA.ipynb]([Gemma_2]Finetune_with_JORA.ipynb)                                                             | Finetune Gemma using [JORA](https://github.com/aniquetahir/JORA).                                                                                                                       |
| [[Gemma_2]Finetune_with_LORA.ipynb]([Gemma_2]Finetune_with_LORA.ipynb)                                                             | Finetune Gemma using LORA.                                                                                                                       |
| [[Gemma_2]Finetune_with_LitGPT.ipynb]([Gemma_2]Finetune_with_LitGPT.ipynb)                                                         | Finetune Gemma using [LitGPT](https://github.com/Lightning-AI/litgpt).                                                                                                                  |
| [[Gemma_2]Finetune_with_Torch_XLA.ipynb]([Gemma_2]Finetune_with_Torch_XLA.ipynb)                                                   | Finetune Gemma using [PyTorch/XLA](https://github.com/pytorch/xla).                                                                                                                     |
| [[Gemma_2]Finetune_with_Unsloth.ipynb]([Gemma_2]Finetune_with_Unsloth.ipynb)                                                       | Finetune Gemma using [Unsloth](https://unsloth.ai/blog/gemma).                                                                                                                          |
| [[Gemma_2]Translator_of_Old_Korean_Literature.ipynb]([Gemma_2]Translator_of_Old_Korean_Literature.ipynb)                           | Use Gemma to translate old Korean literature using Keras.                                                                                                                               |
| [[Gemma_3]Full_Model_Finetune_using_HF.ipynb]([Gemma_3]Full_Model_Finetune_using_HF.ipynb)                                         | Full model fine-tune on a mobile game NPC dataset using Hugging Face Transformers and TRL |
| [[Gemma_3n]Finetuned_LoRA_Unsloth_on_Mental_Health_dataset.ipynb]([Gemma_3n]Finetuned_LoRA_Unsloth_on_Mental_Health_dataset.ipynb) | Finetuning of Gemma-3N (4B) model using [Unsloth](https://unsloth.ai/blog/gemma) on mental health counseling conversations to create an emotional first aid assistant, locally. |

## Alignment
| Notebook Name | Description |
| :------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [[Gemma_2]Aligning_DPO.ipynb]([Gemma_2]Aligning_DPO.ipynb)                                               | Demonstrate how to align a Gemma model using DPO (Direct Preference Optimization) with [Hugging Face TRL](https://huggingface.co/docs/trl/en/index).                                    |

## Evaluation
| Notebook Name | Description |
| :------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [[Gemma_2]evaluation.ipynb]([Gemma_2]evaluation.ipynb)                                                               | Demonstrate how to use Eleuther AI's LM evaluation harness to perform model evaluation on Gemma.                                                                                        |

## Agentic AI
| Notebook Name                                                                                     | Description                                                                                                                          |
| :------------------------------------------------------------------------------------------------ | :----------------------------------------------------------------------------------------------------------------------------------- |
| [[Gemma_2]Agentic_AI.ipynb]([Gemma_2]Agentic_AI.ipynb)                                            | Demonstrate how to build an Agentic AI using Gemma 2.                                                                                |
| [[Gemma_2]Function_Calling_with_Groq_Langchain.ipynb]([Gemma_2]Function_Calling_with_Groq_Langchain.ipynb)  | Demonstrate how to create a simple agent with LangChain and Groq using Gemma 2.                                            |
| [[Gemma_3]Meme_Generator.ipynb]([Gemma_3]Meme_Generator.ipynb)                                    | Meme Generator using Gemma 3 4B IT model                                                                                             |
| [[Gemma_3]Function_Calling_Routing_and_Monitoring_using_Gemma_Google_Genai.ipynb]([Gemma_3]Function_Calling_Routing_and_Monitoring_using_Gemma_Google_Genai.ipynb)                             | Implement and monitor an agentic RAG workflow.                                                                                        |
| [[Gemma_3]Function_Calling_with_HF.ipynb]([Gemma_3]Function_Calling_with_HF.ipynb)                | Demonstrate how to use function calling with Gemma 3 using [Hugging Face](https://huggingface.co/).                                  |
| [[Gemma_3]Function_Calling_with_HF_document_summarizer.ipynb]([Gemma_3]Function_Calling_with_HF_document_summarizer.ipynb) | Demonstrate how to build a document summarizer using function calling with Gemma 3 and Hugging Face.       |
| [[Gemma_3]Local_Agentic_RAG.ipynb]([Gemma_3]Local_Agentic_RAG.ipynb)                                           | Build a local agentic RAG system without any external APIs using [FastEmbed](https://github.com/qdrant/fastembed), [Ollama (Gemma 3)](https://ollama.com/models), and the [Qdrant vector database](https://cloud.qdrant.io). |