{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "FQRDCZcjtuGS" }, "source": [ "Copyright 2026 Google LLC." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "id": "AgLUtgfYtpJ8" }, "outputs": [], "source": [ "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "OzS4i5rqt1t7" }, "source": [ "# Priority and Flex Inference Tiers\n", "\n", "\n", "\n", "The Gemini API offers different `service_tiers` to help you manage cost and latency.\n", "\n", "The **Priority** and **Flex** tiers allow you to route background jobs to Flex and interactive jobs to Priority using standard synchronous endpoints. This eliminates the complexity of async job management while giving you the economic and performance benefits of specialized tiers:\n", "\n", "- **[Priority](https://ai.google.dev/gemini-api/docs/priority-inference) (`\"priority\"`):** Millisecond latency for critical apps (+75-100% cost). Traffic is strictly non-sheddable.\n", "- **[Flex](https://ai.google.dev/gemini-api/docs/flex-inference) (`\"flex\"`):** 1-15 min target latency for background tasks (-50% cost). Fully synchronous, but uses opportunistic compute.\n", "\n", "**In this notebook, you will learn:**\n", "1. Comparison of Priority and Flex\n", "2. How to use Priority tier\n", "3. How to use Flex tier\n", "4. How to adjust timeouts\n", "5. 
How to implement retries\n", "\n", "> **Note:** Inference tiers are paid-only features. Flex is available on all paid tiers, and Priority is available on Tiers 2 and 3. This notebook won't run on the Free Tier (see [pricing](https://ai.google.dev/pricing) for more details)." ] }, { "cell_type": "markdown", "metadata": { "id": "gPGKO0mJ0Qgw" }, "source": [ "# Setup\n", "\n", "### Install SDK\n", "\n", "Install the SDK from [PyPI](https://github.com/googleapis/python-genai)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "kWAqRRUy0bX4" }, "outputs": [], "source": [ "%pip install -U -q \"google-genai>=1.70.0\" # Minimum version 1.70 for service tiers" ] }, { "cell_type": "markdown", "metadata": { "id": "wM9RMu5u0j0c" }, "source": [ "### Set up your API key\n", "\n", "To run the following cell, your API key must be stored in a Colab Secret named `GEMINI_API_KEY`. If you don't already have an API key, or you're not sure how to create a Colab Secret, see [Authentication ![image](https://storage.googleapis.com/generativeai-downloads/images/colab_icon16.png)](../quickstarts/Authentication.ipynb) for an example." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "id": "ouhCFDWd0krv" }, "outputs": [], "source": [ "from google.colab import userdata\n", "\n", "GEMINI_API_KEY = userdata.get('GEMINI_API_KEY')" ] }, { "cell_type": "markdown", "metadata": { "id": "fc43_qJH0uhn" }, "source": [ "### Initialize SDK client\n", "\n", "Initialize a client with your API key:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "id": "N6iv8_Cl0yNl" }, "outputs": [], "source": [ "from google import genai\n", "from google.genai import errors, types\n", "\n", "client = genai.Client(api_key=GEMINI_API_KEY)" ] }, { "cell_type": "markdown", "metadata": { "id": "nJLVGgNG02sU" }, "source": [ "### Choose a model\n", "\n", "Most Gemini 2.5 and Gemini 3 models support inference tiers. 
Refer to the documentation for more details:\n", "- [Flex supported models](https://ai.google.dev/gemini-api/docs/flex-inference#supported-models)\n", "- [Priority supported models](https://ai.google.dev/gemini-api/docs/priority-inference#supported-models)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "id": "zBpmImys03vG" }, "outputs": [], "source": [ "MODEL_ID = \"gemini-3-flash-preview\" # @param [\"gemini-3.1-flash-lite-preview\", \"gemini-3-flash-preview\", \"gemini-3.1-pro-preview\"] {\"allow-input\":true, isTemplate: true}" ] }, { "cell_type": "markdown", "metadata": { "id": "4481dc494c5f" }, "source": [ "## Priority and Flex comparison\n", "\n", "| Feature | Standard | Flex | Priority | Batch | Caching |\n", "| :--- | :--- | :--- | :--- | :--- | :--- |\n", "| **Pricing** | Full Price | 50% discount | 75% to 100% more than standard | 50% discount | 90% discount + Prorated token storage |\n", "| **Latency** | Seconds to minutes | Minutes (1–15 min target) | Seconds | Up to 24 hours | Faster time-to-first-token |\n", "| **Reliability** | High / Medium-high | Best-effort (Sheddable) | High (Non-sheddable) | High (for throughput) | N/A |\n", "| **Interface** | Synchronous | Synchronous | Synchronous | Asynchronous | Saved state |\n", "| **Best use case** | General application workflows | Non-urgent sequential chains | Production, user-facing apps | Massive datasets, offline evals | Recurring queries over same file |\n", "\n", "### Key benefits of Priority\n", "\n", "* **Low latency**: Designed for response times on the order of seconds for interactive,\n", "user-facing AI tools.\n", "* **High reliability**: Traffic is treated with the highest criticality and is\n", "strictly non-sheddable.\n", "* **Graceful degradation**: Traffic spikes exceeding dynamic limits are\n", "automatically downgraded to the Standard tier for processing instead of failing,\n", "preventing service outages.\n", "\n", "### Key benefits of Flex\n", "\n", "* **Cost efficiency**: 
Substantial savings for non-production evals, background agents, and data enrichment.\n", "* **Low friction**: No need to manage batch objects, job IDs, or polling; simply add a single parameter to your existing requests.\n", "* **Synchronous workflows**: Ideal for sequential API chains where the next request depends on the output of the previous one, making it more flexible than Batch for agentic workflows." ] }, { "cell_type": "markdown", "metadata": { "id": "rXMdVktv_NmR" }, "source": [ "## Priority inference\n", "\n", "The Gemini Priority API is a premium inference tier designed for\n", "business-critical workloads that require lower latency and the highest\n", "reliability, at a higher price. Priority tier traffic is prioritized above\n", "standard API and Flex tier traffic.\n", "\n", "### How Priority works\n", "\n", "Priority inference routes requests to high-criticality compute queues, offering\n", "predictable, fast performance for user-facing applications. Traffic that exceeds\n", "dynamic limits is gracefully downgraded server-side to standard processing\n", "instead of failing, which keeps the application stable.\n", "\n", "### Priority rate limits\n", "\n", "Priority traffic has its own rate limits, although its consumption also\n", "counts toward the overall interactive traffic rate limits.\n", "**The default Priority rate limit is 0.3x the [standard rate limit](https://aistudio.google.com/rate-limit) for each model and tier**.\n", "\n", "### How to use Priority\n", "\n", "To use the Priority tier, set the `service_tier` field in the request body to\n", "`\"priority\"`. If the field is omitted, the request uses the standard tier."
] }, { "cell_type": "code", "execution_count": 19, "metadata": { "id": "TjM_c19ftxl0" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The sky appears blue because of a phenomenon called **Rayleigh scattering**.\n", "\n", "Here is the step-by-step breakdown of how it works:\n", "\n", "### 1. Sunlight is a mix of all colors\n", "Although sunlight looks white to us, it is actually composed of all the colors of the rainbow (red, orange, yellow, green, blue, indigo, and violet). Light travels as waves, and each color has a different wavelength:\n", "* **Red light** has long, lazy wavelengths.\n", "* **Blue and violet light** have short, choppy wavelengths.\n", "\n", "### 2. The atmosphere acts like an obstacle course\n", "Earth’s atmosphere is filled with gases (mostly nitrogen and oxygen) and tiny particles. When sunlight hits the atmosphere, it strikes these gas molecules.\n", "\n", "### 3. Scattering occurs\n", "Because blue light travels in shorter, smaller waves, it crashes into the gas molecules more frequently than the longer red waves do. When the blue light hits these molecules, it gets scattered in every direction. \n", "\n", "As you look up, your eyes detect this scattered blue light coming from every part of the sky.\n", "\n", "### Why isn't the sky violet?\n", "Violet light actually has an even shorter wavelength than blue light, so it scatters even more. However, we see the sky as blue for two reasons:\n", "1. **The Sun:** The Sun emits much more blue light than violet light.\n", "2. **Human Biology:** Our eyes are much more sensitive to blue than they are to violet. Our brains interpret the mixture of scattered light as a pale blue.\n", "\n", "### What about sunsets?\n", "When the sun is setting or rising, it is lower on the horizon. The sunlight has to travel through much more of the Earth's atmosphere to reach your eyes. 
By the time the light gets to you, most of the blue light has been scattered away completely, leaving only the longer wavelengths—the reds, oranges, and pinks—to pass through.\n" ] } ], "source": [ "try:\n", "    response = client.models.generate_content(\n", "        model=MODEL_ID,\n", "        contents=\"Why is the sky blue?\",\n", "        config={'service_tier': 'priority'},\n", "    )\n", "\n", "    # Detect a graceful downgrade: the header reports the tier that served the request\n", "    if response.sdk_http_response.headers.get('x-gemini-service-tier') == \"standard\":\n", "        print(\"Warning: Priority limit exceeded, processed at Standard tier.\")\n", "\n", "    print(response.text)\n", "\n", "except errors.APIError as e:\n", "    print(e.code, e.message)\n", "except Exception as e:\n", "    print(f\"Error during API call: {e}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "0UOeWnk9cOWi" }, "source": [ "The tier that served the request is reported in the `x-gemini-service-tier` response header:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "id": "cswc3H7kcMNB" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "priority\n" ] } ], "source": [ "print(response.sdk_http_response.headers.get('x-gemini-service-tier'))" ] }, { "cell_type": "markdown", "metadata": { "id": "mHcY2-RQzvzW" }, "source": [ "## Flex inference\n", "\n", "The Gemini Flex API is an inference tier that offers a 50% cost reduction\n", "compared to standard rates, in exchange for variable latency and best-effort\n", "availability. It's designed for latency-tolerant workloads that require\n", "synchronous processing but don't need the real-time performance of the standard\n", "API.\n", "\n", "### How Flex works\n", "\n", "Gemini Flex inference bridges the gap between the standard API and the 24-hour\n", "turnaround of the [Batch API](https://ai.google.dev/gemini-api/docs/batch-api). 
It utilizes off-peak,\n", "\"sheddable\" compute capacity to provide a cost-effective solution for background\n", "tasks and sequential workflows.\n", "\n", "### How to use Flex\n", "\n", "To use the Flex tier, specify the `service_tier` as `\"flex\"` in the\n", "request body. By default, requests use the standard tier if this field is\n", "omitted.\n" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "id": "T4FthF5Sz-vr" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The short answer is that the sky is blue because of a phenomenon called **Rayleigh scattering**.\n", "\n", "Here is the step-by-step breakdown of why it happens:\n", "\n", "### 1. Sunlight is made of all colors\n", "Although sunlight looks white to us, it is actually composed of all the colors of the rainbow (red, orange, yellow, green, blue, indigo, and violet). Light travels as waves, and each color has a different wavelength:\n", "* **Red light** has long, lazy wavelengths.\n", "* **Blue and violet light** have short, choppy wavelengths.\n", "\n", "### 2. The Atmosphere is an obstacle course\n", "Earth’s atmosphere is filled with gases (mostly nitrogen and oxygen). As sunlight reaches the atmosphere, it strikes the molecules of these gases and scatters in every direction.\n", "\n", "### 3. Blue light scatters the most\n", "Because blue light travels in shorter, smaller waves, it hits the gas molecules more frequently and is scattered much more strongly than the other colors. This \"scattered\" blue light is what your eyes see coming from every part of the sky during the day.\n", "\n", "### So, why isn't the sky violet?\n", "If violet light has an even shorter wavelength than blue light, it should technically scatter even more, making the sky look violet. However, the sky isn’t violet for two main reasons:\n", "1. **The Sun:** The Sun emits much more blue light than violet light.\n", "2. 
**Human Eyes:** Our eyes are much more sensitive to blue light than violet light. Our brains essentially filter out the violet and interpret the combination of scattered light as pale blue.\n", "\n", "### What about sunsets?\n", "When the sun is setting or rising, it is lower on the horizon. The light has to travel through a much thicker layer of the atmosphere to reach your eyes. By the time the light gets to you, most of the blue light has been scattered away, leaving only the longer wavelengths—the reds, oranges, and pinks—to pass through and reach your eyes.\n" ] } ], "source": [ "try:\n", "    response = client.models.generate_content(\n", "        model=MODEL_ID,\n", "        contents=\"Why is the sky blue?\",\n", "        config={'service_tier': 'flex'},\n", "    )\n", "\n", "    print(response.text)\n", "except errors.APIError as e:\n", "    print(e.code, e.message)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "id": "lmUBlWpOcZYZ" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "flex\n" ] } ], "source": [ "print(response.sdk_http_response.headers.get('x-gemini-service-tier'))" ] }, { "cell_type": "markdown", "metadata": { "id": "zKGPbjAbD4dB" }, "source": [ "### Adjusting timeout windows\n", "\n", "Per-request timeouts can be configured for both the REST API and the client\n", "libraries. Global timeouts are available only in the client libraries.\n", "\n", "Make sure your client-side timeout is at least as long as the time the request\n", "may wait on the server (e.g., 600s+ for Flex wait queues). The SDKs expect\n", "timeout values in milliseconds.\n", "\n", "#### Per-request timeouts\n" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "id": "DSgnxLPxEMvo" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The sky appears blue due to a phenomenon called **Rayleigh scattering**. \n", "\n", "Here is the step-by-step breakdown of how it works:\n", "\n", "### 1. 
Sunlight is a mix of all colors\n", "Although sunlight looks white to us, it is actually made up of all the colors of the rainbow (red, orange, yellow, green, blue, indigo, and violet). Light travels as waves, and each color has a different wavelength:\n", "* **Red light** has long, lazy, stretched-out wavelengths.\n", "* **Blue and violet light** have short, choppy, frequent wavelengths.\n", "\n", "### 2. The Atmosphere acts like an obstacle course\n", "Earth’s atmosphere is filled with gases (mostly nitrogen and oxygen). When sunlight hits the atmosphere, it collides with these gas molecules.\n", "\n", "### 3. Scattering happens\n", "Because blue light travels in shorter, smaller waves, it is much more likely to strike these gas molecules and get **scattered** in every direction. Red and yellow light, with their longer wavelengths, pass through the atmosphere mostly uninterrupted.\n", "\n", "So, when you look up during the day, your eyes are catching this scattered blue light coming from every part of the sky.\n", "\n", "***\n", "\n", "### Common Questions\n", "\n", "**Why isn't the sky violet?**\n", "Violet light has an even shorter wavelength than blue light and is scattered *more* strongly. However, we see the sky as blue for two reasons:\n", "1. The sun emits much more blue light than violet light.\n", "2. Human eyes are much more sensitive to blue than violet.\n", "\n", "**Why are sunsets red?**\n", "When the sun is setting or rising, it is lower on the horizon. The sunlight has to travel through much more of the Earth's atmosphere to reach your eyes. 
By the time the light gets to you, most of the blue light has been scattered away completely, leaving only the longer wavelengths—the reds, oranges, and pinks—to make it through.\n" ] } ], "source": [ "try:\n", "    response = client.models.generate_content(\n", "        model=MODEL_ID,\n", "        contents=\"Why is the sky blue?\",\n", "        config={\n", "            \"service_tier\": \"flex\",\n", "            # Timeout in milliseconds: 900000 ms = 15 minutes\n", "            \"http_options\": {\"timeout\": 900000}\n", "        },\n", "    )\n", "    print(response.text)\n", "except errors.APIError as e:\n", "    print(e.code, e.message)" ] }, { "cell_type": "markdown", "metadata": { "id": "3LoIAocCdaS3" }, "source": [ "Example with streaming (whether the timeout applies to each chunk or to the whole stream depends on the implementation):" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "id": "wC5S8ntMdY49" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Here are five original sci-fi movie concepts, ranging from psychological thrillers to high-concept space operas:\n", "\n", "### 1. The Echo Archive\n", "**The Concept:** In a future where memories can be digitised and \"donated\" to a collective global library, a professional \"Memory Editor\" discovers a series of memories that shouldn’t exist. They depict a secret history of the world that conflicts with official records. \n", "**The Hook:** To uncover the truth, the protagonist must navigate the fractured consciousness of strangers, but they soon realize that every time they access a forbidden memory, a piece of their own personality is permanently overwritten.\n", "\n", "### 2. Orbital Decay\n", "**The Concept:** Earth has become a prison planet, entirely encased in a sophisticated, automated satellite grid that prevents anything from leaving or entering. 
A low-level technician tasked with maintaining the \"Kill-Switch\" satellites discovers that the grid isn't keeping the world's threats *in*—it’s protecting the planet from something massive that has been lurking in the shadows of the moon for centuries.\n", "**The Hook:** A claustrophobic \"ticking clock\" thriller where the protagonist must decide whether to deactivate the grid—potentially freeing humanity but alerting the cosmic entity to their presence.\n", "\n", "### 3. The Symbiosis Protocol\n", "**The Concept:** After a failed terraforming mission, humanity survives on a hostile exoplanet by using neural-link technology to bond with indigenous, plant-like organisms. These organisms provide oxygen and nutrients in exchange for the human host's sensory experiences. \n", "**The Hook:** A young colonist realizes the plants are not just living off their hosts, but are slowly stitching their own consciousnesses into the human neural network, creating a collective hive-mind that intends to \"reclaim\" Earth as the ultimate fertilizer.\n", "\n", "### 4. Chronos Debt\n", "**The Concept:** Time travel exists, but it’s governed by the laws of thermodynamics: you cannot travel to the past without \"paying\" for it with an equal amount of your own future life force. As a result, only the ultra-wealthy can afford to undo their mistakes.\n", "**The Hook:** A \"Time Bounty Hunter\" is tasked with tracking down people who have defaulted on their time-debt. The conflict kicks off when they are hired to hunt down a target who hasn’t broken the law, but has instead found a way to \"bankrupt\" time itself, threatening to collapse the present.\n", "\n", "### 5. Silent Signal\n", "**The Concept:** SETI finally receives a signal from deep space, but it’s not a message—it’s a data packet containing the schematics for an object that, when built, does absolutely nothing. 
\n", "**The Hook:** As teams of engineers around the world frantically build different components of this mysterious machine, they realize the object is a \"psychological anchor.\" It doesn't perform a physical function; it simply alters the human perception of reality, causing those who build it to stop seeing the world as three-dimensional. It is a slow-motion, global alien invasion that doesn't use weapons, but perspective." ] } ], "source": [ "try:\n", "    response = client.models.generate_content_stream(\n", "        model=MODEL_ID,\n", "        contents=[\"List 5 ideas for a sci-fi movie.\"],\n", "        config={\n", "            \"service_tier\": \"flex\",\n", "            # Per-request timeout for the streaming operation (60000 ms = 60 s)\n", "            \"http_options\": {\"timeout\": 60000}\n", "        },\n", "    )\n", "    for chunk in response:\n", "        print(chunk.text, end=\"\")\n", "except errors.APIError as e:\n", "    print(e.code, e.message)\n", "except Exception as e:\n", "    print(f\"An error occurred during streaming: {e}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "yIjwtwbzEJ_X" }, "source": [ "#### Global timeouts\n", "\n", "If you want all API calls made through a specific `genai.Client` instance\n", "(client libraries only) to have a default timeout, you can configure this when\n", "initializing the client using `http_options` and `genai.types.HttpOptions`." ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "id": "eSK9T_6FFvck" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The history of AI since 2000 is a transition from symbolic, rule-based systems to probabilistic, data-driven \"deep learning.\" This evolution can be broken down into four distinct eras.\n", "\n", "### 1. 
The Era of Foundation and Narrow AI (2000–2010)\n", "At the turn of the millennium, AI research was focused on specialized tasks rather than general intelligence.\n", "* **The Rise of Machine Learning:** Researchers moved away from \"expert systems\" (where humans hard-coded rules) toward systems that learned from data. Statistical methods, such as Support Vector Machines (SVMs), dominated.\n", "* **Milestones:** In 2005, a Stanford robot won the DARPA Grand Challenge by navigating a desert course, proving that autonomous navigation was possible. \n", "* **The Big Data Catalyst:** The explosion of the internet began providing the massive datasets required to train more sophisticated models.\n", "\n", "### 2. The Deep Learning Breakthrough (2011–2015)\n", "This period marked the most significant shift in AI history, triggered by the revival of **Neural Networks**—biologically inspired algorithms that had been largely ignored for decades due to lack of computing power.\n", "* **The GPU Revolution:** Researchers discovered that Graphics Processing Units (GPUs)—designed for video games—were perfectly suited for the parallel computations required by deep learning.\n", "* **ImageNet and AlexNet (2012):** A deep neural network (AlexNet) crushed the competition in the ImageNet image-recognition contest. This proved that \"Deep Learning\" could outperform every other statistical method.\n", "* **Corporate Land Grab:** Realizing the power of AI, tech giants began aggressive acquisitions. Google purchased DeepMind in 2014, and other giants began hoarding talent and data.\n", "\n", "### 3. Reinforcement Learning and Generative Milestones (2016–2020)\n", "With deep learning established, AI began mastering tasks that were once thought to be purely human or impossible for computers.\n", "* **AlphaGo (2016):** DeepMind’s AlphaGo defeated the world champion at the game of Go. 
This was a \"Sputnik moment\" for AI, as Go relies on intuition and long-term strategy rather than simple brute-force calculation.\n", "* **Transformers (2017):** Google researchers published the paper *\"Attention Is All You Need,\"* introducing the **Transformer architecture**. This allowed models to process data in parallel and understand the context of relationships between words over long distances. It is the architectural foundation of everything from ChatGPT to modern translation tools.\n", "* **GANs:** Generative Adversarial Networks (GANs) allowed AI to start creating realistic images and audio, sparking early debates about \"deepfakes.\"\n", "\n", "### 4. The Era of Generative AI (2021–Present)\n", "The last few years have moved AI from the laboratory into the daily lives of billions of people.\n", "* **The Scaling Laws:** Researchers discovered that simply making models larger (more parameters) and feeding them more data led to \"emergent properties\"—skills the AI wasn't explicitly trained for, such as coding, reasoning, and multi-language fluency.\n", "* **The ChatGPT Explosion (2022):** OpenAI released ChatGPT, turning a sophisticated Large Language Model (LLM) into a consumer-friendly chatbot. This triggered an unprecedented global arms race between Microsoft, Google, Meta, and Anthropic.\n", "* **Multimodality:** Modern systems are no longer just text-based. 
Models like GPT-4o, Claude 3.5, and Midjourney can now \"see,\" \"hear,\" and \"speak,\" synthesizing information across text, images, audio, and video simultaneously.\n", "\n", "### Summary of Trends\n", "* **From Specific to General:** We have moved from AI that does one thing (play chess) to AI that does almost anything (write code, compose poetry, analyze medical data).\n", "* **The Compute Wall:** Progress is currently defined by the availability of high-end chips (Nvidia) and the enormous energy required to train \"frontier models.\"\n", "* **Ethical Pivot:** The industry has moved from being purely focused on capabilities to a major focus on \"Alignment\" and \"Safety,\" as society grapples with the risks of hallucinations, job displacement, and the potential for misuse.\n", "\n", "**In short:** The last 24 years have seen AI transform from a niche academic field into the foundational infrastructure of the modern digital economy.\n", "Machine learning is a field of artificial intelligence that enables computers to learn from data and improve at tasks without being explicitly programmed for each step.\n" ] } ], "source": [ "global_timeout_ms = 120000  # 120 s\n", "\n", "client_with_global_timeout = genai.Client(\n", "    api_key=GEMINI_API_KEY,\n", "    http_options=types.HttpOptions(timeout=global_timeout_ms)\n", ")\n", "\n", "try:\n", "    # This call uses the client's global timeout\n", "    response = client_with_global_timeout.models.generate_content(\n", "        model=MODEL_ID,\n", "        contents=\"Summarize the history of AI development since 2000.\",\n", "        config={\"service_tier\": \"flex\"},\n", "    )\n", "    print(response.text)\n", "\n", "    # A per-request timeout will *override* the global timeout for that specific call.\n", "    shorter_timeout = 30000\n", "    response = client_with_global_timeout.models.generate_content(\n", "        model=MODEL_ID,\n", "        contents=\"Provide a very brief definition of machine learning.\",\n", "        config={\n", "            \"service_tier\": \"flex\",\n", "            \"http_options\": {\"timeout\": shorter_timeout}\n", "        }  # Overrides the global timeout\n", "    )\n", "\n", "    print(response.text)\n", "\n", "except TimeoutError:\n", "    print(\n", "        \"A GenerateContent call timed out. Check if the global or per-request timeout was exceeded.\"\n", "    )\n", "except errors.APIError as e:\n", "    print(e.code, e.message)\n", "except Exception as e:\n", "    print(f\"An error occurred: {e}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "PlPt3EmqF0T8" }, "source": [ "### Implementing retries\n", "\n", "Because Flex traffic is sheddable, requests can fail with 503 errors. The\n", "following example shows optional retry logic with exponential backoff that\n", "falls back to the Standard tier on the last attempt:" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "id": "20371gBdFzYc" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Machine learning is a field of artificial intelligence that enables computers to learn from data and improve at tasks without being explicitly programmed for them.\n" ] } ], "source": [ "import time\n", "\n", "\n", "def call_with_retry(max_retries=3, base_delay=5):\n", "    for attempt in range(max_retries):\n", "        try:\n", "            return client.models.generate_content(\n", "                model=MODEL_ID,\n", "                contents=\"Provide a very brief definition of machine learning.\",\n", "                config={\"service_tier\": \"flex\"},\n", "            )\n", "        except errors.APIError as e:\n", "            # Only retry on 503 Service Unavailable or 429 rate limits;\n", "            # any other API error is re-raised to the caller.\n", "            if e.code not in (429, 503):\n", "                raise\n", "            if attempt < max_retries - 1:\n", "                delay = base_delay * (2**attempt)  # Exponential backoff\n", "                print(f\"Flex busy ({e.code}), retrying in {delay}s...\")\n", "                time.sleep(delay)\n", "            else:\n", "                # Fall back to Standard on the last attempt\n", "                print(\"Flex exhausted, falling back to Standard...\")\n", "                return client.models.generate_content(\n", "                    model=MODEL_ID,\n", "                    contents=\"Provide a very brief definition of machine learning.\",\n", "                )\n", "\n", "# 
Usage\n", "response = call_with_retry()\n", "print(response.text)\n" ] }, { "cell_type": "markdown", "metadata": { "id": "mcUQSr2RK3tY" }, "source": [ "## Further resources\n", "\n", "To learn more, see the following resources:\n", "\n", "- [Optimization docs](https://ai.google.dev/gemini-api/docs/optimization)\n", "- [Priority docs](https://ai.google.dev/gemini-api/docs/priority-inference)\n", "- [Flex docs](https://ai.google.dev/gemini-api/docs/flex-inference)" ] } ], "metadata": { "colab": { "name": "Inference_tiers.ipynb", "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 0 }