{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "wGfI8meEHXfM" }, "source": [ "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/openai/openai-cookbook/blob/main/articles/gpt-oss/run-colab.ipynb)" ] }, { "cell_type": "markdown", "metadata": { "id": "gj6KvThm8Jjn" }, "source": [ "# Run OpenAI gpt-oss 20B in a FREE Google Colab\n", "\n", "OpenAI released `gpt-oss` [120B](https://hf.co/openai/gpt-oss-120b) and [20B](https://hf.co/openai/gpt-oss-20b). Both models are Apache 2.0 licensed.\n", "\n", "Specifically, `gpt-oss-20b` was made for lower latency and local or specialized use cases (21B parameters with 3.6B active parameters).\n", "\n", "Since the models were trained in native MXFP4 quantization it makes it easy to run the 20B even in resource constrained environments like Google Colab.\n", "\n", "Authored by: [Pedro](https://huggingface.co/pcuenq) and [VB](https://huggingface.co/reach-vb)" ] }, { "cell_type": "markdown", "metadata": { "id": "Kv2foJJa9Xkc" }, "source": [ "## Setup environment" ] }, { "cell_type": "markdown", "metadata": { "id": "zMRXDOpY1Q3Q" }, "source": [ "Since support for mxfp4 in transformers is bleeding edge, we need a recent version of PyTorch and CUDA, in order to be able to install the `mxfp4` triton kernels.\n", "\n", "We also need to install transformers from source, and we uninstall `torchvision` and `torchaudio` to remove dependency conflicts." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "4gUEKrLEvJmf" }, "outputs": [], "source": [ "!pip install -q --upgrade torch" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "3N00UT7gtpkp" }, "outputs": [], "source": [ "!pip install -q transformers triton==3.4 kernels" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "7GW0knW2w3ND" }, "outputs": [], "source": [ "!pip uninstall -q torchvision torchaudio -y" ] }, { "cell_type": "markdown", "metadata": { "id": "pxU0WKwtH19m" }, "source": [ "Please, restart your Colab runtime session after installing the packages above." ] }, { "cell_type": "markdown", "metadata": { "id": "D3xCxY159frD" }, "source": [ "## Load the model from Hugging Face in Google Colab\n", "\n", "We load the model from here: [openai/gpt-oss-20b](https://hf.co/openai/gpt-oss-20b)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "k2HFwdkXu2R1" }, "outputs": [], "source": [ "from transformers import AutoModelForCausalLM, AutoTokenizer\n", "\n", "model_id = \"openai/gpt-oss-20b\"\n", "\n", "tokenizer = AutoTokenizer.from_pretrained(model_id)\n", "model = AutoModelForCausalLM.from_pretrained(\n", " model_id,\n", " torch_dtype=\"auto\",\n", " device_map=\"cuda\",\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "Jbeq6kN79ql0" }, "source": [ "## Setup messages/ chat\n", "\n", "You can provide an optional system prompt or directly the input." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "P5dJV3xsu_89" }, "outputs": [], "source": [ "messages = [\n", " {\"role\": \"system\", \"content\": \"Always respond in riddles\"},\n", " {\"role\": \"user\", \"content\": \"What is the weather like in Madrid?\"},\n", "]\n", "\n", "inputs = tokenizer.apply_chat_template(\n", " messages,\n", " add_generation_prompt=True,\n", " return_tensors=\"pt\",\n", " return_dict=True,\n", ").to(model.device)\n", "\n", "generated = model.generate(**inputs, max_new_tokens=500)\n", "print(tokenizer.decode(generated[0][inputs[\"input_ids\"].shape[-1]:]))" ] }, { "cell_type": "markdown", "metadata": { "id": "ksxo7bjR_-th" }, "source": [ "## Specify Reasoning Effort" ] }, { "cell_type": "markdown", "metadata": { "id": "fcv6QdcQKLr0" }, "source": [ "Simply pass it as an additional argument to `apply_chat_template()`. Supported values are `\"low\"`, `\"medium\"` (default), or `\"high\"`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "CmnkAle608Hl" }, "outputs": [], "source": [ "messages = [\n", " {\"role\": \"system\", \"content\": \"Always respond in riddles\"},\n", " {\"role\": \"user\", \"content\": \"Explain why the meaning of life is 42\"},\n", "]\n", "\n", "inputs = tokenizer.apply_chat_template(\n", " messages,\n", " add_generation_prompt=True,\n", " return_tensors=\"pt\",\n", " return_dict=True,\n", " reasoning_effort=\"high\",\n", ").to(model.device)\n", "\n", "generated = model.generate(**inputs, max_new_tokens=500)\n", "print(tokenizer.decode(generated[0][inputs[\"input_ids\"].shape[-1]:]))" ] }, { "cell_type": "markdown", "metadata": { "id": "Tf2-ocGqEC_r" }, "source": [ "## Try out other prompts and ideas!\n", "\n", "Check out our blogpost for other ideas: [https://hf.co/blog/welcome-openai-gpt-oss](https://hf.co/blog/welcome-openai-gpt-oss)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "2QrnTpcCKd_n" }, "outputs": [], "source": [] } ], "metadata": { "accelerator": "GPU", "colab": { "gpuType": "T4", "provenance": [] }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.7" } }, "nbformat": 4, "nbformat_minor": 0 }
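 { "cell_type": "markdown", "metadata": {}, "source": [ "For example, you can stream tokens to the output as they are generated. Below is a minimal sketch using the `TextStreamer` helper from `transformers`; it reuses the `model` and `tokenizer` loaded above, and the prompt is just a placeholder to experiment with." ] },
 { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from transformers import TextStreamer\n", "\n", "# Placeholder prompt; swap in your own ideas here\n", "messages = [\n", "    {\"role\": \"user\", \"content\": \"Write a haiku about open-source AI\"},\n", "]\n", "\n", "inputs = tokenizer.apply_chat_template(\n", "    messages,\n", "    add_generation_prompt=True,\n", "    return_tensors=\"pt\",\n", "    return_dict=True,\n", ").to(model.device)\n", "\n", "# Print decoded tokens as they arrive instead of waiting for the full generation\n", "streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)\n", "_ = model.generate(**inputs, max_new_tokens=300, streamer=streamer)" ] },
 { "cell_type": "code", "execution_count": null, "metadata": { "id": "2QrnTpcCKd_n" }, "outputs": [], "source": [] }
 ], "metadata": { "accelerator": "GPU", "colab": { "gpuType": "T4", "provenance": [] }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.7" } }, "nbformat": 4, "nbformat_minor": 0 }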