{ "nbformat": 4, "nbformat_minor": 5, "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python", "version": "3.11.0" } }, "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/duoan/TorchCode/blob/master/templates/13_gpt2_block.ipynb)\n", "\n", "# ๐Ÿ”ด Hard: GPT-2 Transformer Block\n", "\n", "Implement a full **GPT-2 style Transformer block** โ€” combining everything you've learned.\n", "\n", "### Architecture (Pre-Norm)\n", "```\n", "x = x + causal_self_attention(ln1(x))\n", "x = x + mlp(ln2(x))\n", "```\n", "\n", "### Signature\n", "```python\n", "class GPT2Block(nn.Module):\n", " def __init__(self, d_model: int, num_heads: int): ...\n", " def forward(self, x: torch.Tensor) -> torch.Tensor: ...\n", "```\n", "\n", "### Requirements\n", "- Inherit from `nn.Module`\n", "- `self.ln1`, `self.ln2`: `nn.LayerNorm(d_model)`\n", "- `self.W_q`, `self.W_k`, `self.W_v`, `self.W_o`: `nn.Linear` for attention\n", "- `self.mlp`: `nn.Sequential(Linear(d, 4d), GELU(), Linear(4d, d))`\n", "- Attention must be **causal** (mask future positions)\n", "- Pre-norm architecture (LayerNorm *before* attention and MLP)\n", "- Residual connections around both attention and MLP" ], "outputs": [] }, { "cell_type": "code", "metadata": {}, "source": [ "# Install torch-judge in Colab (no-op in JupyterLab/Docker)\n", "try:\n", " import google.colab\n", " get_ipython().run_line_magic('pip', 'install -q torch-judge')\n", "except ImportError:\n", " pass\n" ], "outputs": [], "execution_count": null }, { "cell_type": "code", "metadata": {}, "source": [ "import torch\n", "import torch.nn as nn\n", "import math" ], "outputs": [], "execution_count": null }, { "cell_type": "code", "metadata": {}, "source": [ "# โœ๏ธ YOUR IMPLEMENTATION HERE\n", "\n", "class GPT2Block(nn.Module):\n", " def 
__init__(self, d_model: int, num_heads: int):\n", "        super().__init__()\n", "        pass  # Initialize ln1, ln2, W_q/W_k/W_v/W_o, and the MLP\n", "\n", "    def forward(self, x: torch.Tensor) -> torch.Tensor:\n", "        pass  # Pre-norm + causal attention + MLP, with residuals around both" ], "outputs": [], "execution_count": null }, { "cell_type": "code", "metadata": {}, "source": [ "# 🧪 Debug\n", "torch.manual_seed(0)\n", "block = GPT2Block(d_model=64, num_heads=4)\n", "x = torch.randn(2, 8, 64)\n", "out = block(x)\n", "print(\"Output shape:\", out.shape)  # torch.Size([2, 8, 64])\n", "print(\"Is nn.Module?\", isinstance(block, nn.Module))\n", "print(\"Params:\", sum(p.numel() for p in block.parameters()))" ], "outputs": [], "execution_count": null }, { "cell_type": "code", "metadata": {}, "source": [ "from torch_judge import check\n", "check('gpt2_block')" ], "outputs": [], "execution_count": null } ] }