---
name: torchcode-pytorch-interview-practice
description: LeetCode-style PyTorch interview practice environment with auto-grading for implementing softmax, attention, GPT-2 and more from scratch.
triggers:
  - implement pytorch operator from scratch
  - practice pytorch interview questions
  - torchcode problem
  - implement softmax layernorm attention from scratch
  - pytorch coding interview prep
  - run torchcode judge
  - check my pytorch implementation
  - implement transformer components from scratch
---

# TorchCode — PyTorch Interview Practice

> Skill by [ara.so](https://ara.so) — Daily 2026 Skills collection.

TorchCode is a Jupyter-based, self-hosted coding practice environment for ML engineers. It provides 40 curated problems covering PyTorch fundamentals and architectures (softmax, LayerNorm, MultiHeadAttention, GPT-2, etc.) with an automated judge that gives instant pass/fail feedback, gradient verification, and timing — like LeetCode but for tensors.

---

## Installation & Setup

### Option 1: Online (zero install)

- **Hugging Face Spaces**: https://huggingface.co/spaces/duoan/TorchCode
- **Google Colab**: Every notebook has an "Open in Colab" badge

### Option 2: pip (for use inside Colab or existing environment)

```bash
pip install torch-judge
```

### Option 3: Docker (pre-built image)

```bash
docker run -p 8888:8888 -e PORT=8888 ghcr.io/duoan/torchcode:latest
# Open http://localhost:8888
```

### Option 4: Build locally

```bash
git clone https://github.com/duoan/TorchCode.git
cd TorchCode
make run
# Open http://localhost:8888
```

`make run` auto-detects Docker or Podman and falls back to local build if the registry image is unavailable (common on Apple Silicon/arm64).

---

## Judge API

The `torch_judge` package provides the core API used in every notebook.

```python
from torch_judge import check, status, hint, reset_progress

# List all 40 problems and your progress
status()

# Run tests for a specific problem
check("relu")
check("softmax")
check("layernorm")
check("attention")
check("gpt2")

# Get a hint without spoilers
hint("softmax")

# Reset progress for a problem
reset_progress("relu")
```

### `check()` return values

- Colored pass/fail per test case
- Correctness check against PyTorch reference implementation
- Gradient verification (autograd compatibility)
- Timing measurement

---

## Problem Set Overview

### Difficulty levels: Easy → Medium → Hard

A selection of the 40 problems:

| # | Problem | Key Concepts |
|---|---------|--------------|
| 1 | ReLU | Activation functions, element-wise ops |
| 2 | Softmax | Numerical stability, exp/log tricks |
| 3 | Linear Layer | `y = xW^T + b`, Kaiming init, `nn.Parameter` |
| 4 | LayerNorm | Normalization, affine transform |
| 5 | Self-Attention | QKV projections, scaled dot-product |
| 6 | Multi-Head Attention | Head splitting, concatenation |
| 7 | BatchNorm | Batch vs layer statistics, train/eval |
| 8 | RMSNorm | LLaMA-style norm |
| 16 | Cross-Entropy Loss | Log-softmax, logsumexp trick |
| 17 | Dropout | Train/eval mode, inverted scaling |
| 18 | Embedding | Lookup table, `weight[indices]` |
| 19 | GELU | `torch.erf`, Gaussian error linear unit |
| 20 | Kaiming Init | `std = sqrt(2/fan_in)` |
| 21 | Gradient Clipping | Norm-based clipping |
| 31 | Gradient Accumulation | Micro-batching, loss scaling |
| 40 | Linear Regression | Normal equation, GD from scratch |

---

## Working Through a Problem

Each problem has a blank template notebook and a matching reference solution:

```
templates/
  01_relu.ipynb          # Blank template — your workspace
  02_softmax.ipynb
  ...
solutions/
  01_relu.ipynb          # Reference solution (study after attempt)
```

### Typical notebook workflow

```python
# Cell 1: Import judge
from torch_judge import check, hint
import torch
import torch.nn as nn

# Cell 2: Your implementation
def my_relu(x: torch.Tensor) -> torch.Tensor:
    # TODO: implement ReLU without using torch.relu or F.relu
    raise NotImplementedError

# Cell 3: Run the judge
check("relu")
```

---

## Real Implementation Examples

### ReLU (Problem 1 — Easy)

```python
def my_relu(x: torch.Tensor) -> torch.Tensor:
    return torch.clamp(x, min=0)
    # Alternative: return x * (x > 0)
    # Alternative: return torch.where(x > 0, x, torch.zeros_like(x))
```

### Softmax (Problem 2 — Easy, numerically stable)

```python
def my_softmax(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # Subtract max for numerical stability (prevents overflow)
    x_max = x.max(dim=dim, keepdim=True).values
    x_shifted = x - x_max
    exp_x = torch.exp(x_shifted)
    return exp_x / exp_x.sum(dim=dim, keepdim=True)
```

### LayerNorm (Problem 4 — Medium)

```python
def my_layer_norm(
    x: torch.Tensor,
    weight: torch.Tensor,  # gamma (scale)
    bias: torch.Tensor,    # beta (shift)
    eps: float = 1e-5
) -> torch.Tensor:
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, keepdim=True, unbiased=False)
    x_norm = (x - mean) / torch.sqrt(var + eps)
    return weight * x_norm + bias
```

### RMSNorm (Problem 8 — Medium, LLaMA-style)

```python
def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    rms = torch.sqrt((x ** 2).mean(dim=-1, keepdim=True) + eps)
    return (x / rms) * weight
```

### Scaled Dot-Product Self-Attention (Problem 5 — Medium)

```python
import torch.nn.functional as F
import math

def scaled_dot_product_attention(
    Q: torch.Tensor,  # (B, heads, T, head_dim)
    K: torch.Tensor,
    V: torch.Tensor,
    mask: torch.Tensor = None
) -> torch.Tensor:
    d_k = Q.size(-1)
    scores = torch.matmul(Q, K.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0,
                                    float('-inf'))
    attn_weights = F.softmax(scores, dim=-1)
    return torch.matmul(attn_weights, V)
```

### Multi-Head Attention (Problem 6 — Medium)

```python
class MyMultiHeadAttention(nn.Module):
    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = d_model // num_heads
        self.d_model = d_model

        self.W_q = nn.Linear(d_model, d_model)
        self.W_k = nn.Linear(d_model, d_model)
        self.W_v = nn.Linear(d_model, d_model)
        self.W_o = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor, mask: torch.Tensor = None) -> torch.Tensor:
        B, T, C = x.shape

        def split_heads(t):
            # (B, T, d_model) -> (B, heads, T, head_dim)
            return t.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)

        Q = split_heads(self.W_q(x))
        K = split_heads(self.W_k(x))
        V = split_heads(self.W_v(x))

        attn_out = scaled_dot_product_attention(Q, K, V, mask)
        # (B, heads, T, head_dim) -> (B, T, d_model)
        attn_out = attn_out.transpose(1, 2).contiguous().view(B, T, C)
        return self.W_o(attn_out)
```

### Cross-Entropy Loss (Problem 16 — Easy)

```python
def cross_entropy_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # logits: (B, C), targets: (B,) with class indices
    # Use logsumexp trick for numerical stability
    log_sum_exp = torch.logsumexp(logits, dim=-1)  # (B,)
    # Logit of the correct class for each sample
    batch_idx = torch.arange(len(targets), device=logits.device)
    target_logits = logits[batch_idx, targets]  # (B,)
    # -log softmax(logits)[target] = logsumexp(logits) - logits[target]
    return (log_sum_exp - target_logits).mean()
```

### Dropout (Problem 17 — Easy)

```python
class MyDropout(nn.Module):
    def __init__(self, p: float = 0.5):
        super().__init__()
        self.p = p

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if not self.training or self.p == 0:
            return x
        if self.p == 1:
            # everything is dropped; avoids dividing by zero below
            return torch.zeros_like(x)
        mask = torch.bernoulli(torch.ones_like(x) * (1 - self.p))
        return x * mask / (1 - self.p)  # inverted scaling
```

### Kaiming Init (Problem 20 — Easy)

```python
import math

def kaiming_init(weight: torch.Tensor) -> torch.Tensor:
    # weight has shape (fan_out, fan_in)
    fan_in = weight.size(1)
    std = math.sqrt(2.0 / fan_in)
    with torch.no_grad():
        weight.normal_(0, std)
    return weight
```

### Gradient Clipping (Problem 21 — Easy)

```python
def clip_grad_norm(parameters, max_norm: float) -> float:
    params = [p for p in parameters if p.grad is not None]
    if not params:
        return 0.0
    # Global L2 norm over all gradients
    total_norm = torch.sqrt(sum(p.grad.detach().norm() ** 2 for p in params))
    clip_coef = max_norm / (total_norm + 1e-6)
    if clip_coef < 1:
        for p in params:
            p.grad.detach().mul_(clip_coef)  # scales p.grad in place
    return total_norm.item()
```

### Gradient Accumulation (Problem 31 — Easy)

```python
# criterion: loss function, e.g. nn.CrossEntropyLoss()
def train_with_accumulation(model, optimizer, dataloader, criterion, accumulation_steps=4):
    optimizer.zero_grad()
    for i, (inputs, targets) in enumerate(dataloader):
        outputs = model(inputs)
        # scale the loss so the accumulated gradient matches one large batch
        loss = criterion(outputs, targets) / accumulation_steps
        loss.backward()
        if (i + 1) % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```

---

## Common Patterns & Tips

### Numerical stability pattern

In softmax-style computations, subtract the max before calling `exp()`; the shift cancels when you normalize:

```python
# WRONG — can overflow for large values
exp_x = torch.exp(x)

# CORRECT — numerically stable
exp_x = torch.exp(x - x.max(dim=-1, keepdim=True).values)
```

### Causal attention mask (for GPT-style models)

```python
def causal_mask(T: int, device) -> torch.Tensor:
    # (1, 1, T, T) lower-triangular mask; broadcasts over batch and heads
    return torch.tril(torch.ones(T, T, device=device)).unsqueeze(0).unsqueeze(0)
```

### nn.Module skeleton (used in many problems)

```python
class MyLayer(nn.Module):
    # in_features / out_features are placeholder arguments; each problem defines its own
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))
        self._init_weights()

    def _init_weights(self):
        nn.init.kaiming_uniform_(self.weight)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        ...
```

### Train vs eval mode pattern

```python
def forward(self, x):
    if self.training:
        # use batch statistics
        mean = x.mean(dim=0)
        var = x.var(dim=0, unbiased=False)
        # update running stats in place, outside the autograd graph
        # (note: nn.BatchNorm1d stores the unbiased variance in running_var)
        with torch.no_grad():
            self.running_mean.mul_(1 - self.momentum).add_(self.momentum * mean)
            self.running_var.mul_(1 - self.momentum).add_(self.momentum * var)
    else:
        # use running statistics
        mean = self.running_mean
        var = self.running_var
    return (x - mean) / torch.sqrt(var + self.eps) * self.weight + self.bias
```

---

## Project Structure

```
TorchCode/
├── templates/           # Blank notebooks for each problem (your workspace)
│   ├── 01_relu.ipynb
│   ├── 02_softmax.ipynb
│   └── ...
├── solutions/           # Reference solutions (study after attempting)
│   └── ...
├── torch_judge/         # Auto-grading package
│   ├── __init__.py      # check(), status(), hint(), reset_progress()
│   └── tasks/           # Per-problem test cases
├── Dockerfile
├── Makefile
└── pyproject.toml       # torch-judge package definition
```

---

## Troubleshooting

### Docker image not available for Apple Silicon (arm64)

```bash
# make run auto-falls back to local build, or force it:
make build
make start
```

### `check()` not found in Colab

```bash
!pip install torch-judge
# then restart runtime
```

### Notebook reset to blank template

Use the toolbar "Reset" button in JupyterLab to reset any notebook to its original blank state, which is useful for re-practicing a problem.

### Gradient check fails but output is correct

Ensure your implementation uses PyTorch operations (not NumPy) so autograd works:

```python
# WRONG — breaks autograd
import numpy as np
result = np.exp(x.numpy())

# CORRECT — autograd compatible
result = torch.exp(x)
```

### Viewing reference solution

After attempting a problem, open the matching file in `solutions/`:

```
solutions/02_softmax.ipynb
```

---

## Key Concepts Tested

| Concept | Problems |
|---------|----------|
| Numerical stability | Softmax, Cross-Entropy, LogSumExp |
| Autograd / `nn.Parameter` | Linear, LayerNorm, all nn.Module problems |
| Train vs eval behavior | BatchNorm, Dropout |
| Broadcasting | LayerNorm, RMSNorm, attention masking |
| Shape manipulation | Multi-Head Attention (view, transpose, contiguous) |
| Weight initialization | Kaiming Init, Linear Layer |
| Training-loop techniques | Gradient Accumulation, Gradient Clipping |
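
### Sanity-checking a solution locally

Two of the concepts above, numerical stability and autograd compatibility, can be spot-checked by hand before running the judge. The sketch below is only an illustration: `local_check` is a hypothetical helper, not part of `torch_judge`, and the tolerances are assumptions rather than the judge's actual thresholds. It compares an implementation against a PyTorch reference and runs `torch.autograd.gradcheck` on a float64 copy of the input.

```python
import torch


def my_softmax(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # Numerically stable softmax: shift by the max before exponentiating
    x_shifted = x - x.max(dim=dim, keepdim=True).values
    exp_x = torch.exp(x_shifted)
    return exp_x / exp_x.sum(dim=dim, keepdim=True)


def local_check(fn, reference, x: torch.Tensor) -> dict:
    """Hypothetical helper: compare fn to a reference and verify its gradients."""
    # Forward check against the reference implementation
    forward_ok = torch.allclose(fn(x), reference(x), atol=1e-6)
    # gradcheck compares analytic and numerical gradients; it needs
    # float64 inputs with requires_grad=True to be reliable
    x64 = x.detach().double().requires_grad_(True)
    grad_ok = torch.autograd.gradcheck(fn, (x64,), eps=1e-6, atol=1e-4)
    return {"forward": forward_ok, "grad": grad_ok}


torch.manual_seed(0)
x = torch.randn(4, 8)
result = local_check(my_softmax, lambda t: torch.softmax(t, dim=-1), x)
print(result)

# Large logits that would overflow a naive exp() should still give finite outputs
big = torch.tensor([[1000.0, 1001.0, 1002.0]])
print(torch.isfinite(my_softmax(big)).all().item())
```

A NumPy-based implementation would fail the `gradcheck` step here, which mirrors the "gradient check fails but output is correct" case in Troubleshooting.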