{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0",
   "metadata": {},
   "source": [
    "---\n",
    "title: \"Part 2: Language Core (Control Flow & Comprehensions)\"\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1",
   "metadata": {},
   "source": [
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sambaiga/ds-mlops-path/blob/main/tutorials/01-python-basics/02-control-flow.ipynb) [![Download Notebook](https://img.shields.io/badge/Download-Notebook-blue.svg?logo=jupyter&logoColor=white)](https://raw.githubusercontent.com/sambaiga/ds-mlops-path/main/tutorials/01-python-basics/02-control-flow.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2",
   "metadata": {},
   "source": [
    "**DS-MLOps Python Foundations**\n",
    "\n",
    "**Python 3.12+ | Author: Anthony Faustine**\n",
    "\n",
    "## Before you begin\n",
    "\n",
    "This notebook assumes you have completed Part 1 (`01-python-core.ipynb`). If you have not, start there. Part 2 picks up immediately where Part 1 left off, using the same **university analytics platform** scenario, and covers everything that decides *what runs and how many times*: conditionals, pattern matching, loops, and comprehensions.\n",
    "\n",
    "> Callout markers used throughout this notebook are explained on the [book cover page](../../index.qmd#callout-guide)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3",
   "metadata": {},
   "source": [
    "::: {.callout-note collapse=\"true\" icon=false}\n",
    "## Learning Objectives\n",
    "\n",
    "By the end of Part 2 you will be able to:\n",
    "\n",
    "| # | Skill | Covered in |\n",
    "|---|---|---|\n",
    "| 1 | Write `if` / `elif` / `else` and structural `match` / `case` | Sec. 1 |\n",
    "| 2 | Replace manual index counters with `for`, `enumerate`, and `zip` | Sec. 2 |\n",
    "| 3 | Use `while`, `break`, and `continue` for indefinite loops | Sec. 3 |\n",
    "| 4 | Replace loops with list, dict, set, and generator comprehensions | Sec. 4 |\n",
    ":::\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4",
   "metadata": {},
   "source": [
    "## 1. Control Flow: if / elif / else & match / case\n",
    "\n",
    "So far every cell runs its lines from top to bottom, once, in order. **Control flow** lets you change that:\n",
    "- `if / elif / else`: run one branch based on a condition\n",
    "- `match / case`: route structured data to different handlers (Python 3.10+)\n",
    "\n",
    "<div style='background:#EAF3FA;border-left:5px solid #0369A1;padding:14px 18px;border-radius:6px;margin:16px 0'>\n",
    "<span style='color:#0369A1;font-weight:bold'><i class=\"bi bi-info-circle-fill\"></i> Key Concept: match / case (Python 3.10+)</span><br><br>\n",
    "Structural pattern matching goes beyond simple equality checks. It can match on the <b>shape of data</b>, destructuring dicts, lists, and class instances in the same step. Use it when branching on the shape or value of structured data, not just numeric thresholds.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5",
   "metadata": {},
   "outputs": [],
   "source": [
    "def classify_grade(score: float) -> str:\n",
    "    \"\"\"Return letter grade for a numeric score.\"\"\"\n",
    "    if score >= 90:\n",
    "        return \"A: Excellent\"\n",
    "    elif score >= 80:\n",
    "        return \"B: Good\"\n",
    "    elif score >= 70:\n",
    "        return \"C: Satisfactory\"\n",
    "    elif score >= 60:\n",
    "        return \"D: Needs improvement\"\n",
    "    else:\n",
    "        return \"F: See instructor\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6",
   "metadata": {},
   "source": [
    "Test the function across the full grade range:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7",
   "metadata": {},
   "outputs": [],
   "source": [
    "for s in [95.0, 83.5, 71.0, 62.0, 45.0]:\n",
    "    print(f\"  {s:5.1f} -> {classify_grade(s)}\")\n",
    "\n",
    "# Ternary expression: one-liner for simple binary choices\n",
    "score = 87.0\n",
    "status = \"pass\" if score >= 70 else \"fail\"\n",
    "band = \"high\" if score >= 90 else (\"mid\" if score >= 70 else \"low\")\n",
    "print(f\"\\n{score} -> {status}, {band}\")"
   ]
  },
  {
   "cell_type": "raw",
   "id": "8",
   "metadata": {
    "raw_mimetype": "text/markdown"
   },
   "source": [
    "> **Decision flow: if / elif / else**\n",
    "\n",
    "```{mermaid}\n",
    "flowchart TD\n",
    "    A[\"evaluate condition\"] --> B{if condition1}\n",
    "    B -->|True| C[\"execute if block\"]\n",
    "    B -->|False| D{elif condition2}\n",
    "    D -->|True| E[\"execute elif block\"]\n",
    "    D -->|False| F{else?}\n",
    "    F -->|present| G[\"execute else block\"]\n",
    "    F -->|absent| H[\"skip all\"]\n",
    "    C & E & G & H --> I[\"continue program\"]\n",
    "\n",
    "    style C fill:#EBF5F0,stroke:#059669,color:#065F46\n",
    "    style E fill:#EAF3FA,stroke:#0369A1,color:#0C4A6E\n",
    "    style G fill:#F5F3FF,stroke:#7C3AED,color:#3B0764\n",
    "```\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9",
   "metadata": {},
   "source": [
    "### match / case: Structural Pattern Matching (Python 3.10+)\n",
    "`match` goes beyond simple equality checks. It can destructure **the shape of data**, extracting values from dicts and lists in one step. Define the routing function first:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "10",
   "metadata": {},
   "outputs": [],
   "source": [
    "# match / case: structural pattern matching (Python 3.10+)\n",
    "def process_event(event: dict[str, object]) -> str:\n",
    "    \"\"\"Route a training event to the right handler.\"\"\"\n",
    "    match event:\n",
    "        case {\"type\": \"epoch\", \"epoch\": e, \"loss\": l} if float(str(l)) < 0.05:\n",
    "            return f\"Epoch {e}: converged (loss={l:.3f})\"\n",
    "        case {\"type\": \"epoch\", \"epoch\": e, \"loss\": l}:\n",
    "            return f\"Epoch {e}: loss={l:.3f}\"\n",
    "        case {\"type\": \"error\", \"message\": msg}:\n",
    "            return f\"ERROR: {msg}\"\n",
    "        case {\"type\": t}:\n",
    "            return f\"Unhandled event type: {t!r}\"\n",
    "        case _:\n",
    "            return \"Malformed event\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "11",
   "metadata": {},
   "source": [
    "Run a variety of event shapes through the dispatcher to see each `case` arm triggered. The `case _:` arm is a catch-all that always matches:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "12",
   "metadata": {},
   "outputs": [],
   "source": [
    "events: list[dict[str, object]] = [\n",
    "    {\"type\": \"epoch\", \"epoch\": 1, \"loss\": 0.823},\n",
    "    {\"type\": \"epoch\", \"epoch\": 20, \"loss\": 0.041},\n",
    "    {\"type\": \"error\", \"message\": \"OOM on GPU 0\"},\n",
    "    {\"type\": \"checkpoint\"},\n",
    "    {\"status\": \"idle\"},\n",
    "]\n",
    "\n",
    "for ev in events:\n",
    "    print(process_event(ev))"
   ]
  },
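  {
   "cell_type": "markdown",
   "id": "12a",
   "metadata": {},
   "source": [
    "The dispatcher above exercises only mapping (dict) patterns. As a minimal sketch of the other shapes, the block below uses a sequence pattern and a class pattern. The `Point` dataclass and the `describe()` function are hypothetical names introduced purely for illustration; they are not part of the university analytics scenario.\n",
    "\n",
    "```python\n",
    "from dataclasses import dataclass\n",
    "\n",
    "\n",
    "@dataclass\n",
    "class Point:  # hypothetical class, for illustration only\n",
    "    x: float\n",
    "    y: float\n",
    "\n",
    "\n",
    "def describe(obj: object) -> str:\n",
    "    # Describe an object by its shape (illustrative sketch)\n",
    "    match obj:\n",
    "        case []:  # sequence pattern: empty list or tuple\n",
    "            return 'empty sequence'\n",
    "        case [first, *rest]:  # sequence pattern: bind head and tail\n",
    "            return f'sequence starting with {first!r}, {len(rest)} more'\n",
    "        case Point(x=0, y=0):  # class pattern with literal attribute values\n",
    "            return 'origin'\n",
    "        case Point(x=x, y=y):  # class pattern: capture attribute values\n",
    "            return f'point at ({x}, {y})'\n",
    "        case _:\n",
    "            return 'something else'\n",
    "\n",
    "\n",
    "for sample in [[], [1, 2, 3], Point(0, 0), Point(1, 2), 'abc']:\n",
    "    print(describe(sample))\n",
    "```\n",
    "\n",
    "Strings are deliberately excluded from sequence patterns, so `describe('abc')` falls through to the wildcard arm rather than matching `[first, *rest]`."
   ]
  },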
  {
   "cell_type": "markdown",
   "id": "13",
   "metadata": {},
   "source": [
    "<div style='background:#EBF5F0;border-left:5px solid #009E73;padding:14px 18px;border-radius:6px;margin:16px 0'>\n",
    "<span style='color:#065F46;font-weight:bold'><i class=\"bi bi-puzzle-fill\"></i> Activity 6 - Match on HTTP-style Status Codes</span><br><br>\n",
    "<b>Goal:</b> Write a <code>describe_status(code)</code> function using <code>match/case</code> that returns a short description.\n",
    "<pre style='background:#FCE8DA;padding:10px;border-radius:4px;font-size:0.9em'>describe_status(200)  -> '200 OK'\n",
    "describe_status(404)  -> '404 Not Found'\n",
    "describe_status(500)  -> '500 Server Error'\n",
    "describe_status(301)  -> '3xx Redirect'\n",
    "describe_status(999)  -> 'Unknown code'</pre>\n",
    "<b>Hint:</b> Use <code>case 2xx</code> patterns are not valid. Use guard conditions instead: <code>case c if 200 <= c < 300</code>.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "14",
   "metadata": {},
   "outputs": [],
   "source": [
    "def describe_status(code: int) -> str:\n",
    "    \"\"\"Return a short description for an HTTP-style status code.\"\"\"\n",
    "    match code:\n",
    "        case _:\n",
    "            return \"unknown\"  # TODO: replace with specific case patterns\n",
    "\n",
    "\n",
    "for c in [200, 404, 500, 301, 999]:\n",
    "    print(describe_status(c))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "15",
   "metadata": {},
   "source": [
    "## 2. Control Flow: for Loops\n",
    "\n",
    "A **`for` loop** repeats a block of code once for each item in a collection. It is the primary tool for processing datasets, running training epochs, and iterating over files.\n",
    "\n",
    "```python\n",
    "for score in [78, 85, 92]:   # repeat once per score\n",
    "    print(score)              # output: 78, then 85, then 92\n",
    "```\n",
    "\n",
    "The indented block (4 spaces) is the **loop body**: it runs once per item.\n",
    "\n",
    "Python `for` loops iterate over any **iterable**. The built-ins `range()`, `enumerate()`, and `zip()` cover the most common patterns in data work."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "16",
   "metadata": {},
   "outputs": [],
   "source": [
    "# range(start, stop, step): generates integers lazily (no list in memory)\n",
    "MAX_EPOCHS: int = 5\n",
    "loss: float = 1.0\n",
    "\n",
    "for epoch in range(1, MAX_EPOCHS + 1):\n",
    "    loss *= 0.75\n",
    "    print(f\"  Epoch {epoch}/{MAX_EPOCHS}  loss={loss:.4f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "17",
   "metadata": {},
   "source": [
    "`enumerate()` pairs each element with its index, counting from `start=1` by default (or any integer you choose), eliminating the need for manual `i += 1` counters:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "18",
   "metadata": {},
   "outputs": [],
   "source": [
    "# enumerate(): loop with automatic index; avoids manual counter variables\n",
    "students: list[str] = [\"Alice\", \"Carol\", \"Dan\", \"Bob\"]\n",
    "\n",
    "print(\"Leaderboard:\")\n",
    "for rank, name in enumerate(students, start=1):\n",
    "    print(f\"  #{rank}  {name}\")"
   ]
  },
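  {
   "cell_type": "markdown",
   "id": "18a",
   "metadata": {},
   "source": [
    "As a quick check of the default, the sketch below calls `enumerate()` with and without `start`. The short list and the variable names `default_pairs` and `ranked_pairs` are illustrative only.\n",
    "\n",
    "```python\n",
    "students = ['Alice', 'Carol', 'Dan']\n",
    "\n",
    "# No start argument: indices begin at 0\n",
    "default_pairs = list(enumerate(students))\n",
    "\n",
    "# Explicit start=1: convenient for human-readable ranks\n",
    "ranked_pairs = list(enumerate(students, start=1))\n",
    "\n",
    "print(default_pairs)  # [(0, 'Alice'), (1, 'Carol'), (2, 'Dan')]\n",
    "print(ranked_pairs)   # [(1, 'Alice'), (2, 'Carol'), (3, 'Dan')]\n",
    "```\n",
    "\n",
    "Use the default when the number is a list position you will index with, and `start=1` when it is a label shown to people."
   ]
  },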
  {
   "cell_type": "markdown",
   "id": "19",
   "metadata": {},
   "source": [
    "`zip()` stitches two or more iterables together element-by-element. Pairs stop when the **shortest** input is exhausted. Build a `dict` from two parallel lists using `dict(zip(keys, values))`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "20",
   "metadata": {},
   "outputs": [],
   "source": [
    "# zip(): iterate two or more iterables in lockstep\n",
    "# strict=True raises ValueError if the iterables have different lengths\n",
    "names: list[str] = [\"Alice\", \"Bob\", \"Carol\"]\n",
    "scores: list[float] = [92.0, 74.5, 88.0]\n",
    "\n",
    "print(\"Score sheet:\")\n",
    "for name, score in zip(names, scores, strict=True):\n",
    "    grade = \"pass\" if score >= 70 else \"fail\"\n",
    "    print(f\"  {name:<8} {score:5.1f}  {grade}\")\n",
    "\n",
    "# Build a dict from two parallel lists\n",
    "metric_names: list[str] = [\"accuracy\", \"precision\", \"recall\"]\n",
    "metric_vals: list[float] = [0.923, 0.911, 0.934]\n",
    "report: dict[str, float] = dict(zip(metric_names, metric_vals, strict=True))\n",
    "print()\n",
    "print(f\"Report: {report}\")"
   ]
  },
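  {
   "cell_type": "markdown",
   "id": "20a",
   "metadata": {},
   "source": [
    "To see why `strict=True` is worth the extra keyword, here is a small sketch with deliberately mismatched lists. The data and the `mismatch_error` variable are made up for illustration.\n",
    "\n",
    "```python\n",
    "names = ['Alice', 'Bob', 'Carol']\n",
    "scores = [92.0, 74.5]  # one score is missing\n",
    "\n",
    "# Default zip: the unmatched name is dropped without any warning\n",
    "pairs = list(zip(names, scores))\n",
    "print(pairs)  # [('Alice', 92.0), ('Bob', 74.5)]\n",
    "\n",
    "# strict=True (Python 3.10+): the mismatch becomes a ValueError\n",
    "mismatch_error = None\n",
    "try:\n",
    "    list(zip(names, scores, strict=True))\n",
    "except ValueError as exc:\n",
    "    mismatch_error = str(exc)\n",
    "    print(f'ValueError: {mismatch_error}')\n",
    "```\n",
    "\n",
    "In data work a silently dropped row is usually worse than a crash, which is why the cells above pass `strict=True`."
   ]
  },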
  {
   "cell_type": "markdown",
   "id": "21",
   "metadata": {},
   "source": [
    "### tqdm: Progress Bars for Long Loops\n",
    "\n",
    "When a loop processes thousands of files or training examples, you need to know how long it will take. `tqdm` wraps any iterable and displays a live progress bar with elapsed time, rate, and ETA, with zero code changes to the loop body:\n",
    "\n",
    "```python\n",
    "pip install tqdm   # if not already installed\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "22",
   "metadata": {},
   "outputs": [],
   "source": [
    "from tqdm import tqdm\n",
    "\n",
    "# Wrap any iterable with tqdm() - the loop body is unchanged\n",
    "scores: list[float] = []\n",
    "for i in tqdm(range(1_000), desc=\"Simulating scores\", unit=\"rec\"):\n",
    "    scores.append(50 + (i % 50))  # dummy computation\n",
    "\n",
    "print(f\"Generated {len(scores)} scores, mean = {sum(scores) / len(scores):.1f}\")\n",
    "\n",
    "# tqdm also works with enumerate and zip\n",
    "labels: list[str] = [\"pass\" if s >= 70 else \"fail\" for s in tqdm(scores, desc=\"Labelling\", leave=False)]\n",
    "print(f\"pass rate: {labels.count('pass') / len(labels):.1%}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "23",
   "metadata": {},
   "source": [
    "## 3. Control Flow: while, break, continue\n",
    "\n",
    "A **`while` loop** repeats a block as long as a condition is `True`. Unlike `for` (which iterates a fixed collection), `while` runs an **indefinite** number of times until either the condition becomes `False` or a `break` statement is hit.\n",
    "\n",
    "```python\n",
    "loss = 1.0\n",
    "while loss > 0.05:    # keep running until loss is small enough\n",
    "    loss *= 0.7       # shrink loss by 30% each iteration\n",
    "```\n",
    "\n",
    "Use `while` when you do not know in advance how many iterations are needed: waiting for convergence, retrying a failing operation, or consuming a data stream.\n",
    "\n",
    "- `break`: exit the loop immediately\n",
    "- `continue`: skip the rest of this iteration\n",
    "- `else` on a loop: runs **only** if no `break` was hit"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "24",
   "metadata": {},
   "outputs": [],
   "source": [
    "# while: train until convergence or budget exhausted\n",
    "loss: float = 1.0\n",
    "epoch: int = 0\n",
    "MAX_EPOCHS: int = 30\n",
    "THRESHOLD: float = 0.05\n",
    "\n",
    "while loss > THRESHOLD and epoch < MAX_EPOCHS:\n",
    "    loss *= 0.7\n",
    "    epoch += 1\n",
    "\n",
    "print(f\"Stopped at epoch {epoch}: loss={loss:.4f}\")\n",
    "print(f\"Converged: {loss <= THRESHOLD}\")"
   ]
  },
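  {
   "cell_type": "markdown",
   "id": "24a",
   "metadata": {},
   "source": [
    "The same \"condition plus budget\" shape covers the retry use case mentioned above. The sketch below is a simulation under stated assumptions: `flaky_fetch()` is a hypothetical stand-in that fails on its first two calls, and `MAX_RETRIES` is an arbitrary budget.\n",
    "\n",
    "```python\n",
    "MAX_RETRIES = 5      # arbitrary retry budget (assumption)\n",
    "SUCCEEDS_ON = 3      # simulated: the call works on the 3rd attempt\n",
    "\n",
    "\n",
    "def flaky_fetch(attempt: int) -> str:\n",
    "    # Hypothetical operation: raises until the SUCCEEDS_ON-th attempt\n",
    "    if attempt < SUCCEEDS_ON:\n",
    "        raise ConnectionError(f'attempt {attempt} failed')\n",
    "    return 'payload'\n",
    "\n",
    "\n",
    "attempt = 0\n",
    "result = None\n",
    "\n",
    "# Keep trying until we have a result or the budget is spent\n",
    "while result is None and attempt < MAX_RETRIES:\n",
    "    attempt += 1\n",
    "    try:\n",
    "        result = flaky_fetch(attempt)\n",
    "    except ConnectionError as exc:\n",
    "        print(f'  retrying after error: {exc}')\n",
    "\n",
    "print(f'Result after {attempt} attempt(s): {result}')\n",
    "```\n",
    "\n",
    "Bounding the loop with `attempt < MAX_RETRIES` guarantees termination even if the operation never succeeds."
   ]
  },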
  {
   "cell_type": "markdown",
   "id": "25",
   "metadata": {},
   "source": [
    "### break and continue\n",
    "`break` exits the innermost loop immediately. Use it when a sentinel value or error condition means further iteration is pointless:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "26",
   "metadata": {},
   "outputs": [],
   "source": [
    "# break: exit the loop immediately when a sentinel is found\n",
    "readings: list[float | None] = [36.5, 36.9, 37.4, None, 38.1, 37.8]\n",
    "clean: list[float] = []\n",
    "\n",
    "for r in readings:\n",
    "    if r is None:\n",
    "        print(\"Sensor error : stopping collection\")\n",
    "        break\n",
    "    clean.append(r)\n",
    "\n",
    "print(f\"Clean readings: {clean}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "27",
   "metadata": {},
   "source": [
    "`continue` skips the rest of the current iteration and jumps to the next one. Ideal for filtering bad data without a nested `if/else`. The `else` clause on a loop runs only if no `break` occurred:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "28",
   "metadata": {},
   "outputs": [],
   "source": [
    "# continue: skip the rest of this iteration and move to the next\n",
    "raw: list[object] = [85.0, \"n/a\", None, 92.0, \"\", 78.5, -1.0, 95.0]\n",
    "valid: list[float] = []\n",
    "\n",
    "for item in raw:\n",
    "    if not isinstance(item, int | float) or float(str(item)) < 0:\n",
    "        continue  # skip bad items\n",
    "    valid.append(float(str(item)))\n",
    "\n",
    "print(f\"Valid scores: {valid}\")\n",
    "\n",
    "# loop else: runs only when the loop was NOT exited via break\n",
    "required_fields: list[str] = [\"name\", \"gpa\", \"major\"]\n",
    "record: dict[str, str] = {\"name\": \"Alice\", \"gpa\": \"3.95\", \"major\": \"CS\"}\n",
    "\n",
    "for field in required_fields:\n",
    "    if field not in record:\n",
    "        print(f\"Missing required field: {field!r}\")\n",
    "        break\n",
    "else:\n",
    "    print(\"All required fields present\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "29",
   "metadata": {},
   "source": [
    "## 4. Comprehensions\n",
    "\n",
    "A **comprehension** builds a new collection by transforming or filtering an existing one, all in a single expression. It replaces the verbose `for` + `.append()` pattern:\n",
    "\n",
    "```python\n",
    "# Loop version (3 lines):\n",
    "squares = []\n",
    "for n in range(5):\n",
    "    squares.append(n ** 2)   # [0, 1, 4, 9, 16]\n",
    "\n",
    "# Comprehension (1 line, identical result):\n",
    "squares = [n ** 2 for n in range(5)]\n",
    "```\n",
    "\n",
    "Comprehensions are faster than equivalent loops and are considered idiomatic Python.\n",
    "\n",
    "<div style='background:#EAF3FA;border-left:5px solid #0369A1;padding:14px 18px;border-radius:6px;margin:16px 0'>\n",
    "<span style='color:#0369A1;font-weight:bold'><i class=\"bi bi-info-circle-fill\"></i> Key Concept: Concise, Readable Collection Construction</span><br><br>\n",
    "Comprehensions build new collections by <b>transforming or filtering</b> an iterable in a single expression. They are faster than equivalent <code>for</code> + <code>.append()</code> loops and are idiomatic Python.\n",
    "\n",
    "<table style='margin-top:8px;border-collapse:collapse'>\n",
    "<tr><td style='padding:4px 12px;font-family:monospace'>[expr for x in it if cond]</td><td style='padding:4px 12px'>list</td></tr>\n",
    "<tr><td style='padding:4px 12px;font-family:monospace'>{k: v for x in it if cond}</td><td style='padding:4px 12px'>dict</td></tr>\n",
    "<tr><td style='padding:4px 12px;font-family:monospace'>{expr for x in it if cond}</td><td style='padding:4px 12px'>set</td></tr>\n",
    "<tr><td style='padding:4px 12px;font-family:monospace'>(expr for x in it if cond)</td><td style='padding:4px 12px'>generator (lazy, no list in memory)</td></tr>\n",
    "</table>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "30",
   "metadata": {},
   "outputs": [],
   "source": [
    "raw_scores: list[float] = [78.0, 85.5, 92.0, 88.5, 95.0, 67.0, 81.0]\n",
    "\n",
    "# Transform: min-max normalise to [0, 1]\n",
    "lo, hi = min(raw_scores), max(raw_scores)\n",
    "normed: list[float] = [(s - lo) / (hi - lo) for s in raw_scores]\n",
    "print(f\"Normalised: {[round(n, 2) for n in normed]}\")\n",
    "\n",
    "# Filter: keep only passing scores\n",
    "passing: list[float] = [s for s in raw_scores if s >= 70]\n",
    "print(f\"Passing   : {passing}\")\n",
    "\n",
    "# Filter + transform: label each score\n",
    "labels: list[str] = [f\"{s:.0f} (pass)\" if s >= 70 else f\"{s:.0f} (FAIL)\" for s in raw_scores]\n",
    "print(f\"Labelled  : {labels}\")"
   ]
  },
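  {
   "cell_type": "markdown",
   "id": "30a",
   "metadata": {},
   "source": [
    "The speed claim is easy to check for yourself. The sketch below times a loop and a comprehension that build the same list, using the standard-library `timeit` module. The function names are illustrative, and the timings depend on your machine and Python version, so treat them as indicative only.\n",
    "\n",
    "```python\n",
    "import timeit\n",
    "\n",
    "\n",
    "def with_loop(n: int) -> list[int]:\n",
    "    out = []\n",
    "    for i in range(n):\n",
    "        out.append(i * i)\n",
    "    return out\n",
    "\n",
    "\n",
    "def with_comprehension(n: int) -> list[int]:\n",
    "    return [i * i for i in range(n)]\n",
    "\n",
    "\n",
    "# Both approaches must produce the same list\n",
    "same = with_loop(1_000) == with_comprehension(1_000)\n",
    "\n",
    "# Total time for 200 calls of each version\n",
    "t_loop = timeit.timeit(lambda: with_loop(1_000), number=200)\n",
    "t_comp = timeit.timeit(lambda: with_comprehension(1_000), number=200)\n",
    "\n",
    "print(f'same result  : {same}')\n",
    "print(f'loop         : {t_loop:.4f} s')\n",
    "print(f'comprehension: {t_comp:.4f} s')\n",
    "```\n",
    "\n",
    "Whatever the numbers on your machine, readability is the stronger argument: reach for a plain loop once a comprehension needs more than one condition or nested logic."
   ]
  },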
  {
   "cell_type": "markdown",
   "id": "31",
   "metadata": {},
   "source": [
    "A two-clause comprehension flattens a nested collection. Read `[s for batch in batches for s in batch]` left-to-right: \"outer loop, inner loop, collect `s`\":"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "32",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Flatten a nested structure with a two-clause comprehension\n",
    "batches: list[list[float]] = [[85.0, 91.0], [74.0, 88.5], [95.0, 79.0]]\n",
    "flat: list[float] = [s for batch in batches for s in batch]\n",
    "print(f\"Flattened : {flat}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "33",
   "metadata": {},
   "source": [
    "### Dict, Set, and Generator Comprehensions\n",
    "The `[...]` syntax extends to dicts (`{k: v for ...}`), sets (`{expr for ...}`), and lazy generators (`(expr for ...)`):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "34",
   "metadata": {},
   "outputs": [],
   "source": [
    "students: list[dict[str, object]] = [\n",
    "    {\"name\": \"Alice\", \"score\": 92.0, \"major\": \"CS\"},\n",
    "    {\"name\": \"Bob\", \"score\": 74.5, \"major\": \"Math\"},\n",
    "    {\"name\": \"Carol\", \"score\": 88.0, \"major\": \"CS\"},\n",
    "    {\"name\": \"Dan\", \"score\": 61.0, \"major\": \"Physics\"},\n",
    "]\n",
    "\n",
    "# Dict comprehension: build a name -> score lookup\n",
    "score_lookup: dict[str, float] = {str(s[\"name\"]): float(str(s[\"score\"])) for s in students}\n",
    "print(f\"Lookup : {score_lookup}\")\n",
    "\n",
    "# Dict comprehension with filter: honours students only\n",
    "honours: dict[str, float] = {str(s[\"name\"]): float(str(s[\"score\"])) for s in students if float(str(s[\"score\"])) >= 80}\n",
    "print(f\"Honours: {honours}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "35",
   "metadata": {},
   "source": [
    "Set comprehensions deduplicate automatically. Generator expressions compute values **lazily**: they use O(1) memory regardless of input size, making them ideal inside `sum()`, `any()`, and `all()`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "36",
   "metadata": {},
   "outputs": [],
   "source": [
    "students: list[dict[str, object]] = [\n",
    "    {\"name\": \"Alice\", \"score\": 92.0, \"major\": \"CS\"},\n",
    "    {\"name\": \"Bob\", \"score\": 74.5, \"major\": \"Math\"},\n",
    "    {\"name\": \"Carol\", \"score\": 88.0, \"major\": \"CS\"},\n",
    "    {\"name\": \"Dan\", \"score\": 61.0, \"major\": \"Physics\"},\n",
    "]\n",
    "\n",
    "# Set comprehension: unique majors\n",
    "majors: set[str] = {str(s[\"major\"]) for s in students}\n",
    "print(f\"Majors : {sorted(majors)}\")\n",
    "\n",
    "# Generator expression: lazy evaluation; ideal inside sum/any/all\n",
    "total: float = sum(float(str(s[\"score\"])) for s in students)\n",
    "any_fail: bool = any(float(str(s[\"score\"])) < 70 for s in students)\n",
    "all_pass: bool = all(float(str(s[\"score\"])) >= 60 for s in students)\n",
    "\n",
    "print(f\"Mean          : {total / len(students):.1f}\")\n",
    "print(f\"Any fail (<70): {any_fail}\")\n",
    "print(f\"All pass (>=60): {all_pass}\")"
   ]
  },
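  {
   "cell_type": "markdown",
   "id": "36a",
   "metadata": {},
   "source": [
    "To make the memory difference concrete, the sketch below compares a list comprehension with the equivalent generator expression using `sys.getsizeof`. The size `N` and the variable names are arbitrary choices for illustration, and the exact byte counts vary between Python builds.\n",
    "\n",
    "```python\n",
    "import sys\n",
    "\n",
    "N = 100_000\n",
    "\n",
    "as_list = [i * 2 for i in range(N)]  # materialises every element up front\n",
    "as_gen = (i * 2 for i in range(N))   # produces elements on demand\n",
    "\n",
    "# getsizeof measures the container itself, not the integers it refers to\n",
    "list_bytes = sys.getsizeof(as_list)\n",
    "gen_bytes = sys.getsizeof(as_gen)\n",
    "print(f'list container  : {list_bytes:,} bytes')\n",
    "print(f'generator object: {gen_bytes:,} bytes')\n",
    "\n",
    "# Trade-off: a generator can be consumed only once\n",
    "first_total = sum(as_gen)\n",
    "second_total = sum(as_gen)  # already exhausted, so this is 0\n",
    "print(first_total, second_total)\n",
    "```\n",
    "\n",
    "Use a list when you need to index, measure, or iterate the result more than once; use a generator when a single pass is enough."
   ]
  },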
  {
   "cell_type": "markdown",
   "id": "37",
   "metadata": {},
   "source": [
    "<div style='background:#EBF5F0;border-left:5px solid #009E73;padding:14px 18px;border-radius:6px;margin:16px 0'>\n",
    "<span style='color:#065F46;font-weight:bold'><i class=\"bi bi-puzzle-fill\"></i> Activity 7 - Cohort Score Report</span><br><br>\n",
    "<b>Goal:</b> Using a single comprehension for each, produce the outputs below from <code>records</code>.\n",
    "<pre style='background:#FCE8DA;padding:10px;border-radius:4px;font-size:0.9em'>records = [\n",
    "    {'name': 'Alice', 'scores': [88, 92, 85]},\n",
    "    {'name': 'Bob',   'scores': [62, 70, 58]},\n",
    "    {'name': 'Carol', 'scores': [91, 95, 89]},\n",
    "]\n",
    "\n",
    "# 1. List of averages (one float per student)\n",
    "averages = [82.33, 63.33, 91.67]\n",
    "\n",
    "# 2. Dict mapping name -> average (rounded to 2 dp)\n",
    "avg_map = {'Alice': 88.33, 'Bob': 63.33, 'Carol': 91.67}\n",
    "\n",
    "# 3. Set of unique student names who scored >= 80 average\n",
    "top = {'Alice', 'Carol'}</pre>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "38",
   "metadata": {},
   "outputs": [],
   "source": [
    "records: list[dict[str, object]] = [\n",
    "    {\"name\": \"Alice\", \"scores\": [88, 92, 85]},\n",
    "    {\"name\": \"Bob\", \"scores\": [62, 70, 58]},\n",
    "    {\"name\": \"Carol\", \"scores\": [91, 95, 89]},\n",
    "]\n",
    "\n",
    "# TODO: 1. list of averages\n",
    "averages: list[float] = ...\n",
    "\n",
    "# TODO: 2. name -> average dict\n",
    "avg_map: dict[str, float] = ...\n",
    "\n",
    "# TODO: 3. set of names with average >= 80\n",
    "top: set[str] = ...\n",
    "\n",
    "print(f\"averages: {averages}\")\n",
    "print(f\"avg_map : {avg_map}\")\n",
    "print(f\"top     : {top}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "39",
   "metadata": {},
   "source": [
    "## Capstone: Monte Carlo Pi Estimation\n",
    "\n",
    "This activity ties together everything from Part 1 and Part 2: variables, lists, for loops, random numbers, functions, and comprehensions, to estimate the value of π using a simulation technique called **Monte Carlo integration**.\n",
    "\n",
    "### The idea\n",
    "\n",
    "Imagine a unit circle (radius = 1) inscribed in a 2×2 square. A random point `(x, y)` with `x, y ∈ [−1, 1]` falls inside the circle if `x² + y² ≤ 1`.\n",
    "\n",
    "The ratio of the circle's area to the square's area is π/4. If we throw millions of random points and count how many land inside the circle, the proportion converges to π/4, so `π ≈ 4 × (hits / total)`.\n",
    "\n",
    "```\n",
    "  ┌──────────────┐\n",
    "  │    ·  ●  ·   │   ● inside circle  → hit\n",
    "  │  ●       ●   │   · outside        → miss\n",
    "  │    circle    │\n",
    "  │  ●       ●   │\n",
    "  │    ·  ●  ·   │\n",
    "  └──────────────┘\n",
    "   π/4 ≈ hits/total\n",
    "```\n",
    "\n",
    "This is a real technique used in finance, physics, and ML for problems that are too complex to solve analytically."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "40",
   "metadata": {},
   "source": [
    "**Step 1:** Write a helper that checks whether a point is inside the unit circle:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "41",
   "metadata": {},
   "outputs": [],
   "source": [
    "import math\n",
    "\n",
    "\n",
    "def in_unit_circle(x: float, y: float) -> bool:\n",
    "    \"\"\"Return True if (x, y) lies inside the unit circle (radius = 1).\"\"\"\n",
    "    return x**2 + y**2 <= 1.0"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "42",
   "metadata": {},
   "source": [
    "**Step 2:** Simulate random points and count how many land inside the circle. `random.seed()` makes results reproducible. Always set a seed before any simulation:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "43",
   "metadata": {},
   "outputs": [],
   "source": [
    "import random\n",
    "\n",
    "random.seed(42)  # fix seed for reproducibility\n",
    "\n",
    "N_POINTS: int = 1_000_000\n",
    "inside: int = sum(\n",
    "    1\n",
    "    for _ in range(N_POINTS)\n",
    "    if in_unit_circle(random.uniform(-1, 1), random.uniform(-1, 1))  # noqa: S311\n",
    ")\n",
    "\n",
    "pi_estimate: float = 4 * inside / N_POINTS\n",
    "print(f\"Points       : {N_POINTS:,}\")\n",
    "print(f\"Hits (inside): {inside:,}\")\n",
    "print(f\"pi estimate  : {pi_estimate:.5f}\")\n",
    "print(f\"math.pi      : {math.pi:.5f}\")\n",
    "print(f\"Error        : {abs(pi_estimate - math.pi):.5f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "44",
   "metadata": {},
   "source": [
    "**Step 3:** See how the estimate improves as `N` grows: the law of large numbers at work:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "45",
   "metadata": {},
   "outputs": [],
   "source": [
    "import math\n",
    "import random\n",
    "\n",
    "random.seed(0)\n",
    "\n",
    "for n in [100, 1_000, 10_000, 100_000, 1_000_000]:\n",
    "    hits = sum(\n",
    "        1\n",
    "        for _ in range(n)\n",
    "        if in_unit_circle(random.uniform(-1, 1), random.uniform(-1, 1))  # noqa: S311\n",
    "    )\n",
    "    est = 4 * hits / n\n",
    "    error = abs(est - math.pi)\n",
    "    print(f\"  n={n:>9,}  pi={est:.5f}  error={error:.5f}\")"
   ]
  },
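  {
   "cell_type": "markdown",
   "id": "45a",
   "metadata": {},
   "source": [
    "How fast should the error shrink? Each point is an independent hit-or-miss trial with hit probability `p = π/4`, so the estimator `4 * hits / n` has standard deviation `4 * sqrt(p * (1 - p) / n)`. The sketch below tabulates that theoretical scale; `std_error()` is a helper name introduced here for illustration. A single run's actual error fluctuates around this value rather than matching it exactly.\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "\n",
    "def std_error(n: int) -> float:\n",
    "    # Standard deviation of the estimator 4 * hits / n\n",
    "    p = math.pi / 4  # chance that a uniform point in the square is inside the circle\n",
    "    return 4 * math.sqrt(p * (1 - p) / n)\n",
    "\n",
    "\n",
    "for n in [100, 1_000, 10_000, 100_000, 1_000_000]:\n",
    "    print(f'  n={n:>9,}  typical error ~ {std_error(n):.5f}')\n",
    "```\n",
    "\n",
    "The scale falls like `1 / sqrt(n)`: 100 times more points buys only about one extra decimal digit of accuracy."
   ]
  },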
  {
   "cell_type": "markdown",
   "id": "46",
   "metadata": {},
   "source": [
    "<div style='background:#EAF3FA;border-left:5px solid #0369A1;padding:14px 18px;border-radius:6px;margin:16px 0'>\n",
    "<span style='color:#0369A1;font-weight:bold'><i class=\"bi bi-info-circle-fill\"></i> What you just used</span><br><br>\n",
    "<ul>\n",
    "<li><b>Variables & types</b>: <code>N_POINTS: int</code>, <code>pi_estimate: float</code></li>\n",
    "<li><b>Functions</b>: <code>in_unit_circle()</code> with type hints</li>\n",
    "<li><b>for loop</b>: iterating N times, building results</li>\n",
    "<li><b>Comprehension</b>: <code>sum(1 for _ in range(n) if ...)</code></li>\n",
    "<li><b>random module</b>: <code>seed()</code> for reproducibility, <code>uniform()</code> for sampling</li>\n",
    "<li><b>math module</b>: <code>math.pi</code> as ground truth</li>\n",
    "</ul>\n",
    "This exact pattern (sample randomly, count outcomes, estimate a ratio) appears in A/B testing, Bayesian inference, and reinforcement learning.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "47",
   "metadata": {},
   "source": [
    "## Further Reading\n",
    "\n",
    "| Resource | Why it matters |\n",
    "|---|---|\n",
    "| [PEP 636 — Structural Pattern Matching](https://peps.python.org/pep-0636/) | Official tutorial for `match`/`case`, with worked examples from the Python core team |\n",
    "| Ramalho, L. (2022). *Fluent Python*, 2nd ed. O'Reilly. | Chapter 10 covers pattern matching in depth, including class patterns and guards |\n",
    "| [Real Python — Python `for` Loops](https://realpython.com/python-for-loop/) | Clear treatment of `enumerate`, `zip`, and the iterator protocol behind every loop |\n",
    "| [Real Python — List Comprehensions](https://realpython.com/list-comprehension-python/) | When to use comprehensions vs explicit loops, and how to avoid making them unreadable |\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "48",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "| Concept | Key rule |\n",
    "|---|---|\n",
    "| `match`/`case` | Structural pattern matching on values, dicts, lists (3.10+) |\n",
    "| `enumerate` / `zip` | Always prefer these over manual index counters |\n",
    "| `while` / `break` / `continue` | For indefinite loops, early exit, and skipping bad data |\n",
    "| Comprehensions | `[expr for x in it if cond]`; use generators `(...)` inside `sum()` / `any()` / `all()` |\n",
    "\n",
    "**Next:** `03-python-patterns.ipynb`, covering functions, lambdas, `*args`/`**kwargs`, dataclasses, modules, exception handling, and file I/O with `pathlib`."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}