{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0",
   "metadata": {},
   "source": [
    "---\n",
    "title: \"Part 10: Combining, Reshaping & Time Series\"\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1",
   "metadata": {},
   "source": [
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sambaiga/ds-mlops-path/blob/main/tutorials/01-python-basics/10-combining-reshaping.ipynb) [![Download Notebook](https://img.shields.io/badge/Download-Notebook-blue.svg?logo=jupyter&logoColor=white)](https://raw.githubusercontent.com/sambaiga/ds-mlops-path/main/tutorials/01-python-basics/10-combining-reshaping.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2",
   "metadata": {},
   "source": [
    "**DS-MLOps Data Analysis**\n",
    "\n",
    "**Python 3.12+ | Author: Anthony Faustine**\n",
    "\n",
    "## Before you begin\n",
    "\n",
    "This notebook continues from Part 9 (`09-pandas-operations.ipynb`). Real analyses rarely live in one table: a second file with reference data shows up, a result needs splitting by category, and a wide summary table is easier to read than a long one. This part covers four multi-table operations and adds time-indexed data as a fifth theme: concatenation, merging, `groupby`, pivoting, and working with dates and `resample`.\n",
    "\n",
    "Alongside the student exam results, we use a small synthetic table mapping each `school_id` to a region, and a second dataset of daily attendance records covering five schools over a full school term.\n",
    "\n",
    "Part 11 (`11-polars.ipynb`) continues with Polars and the expression API.\n",
    "\n",
    "::: {.callout-note collapse=\"true\" icon=false}\n",
    "## Topics covered\n",
    "\n",
    "| Topic | Why it matters |\n",
    "|---|---|\n",
    "| **`pd.concat`** | Stack DataFrames on top of each other, or side by side |\n",
    "| **`pd.merge`** | Combine two tables on a shared key, the same idea as a SQL join |\n",
    "| **`groupby`** | Split a dataset into groups, apply a function to each, combine the results |\n",
    "| **`pivot_table`** | Reshape a long result into a wide, readable summary table |\n",
    "| **`Timestamp` and `to_datetime`** | Parse text into dates; the building block for all time-indexed work |\n",
    "| **`DatetimeIndex`** | An index made of dates unlocks date-based slicing instead of position-only |\n",
    "| **`resample`** | Change the time granularity of a series: daily to weekly, weekly to monthly |\n",
    ":::\n",
    "\n",
    "> Callout markers used throughout this notebook are explained on the [book cover page](../../index.qmd#callout-guide)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3",
   "metadata": {},
   "source": [
    "::: {.callout-note collapse=\"true\" icon=false}\n",
    "## Learning Objectives\n",
    "\n",
    "By the end of Part 10 you will be able to:\n",
    "\n",
    "| # | Skill | Covered in |\n",
    "|---|---|---|\n",
    "| 1 | Stack DataFrames with `pd.concat`, by row and by column | Sec. 1 |\n",
    "| 2 | Combine two tables on a shared key with `pd.merge`, and choose the right `how` | Sec. 2 |\n",
    "| 2b | Predict row counts after a merge and diagnose unexpected fan-out | Sec. 2 |\n",
    "| 3 | Split a dataset into groups and aggregate each one with `groupby` | Sec. 3 |\n",
    "| 4 | Reshape a grouped result into a wide summary with `pivot_table` | Sec. 4 |\n",
    "| 5 | Parse text into datetime values with `to_datetime` and extract date components | Sec. 5 |\n",
    "| 6 | Build a `DatetimeIndex` and use it to slice a time series by date | Sec. 6 |\n",
    "| 7 | Select rows with a partial date string or a date range | Sec. 7 |\n",
    "| 8 | Change the time granularity of a series with `resample` | Sec. 8 |\n",
    ":::\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "df = pd.read_csv(\"data/university_analytics.csv\")\n",
    "df[\"average_marks\"] = (df[\"midterm_score\"] + df[\"final_score\"] + df[\"project_score\"]) / 3\n",
    "\n",
    "# courses.csv is a thin reference table — one row per course — useful for joins\n",
    "courses = pd.read_csv(\"data/courses.csv\")\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5",
   "metadata": {},
   "source": [
    "## 1. Concatenating DataFrames with `pd.concat`\n",
    "\n",
    "`pd.concat` stacks DataFrames together. By default it stacks rows on top of each other (`axis=0`), the operation you need when the same columns show up in two separate files or two separate batches:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6",
   "metadata": {},
   "outputs": [],
   "source": [
    "male_students = df[df[\"gender\"] == \"M\"]\n",
    "female_students = df[df[\"gender\"] == \"F\"]\n",
    "\n",
    "recombined = pd.concat([male_students, female_students], axis=0)\n",
    "print(f\"male: {len(male_students)}, female: {len(female_students)}, combined: {len(recombined)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7",
   "metadata": {},
   "source": [
    "Passing `axis=1` stacks columns side by side instead, matching rows up by index. This is the shape you get when a calculation is done separately and needs joining back onto the original table:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8",
   "metadata": {},
   "outputs": [],
   "source": [
    "pass_fail = (df[\"average_marks\"] >= 0.6).rename(\"passed\")\n",
    "with_pass_fail = pd.concat([df[[\"student_id\", \"average_marks\"]], pass_fail], axis=1)\n",
    "with_pass_fail.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9",
   "metadata": {},
   "source": [
    "<div style='background:#FEF2F2;border-left:5px solid #DC2626;padding:14px 18px;border-radius:6px;margin:16px 0'>\n",
    "<span style='color:#991B1B;font-weight:bold'><i class=\"bi bi-bug-fill\"></i> Common Mistake: Concatenating by column without matching indexes first</span><br><br>\n",
    "<code>pd.concat([df_a, df_b], axis=1)</code> lines rows up by index position, not by any shared key. If <code>df_a</code> and <code>df_b</code> were filtered, sorted, or reset differently beforehand, row 0 in one might not be row 0 in the other, and the result silently combines the wrong rows. <code>pd.merge</code> (Sec. 2) is the safer choice whenever there is an actual key column to join on.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "10",
   "metadata": {},
   "source": [
    "<div style='background:#EBF5F0;border-left:5px solid #009E73;padding:14px 18px;border-radius:6px;margin:16px 0'>\n",
    "<span style='color:#065F46;font-weight:bold'><i class=\"bi bi-puzzle-fill\"></i> Activity 1 - Split and Recombine</span><br><br>\n",
    "\n",
    "<b>Goal:</b> Split <code>df</code> into two DataFrames by <code>caste</code>: one for <code>\"BC\"</code> and one for everything else, using <code>.isin()</code> or a boolean mask. Concatenate them back with <code>pd.concat</code> and confirm the row count matches the original.\n",
    "<pre style='background:#FFF8E1;padding:10px;border-radius:4px;font-size:0.9em'>bc_only = df[df[\"program\"] == \"BC\"]\n",
    "not_bc = df[df[\"program\"] != \"BC\"]\n",
    "recombined = pd.concat([bc_only, not_bc], axis=0)\n",
    "len(recombined) == len(df)</pre>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "11",
   "metadata": {},
   "outputs": [],
   "source": [
    "# TODO: split df by program == \"Data Science\", concat back, and confirm the row count matches\n",
    "..."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "12",
   "metadata": {},
   "source": [
    "## 2. Joining Data with `pd.merge`\n",
    "\n",
    "`pd.merge` combines two tables on a shared key, the same idea as a SQL join. The `courses` table loaded above has one row per `course_id`; merging it onto `df` attaches metadata (credits, department, instructor) to every enrollment row:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "13",
   "metadata": {},
   "outputs": [],
   "source": [
    "df_with_meta = pd.merge(df, courses, on=\"course_id\", how=\"left\")\n",
    "df_with_meta[[\"student_id\", \"course_id\", \"course_x\", \"department\", \"credits\"]].head()"
   ]
  },
  {
   "cell_type": "raw",
   "id": "14",
   "metadata": {
    "raw_mimetype": "text/markdown"
   },
   "source": [
    "> **Join types: which rows survive each merge strategy**\n",
    "\n",
    "```{mermaid}\n",
    "flowchart LR\n",
    "    subgraph LEFT [\"Left: students\"]\n",
    "        LR1[\"S001 | Alice\"]\n",
    "        LR2[\"S002 | Bob\"]\n",
    "        LR3[\"S003 | Carol\"]\n",
    "    end\n",
    "    subgraph RIGHT [\"Right: grades\"]\n",
    "        RR1[\"S001 | 85\"]\n",
    "        RR2[\"S002 | 72\"]\n",
    "        RR4[\"S004 | 90\"]\n",
    "    end\n",
    "    LR1 & RR1 --> INN[\"inner: S001 (match only)\"]\n",
    "    LR2 & RR2 --> INN2[\"inner: S002 (match only)\"]\n",
    "    LR1 & LR2 & LR3 --> LEFT2[\"left: S001, S002, S003\n",
    "(Carol grade = NaN)\"]\n",
    "    RR1 & RR2 & RR4 --> RIGHT2[\"right: S001, S002, S004\n",
    "(Alice/Bob keep grade, S004 name = NaN)\"]\n",
    "    LR1 & LR2 & LR3 & RR4 --> OUTER[\"outer: all 4 students, NaN where no match\"]\n",
    "\n",
    "    style INN fill:#EBF5F0,stroke:#059669,color:#065F46\n",
    "    style INN2 fill:#EBF5F0,stroke:#059669,color:#065F46\n",
    "    style LEFT2 fill:#EAF3FA,stroke:#0369A1,color:#0C4A6E\n",
    "    style RIGHT2 fill:#F5F3FF,stroke:#7C3AED,color:#3B0764\n",
    "    style OUTER fill:#FEF2F2,stroke:#DC2626,color:#991B1B\n",
    "```\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "15",
   "metadata": {},
   "source": [
    "The `how` argument decides which rows survive when a key on one side has no match on the other. `courses` here has a row for every `course_id` in `df`, so every `how` gives the same result; the difference only shows up once the two tables disagree:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "16",
   "metadata": {},
   "source": [
    "<div style='background:#EAF7F0;border-left:5px solid #059669;padding:14px 18px;border-radius:6px;margin:16px 0'>\n",
    "<span style='color:#059669;font-weight:bold'><i class=\"bi bi-journal-code\"></i> Example: When <code>how</code> actually changes the result</span><br><br>\n",
    "Merging <code>df</code> against only half of <code>courses</code> with <code>how=\"inner\"</code> keeps only enrollments whose course is in that half. The same merge with <code>how=\"left\"</code> keeps every student, with <code>region</code> set to <code>NaN</code> wherever the school was missing from the smaller table.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "17",
   "metadata": {},
   "outputs": [],
   "source": [
    "half_courses = courses[courses[\"department\"] == \"Computing\"]\n",
    "\n",
    "inner_result = pd.merge(df, half_courses, on=\"course_id\", how=\"inner\")\n",
    "left_result = pd.merge(df, half_courses, on=\"course_id\", how=\"left\")\n",
    "missing = left_result[\"department\"].isna().sum()\n",
    "print(f\"inner: {len(inner_result)} rows, left: {len(left_result)} rows, unmatched dept: {missing}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "18",
   "metadata": {},
   "source": [
    "The four join types differ only in which rows they keep, never in how rows are matched:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "19",
   "metadata": {},
   "outputs": [],
   "source": [
    "from ark.plot.diagrams import merge_join_types_diagram\n",
    "\n",
    "merge_join_types_diagram();"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "20",
   "metadata": {},
   "source": [
    "<div style='background:#FEF2F2;border-left:5px solid #DC2626;padding:14px 18px;border-radius:6px;margin:16px 0'>\n",
    "<span style='color:#991B1B;font-weight:bold'><i class=\"bi bi-bug-fill\"></i> Common Mistake: Forgetting that an unmatched key produces <code>NaN</code>, not a dropped row</span><br><br>\n",
    "With <code>how=\"left\"</code>, an enrollment whose <code>course_id</code> has no match in <code>courses</code> still gets a row, just with <code>NaN</code> in every column that came from <code>courses</code>. Code that assumes every row has a real <code>region</code> after a left merge will silently miscount or mis-group those rows later, usually in a <code>groupby</code> a few cells down.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "21",
   "metadata": {},
   "source": [
    "### Duplicate keys: when merge produces more rows than you expect\n",
    "\n",
    "Every example so far used a key where the right table had at most one match per left row (many-to-one). When the right table has more than one match, the result has more rows than either input, and this is correct but often surprising.\n",
    "\n",
    "Concrete example: if a student is enrolled in three courses, a left merge of the enrollment table against a course details table produces three rows for that student, one per course. That is the right answer, but code that assumed row counts stay the same will silently operate on a bigger table."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "22",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a small example: two students, one appearing in two courses\n",
    "students_small = pd.DataFrame(\n",
    "    {\n",
    "        \"student_id\": [\"S0001\", \"S0001\", \"S0002\"],\n",
    "        \"course_id\": [\"C01\", \"C02\", \"C01\"],\n",
    "        \"score\": [85, 72, 91],\n",
    "    }\n",
    ")\n",
    "\n",
    "course_info = pd.DataFrame(\n",
    "    {\n",
    "        \"course_id\": [\"C01\", \"C02\"],\n",
    "        \"course_name\": [\"Statistics\", \"Algorithms\"],\n",
    "        \"credits\": [3, 4],\n",
    "    }\n",
    ")\n",
    "\n",
    "merged_dup = pd.merge(students_small, course_info, on=\"course_id\", how=\"left\")\n",
    "print(f\"students_small rows: {len(students_small)}\")\n",
    "print(f\"after merge rows   : {len(merged_dup)}\")\n",
    "merged_dup"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "23",
   "metadata": {},
   "source": [
    "<div style='background:#EAF3FA;border-left:5px solid #0369A1;padding:14px 18px;border-radius:6px;margin:16px 0'>\n",
    "<span style='color:#0369A1;font-weight:bold'><i class=\"bi bi-info-circle-fill\"></i> Key Concept: A many-to-one merge preserves row count; a one-to-many merge multiplies it</span><br><br>\n",
    "<b>Many-to-one (the common case):</b> each row on the left matches at most one row on the right. Row count stays the same.<br><br>\n",
    "<b>One-to-many:</b> one row on the left matches several rows on the right. Each match becomes its own output row. A student enrolled in 3 courses becomes 3 rows after merging on a course-detail table.<br><br>\n",
    "<b>Many-to-many:</b> both sides have duplicates on the key. Every left row pairs with every matching right row: 3 left rows matching 4 right rows produces 12 output rows. This is almost never what you want and is usually a sign of a design error in the data or the query.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "24",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Verify: check how many rows the real df gains after merging on course_id\n",
    "before = len(df)\n",
    "after = len(pd.merge(df, courses, on=\"course_id\", how=\"left\"))\n",
    "print(f\"Before merge: {before} rows\")\n",
    "print(f\"After  merge: {after} rows\")\n",
    "print(f\"Difference  : {after - before} extra rows (students with multiple course matches)\")\n",
    "\n",
    "# Diagnose: how many course_ids appear more than once in courses?\n",
    "dupe_courses = courses[courses.duplicated(\"course_id\", keep=False)]\n",
    "print(f\"\\nDuplicate course_ids in courses.csv: {dupe_courses['course_id'].nunique()}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "25",
   "metadata": {},
   "source": [
    "<div style='background:#FEF2F2;border-left:5px solid #DC2626;padding:14px 18px;border-radius:6px;margin:16px 0'>\n",
    "<span style='color:#991B1B;font-weight:bold'><i class=\"bi bi-bug-fill\"></i> Common Mistake: Summing a column after an unexpected fan-out</span><br><br>\n",
    "If <code>df</code> has one row per student and <code>groupby(\"student_id\")[\"score\"].sum()</code> gives the right answer before a merge, but the merge doubles some rows, the same <code>sum()</code> after the merge silently counts those scores twice. The most reliable diagnostic is to compare <code>len(df)</code> before and after any merge. A row count increase without a planned one-to-many join is a bug, not a feature.\n",
    "<pre style='background:#F4F5F6;padding:10px;border-radius:4px;font-size:0.9em'># Always verify after merging\n",
    "before = len(df)\n",
    "merged = pd.merge(df, other_table, on=\"key\", how=\"left\")\n",
    "assert len(merged) == before, f\"Unexpected fan-out: {before} -> {len(merged)} rows\"</pre>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "26",
   "metadata": {},
   "source": [
    "<div style='background:#EBF5F0;border-left:5px solid #009E73;padding:14px 18px;border-radius:6px;margin:16px 0'>\n",
    "<span style='color:#065F46;font-weight:bold'><i class=\"bi bi-puzzle-fill\"></i> Activity 2 - Count Enrollments Per Department</span><br><br>\n",
    "\n",
    "<b>Goal:</b> Merge <code>df</code> with <code>courses</code> using <code>how=\"left\"</code>, then use <code>.value_counts()</code> on the resulting <code>department</code> column.egion</code> column to see how many student rows fall in each region.\n",
    "<pre style='background:#FFF8E1;padding:10px;border-radius:4px;font-size:0.9em'>merged = pd.merge(df, courses, on=\"course_id\", how=\"left\")\n",
    "merged[\"department\"].value_counts()</pre>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "27",
   "metadata": {},
   "outputs": [],
   "source": [
    "# TODO: merge df with courses (how=\"left\"), then value_counts on department\n",
    "..."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "28",
   "metadata": {},
   "source": [
    "## 3. Group By Operations\n",
    "\n",
    "`groupby` splits a dataset into groups by a key, applies a function to each group independently, and combines the results back into one table, one row per group:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "29",
   "metadata": {},
   "outputs": [],
   "source": [
    "from ark.plot.diagrams import groupby_split_apply_combine_diagram\n",
    "\n",
    "groupby_split_apply_combine_diagram();"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "30",
   "metadata": {},
   "outputs": [],
   "source": [
    "df.groupby(\"program\")[\"average_marks\"].mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "31",
   "metadata": {},
   "source": [
    "<div style='background:#EAF3FA;border-left:5px solid #0369A1;padding:14px 18px;border-radius:6px;margin:16px 0'>\n",
    "<span style='color:#0369A1;font-weight:bold'><i class=\"bi bi-info-circle-fill\"></i> Key Concept: <code>groupby</code> computes nothing until you aggregate</span><br><br>\n",
    "<code>df.groupby(\"program\")</code> on its own returns a <code>DataFrameGroupBy</code> object, a plan for splitting the data, not a result. Nothing is actually computed until a method like <code>.mean()</code>, <code>.sum()</code>, or <code>.agg()</code> is called on it, the same lazy-then-compute pattern you will see again in Part 11's Polars notebook.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "32",
   "metadata": {},
   "source": [
    "`.agg()` works after `groupby` exactly as it did in Part 2, computing several statistics per group in one call:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "33",
   "metadata": {},
   "outputs": [],
   "source": [
    "df.groupby(\"program\")[\"average_marks\"].agg([\"mean\", \"std\", \"count\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "34",
   "metadata": {},
   "source": [
    "Grouping by more than one column splits into one group per combination, here gender within caste:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "35",
   "metadata": {},
   "outputs": [],
   "source": [
    "df.groupby([\"program\", \"gender\"])[\"average_marks\"].mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "36",
   "metadata": {},
   "source": [
    "<div style='background:#F5F3FF;border-left:5px solid #7C3AED;padding:14px 18px;border-radius:6px;margin:16px 0'>\n",
    "<span style='color:#5B21B6;font-weight:bold'><i class=\"bi bi-lightbulb-fill\"></i> Pro Tip: <code>observed=True</code> when grouping a <code>category</code> column</span><br><br>\n",
    "Grouping a column with <code>category</code> dtype (Part 2, Sec. 2) by default includes every category the dtype knows about, even ones with zero rows in the current data, padding the result with empty groups. Passing <code>observed=True</code> to <code>groupby</code> keeps only the categories that actually appear.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "37",
   "metadata": {},
   "source": [
    "<div style='background:#EBF5F0;border-left:5px solid #009E73;padding:14px 18px;border-radius:6px;margin:16px 0'>\n",
    "<span style='color:#065F46;font-weight:bold'><i class=\"bi bi-puzzle-fill\"></i> Activity 3 - Dropout Rate by Caste</span><br><br>\n",
    "\n",
    "<b>Goal:</b> Group <code>df</code> by <code>caste</code> and compute, for each group, what fraction of students have <code>continue_drop == \"drop\"</code>. <code>(series == \"drop\").mean()</code> gives a fraction directly, no separate count and divide needed.\n",
    "<pre style='background:#FFF8E1;padding:10px;border-radius:4px;font-size:0.9em'>df.groupby(\"program\")[\"continue_drop\"].apply(lambda s: (s == \"drop\").mean())</pre>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "38",
   "metadata": {},
   "outputs": [],
   "source": [
    "# TODO: group by program, compute the pass fraction per program\n",
    "..."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "39",
   "metadata": {},
   "source": [
    "## 4. Pivoting Data\n",
    "\n",
    "`pivot_table` is `groupby` plus a reshape, in one call: it groups by one column, splits further by another, and lays the second grouping out as columns instead of stacking it into more rows, much easier to scan as a summary:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "40",
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.pivot_table(df, index=\"program\", columns=\"gender\", values=\"average_marks\", aggfunc=\"mean\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "41",
   "metadata": {},
   "source": [
    "<div style='background:#EAF3FA;border-left:5px solid #0369A1;padding:14px 18px;border-radius:6px;margin:16px 0'>\n",
    "<span style='color:#0369A1;font-weight:bold'><i class=\"bi bi-info-circle-fill\"></i> Key Concept: <code>pivot_table</code> is <code>groupby</code> with a different output shape</span><br><br>\n",
    "<code>df.groupby([\"program\", \"gender\"])[\"average_marks\"].mean()</code> from Sec. 3 and the <code>pivot_table</code> call above compute the exact same numbers. <code>groupby</code> returns them stacked into a long Series with a two-level index; <code>pivot_table</code> lays the second key out across columns instead. Reach for <code>pivot_table</code> specifically when the result is meant to be read by a person, not processed further by code.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "42",
   "metadata": {},
   "source": [
    "`aggfunc` accepts a list too, giving more than one statistic per cell:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "43",
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.pivot_table(df, index=\"program\", columns=\"gender\", values=\"average_marks\", aggfunc=[\"mean\", \"count\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "44",
   "metadata": {},
   "source": [
    "<div style='background:#EBF5F0;border-left:5px solid #009E73;padding:14px 18px;border-radius:6px;margin:16px 0'>\n",
    "<span style='color:#065F46;font-weight:bold'><i class=\"bi bi-puzzle-fill\"></i> Activity 4 - Region by Program Pivot</span><br><br>\n",
    "\n",
    "<b>Goal:</b> Merge <code>df</code> with <code>courses</code> (Sec. 2), then build a pivot table with <code>region</code> as the index, <code>program</code> as the columns, and the mean <code>average_marks</code> in each cell.\n",
    "<pre style='background:#FFF8E1;padding:10px;border-radius:4px;font-size:0.9em'>merged = pd.merge(df, courses, on=\"course_id\", how=\"left\")\n",
    "pd.pivot_table(merged, index=\"region\", columns=\"program\", values=\"average_marks\", aggfunc=\"mean\")</pre>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "45",
   "metadata": {},
   "outputs": [],
   "source": [
    "# TODO: merge with courses, then pivot region x program on mean final_score\n",
    "..."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "46",
   "metadata": {},
   "source": [
    "## Capstone: Regional Performance Report\n",
    "\n",
    "Combine every operation from this notebook into one report: a merge to bring in region, a groupby to summarize, and a pivot to lay the summary out for reading."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "47",
   "metadata": {},
   "source": [
    "<div style='background:#EBF5F0;border-left:5px solid #009E73;padding:14px 18px;border-radius:6px;margin:16px 0'>\n",
    "<span style='color:#065F46;font-weight:bold'><i class=\"bi bi-puzzle-fill\"></i> Capstone Exercise - Regional Performance Report</span><br><br>\n",
    "\n",
    "<b>Goal:</b>\n",
    "<ol>\n",
    "<li>Merge <code>df</code> with <code>courses</code> on <code>course_id</code>, keeping every student (Sec. 2)</li>\n",
    "<li>Group the merged table by <code>region</code> and compute the mean <code>average_marks</code> and the drop fraction, using the function from Activity 3 (Sec. 3)</li>\n",
    "<li>Build a pivot table with <code>region</code> as rows and <code>caste</code> as columns, mean <code>average_marks</code> in each cell (Sec. 4)</li>\n",
    "</ol>\n",
    "<pre style='background:#FFF8E1;padding:10px;border-radius:4px;font-size:0.9em'>merged = pd.merge(df, courses, on=\"course_id\", how=\"left\")\n",
    "\n",
    "regional_summary = merged.groupby(\"region\").agg(\n",
    "    mean_marks=(\"average_marks\", \"mean\"),\n",
    "    drop_fraction=(\"continue_drop\", lambda s: (s == \"drop\").mean()),\n",
    ")\n",
    "\n",
    "region_caste_pivot = pd.pivot_table(merged, index=\"region\", columns=\"program\", values=\"average_marks\", aggfunc=\"mean\")</pre>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "48",
   "metadata": {},
   "outputs": [],
   "source": [
    "# TODO: build the regional performance report described above\n",
    "..."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "49",
   "metadata": {},
   "source": [
    "## 5. Time Series Data\n",
    "\n",
    "The student exam results dataset has no date column. The attendance dataset does: daily records for five schools across a full school term. Loading it alongside the exam data gives Part 10 two tables to work with and shows how the same pandas idioms scale to a new problem shape."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "50",
   "metadata": {},
   "outputs": [],
   "source": [
    "attendance = pd.read_csv(\"data/daily_attendance.csv\")\n",
    "attendance.dtypes"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "51",
   "metadata": {},
   "source": [
    "## 5. Date and Time Data Types\n",
    "\n",
    "The `date` column above read in as plain text, `str` dtype, not a date pandas can do arithmetic on. `pd.to_datetime` converts it to pandas' dedicated datetime dtype, `datetime64`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "52",
   "metadata": {},
   "outputs": [],
   "source": [
    "attendance[\"date\"] = pd.to_datetime(attendance[\"date\"])\n",
    "attendance.dtypes"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "53",
   "metadata": {},
   "source": [
    "<div style='background:#EAF3FA;border-left:5px solid #0369A1;padding:14px 18px;border-radius:6px;margin:16px 0'>\n",
    "<span style='color:#0369A1;font-weight:bold'><i class=\"bi bi-info-circle-fill\"></i> Key Concept: Pandas 3 infers the resolution it needs, not always nanoseconds</span><br><br>\n",
    "Earlier pandas versions always stored datetimes as <code>datetime64[ns]</code>, nanosecond precision, whether the data needed it or not. Pandas 3's <code>to_datetime</code> infers a resolution from what is actually in the data: day-level strings like the ones here become <code>datetime64[s]</code> or coarser, not nanoseconds. <code>attendance[\"date\"].dtype</code> shows whichever resolution was inferred for this column.\n",
    "</div>"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "54",
   "metadata": {},
   "source": [
    "A single value out of a datetime column is a `Timestamp`, pandas' equivalent of Python's `datetime.datetime`, with the same year, month, day, and weekday attributes:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "55",
   "metadata": {},
   "outputs": [],
   "source": [
    "first_day = attendance[\"date\"].iloc[0]\n",
    "print(type(first_day))\n",
    "print(f\"year={first_day.year}, month={first_day.month}, day_name={first_day.day_name()}\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "56",
   "metadata": {},
   "source": [
    "<div style='background:#EBF5F0;border-left:5px solid #009E73;padding:14px 18px;border-radius:6px;margin:16px 0'>\n",
    "<span style='color:#065F46;font-weight:bold'><i class=\"bi bi-puzzle-fill\"></i> Activity 5 - Parse and Inspect</span><br><br>\n",
    "\n",
    "<b>Goal:</b> Convert a list of three date strings, <code>[\"2025-01-06\", \"2025-02-14\", \"2025-03-28\"]</code>, to <code>Timestamp</code> values with <code>pd.to_datetime</code>, then print the day name of each one.\n",
    "<pre style='background:#FFF8E1;padding:10px;border-radius:4px;font-size:0.9em'>dates = pd.to_datetime([\"2025-01-06\", \"2025-02-14\", \"2025-03-28\"])\n",
    "for d in dates:\n",
    "    print(d.day_name())</pre>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "57",
   "metadata": {},
   "outputs": [],
   "source": [
    "# TODO: convert the three date strings and print each day name\n",
    "..."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "58",
   "metadata": {},
   "source": [
    "## 6. The `DatetimeIndex`\n",
    "\n",
    "Setting a datetime column as the index turns it into a `DatetimeIndex`, which unlocks date-based slicing instead of only position-based or exact-label lookups. Each school's rows are pulled out first, since a `DatetimeIndex` only makes sense for one time series at a time:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "59",
   "metadata": {},
   "outputs": [],
   "source": [
    "school_300 = attendance[attendance[\"school_id\"] == 300].set_index(\"date\")\n",
    "school_300.index"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "60",
   "metadata": {},
   "source": [
    "<div style='background:#FEF2F2;border-left:5px solid #DC2626;padding:14px 18px;border-radius:6px;margin:16px 0'>\n",
    "<span style='color:#991B1B;font-weight:bold'><i class=\"bi bi-bug-fill\"></i> Common Mistake: Setting the index before filtering to one entity</span><br><br>\n",
    "<code>attendance.set_index(\"date\")</code> on the full table produces a <code>DatetimeIndex</code> with the same date repeated once per school, since every school has a row for every day. Slicing that index by date then returns rows from every school mixed together for that date, not a clean single time series. Filter to one entity first, exactly as <code>school_300</code> does above, then set the index.\n",
    "</div>"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "61",
   "metadata": {},
   "source": [
    "<div style='background:#EBF5F0;border-left:5px solid #009E73;padding:14px 18px;border-radius:6px;margin:16px 0'>\n",
    "<span style='color:#065F46;font-weight:bold'><i class=\"bi bi-puzzle-fill\"></i> Activity 6 - Build Another School's Series</span><br><br>\n",
    "\n",
    "<b>Goal:</b> Filter <code>attendance</code> to <code>school_id == 302</code>, set <code>date</code> as the index, and confirm the result's index is a <code>DatetimeIndex</code> with <code>isinstance(result.index, pd.DatetimeIndex)</code>.\n",
    "<pre style='background:#FFF8E1;padding:10px;border-radius:4px;font-size:0.9em'>school_302 = attendance[attendance[\"school_id\"] == 302].set_index(\"date\")\n",
    "isinstance(school_302.index, pd.DatetimeIndex)</pre>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "62",
   "metadata": {},
   "outputs": [],
   "source": [
    "# TODO: filter to school_id 302, set date as index, confirm DatetimeIndex\n",
    "..."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "63",
   "metadata": {},
   "source": [
    "## 7. Selecting Data from a Time Series\n",
    "\n",
    "A `DatetimeIndex` accepts a partial date string in `.loc`, matching every row that falls inside it. `\"2025-02\"` selects the whole month without spelling out the first and last day:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "64",
   "metadata": {},
   "outputs": [],
   "source": [
    "school_300.loc[\"2025-02\"].head()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "65",
   "metadata": {},
   "source": [
    "A slice with two partial dates selects everything between them, inclusive of both ends:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "66",
   "metadata": {},
   "outputs": [],
   "source": [
    "school_300.loc[\"2025-02-01\":\"2025-02-07\"]"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "67",
   "metadata": {},
   "source": [
    "<div style='background:#EAF7F0;border-left:5px solid #059669;padding:14px 18px;border-radius:6px;margin:16px 0'>\n",
    "<span style='color:#059669;font-weight:bold'><i class=\"bi bi-journal-code\"></i> Example: Comparing the size of two date ranges</span><br><br>\n",
    "<code>len(school_300.loc[\"2025-01\"])</code> against <code>len(school_300.loc[\"2025-02\"])</code> confirms the row count for each month matches its number of business days, the same `bdate_range` weekday-only pattern used to build this dataset in the first place.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "68",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(f\"January business days  : {len(school_300.loc['2025-01'])}\")\n",
    "print(f\"February business days : {len(school_300.loc['2025-02'])}\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "69",
   "metadata": {},
   "source": [
    "<div style='background:#EBF5F0;border-left:5px solid #009E73;padding:14px 18px;border-radius:6px;margin:16px 0'>\n",
    "<span style='color:#065F46;font-weight:bold'><i class=\"bi bi-puzzle-fill\"></i> Activity 7 - Filter the Last Two Weeks of Term</span><br><br>\n",
    "\n",
    "<b>Goal:</b> Select every row in <code>school_300</code> from <code>\"2025-03-15\"</code> to <code>\"2025-03-28\"</code> inclusive, and print the mean <code>attendance_rate</code> over that range.\n",
    "<pre style='background:#FFF8E1;padding:10px;border-radius:4px;font-size:0.9em'>last_two_weeks = school_300.loc[\"2025-03-15\":\"2025-03-28\"]\n",
    "last_two_weeks[\"attendance_rate\"].mean()</pre>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "70",
   "metadata": {},
   "outputs": [],
   "source": [
    "# TODO: select 2025-03-15 through 2025-03-28 and print the mean attendance_rate\n",
    "..."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "71",
   "metadata": {},
   "source": [
    "## 8. The Power of Pandas: `resample`\n",
    "\n",
    "`resample` changes the time granularity of a series: daily data summarized into weekly or monthly figures, the same split-apply-combine idea from Part 3's `groupby`, except the groups are time intervals instead of category values:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "72",
   "metadata": {},
   "outputs": [],
   "source": [
    "weekly_attendance = school_300[\"attendance_rate\"].resample(\"W\").mean()\n",
    "weekly_attendance.head()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "73",
   "metadata": {},
   "source": [
    "<div style='background:#EAF3FA;border-left:5px solid #0369A1;padding:14px 18px;border-radius:6px;margin:16px 0'>\n",
    "<span style='color:#0369A1;font-weight:bold'><i class=\"bi bi-info-circle-fill\"></i> Key Concept: <code>resample</code> groups by time interval, <code>groupby</code> groups by value</span><br><br>\n",
    "<code>df.groupby(\"program\")</code> (Part 3) splits rows by whatever value is already in the <code>caste</code> column. <code>series.resample(\"W\")</code> splits rows by which week their <code>DatetimeIndex</code> label falls into, intervals that did not exist as a column at all until <code>resample</code> created them. Both still end with an aggregation like <code>.mean()</code> to combine each group into one number.\n",
    "</div>"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "74",
   "metadata": {},
   "source": [
    "Monthly resampling on the same series needs only a different frequency string. In pandas 3 the old `\"M\"` alias was removed; the replacement is `\"ME\"` (month-end), which anchors each bucket to the last calendar day of the month:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "75",
   "metadata": {},
   "outputs": [],
   "source": [
    "monthly_attendance = school_300[\"attendance_rate\"].resample(\"ME\").mean()\n",
    "monthly_attendance"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "76",
   "metadata": {},
   "source": [
    "<div style='background:#F5F3FF;border-left:5px solid #7C3AED;padding:14px 18px;border-radius:6px;margin:16px 0'>\n",
    "<span style='color:#5B21B6;font-weight:bold'><i class=\"bi bi-lightbulb-fill\"></i> Pro Tip: Resampling the whole DataFrame keeps every entity separate, with care</span><br><br>\n",
    "<code>attendance.set_index(\"date\").groupby(\"school_id\")[\"attendance_rate\"].resample(\"W\").mean()</code> resamples each school's series independently in one call, instead of looping over schools and resampling each one by hand. <code>groupby</code> before <code>resample</code> is what keeps the schools from being averaged together.\n",
    "</div>"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "77",
   "metadata": {},
   "source": [
    "<div style='background:#EBF5F0;border-left:5px solid #009E73;padding:14px 18px;border-radius:6px;margin:16px 0'>\n",
    "<span style='color:#065F46;font-weight:bold'><i class=\"bi bi-puzzle-fill\"></i> Activity 8 - Monthly Comparison Across Schools</span><br><br>\n",
    "\n",
    "<b>Goal:</b> Set <code>date</code> as the index on the full <code>attendance</code> table, group by <code>school_id</code>, and resample to monthly means in one chained call. Use <code>\"ME\"</code> (month-end) — the pandas 3 replacement for the old <code>\"M\"</code> alias.\n",
    "<pre style='background:#FFF8E1;padding:10px;border-radius:4px;font-size:0.9em'>attendance.set_index(\"date\").groupby(\"school_id\")[\"attendance_rate\"].resample(\"ME\").mean()</pre>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "78",
   "metadata": {},
   "outputs": [],
   "source": [
    "# TODO: set date as index, group by school_id, resample monthly, take the mean\n",
    "..."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "79",
   "metadata": {},
   "source": [
    "## Capstone: Term-End Attendance Report\n",
    "\n",
    "Combine every operation from this notebook: parsing dates, building a per-school time series, slicing by date, and resampling, into one short report comparing the start and end of term."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "80",
   "metadata": {},
   "source": [
    "<div style='background:#EBF5F0;border-left:5px solid #009E73;padding:14px 18px;border-radius:6px;margin:16px 0'>\n",
    "<span style='color:#065F46;font-weight:bold'><i class=\"bi bi-puzzle-fill\"></i> Capstone Exercise - Term-End Attendance Report</span><br><br>\n",
    "\n",
    "<b>Goal:</b>\n",
    "<ol>\n",
    "<li>Set <code>date</code> as the index on the full <code>attendance</code> table (Sec. 2)</li>\n",
    "<li>Group by <code>school_id</code> and resample to weekly means (Sec. 4)</li>\n",
    "<li>From the result, select the first week of January and the last week of March for every school, using a partial date string (Sec. 3)</li>\n",
    "<li>Report which school had the largest drop in attendance between those two weeks</li>\n",
    "</ol>\n",
    "<pre style='background:#FFF8E1;padding:10px;border-radius:4px;font-size:0.9em'>weekly = attendance.set_index(\"date\").groupby(\"school_id\")[\"attendance_rate\"].resample(\"W\").mean()\n",
    "\n",
    "first_week = weekly.loc[:, \"2025-01-06\":\"2025-01-12\"]\n",
    "last_week = weekly.loc[:, \"2025-03-24\":\"2025-03-28\"]\n",
    "drop_per_school = first_week.groupby(\"school_id\").mean() - last_week.groupby(\"school_id\").mean()\n",
    "drop_per_school.sort_values(ascending=False)</pre>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "81",
   "metadata": {},
   "outputs": [],
   "source": [
    "# TODO: build the term-end attendance report described above\n",
    "..."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "82",
   "metadata": {},
   "source": [
    "## Further Reading\n",
    "\n",
    "| Resource | Why it matters |\n",
    "|---|---|\n",
    "| McKinney, W. (2022). *Python for Data Analysis*, 3rd ed. O'Reilly. | Chapter 8 (data wrangling) and Chapter 10 (aggregation) are the canonical references for `concat`, `merge`, and `groupby` |\n",
    "| [pandas documentation — Merge, join, concatenate and compare](https://pandas.pydata.org/docs/user_guide/merging.html) | Full API reference with worked examples for every `how=` variant and join edge-case |\n",
    "| Wickham, H. (2014). [Tidy data](https://doi.org/10.18637/jss.v059.i10). *Journal of Statistical Software* 59(10). | Free PDF — defines when data is \"tidy\" and explains `.melt()` / `.pivot()` as transformations toward tidy form |\n",
    "| [pandas documentation — Reshaping and pivot tables](https://pandas.pydata.org/docs/user_guide/reshaping.html) | `.pivot_table()`, `.stack()`, `.unstack()`, and `.crosstab()` in one place |\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "83",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "| Concept | Key rule |\n",
    "|---|---|\n",
    "| `pd.concat(axis=0)` | Stack rows; use when the same columns appear in separate batches |\n",
    "| `pd.concat(axis=1)` | Stack columns by index position, not by key; prefer `merge` when there is a real key |\n",
    "| `pd.merge(..., on=key, how=...)` | Combine two tables on a shared key, the same idea as a SQL join |\n",
    "| `how=\"inner\"` | Keep only rows with a match on both sides |\n",
    "| `how=\"left\"` | Keep every row from the left table; unmatched rows get `NaN` |\n",
    "| Many-to-one merge | Row count stays the same; each left row matches at most one right row |\n",
    "| One-to-many merge | Row count grows; each left row can match multiple right rows |\n",
    "| Fan-out check | Always compare `len(before)` vs `len(after)` when a row count increase is not expected |\n",
    "| `groupby(...)` | Returns a plan, not a result; nothing computes until you aggregate |\n",
    "| `.agg([...])` after `groupby` | Several statistics per group, in one call |\n",
    "| `pivot_table` | `groupby` with the second key laid out as columns instead of stacked as rows |\n",
    "| `pd.to_datetime` | Parses text into pandas' `datetime64` dtype; pandas 3 infers the resolution |\n",
    "| `Timestamp` | A single datetime value, with `.year`, `.month`, `.day_name()`, and similar attributes |\n",
    "| `DatetimeIndex` | Set a datetime column as the index to unlock date-based slicing |\n",
    "| Filter before indexing | Set a `DatetimeIndex` on one entity's rows, not a table mixing several entities |\n",
    "| `.loc[\"2025-02\"]` | A partial date string selects every row inside that period |\n",
    "| `.loc[start:end]` | A date range slice is inclusive of both ends |\n",
    "| `.resample(freq)` | Groups rows by time interval instead of by value, then needs an aggregation like `.mean()` |\n",
    "| `groupby(...).resample(...)` | Resample each entity's series independently in one chained call |\n",
    "\n",
    "**Next:** `11-polars.ipynb`, covering Polars' DataFrame, expression API, and lazy evaluation."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}