{ "cells": [ { "cell_type": "markdown", "id": "0", "metadata": {}, "source": [ "---\n", "title: \"Part 15: Type Annotations\"\n", "---" ] }, { "cell_type": "markdown", "id": "1", "metadata": {}, "source": [ "[](https://colab.research.google.com/github/sambaiga/ds-mlops-path/blob/main/tutorials/02-dev-tools/03-type-annotations.ipynb) [](https://raw.githubusercontent.com/sambaiga/ds-mlops-path/main/tutorials/02-dev-tools/03-type-annotations.ipynb)\n" ] }, { "cell_type": "markdown", "id": "2", "metadata": {}, "source": [ "**DS-MLOps Dev Tools**\n", "\n", "**Python 3.12+ | Author: Anthony Faustine**\n", "\n", "## Before you begin\n", "\n", "This notebook assumes you have completed [Part 13: Project Setup with uv](01-uv-project-setup.qmd) and [Part 14: Code Quality with ruff](02-code-quality-ruff.qmd). The `grade-predictor` project from those chapters is the codebase we annotate here.\n", "\n", "This is the one notebook in Part 3, because type annotations are pure Python: running annotated functions live shows the gap between what Python accepts at runtime and what a type checker would flag statically. Every example is executable.\n", "\n", "> Callout markers used throughout this notebook are explained on the [book cover page](../../index.qmd#callout-guide).\n" ] }, { "cell_type": "markdown", "id": "3", "metadata": {}, "source": [ "::: {.callout-note collapse=\"true\" icon=false}\n", "## Learning Objectives\n", "\n", "By the end of Part 15 you will be able to:\n", "\n", "| # | Skill | Covered in |\n", "|---|---|---|\n", "| 1 | Explain why type annotations matter in DS code | Sec. 1 |\n", "| 2 | Write annotated function signatures with basic types | Sec. 2 |\n", "| 3 | Annotate numpy arrays with `NDArray` and pandas DataFrames | Sec. 3 |\n", "| 4 | Use `TypeAlias` and `Protocol` for complex DS types | Sec. 4 |\n", "| 5 | Interpret `ty check` output and fix type errors | Sec. 5 |\n", "| 6 | Apply gradual typing: where to start and what to skip | Sec. 
6 |\n", ":::\n" ] }, { "cell_type": "markdown", "id": "4", "metadata": {}, "source": [ "## 1. Why Type Annotations Matter" ] }, { "cell_type": "markdown", "id": "5", "metadata": {}, "source": [ "Two versions of the same function:\n", "\n", "```python\n", "# Without annotations\n", "def compute_grade(midterm, final, project, weights):\n", "    ...\n", "\n", "# With annotations\n", "def compute_grade(\n", "    midterm: float,\n", "    final: float,\n", "    project: float,\n", "    weights: tuple[float, float, float] = (0.30, 0.45, 0.25),\n", ") -> float:\n", "    ...\n", "```\n", "\n", "The annotated version is self-documenting: any editor with a type checker installed will warn you the moment you pass `\"82\"` instead of `82.0`. The unannotated version accepts the string and fails only when the arithmetic runs: `\"82\" * 0.30` raises `TypeError: can't multiply sequence by non-int of type 'float'`. With an integer weight the bug is silent, because `\"82\" * 3` evaluates to `\"828282\"`. That is real Python behavior, not a hypothetical.\n", "\n", "Python does not enforce annotations at runtime. That is the job of a static type checker. The annotation is documentation that a machine can check.\n" ] }, { "cell_type": "markdown", "id": "6", "metadata": {}, "source": [ "
`ty` reads them and flags type mismatches before the code runs.\n",
"\n",
"`float | None` parameter for a nullable score, one that returns `dict[str, float]`, and one that takes a `list[str]` of column names.\n",
"\n",
"def normalize_score(raw, min_val, max_val): ...\n",
"def compute_cohort_summary(scores): ... # returns dict\n",
"def select_columns(df, columns): ... # columns is list[str]\n",
"\n",
"`pd.DataFrame` is practical; pandera adds column types\n",
"\n",
"`pd.DataFrame` is a useful annotation even though it carries no column information. The next step is `pandera.typing.DataFrame[Schema]`, which encodes column names and dtypes at the type level. Start with `pd.DataFrame` and graduate to pandera when you need column-level guarantees in a data pipeline.\n",
"\n",
"`NDArray[np.float64]` and returns a normalized array, and one that takes a `pd.DataFrame` and returns a filtered `pd.DataFrame`. Confirm both run correctly on the sample DataFrame above.\n",
"\n",
"def normalize_features(X: NDArray[np.float64]) -> NDArray[np.float64]: ...\n",
"def filter_passing(df: pd.DataFrame, threshold: float = 50.0) -> pd.DataFrame: ...\n",
"\n",
"`evaluate(model: Predictor, ...)` accepts any object with `predict` and `fit` methods: sklearn's `LinearRegression`, `XGBRegressor`, or a custom class. No import of sklearn is needed in the type signature. This is structural subtyping, and it keeps your utility functions independent of any specific ML library.\n",
"\n",
"`core.py`. Run `uv run ty check src/`. Fix every error (not warning) that `ty` reports in your own code. Confirm the output is clean before moving on.\n",
"\n",
"uv run ty check src/\n",
"# Fix each error line by line\n",
"uv run ty check src/ # should report 0 errors\n",
"\n",
"- `core.py`: `compute_grade`, `grade_to_letter`, `flag_at_risk`, `add_average_marks`\n",
"- `NDArray[np.float64]` for any numpy array parameters\n",
"- `pd.DataFrame` and `pd.Series` for pandas types\n",
"- Run `uv run ty check src/` and bring it to zero errors\n",
"- `git commit -m \"feat(types): fully annotate core.py\"`\n",
"\n",
"uv run ty check src/\n",
"# Fix all errors\n",
"uv run ty check src/ # zero errors\n",
"