> [!NOTE]
> **📌 Early release (2026)**
>
> MLSys·im shipped with the **2026** MLSysBook refresh. The analytical modeling framework, APIs, and lab integrations are **actively iterated** as we harden the package and teaching workflows.
>
> **Feedback**: [GitHub issues](https://github.com/harvard-edge/cs249r_book/issues) or pull requests.

A first-principles analytical modeling framework for ML systems.
Designed for education and early design-space reasoning before empirical benchmarking.
| Layer | Domain | Key Components |
|---|---|---|
| Layer A | Workload Representation (`mlsysim.models`) | FLOPs, parameters, and intensity, e.g., Llama3_70B, ResNet50 |
| Layer B | Hardware Registry (`mlsysim.hardware`) | Concrete specs for real-world silicon, e.g., H100, TPUv5p, Jetson |
| Layer C | Infrastructure (`mlsysim.infra`) | Grid profiles and datacenter sustainability, e.g., PUE, Carbon Intensity, WUE |
| Layer D | Systems & Topology (`mlsysim.systems`) | Fleet configurations and network fabrics, e.g., Doorbell, AutoDrive Scenarios |
| Layer E | Execution & Resolvers (`mlsysim.core.solver`) | The 3-tier math engine: Models, Solvers, and Optimizers (design-space search) |

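
To make the idea of first-principles analytical modeling concrete, here is a minimal roofline-style sketch that combines a workload description (FLOPs and bytes moved) with a hardware description (peak compute and memory bandwidth). It is an illustration only: the `Workload` and `Hardware` classes, the `roofline_step_time` function, and all numbers are hypothetical and are not the `mlsysim` API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical stand-in for a workload description (not the mlsysim API)."""
    name: str
    flops: float        # floating-point operations per step
    bytes_moved: float  # bytes transferred to/from device memory per step

    @property
    def intensity(self) -> float:
        """Arithmetic intensity in FLOP per byte."""
        return self.flops / self.bytes_moved


@dataclass(frozen=True)
class Hardware:
    """Hypothetical stand-in for a hardware spec (not the mlsysim API)."""
    name: str
    peak_flops: float  # peak compute throughput, FLOP/s
    mem_bw: float      # peak memory bandwidth, bytes/s


def roofline_step_time(w: Workload, hw: Hardware) -> tuple[float, str]:
    """Return (lower-bound seconds per step, limiting resource)."""
    compute_s = w.flops / hw.peak_flops
    memory_s = w.bytes_moved / hw.mem_bw
    if compute_s >= memory_s:
        return compute_s, "compute"
    return memory_s, "memory"


# Illustrative numbers only: a 70e9-parameter model decoding one token at
# batch size 1 with 2-byte weights, on a made-up accelerator.
decode = Workload("decode_70B_batch1", flops=2 * 70e9, bytes_moved=2 * 70e9)
accel = Hardware("example_accelerator", peak_flops=1e15, mem_bw=3e12)

step_s, bound = roofline_step_time(decode, accel)
print(f"{decode.name}: intensity={decode.intensity:.1f} FLOP/byte, "
      f"{step_s * 1e3:.1f} ms/step, {bound}-bound")
```

With these assumed numbers the step is memory-bound: moving the weights takes far longer than the arithmetic, which is the kind of conclusion an analytical model can surface before any benchmark is run.
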
| Concern | Why it matters | Where to learn more |
|---|---|---|
| Data drift / distribution shift | A leading cause of production ML failures: model accuracy degrades silently as input distributions change | Sculley et al. (2015), "Hidden Technical Debt in Machine Learning Systems" |
| Model versioning & rollback | Production requires running multiple versions, A/B testing, and safe rollback | Huyen (2022), Designing Machine Learning Systems |
| Monitoring & observability | You cannot manage what you cannot measure: track prediction distributions, latency percentiles, and error rates | Google SRE Book (2016); Huyen (2022) |
| Feature store freshness | Stale features silently degrade real-time models (recommendations, fraud detection) | Uber Michelangelo (2017) |
| Software bugs & misconfigurations | Most outages are caused by software, not hardware | Barroso et al. (2018) |
| Human factors | Team velocity, on-call burden, and organizational alignment often dominate outcomes | Brooks (1975), The Mythical Man-Month |
| Scenario | Efficiency | Rationale |
|---|---|---|
| Training (Megatron-LM, large Transformer) | 0.40–0.55 | Well-optimized GEMM + FlashAttention |
| Training (PyTorch eager, small model) | 0.08–0.15 | Kernel launch overhead dominates |
| Inference decode, batch=1 | 0.01–0.05 | Memory-bound; compute nearly idle |
| Inference decode, batch=32+ | 0.15–0.35 | Batch amortizes weight loading |
| Inference prefill, long context | 0.30–0.50 | Compute-bound GEMM + attention |
| TinyML (TFLite Micro on ESP32) | 0.05–0.15 | Interpreter overhead, no tensor cores |
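
The efficiency ranges above can be turned into a quick time estimate. The sketch below assumes the efficiency column is the achieved fraction of peak compute throughput, and it uses the common approximation that training a dense Transformer costs about 6 × parameters × tokens FLOPs. The function name, model size, token count, and fleet size are all hypothetical and chosen only for illustration; none of this is the `mlsysim` API.

```python
def estimated_seconds(total_flops: float, peak_flops: float, efficiency: float) -> float:
    """Estimate runtime as total work divided by achieved throughput.

    `efficiency` is the assumed fraction of peak FLOP/s actually sustained.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return total_flops / (peak_flops * efficiency)


# Hypothetical scenario: 7e9 parameters, 1e12 training tokens,
# 64 accelerators with an assumed peak of 1e15 FLOP/s each.
PARAMS = 7e9
TOKENS = 1e12
TRAIN_FLOPS = 6 * PARAMS * TOKENS  # ~6*N*D approximation for dense Transformers
FLEET_PEAK = 64 * 1e15             # aggregate peak FLOP/s

# Bracket the estimate with the low and high ends of the
# "Training (Megatron-LM, large Transformer)" range from the table.
for label, eff in [("efficiency 0.40", 0.40), ("efficiency 0.55", 0.55)]:
    days = estimated_seconds(TRAIN_FLOPS, FLEET_PEAK, eff) / 86_400
    print(f"{label}: about {days:.1f} days")
```

Running the estimate at both ends of a range gives a bracket rather than a point value, which fits the early design-space reasoning this framework targets.
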

| Contributor | Contributions |
|---|---|
| Vijay Janapa Reddi | 🧑‍💻 🎨 ✍️ 🧠 maintenance |
| Peter Koellner | 🪲 ✍️ |
| Rocky | 🪲 🧑‍💻 |
| Zeljko Hrcek | 🧑‍💻 |