{ "cells": [ { "cell_type": "markdown", "id": "f7ab275c-1750-46cf-9740-10162e827152", "metadata": {}, "source": [ "# Overview\n", "\n", "The key goal of research data science is to learn from data. One of the\n", "most powerful methods of learning from data is **statistical\n", "modelling**.\n", "\n", "In modelling you build mathematical descriptions of the processes that\n", "generate your data. By doing so, you can see beyond the data and\n", "peek into the phenomenon that gave rise to it.\n", "\n", "This module provides a high-level introduction to statistical modelling.\n", "We aim to demystify the key concepts involved, providing a foundational\n", "approach to modelling that one could apply to any modelling problem. In\n", "doing so we will also cover the main pitfalls that any modeller needs to\n", "contend with.\n", "\n", "Abstract concepts are better understood through application. Here we use\n", "simple models (linear and logistic regression) to bring modelling\n", "concepts to life, but the intended take-homes are not specific to any\n", "particular modelling technique.\n", "\n", "The module is structured as follows:\n", "\n", "- **The what and why of statistical modelling**. We begin by defining\n", " what modelling is and motivating the power of modelling.\n", "- **Fitting models**. Here we go through the components of a model,\n", " including describing how to fit one to data.\n", "- **Building a simple model**. We then carefully build a model based\n", " on the understanding of our data, taking care to understand the\n", " model.\n", "- **Evaluating a model**. It is not enough to have a model that is\n", " fitted to your data. The model has to be useful.
The final section\n", " will cover how to evaluate your model and iteratively improve upon\n", " it.\n", "\n", "**References:**\n", "\n", "We will include more specific references as we move through the module.\n", "Useful, accessible introductions to modelling that have inspired much\n", "of this module’s content are Poldrack’s [Statistical Thinking for the\n", "21st\n", "Century](https://web.stanford.edu/group/poldracklab/statsthinking21/index.html),\n", "Holmes and Huber’s [Modern Statistics for Modern\n", "Biology](https://web.stanford.edu/class/bios221/book/Chap-Models.html),\n", "as well as the introductory sections of Richard McElreath’s wonderfully\n", "readable [Statistical\n", "Rethinking](https://xcelab.net/rm/statistical-rethinking/) and Bishop’s\n", "classic [Pattern Recognition and Machine\n", "Learning](http://users.isr.ist.utl.pt/~wurmd/Livros/school/Bishop%20-%20Pattern%20Recognition%20And%20Machine%20Learning%20-%20Springer%20%202006.pdf)\n", "textbook." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.10.4 64-bit", "language": "python", "name": "python3" }, "language_info": { "name": "python", "version": "3.10.4" }, "vscode": { "interpreter": { "hash": "aee8b7b246df8f9039afb4144a1f6fd8d2ca17a180786b69acc140d282b71a49" } } }, "nbformat": 4, "nbformat_minor": 5 }