{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# \ud83d\udcdd Exercise M4.02\n", "\n", "In the previous notebook, we showed that we can add new features based on the\n", "original feature `x` to make the model more expressive, for instance `x ** 2` or\n", "`x ** 3`. In that case we only used a single feature in `data`.\n", "\n", "The aim of this notebook is to train a linear regression algorithm on a\n", "dataset with more than a single feature. In such a \"multi-dimensional\" feature\n", "space we can derive new features of the form `x1 * x2`, `x2 * x3`, etc.\n", "Products of features are usually called \"non-linear\" or \"multiplicative\"\n", "interactions between features.\n", "\n", "Feature engineering can be an important step of a model pipeline as long as\n", "the new features are expected to be predictive. For instance, think of a\n", "classification model to decide if a patient is at risk of developing heart\n", "disease. This would depend on the patient's Body Mass Index, which is defined\n", "as `weight / height ** 2`.\n", "\n", "We load the penguins dataset. We first use a set of 3 numerical\n", "features to predict the target, i.e. the body mass of the penguin." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Note
\n", "If you want a deeper overview regarding this dataset, you can refer to the\n", "Appendix - Datasets description section at the end of this MOOC.
\n", "