{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# \ud83d\udcdd Exercise M1.01" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Imagine we are interested in predicting penguin species based on two of their\n", "body measurements: culmen length and culmen depth. First we want to do some\n", "data exploration to get a feel for the data.\n", "\n", "What are the features? What is the target?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data is located in `../datasets/penguins_classification.csv`. Load it with\n", "`pandas` into a `DataFrame`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Write your code here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Show a few samples of the data.\n", "\n", "How many features are numerical? How many features are categorical?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Write your code here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What are the different penguin species available in the dataset and how many\n", "samples of each species are there? Hint: select the right column and use the\n", "[`value_counts`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.value_counts.html)\n", "method." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Write your code here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Plot histograms for the numerical features." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Write your code here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Show the distribution of the features for each class. 
Hint: use\n", "[`seaborn.pairplot`](https://seaborn.pydata.org/generated/seaborn.pairplot.html)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Write your code here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looking at these distributions, how hard do you think it would be to classify\n", "the penguins using only `\"culmen depth\"` and `\"culmen length\"`?" ] } ], "metadata": { "jupytext": { "main_language": "python" }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 5 }