{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# \ud83d\udcdd Exercise M1.01"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Imagine we are interested in predicting penguins species based on two of their\n",
    "body measurements: culmen length and culmen depth. First we want to do some\n",
    "data exploration to get a feel for the data.\n",
    "\n",
    "What are the features? What is the target?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data is located in `../datasets/penguins_classification.csv`, load it with\n",
    "`pandas` into a `DataFrame`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Write your code here."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Show a few samples of the data.\n",
    "\n",
    "How many features are numerical? How many features are categorical?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Write your code here."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What are the different penguins species available in the dataset and how many\n",
    "samples of each species are there? Hint: select the right column and use the\n",
    "[`value_counts`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.value_counts.html)\n",
    "method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Write your code here."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Plot histograms for the numerical features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Write your code here."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Show features distribution for each class. Hint: use\n",
    "[`seaborn.pairplot`](https://seaborn.pydata.org/generated/seaborn.pairplot.html)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Write your code here."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Looking at these distributions, how hard do you think it would be to classify\n",
    "the penguins only using `\"culmen depth\"` and `\"culmen length\"`?"
   ]
  }
 ],
 "metadata": {
  "jupytext": {
   "main_language": "python"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}