{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# \ud83d\udcdd Exercise M1.02\n", "\n", "The goal of this exercise is to fit a model similar to the one in the previous\n", "notebook, to get familiar with manipulating scikit-learn objects and, in\n", "particular, the `.fit/.predict/.score` API." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's load the adult census dataset with only numerical variables." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "adult_census = pd.read_csv(\"../datasets/adult-census-numeric.csv\")\n", "data = adult_census.drop(columns=\"class\")\n", "target = adult_census[\"class\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the previous notebook we used `model = KNeighborsClassifier()`. Most\n", "scikit-learn models can be created without arguments. This is convenient\n", "because it means that you don't need to understand the full details of a model\n", "before starting to use it.\n", "\n", "One of the `KNeighborsClassifier` parameters is `n_neighbors`. It controls the\n", "number of neighbors used to make a prediction for a new data point.\n", "\n", "What is the default value of the `n_neighbors` parameter?\n", "\n", "**Hint**: Look at the documentation on the [scikit-learn\n", "website](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html)\n", "or directly access the description inside your notebook by running the\n", "following cell. This opens a pager pointing to the documentation." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.neighbors import KNeighborsClassifier\n", "\n", "KNeighborsClassifier?" 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create a `KNeighborsClassifier` model with `n_neighbors=50`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Write your code here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fit this model on the data and target loaded above." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Write your code here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use your model to make predictions on the first 10 data points in `data`.\n", "Do they match the actual target values?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Write your code here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Compute the accuracy on the training data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Write your code here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now load the test data from `\"../datasets/adult-census-numeric-test.csv\"` and\n", "compute the accuracy on the test data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Write your code here." ] } ], "metadata": { "jupytext": { "main_language": "python" }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 5 }