{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Exercise with bank marketing data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\n", "\n", "- Data from the UCI Machine Learning Repository: [data](https://github.com/justmarkham/DAT8/blob/master/data/bank-additional.csv), [data dictionary](https://archive.ics.uci.edu/ml/datasets/Bank+Marketing)\n", "- **Goal:** Predict whether a customer will purchase a bank product marketed over the phone\n", "- `bank-additional.csv` is already in our repo, so there is no need to download the data from the UCI website" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1: Read the data into Pandas" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
|   | age | job | marital | education | default | housing | loan | contact | month | day_of_week | ... | campaign | pdays | previous | poutcome | emp.var.rate | cons.price.idx | cons.conf.idx | euribor3m | nr.employed | y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 30 | blue-collar | married | basic.9y | no | yes | no | cellular | may | fri | ... | 2 | 999 | 0 | nonexistent | -1.8 | 92.893 | -46.2 | 1.313 | 5099.1 | no |
| 1 | 39 | services | single | high.school | no | no | no | telephone | may | fri | ... | 4 | 999 | 0 | nonexistent | 1.1 | 93.994 | -36.4 | 4.855 | 5191.0 | no |
| 2 | 25 | services | married | high.school | no | yes | no | telephone | jun | wed | ... | 1 | 999 | 0 | nonexistent | 1.4 | 94.465 | -41.8 | 4.962 | 5228.1 | no |
| 3 | 38 | services | married | basic.9y | no | unknown | unknown | telephone | jun | fri | ... | 3 | 999 | 0 | nonexistent | 1.4 | 94.465 | -41.8 | 4.959 | 5228.1 | no |
| 4 | 47 | admin. | married | university.degree | no | yes | no | cellular | nov | mon | ... | 1 | 999 | 0 | nonexistent | -0.1 | 93.200 | -42.0 | 4.191 | 5195.8 | no |

5 rows × 21 columns
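
A preview like the one above is what `head()` returns once the file is loaded. The following is a minimal sketch of Step 1, not the notebook's original cell: it assumes the file is semicolon-delimited (as in the UCI distribution), the helper name `read_bank` is hypothetical, and the inline sample reuses the first three preview rows with only a handful of columns so that the block runs without the file.

```python
import io

import pandas as pd

# First three rows of the preview, restricted to a few columns for brevity.
# Assumption: the real file uses ';' as its delimiter.
sample = io.StringIO(
    "age;job;marital;education;y\n"
    "30;blue-collar;married;basic.9y;no\n"
    "39;services;single;high.school;no\n"
    "25;services;married;high.school;no\n"
)


def read_bank(source, sep=";"):
    """Load the bank marketing data from a path, URL, or file-like object."""
    return pd.read_csv(source, sep=sep)


bank = read_bank(sample)
print(bank.head())   # first rows, as in the preview
print(bank.shape)    # (rows, columns)
```

To work with the real data, pass the path of your local copy of `bank-additional.csv` to `read_bank` instead of the in-memory sample.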

| month | count | mean |
|---|---|---|
| dec | 22 | 0.545455 |
| mar | 48 | 0.583333 |
| sep | 64 | 0.406250 |
| oct | 69 | 0.362319 |
| apr | 215 | 0.167442 |
| nov | 446 | 0.096413 |
| jun | 530 | 0.128302 |
| aug | 636 | 0.100629 |
| jul | 711 | 0.082982 |
| may | 1378 | 0.065312 |
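
A per-month summary like the one above (number of calls and mean response, sorted by call count) can be obtained with a `groupby`. The sketch below is one plausible way to do it, not necessarily the notebook's original code: it assumes the `y` column holds `"yes"`/`"no"` strings, as in the preview, and encodes them as 1/0 so that the mean equals the purchase rate. The toy data and the column name `outcome` are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; the real analysis would use the full bank DataFrame.
toy = pd.DataFrame({
    "month": ["may", "may", "may", "jun", "jun", "dec"],
    "y": ["no", "no", "yes", "no", "yes", "yes"],
})

# Encode the response as 0/1 so that its mean is the purchase rate.
toy["outcome"] = toy["y"].map({"no": 0, "yes": 1})

# Count of calls and mean response per month, smallest groups first.
summary = (
    toy.groupby("month")["outcome"]
    .agg(["count", "mean"])
    .sort_values("count")
)
print(summary)
```

Reading the two columns together matters: in the table above, months with few calls (`dec`, `mar`) show the highest mean response, while `may` has by far the most calls and the lowest rate.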