{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Exercise 04\n", "\n", "\n", "# Part 1 - Linear Regression\n", "\n", "Estimate a regression using the Income data.\n", "\n", "\n", "## Forecast of income\n", "\n", "We'll be working with a dataset of US Census income data ([data dictionary](https://archive.ics.uci.edu/ml/datasets/Adult)).\n", "\n", "Many businesses would like to personalize their offers based on a customer’s income. High-income customers, for instance, could be shown premium products. Since a customer’s income is not always explicitly known, a predictive model could estimate a person’s income from other information.\n", "\n", "Our goal is to create a predictive model that outputs an estimate of a person’s income." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
 | Age | Workclass | fnlwgt | Education | Education-Num | Martial Status | Occupation | Relationship | Race | Sex | Capital Gain | Capital Loss | Hours per week | Country | Income |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 39 | State-gov | 77516 | Bachelors | 13 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174 | 0 | 40 | United-States | 51806.0 |
1 | 50 | Self-emp-not-inc | 83311 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 13 | United-States | 68719.0 |
2 | 38 | Private | 215646 | HS-grad | 9 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 40 | United-States | 51255.0 |
3 | 53 | Private | 234721 | 11th | 7 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 40 | United-States | 47398.0 |
4 | 28 | Private | 338409 | Bachelors | 13 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 40 | Cuba | 30493.0 |
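A minimal sketch of the regression idea, assuming we model `Income` as a linear function of three numeric columns only (`Age`, `Education-Num`, `Hours per week`). That feature choice is ours for illustration, not the exercise's prescribed model. The sketch uses just the five rows shown above and NumPy's ordinary least squares (`np.linalg.lstsq`) rather than whatever library the exercise uses later, so the fitted coefficients only demonstrate the mechanics and say nothing reliable about income.

```python
import numpy as np

# Five example rows copied from the table above (illustration only).
# Feature columns (our assumption): Age, Education-Num, Hours per week.
X_raw = np.array([
    [39, 13, 40],
    [50, 13, 13],
    [38, 9, 40],
    [53, 7, 40],
    [28, 13, 40],
], dtype=float)
# Target: the Income column for the same rows.
y = np.array([51806.0, 68719.0, 51255.0, 47398.0, 30493.0])

# Prepend a column of ones so the first coefficient acts as the intercept.
X = np.column_stack([np.ones(len(X_raw)), X_raw])

# Ordinary least squares: choose beta to minimise ||X @ beta - y||^2.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)


def predict(features):
    """Predict income for one or more rows of [Age, Education-Num, Hours per week]."""
    features = np.atleast_2d(np.asarray(features, dtype=float))
    return np.column_stack([np.ones(len(features)), features]) @ beta


# In-sample fit quality (optimistic: the model is scored on its own training rows).
y_hat = predict(X_raw)
rmse = float(np.sqrt(np.mean((y - y_hat) ** 2)))

print("intercept and coefficients:", beta)
print("in-sample RMSE:", rmse)
```

With the full dataset, you would fit on all rows, encode categorical columns such as `Workclass` or `Education` (for example with one-hot encoding), and measure error on held-out data instead of in-sample.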
 | State | Account Length | Area Code | Phone | Int'l Plan | VMail Plan | VMail Message | Day Mins | Day Calls | Day Charge | ... | Eve Calls | Eve Charge | Night Mins | Night Calls | Night Charge | Intl Mins | Intl Calls | Intl Charge | CustServ Calls | Churn? |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | KS | 128 | 415 | 382-4657 | no | yes | 25 | 265.1 | 110 | 45.07 | ... | 99 | 16.78 | 244.7 | 91 | 11.01 | 10.0 | 3 | 2.70 | 1 | False. |
1 | OH | 107 | 415 | 371-7191 | no | yes | 26 | 161.6 | 123 | 27.47 | ... | 103 | 16.62 | 254.4 | 103 | 11.45 | 13.7 | 3 | 3.70 | 1 | False. |
2 | NJ | 137 | 415 | 358-1921 | no | no | 0 | 243.4 | 114 | 41.38 | ... | 110 | 10.30 | 162.6 | 104 | 7.32 | 12.2 | 5 | 3.29 | 0 | False. |
3 | OH | 84 | 408 | 375-9999 | yes | no | 0 | 299.4 | 71 | 50.90 | ... | 88 | 5.26 | 196.9 | 89 | 8.86 | 6.6 | 7 | 1.78 | 2 | False. |
4 | OK | 75 | 415 | 330-6626 | yes | no | 0 | 166.7 | 113 | 28.34 | ... | 122 | 12.61 | 186.9 | 121 | 8.41 | 10.1 | 3 | 2.73 | 3 | False. |
5 rows × 21 columns