{ "metadata": { "name": "" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Agenda\n", "\n", "- Define the problem and the approach\n", "- Data basics: loading data, looking at your data, basic commands\n", "- Handling missing values\n", "- Intro to scikit-learn\n", "- Grouping and aggregating data\n", "- Feature selection\n", "- Fitting and evaluating a model\n", "- Deploying your work" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# In this workbook you will\n", "- Create your own classifier for predicting delinquent customers\n", "- Build basic reports to help interpret the effectiveness of your model\n", "- Convert the results of your model to a credit score" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import pandas as pd\n", "import numpy as np\n", "import pylab as pl" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "code", "collapsed": false, "input": [ "train = pd.read_csv(\"./data/credit-data-trainingset.csv\")\n", "test = pd.read_csv(\"./data/credit-data-testset.csv\")\n", "test.head()" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "\n", "|  | serious_dlqin2yrs | revolving_utilization_of_unsecured_lines | age | number_of_time30-59_days_past_due_not_worse | debt_ratio | monthly_income | number_of_open_credit_lines_and_loans | number_of_times90_days_late | number_real_estate_loans_or_lines | number_of_time60-89_days_past_due_not_worse | number_of_dependents | monthly_income_imputed |\n",
"|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 0 | 0 | 0.233810 | 30 | 0 | 0.036050 | 3300 | 5 | 0 | 0 | 0 | 0 | 6017 |\n",
"| 1 | 1 | 0.964673 | 40 | 3 | 0.382965 | 13700 | 9 | 3 | 1 | 1 | 2 | 2850 |\n",
"| 2 | 0 | 0.061086 | 78 | 0 | 2058.000000 | 2500 | 10 | 0 | 2 | 0 | 0 | 2500 |\n",
"| 3 | 0 | 0.075427 | 32 | 0 | 0.085512 | 7916 | 6 | 0 | 0 | 0 | 0 | 4145 |\n",
"| 4 | 0 | 0.046560 | 58 | 0 | 0.241622 | 2416 | 9 | 0 | 1 | 0 | 0 | 2850 |\n", "
| Predicted | 0 | 1 |\n",
"|---|---|---|\n",
"| Actual |  |  |\n",
"| 0 | 35051 | 18 |\n",
"| 1 | 2494 | 22 |\n", "
\n",
"|  | serious_dlqin2yrs | revolving_utilization_of_unsecured_lines | age | number_of_time30-59_days_past_due_not_worse | debt_ratio | monthly_income | number_of_open_credit_lines_and_loans | number_of_times90_days_late | number_real_estate_loans_or_lines | number_of_time60-89_days_past_due_not_worse | number_of_dependents |\n",
"|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 0 | 1 | 0.766127 | 45 | 2 | 0.802982 | 9120 | 13 | 0 | 6 | 0 | 2 |\n",
"| 1 | 0 | 0.957151 | 40 | 0 | 0.121876 | 2600 | 4 | 0 | 0 | 0 | 1 |\n",
"| 2 | 0 | 0.658180 | 38 | 1 | 0.085113 | 3042 | 2 | 1 | 0 | 0 | 0 |\n",
"| 3 | 0 | 0.907239 | 49 | 1 | 0.024926 | 63588 | 7 | 0 | 1 | 0 | 0 |\n",
"| 4 | 0 | 0.213179 | 74 | 0 | 0.375607 | 3500 | 3 | 0 | 1 | 0 | 1 |\n", "