{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Linear Regression - Data Cleanup\n", "================================" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we get into the detailed steps involved in data cleanup of the Lending Club dataset. We glossed over this in the Data Exploration lesson and left it as an exercise. This is where we actually do the cleanup step by step." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "1. Remove '%'\n", " * strip the last letter\n", " * map to float\n", "\n", "\n", "2. Remove ' months' suffix\n", " * strip the tailing letters\n", " * map to int\n", "\n", "3. Remove bad data\n", " * n/a values\n", " * outlier income\n", " * histograms\n", " * describe\n", "\n", "4. Map FICO.Range to single int\n", " * split on '-'\n", " * use bottom value\n", " * convert to int\n", " * show how to do mid-point\n", "\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Populating the interactive namespace from numpy and matplotlib\n" ] }, { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Amount.RequestedAmount.Funded.By.InvestorsInterest.RateLoan.LengthLoan.PurposeDebt.To.Income.RatioStateHome.OwnershipMonthly.IncomeFICO.RangeOpen.CREDIT.LinesRevolving.CREDIT.BalanceInquiries.in.the.Last.6.MonthsEmployment.Length
81174 20000 20000.00 8.90% 36 months debt_consolidation 14.90% SC MORTGAGE 6541.67 735-739 14 14272 2 < 1 year
99592 19200 19200.00 12.12% 36 months debt_consolidation 28.36% TX MORTGAGE 4583.33 715-719 12 11140 1 2 years
80059 35000 35000.00 21.98% 60 months debt_consolidation 23.81% CA MORTGAGE 11500.00 690-694 14 21977 1 2 years
15825 10000 9975.00 9.99% 36 months debt_consolidation 14.30% KS MORTGAGE 3833.33 695-699 10 9346 0 5 years
33182 12000 12000.00 11.71% 36 months credit_card 18.78% NJ RENT 3195.00 695-699 11 14469 0 9 years
62403 6000 6000.00 15.31% 36 months other 20.05% CT OWN 4891.67 670-674 17 10391 2 3 years
48808 10000 10000.00 7.90% 36 months debt_consolidation 26.09% MA RENT 2916.67 720-724 10 15957 0 10+ years
22090 33500 33450.00 17.14% 60 months credit_card 14.70% LA MORTGAGE 13863.42 705-709 12 27874 0 10+ years
76404 14675 14675.00 14.33% 36 months credit_card 26.92% CA RENT 3150.00 685-689 9 7246 1 8 years
15867 7000 7000.00 6.91% 36 months credit_card 7.10% CA RENT 5000.00 715-719 8 7612 0 3 years
94971 2000 2000.00 19.72% 36 months moving 10.29% FL RENT 3575.00 670-674 10 12036 0 6 years
36911 10625 10625.00 14.27% 36 months debt_consolidation 12.54% CA MORTGAGE 4250.00 665-669 14 10767 0 < 1 year
41200 28000 27975.00 21.67% 60 months debt_consolidation 13.07% CT MORTGAGE 14166.67 670-674 12 10311 0 1 year
83869 35000 34950.00 8.90% 36 months debt_consolidation 20.46% CT RENT 9166.67 735-739 19 21536 0 1 year
53853 9600 9600.00 7.62% 36 months debt_consolidation 3.45% DC RENT 11250.00 725-729 13 4606 0 < 1 year
21399 25000 24975.00 15.65% 60 months debt_consolidation 21.99% CA RENT 5416.67 730-734 6 13929 0 9 years
62127 10000 10000.00 12.12% 36 months debt_consolidation 17.72% CA RENT 9000.00 695-699 18 20317 0 7 years
23446 14000 13900.25 10.37% 60 months debt_consolidation 11.95% OH RENT 4333.33 740-744 6 7419 0 9 years
44987 10000 10000.00 9.76% 36 months credit_card 7.13% FL RENT 2733.33 730-734 7 6112 2 3 years
17977 5200 5175.00 9.99% 60 months debt_consolidation 10.29% AL MORTGAGE 3750.00 760-764 10 16094 0 < 1 year
86099 22000 21975.00 21.98% 36 months debt_consolidation 11.19% TX MORTGAGE 6666.67 665-669 9 23124 0 10+ years
99483 30000 30000.00 19.05% 60 months credit_card 21.25% FL MORTGAGE 6250.00 695-699 12 34927 0 6 years
28798 6500 6500.00 17.99% 60 months car 19.63% FL RENT 4100.00 665-669 11 11697 1 2 years
24168 17400 17400.00 11.99% 36 months credit_card 12.47% AZ RENT 6833.33 695-699 7 26587 0 7 years
10356 4000 4000.00 16.82% 60 months vacation 13.71% GA MORTGAGE 4500.00 670-674 5 20804 0 3 years
46027 7200 7200.00 7.90% 36 months debt_consolidation 24.82% TX RENT 5416.67 705-709 8 12017 0 7 years
2238 8000 8000.00 14.42% 36 months debt_consolidation 24.63% MA RENT 2964.17 675-679 9 8928 2 6 years
65278 8000 8000.00 15.31% 36 months debt_consolidation 15.46% CA MORTGAGE 2916.67 675-679 13 7152 1 5 years
4227 3000 3000.00 8.59% 36 months other 3.72% MA MORTGAGE 4167.00 765-769 4 7074 0 5 years
50182 14500 14500.00 7.90% 36 months debt_consolidation 4.85% GA MORTGAGE 3958.33 760-764 4 9598 0 4 years
.............................................
84265 20000 20000.00 22.95% 60 months debt_consolidation 7.10% NJ RENT 6750.00 665-669 6 16104 1 7 years
80231 19000 19000.00 7.90% 36 months debt_consolidation 9.76% RI MORTGAGE 5166.67 770-774 18 43617 2 10+ years
49533 17300 17250.00 22.45% 60 months wedding 3.58% PA MORTGAGE 5500.00 685-689 11 2306 3 4 years
102514 7000 711.54 15.13% 36 months major_purchase 18.91% CO MORTGAGE 3833.00 650-654 13 12634 0 2 years
78618 7200 7200.00 18.75% 36 months debt_consolidation 16.21% MI RENT 8333.33 660-664 8 28916 0 10+ years
86953 10000 10000.00 14.09% 36 months major_purchase 9.71% CO RENT 4583.33 675-679 6 3859 0 < 1 year
80129 4000 3925.00 14.09% 36 months credit_card 12.27% FL MORTGAGE 8583.33 675-679 9 36943 1 10+ years
85216 17500 17500.00 8.90% 36 months debt_consolidation 10.94% UT MORTGAGE 25000.00 730-734 9 34545 0 6 years
38247 20000 20000.00 11.71% 36 months credit_card 9.58% SD MORTGAGE 5416.67 725-729 12 18267 1 4 years
91245 16200 16200.00 15.80% 60 months debt_consolidation 7.92% PA MORTGAGE 4833.33 680-684 12 12313 2 10+ years
53041 10000 10000.00 6.03% 36 months small_business 13.03% FL RENT 5000.00 760-764 8 3952 0 2 years
63051 27000 27000.00 6.62% 36 months debt_consolidation 12.21% OH MORTGAGE 9250.00 810-814 12 4211 0 5 years
14446 4500 4475.00 7.51% 36 months small_business 20.27% VA MORTGAGE 7075.50 720-724 15 68618 2 10+ years
68628 15875 15875.00 14.33% 36 months small_business 17.44% MD MORTGAGE 3416.67 675-679 11 15891 0 2 years
98758 15000 15000.00 10.16% 36 months credit_card 28.28% OH MORTGAGE 6666.67 690-694 15 14880 0 10+ years
13070 25000 24950.00 10.75% 36 months debt_consolidation 20.48% OR MORTGAGE 7083.33 765-769 10 25429 0 6 years
45836 7000 7000.00 17.27% 36 months other 18.38% NY OWN 2464.37 665-669 9 7089 0 3 years
52330 15000 15000.00 19.99% 36 months wedding 18.05% CA RENT 8000.00 660-664 6 45976 1 2 years
48243 17000 17000.00 15.81% 36 months debt_consolidation 17.01% CO RENT 3833.33 685-689 6 15484 1 6 years
63256 19075 19075.00 18.75% 36 months debt_consolidation 15.23% NY RENT 5166.67 670-674 17 13749 3 10+ years
42124 10000 10000.00 11.71% 36 months debt_consolidation 8.40% CA RENT 4500.00 710-714 8 8404 1 3 years
78043 8475 8475.00 7.62% 36 months debt_consolidation 15.88% CA RENT 3983.33 720-724 9 6882 0 n/a
925 6400 6350.00 10.08% 36 months debt_consolidation 8.11% NJ MORTGAGE 5166.67 710-714 5 5815 2 10+ years
74047 30000 30000.00 23.28% 60 months other 12.10% IL MORTGAGE 7083.33 675-679 16 17969 1 10+ years
49957 24000 23975.00 14.65% 36 months debt_consolidation 15.29% WA MORTGAGE 6666.67 685-689 13 17521 0 5 years
23735 30000 29950.00 16.77% 60 months debt_consolidation 19.23% NY MORTGAGE 9250.00 705-709 15 45880 1 8 years
65882 16000 16000.00 14.09% 60 months home_improvement 21.54% MD OWN 8903.25 740-744 18 18898 1 10+ years
55610 10000 10000.00 13.99% 36 months debt_consolidation 4.89% PA MORTGAGE 2166.67 680-684 4 4544 0 10+ years
38576 6000 6000.00 12.42% 36 months major_purchase 16.66% NJ RENT 3500.00 675-679 8 7753 0 5 years
3116 9000 5242.75 13.79% 36 months debt_consolidation 6.76% NY RENT 3875.00 670-674 7 7589 0 10+ years
\n", "

2500 rows × 14 columns

\n", "
```\n",
"<class 'pandas.core.frame.DataFrame'>\n",
"Int64Index: 2500 entries, 81174 to 3116\n",
"Data columns (total 14 columns):\n",
"Amount.Requested                  2500  non-null values\n",
"Amount.Funded.By.Investors        2500  non-null values\n",
"Interest.Rate                     2500  non-null values\n",
"Loan.Length                       2500  non-null values\n",
"Loan.Purpose                      2500  non-null values\n",
"Debt.To.Income.Ratio              2500  non-null values\n",
"State                             2500  non-null values\n",
"Home.Ownership                    2500  non-null values\n",
"Monthly.Income                    2499  non-null values\n",
"FICO.Range                        2500  non-null values\n",
"Open.CREDIT.Lines                 2498  non-null values\n",
"Revolving.CREDIT.Balance          2498  non-null values\n",
"Inquiries.in.the.Last.6.Months    2498  non-null values\n",
"Employment.Length                 2500  non-null values\n",
"dtypes: float64(5), int64(1), object(8)\n",
"```
```\n",
"<class 'pandas.core.frame.DataFrame'>\n",
"Int64Index: 2500 entries, 0 to 2499\n",
"Data columns (total 15 columns):\n",
"index                             2500  non-null values\n",
"Amount.Requested                  2500  non-null values\n",
"Amount.Funded.By.Investors        2500  non-null values\n",
"Interest.Rate                     2500  non-null values\n",
"Loan.Length                       2500  non-null values\n",
"Loan.Purpose                      2500  non-null values\n",
"Debt.To.Income.Ratio              2500  non-null values\n",
"State                             2500  non-null values\n",
"Home.Ownership                    2500  non-null values\n",
"Monthly.Income                    2499  non-null values\n",
"FICO.Range                        2500  non-null values\n",
"Open.CREDIT.Lines                 2498  non-null values\n",
"Revolving.CREDIT.Balance          2498  non-null values\n",
"Inquiries.in.the.Last.6.Months    2498  non-null values\n",
"Employment.Length                 2500  non-null values\n",
"dtypes: float64(5), int64(2), object(8)\n",
"```
" ], "text/plain": [ "\n", "Int64Index: 2500 entries, 0 to 2499\n", "Data columns (total 15 columns):\n", "index 2500 non-null values\n", "Amount.Requested 2500 non-null values\n", "Amount.Funded.By.Investors 2500 non-null values\n", "Interest.Rate 2500 non-null values\n", "Loan.Length 2500 non-null values\n", "Loan.Purpose 2500 non-null values\n", "Debt.To.Income.Ratio 2500 non-null values\n", "State 2500 non-null values\n", "Home.Ownership 2500 non-null values\n", "Monthly.Income 2499 non-null values\n", "FICO.Range 2500 non-null values\n", "Open.CREDIT.Lines 2498 non-null values\n", "Revolving.CREDIT.Balance 2498 non-null values\n", "Inquiries.in.the.Last.6.Months 2498 non-null values\n", "Employment.Length 2500 non-null values\n", "dtypes: float64(5), int64(2), object(8)" ] }, "execution_count": 347, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ldb" ] }, { "cell_type": "code", "execution_count": 348, "metadata": { "collapsed": false }, "outputs": [], "source": [ "ldb['Interest.Rate'] = flrate" ] }, { "cell_type": "code", "execution_count": 349, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
```\n",
"<class 'pandas.core.frame.DataFrame'>\n",
"Int64Index: 5 entries, 0 to 4\n",
"Data columns (total 15 columns):\n",
"index                             5  non-null values\n",
"Amount.Requested                  5  non-null values\n",
"Amount.Funded.By.Investors        5  non-null values\n",
"Interest.Rate                     5  non-null values\n",
"Loan.Length                       5  non-null values\n",
"Loan.Purpose                      5  non-null values\n",
"Debt.To.Income.Ratio              5  non-null values\n",
"State                             5  non-null values\n",
"Home.Ownership                    5  non-null values\n",
"Monthly.Income                    5  non-null values\n",
"FICO.Range                        5  non-null values\n",
"Open.CREDIT.Lines                 5  non-null values\n",
"Revolving.CREDIT.Balance          5  non-null values\n",
"Inquiries.in.the.Last.6.Months    5  non-null values\n",
"Employment.Length                 5  non-null values\n",
"dtypes: float64(6), int64(2), object(7)\n",
"```
```\n",
"<class 'pandas.core.frame.DataFrame'>\n",
"Int64Index: 2498 entries, 0 to 2499\n",
"Data columns (total 15 columns):\n",
"index                             2498  non-null values\n",
"Amount.Requested                  2498  non-null values\n",
"Amount.Funded.By.Investors        2498  non-null values\n",
"Interest.Rate                     2498  non-null values\n",
"Loan.Length                       2498  non-null values\n",
"Loan.Purpose                      2498  non-null values\n",
"Debt.To.Income.Ratio              2498  non-null values\n",
"State                             2498  non-null values\n",
"Home.Ownership                    2498  non-null values\n",
"Monthly.Income                    2498  non-null values\n",
"FICO.Range                        2498  non-null values\n",
"Open.CREDIT.Lines                 2497  non-null values\n",
"Revolving.CREDIT.Balance          2497  non-null values\n",
"Inquiries.in.the.Last.6.Months    2497  non-null values\n",
"Employment.Length                 2498  non-null values\n",
"dtypes: float64(6), int64(2), object(7)\n",
"```
" ], "text/plain": [ "\n", "Int64Index: 2498 entries, 0 to 2499\n", "Data columns (total 15 columns):\n", "index 2498 non-null values\n", "Amount.Requested 2498 non-null values\n", "Amount.Funded.By.Investors 2498 non-null values\n", "Interest.Rate 2498 non-null values\n", "Loan.Length 2498 non-null values\n", "Loan.Purpose 2498 non-null values\n", "Debt.To.Income.Ratio 2498 non-null values\n", "State 2498 non-null values\n", "Home.Ownership 2498 non-null values\n", "Monthly.Income 2498 non-null values\n", "FICO.Range 2498 non-null values\n", "Open.CREDIT.Lines 2497 non-null values\n", "Revolving.CREDIT.Balance 2497 non-null values\n", "Inquiries.in.the.Last.6.Months 2497 non-null values\n", "Employment.Length 2498 non-null values\n", "dtypes: float64(6), int64(2), object(7)" ] }, "execution_count": 377, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ldlt100" ] }, { "cell_type": "code", "execution_count": 378, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "2498" ] }, "execution_count": 378, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(ldlt100)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now drop any rows that have 'NA' values ie data not available. \n", "In database terminiology these would be 'NULL' values." ] }, { "cell_type": "code", "execution_count": 379, "metadata": { "collapsed": false }, "outputs": [], "source": [ "ldb2 = ldlt100.dropna()" ] }, { "cell_type": "code", "execution_count": 380, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
```\n",
"<class 'pandas.core.frame.DataFrame'>\n",
"Int64Index: 2497 entries, 0 to 2499\n",
"Data columns (total 15 columns):\n",
"index                             2497  non-null values\n",
"Amount.Requested                  2497  non-null values\n",
"Amount.Funded.By.Investors        2497  non-null values\n",
"Interest.Rate                     2497  non-null values\n",
"Loan.Length                       2497  non-null values\n",
"Loan.Purpose                      2497  non-null values\n",
"Debt.To.Income.Ratio              2497  non-null values\n",
"State                             2497  non-null values\n",
"Home.Ownership                    2497  non-null values\n",
"Monthly.Income                    2497  non-null values\n",
"FICO.Range                        2497  non-null values\n",
"Open.CREDIT.Lines                 2497  non-null values\n",
"Revolving.CREDIT.Balance          2497  non-null values\n",
"Inquiries.in.the.Last.6.Months    2497  non-null values\n",
"Employment.Length                 2497  non-null values\n",
"dtypes: float64(6), int64(2), object(7)\n",
"```