{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", " \n", "## [mlcourse.ai](https://mlcourse.ai) - Open Machine Learning Course\n", "\n", "\n", "Author: Vitaly Radchenko. All content is distributed under the [Creative Commons CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) license." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Assignment #5 (demo)\n", "## Logistic Regression and Random Forest in the credit scoring problem
\n", "\n", "**Same assignment as a [Kaggle Kernel](https://www.kaggle.com/kashnitsky/a5-demo-logit-and-rf-for-credit-scoring-sol) + [solution](https://www.kaggle.com/kashnitsky/a5-demo-logit-and-rf-for-credit-scoring-sol).**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this assignment, you will build models and answer questions using data on credit scoring.\n", "\n", "Please write your code in the cells with the \"Your code here\" placeholder. Then, answer the questions in the [form](https://docs.google.com/forms/d/1gKt0DA4So8ohKAHZNCk58ezvg7K_tik26d9QND7WC6M/edit).\n", "\n", "Let's start with a warm-up exercise." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question 1.** There are 5 jurors in a courtroom. Each of them can correctly identify the guilt of the defendant with 70% probability, independent of one another. What is the probability that the jurors will jointly reach the correct verdict if the final decision is by majority vote?\n", "\n", "1. 70.00%\n", "2. 83.20%\n", "3. 83.70%\n", "4. 87.50%" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Great! Let's move on to machine learning.\n", "\n", "## Credit scoring problem setup\n", "\n", "#### Problem\n", "\n", "Predict whether the customer will repay their credit within 90 days. This is a binary classification problem; we will assign customers into good or bad categories based on our prediction.\n", "\n", "#### Data description\n", "\n", "| Feature | Variable Type | Value Type | Description |\n", "|:--------|:--------------|:-----------|:------------|\n", "| age | Input Feature | integer | Customer age |\n", "| DebtRatio | Input Feature | real | Total monthly loan payments (loan, alimony, etc.) 
/ Total monthly income percentage |\n", "| NumberOfTime30-59DaysPastDueNotWorse | Input Feature | integer | Number of times the customer has been 30-59 days past due (but not worse) on other loans during the last 2 years |\n", "| NumberOfTimes90DaysLate | Input Feature | integer | Number of times the customer has been 90 or more days past due on other loans |\n", "| NumberOfTime60-89DaysPastDueNotWorse | Input Feature | integer | Number of times the customer has been 60-89 days past due (but not worse) during the last 2 years |\n", "| NumberOfDependents | Input Feature | integer | Number of the customer's dependents |\n", "| SeriousDlqin2yrs | Target Variable | binary:
0 or 1 | Customer hasn't paid the loan debt within 90 days |\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's set up our environment:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Disable warnings in Anaconda\n", "import warnings\n", "warnings.filterwarnings('ignore')\n", "\n", "import numpy as np\n", "import pandas as pd\n", "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "sns.set()" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from matplotlib import rcParams\n", "rcParams['figure.figsize'] = 11, 8" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's write the function that will replace *NaN* values with the median for each column." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "def fill_nan(table):\n", " for col in table.columns:\n", " table[col] = table[col].fillna(table[col].median())\n", " return table " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, read the data:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "