{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Baseline for the challenge DCRCL\n", "### Author - Pulkit Gera" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ayushshivani/aicrowd_educational_baselines/blob/master/DCRCL_baseline.ipynb)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "!pip install numpy\n", "!pip install pandas\n", "!pip install scikit-learn" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Import necessary packages\n" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.linear_model import LogisticRegression\n", "from sklearn import metrics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Download data\n", "The first step is to download our train and test data. We will train a classifier on the train data and make predictions on the test data, then submit those predictions." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "!wget https://s3.eu-central-1.wasabisys.com/aicrowd-public-datasets/aicrowd_educational_dcrcl/data/public/test.csv\n", "!wget https://s3.eu-central-1.wasabisys.com/aicrowd-public-datasets/aicrowd_educational_dcrcl/data/public/train.csv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load Data\n", "We use the pandas library to load our data. Pandas loads it into dataframes, which help us analyze the data easily. 
Learn more about it [here](https://www.tutorialspoint.com/python_pandas/index.htm)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "train_data = pd.read_csv('train.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Analyse Data" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | LIMIT_BAL | SEX | EDUCATION | MARRIAGE | AGE | PAY_0 | PAY_2 | PAY_3 | PAY_4 | PAY_5 | ... | BILL_AMT4 | BILL_AMT5 | BILL_AMT6 | PAY_AMT1 | PAY_AMT2 | PAY_AMT3 | PAY_AMT4 | PAY_AMT5 | PAY_AMT6 | default payment next month |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 30000 | 2 | 2 | 1 | 38 | 0 | 0 | 0 | 0 | 0 | ... | 22810 | 25772 | 26360 | 1650 | 1700 | 1400 | 3355 | 1146 | 0 | 0 |
| 1 | 170000 | 1 | 4 | 1 | 28 | 0 | 0 | 0 | -1 | -1 | ... | 11760 | 0 | 4902 | 14000 | 5695 | 11760 | 0 | 4902 | 6000 | 0 |
| 2 | 340000 | 1 | 1 | 2 | 38 | 0 | 0 | 0 | -1 | -1 | ... | 1680 | 1920 | 9151 | 5000 | 7785 | 1699 | 1920 | 9151 | 187000 | 0 |
| 3 | 140000 | 2 | 2 | 2 | 29 | 0 | 0 | 0 | 2 | 0 | ... | 65861 | 64848 | 64936 | 3000 | 8600 | 6 | 2500 | 2500 | 2500 | 0 |
| 4 | 130000 | 2 | 2 | 1 | 42 | 2 | 2 | 2 | 0 | 0 | ... | 126792 | 103497 | 96991 | 6400 | 0 | 4535 | 3900 | 4300 | 3700 | 1 |
5 rows × 24 columns
| | LIMIT_BAL | SEX | EDUCATION | MARRIAGE | AGE | PAY_0 | PAY_2 | PAY_3 | PAY_4 | PAY_5 | ... | BILL_AMT4 | BILL_AMT5 | BILL_AMT6 | PAY_AMT1 | PAY_AMT2 | PAY_AMT3 | PAY_AMT4 | PAY_AMT5 | PAY_AMT6 | default payment next month |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 25500.000000 | 25500.000000 | 25500.000000 | 25500.000000 | 25500.000000 | 25500.000000 | 25500.000000 | 25500.000000 | 25500.000000 | 25500.000000 | ... | 25500.000000 | 25500.000000 | 25500.000000 | 25500.000000 | 2.550000e+04 | 25500.000000 | 25500.000000 | 25500.000000 | 25500.000000 | 25500.000000 |
| mean | 167436.458039 | 1.604667 | 1.852824 | 1.551961 | 35.503333 | -0.016275 | -0.131882 | -0.166706 | -0.218667 | -0.264157 | ... | 43139.224941 | 40252.920588 | 38846.415529 | 5690.801373 | 5.986709e+03 | 5246.605294 | 4829.790078 | 4810.296706 | 5187.016549 | 0.220902 |
| std | 129837.118639 | 0.488932 | 0.791803 | 0.522754 | 9.235048 | 1.126813 | 1.196710 | 1.192883 | 1.168375 | 1.132166 | ... | 64214.508636 | 60789.101393 | 59397.443604 | 17070.733348 | 2.402498e+04 | 18117.236738 | 16021.336645 | 15505.873498 | 17568.450557 | 0.414863 |
| min | 10000.000000 | 1.000000 | 0.000000 | 0.000000 | 21.000000 | -2.000000 | -2.000000 | -2.000000 | -2.000000 | -2.000000 | ... | -170000.000000 | -81334.000000 | -209051.000000 | 0.000000 | 0.000000e+00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 50000.000000 | 1.000000 | 1.000000 | 1.000000 | 28.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | -1.000000 | ... | 2360.000000 | 1779.250000 | 1280.000000 | 1000.000000 | 8.635000e+02 | 390.000000 | 292.750000 | 256.750000 | 113.750000 | 0.000000 |
| 50% | 140000.000000 | 2.000000 | 2.000000 | 2.000000 | 34.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 19033.000000 | 18085.000000 | 17129.000000 | 2100.000000 | 2.010000e+03 | 1800.000000 | 1500.000000 | 1500.000000 | 1500.000000 | 0.000000 |
| 75% | 240000.000000 | 2.000000 | 2.000000 | 2.000000 | 42.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 54084.750000 | 50080.750000 | 49110.500000 | 5006.000000 | 5.000000e+03 | 4507.000000 | 4001.250000 | 4024.000000 | 4000.000000 | 0.000000 |
| max | 1000000.000000 | 2.000000 | 6.000000 | 3.000000 | 79.000000 | 8.000000 | 8.000000 | 8.000000 | 8.000000 | 8.000000 | ... | 891586.000000 | 927171.000000 | 961664.000000 | 873552.000000 | 1.684259e+06 | 896040.000000 | 621000.000000 | 426529.000000 | 527143.000000 | 1.000000 |
8 rows × 24 columns
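
The summary statistics above already hint at class imbalance: because the target column `default payment next month` is coded 0/1, its mean is the fraction of defaulters. The following is a minimal sketch of that arithmetic, using only the `count` and `mean` values printed in the table; the variable names are illustrative and not part of the original notebook.

```python
# Values read off the summary table above (not recomputed from train.csv).
n_rows = 25500            # `count` row
default_rate = 0.220902   # `mean` of "default payment next month" (0/1 target)

# For a 0/1 column, mean * count gives the number of positive rows.
n_defaults = round(n_rows * default_rate)
n_non_defaults = n_rows - n_defaults

# A model that always predicts 0 (no default) is right on every non-defaulter,
# so its accuracy equals the share of the majority class.
majority_baseline_accuracy = 1 - default_rate

print(f"defaults: {n_defaults}, non-defaults: {n_non_defaults}")
print(f"always-predict-0 accuracy: {majority_baseline_accuracy:.4f}")
```

Any classifier trained on this data should be judged against that always-predict-0 figure rather than against 50%.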
| | Actual | Predicted |
|---|---|---|
| 6913 | 0 | 0 |
| 11124 | 0 | 0 |
| 25100 | 1 | 0 |
| 2764 | 0 | 0 |
| 23216 | 0 | 0 |
| 17269 | 0 | 0 |
| 3073 | 0 | 0 |
| 8184 | 0 | 0 |
| 2595 | 0 | 0 |
| 5483 | 0 | 0 |
| 6508 | 0 | 0 |
| 11776 | 0 | 0 |
| 5306 | 0 | 0 |
| 18846 | 0 | 0 |
| 19854 | 0 | 0 |
| 2463 | 0 | 0 |
| 5304 | 0 | 0 |
| 23739 | 0 | 0 |
| 20427 | 0 | 0 |
| 20263 | 0 | 0 |
| 9578 | 0 | 0 |
| 14164 | 0 | 0 |
| 5107 | 0 | 0 |
| 5160 | 0 | 0 |
| 8450 | 0 | 0 |
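
In the 25 rows shown above, every prediction is 0 and the single actual default (row 25100) is missed. The sketch below scores just those 25 rows in plain Python to show why accuracy alone can look good while recall and F1 for the default class are zero. The helper names `confusion_counts` and `summarize` are hypothetical, and the numbers describe only this 25-row sample, not the full validation set; `sklearn.metrics` provides equivalent functions such as `accuracy_score` and `f1_score`.

```python
# The 25 (Actual, Predicted) pairs from the table above:
# one true default in third position, and every prediction is 0.
actual = [0, 0, 1] + [0] * 22
predicted = [0] * 25


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def summarize(y_true, y_pred):
    """Compute accuracy, precision, recall and F1; undefined ratios are reported as 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


scores = summarize(actual, predicted)
print(scores)
```

On this sample the accuracy is 24/25 while recall and F1 for the default class are both 0, which is why a metric that accounts for the minority class is worth reporting alongside accuracy.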