{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Baseline submission for the challenge DBSRA\n", "### Author - Pulkit Gera" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ayushshivani/aicrowd_educational_baselines/blob/master/DBSRA_baseline.ipynb)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "!pip install numpy\n", "!pip install pandas\n", "!pip install scikit-learn" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "from sklearn.model_selection import train_test_split \n", "from sklearn.linear_model import LogisticRegression\n", "from sklearn.preprocessing import LabelEncoder\n", "from sklearn import metrics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Download data\n", "The first step is to download our train and test data. We will train a classifier on the train data and make predictions on the test data, then submit those predictions." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "!wget https://s3.eu-central-1.wasabisys.com/aicrowd-public-datasets/aicrowd_educational_dbsra/data/public/test.csv\n", "!wget https://s3.eu-central-1.wasabisys.com/aicrowd-public-datasets/aicrowd_educational_dbsra/data/public/train.csv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load Data\n", "We use the pandas library to load our data. Pandas loads it into dataframes, which makes the data easy to analyze. 
Learn more about it [here](https://www.tutorialspoint.com/python_pandas/index.htm)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "train_data = pd.read_csv('train.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Clean and Analyse Data" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | medical_specialty | ... | citoglipton | insulin | glyburide-metformin | glipizide-metformin | glimepiride-pioglitazone | metformin-rosiglitazone | metformin-pioglitazone | change | diabetesMed | readmitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AfricanAmerican | Female | [70-80) | ? | 1 | 1 | 7 | 2 | ? | InternalMedicine | ... | No | Steady | No | No | No | No | No | No | Yes | 1 |
| 1 | Caucasian | Female | [90-100) | ? | 3 | 1 | 1 | 8 | SP | Pulmonology | ... | No | Down | No | No | No | No | No | Ch | Yes | 1 |
| 2 | Caucasian | Female | [80-90) | ? | 1 | 2 | 7 | 1 | MC | Osteopath | ... | No | Steady | No | No | No | No | No | No | Yes | 0 |
| 3 | Caucasian | Male | [60-70) | ? | 3 | 1 | 6 | 6 | MC | Radiologist | ... | No | Steady | No | No | No | No | No | Ch | Yes | 0 |
| 4 | ? | Female | [70-80) | ? | 1 | 3 | 6 | 3 | UN | InternalMedicine | ... | No | No | No | No | No | No | No | No | No | 0 |

5 rows × 48 columns

| | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | medical_specialty | ... | citoglipton | insulin | glyburide-metformin | glipizide-metformin | glimepiride-pioglitazone | metformin-rosiglitazone | metformin-pioglitazone | change | diabetesMed | readmitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 7 | 1 | 1 | 1 | 7 | 2 | 0 | 19 | ... | 0 | 2 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
| 1 | 3 | 0 | 9 | 1 | 3 | 1 | 1 | 8 | 15 | 51 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| 2 | 3 | 0 | 8 | 1 | 1 | 2 | 7 | 1 | 8 | 30 | ... | 0 | 2 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
| 3 | 3 | 1 | 6 | 1 | 3 | 1 | 6 | 6 | 8 | 52 | ... | 0 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 4 | 0 | 0 | 7 | 1 | 1 | 3 | 6 | 3 | 16 | 19 | ... | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |

5 rows × 48 columns

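
In the encoded preview, text columns such as gender and insulin appear as small integers, while numeric columns such as time_in_hospital are unchanged. The codes shown are consistent with numbering the sorted unique values of each column from 0, which is how scikit-learn's LabelEncoder assigns codes. The sketch below illustrates that idea without any dependencies. The helper `encode_column` is hypothetical, the sample values are the five preview rows, and this is not necessarily the exact code used in this notebook.

```python
def encode_column(values):
    # Rank each distinct value by sorted order, starting at 0,
    # then replace every value with its rank.
    classes = sorted(set(values))
    mapping = {value: code for code, value in enumerate(classes)}
    return [mapping[v] for v in values], mapping


# Values copied from the five preview rows of the raw data.
insulin = ['Steady', 'Down', 'Steady', 'Steady', 'No']
gender = ['Female', 'Female', 'Female', 'Male', 'Female']

insulin_codes, insulin_mapping = encode_column(insulin)
gender_codes, gender_mapping = encode_column(gender)

print(insulin_codes, insulin_mapping)
print(gender_codes, gender_mapping)
```

The codes depend on the full set of values seen in a column. Encoding only these five rows of race, for instance, would not reproduce the preview's codes, because the full column contains more categories. For the same reason, a mapping learned on the train data should be reused for the test data rather than refitted.
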
| | Actual | Predicted |
|---|---|---|
| 26342 | 1 | 0 |
| 59142 | 1 | 0 |
| 57537 | 1 | 0 |
| 58128 | 0 | 0 |
| 29821 | 1 | 0 |
| 62897 | 0 | 0 |
| 43572 | 0 | 0 |
| 62329 | 2 | 0 |
| 44309 | 0 | 0 |
| 20882 | 0 | 0 |
| 49075 | 0 | 0 |
| 20668 | 0 | 0 |
| 76856 | 1 | 0 |
| 32858 | 1 | 1 |
| 74292 | 1 | 0 |
| 80549 | 1 | 0 |
| 8588 | 1 | 0 |
| 57768 | 1 | 0 |
| 10658 | 1 | 0 |
| 51569 | 0 | 0 |
| 59914 | 1 | 0 |
| 32874 | 0 | 0 |
| 54656 | 1 | 0 |
| 77456 | 0 | 0 |
| 35300 | 0 | 0 |

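
In the 25 rows shown, the model predicts class 0 almost everywhere. The sketch below uses only those rows to count how often the prediction matches the label and which (actual, predicted) pairs occur. The variable names are illustrative, and the figures describe this sample only, not the full held-out set.

```python
from collections import Counter

# Actual and Predicted columns copied row by row from the table above.
actual = [1, 1, 1, 0, 1, 0, 0, 2, 0, 0, 0, 0, 1,
          1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0]
predicted = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
             1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# Fraction of rows where the prediction equals the label.
correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)

# Tally of (actual, predicted) pairs, i.e. a sparse confusion matrix.
pairs = Counter(zip(actual, predicted))

print(f'accuracy on the sample: {accuracy:.2f}')
for (a, p), count in sorted(pairs.items()):
    print(f'actual={a} predicted={p}: {count}')
```

A tally like this makes it clear when a reasonable-looking accuracy comes mostly from predicting the majority class, which is why per-class metrics such as F1 are worth checking alongside accuracy.
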
| | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | payer_code | medical_specialty | ... | examide | citoglipton | insulin | glyburide-metformin | glipizide-metformin | glimepiride-pioglitazone | metformin-rosiglitazone | metformin-pioglitazone | change | diabetesMed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 0 | 7 | 1 | 1 | 1 | 6 | 11 | 15 | 16 | ... | 0 | 0 | 2 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |
| 1 | 3 | 1 | 5 | 1 | 1 | 1 | 1 | 1 | 6 | 0 | ... | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |
| 2 | 3 | 0 | 6 | 1 | 3 | 6 | 1 | 4 | 6 | 0 | ... | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |
| 3 | 3 | 1 | 3 | 1 | 2 | 1 | 1 | 12 | 4 | 10 | ... | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |
| 4 | 1 | 0 | 6 | 1 | 1 | 2 | 7 | 1 | 0 | 0 | ... | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |

5 rows × 47 columns