{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Baseline Submission for the Challenge SPCRT\n", "### Author - Pulkit Gera" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ayushshivani/aicrowd_educational_baselines/blob/master/SPCRT_baseline.ipynb)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "!pip install numpy\n", "!pip install pandas\n", "!pip install scikit-learn" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Import necessary packages" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "from sklearn.model_selection import train_test_split \n", "from sklearn.linear_model import LinearRegression\n", "from sklearn import metrics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Download data\n", "The first step is to download our train and test data. We will train a regression model on the train data and make predictions on the test data, then submit those predictions." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "!wget https://s3.eu-central-1.wasabisys.com/aicrowd-public-datasets/aicrowd_educational_spcrt/data/public/test.csv\n", "!wget https://s3.eu-central-1.wasabisys.com/aicrowd-public-datasets/aicrowd_educational_spcrt/data/public/train.csv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load Data\n", "We use the pandas library to load our data. Pandas loads it into dataframes, which makes the data easy to analyze. 
Learn more about it [here](https://www.tutorialspoint.com/python_pandas/index.html)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "train_data = pd.read_csv('train.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Clean and analyse the data" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | number_of_elements | mean_atomic_mass | wtd_mean_atomic_mass | gmean_atomic_mass | wtd_gmean_atomic_mass | entropy_atomic_mass | wtd_entropy_atomic_mass | range_atomic_mass | wtd_range_atomic_mass | std_atomic_mass | ... | wtd_mean_Valence | gmean_Valence | wtd_gmean_Valence | entropy_Valence | wtd_entropy_Valence | range_Valence | wtd_range_Valence | std_Valence | wtd_std_Valence | critical_temp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 86.299100 | 65.789610 | 64.984139 | 49.765400 | 0.836621 | 1.013759 | 146.88130 | 20.950610 | 63.713516 | ... | 3.500000 | 3.301927 | 3.464102 | 1.088900 | 0.971342 | 1 | 1.400000 | 0.471405 | 0.500000 | 4.50 |
| 1 | 5 | 72.952854 | 56.414763 | 59.186241 | 35.639703 | 1.445795 | 1.041520 | 122.90607 | 35.383159 | 40.250192 | ... | 2.257143 | 2.168944 | 2.219783 | 1.594167 | 1.087480 | 1 | 1.131429 | 0.400000 | 0.437059 | 7.60 |
| 2 | 6 | 82.318112 | 99.033554 | 53.069787 | 71.259834 | 1.427749 | 1.324091 | 192.98100 | 40.196140 | 70.933858 | ... | 4.300000 | 3.203101 | 3.772087 | 1.647214 | 1.510613 | 5 | 1.580000 | 1.950783 | 1.791647 | 3.01 |
| 3 | 4 | 57.444449 | 60.476650 | 56.067907 | 58.936797 | 1.362775 | 1.128041 | 34.84360 | 27.021980 | 12.367487 | ... | 3.650000 | 3.309751 | 3.442623 | 1.333736 | 1.089489 | 3 | 1.800000 | 1.118034 | 1.194780 | 14.10 |
| 4 | 4 | 76.517718 | 56.808817 | 59.310096 | 35.773432 | 1.197273 | 0.981880 | 122.90607 | 34.833160 | 44.289459 | ... | 2.264286 | 2.213364 | 2.226222 | 1.368922 | 1.048834 | 1 | 1.100000 | 0.433013 | 0.440952 | 36.80 |

5 rows × 82 columns

| | number_of_elements | mean_atomic_mass | wtd_mean_atomic_mass | gmean_atomic_mass | wtd_gmean_atomic_mass | entropy_atomic_mass | wtd_entropy_atomic_mass | range_atomic_mass | wtd_range_atomic_mass | std_atomic_mass | ... | wtd_mean_Valence | gmean_Valence | wtd_gmean_Valence | entropy_Valence | wtd_entropy_Valence | range_Valence | wtd_range_Valence | std_Valence | wtd_std_Valence | critical_temp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 18073.000000 | 18073.000000 | 18073.000000 | 18073.000000 | 18073.000000 | 18073.000000 | 18073.000000 | 18073.000000 | 18073.000000 | 18073.000000 | ... | 18073.000000 | 18073.000000 | 18073.000000 | 18073.000000 | 18073.000000 | 18073.000000 | 18073.000000 | 18073.000000 | 18073.000000 | 18073.000000 |
| mean | 4.116527 | 87.495853 | 72.915281 | 71.193951 | 58.444208 | 1.165612 | 1.064409 | 115.732133 | 33.213727 | 44.442844 | ... | 3.152312 | 3.056546 | 3.054714 | 1.296028 | 1.054028 | 2.044708 | 1.481685 | 0.841078 | 0.676041 | 34.492796 |
| std | 1.439625 | 29.586564 | 33.320437 | 30.920472 | 36.470563 | 0.365019 | 0.401233 | 54.718595 | 26.886071 | 20.068666 | ... | 1.189356 | 1.043451 | 1.172383 | 0.392761 | 0.380274 | 1.242861 | 0.976455 | 0.485247 | 0.455984 | 34.307997 |
| min | 1.000000 | 6.941000 | 6.941000 | 5.685033 | 3.193745 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 1.000000 | 1.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000210 |
| 25% | 3.000000 | 72.451240 | 52.177725 | 58.001648 | 35.258590 | 0.969858 | 0.777619 | 78.353150 | 16.830450 | 32.890369 | ... | 2.118056 | 2.279705 | 2.092115 | 1.060857 | 0.778998 | 1.000000 | 0.920286 | 0.471405 | 0.308515 | 5.400000 |
| 50% | 4.000000 | 84.841880 | 60.786693 | 66.361592 | 39.898482 | 1.199541 | 1.146366 | 122.906070 | 26.658401 | 45.129500 | ... | 2.618182 | 2.615321 | 2.433589 | 1.368922 | 1.165410 | 2.000000 | 1.062667 | 0.800000 | 0.500000 | 20.000000 |
| 75% | 5.000000 | 100.351275 | 85.994130 | 78.019689 | 73.097796 | 1.444537 | 1.360442 | 155.006000 | 38.360375 | 59.663892 | ... | 4.030000 | 3.741657 | 3.920517 | 1.589027 | 1.331926 | 3.000000 | 1.920000 | 1.200000 | 1.021023 | 63.000000 |
| max | 9.000000 | 208.980400 | 208.980400 | 208.980400 | 208.980400 | 1.983797 | 1.958203 | 207.972460 | 205.589910 | 101.019700 | ... | 7.000000 | 7.000000 | 7.000000 | 2.141963 | 1.949739 | 6.000000 | 6.992200 | 3.000000 | 3.000000 | 185.000000 |

8 rows × 82 columns
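
The summary above covers all 82 columns, including the target `critical_temp`. Before fitting a model, the target is usually separated from the features and part of the data is held out for validation. The following is a minimal sketch of that step, not necessarily the notebook's exact code: it uses a small synthetic frame in place of `train.csv`, and the 80/20 split and `random_state=42` are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for train.csv: two feature columns plus the target.
demo = pd.DataFrame({
    "number_of_elements": rng.integers(1, 10, size=100),
    "mean_atomic_mass": rng.uniform(6.9, 209.0, size=100),
    "critical_temp": rng.uniform(0.0, 185.0, size=100),
})

# Features are every column except the target.
X = demo.drop(columns=["critical_temp"])
y = demo["critical_temp"]

# Hold out 20% of the rows for validation (the split ratio is an assumption).
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_val.shape)
```

With the real data, `demo` would be replaced by `train_data`, and the validation part can be scored with functions from `sklearn.metrics` after training.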

| | Coefficient |
|---|---|
| number_of_elements | -4.202422 |
| mean_atomic_mass | 0.833105 |
| wtd_mean_atomic_mass | -0.881193 |
| gmean_atomic_mass | -0.510610 |
| wtd_gmean_atomic_mass | 0.642180 |
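
A table like the one above lists one fitted weight per feature. As a hedged illustration of how such a table can be built from a fitted `LinearRegression`, the sketch below uses synthetic, noise-free data with known weights (2.0 and -0.5, chosen here purely for the example), so the recovered coefficients can be checked by eye. The name `coeff_df` is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic features standing in for two of the real columns.
X = pd.DataFrame({
    "number_of_elements": rng.integers(1, 10, size=200).astype(float),
    "mean_atomic_mass": rng.uniform(6.9, 209.0, size=200),
})

# Noise-free linear target with known weights, so the fit is exact.
y = 2.0 * X["number_of_elements"] - 0.5 * X["mean_atomic_mass"] + 10.0

model = LinearRegression().fit(X, y)

# One row per feature, in the same order as the columns of X.
coeff_df = pd.DataFrame(model.coef_, index=X.columns, columns=["Coefficient"])
print(coeff_df)
```

Each coefficient is the change in the predicted target per unit change in that feature with the others held fixed, so magnitudes are only comparable across features that share a scale.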

| | number_of_elements | mean_atomic_mass | wtd_mean_atomic_mass | gmean_atomic_mass | wtd_gmean_atomic_mass | entropy_atomic_mass | wtd_entropy_atomic_mass | range_atomic_mass | wtd_range_atomic_mass | std_atomic_mass | ... | mean_Valence | wtd_mean_Valence | gmean_Valence | wtd_gmean_Valence | entropy_Valence | wtd_entropy_Valence | range_Valence | wtd_range_Valence | std_Valence | wtd_std_Valence |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 82.768190 | 87.837285 | 82.144935 | 87.360109 | 0.685627 | 0.509575 | 20.27638 | 51.522285 | 10.138190 | ... | 4.50 | 4.750000 | 4.472136 | 4.728708 | 0.686962 | 0.514653 | 1 | 2.750000 | 0.500000 | 0.433013 |
| 1 | 4 | 76.444563 | 81.456750 | 59.356672 | 68.229617 | 1.199541 | 1.108189 | 121.32760 | 36.950657 | 43.823354 | ... | 2.25 | 2.142857 | 2.213364 | 2.119268 | 1.368922 | 1.309526 | 1 | 0.571429 | 0.433013 | 0.349927 |
| 2 | 5 | 88.936744 | 51.090431 | 70.358975 | 34.783991 | 1.445824 | 1.525092 | 122.90607 | 10.438667 | 46.482335 | ... | 2.40 | 2.114679 | 2.352158 | 2.095193 | 1.589027 | 1.314189 | 1 | 0.967890 | 0.489898 | 0.318634 |
| 3 | 4 | 76.517718 | 56.149432 | 59.310096 | 35.562124 | 1.197273 | 1.042132 | 122.90607 | 31.920690 | 44.289459 | ... | 2.25 | 2.251429 | 2.213364 | 2.214646 | 1.368922 | 1.078855 | 1 | 1.074286 | 0.433013 | 0.433834 |
| 4 | 3 | 104.608490 | 89.558979 | 101.719818 | 88.481210 | 1.070258 | 0.944284 | 59.94547 | 33.541423 | 25.225148 | ... | 5.00 | 5.811245 | 4.762203 | 5.743954 | 1.054920 | 0.803990 | 3 | 3.024096 | 1.414214 | 0.728448 |

5 rows × 81 columns
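
The test frame above has 81 columns because it lacks the target `critical_temp`, which is what the model has to predict. A minimal sketch of predicting on such a frame and writing the result to disk follows. It uses synthetic stand-ins for both files, and the file name `submission.csv` and its single `critical_temp` column are assumptions; the challenge page defines the required layout.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
feature_cols = ["number_of_elements", "mean_atomic_mass"]

# Synthetic stand-ins for train.csv (features + target) and test.csv (features only).
train_demo = pd.DataFrame(rng.uniform(1.0, 9.0, size=(50, 2)), columns=feature_cols)
train_demo["critical_temp"] = 3.0 * train_demo["number_of_elements"] + 1.0
test_demo = pd.DataFrame(rng.uniform(1.0, 9.0, size=(10, 2)), columns=feature_cols)

model = LinearRegression().fit(train_demo[feature_cols], train_demo["critical_temp"])

# One prediction per test row, in the same order as the test frame.
predictions = model.predict(test_demo[feature_cols])

# Hypothetical submission layout: a single `critical_temp` column.
submission = pd.DataFrame({"critical_temp": predictions})
out_path = os.path.join(tempfile.mkdtemp(), "submission.csv")
submission.to_csv(out_path, index=False)

print(submission.shape)
```

The test features must use the same columns, in the same order, as the features the model was trained on.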