{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Baseline Submission for the Challenge SPCRT\n", "### Author - Pulkit Gera" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ayushshivani/aicrowd_educational_baselines/blob/master/SPCRT_baseline.ipynb)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "!pip install numpy\n", "!pip install pandas\n", "!pip install scikit-learn" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Import necessary packages" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "from sklearn.model_selection import train_test_split \n", "from sklearn.linear_model import LinearRegression\n", "from sklearn import metrics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Download data\n", "The first step is to download our train and test data. We will train a regression model on the train data, make predictions on the test data, and submit those predictions." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "!wget https://s3.eu-central-1.wasabisys.com/aicrowd-public-datasets/aicrowd_educational_spcrt/data/public/test.csv\n", "!wget https://s3.eu-central-1.wasabisys.com/aicrowd-public-datasets/aicrowd_educational_spcrt/data/public/train.csv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load Data\n", "We use the pandas library to load our data. Pandas loads it into dataframes, which help us analyze the data easily. 
Learn more about it [here](https://www.tutorialspoint.com/python_pandas/index.html)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "train_data = pd.read_csv('train.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Clean and analyse the data" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " number_of_elements mean_atomic_mass wtd_mean_atomic_mass \\\n", "0 3 86.299100 65.789610 \n", "1 5 72.952854 56.414763 \n", "2 6 82.318112 99.033554 \n", "3 4 57.444449 60.476650 \n", "4 4 76.517718 56.808817 \n", "\n", " gmean_atomic_mass wtd_gmean_atomic_mass entropy_atomic_mass \\\n", "0 64.984139 49.765400 0.836621 \n", "1 59.186241 35.639703 1.445795 \n", "2 53.069787 71.259834 1.427749 \n", "3 56.067907 58.936797 1.362775 \n", "4 59.310096 35.773432 1.197273 \n", "\n", " wtd_entropy_atomic_mass range_atomic_mass wtd_range_atomic_mass \\\n", "0 1.013759 146.88130 20.950610 \n", "1 1.041520 122.90607 35.383159 \n", "2 1.324091 192.98100 40.196140 \n", "3 1.128041 34.84360 27.021980 \n", "4 0.981880 122.90607 34.833160 \n", "\n", " std_atomic_mass ... wtd_mean_Valence gmean_Valence \\\n", "0 63.713516 ... 3.500000 3.301927 \n", "1 40.250192 ... 2.257143 2.168944 \n", "2 70.933858 ... 4.300000 3.203101 \n", "3 12.367487 ... 3.650000 3.309751 \n", "4 44.289459 ... 2.264286 2.213364 \n", "\n", " wtd_gmean_Valence entropy_Valence wtd_entropy_Valence range_Valence \\\n", "0 3.464102 1.088900 0.971342 1 \n", "1 2.219783 1.594167 1.087480 1 \n", "2 3.772087 1.647214 1.510613 5 \n", "3 3.442623 1.333736 1.089489 3 \n", "4 2.226222 1.368922 1.048834 1 \n", "\n", " wtd_range_Valence std_Valence wtd_std_Valence critical_temp \n", "0 1.400000 0.471405 0.500000 4.50 \n", "1 1.131429 0.400000 0.437059 7.60 \n", "2 1.580000 1.950783 1.791647 3.01 \n", "3 1.800000 1.118034 1.194780 14.10 \n", "4 1.100000 0.433013 0.440952 36.80 \n", "\n", "[5 rows x 82 columns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "train_data.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we use the `describe` function to get an understanding of the data. It shows us the distribution for all the columns. You can use more functions like `info()` to get useful info." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " number_of_elements mean_atomic_mass wtd_mean_atomic_mass \\\n", "count 18073.000000 18073.000000 18073.000000 \n", "mean 4.116527 87.495853 72.915281 \n", "std 1.439625 29.586564 33.320437 \n", "min 1.000000 6.941000 6.941000 \n", "25% 3.000000 72.451240 52.177725 \n", "50% 4.000000 84.841880 60.786693 \n", "75% 5.000000 100.351275 85.994130 \n", "max 9.000000 208.980400 208.980400 \n", "\n", " gmean_atomic_mass wtd_gmean_atomic_mass entropy_atomic_mass \\\n", "count 18073.000000 18073.000000 18073.000000 \n", "mean 71.193951 58.444208 1.165612 \n", "std 30.920472 36.470563 0.365019 \n", "min 5.685033 3.193745 0.000000 \n", "25% 58.001648 35.258590 0.969858 \n", "50% 66.361592 39.898482 1.199541 \n", "75% 78.019689 73.097796 1.444537 \n", "max 208.980400 208.980400 1.983797 \n", "\n", " wtd_entropy_atomic_mass range_atomic_mass wtd_range_atomic_mass \\\n", "count 18073.000000 18073.000000 18073.000000 \n", "mean 1.064409 115.732133 33.213727 \n", "std 0.401233 54.718595 26.886071 \n", "min 0.000000 0.000000 0.000000 \n", "25% 0.777619 78.353150 16.830450 \n", "50% 1.146366 122.906070 26.658401 \n", "75% 1.360442 155.006000 38.360375 \n", "max 1.958203 207.972460 205.589910 \n", "\n", " std_atomic_mass ... wtd_mean_Valence gmean_Valence \\\n", "count 18073.000000 ... 18073.000000 18073.000000 \n", "mean 44.442844 ... 3.152312 3.056546 \n", "std 20.068666 ... 1.189356 1.043451 \n", "min 0.000000 ... 1.000000 1.000000 \n", "25% 32.890369 ... 2.118056 2.279705 \n", "50% 45.129500 ... 2.618182 2.615321 \n", "75% 59.663892 ... 4.030000 3.741657 \n", "max 101.019700 ... 
7.000000 7.000000 \n", "\n", " wtd_gmean_Valence entropy_Valence wtd_entropy_Valence range_Valence \\\n", "count 18073.000000 18073.000000 18073.000000 18073.000000 \n", "mean 3.054714 1.296028 1.054028 2.044708 \n", "std 1.172383 0.392761 0.380274 1.242861 \n", "min 1.000000 0.000000 0.000000 0.000000 \n", "25% 2.092115 1.060857 0.778998 1.000000 \n", "50% 2.433589 1.368922 1.165410 2.000000 \n", "75% 3.920517 1.589027 1.331926 3.000000 \n", "max 7.000000 2.141963 1.949739 6.000000 \n", "\n", " wtd_range_Valence std_Valence wtd_std_Valence critical_temp \n", "count 18073.000000 18073.000000 18073.000000 18073.000000 \n", "mean 1.481685 0.841078 0.676041 34.492796 \n", "std 0.976455 0.485247 0.455984 34.307997 \n", "min 0.000000 0.000000 0.000000 0.000210 \n", "25% 0.920286 0.471405 0.308515 5.400000 \n", "50% 1.062667 0.800000 0.500000 20.000000 \n", "75% 1.920000 1.200000 1.021023 63.000000 \n", "max 6.992200 3.000000 3.000000 185.000000 \n", "\n", "[8 rows x 82 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "train_data.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Split Data into Train and Validation\n", "Now we want to see how well our model is performing, but we don't have the test data labels to check against. What do we do? We split our dataset into train and validation sets. The idea is that we test our model on the validation set to get an idea of how well it works. This way we can also check that we don't [overfit](https://machinelearningmastery.com/overfitting-and-underfitting-with-machine-learning-algorithms/) on the train dataset. 
There are many ways to do validation, such as [k-fold](https://machinelearningmastery.com/k-fold-cross-validation/) and [leave one out](https://en.wikipedia.org/wiki/Cross-validation_(statistics))." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "X = train_data.drop('critical_temp', axis=1)\n", "y = train_data['critical_temp']\n", "# Hold out 20% of the rows for validation\n", "X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Define the Model and Train\n", "Now we come to the juicy part. We have prepared our data and now we train a model. The model will learn the function by looking at the inputs and corresponding outputs. There are a ton of models to choose from, such as [Linear Regression](https://machinelearningmastery.com/linear-regression-for-machine-learning/), [Support Vector Machines](https://towardsdatascience.com/support-vector-machine-introduction-to-machine-learning-algorithms-934a444fca47), [Decision Trees](https://towardsdatascience.com/decision-trees-in-machine-learning-641b9c4e8052), etc. \n", "Tip: A good result doesn't depend solely on the model but also on the features (columns) you choose. So make sure to play with your data and keep only what's important. 
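
As a rough sketch of how candidate models could be compared, the example below scores two regressors with 5-fold cross-validation using MAE. It runs on synthetic data from `make_regression` as a stand-in for the challenge data, and the names `X_demo`, `y_demo`, `candidates` and `cv_mae` are illustrative assumptions, not part of this baseline.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption: the real train.csv is not loaded here).
X_demo, y_demo = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=42)

# Hypothetical set of candidate models to compare.
candidates = {
    'linear_regression': LinearRegression(),
    'decision_tree': DecisionTreeRegressor(max_depth=5, random_state=42),
}

cv_mae = {}
for name, model in candidates.items():
    # scikit-learn returns negated MAE so that higher is better; flip the sign back.
    scores = cross_val_score(model, X_demo, y_demo, cv=5, scoring='neg_mean_absolute_error')
    cv_mae[name] = -scores.mean()
    print(name, round(cv_mae[name], 2))
```

On this synthetic, purely linear data the linear model should come out ahead; on the real dataset the ranking may well differ, which is the reason to run such a comparison.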
" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "regressor = LinearRegression() \n", "regressor.fit(X_train, y_train)\n", "\n", "# from sklearn import tree\n", "# clf = tree.DecisionTreeRegressor()\n", "# clf = clf.fit(X_train, y_train)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have used [Linear Regression](https://machinelearningmastery.com/linear-regression-for-machine-learning/) as a model here and set few of the parameteres. But one can set more parameters and increase the performance. To see the list of parameters visit [here](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html). \n", "Also given Decision Tree examples. Check out Decision Tree's parameters [here](https://scikit-learn.org/stable/modules/tree.html)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Check which variables have the most impact\n", "We now take this time to identify the columns that have the most impact. This is used to remove the columns that have negligble impact on the data and improve our model." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Coefficient\n", "number_of_elements -4.202422\n", "mean_atomic_mass 0.833105\n", "wtd_mean_atomic_mass -0.881193\n", "gmean_atomic_mass -0.510610\n", "wtd_gmean_atomic_mass 0.642180" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "coeff_df = pd.DataFrame(regressor.coef_, X.columns, columns=['Coefficient']) \n", "coeff_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Predict on Validation\n", "Now we predict our trained model on the validation set and evaluate our model" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "y_pred = regressor.predict(X_val)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df = pd.DataFrame({'Actual': y_val, 'Predicted': y_pred})\n", "df1 = df.head(25)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Evaluate the Performance\n", "We use the same metrics as that will be used for the test set. 
\n", "[MAE](https://en.wikipedia.org/wiki/Mean_absolute_error) and [RMSE](https://www.statisticshowto.com/rmse/) are the metrics for this challenge" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Mean Absolute Error: 13.42086725495139\n", "Mean Squared Error: 323.28465055058496\n", "Root Mean Squared Error: 17.98011820179681\n" ] } ], "source": [ "print('Mean Absolute Error:', metrics.mean_absolute_error(y_val, y_pred)) \n", "print('Mean Squared Error:', metrics.mean_squared_error(y_val, y_pred)) \n", "print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_val, y_pred)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load Test Set\n", "Load the test data now" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": true }, "outputs": [], "source": [ "test_data = pd.read_csv('test.csv')" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " number_of_elements mean_atomic_mass wtd_mean_atomic_mass \\\n", "0 2 82.768190 87.837285 \n", "1 4 76.444563 81.456750 \n", "2 5 88.936744 51.090431 \n", "3 4 76.517718 56.149432 \n", "4 3 104.608490 89.558979 \n", "\n", " gmean_atomic_mass wtd_gmean_atomic_mass entropy_atomic_mass \\\n", "0 82.144935 87.360109 0.685627 \n", "1 59.356672 68.229617 1.199541 \n", "2 70.358975 34.783991 1.445824 \n", "3 59.310096 35.562124 1.197273 \n", "4 101.719818 88.481210 1.070258 \n", "\n", " wtd_entropy_atomic_mass range_atomic_mass wtd_range_atomic_mass \\\n", "0 0.509575 20.27638 51.522285 \n", "1 1.108189 121.32760 36.950657 \n", "2 1.525092 122.90607 10.438667 \n", "3 1.042132 122.90607 31.920690 \n", "4 0.944284 59.94547 33.541423 \n", "\n", " std_atomic_mass ... mean_Valence wtd_mean_Valence \\\n", "0 10.138190 ... 4.50 4.750000 \n", "1 43.823354 ... 2.25 2.142857 \n", "2 46.482335 ... 2.40 2.114679 \n", "3 44.289459 ... 2.25 2.251429 \n", "4 25.225148 ... 5.00 5.811245 \n", "\n", " gmean_Valence wtd_gmean_Valence entropy_Valence wtd_entropy_Valence \\\n", "0 4.472136 4.728708 0.686962 0.514653 \n", "1 2.213364 2.119268 1.368922 1.309526 \n", "2 2.352158 2.095193 1.589027 1.314189 \n", "3 2.213364 2.214646 1.368922 1.078855 \n", "4 4.762203 5.743954 1.054920 0.803990 \n", "\n", " range_Valence wtd_range_Valence std_Valence wtd_std_Valence \n", "0 1 2.750000 0.500000 0.433013 \n", "1 1 0.571429 0.433013 0.349927 \n", "2 1 0.967890 0.489898 0.318634 \n", "3 1 1.074286 0.433013 0.433834 \n", "4 3 3.024096 1.414214 0.728448 \n", "\n", "[5 rows x 81 columns]" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "test_data.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Predict on test set\n", "Time for the moment of truth! Predict on test set and time to make the submission." 
] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": true }, "outputs": [], "source": [ "y_test = regressor.predict(test_data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Save it in the correct format" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df = pd.DataFrame(y_test,columns=['critical_temp'])\n", "df.to_csv('submission.csv',index=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## To download the generated CSV in Colab, run the command below" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from google.colab import files\n", "files.download('submission.csv') " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To participate in the challenge click [here](https://www.aicrowd.com/challenges/spcrt-superconductor-critical-temperature)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.10" } }, "nbformat": 4, "nbformat_minor": 2 }