{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Decision tree for classification\n", "> A summary of the lecture \"Machine Learning with Tree-Based Models in Python\", via DataCamp\n", "\n", "- toc: true \n", "- badges: true\n", "- comments: true\n", "- author: Chanseok Kang\n", "- categories: [Python, Datacamp, Machine_Learning]\n", "- image: images/decision-boundary.png" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Decision tree for classification\n", "- Classification-tree\n", "  - Sequence of if-else questions about individual features.\n", "  - **Objective**: infer class labels\n", "  - Able to capture non-linear relationships between features and labels\n", "  - Doesn't require feature scaling (e.g. standardization)\n", "- Decision Regions\n", "  - Decision region: region in the feature space where all instances are assigned to one class label\n", "  - Decision Boundary: surface separating different decision regions\n", "![decision region](image/decision_boundary.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Train your first classification tree\n", "In this exercise you'll work with the [Wisconsin Breast Cancer Dataset](https://www.kaggle.com/uciml/breast-cancer-wisconsin-data) from the UCI machine learning repository. You'll predict whether a tumor is malignant or benign based on two features: the mean radius of the tumor (```radius_mean```) and its mean number of concave points (```concave points_mean```)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Preprocess" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id diagnosis radius_mean texture_mean perimeter_mean area_mean \\\n", "0 842302 M 17.99 10.38 122.80 1001.0 \n", "1 842517 M 20.57 17.77 132.90 1326.0 \n", "2 84300903 M 19.69 21.25 130.00 1203.0 \n", "3 84348301 M 11.42 20.38 77.58 386.1 \n", "4 84358402 M 20.29 14.34 135.10 1297.0 \n", "\n", " smoothness_mean compactness_mean concavity_mean concave points_mean \\\n", "0 0.11840 0.27760 0.3001 0.14710 \n", "1 0.08474 0.07864 0.0869 0.07017 \n", "2 0.10960 0.15990 0.1974 0.12790 \n", "3 0.14250 0.28390 0.2414 0.10520 \n", "4 0.10030 0.13280 0.1980 0.10430 \n", "\n", " ... texture_worst perimeter_worst area_worst smoothness_worst \\\n", "0 ... 17.33 184.60 2019.0 0.1622 \n", "1 ... 23.41 158.80 1956.0 0.1238 \n", "2 ... 25.53 152.50 1709.0 0.1444 \n", "3 ... 26.50 98.87 567.7 0.2098 \n", "4 ... 16.67 152.20 1575.0 0.1374 \n", "\n", " compactness_worst concavity_worst concave points_worst symmetry_worst \\\n", "0 0.6656 0.7119 0.2654 0.4601 \n", "1 0.1866 0.2416 0.1860 0.2750 \n", "2 0.4245 0.4504 0.2430 0.3613 \n", "3 0.8663 0.6869 0.2575 0.6638 \n", "4 0.2050 0.4000 0.1625 0.2364 \n", "\n", " fractal_dimension_worst Unnamed: 32 \n", "0 0.11890 NaN \n", "1 0.08902 NaN \n", "2 0.08758 NaN \n", "3 0.17300 NaN \n", "4 0.07678 NaN \n", "\n", "[5 rows x 33 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "wbc = pd.read_csv('./dataset/wbc.csv')\n", "wbc.head()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "X = wbc[['radius_mean', 'concave points_mean']]\n", "y = wbc['diagnosis']\n", "y = y.map({'M':1, 'B':0})" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "from sklearn.model_selection import train_test_split\n", "\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", 
"output_type": "stream", "text": [ "[1 0 0 1 0]\n" ] } ], "source": [ "from sklearn.tree import DecisionTreeClassifier\n", "\n", "# Instantiate a DecisionTreeClassifier 'dt' with a maximum depth of 6\n", "dt = DecisionTreeClassifier(max_depth=6, random_state=1)\n", "\n", "# Fit dt to the training set\n", "dt.fit(X_train, y_train)\n", "\n", "# Predict test set labels\n", "y_pred = dt.predict(X_test)\n", "print(y_pred[0:5])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Evaluate the classification tree\n", "Now that you've fit your first classification tree, it's time to evaluate its performance on the test set. You'll do so using the accuracy metric which corresponds to the fraction of correct predictions made on the test set." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Test set accuracy: 0.89\n" ] } ], "source": [ "from sklearn.metrics import accuracy_score\n", "\n", "# Predict test set labels\n", "y_pred = dt.predict(X_test)\n", "\n", "# Compute test set accuracy\n", "acc = accuracy_score(y_test, y_pred)\n", "print(\"Test set accuracy: {:.2f}\".format(acc))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Logistic regression vs classification tree\n", "A classification tree divides the feature space into rectangular regions. In contrast, a linear model such as logistic regression produces only a single linear decision boundary dividing the feature space into two decision regions." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Helper function" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "from mlxtend.plotting import plot_decision_regions\n", "\n", "def plot_labeled_decision_regions(X, y, models):\n", "    '''Function producing a scatter plot of the instances contained\n", "    in the 2D dataset (X, y) along with the decision\n", "    regions of two trained classification models contained in the\n", "    list 'models'.\n", "\n", "    Parameters\n", "    ----------\n", "    X: pandas DataFrame corresponding to two numerical features\n", "    y: pandas Series corresponding to the class labels\n", "    models: list containing two trained classifiers\n", "\n", "    '''\n", "    if len(models) != 2:\n", "        raise Exception('''Models should be a list containing only two trained classifiers.''')\n", "    if not isinstance(X, pd.DataFrame):\n", "        raise Exception('''X has to be a pandas DataFrame with two numerical features.''')\n", "    if not isinstance(y, pd.Series):\n", "        raise Exception('''y has to be a pandas Series corresponding to the labels.''')\n", "    fig, ax = plt.subplots(1, 2, figsize=(10.0, 5), sharey=True)\n", "    for i, model in enumerate(models):\n", "        plot_decision_regions(X.values, y.values, model, legend=2, ax=ax[i])\n", "        ax[i].set_title(model.__class__.__name__)\n", "        ax[i].set_xlabel(X.columns[0])\n", "        if i == 0:\n", "            ax[i].set_ylabel(X.columns[1])\n", "        ax[i].set_ylim(X.values[:,1].min(), X.values[:,1].max())\n", "        ax[i].set_xlim(X.values[:,0].min(), X.values[:,0].max())\n", "    plt.tight_layout()" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "from sklearn.linear_model import LogisticRegression\n", "\n", "# Instantiate logreg\n", "logreg = LogisticRegression(random_state=1)\n", "\n", "# Fit logreg to the training set\n", "logreg.fit(X_train, y_train)\n", "\n", "# Define a list called clfs containing the two classifiers logreg and dt\n", "clfs = [logreg, dt]\n", "\n", "# Review the decision regions of the two classifier\n", "plot_labeled_decision_regions(X_test, y_test, clfs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Classification tree Learning\n", "- Building Blocks of a Decision-Tree\n", " - Decision-Tree: data structure consisting of a hierarchy of nodes\n", " - Node: question or prediction\n", " - Three kinds of nodes\n", " - **Root**: no parent node, question giving rise to two children nodes.\n", " - **Internal node**: one parent node, question giving rise to two children nodes.\n", " - **Leaf**: one parent node, no children nodes --> prediction.\n", "- Information Gain (IG)\n", "![information gain](./image/ig.png)\n", "$$ IG(\\underbrace{f}_{\\text{feature}}, \\underbrace{sp}_{\\text{split-point}} ) = I(\\text{parent}) - \\big( \\frac{N_{\\text{left}}}{N}I(\\text{left}) + \\frac{N_{\\text{right}}}{N}I(\\text{right}) \\big) $$\n", " - Criteria to measure the impurity of a note $I(\\text{node})$:\n", " - gini index\n", " - entropy\n", " - etc...\n", "- Classification-Tree Learning\n", " - Nodes are grown recursively.\n", " - At each node, split the data based on:\n", " - feature $f$ and split-point $sp$ to maximize $IG(\\text{node})$.\n", " - If $IG(\\text{node}) = 0$, declare the node a leaf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using entropy as a criterion\n", "In this exercise, you'll train a classification tree on the Wisconsin Breast Cancer dataset using entropy as an information criterion. 
The original exercise uses all 30 features in the dataset, which is split into 80% train and 20% test; the cell below reuses the two-feature ```X_train``` defined earlier, which comes from the same 80/20 split.\n", "\n" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='entropy',\n", " max_depth=8, max_features=None, max_leaf_nodes=None,\n", " min_impurity_decrease=0.0, min_impurity_split=None,\n", " min_samples_leaf=1, min_samples_split=2,\n", " min_weight_fraction_leaf=0.0, presort='deprecated',\n", " random_state=1, splitter='best')" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.tree import DecisionTreeClassifier\n", "\n", "# Instantiate dt_entropy, set 'entropy' as the information criterion\n", "dt_entropy = DecisionTreeClassifier(max_depth=8, criterion='entropy', random_state=1)\n", "\n", "# Fit dt_entropy to the training set\n", "dt_entropy.fit(X_train, y_train)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Entropy vs Gini index\n", "In this exercise you'll compare the test set accuracy of ```dt_entropy``` to the accuracy of another tree named ```dt_gini```. The tree ```dt_gini``` is trained on the same dataset with the same parameters, except for the information criterion, which is set to the gini index using the keyword ```'gini'```." 
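, "\n", "\n", "As a rough illustration of what the two criteria measure, the sketch below computes the gini index, the entropy, and the resulting information gain for one hand-made split, following the $IG$ formula given earlier. The helper names `gini`, `entropy`, and `information_gain` and the label lists are assumptions made for this example; they are not part of scikit-learn's API and the labels are not taken from the dataset.

```python
from collections import Counter
from math import log2


def gini(labels):
    '''Gini index: 1 minus the sum of squared class proportions.'''
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    '''Entropy in bits: minus the sum of p * log2(p) over the classes present.'''
    n = len(labels)
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())


def information_gain(parent, left, right, impurity):
    '''Parent impurity minus the size-weighted impurity of the two children.'''
    n = len(parent)
    weighted = len(left) / n * impurity(left) + len(right) / n * impurity(right)
    return impurity(parent) - weighted


# Made-up labels (1 = malignant, 0 = benign) for a single candidate split.
parent = [1, 1, 1, 1, 0, 0, 0, 0]
left = [1, 1, 1, 0]     # instances that satisfy the split condition
right = [1, 0, 0, 0]    # instances that do not

for name, impurity in [('gini', gini), ('entropy', entropy)]:
    gain = information_gain(parent, left, right, impurity)
    print(f'{name}: I(parent) = {impurity(parent):.3f}, IG = {gain:.3f}')
```

Both criteria score the same split here but on different scales, which is one reason trees grown with `'gini'` and `'entropy'` can differ slightly in the splits they pick."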
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',\n", " max_depth=8, max_features=None, max_leaf_nodes=None,\n", " min_impurity_decrease=0.0, min_impurity_split=None,\n", " min_samples_leaf=1, min_samples_split=2,\n", " min_weight_fraction_leaf=0.0, presort='deprecated',\n", " random_state=1, splitter='best')" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dt_gini = DecisionTreeClassifier(max_depth=8, criterion='gini', random_state=1)\n", "dt_gini.fit(X_train, y_train)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Accuracy achieved by using entropy: 0.8947368421052632\n", "Accuracy achieved by using gini: 0.8859649122807017\n" ] } ], "source": [ "from sklearn.metrics import accuracy_score\n", "\n", "# Use dt_entropy to predict test set labels\n", "y_pred = dt_entropy.predict(X_test)\n", "y_pred_gini = dt_gini.predict(X_test)\n", "\n", "# Evaluate accuracy_entropy\n", "accuracy_entropy = accuracy_score(y_test, y_pred)\n", "accuracy_gini = accuracy_score(y_test, y_pred_gini)\n", "\n", "# Print accuracy_entropy\n", "print(\"Accuracy achieved by using entropy: \", accuracy_entropy)\n", "\n", "# Print accuracy_gini\n", "print(\"Accuracy achieved by using gini: \", accuracy_gini)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Decision tree for regression\n", "- Information Criterion for Regression Tree\n", "$$ I(\\text{node}) = \\underbrace{\\text{MSE}(\\text{node})}_{\\text{mean-squared-error}} = \\dfrac{1}{N_{\\text{node}}} \\sum_{i \\in \\text{node}} \\big(y^{(i)} - \\hat{y}_{\\text{node}} \\big)^2 $$\n", "$$ \\underbrace{\\hat{y}_{\\text{node}}}_{\\text{mean-target-value}} = \\dfrac{1}{N_{\\text{node}}} \\sum_{i \\in \\text{node}}y^{(i)}$$\n", "- Prediction\n", "$$ 
\\hat{y}_{\\text{pred}}(\\text{leaf}) = \\dfrac{1}{N_{\\text{leaf}}} \\sum_{i \\in \\text{leaf}} y^{(i)}$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Train your first regression tree\n", "In this exercise, you'll train a regression tree to predict the mpg (miles per gallon) consumption of cars in the [auto-mpg dataset](https://www.kaggle.com/uciml/autompg-dataset) using all the six available features.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Preprocess" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " mpg displ hp weight accel origin size\n", "0 18.0 250.0 88 3139 14.5 US 15.0\n", "1 9.0 304.0 193 4732 18.5 US 20.0\n", "2 36.1 91.0 60 1800 16.4 Asia 10.0\n", "3 18.5 250.0 98 3525 19.0 US 15.0\n", "4 34.3 97.0 78 2188 15.8 Europe 10.0" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mpg = pd.read_csv('./dataset/auto.csv')\n", "mpg.head()" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "mpg = pd.get_dummies(mpg)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " mpg displ hp weight accel size origin_Asia origin_Europe \\\n", "0 18.0 250.0 88 3139 14.5 15.0 0 0 \n", "1 9.0 304.0 193 4732 18.5 20.0 0 0 \n", "2 36.1 91.0 60 1800 16.4 10.0 1 0 \n", "3 18.5 250.0 98 3525 19.0 15.0 0 0 \n", "4 34.3 97.0 78 2188 15.8 10.0 0 1 \n", "\n", " origin_US \n", "0 1 \n", "1 1 \n", "2 0 \n", "3 1 \n", "4 0 " ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mpg.head()" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "X = mpg.drop('mpg', axis='columns')\n", "y = mpg['mpg']" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=3)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DecisionTreeRegressor(ccp_alpha=0.0, criterion='mse', max_depth=8,\n", " max_features=None, max_leaf_nodes=None,\n", " min_impurity_decrease=0.0, min_impurity_split=None,\n", " min_samples_leaf=0.13, min_samples_split=2,\n", " min_weight_fraction_leaf=0.0, presort='deprecated',\n", " random_state=3, splitter='best')" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.tree import DecisionTreeRegressor\n", "\n", "# Instantiate dt\n", "dt = DecisionTreeRegressor(max_depth=8, min_samples_leaf=0.13, random_state=3)\n", "\n", "# Fit dt to the training set\n", "dt.fit(X_train, y_train)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Evaluate the regression tree\n", "In this exercise, you will evaluate the test set performance of ```dt``` using the Root Mean Squared Error (RMSE) metric. The RMSE of a model measures, on average, how much the model's predictions differ from the actual labels. 
The RMSE of a model can be obtained by computing the square root of the model's Mean Squared Error (MSE).\n", "\n" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Test set RMSE of dt: 4.37\n" ] } ], "source": [ "from sklearn.metrics import mean_squared_error\n", "\n", "# Compute y_pred\n", "y_pred = dt.predict(X_test)\n", "\n", "# Compute mse_dt\n", "mse_dt = mean_squared_error(y_test, y_pred)\n", "\n", "# Compute rmse_dt\n", "rmse_dt = mse_dt ** (1/2)\n", "\n", "# Print rmse_dt\n", "print(\"Test set RMSE of dt: {:.2f}\".format(rmse_dt))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Linear regression vs regression tree\n", "In this exercise, you'll compare the test set RMSE of ```dt``` to that achieved by a linear regression model. The linear regression model ```lr``` is instantiated and trained on the same dataset as ```dt``` in the cell below.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Preprocess" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.linear_model import LinearRegression\n", "\n", "lr = LinearRegression()\n", "\n", "lr.fit(X_train, y_train)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Linear Regression test set RMSE: 5.10\n", "Regression Tree test set RMSE: 4.37\n" ] } ], "source": [ "# Predict test set labels\n", "y_pred_lr = lr.predict(X_test)\n", "\n", "# Compute mse_lr\n", "mse_lr = mean_squared_error(y_test, y_pred_lr)\n", "\n", "# Compute rmse_lr\n", "rmse_lr = mse_lr ** 0.5\n", "\n", "# Print rmse_lr\n", "print(\"Linear Regression test set RMSE: {:.2f}\".format(rmse_lr))\n", 
"\n", "# Print rmse_dt\n", "print(\"Regression Tree test set RMSE: {:.2f}\".format(rmse_dt))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }