{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Creating Features\n", "> In this chapter, you will explore what feature engineering is and how to get started with applying it to real-world data. You will load, explore and visualize a survey response dataset, and in doing so you will learn about its underlying data types and why they have an influence on how you should engineer your features. Using the pandas package you will create new features from both categorical and continuous columns. This is the Summary of lecture \"Feature Engineering for Machine Learning in Python\", via datacamp.\n", "\n", "- toc: true \n", "- badges: true\n", "- comments: true\n", "- author: Chanseok Kang\n", "- categories: [Python, Datacamp, Machine_Learning]\n", "- image: " ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Why generate features?\n", "- Different types of data\n", " - Continuous: either integers (or whole numbers) or floats (decimals)\n", " - Categorical: one of a limited set of values, e.g., gender, country of birth\n", " - Ordinal: ranked values often with no details of distance between them\n", " - Boolean: True/False values\n", " - Datetime: dates and times" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Getting to know your data\n", "You will be working with a modified subset of the [Stackoverflow survey response data](https://insights.stackoverflow.com/survey/2018/#overview) in the first three chapters of this course. This data set records the details, and preferences of thousands of users of the StackOverflow website." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SurveyDateFormalEducationConvertedSalaryHobbyCountryStackOverflowJobsRecommendVersionControlAgeYears ExperienceGenderRawSalary
02/28/18 20:20Bachelor's degree (BA. BS. B.Eng.. etc.)NaNYesSouth AfricaNaNGit2113MaleNaN
16/28/18 13:26Bachelor's degree (BA. BS. B.Eng.. etc.)70841.0YesSweeden7.0Git;Subversion389Male70,841.00
26/6/18 3:37Bachelor's degree (BA. BS. B.Eng.. etc.)NaNNoSweeden8.0Git4511NaNNaN
35/9/18 1:06Some college/university study without earning ...21426.0YesSweedenNaNZip file back-ups4612Male21,426.00
44/12/18 22:41Bachelor's degree (BA. BS. B.Eng.. etc.)41671.0YesUK8.0Git397Male£41,671.00
\n", "
" ], "text/plain": [ " SurveyDate FormalEducation \\\n", "0 2/28/18 20:20 Bachelor's degree (BA. BS. B.Eng.. etc.) \n", "1 6/28/18 13:26 Bachelor's degree (BA. BS. B.Eng.. etc.) \n", "2 6/6/18 3:37 Bachelor's degree (BA. BS. B.Eng.. etc.) \n", "3 5/9/18 1:06 Some college/university study without earning ... \n", "4 4/12/18 22:41 Bachelor's degree (BA. BS. B.Eng.. etc.) \n", "\n", " ConvertedSalary Hobby Country StackOverflowJobsRecommend \\\n", "0 NaN Yes South Africa NaN \n", "1 70841.0 Yes Sweeden 7.0 \n", "2 NaN No Sweeden 8.0 \n", "3 21426.0 Yes Sweeden NaN \n", "4 41671.0 Yes UK 8.0 \n", "\n", " VersionControl Age Years Experience Gender RawSalary \n", "0 Git 21 13 Male NaN \n", "1 Git;Subversion 38 9 Male 70,841.00 \n", "2 Git 45 11 NaN NaN \n", "3 Zip file back-ups 46 12 Male 21,426.00 \n", "4 Git 39 7 Male £41,671.00 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Import the data\n", "so_survey_df = pd.read_csv('./dataset/Combined_DS_v10.csv')\n", "\n", "# Print the first five rows of the DataFrame\n", "so_survey_df.head()" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "SurveyDate object\n", "FormalEducation object\n", "ConvertedSalary float64\n", "Hobby object\n", "Country object\n", "StackOverflowJobsRecommend float64\n", "VersionControl object\n", "Age int64\n", "Years Experience int64\n", "Gender object\n", "RawSalary object\n", "dtype: object\n" ] } ], "source": [ "# Print the data type of each column\n", "print(so_survey_df.dtypes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Selecting specific data types\n", "Often a data set will contain columns with several different data types (like the one you are working with). The majority of machine learning models require you to have a consistent data type across features. 
Similarly, most feature engineering techniques are applicable to only one type of data at a time. For these reasons, among others, you will often want to be able to access just the columns of certain types when working with a DataFrame." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Index(['ConvertedSalary', 'StackOverflowJobsRecommend'], dtype='object')\n" ] } ], "source": [ "# Create a subset of only the numeric columns\n", "so_numeric_df = so_survey_df.select_dtypes(include=['int', 'float'])\n", "\n", "# Print the column names contained in so_numeric_df\n", "print(so_numeric_df.columns)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dealing with categorical features\n", "- Encoding categorical features\n", " - One-hot encoding\n", " - Dummy encoding\n", "- One-hot vs. dummies\n", " - One-hot encoding: Explainable features\n", " - Dummy encoding: Necessary information without duplication" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### One-hot encoding and dummy variables\n", "To use categorical variables in a machine learning model, you first need to represent them in a quantitative way. The two most common approaches are to one-hot encode the variables or to use dummy variables. In this exercise, you will create both types of encoding and compare the created column sets. 
" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Index(['SurveyDate', 'FormalEducation', 'ConvertedSalary', 'Hobby',\n", " 'StackOverflowJobsRecommend', 'VersionControl', 'Age',\n", " 'Years Experience', 'Gender', 'RawSalary', 'OH_France', 'OH_India',\n", " 'OH_Ireland', 'OH_Russia', 'OH_South Africa', 'OH_Spain', 'OH_Sweeden',\n", " 'OH_UK', 'OH_USA', 'OH_Ukraine'],\n", " dtype='object')\n" ] } ], "source": [ "# Convert the Country column to a one hot encoded DataFrame\n", "one_hot_encoded = pd.get_dummies(so_survey_df, columns=['Country'], prefix='OH')\n", "\n", "# Print the columns names\n", "print(one_hot_encoded.columns)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Index(['SurveyDate', 'FormalEducation', 'ConvertedSalary', 'Hobby',\n", " 'StackOverflowJobsRecommend', 'VersionControl', 'Age',\n", " 'Years Experience', 'Gender', 'RawSalary', 'DM_India', 'DM_Ireland',\n", " 'DM_Russia', 'DM_South Africa', 'DM_Spain', 'DM_Sweeden', 'DM_UK',\n", " 'DM_USA', 'DM_Ukraine'],\n", " dtype='object')\n" ] } ], "source": [ "# Convert the Country column to a one hot encoded DataFrame\n", "dummy = pd.get_dummies(so_survey_df, columns=['Country'], drop_first=True, prefix='DM')\n", "\n", "# Print the columns names\n", "print(dummy.columns)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Dealing with uncommon categories\n", "Some features can have many different categories but a very uneven distribution of their occurrences. Take for example Data Science's favorite languages to code in, some common choices are Python, R, and Julia, but there can be individuals with bespoke choices, like FORTRAN, C etc. In these cases, you may not want to create a feature for each value, but only the more common occurrences." 
] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "South Africa 166\n", "USA 164\n", "Spain 134\n", "Sweeden 119\n", "France 115\n", "Russia 97\n", "India 95\n", "UK 95\n", "Ukraine 9\n", "Ireland 5\n", "Name: Country, dtype: int64\n" ] } ], "source": [ "# Create a series out of the Country columns\n", "countries = so_survey_df.Country\n", "\n", "# Get the counts of each category\n", "country_counts = countries.value_counts()\n", "\n", "# Print the count values for each category\n", "print(country_counts)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 False\n", "1 False\n", "2 False\n", "3 False\n", "4 False\n", "Name: Country, dtype: bool\n" ] } ], "source": [ "# Create a mask for only categories that occur less than 10 times\n", "mask = countries.isin(country_counts[country_counts < 10].index)\n", "\n", "# Print the top 5 rows in the mask series\n", "print(mask.head())" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "South Africa 166\n", "USA 164\n", "Spain 134\n", "Sweeden 119\n", "France 115\n", "Russia 97\n", "India 95\n", "UK 95\n", "Other 14\n", "Name: Country, dtype: int64\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "C:\\Users\\kcsgo\\anaconda3\\lib\\site-packages\\ipykernel_launcher.py:2: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " \n" ] } ], "source": [ "# Label all other categories as Other\n", "countries[mask] = 'Other'\n", "\n", "# Print the updated category counts\n", "print(countries.value_counts())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 
Numeric variables\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Binarizing columns\n", "While numeric values can often be used without any feature engineering, there will be cases when some form of manipulation can be useful. For example on some occasions, you might not care about the magnitude of a value but only care about its direction, or if it exists at all. In these situations, you will want to binarize a column. In the `so_survey_df` data, you have a large number of survey respondents that are working voluntarily (without pay). You will create a new column titled `Paid_Job` indicating whether each person is paid (their salary is greater than zero)." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Paid_JobConvertedSalary
00NaN
1170841.0
20NaN
3121426.0
4141671.0
\n", "
" ], "text/plain": [ " Paid_Job ConvertedSalary\n", "0 0 NaN\n", "1 1 70841.0\n", "2 0 NaN\n", "3 1 21426.0\n", "4 1 41671.0" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create the Paid_Job column filled with zeros\n", "so_survey_df['Paid_Job'] = 0\n", "\n", "# Replace all the Paid_Job values where ConvertedSalary is > 0\n", "so_survey_df.loc[so_survey_df['ConvertedSalary'] > 0, 'Paid_Job'] = 1\n", "\n", "# Print the first five rows of the columns\n", "so_survey_df[['Paid_Job', 'ConvertedSalary']].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Binning values\n", "For many continuous values you will care less about the exact value of a numeric column, but instead care about the bucket it falls into. This can be useful when plotting values, or simplifying your machine learning models. It is mostly used on continuous variables where accuracy is not the biggest concern e.g. age, height, wages.\n", "\n", "Bins are created using `pd.cut(df['column_name'], bins)` where `bins` can be an integer specifying the number of evenly spaced bins, or a list of bin boundaries." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
equal_binnedConvertedSalary
0NaNNaN
1(-2000.0, 400000.0]70841.0
2NaNNaN
3(-2000.0, 400000.0]21426.0
4(-2000.0, 400000.0]41671.0
\n", "
" ], "text/plain": [ " equal_binned ConvertedSalary\n", "0 NaN NaN\n", "1 (-2000.0, 400000.0] 70841.0\n", "2 NaN NaN\n", "3 (-2000.0, 400000.0] 21426.0\n", "4 (-2000.0, 400000.0] 41671.0" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Bin the continouos variable ConvertedSalary into 5 bins\n", "so_survey_df['equal_binned'] = pd.cut(so_survey_df['ConvertedSalary'], bins=5)\n", "\n", "# Print the first 5 rows of the equal_binned column\n", "so_survey_df[['equal_binned', 'ConvertedSalary']].head()" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
boundary_binnedConvertedSalary
0NaNNaN
1Medium70841.0
2NaNNaN
3Low21426.0
4Low41671.0
\n", "
" ], "text/plain": [ " boundary_binned ConvertedSalary\n", "0 NaN NaN\n", "1 Medium 70841.0\n", "2 NaN NaN\n", "3 Low 21426.0\n", "4 Low 41671.0" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Specify the boundaries of the bins\n", "bins = [-np.inf, 10000, 50000, 100000, 150000, np.inf]\n", "\n", "# Bin labels\n", "labels = ['Very low', 'Low', 'Medium', 'High', 'Very high']\n", "\n", "# Bin the continous variable ConvertedSalary using these boundaries\n", "so_survey_df['boundary_binned'] = pd.cut(so_survey_df['ConvertedSalary'], bins=bins, labels=labels)\n", "\n", "# Print the first 5 rows of the boundary_binned column\n", "so_survey_df[['boundary_binned', 'ConvertedSalary']].head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }