{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Topic Modeling\n", "\n", "In this lecture, we'll work through an example of *topic modeling*. The idea of topic modeling is to find \"topics\" in documents that tie together many words. Here are some examples of hypothetical topics that you might find in a newspaper: \n", "\n", "1. **Finance**: \"dollar\", \"stock\", \"banks\"\n", "2. **Politics**: \"party\", \"vote\", \"election\"\n", "3. **Sports**: \"team\", \"win\", \"game\"\n", "\n", "In this lecture, we'll see how to use the term-document matrix from last time, in combination with some nice algorithms from `scikit-learn`, to perform topic modeling. Our overall aim is to get a coarse, topic-level summary of the plot of the short book *Alice’s Adventures in Wonderland* by Lewis Carroll. " ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "from matplotlib import pyplot as plt\n", "\n", "import nltk\n", "from nltk.corpus import gutenberg\n", "# need to do this once to download the data\n", "# nltk.download('gutenberg')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's briefly review the steps that we took to construct our term-document matrix. First, we used the `gutenberg` module to read in the raw text of the book, and split it into chapters. " ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "s = gutenberg.raw(\"carroll-alice.txt\")\n", "chapters = s.split(\"CHAPTER\")[1:]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then, we created a nice, tidy data frame in which we stored the complete text of each chapter. " ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>chapter</th><th>text</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>1</td><td>I. Down the Rabbit-Hole\\n\\nAlice was beginnin...</td></tr>\n",
"    <tr><th>1</th><td>2</td><td>II. The Pool of Tears\\n\\n'Curiouser and curio...</td></tr>\n",
"    <tr><th>2</th><td>3</td><td>III. A Caucus-Race and a Long Tale\\n\\nThey we...</td></tr>\n",
"    <tr><th>3</th><td>4</td><td>IV. The Rabbit Sends in a Little Bill\\n\\nIt w...</td></tr>\n",
"    <tr><th>4</th><td>5</td><td>V. Advice from a Caterpillar\\n\\nThe Caterpill...</td></tr>\n",
"    <tr><th>5</th><td>6</td><td>VI. Pig and Pepper\\n\\nFor a minute or two she...</td></tr>\n",
"    <tr><th>6</th><td>7</td><td>VII. A Mad Tea-Party\\n\\nThere was a table set...</td></tr>\n",
"    <tr><th>7</th><td>8</td><td>VIII. The Queen's Croquet-Ground\\n\\nA large r...</td></tr>\n",
"    <tr><th>8</th><td>9</td><td>IX. The Mock Turtle's Story\\n\\n'You can't thi...</td></tr>\n",
"    <tr><th>9</th><td>10</td><td>X. The Lobster Quadrille\\n\\nThe Mock Turtle s...</td></tr>\n",
"    <tr><th>10</th><td>11</td><td>XI. Who Stole the Tarts?\\n\\nThe King and Quee...</td></tr>\n",
"    <tr><th>11</th><td>12</td><td>XII\\n\\n Alice's Evidence\\n\\n\\n'Here...</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " chapter text\n", "0 1 I. Down the Rabbit-Hole\\n\\nAlice was beginnin...\n", "1 2 II. The Pool of Tears\\n\\n'Curiouser and curio...\n", "2 3 III. A Caucus-Race and a Long Tale\\n\\nThey we...\n", "3 4 IV. The Rabbit Sends in a Little Bill\\n\\nIt w...\n", "4 5 V. Advice from a Caterpillar\\n\\nThe Caterpill...\n", "5 6 VI. Pig and Pepper\\n\\nFor a minute or two she...\n", "6 7 VII. A Mad Tea-Party\\n\\nThere was a table set...\n", "7 8 VIII. The Queen's Croquet-Ground\\n\\nA large r...\n", "8 9 IX. The Mock Turtle's Story\\n\\n'You can't thi...\n", "9 10 X. The Lobster Quadrille\\n\\nThe Mock Turtle s...\n", "10 11 XI. Who Stole the Tarts?\\n\\nThe King and Quee...\n", "11 12 XII\\n\\n Alice's Evidence\\n\\n\\n'Here..." ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame({\n", " \"chapter\" : range(1, len(chapters) + 1),\n", " \"text\" : chapters\n", "})\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then, and this is the complex part, we used the `CountVectorizer` from `sklearn` to construct the term-document matrix. In this example, I've used a few more of the arguments for `CountVectorizer`. In particular, because I'd like to eventually be able to see how topics evolve between chapters, I use the `max_df` argument to specify that I'd like like to include words that appear in at most 50% of the chapters. " ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "from sklearn.feature_extraction.text import CountVectorizer\n", "vec = CountVectorizer(max_df = 0.5, min_df = 0, stop_words = \"english\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we can use this `CountVectorizer` to create the term-document matrix and collect it all as a nice, tidy data frame. 
" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "counts = vec.fit_transform(df['text'])\n", "counts = counts.toarray()\n", "count_df = pd.DataFrame(counts, columns = vec.get_feature_names())\n", "df = pd.concat((df, count_df), axis = 1)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>chapter</th><th>text</th><th>_i_</th><th>abide</th><th>able</th><th>absence</th><th>absurd</th><th>acceptance</th><th>accident</th><th>accidentally</th><th>...</th><th>year</th><th>years</th><th>yelled</th><th>yelp</th><th>yer</th><th>yesterday</th><th>young</th><th>youth</th><th>zealand</th><th>zigzag</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>1</td><td>I. Down the Rabbit-Hole\\n\\nAlice was beginnin...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td></tr>\n",
"    <tr><th>1</th><td>2</td><td>II. The Pool of Tears\\n\\n'Curiouser and curio...</td><td>1</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>2</th><td>3</td><td>III. A Caucus-Race and a Long Tale\\n\\nThey we...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>1</td><td>0</td><td>0</td><td>...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>3 rows × 2148 columns</p>\n",
"</div>
" ], "text/plain": [ " chapter text _i_ abide \\\n", "0 1 I. Down the Rabbit-Hole\\n\\nAlice was beginnin... 0 0 \n", "1 2 II. The Pool of Tears\\n\\n'Curiouser and curio... 1 0 \n", "2 3 III. A Caucus-Race and a Long Tale\\n\\nThey we... 0 0 \n", "\n", " able absence absurd acceptance accident accidentally ... year \\\n", "0 0 0 0 0 0 0 ... 0 \n", "1 1 0 0 0 0 0 ... 0 \n", "2 0 0 1 1 0 0 ... 0 \n", "\n", " years yelled yelp yer yesterday young youth zealand zigzag \n", "0 0 0 0 0 0 0 0 1 0 \n", "1 0 0 0 0 1 0 0 0 0 \n", "2 0 0 0 0 0 1 0 0 0 \n", "\n", "[3 rows x 2148 columns]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## On To Topic Modeling\n", "\n", "Now we are ready to run our model! Topic modeling is an *unsupervised* machine learning framework, which means that there's no set of true labels `y`. So, we just need to create the variables `X`. To do this, we can ignore the `text` and `chapter` columns. " ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "X = df.drop(['text', 'chapter'], axis = 1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are many algorithms for topic modeling. We will use *nonnegative matrix factorization* or NMF for now. As usual, there are three easy steps: \n", "\n", "1. Import the model we want. \n", "2. Initialize an instance of the model. \n", "3. Fit the model on data. \n", "\n", "NMF requires us to specify `n_components`, which is the number of topics to find. Choosing the right number of topics is a bit of an art, but there are also quantitative approaches based on Bayesian statistics that we won't go into here. 
" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "NMF(init='random', n_components=4, random_state=0)" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.decomposition import NMF\n", "model = NMF(n_components = 4, init = \"random\", random_state = 0)\n", "model.fit(X)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are two important parts of NMF. First, we have the topics themselves, which are stored in the `components_` attribute of the model. " ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0. , 0. , 0. , ..., 0.03184396, 0. ,\n", " 0.00530733],\n", " [0.1975168 , 0.04338739, 0. , ..., 0. , 0. ,\n", " 0. ],\n", " [0.20127845, 0.21670879, 0.2105765 , ..., 1.11826124, 0.11323678,\n", " 0.18637687],\n", " [0. , 0.00499915, 0. , ..., 0. , 0.00330162,\n", " 0. ]])" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model.components_" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(4, 2146)" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model.components_.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Uh, what does that mean? We can think of each component as a collection of **weights** for each word. We can find the most important words in each component by finding the words where the weights are highest within that component. We can do this with a handy function called `np.argsort()`, which tells you which entries of an array are the largest, second largest, etc." 
 ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 1272, 1271, ..., 806, 1162, 1966],\n", " [2145, 926, 1784, ..., 244, 980, 1442],\n", " [1634, 1527, 1524, ..., 247, 244, 1174],\n", " [ 0, 1277, 1274, ..., 1117, 493, 835]])" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "orders = np.argsort(model.components_, axis = 1)\n", "orders" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can then use `numpy` \"fancy\" indexing to arrange the words in each row by increasing weight, so that the most important words for each topic appear at the end of its row. " ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([['_i_', 'painting', 'paint', ..., 'gryphon', 'mock', 'turtle'],\n", " ['zigzag', 'inquisitively', 'station', ..., 'cat', 'king',\n", " 'queen'],\n", " ['sheep', 'riper', 'rightly', ..., 'caterpillar', 'cat', 'mouse'],\n", " ['_i_', 'panted', 'pairs', ..., 'march', 'dormouse', 'hatter']],\n", " dtype=object)" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "important_words = np.array(X.columns)[orders]\n", "important_words" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It's convenient to write a function to automate this for us: " ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "def top_words(X, model, component, num_words):\n", "    # indices that sort each topic's word weights in increasing order\n", "    orders = np.argsort(model.components_, axis = 1)\n", "    # the words themselves, arranged in that order for each topic\n", "    important_words = np.array(X.columns)[orders]\n", "    # the last num_words entries carry the largest weights\n", "    return important_words[component][-num_words:]" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['tea', 'hare', 'march', 'dormouse', 'hatter'], dtype=object)" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "top_words(X, model, 3, 5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next important 
aspect of topic modeling is the assignment of topics to documents. This is done via weights: each chapter gets one nonnegative weight per topic. We can access these weights by using the `transform()` method of the model. " ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[9.77501978e-05, 1.97664522e-02, 5.17821692e-01, 3.07225844e-02],\n", " [0.00000000e+00, 0.00000000e+00, 9.62026820e-01, 0.00000000e+00],\n", " [2.58343542e-04, 0.00000000e+00, 9.42349279e-01, 0.00000000e+00],\n", " [0.00000000e+00, 5.37128674e-03, 8.67735980e-01, 2.36338240e-04],\n", " [1.31577250e-02, 0.00000000e+00, 8.51527332e-01, 0.00000000e+00],\n", " [0.00000000e+00, 2.70835534e-01, 1.00413330e+00, 9.09882306e-02],\n", " [0.00000000e+00, 0.00000000e+00, 2.05082222e-02, 1.85495376e+00],\n", " [0.00000000e+00, 1.64643881e+00, 0.00000000e+00, 0.00000000e+00],\n", " [8.26943575e-01, 3.86067385e-01, 0.00000000e+00, 0.00000000e+00],\n", " [1.16895457e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00],\n", " [0.00000000e+00, 8.61097347e-01, 0.00000000e+00, 9.03937864e-01],\n", " [3.73729071e-02, 9.74766421e-01, 1.75619195e-02, 7.02019118e-02]])" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "weights = model.transform(X)\n", "weights" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAWoAAACPCAYAAADTJpFmAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAJGUlEQVR4nO3df4jfdQHH8ddrd+fOTU1TC91GGixrDXJ1mGsQoYKzpEUQKCQSwuiHpSKE9UdRf4RUSP0hwdClkCiiQhIzFbNEyOU2rTZPaWrq5WyG2fzBPLe9+uP7vby273bfzc+P933v+YDjvj9un/frs7t73fve9/l+Pk4iAEC55rUdAABwaBQ1ABSOogaAwlHUAFA4ihoACkdRA0DhhuvY6FGen1EtrGPTAA5h8tRmv+989N5Gxxt5enej48lubKjdeUOT2d1zwFqKelQL9UmfW8em0YQGvzglSRzLX5nnvray0fGGl+9qdLxFX9zW6HieP7+xsR55656DPsfSBwAUjqIGgMJR1ABQOIoaAApHUQNA4ShqACgcRQ0AhaOoAaBwfRW17dW2n7K93fY1dYcCALxjxqK2PSTpekkXSFom6WLby+oOBgDo6GdGfZak7UmeSTIp6TZJa+qNBQCY0k9RL5L0wrT7E93HAAAN6OekTL3O0HPAWXRsr5W0VpJGteBdxgIATOlnRj0hacm0+4slvbj/ByVZl2QsydiImjvjFAAMun6K+lFJS22fbvsoSRdJurveWACAKTMufSTZY/tySfdKGpK0PkmzJ4UFgDmsrwsHJNkgaUPNWQAAPfDKRAAoHEUNAIWjqAGgcBQ1ABSOogaAwlHUAFA4ihoACkdRA0DhKGoAKFxfr0w8IvOGatv0AfbtbW6suSAHnByxVl7x0cbGymPNnv3gqu3jjY73469+otHxRr7X7P/nsz9a2eh4e96zr7Gx3rr2Dwd9jhk1ABSOogaAwlHUAFA4ihoACkdRA0DhKGoAKBxFDQCFo6gBoHAUNQAUbsaitr3e9k7bW5sIBAD4f/3MqG+StLrmHACAg5ixqJM8JOmVBrIAAHpgjRoAClfZ2fNsr5W0VpJGtaCqzQLAnFfZjDrJuiRjScZGNL+qzQLAnMfSBwAUrp/D826V9EdJZ9iesH1Z/bEAAFNmXKNOcnETQQAAvbH0AQCFo6gBoHAUNQAUjqIGgMJR1ABQOIoaAApHUQNA4ShqACgcRQ0AhXOSyjd6zIlLsvz8Kyvf7sG8tKr6fTikfW50uOE3mx3P+xodTpMn7m1srA99fXNjY0mS9jW3b6jBvKHGhtq49z7tyis9v9mZUQNA4ShqACgcRQ0AhaOoAaBwFDUAFI6iBoDCUdQAUDiKGgAKR1EDQOH6ubjtEtsP2h63vc32FU0EAwB0zHhxW0l7JF2dZIvtYyVttn1/kidqzgYAUB8z6iQ7kmzp3n5N0rikRXUHAwB0HNYate3TJK2QtLGOMACAA/Vd1LaPkXSnpCuT7Orx/Frbm2xvenv3G1VmBIA5ra+itj2iTknfkuSuXh+TZF2SsSRjI6MLq8wIAHNaP0d9WNKNksaTXFd/JADAdP3MqFdJukTSObYf7759tuZcAICuGQ/PS/KwpGYvMQIA+B9emQgAhaOoAaBwFDUAFI6iBoDCUdQAUDiKGgAKR1EDQOEoagAoHEUNAIXr58IBh23ev9/UcXduqWPTPR1722RjY6F6Qyec0NhYGyY2NzaWJJ1/6pmNjjfonv7JykbHW/rDrY2N5dcPPm9mRg0AhaOoAaBwFDUAFI6iBoDCUdQAUDiKGgAKR1EDQOEoagAoHEUNAIXr5yrko7b/ZPvPtrfZ/kETwQAAHf28hPwtSecked32iKSHbd+T5JGaswEA1N9VyCPp9e7dke5b6gwFAHhHX2vUtodsPy5pp6T7k2ysNxYAYEpfRZ1kb5IzJS2WdJbt5ft/jO21tjfZ3vR2dledEwDmrMM66iP
Jq5J+L2l1j+fWJRlLMjbi0YriAQD6OerjZNvHd28fLek8SU/WHQwA0NHPUR+nSLrZ9pA6xX57kt/UGwsAMKWfoz7+ImlFA1kAAD3wykQAKBxFDQCFo6gBoHAUNQAUjqIGgMJR1ABQOIoaAApHUQNA4ShqACicO6ebrnij9suSnjuCf3qSpH9VHKcUg7xvEvs327F/7ftAkpN7PVFLUR8p25uSjLWdow6DvG8S+zfbsX9lY+kDAApHUQNA4Uor6nVtB6jRIO+bxP7NduxfwYpaowYAHKi0GTUAYD9FFLXt1bafsr3d9jVt56mS7SW2H7Q9bnub7SvazlS17lXqH7M9cFf+sX287TtsP9n9HK5sO1OVbF/V/brcavtWe3Zf8NT2ets7bW+d9th7bd9v+2/d9ye0mfFItF7U3Ut8XS/pAknLJF1se1m7qSq1R9LVST4i6WxJ3xiw/ZOkKySNtx2iJj+X9NskH5b0MQ3QftpeJOlbksaSLJc0JOmidlO9azfpwItvXyPpgSRLJT3QvT+rtF7Uks6StD3JM0kmJd0maU3LmSqTZEeSLd3br6nzjb6o3VTVsb1Y0uck3dB2lqrZPk7SpyXdKElJJpO82m6qyg1LOtr2sKQFkl5sOc+7kuQhSa/s9/AaSTd3b98s6QuNhqpACUW9SNIL0+5PaICKbDrbp6lz/cmN7Sap1M8kfVvSvraD1OCDkl6W9Mvu0s4Nthe2HaoqSf4h6aeSnpe0Q9J/ktzXbqpavD/JDqkzcZL0vpbzHLYSito9Hhu4Q1FsHyPpTklXJtnVdp4q2L5Q0s4km9vOUpNhSR+X9IskKyS9oVn4a/PBdNdq10g6XdKpkhba/nK7qdBLCUU9IWnJtPuLNct//dqf7RF1SvqWJHe1nadCqyR93vbf1VmyOsf2r9qNVKkJSRNJpn4DukOd4h4U50l6NsnLSd6WdJekT7WcqQ7/tH2KJHXf72w5z2EroagflbTU9um2j1Lnjxl3t5ypMratzhrneJLr2s5TpSTfSbI4yWnqfN5+l2RgZmRJXpL0gu0zug+dK+mJFiNV7XlJZ9te0P06PVcD9MfSae6WdGn39qWSft1iliMy3HaAJHtsXy7pXnX+6rw+ybaWY1VplaRLJP3V9uPdx76bZEOLmdC/b0q6pTuJeEbSV1rOU5kkG23fIWmLOkcnPabZ/go++1ZJn5F0ku0JSd+XdK2k221fps4Ppy+1l/DI8MpEAChcCUsfAIBDoKgBoHAUNQAUjqIGgMJR1ABQOIoaAApHUQNA4ShqACjcfwHmlh5kceIWowAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "fig, ax = plt.subplots(1)\n", "ax.imshow(weights.T)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The weights indicate the relative presence of each topic in each chapter. For example, Topic 2 is highly present in the first six chapters, but then mostly absent for the rest of the book. Topic 3 appears in Chapters 7 and 11, and so on. \n", "\n", "We can also visualize the same information as a line chart. Let's add as labels some of the top words for each topic. " ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "fig, ax = plt.subplots(1)\n", "\n", "for i in range(4):\n", " ax.plot(df['chapter'], weights[:,i], label = top_words(X, model, i, 5))\n", "\n", "ax.legend(bbox_to_anchor=(1.05, 0.65), loc=\"upper left\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This plot allows us to easily see several major features of the plot of the novel, including the tea party with the March Hare, the Mad Hatter, and the Dormouse (Chapter 7), the crocquet game in the court of the Queen of Hearts (Chapter 8), the appearance of the Mock Turtle and the Lobster in (Chapters 9 and 10), and the reappearance of many characters in Chapter 11. " ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3" } }, "nbformat": 4, "nbformat_minor": 4 }