{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction to text classification\n", "\n", "This workshop is a one-hour beginner's introduction to text classification. \n", "\n", "This notebook deliberately has more content than we can reasonably cover in one hour. To get the most out of this workshop, I'd suggest spending some time working through it in full after the workshop.\n", "\n", "We'll cover the following topics:\n", "\n", "[What is text classification?](#what)\n", "\n", "_What is text classification and why should you care?_\n", "\n", "[Exploratory Data Analysis](#eda)\n", "\n", "_It's always a good idea to get to know your data before doing anything else._\n", "\n", "[Featurization](#featurization)\n", "\n", "_How can we turn natural language data into something we can do machine learning on?_\n", "\n", "[Classification](#classification)\n", "\n", "_How can a computer learn to distinguish between different categories?_\n", "\n", "[Interpretation](#interpret)\n", "\n", "_Did the computer learn correctly? Can the computer tell us anything new about our data?_" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What is text classification?\n", "\n", "Imagine that you work at [YouTube](https://www.youtube.com/) (if you haven't heard of it, YouTube is a video-sharing website). Your job is to remove comments on videos that are spam (unsolicited and inappropriate comments). You look through each video and read the comments yourself, deciding which are spam and which are not spam. Perhaps you see comments like those below. Which would you consider to be spam and which not spam?\n", "\n", "- _Hey @dancer317, love ur videos so much! Thanks for all the tips on dancing!_\n", "- _OUR LASER PRINTER/FAX/COPIER TONER CARTRIDGE PRICES NOW AS LOW AS 39 DOLLARS. SPECIALS WEEKLY ON ALL LASER PRINTER SUPPLIES. 
WE CARRY MOST ALL LASER PRINTER CARTRIDGES, FAX SUPPLIES AND COPIER TONERS AT WAREHOUSE PRICES_\n", "- _I'm not sold on your first point about crossing national boundaries, but I see what you mean about non-economic alternatives._\n", "- _Some of the most beautiful women in the world bare it all for you. Denise Richards, Britney Spears, Jessica Simpson, and many more. CLICK HERE FOR NUDE CELEBS_\n", "\n", "How did you decide which were spam and which weren't? Maybe one thing you noted was the high number of words in all capitals. The topics can also give you a clue, as the spam-like comments talk about selling things and nudity, which are often found in spam comments.\n", "\n", "However you decided, we can think about the task you were doing like this:\n", "\n", "\n", "\n", "This is text classification, performed by a human. What you just did was an example of text classification. You took a comment written in English, and you classified it into one of two classes: spam or not spam. Wouldn't it be nice to have a computer do this for you? [You could outsource your job to the computer and just surf the web all day](https://www.npr.org/sections/thetwo-way/2013/01/16/169528579/outsourced-employee-sends-own-job-to-china-surfs-web). What you'd want to do is replace the human with a computer, like this:\n", "\n", "\n", "\n", "\n", "How are we going to do this? Well, what if, for each comment on YouTube, we counted the number of times it mentioned nudity or tried to sell something, and we measured the proportion of capital letters? We'd get two numbers for each comment. We could also use your human judgements from before in a third column telling us whether that comment is spam or not.\n", "\n", "| Comment | Selling or nudity | Proportion capital letters | Is it spam? |\n", "|---------------------------------------------------------|-------------------|----------------------------|-------------|\n", "| Hey @dancer317, love ur videos so much! Thanks for ... 
| 0 | 0.1 | No |\n", "| OUR LASER PRINTER/FAX/COPIER TONER CARTRIDGE PRICES ... | 4 | 1.0 | Yes |\n", "| I'm not sold on your first point ... | 1 | 0.05 | No |\n", "| Some of the most beautiful women in the world ... | 3 | 0.15 | Yes |\n", "\n", "We can treat these two numbers as geometric coordinates and plot them. We can plot the spam comments in red and the non-spam comments in green.\n", "\n", "\n", "\n", "\n", "\n", "**To do text classification, we're going to need to do two things:**\n", "- **Turn our natural language comments into numbers.**\n", "- **Train a classifier to take those numbers and distinguish between the classes.**\n", "\n", "Why do we care about text classification? Because most applied natural language processing problems can be tackled as text classification:\n", "\n", "- Sentiment analysis\n", "- Genre classification\n", "- Language identification\n", "- Authorship attribution\n", "- Is this document relevant to this legal case?\n", "- Is the patient in need of urgent care?\n", "\n", "### What is sentiment analysis?\n", "\n", "In this notebook, we're going to perform [sentiment analysis](https://en.wikipedia.org/wiki/Sentiment_analysis) on a dataset of tweets about US airlines. Sentiment analysis is the task of extracting [affective states][1] from text. Sentiment analysis is most often used to answer questions like:\n", "\n", "[1]: https://en.wikipedia.org/wiki/Affect_(psychology)\n", "\n", "- _what do our customers think of us?_\n", "- _do our users like the look of our product?_\n", "- _what aspects of our service are users dissatisfied with?_\n", "\n", "### Dataset\n", "\n", "The dataset was collected by [Crowdflower](https://www.crowdflower.com/), who then made it public through [Kaggle](https://www.kaggle.com/crowdflower/twitter-airline-sentiment). I've downloaded it for you and put it in the \"data\" directory. Note that this is a nice clean dataset; not the norm in real-life data science! 
I've chosen this dataset so that we can concentrate on understanding what text classification is and how to do it." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/ensemble/weight_boosting.py:29: DeprecationWarning: numpy.core.umath_tests is an internal NumPy module and should not be imported. It will be removed in a future NumPy release.\n", " from numpy.core.umath_tests import inner1d\n" ] } ], "source": [ "%matplotlib inline\n", "import os\n", "import re\n", "import warnings\n", "warnings.simplefilter(action='ignore', category=FutureWarning)\n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer\n", "from sklearn.linear_model import LogisticRegressionCV\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.metrics import confusion_matrix\n", "from sklearn.ensemble import RandomForestClassifier\n", "sns.set()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exploratory Data Analysis" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " tweet_id airline_sentiment airline_sentiment_confidence \\\n", "0 570306133677760513 neutral 1.0000 \n", "1 570301130888122368 positive 0.3486 \n", "2 570301083672813571 neutral 0.6837 \n", "\n", " negativereason negativereason_confidence airline \\\n", "0 NaN NaN Virgin America \n", "1 NaN 0.0 Virgin America \n", "2 NaN NaN Virgin America \n", "\n", " airline_sentiment_gold name negativereason_gold retweet_count \\\n", "0 NaN cairdin NaN 0 \n", "1 NaN jnardino NaN 0 \n", "2 NaN yvonnalynn NaN 0 \n", "\n", " text tweet_coord \\\n", "0 @VirginAmerica What @dhepburn said. NaN \n", "1 @VirginAmerica plus you've added commercials t... NaN \n", "2 @VirginAmerica I didn't today... Must mean I n... NaN \n", "\n", " tweet_created tweet_location user_timezone \n", "0 2015-02-24 11:35:52 -0800 NaN Eastern Time (US & Canada) \n", "1 2015-02-24 11:15:59 -0800 NaN Pacific Time (US & Canada) \n", "2 2015-02-24 11:15:48 -0800 Lets Play Central Time (US & Canada) " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DATA_DIR = 'data'\n", "fname = os.path.join(DATA_DIR, 'tweets.csv')\n", "df = pd.read_csv(fname)\n", "df.head(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Which airlines are tweeted about and how many of each in this dataset?" 
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAY0AAAEICAYAAACj2qi6AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XtcVHX+x/HXcPMCg8BqFr9SIU1TM5dI20LLC4q2tltC\nIElbml02Mdr0ByIihopG6pZ4S21X0VJRM7d1rSSNH2roWmagdvFnbAZ5CUoYFJCZ3x8+nF+soIfN\nYQjfz8fDx8M5fObw+TKX93zPmXOOyWaz2RARETHAxdkNiIjIL4dCQ0REDFNoiIiIYQoNERExTKEh\nIiKGKTRERMQwh4bG999/z7333svRo0cpLCxk1KhRREdHM23aNKxWKwAZGRmEh4cTFRXFwYMHAeqt\nFRER53JYaFRXV5OcnEzLli0BSEtLIy4ujjfeeAObzUZ2djYFBQXs3buXrKws5s2bx/Tp0+utFRER\n53Nz1IrnzJlDVFQUr732GgAFBQX06dMHgP79+7Nr1y4CAgIICQnBZDLh7+9PTU0NJSUlddaGhoZe\n9vedOlXmqKGIiDRb7dqZG1TvkNDYtGkTfn5+9OvXzx4aNpsNk8kEgKenJ2VlZZSXl+Pj42O/38Xl\nddVeia9va9zcXB0wGhERucghobFx40ZMJhN79uzh8OHDxMfHU1JSYv+5xWLB29sbLy8vLBZLreVm\nsxkXF5dLaq+ktLTi6g5CROQa0NCZhkP2aaxZs4bVq1eTmZnJrbfeypw5c+jfvz95eXkA5OTkEBwc\nTFBQELm5uVitVoqKirBarfj5+dG9e/dLakVExPkctk/j38XHxzN16lTmzZtHYGAgQ4cOxdXVleDg\nYCIjI7FarSQnJ9dbKyIizmdqLme51Y5wEZGGaxKbp0REpHlSaIiIiGEKDRERMUyhISIihik0RETE\nsEb7yq2zPJe+xdktNNgrkx5wdgsiInXSTENERAxTaIiIiGEKDRERMUyhISIihik0RETEMIWGiIgY\nptAQERHDFBoiImKYQkNERAxTaIiIiGEKDRERMUyhISIihik0RETEMIed5bampoakpCSOHTuGyWRi\n+vTpnD9/nqeeeopOnToBMGrUKIYPH05GRgY7d+7Ezc2NxMREevXqRWFhIQkJCZhMJrp06cK0adNw\ncVHGiYg4k8NCY8eOHQCsXbuWvLw85s+fz8CBA3n88ccZM2aMva6goIC9e/eSlZVFcXExsbGxbNy4\nkbS0NOLi4ujbty/JyclkZ2cTGhrqqHZFRMQAh4XG4MGDue+++wAoKirC29ub/Px8jh07RnZ2Nh07\ndiQxMZH9+/cTEhKCyWTC39+fmpoaSkpKKCgooE+fPgD079+fXbt2KTRERJzMoRdhcnNzIz4+nvff\nf59XX32VEydOEBERQc+ePVm8eDELFy7EbDbj4+Njv4+npydlZWXYbDZMJlOtZZfj69saNzdXRw6n\n0bRrZ3Z2CyIidXL4lfvmzJnDxIkTefjhh1m7di3t27cHIDQ0lNTUVAYNGoTFYrHXWywWzGZzrf0X\nFosFb2/vy/6e0tIKxwzACU6dunxAiohcLQ39kOqwPcubN29m6dKlALRq1QqTycT48eM5ePAgAHv2\n7KFHjx4EBQWRm5uL1WqlqKgIq9WKn58f3bt3Jy8vD4CcnByCg4Md1aqIiBjksJnGkCFDmDx5Mo88\n8gjnz58nMTGRG264gdTUVNzd3Wnbti2pqal4eXkRHBxMZGQkVquV5ORkAOLj45k6dSrz5s0jMDCQ\noUOHOqpVERExyGSz2WzObuJqqG+TznPpWxq5k5/vlUkPOLsFEblGNJnNUyIi0vwoNERExDCFhoiI\nGKbQEBERwxQaIiJimEJDREQ
MU2iIiIhhCg0RETFMoSEiIoYpNERExDCFhoiIGKbQEBERwxQaIiJi\nmEJDREQMU2iIiIhhCg0RETFMoSEiIoYpNERExDCFhoiIGObmqBXX1NSQlJTEsWPHMJlMTJ8+nRYt\nWpCQkIDJZKJLly5MmzYNFxcXMjIy2LlzJ25ubiQmJtKrVy8KCwvrrBUREedx2Lvwjh07AFi7di1x\ncXHMnz+ftLQ04uLieOONN7DZbGRnZ1NQUMDevXvJyspi3rx5TJ8+HaDOWhERcS6HhcbgwYNJTU0F\noKioCG9vbwoKCujTpw8A/fv3Z/fu3ezfv5+QkBBMJhP+/v7U1NRQUlJSZ62IiDiXwzZPAbi5uREf\nH8/777/Pq6++yq5duzCZTAB4enpSVlZGeXk5Pj4+9vtcXG6z2S6pvRxf39a4ubk6bjCNqF07s7Nb\nEBGpk0NDA2DOnDlMnDiRhx9+mMrKSvtyi8WCt7c3Xl5eWCyWWsvNZnOt/RcXay+ntLTi6jfvJKdO\nXT4gRUSuloZ+SHXY5qnNmzezdOlSAFq1aoXJZKJnz57k5eUBkJOTQ3BwMEFBQeTm5mK1WikqKsJq\nteLn50f37t0vqRUREecy2Ww2myNWXFFRweTJkzl9+jTnz59n3Lhx3HzzzUydOpXq6moCAwOZMWMG\nrq6uLFiwgJycHKxWK5MnTyY4OJhjx47VWVuf+j6dP5e+xRHDc6hXJj3g7BZE5BrR0JmGw0KjsSk0\nREQarqGh4fB9GuJYk95JcnYLDZL+2xnObkFEfgYdLSciIoYpNERExDCFhoiIGKbQEBERwxQaIiJi\nmEJDREQMU2iIiIhhOk5DmrR9L0xwdgsNcufcV53dgohDaaYhIiKGKTRERMQwhYaIiBim0BAREcMU\nGiIiYphCQ0REDFNoiIiIYQoNERExTKEhIiKGKTRERMQwh5xGpLq6msTERL799luqqqp45plnuOGG\nG3jqqafo1KkTAKNGjWL48OFkZGSwc+dO3NzcSExMpFevXhQWFpKQkIDJZKJLly5MmzYNFxflm4iI\nszkkNLZs2YKPjw/p6en88MMP/P73v+fZZ5/l8ccfZ8yYMfa6goIC9u7dS1ZWFsXFxcTGxrJx40bS\n0tKIi4ujb9++JCcnk52dTWhoqCNaFRGRBnBIaISFhTF06FAAbDYbrq6u5Ofnc+zYMbKzs+nYsSOJ\niYns37+fkJAQTCYT/v7+1NTUUFJSQkFBAX369AGgf//+7Nq1S6EhItIEOCQ0PD09ASgvL2fChAnE\nxcVRVVVFREQEPXv2ZPHixSxcuBCz2YyPj0+t+5WVlWGz2TCZTLWWXYmvb2vc3FwdMZxG166d2dkt\nOExzHhs0//GJOOzU6MXFxTz77LNER0czYsQIzpw5g7e3NwChoaGkpqYyaNAgLBaL/T4WiwWz2Vxr\n/4XFYrHf73JKSyuu/iCc5NSpK4fkL1VzHhs0//FJ89PQDzoO2bt8+vRpxowZw6RJkwgPDwdg7Nix\nHDx4EIA9e/bQo0cPgoKCyM3NxWq1UlRUhNVqxc/Pj+7du5OXlwdATk4OwcHBjmhTREQayCEzjSVL\nlnDmzBkWLVrEokWLAEhISGDWrFm4u7vTtm1bUlNT8fLyIjg4mMjISKxWK8nJyQDEx8czdepU5s2b\nR2BgoH3/iIiIOJfJZrPZnN3E1VDfZoHn0rc0cic/3yuTHjBcO+mdJAd2cvWl/3ZGg+p15T4Rx2ro\n5ild7lXESZb9eZuzW2iwcXFhzm5BnExHzImIiGEKDRERMUyhISIihik0RETEMIWGiIgYptAQERHD\nFBoiImKYQkNERAxTaIiIiGE6IlxEHOJw3lxnt9Bgt/Z9wdktNHmGZhqpqamXLIuPj7/qzYiISNN2\n2ZnGlClT+Oabb8jPz+fLL7+0Lz9//ryhCyOJiEjzctnQeOaZZ/j222+ZOXMm48ePty93dXXl5
ptv\ndnhzIiLStFw2NG688UZuvPFGtmzZQnl5uf1SrAAVFRW1LtUqIiLNn6Ed4UuXLmXp0qW1QsJkMpGd\nne2wxkREpOkxFBpZWVls374dPz8/R/cjIiJNmKFvT91www20adPG0b2IiEgTZ2im0alTJ6Kjo+nb\nty8eHh725T/dOS4iIs2fodBo37497du3d3QvIiLSxBkKjYbOKKqrq0lMTOTbb7+lqqqKZ555hs6d\nO5OQkIDJZKJLly5MmzYNFxcXMjIy2LlzJ25ubiQmJtKrVy8KCwvrrBUREecyFBrdunXDZDLVWnbd\nddfx4Ycf1lm/ZcsWfHx8SE9P54cffuD3v/893bp1Iy4ujr59+5KcnEx2djb+/v7s3buXrKwsiouL\niY2NZePGjaSlpV1SGxoa+vNHKyIiP4uh0Dhy5Ij9/9XV1Wzfvp0DBw7UWx8WFsbQoUMBsNlsuLq6\nUlBQQJ8+fQDo378/u3btIiAggJCQEEwmE/7+/tTU1FBSUlJn7ZVCw9e3NW5urkaG0+S1a2d2dgsO\n05zHBhrfTx12YB+O0twfv6uhwScsdHd3Z9iwYSxZsqTeGk9PTwDKy8uZMGECcXFxzJkzxz5b8fT0\npKysjPLy8lrHflxcbrPZLqm9ktLSioYOpck6dar5nqKlOY8NNL5fuuY+vro0NCgNhcbmzZvt/7fZ\nbHz55Ze4u7tf9j7FxcU8++yzREdHM2LECNLT0+0/s1gseHt74+XlhcViqbXcbDbX2n9xsVZERJzP\n0N7lvLw8+7+9e/cCMH/+/HrrT58+zZgxY5g0aRLh4eEAdO/enby8PABycnIIDg4mKCiI3NxcrFYr\nRUVFWK1W/Pz86qwVERHnMzTTSEtLo7q6mmPHjlFTU0OXLl1wc6v/rkuWLOHMmTMsWrSIRYsWARfO\nmDtjxgzmzZtHYGAgQ4cOxdXVleDgYCIjI7FarSQnJwMXTrs+derUWrUiIuJ8hkIjPz+fCRMm4OPj\ng9Vq5fTp0yxcuJDbb7+9zvqkpCSSkpIuWb569epLlsXGxhIbG1trWUBAQJ21IiLiXIZCY8aMGcyf\nP98eEgcOHCA1NZUNGzY4tDkREWlaDO3TqKioqDWr6N27N5WVlQ5rSkREmiZDodGmTRu2b99uv719\n+3ZdS0NE5BpkaPNUamoqTz31FFOmTLEvW7t2rcOaEhGRpsnQTCMnJ4dWrVqxY8cOVq5ciZ+fn/2r\ntyIicu0wFBrr16/nzTffpHXr1nTr1o1Nmzbp200iItcgQ6FRXV1d6wjwKx0NLiIizZOhfRqDBw/m\nD3/4A8OGDQPgvffeY9CgQQ5tTEREmh5DoTFp0iS2bdvGvn37cHNz49FHH2Xw4MGO7k1ERJoYw2e5\nDQsLIywszJG9iIhIE6fL4YmIiGEKDRERMUyhISIihik0RETEMIWGiIgYptAQERHDFBoiImKYQkNE\nRAxTaIiIiGEODY1PP/2UmJgYAA4dOkS/fv2IiYkhJiaGrVu3ApCRkUF4eDhRUVEcPHgQgMLCQkaN\nGkV0dDTTpk3DarU6sk0RETHI8GlEGmrZsmVs2bKFVq1aAVBQUMDjjz/OmDFj7DUFBQXs3buXrKws\niouLiY2NZePGjaSlpREXF0ffvn1JTk4mOzub0NBQR7UqIiIGOWym0aFDBxYsWGC/nZ+fz86dO3nk\nkUdITEykvLyc/fv3ExISgslkwt/fn5qaGkpKSigoKKBPnz4A9O/fn927dzuqTRERaQCHzTSGDh3K\n8ePH7bd79epFREQEPXv2ZPHixSxcuBCz2VzrWuOenp6UlZVhs9kwmUy1ll2Jr29r3Nxcr/5AnKBd\nO7OzW3CY5jw20Ph+6rAD+3CU5v74XQ0OC41/Fxoaire3t/3/qampDBo0CIvFYq+xWCyYzWZcXFxq\nLbt4v8spLa24+k07yalTVw7JX6rmPDbQ+H7pmvv46tLQo
Gy0b0+NHTvWvqN7z5499OjRg6CgIHJz\nc7FarRQVFWG1WvHz86N79+7k5eUBF65PHhwc3FhtiojIZTTaTCMlJYXU1FTc3d1p27YtqampeHl5\nERwcTGRkJFarleTkZADi4+OZOnUq8+bNIzAwkKFDhzZWmyIichkODY0bb7yR9evXA9CjRw/Wrl17\nSU1sbCyxsbG1lgUEBLB69WpHtiYiIv8BHdwnIiKGKTRERMQwhYaIiBim0BAREcMUGiIiYphCQ0RE\nDFNoiIiIYQoNERExTKEhIiKGKTRERMQwhYaIiBim0BAREcMUGiIiYphCQ0REDFNoiIiIYQoNEREx\nTKEhIiKGKTRERMQwhYaIiBjm0ND49NNPiYmJAaCwsJBRo0YRHR3NtGnTsFqtAGRkZBAeHk5UVBQH\nDx68bK2IiDiXw0Jj2bJlJCUlUVlZCUBaWhpxcXG88cYb2Gw2srOzKSgoYO/evWRlZTFv3jymT59e\nb62IiDifw0KjQ4cOLFiwwH67oKCAPn36ANC/f392797N/v37CQkJwWQy4e/vT01NDSUlJXXWioiI\n87k5asVDhw7l+PHj9ts2mw2TyQSAp6cnZWVllJeX4+PjY6+5uLyu2ivx9W2Nm5vrVR6Fc7RrZ3Z2\nCw7TnMcGGt9PHXZgH47S3B+/q8FhofHvXFz+f1JjsVjw9vbGy8sLi8VSa7nZbK6z9kpKSyuubsNO\ndOrUlUPyl6o5jw00vl+65j6+ujQ0KBvt21Pdu3cnLy8PgJycHIKDgwkKCiI3Nxer1UpRURFWqxU/\nP786a0VExPkabaYRHx/P1KlTmTdvHoGBgQwdOhRXV1eCg4OJjIzEarWSnJxcb62IiDifQ0Pjxhtv\nZP369QAEBASwevXqS2piY2OJjY2ttay+WhERcS4d3CciIoYpNERExDCFhoiIGKbQEBERwxQaIiJi\nmEJDREQMU2iIiIhhCg0RETFMoSEiIoYpNERExDCFhoiIGKbQEBERwxQaIiJimEJDREQMU2iIiIhh\nCg0RETFMoSEiIoYpNERExDCFhoiIGObQa4TX5cEHH8TLywu4cA3xyMhIZs6ciaurKyEhIYwfPx6r\n1UpKSgqff/45Hh4ezJgxg44dOzZ2qyIi8m8aNTQqKyux2WxkZmbal/3ud79jwYIF3HTTTTz55JMc\nOnSI48ePU1VVxbp16zhw4ACzZ89m8eLFjdmqiIjUoVFD48iRI5w9e5YxY8Zw/vx5YmNjqaqqokOH\nDgCEhISwe/duTp06Rb9+/QDo3bs3+fn5jdmmiMgVpeV+7OwWGmxySNDPXkejhkbLli0ZO3YsERER\nfP3114wbNw5vb2/7zz09Pfnmm28oLy+3b8ICcHV15fz587i51d+ur29r3NxcHdp/Y2nXzuzsFhym\nOY8NNL6fOuzAPhxFj9+VNWpoBAQE0LFjR0wmEwEBAZjNZn744Qf7zy0WC97e3pw7dw6LxWJfbrVa\nLxsYAKWlFQ7ru7GdOlXm7BYcpjmPDTS+X7prcXwNDZJG/fbUhg0bmD17NgAnTpzg7NmztG7dmn/9\n61/YbDZyc3MJDg4mKCiInJwcAA4cOMAtt9zSmG2KiEg9GnWmER4ezuTJkxk1ahQmk4lZs2bh4uLC\nxIkTqampISQkhNtvv53bbruNXbt2ERUVhc1mY9asWY3ZpoiI1KNRQ8PDw4O5c+desnz9+vW1bru4\nuPDiiy82VlsiImKQDu4TERHDFBoiImKYQkNERAxTaIiIiGEKDRERMUyhISIihik0RETEMIWGiIgY\nptAQERHDFBoiImKYQkNERAxTaIiIiGEKDRERMUyhISIihik0RETEMIWGiIgYptAQERHDFBoiImKY\nQkNERAxr1GuEN4TVaiUlJYXPP/8cDw8PZsyYQceOHZ3dlojINa3JzjS2b99OVVUV69at44UXXmD2\n7NnObklE5JrXZENj/
/799OvXD4DevXuTn5/v5I5ERMRks9lszm6iLlOmTGHIkCHce++9ANx3331s\n374dN7cmu0VNRKTZa7IzDS8vLywWi/221WpVYIiIOFmTDY2goCBycnIAOHDgALfccouTOxIRkSa7\neerit6e++OILbDYbs2bN4uabb3Z2WyIi17QmGxoiItL0NNnNUyIi0vQoNERExLBrPjTy8vJ4/vnn\nay17+eWX2bRpU531r732GgcPHqSyspKsrCzDv+f5558nLy/vZ/UKV+73rbfe4tFHHyUmJoaoqChy\nc3PrXE9lZSX33HMPy5cvty87fPgwGRkZP7tHR1q2bBkhISFUVlZetXVu2rSJ7Ozsq7a+hnjttdd4\n7LHHGD16NDExMQ0+HumHH37gb3/7GwAJCQn2L484wurVqx227ry8PH7zm98QExPD6NGjiYqKYuvW\nrfXWx8TEcPTo0Vrjv1pGjx7Nnj17ai2bMWMGWVlZzJw5k6KiIkPraejzqq7X5NUyfvz4q7YufYe1\ngZ588kkAjh8/TlZWFhEREU7u6P+VlZWxaNEi/v73v+Ph4cGJEyeIiIhg586duLjU/nzw7rvvMnz4\ncN566y3GjBmDi4sLt956K7feequTujdmy5YtDB8+nL///e889NBDV2WdV2s9DfXVV1/xwQcf8Oab\nb2IymTh8+DDx8fFs2bLF8Do+//xzPvjgA0aMGOHATi9YvHgxo0ePdtj677rrLubPnw+AxWIhJiaG\ngICAyz4nHTH+iIgI3n77bX7zm98AUFVVxY4dO/jTn/7UoNd7Q59Xdb0mr5ar+WFQoXEZTzzxBO7u\n7hw/fpzhw4fzzDPPkJCQwPDhw3nvvff46quvyMjI4A9/+ANTpkyhtLQUgKSkJLp27cqaNWvIysqi\nXbt2fP/99w7v18PDg+rqat58800GDBhAhw4d2L59e51PvqysLKZMmUJJSQkffvghAwYMIC8vj7Vr\n1zJ//nwGDBhAYGAg1113HYcOHeLtt9/mwIEDjBs3jry8PE6ePMmUKVN45ZVXmDJlCmVlZZw8eZLo\n6GhGjBjBgw8+yLvvvourqyvp6en06NGD0tJSNm/ejIuLC7fddhtJSUkNGl9eXh4dOnQgKiqKSZMm\n8dBDDxETE0PXrl358ssvad26NcHBweTm5nLmzBlef/11WrduzbRp0ygsLMRqtRIXF0ffvn357W9/\nS6dOnXB3dycwMJC2bdsSFRVFamoqBw8epLq6mtjYWAYMGEBycjLfffcdJ0+eZODAgTz//PMkJCTg\n4eHBt99+y8mTJ5k9ezY9evRo0HjMZjNFRUVs2LCB/v37c+utt7JhwwYOHTpEamoqrq6utGjRgtTU\nVKxWK3/6059Yv349AA8//DDz5s1jyZIlHDlyhHXr1gGwbt06li9fTnl5OSkpKbzzzjsEBQURFhbG\n2LFjCQkJ4fHHHycpKYmHHnqI8+fPM3/+fFxdXbnpppt48cUXOX78OJMnT8bNzQ2r1crcuXPZvHkz\nP/74IykpKaSkpDRonP8JT09PIiMj2bZtG1u3buWf//wnVquVxx57jGHDhtnrfjr+X//618yePZua\nmhpKS0tJSUkhKCiowb87LCyM+fPnc/bsWVq1akV2djb33HMPrVu3JiYmhpSUFLZu3conn3xCRUUF\nM2fOZNu2bWzfvh0/Pz/Onj3Lc889x969e2nbti2BgYEsW7bskveSf1ffa/K1117D3d2d7777jqio\nKD766COOHDnCo48+SnR0NHv37r3kMfzb3/7Gxo0bsVqtTJgwgYkTJ7Jr1y4+/fRTZs2ahdVqpX37\n9rz88sscPHiQjIwMbDYbFouFuXPnEhAQUO/f55rfPFUfk8lEUVERCxYssL8Qf+rpp5+mc+fOjB8/\nniVLlnDXXXeRmZlJamoqKSkpnD59mlWrVrF+/XoWLVpEdXW1w/tt0aIFK1eupLCwkCe
eeIIBAwaw\nYcOGS2q//vprzp49S7du3Rg5ciRr1qy5pKa4uJiXX36ZtLQ0fHx8KC4uJicnhxtuuIH8/Hyys7MZ\nPHgwhYWF3H///bz++uusWLGCv/71r5jNZu644w5yc3OpqakhJyeHwYMHs2nTJqZOncq6desIDAzk\n/PnzDRrjxZldYGAgHh4efPrppwD06tWLlStXUlVVRcuWLfnLX/5C586d2bdvH1lZWfj6+rJmzRoW\nLVrEiy++CEBFRQV//OMf7Z9s4cL5zkpLS9mwYQOrVq0iPz+f4uJievfuzYoVK9iwYQNr16611/v7\n+7NixQpiYmLsb9oN0b59exYvXszHH39MZGQkYWFh7Nixg6SkJJKTk1m9ejWjRo267HnXnn76ae66\n6y4iIyMB6NGjB6tWrWL06NFs2rSJ0NBQcnJyOHfuHGfOnGHPnj3YbDYKCgr49a9/zdSpU8nIyGD1\n6tW0b9+et956i927d9OrVy/+8pe/EBsbS1lZGc888wxt2rRplMC46Fe/+hXbtm3j+PHjvPnmm6xa\ntYolS5Zw5syZOsf/1VdfER8fz8qVKxk3bly9m5ivpEWLFgwePJj3338fuLCZKSoq6pK6wMBA1q5d\nS3V1Nf/zP//Dhg0bWLhwIadOnbqk9nLvJXD51+R3333HggULSElJYfHixbz00kssW7aMdevWYbPZ\n6nwMAby9vXnzzTftMyaA5ORkZs2aRVZWFvfeey9Hjx7lyy+/JD09nczMTIYMGcK2bdsu+/e55mca\nLVu2pKqqqtayiooKWrRowS233IKbmxtubm60bNmy3nV88cUXfPTRR/zjH/8A4Mcff+Rf//oXnTt3\nxsPDA7jwxubofk+cOMG5c+dITk4G4NixYzzxxBPccccddO3a1V6flZXF2bNnGTt2LAAff/wxhYWF\ntdbp6+uLr68vAKGhoXz44Yd88sknPPnkk+zatYtPPvmEWbNmUVNTw8qVK3nvvffw8vKyB0FERASZ\nmZlYrVbuvvtuPDw8SEtL4/XXX+ell16id+/eNOTb3j/++CM5OTmUlJSQmZlJeXm5fRv7xU/43t7e\ndO7c2f7/yspKvvjiC/bv38/BgwcBOH/+PCUlJQCXfJo6duwYvXv3BqBNmzbExcVRXl7OZ599xkcf\nfYSXl1etv/3FzSbXX389H3/8seGxXFRYWIiXlxdpaWkAfPbZZ4wbN46zZ8/a133nnXcyd+7cS+5b\n39/u4t+ibdu2nDt3jjvuuIOZM2eSl5fHkCFDePfdd/nnP/9J7969KSkp4eTJk8TFxQFw7tw57r77\nbv74xz+ybNkynnjiCcxm8yX70BpLUVERI0aMYMuWLcTExAAXHr9vv/22zvrrrruORYsW0bJlSywW\nC15eXv+SfZADAAAHNElEQVTx746IiOCll16ib9++nDlzhu7du19Sc/H5c/ToUW677TZcXV1xdXWl\nZ8+el9Re6b3kcq/JLl264O7ujtlspkOHDnh4eNCmTRsqKyvrfQw7duxY52zh9OnT9uPdLm5qKy4u\nZubMmbRu3ZoTJ05ccXZ2zc80br75Zg4fPszJkyeBCzuj9u3bh8ViwWQy1Xs/FxcXrFYrcOETx2OP\nPUZmZiZ//vOfeeCBB+jUqRNfffUV586do6amhsOHDzu03x49enD69GkmTZpEeXk5AP/1X/+Fr68v\n7u7u9vtXV1ezdetW1qxZw4oVK1ixYgVPPvkkb7zxxiXju2jw4MG88847eHl50a9fP/sZiNu2bcvr\nr79O7969efnllwkLC7O/mQUHB/PNN9+wYcMGwsPDAVi/fj3Tp09n9erVHD58mE8++cTwuLds2cLI\nkSPtM5r169eza9cuewDUJzAwkPvvv5/MzEyWLVtGWFgYPj4+l4zxYu1nn30GXNg/NHbsWDZt2oTZ\nbGbu3LmMGTOGc+fO2cd4ueeHEZ9//jkvvviiPYg
CAgLw9vYmICCAI0eOALBv3z46depEixYt+P77\n76mpqeHMmTMcP37cPoaLz8O6enJxcaFnz54sX76ckJAQ7rjjDtLT0xkyZAi+vr5cf/31LFq0iMzM\nTPun9uzsbO644w5WrlxJWFiY/ZNxYx7SVV5eTlZWFmazmb59+5KZmcnKlSsZNmwYN910U63xXRz/\nzJkzmTBhAnPmzOGWW275Wf127doVi8XCqlWrGDlyZJ01F58/nTt35rPPPsNqtVJVVcWhQ4cuqb3c\nc+VKr8nL3be+x/Cn/f3Uddddx9dffw1c+BLG+++/z9SpU5k1axazZ8/muuuuu+Lf7ZqfaXh5eZGQ\nkMBTTz1Fy5Ytqa6uJiYmhg4dOrB79+567/erX/2K6upq0tPTefrpp5kyZQrr16+nvLyc8ePH4+fn\nx7hx44iKisLPz49WrVo5tN+L1xq5+O2Tli1bUlNTY9+cc9GOHTvo0aOH/Y0TLuyw+93vfsfdd99d\n5++8/vrrqays5K677qJNmza4ublx3333ATBgwABmzJjB1q1bMZvNuLq6UlVVhYeHByNGjGDbtm10\n6dIFuPBCjI6OxtPTk/bt23P77bcbHndWVhYvvfSS/XarVq0YMmRInZvffioqKoqkpCRGjx5NeXk5\n0dHR9e5gHDRoEHv27GHUqFHU1NTw7LPP4u/vzwsvvMCBAwfw8PCgY8eO9sD+uYYMGcLRo0cJDw+n\ndevW2Gw2/vu//xt/f39SU1Ox2Wy4uroya9Ys2rVrxz333EN4eDg33XST/fHu0KEDX3zxBX/961/r\n/T2hoaFMnjyZbt26ERISwubNm7nzzjtxcXFhypQpPPnkk9hsNjw9PXnppZewWCzEx8ezePFirFYr\nkydPBi58YJk4cSIvv/zyVRn/v/voo4+IiYnBxcWFmpoaYmNjCQ0NZfbs2URHR1NRUcHgwYNrzSB+\nOv4HHniA5557Dm9vb66//nr7Psb/1MiRI0lPT2fHjh2XrevatSv33nsvDz/8sP1DWkPOk/efvCYv\nqu8xLC4urrN++vTpJCYm4uLiQrt27Xjsscd44IEHeOSRR2jVqhVt27a94vNbR4SLwyxfvhwfHx/7\nTEOkOfr+++/Ztm0bjzzyCFVVVdx///2sXLkSf39/Z7fmENf8TEMcIyEhgZMnT7JkyRJntyLiUL6+\nvuTn5zNy5EhMJhMRERHNNjBAMw0REWmAa35HuIiIGKfQEBERwxQaIiJimEJD5Gc4ceIE48aNq/Nn\nAwcO5Pjx42RnZ/PKK680cmcijqEd4SIOMnDgQFatWsWNN97o7FZErhrNNEQMOn/+PElJSURGRjJo\n0CCeeOIJjh49ysCBA4ELXzN++umnGTZsGB988IH9fps2bSIhIQG4ECR//vOfCQ8P5/7777efCr2w\nsJDHH3+cBx98kFGjRtV5VLFIU6DQEDHok08+wd3dnXXr1vH+++9TWVnJhx9+WKvGx8eHf/zjH/Yg\nqYuPjw8bNmwgKiqKpUuXAhAfH8+kSZN46623SE1Nddr5nkSuRAf3iRh055134uPjw5o1a/jf//1f\nvv76ayoqKmrVGDkxZb9+/YALJ6J77733sFgs5Ofn20/XARdOQllaWmo/aaRIU6HQEDEoOzubV199\nlUcffZSHHnqI0tLSS478vdzZkC9q0aIF8P8norNarXh4ePD222/ba7777rta5yISaSq0eUrEoD17\n9jBs2DBGjhxJ27Zt2bdvHzU1NT97vWazmU6dOtlDY9euXTzyyCM/e70ijqCZhohBERERTJw4kW3b\ntuHh4UHv3r2vynXfAdLT00lJSWH58uW4u7szf/78n33qdRFH0FduRUTEMG2eEhERwxQaIiJimEJD\nREQMU2iIiIhhCg0RETFMoSEiIoYpNERExDCFhoiIGPZ/PcApL4HVggcAAAAASUVORK5CYII=\n", 
"text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "sns.countplot(df['airline'], order=df['airline'].value_counts().index);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Challenge\n", "\n", "- How many tweets are in the dataset?\n", "- How many tweets are positive, neutral and negative?\n", "- What **proportion** of tweets are positive, neutral and negative?\n", "- Visualize these last two questions." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Length is 14640\n" ] } ], "source": [ "# solution\n", "print(\"Length is\", len(df))" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "negative 9178\n", "neutral 3099\n", "positive 2363\n", "Name: airline_sentiment, dtype: int64" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# solution\n", "df['airline_sentiment'].value_counts()" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "negative 0.626913\n", "neutral 0.211680\n", "positive 0.161407\n", "Name: airline_sentiment, dtype: float64" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# solution\n", "df['airline_sentiment'].value_counts(normalize=True)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAYgAAAEFCAYAAAD5bXAgAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAGadJREFUeJzt3X1UVHXix/HPMAO6wCCQaHKsVXw2U3NRj4Xu8Sm02s6e\nVsvopP72nGpdXWU1w3wCAkVD0Yz1abXV1dDU0B63U2BJVgK5WivVrpnShqkYqDCYAjO/P/w5v8gv\nOpbDCL5ff8Wd79z53pmcN3cuc6/F5XK5BADAj/j5egIAgOsTgQAAGBEIAIARgQAAGBEIAICRzdcT\nuFZKSyt8PQUAaHQiIuz13sYeBADAiEAAAIwIBADAiEAAAIwIBADAiEAAAIwIBADAiEAAAIwIBADA\niEAAAIyazKk2ADQOhdMm+3oKTV7fxcuuyXrYgwAAGBEIAIARgQAAGBEIAIARgQAAGBEIAIARgQAA\nGBEIAIARgQAAGBEIAIARgQAAGBEIAIARgQAAGBEIAIARgQAAGBEIAIARgQAAGBEIAIARgQAAGBEI\nAIARgQAAGBEIAIARgQAAGNm8teLq6mrNmDFDJSUl8vPzU0pKimw2m2bMmCGLxaJOnTopMTFRfn5+\nyszM1HvvvSebzaaZM2eqZ8+eKi4uNo4FADQMr73j7tq1SzU1Ndq8ebMmTpyopUuXKi0tTfHx8crK\nypLL5VJubq6KiopUUFCgrVu3KiMjQ8nJyZJkHAsAaDhe24No3769amtr5XQ6VVlZKZvNpv3796tf\nv36SpEGDBumDDz5Q+/btFRMTI4vFosjISNXW1qqsrExFRUWXjB0+fHi9jxcWFiibzeqtzQGARiMi\nwn5N1uO1QAQGBqqkpEQjR45UeXm5Vq5cqcLCQlksFklSUFCQKioqVFlZqdDQUPf9Li53uVyXjL2c\n8vIqb20KADQqpaWXf7/8ocvFxGuBWLdunWJiYjRt2jR9++23GjdunKqrq923OxwOhYSEKDg4WA6H\no85yu91e53jDxbEAgIbjtWMQISEhstsvlKlFixaqqalR9+7dlZ+fL0nKy8tTdHS0+vTpo927d8vp\ndOro0aNyOp0KDw83jgUANByLy+VyeWPFDodDM2fOVGlpqaqrqzV27Fj16NFDc+bMUXV1taKiopSa\nmiqr1arnn39eeXl5cjqdevrppxUdHa3Dhw8bx9bnanapAPhO4bTJvp5Ck9d38TKPx17uIyavBaKh\nEQigcSAQ3netAsEXCwAARgQCAGBEIAAARgQCAGBEIAAARgQCAGBEIAAARgQCAGBEIAAARgQCAGBE\nIAAARgQCAGBEIAAARgQCAGBEIAAARgQCAGBEIAAARgQCAGBEIAAARgQCAGBEIAAARgQCAGBEIAAA\nRgQCAGBEIAAARgQCAGBEIAAARgQCAGBEIAAARgQCAGBEIAAARgQCAGBEIAAARgQCAGBEIAAARgQC\nAGBEIAAARjZvrnzVqlXauXOnqqur9fDDD6tfv36aMWOGLBaLOnXqpMTERPn5+SkzM1PvvfeebDab\nZs6cqZ49e6q4uNg4FgDQMLz2jpufn699+/Zp06ZN2rBhg44dO6a0tDTFx8crKytLLpdLubm5Kioq\nUkFBgbZu3aqMjAwlJydLknEsAKDheG0PYvfu3ercubMmTpyoyspKPfXUU9qyZYv69esnSRo0aJA+\n+OADtW/fXjExMbJYLIqMjFRtba3KyspUVFR0ydjhw4fX+3hhYYGy2aze2hwAaDQiIuzXZD1eC0R5\nebmOHj2qlStX6ptvvtGECRPkcrlksVgkSUFBQaqoqFBlZaVCQ0Pd97u43DT28o9X5a1NAYBGpbT0\n8u+XP3S5mHgtEKGhoYqKilJAQICioqLUrFkzHTt2zH27w+FQSEiIgoOD5XA46iy32+11jjdcHAsA\naDheOwbxq1/9Su+//75cLpeOHz+us2fPasCAAcrPz5ck5eXlK
To6Wn369NHu3bvldDp19OhROZ1O\nhYeHq3v37peMBQA0HK/tQQwePFiFhYUaNWqUXC6X5s6dq7Zt22rOnDnKyMhQVFSUYmNjZbVaFR0d\nrYceekhOp1Nz586VJCUkJFwyFgDQcCwul8vl60lcC1fzmRsA3ymcNtnXU2jy+i5e5vHYyx2D4IsF\nAAAjAgEAMCIQAAAjAgEAMCIQAAAjAgEAMCIQAAAjAgEAMCIQAAAjAgEAMPIoECkpKZcsS0hIuOaT\nAQBcPy57sr5Zs2bpv//9rw4cOKCDBw+6l9fU1Fzx+gwAgMbtsoGYMGGCSkpKNG/ePE2aNMm93Gq1\nqkOHDl6fHADAdy4biLZt26pt27Z69dVXVVlZ6b7SmyRVVVXVuRIcAKBp8eh6EKtWrdKqVavqBMFi\nsSg3N9drEwMA+JZHgdi6datycnIUHh7u7fkAAK4THv0VU5s2bdSiRQtvzwUAcB3xaA+iXbt2iouL\nU//+/RUQEOBe/sMD1wCApsWjQLRu3VqtW7f29lwAANcRjwLBngIA3Hg8CkTXrl1lsVjqLGvVqpV2\n7drllUkBAHzPo0B88cUX7v+urq5WTk6O9u/f77VJAQB876pP1ufv76+RI0dqz5493pgPAOA64dEe\nxI4dO9z/7XK5dPDgQfn7+3ttUgAA3/MoEPn5+XV+DgsL05IlS7wyIQDA9cGjQKSlpam6ulqHDx9W\nbW2tOnXqJJvNo7sCABopj97lDxw4oMmTJys0NFROp1MnT57UX/7yF/Xq1cvb8wMA+IhHgUhNTdWS\nJUvcQdi/f79SUlK0bds2r04OAOA7Hv0VU1VVVZ29hd69e+vcuXNemxQAwPc8CkSLFi2Uk5Pj/jkn\nJ4drQQBAE+fRR0wpKSl64oknNGvWLPeyzZs3e21SAADf82gPIi8vT7/4xS/07rvvav369QoPD1dB\nQYG35wYA8CGPArFlyxZt2rRJgYGB6tq1q7Kzs7Vx40Zvzw0A4EMeBaK6urrON6f5FjUANH0eHYMY\nNmyYxo0bp5EjR0qS3n77bQ0dOtSrEwMA+JZHgZg+fbreeustFRYWymazaezYsRo2bJi35wYA8CGP\nz5cxYsQIjRgxwptzAQBcR676dN9X47vvvtOvf/1rHTp0SMXFxXr44YcVFxenxMREOZ1OSVJmZqZG\njRqlMWPG6NNPP5WkescCABqO1wJRXV2tuXPnqnnz5pIunPAvPj5eWVlZcrlcys3NVVFRkQoKCrR1\n61ZlZGQoOTm53rEAgIbltUAsXLhQY8aMUatWrSRJRUVF6tevnyRp0KBB+vDDD7V3717FxMTIYrEo\nMjJStbW1KisrM44FADQsr5yzOzs7W+Hh4Ro4cKBWr14t6cKFhi5e1zooKEgVFRWqrKysc8qOi8tN\nY68kLCxQNpvVC1sDAI1LRIT9mqzHK4F4+eWXZbFY9NFHH+nzzz9XQkKCysrK3Lc7HA6FhIQoODhY\nDoejznK73S4/P79Lxl5JeXnVtd0IAGikSkuv/Ev1RZeLiVc+YnrxxRe1ceNGbdiwQd26ddPChQs1\naNAg95Xp8vLyFB0drT59+mj37t1yOp06evSonE6nwsPD1b1790vGAgAaVoNdFi4hIUFz5sxRRkaG\noqKiFBsbK6vVqujoaD300ENyOp2aO3duvWMBAA3L4nK5XL6exLVwNbtUAHyncNpkX0+hyeu7eJnH\nYxv8IyYAQONHIAAARgQCAGBEIAAARgQCAGBEIAAARgQCAGBEIAAARgQCAGBEIAAARgQCAGBEIAAA\nRgQCAGBEIAAARgQCAGDUYBcMAq6V6a/P9vUUmrz0+1J9PQVcB9iDAAAYEQgAgBGBAAAYEQgAgBGB\nAAAYEQgAgBGBAAAYEQgAgBGBAAAYEQgAgBGBAAAYEQgAgBGBAAAYEQgAgBGBAAAY3ZDXg5iS/qqv\np3BDeG76/b6eAoCfgT0IA
IARgQAAGBEIAIARgQAAGBEIAICRV/6Kqbq6WjNnzlRJSYnOnz+vCRMm\nqGPHjpoxY4YsFos6deqkxMRE+fn5KTMzU++9955sNptmzpypnj17qri42DgWANBwvPKu++qrryo0\nNFRZWVlas2aNUlJSlJaWpvj4eGVlZcnlcik3N1dFRUUqKCjQ1q1blZGRoeTkZEkyjgUANCyvBGLE\niBGaMmWKJMnlcslqtaqoqEj9+vWTJA0aNEgffvih9u7dq5iYGFksFkVGRqq2tlZlZWXGsQCAhuWV\nj5iCgoIkSZWVlZo8ebLi4+O1cOFCWSwW9+0VFRWqrKxUaGhonftVVFTI5XJdMvZKwsICZbNZvbA1\n+KkiIuy+ngJ+Il67xu1avX5e+yb1t99+q4kTJyouLk6/+c1vlJ6e7r7N4XAoJCREwcHBcjgcdZbb\n7fY6xxsujr2S8vKqa7sB+NlKS68cdlyfeO0at6t5/S4XE698xHTy5En9/ve/1/Tp0zVq1ChJUvfu\n3ZWfny9JysvLU3R0tPr06aPdu3fL6XTq6NGjcjqdCg8PN44FADQsr+xBrFy5UmfOnNHy5cu1fPly\nSdKsWbOUmpqqjIwMRUVFKTY2VlarVdHR0XrooYfkdDo1d+5cSVJCQoLmzJlTZywAoGFZXC6Xy9eT\nuBauZpeKk/U1DG+drG/667O9sl78v/T7Ur227sJpk722blzQd/Eyj8c2+EdMAIDGj0AAAIwIBADA\niEAAAIwIBADAiEAAAIwIBADAiEAAAIwIBADAiEAAAIwIBADAiEAAAIwIBADAiEAAAIwIBADAiEAA\nAIwIBADAiEAAAIwIBADAiEAAAIwIBADAiEAAAIwIBADAiEAAAIwIBADAiEAAAIwIBADAiEAAAIwI\nBADAiEAAAIwIBADAiEAAAIwIBADAiEAAAIwIBADAiEAAAIwIBADAyObrCdTH6XQqKSlJ//73vxUQ\nEKDU1FT98pe/9PW0AOCGcd3uQeTk5Oj8+fN66aWXNG3aNC1YsMDXUwKAG8p1G4i9e/dq4MCBkqTe\nvXvrwIEDPp4RANxYrtuPmCorKxUcHOz+2Wq1qqamRjabecoREXaP15317CM/e37wnXX/85yvp4Cf\n4Z6//83XU4CHrts9iODgYDkcDvfPTqez3jgAAK696zYQffr0UV5eniRp//796ty5s49nBAA3FovL\n5XL5ehImF/+K6T//+Y9cLpfmz5+vDh06+HpaAHDDuG4DAQDwrev2IyYAgG8RCACAEYEAABgRiEag\ntLRUSUlJkqTCwkJ98cUXkqRJkyb5cFa4WkePHtXOnTs9Hv/oo4/q0KFDXpwRfop33nlHx48fr/Pv\nsqkiEI1ARESE+3/El19+WSdOnJAkZWZm+nBWuFp79uzRP//5T19PAz/T3//+d1VWVtb5d9lU8c2z\nBpKdna2cnBw5HA6Vl5dr4sSJCg4O1tKlS9WsWTOFhoZq/vz5qqmpUXx8vFwul86dO6fk5GTZ7XZN\nnTpVc+fO1fvvv6+ioiJ17NhRo0eP1muvvaZHHnlEb775piwWi5555hkNGDBAt956q1JTUyXJvW67\n3fNvm+NS2dnZ2rVrl77//nt9/fXXeuyxx3Tbbbdd8jx/9tln2rx5s5YsWSJJuuuuu5SXl6fVq1fr\n+++/1x133KF169YpPDxcp0+f1vPPP6/Zs2eroqJCJ06cUFxcnOLi4ny5qU2Cp69XcHCwkpOTdeDA\nAbVs2VIlJSVasWKFqqqqtGDBAtXW1qq8vFxJSUk6c+aMPv/8cyUkJCg9PV0JCQl65plnNG/ePG3Y\nsEGS9MQTT2jKlCmqrKzUkiVLZLVadcstt+iZZ56Rv7+/L5+Sq0YgGtDZs2f1t7/9TWVlZRo9erQs\nFos2bdqk1q1ba/369VqxYoX69++v0NBQPfvss/ryyy9VVVXlfmPv0aOHBg4cqHvuuUeRkZG
SpPDw\ncHXp0kUff/yxevXqpfz8fM2cOVNxcXGaP3++OnbsqK1bt2rNmjX685//7MvNbxIqKyu1du1aHTly\nRH/4wx8UEhJyyfN85513XnI/q9Wqxx9/XF999ZWGDh2qdevW6b777tPw4cNVVFSke++9V3fffbeO\nHz+uRx99lEBcI568XrfffrtOnTqlbdu2qaysTHfffbck6csvv1RCQoK6dOmi1157TdnZ2UpNTVW3\nbt2UlJTkfrPv2rWrzp8/r5KSEvn7+6u8vFzdunXTiBEjlJWVpZtuuklLly7V9u3b9eCDD/ry6bhq\nBKIB9e3bV35+fmrZsqUCAwNVU1Oj1q1bu2/LyMjQ9OnTdeTIEf3xj3+UzWbThAkTrrjeBx98UNu3\nb1dpaamGDBkim82mQ4cOKTk5WZJUXV2tdu3aeXPTbhhdu3aVJLVp00bnz5/36Hmu76tG7du3lyS1\nbNlS69ev19tvv63g4GDV1NR4Z/I3IE9er6CgIPXu3VvShV+4oqKiJEmtWrXS8uXL1bx5czkcjjrn\nhvuxUaNGaceOHQoICNADDzygsrIynThxQvHx8ZKk77//3viLw/WOQDSgoqIiSdLJkyd19uxZSdKJ\nEyfUqlUrFRQUqF27dsrPz1erVq30wgsvaN++fcrIyFBaWpp7HRaL5ZI3nAEDBig9PV3Hjx9XYmKi\npAtvPgsXLlRkZKT27t2r0tLSBtrKps1isdT52fQ8N2vWzP18l5SU6PTp05IkPz8/OZ3OS9b1wgsv\nqHfv3oqLi9OePXu0a9euBtqaps/T1+uVV16RJJ0+fVpHjhyRJM2bN0+LFi1Shw4dtGzZMpWUlLjX\n+eN/g/fcc4/Gjx8vPz8/rV27VoGBgbr55pu1fPly2e125ebmKjAw0PsbfI0RiAZ08uRJjRs3ThUV\nFUpKSpLNZtOf/vQnWSwWtWjRQmlpabJYLJo6dao2bdqkmpoaTZw4sc46evXqpUWLFqlt27buZRaL\nRbGxsfrwww916623SpKSkpKUkJCgmpoaWSwWzZs3r0G39UZhep5vueUW2e12jR49Wh06dHC/Vp07\nd9aKFSt022231VnH4MGDlZqaqjfffFN2u11Wq1Xnz5/3xeY0eabXq127dsrLy9OYMWPUsmVLNW/e\nXP7+/rr//vs1ZcoUhYSE6Oabb1Z5ebkk6Y477tBTTz2llJQU93qDgoLUtWtX1dTUuPc0Zs2apccf\nf1wul0tBQUF69tlnfbLNPwen2mgg2dnZ+uqrr/Tkk0/6eioAfuDQoUP64osvdO+996q8vFz33Xef\n3n33XQUEBPh6aj7HHgSAG1qbNm20aNEirV+/XrW1tXryySeJw/9hDwIAYMQX5QAARgQCAGBEIAAA\nRgQCAGBEINDkHD9+XI899pjxtiFDhuibb75Rbm6unnvuuQaemWeWLVumjz/+WNKFv6X/17/+5bXH\neumll/T66697bf1o3AgEmpzWrVvrr3/962XHDB06VFOmTGmgGV2dwsJC1dbWSrrwbd7bb7/da4+1\nb98+vpSHevE9CDRqNTU1SkpK0sGDB3Xy5Em1b99eTz/9tB577DHt3LlTM2bM0KlTp1RcXKzp06e7\n75edna2CggItWLBAQ4YM0f3336/du3fr7NmzWrhwoXr06KHi4mIlJSXp1KlTat68uebMmaPu3bvX\nO5ePPvpI6enpkqQWLVpo8eLFCg8P144dO7R+/Xo5nU7ddtttSkxMVLNmzRQTE6PY2Fjt3btXVqtV\nS5cu1d69e3XgwAHNnj1bmZmZSk1NdV/3Y+XKlXK5XPr6668VGxsru92unJwcSdLq1avVsmVL5eXl\nadmyZaqpqVHbtm2VkpKisLAw4zaeOXNGO3fu1J49exQREaGBAwd68ZVCY8QeBBq1ffv2yd/fXy+9\n9JLeeecdnTt37pJzGYWGhuof//iHhgwZUu96QkNDtW3
bNo0ZM0arVq2SJCUkJGj69Onavn27UlJS\nrng23OXLlyspKUnZ2dkaPHiwPvvsMx08eFBbtmzR5s2b9corr+imm27S2rVrJV24ENSAAQO0Y8cO\n9e3bVy+++KJ++9vfqkePHkpNTVWXLl3qrP+TTz5RWlqa3njjDW3evFnh4eHKzs5Wly5d9MYbb6is\nrEyLFy/W2rVrtWPHDsXExGjRokX1buOdd96pIUOGaPLkycQBRuxBoFHr27evQkND9eKLL+qrr77S\nkSNHVFVVVWdMz549r7iei2+QnTp10ttvvy2Hw6EDBw7o6aefdo+pqqpSeXm5wsLCjOsYOnSoJk2a\npGHDhmno0KG66667tHHjRhUXF7tP81xdXV1nL+SHj3vxuEN9OnfurDZt2kiSwsLCNGDAAElSZGSk\nzpw5o08++UTffvutxo4dK0lyOp1q0aJFvdsIXAmBQKOWm5urZcuWaezYsXrggQdUXl7uvlbGRc2b\nN7/iepo1aybp/8/+6XQ6FRAQ4D7LpyQdO3ZMoaGh9a5j/PjxGjx4sN59912lp6fr008/VWBgoEaO\nHKnZs2dLkhwOh/v4wo8f90onNfjxxWasVmudn2tra9WnTx+tXLlSknTu3Dk5HI56txG4Ej5iQqP2\n0UcfaeTIkfrd736nli1b1jnA+3PY7Xa1a9fOHYgPPvhAjzzyyGXvM3r0aDkcDo0fP17jx4/XZ599\npv79++udd97Rd999J5fLpaSkJK1fv/6y67FarT9pG3r16qX9+/fr8OHDki585HWlM4j+1MfCjYE9\nCDRqo0eP1pNPPqm33npLAQEB6t27t/Lz86/JutPT05WUlKQ1a9bI399fS5Ysuexv31OnTtWMGTNk\ns9nUrFkzJScnq3Pnzpo0aZLGjRsnp9Opbt266fHHH7/s4w4cOFCJiYlauHDhVc03IiJC8+fPV3x8\nvJxOp1q3bu0+aF6fO++8UxkZGbLb7RoxYsRVPR6aPk7WBwAwYg8CuArr1q3T9u3bL1neqlWrK373\nAmhs2IMAABhxkBoAYEQgAABGBAIAYEQgAABG/wtNj6MIAo3yFwAAAABJRU5ErkJggg==\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# solution\n", "sns.countplot(df['airline_sentiment'], order=['positive', 'neutral', 'negative']);" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAXIAAAD3CAYAAAAALt/WAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFDlJREFUeJzt3X90U/X9x/FXmtuibeJo1yDqDDBYqKvO0LMfOk63M3sq\nDjzzHCdtV60yD+om7Oiog+9xO6V0pVbgqxwZdGcHLcxtWFTAuXF2tg4OZd0Bt2K2lQk9wqHb6c6B\ncpphk0pjyP3+wb75fsuPBkqT9FOej78IN7n3nXzCszc5pHHYtm0LAGCsjHQPAAC4MoQcAAxHyAHA\ncIQcAAxHyAHAcFaqD9jb25/qQ6ZUbm62gsGBdI+BEWDtzDbe18/jcV90G2fko8yynOkeASPE2pnt\nal4/Qg4AhiPkAGA4Qg4AhiPkAGA4Qg4AhiPkAGA4Qg4AhiPkAGA4Qg4Ahkv5R/QBXB26Fi5I7fFS\nejTJt3FTio94cZyRA4DhCDkAGI6QA4DhCDkAGI6QA4DhEv6vlVgsptraWh0+fFhZWVmqr6/XlClT\n4tv37Nmj9evXy7ZtFRYWavny5XI4HEkdGgDwfxKekbe2tioSiailpUXV1dVqbGyMbwuFQlq9erV+\n/OMf6/XXX9dNN92kYDCY1IEBAEMlDHlHR4eKi4slSX6/X52dnfFt7777rnw+n55//nlVVlYqPz9f\neXl5yZsWAHCehG+thEIhuVyu+GWn06loNCrLshQMBrV//37t2LFD2dnZevDBB+X3+zVt2rSL7i83\nN3vcfyXTcN+th7GNtRs9qf6ATqqNpedKwpC7XC6Fw+H45VgsJss6e7OJEyfqtttuk8fjkSR99rOf\n1XvvvTdsyMfzl6NKZxd3vH/B9HjF2uFypPq5ckVfvlxUVKS2tjZJUiAQkM/ni28rLCxUV1eX+vr6\nFI1G9Ze//EUzZswYhZEBAJcq4Rl5aWmp2tvbVVFRIdu21dDQoObmZnm9XpWUlKi6uloLFy6UJN1z\nzz1DQg8ASD6Hbdt2Kg843l+68vLcXKzd6Er1L81KtVT/0qwremsFADC2EXIAMBwhBwDDEXIAMBwh\nBwDDEXIAMBwhBwDDEXIAMBwhBwDDEXIAMBwhBwDDEXIAMBwhBwDDEXIAMBwhBwDDEXIAMBwhBwDD\nEXIAMBwhBwDDEXIAMBwhBwDDEXIAMBwhBwDDEXIAMBwhBwDDWYmuEIvFVFtbq8OHDysrK0v19fWa\nMmVKfHt9fb0OHDignJwcSdKGDRvkdruTNzEAYIiEIW9tbVUkElFLS4sCgYAaGxvV1NQU337w4EFt\n3LhReXl5SR0UAHBhCd9a6ejoUHFxsSTJ7/ers7Mzvi0Wi6m7u1s1NTWqqKjQG2+8kbxJAQAXlPCM\nPBQKyeVyxS87nU5Fo1FZlqWBgQE99NBD+uY3v6kzZ87o4Ycf1q233qqCgoKL7i83N1uW5Ryd6cco\nj4e3lkzF2o2ernQPkGRj6bmSMOQul0vhcDh+ORaLybLO3uzaa6/Vww8/rGuvvVaSdMcdd+jQoUPD\nhjwYHLjSmcc0j8et3t7+dI+BEWDtcDlS/VwZ7gdHwrdWioqK1NbWJkkKBALy+XzxbceOHdM3vvEN\nnTlzRh999JEOHDigwsLCURgZAHCpEp6Rl5aWqr29XRUVFbJtWw0NDWpubpbX61VJSYnuu+8+lZWV\nKTMzU/fdd58+9alPpWJuAMB/OGzbtlN5wPH+0pWX5+Zi7UZX18IF6R4hqXwbN6X0eFf01goAYGwj\n5ABgOEIOAIYj5ABgOEIOAIYj5ABgOEIOAIYj5ABgOEIOAIYj5ABgOEIOAIYj5ABgOEIOAIYj5ABg\nOEIOAIYj5ABgOEIOAIYj5ABgOEIOAIYj5ABgOEIOAIYj5ABgOEIOAIYj5ABgOEIOAIZLGPJYLKaa\nmhqVl5erqqpK3d3dF7zOwoULtWXLlqQMCQC4uIQhb21tVSQSU
UtLi6qrq9XY2HjeddauXasPPvgg\nKQMCAIaXMOQdHR0qLi6WJPn9fnV2dg7Z/pvf/EYOhyN+HQBAalmJrhAKheRyueKXnU6notGoLMtS\nV1eXfvWrX+mll17S+vXrL+mAubnZsiznyCc2gMfjTvcIGCHWbvR0pXuAJBtLz5WEIXe5XAqHw/HL\nsVhMlnX2Zjt27NDx48f1yCOPqKenR5mZmbrpppv0pS996aL7CwYHRmHsscvjcau3tz/dY2AEWDtc\njlQ/V4b7wZEw5EVFRdq9e7fmzp2rQCAgn88X37Z06dL4n9etW6f8/PxhIw4AGH0JQ15aWqr29nZV\nVFTItm01NDSoublZXq9XJSUlqZgRADAMh23bdioPON5fuvLy3Fys3ejqWrgg3SMklW/jppQeb7i3\nVvhAEAAYjpADgOEIOQAYjpADgOEIOQAYjpADgOEIOQAYjpADgOEIOQAYjpADgOEIOQAYjpADgOEI\nOQAYjpADgOEIOQAYjpADgOEIOQAYjpADgOEIOQAYjpADgOEIOQAYjpADgOEIOQAYjpADgOEIOQAY\nLmHIY7GYampqVF5erqqqKnV3dw/Z/vOf/1xf//rX9cADD2jnzp1JGxQAcGFWoiu0trYqEomopaVF\ngUBAjY2NampqkiT19fVpy5Yt2r59uwYHBzVv3jx99atflcPhSPrgAICzEp6Rd3R0qLi4WJLk9/vV\n2dkZ35aXl6cdO3YoMzNTJ0+e1IQJE4g4AKRYwjPyUCgkl8sVv+x0OhWNRmVZZ29qWZZ+9rOfad26\ndaqqqkp4wNzcbFmW8wpGHvs8Hne6R8AIsXajpyvdAyTZWHquJAy5y+VSOByOX47FYvGI/6+HHnpI\nZWVleuyxx7Rv3z7dcccdF91fMDhwBeOOfR6PW729/ekeAyPA2uFypPq5MtwPjoRvrRQVFamtrU2S\nFAgE5PP54tuOHj2qxYsXy7ZtZWZmKisrSxkZ/EcYAEilhGfkpaWlam9vV0VFhWzbVkNDg5qbm+X1\nelVSUqKCggKVl5fL4XCouLhYn//851MxNwDgPxy2bdupPOB4f+nKy3NzsXajq2vhgnSPkFS+jZtS\nerwremsFADC2EXIAMBwhBwDDEXIAMBwhBwDDEXIAMBwhBwDDEXIAMBwhBwDDEXIAMBwhBwDDEXIA\nMBwhBwDDEXIAMBwhBwDDEXIAMBwhBwDDEXIAMBwhBwDDEXIAMBwhBwDDEXIAMBwhBwDDEXIAMJyV\n7gGA4SzatTTdIyTN+rtWpXsEjBMJQx6LxVRbW6vDhw8rKytL9fX1mjJlSnz7pk2b9Otf/1qS9OUv\nf1mLFy9O3rQAgPMkfGultbVVkUhELS0tqq6uVmNjY3zbP//5T/3yl7/Ua6+9pq1bt+oPf/iDDh06\nlNSBAQBDJTwj7+joUHFxsSTJ7/ers7Mzvm3y5MnauHGjnE6nJCkajWrChAlJGhUAcCEJQx4KheRy\nueKXnU6notGoLMtSZmam8vLyZNu2Vq1apU9/+tOaNm3asPvLzc2WZTmvfPIxzONxp3sEGGC8P0+6\n0j1Ako2l9UsYcpfLpXA4HL8ci8VkWf93s8HBQT377LPKycnR8uXLEx4wGBwY4ahm8Hjc6u3tT/cY\nMADPE7Olev2G+8GR8D3yoqIitbW1SZICgYB8Pl98m23bevLJJzVz5kzV1dXF32IBAKROwjPy0tJS\ntbe3q6KiQrZtq6GhQc3NzfJ6vYrFYnrnnXcUiUS0d+9eSdKSJUs0a9aspA8OADgrYcgzMjJUV1c3\n5O+mT58e//Pf/va30Z8KAHDJ+GQnABiOkAOA4Qg5ABiOkAOA4Qg5ABiOkAOA4Qg5ABhu3P8+8kcb\nd6V7hKR65b/uSvcIANKMM3IAMBwhBwDDEXIAMBwhBwDDEXIAMBwhBwDDEXIAMBwhBwDDEXIAMBwh\nBwDDEXIAMBwhBwDDEXIAM
BwhBwDDEXIAMBwhBwDDEXIAMFzCkMdiMdXU1Ki8vFxVVVXq7u4+7zp9\nfX2aM2eOBgcHkzIkAODiEoa8tbVVkUhELS0tqq6uVmNj45Dte/fu1aOPPqre3t6kDQkAuLiEIe/o\n6FBxcbEkye/3q7Ozc+gOMjLU3NysiRMnJmdCAMCwEn75cigUksvlil92Op2KRqOyrLM3nT179mUd\nMDc3W5blvMwxcTEejzvdI2CExvvadaV7gCQbS+uXMOQul0vhcDh+ORaLxSM+EsHgwIhvi/P19van\newSMEGtntlSv33A/OBK+tVJUVKS2tjZJUiAQkM/nG73JAABXLOGpdWlpqdrb21VRUSHbttXQ0KDm\n5mZ5vV6VlJSkYkYAwDAShjwjI0N1dXVD/m769OnnXW/Xrl2jNxUA4JLxgSAAMBwhBwDDEXIAMBwh\nBwDDEXIAMBwhBwDDEXIAMBwhBwDDEXIAMBwhBwDDEXIAMBwhBwDDEXIAMBwhBwDDEXIAMBwhBwDD\nEXIAMBwhBwDDEXIAMBwhBwDDEXIAMBwhBwDDEXIAMBwhBwDDEXIAMFzCkMdiMdXU1Ki8vFxVVVXq\n7u4esn3r1q26//77VVZWpt27dydtUADAhVmJrtDa2qpIJKKWlhYFAgE1NjaqqalJktTb26tXX31V\nb775pgYHB1VZWanZs2crKysr6YMDAM5KeEbe0dGh4uJiSZLf71dnZ2d821//+lfNmjVLWVlZcrvd\n8nq9OnToUPKmBQCcJ+EZeSgUksvlil92Op2KRqOyLEuhUEhutzu+LScnR6FQaNj9eTzuYbePtrf/\n+76UHg+ja2t5U7pHwAh53noz3SNcNRKekbtcLoXD4fjlWCwmy7IuuC0cDg8JOwAg+RKGvKioSG1t\nbZKkQCAgn88X3/aZz3xGHR0dGhwcVH9/v44cOTJkOwAg+Ry2bdvDXSEWi6m2tlZdXV2ybVsNDQ1q\na2uT1+tVSUmJtm7dqpaWFtm2rSeeeEJz5sxJ1ewAAF1CyAEAYxsfCAIAwxFyADAcIQcAwxHyUdLb\n26va2lpJ0p/+9Kf4B6MWL16cxqkwEv/617+0a9euS75+VVWVjhw5ksSJMBK/+93vdPz48SH/Nscr\nQj5KPB5P/Mny5ptv6sSJE5KkH/3oR2mcCiOxb98+HThwIN1j4Ar99Kc/VSgUGvJvc7xK+MnOq8m2\nbdvU2tqqcDisYDCoRYsWyeVyae3atZowYYImTpyohoYGRaNRPf3007JtW4ODg1qxYoXcbreWLFmi\nmpoa7d27VwcPHtSMGTM0f/58vf3223rwwQe1c+dOORwO1dXV6c4775TX61V9fb0kxffNB6qu3LZt\n27Rnzx6dPn1a//jHP/TYY4+psLDwvMf673//u1577TW9+OKLkqTZs2erra1NP/nJT3T69GnNmjVL\nmzZtUl5enk6dOqV169bpBz/4gfr7+3XixAlVVlaqsrIynXd1XLjU9XK5XFqxYoU6OzuVn5+vnp4e\nNTU1aWBgQI2NjTpz5oyCwaBqa2v1wQcf6L333tOyZcu0evVqLVu2THV1dVq5cqVeffVVSdITTzyh\np556SqFQSC+++KKcTqduvvlm1dXVKTMzM50PyWUj5Of48MMP1dzcrL6+Ps2fP18Oh0NbtmzR9ddf\nr82bN6upqUlf+MIXNHHiRK1atUrvv/++BgYG4gG+9dZbVVxcrLlz5+rGG2+UJOXl5WnmzJn685//\nrNtvv1379+/Xs88+q8rKSjU0NGjGjBl6/fXXtXHjRn33u99N590fN0KhkF5++WUdO3ZM3/rWt3Td\ndded91h/8YtfPO92TqdTjz/+uI4ePaqSkhJt2rRJ9957r0pLS3Xw4EHNmzdPd999t44fP66qqipC\nPkouZb1uu+02/fvf/9Ybb7yhvr4+3X333ZKk999/X8uWLdPMmTP19ttva9u2baqvr9ctt9y
i2tra\neJQLCgoUiUTU09OjzMxMBYNB3XLLLbrnnnv0i1/8Qh//+Me1du1abd++XWVlZel8OC4bIT/H5z73\nOWVkZCg/P1/Z2dmKRqO6/vrr49teeOEFfe9739OxY8f05JNPyrIsffvb306437KyMm3fvl29vb26\n6667ZFmWjhw5ohUrVkiSPvroI02dOjWZd+2qUlBQIEm64YYbFIlELumxvthHKqZNmyZJys/P1+bN\nm/Xb3/5WLpdL0Wg0OcNfhS5lvXJycuT3+yWdPTn65Cc/KUmaNGmSNmzYoGuuuUbhcHjI74Y61wMP\nPKAdO3YoKytL999/v/r6+nTixAk9/fTTkqTTp09f8Af8WEfIz3Hw4EFJ0smTJ/Xhhx9Kkk6cOKFJ\nkybpnXfe0dSpU7V//35NmjRJr7zyit5991298MILeu655+L7cDgc50Xhzjvv1OrVq3X8+HEtX75c\n0tlAPP/887rxxhvV0dGh3t7eFN3L8c/hcAy5fKHHesKECfHHvKenR6dOnZIkZWRkKBaLnbevV155\nRX6/X5WVldq3b5/27NmTonsz/l3qer311luSpFOnTunYsWOSpJUrV2rNmjWaPn26XnrpJfX09MT3\nee6/w7lz52rBggXKyMjQyy+/rOzsbE2ePFkbNmyQ2+3W73//e2VnZyf/Do8yQn6OkydP6pFHHlF/\nf79qa2tlWZa+853vyOFw6GMf+5iee+45ORwOLVmyRFu2bFE0GtWiRYuG7OP222/XmjVr9IlPfCL+\ndw6HQ3PmzNEf//hHeb1eSVJtba2WLVumaDQqh8OhlStXpvS+Xk0u9FjffPPNcrvdmj9/vqZPnx5f\nL5/Pp6amJhUWFg7Zx1e+8hXV19dr586dcrvdcjqdikQi6bg7496F1mvq1Klqa2tTRUWF8vPzdc01\n1ygzM1Nf+9rX9NRTT+m6667T5MmTFQwGJUmzZs3S0qVL9cMf/jC+35ycHBUUFCgajcbP3L///e/r\n8ccfl23bysnJ0apVq9Jyn68EH9H/f7Zt26ajR4/qmWeeSfcoAM5x5MgRHTp0SPPmzVMwGNS9996r\n3bt380U24owcgCFuuOEGrVmzRps3b9aZM2f0zDPPEPH/4IwcAAzHB4IAwHCEHAAMR8gBwHCEHAAM\nR8gBwHD/A0Kx+c95v6tVAAAAAElFTkSuQmCC\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# solution\n", "df['airline_sentiment'].value_counts(normalize=True, ascending=True).plot(kind='bar', rot=0);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Extra challenge\n", "\n", "- When did the tweets come from?\n", "- Who gets more retweets: positive, negative or neutral tweets?\n", "- What are the three main reasons why people are tweeting negatively? What could airline companies do to improve this?\n", "- What's the distribution of time zones in which people are tweeting?\n", "- Is this distribution consistent depending on what airlines they're tweeting about?" 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**All the tweets in this dataset were posted between 17 and 24 February 2015.**" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Year:\n", "\n", "2015 14640\n", "Name: tweet_created, dtype: int64\n", "Month:\n", "\n", "2 14640\n", "Name: tweet_created, dtype: int64\n", "Day:\n", "\n", "23 3515\n", "22 2392\n", "24 2136\n", "20 1512\n", "21 1418\n", "18 1416\n", "19 1298\n", "17 953\n", "Name: tweet_created, dtype: int64\n" ] } ], "source": [ "# solution\n", "# Parse the timestamps, then count tweets by year, month and day\n", "dates = pd.to_datetime(df['tweet_created'])\n", "print(\"Year:\\n\")\n", "print(dates.dt.year.value_counts())\n", "print(\"Month:\\n\")\n", "print(dates.dt.month.value_counts())\n", "print(\"Day:\\n\")\n", "print(dates.dt.day.value_counts())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We don't see any meaningful evidence that tweets of one class get more or fewer retweets than the others. The vast majority of tweets in all three classes (at least 75%) get no retweets." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
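<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>count</th>\n", "      <th>mean</th>\n", "      <th>std</th>\n", "      <th>min</th>\n", "      <th>25%</th>\n", "      <th>50%</th>\n", "      <th>75%</th>\n", "      <th>max</th>\n", "    </tr>\n",
"    <tr>\n", "      <th>airline_sentiment</th>\n", "      <th></th>\n", "      <th></th>\n", "      <th></th>\n", "      <th></th>\n", "      <th></th>\n", "      <th></th>\n", "      <th></th>\n", "      <th></th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr>\n", "      <th>negative</th>\n", "      <td>9178.0</td>\n", "      <td>0.093375</td>\n", "      <td>0.792865</td>\n", "      <td>0.0</td>\n", "      <td>0.0</td>\n", "      <td>0.0</td>\n", "      <td>0.0</td>\n", "      <td>44.0</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>neutral</th>\n", "      <td>3099.0</td>\n", "      <td>0.060987</td>\n", "      <td>0.658037</td>\n", "      <td>0.0</td>\n", "      <td>0.0</td>\n", "      <td>0.0</td>\n", "      <td>0.0</td>\n", "      <td>28.0</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>positive</th>\n", "      <td>2363.0</td>\n", "      <td>0.069403</td>\n", "      <td>0.659914</td>\n", "      <td>0.0</td>\n", "      <td>0.0</td>\n", "      <td>0.0</td>\n", "      <td>0.0</td>\n", "      <td>22.0</td>\n", "    </tr>\n",
"  </tbody>\n", "</table>\n", "</div>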
" ], "text/plain": [ " count mean std min 25% 50% 75% max\n", "airline_sentiment \n", "negative 9178.0 0.093375 0.792865 0.0 0.0 0.0 0.0 44.0\n", "neutral 3099.0 0.060987 0.658037 0.0 0.0 0.0 0.0 28.0\n", "positive 2363.0 0.069403 0.659914 0.0 0.0 0.0 0.0 22.0" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# solution\n", "df.groupby('airline_sentiment')['retweet_count'].describe()" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "scrolled": true }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYgAAAF4CAYAAABO0W4/AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XlcVOXbP/DPYXUBBBQXBBcUFzTkq2hZaqmYWimaC7jl\nnqVouKWQICqKhku5lGaLuyGa5S+wFMRwSck9NTdEc0FBAWUTBub8/uBhvqAHs+dx7jN1Pu/Xq1ez\nwX1xnJnrnHu5bkmWZRlERESPMVM7ACIiMk1MEEREpIgJgoiIFDFBEBGRIiYIIiJSxARBRESKLNQO\n4HlJT89WOwQion8cJyfbCp8zWoIoLi7GrFmzkJKSAkmSMGfOHFhbW2PmzJmQJAnu7u6YPXs2zMzM\nsHLlSuzfvx8WFhYIDg6Gp6cnrl+/rvhaIiISw2jfuAkJCQCAb7/9FoGBgVi2bBkiIiIQGBiILVu2\nQJZlxMfH49y5c0hKSkJ0dDSWLl2KOXPmAIDia4mISByjJQgfHx/MmzcPAHD79m3Y2dnh3LlzaNeu\nHQCgU6dOOHz4MI4fP44OHTpAkiQ4OzujuLgYGRkZiq8lIiJxjDoGYWFhgRkzZmDv3r1Yvnw5Dh06\nBEmSAABVq1ZFdnY2cnJyYG9vb/iZ0sdlWX7itU/j4FAFFhbmxvtjiIg0xuiD1IsWLcK0adMwcOBA\nFBQUGB7Pzc2FnZ0dbGxskJubW+5xW1vbcuMNpa99mszMvOcfPBHRv9zTBqmN1sX0/fffY82aNQCA\nypUrQ5IktGzZEkePHgUAJCYmwtvbG61bt8bBgweh1+tx+/Zt6PV6ODo6wsPD44nXEhGROJKxqrnm\n5eUhKCgI9+7dQ1FREcaOHYtGjRohJCQEOp0Obm5uCA8Ph7m5OVasWIHExETo9XoEBQXB29sbKSkp\niq+tCKe5EhH9fU+7gjBaghCNCYKI6O9TpYuJiIj+2f41K6mfsP0HMe309xXTDhGRYLyCICIiRUwQ\nRESkiAmCiIgUMUEQEZEiJggiIlLEBEFERIqYIIiISBETBBERKWKCICIiRUwQRESkiAmCiIgUMUEQ\nEZEiJggiIlLEBEFERIqYIIiISBETBBERKWKCICIiRUwQRESkiAmCiIgUMUEQEZEiJggiIlLEBEFE\nRIqYIIiISBETBBERKWKCICIiRRbG+KU6nQ7BwcG4desWCgsL8f7776NOnToYN24cGjRoAAAYNGgQ\n3njjDaxcuRL79++HhYUFgoOD4enpievXr2PmzJmQJAnu7u6YPXs2zMyYy4iIRDJKgti1axfs7e0R\nGRmJrKws9OnTBxMmTMDIkSMxatQow+vOnTuHpKQkREdHIzU1FRMnTsSOHTsQERGBwMBAvPjiiwgN\nDUV8fDy6detmjFCJiKgCRkkQPXr0QPfu3QEAsiz
D3NwcZ8+eRUpKCuLj41G/fn0EBwfj+PHj6NCh\nAyRJgrOzM4qLi5GRkYFz586hXbt2AIBOnTrh0KFDTBBERIIZJUFUrVoVAJCTk4NJkyYhMDAQhYWF\nGDBgAFq2bInPP/8cq1atgq2tLezt7cv9XHZ2NmRZhiRJ5R77Kw4OVWBhYW64n/6c/6aKODnZCmqJ\niEgsoyQIAEhNTcWECRMwePBg9OrVCw8fPoSdnR0AoFu3bpg3bx66du2K3Nxcw8/k5ubC1ta23HhD\nbm6u4eeeJjMz7/n/Ec8gPf2vkxcRkal62kmuUUZ+7927h1GjRmH69Ono378/AGD06NE4c+YMAODX\nX39FixYt0Lp1axw8eBB6vR63b9+GXq+Ho6MjPDw8cPToUQBAYmIivL29jREmERE9hSTLsvy8f2l4\neDh2794NNzc3w2OBgYGIjIyEpaUlatSogXnz5sHGxgYrVqxAYmIi9Ho9goKC4O3tjZSUFISEhECn\n08HNzQ3h4eEwNzd/SosKZ/Lbf3jef5ay/r5i2iEiMoKnXUEYJUGogQmCiOjvE97FRERE/3xMEERE\npIgJgoiIFDFBEBGRIiYIIiJSxARBRESKmCCIiEgREwQRESligiAiIkVMEEREpIgJgoiIFDFBEBGR\nIiYIIiJSxARBRESKmCCIiEgREwQRESligiAiIkVMEEREpIgJgoiIFDFBEBGRIiYIIiJSxARBRESK\nmCCIiEgREwQRESligiAiIkVMEEREpIgJgoiIFDFBEBGRIgtj/FKdTofg4GDcunULhYWFeP/999G4\ncWPMnDkTkiTB3d0ds2fPhpmZGVauXIn9+/fDwsICwcHB8PT0xPXr1xVfS0RE4hjlW3fXrl2wt7fH\nli1b8OWXX2LevHmIiIhAYGAgtmzZAlmWER8fj3PnziEpKQnR0dFYunQp5syZAwCKryUiIrGMkiB6\n9OiBDz74AAAgyzLMzc1x7tw5tGvXDgDQqVMnHD58GMePH0eHDh0gSRKcnZ1RXFyMjIwMxdcSEZFY\nRuliqlq1KgAgJycHkyZNQmBgIBYtWgRJkgzPZ2dnIycnB/b29uV+Ljs7G7IsP/Hav+LgUAUWFuaG\n++nP8w96CicnW0EtERGJZZQEAQCpqamYMGECBg8ejF69eiEyMtLwXG5uLuzs7GBjY4Pc3Nxyj9va\n2pYbbyh97V/JzMx7vn/AM0pP/+vkRURkqp52kmuULqZ79+5h1KhRmD59Ovr37w8A8PDwwNGjRwEA\niYmJ8Pb2RuvWrXHw4EHo9Xrcvn0ber0ejo6Oiq8lIiKxJFmW5ef9S8PDw7F79264ubkZHvvoo48Q\nHh4OnU4HNzc3hIeHw9zcHCtWrEBiYiL0ej2CgoLg7e2NlJQUhISEPPHap3niTH77D8/7z1LW31dM\nO0RERvC0KwijJAg1MEEQEf19wruYiIjon48JgoiIFDFBEBGRIiYIIiJSxARBRESKmCCIiEgREwQR\nESligiAiIkVMEEREpIgJgoiIFDFBEBGRIiYIIiJSxARBRESKmCCIiEgREwQRESligiAiIkVMEERE\npOiZEsS8efOeeGzGjBnPPRgiIjIdFk978qOPPsKNGzdw9uxZXL582fB4UVERsrOzn/KTRET0T/fU\nBPH+++/j1q1bmD9/PgICAgyPm5ubo1GjRkYP7p/u/rahwtqqPnCTsLaISBuemiBcXFzg4uKCXbt2\nIScnB9nZ2ZBlGQCQl5cHe3t7IUESEZF4T00QpdasWYM1a9aUSwiSJCE+Pt5ogRERkbqeKUFER0cj\nLi4Ojo6Oxo6HiIhMxDPNYqpTpw6qVatm7FiIiMiEPNMVRIMGDTB48GC8+OKLsLKyMjxeduCaiIj+\nXZ4pQdSqVQu1atUydixERGRCnilB8EqBiEh7nilBNGvWDJIklXusZs2a+OWXX576c6dPn8bixYux\nceNGnD9/HuP
GjUODBg0AAIMGDcIbb7yBlStXYv/+/bCwsEBwcDA8PT1x/fp1zJw5E5Ikwd3dHbNn\nz4aZGauCEBGJ9EwJ4sKFC4bbOp0OcXFxOHXq1FN/Zu3atdi1axcqV64MADh37hxGjhyJUaNGGV5z\n7tw5JCUlITo6GqmpqZg4cSJ27NiBiIgIBAYG4sUXX0RoaCji4+PRrVu3/83fR0RE/0t/+7Tc0tIS\nPXv2xJEjR576unr16mHFihWG+2fPnsX+/fsxZMgQBAcHIycnB8ePH0eHDh0gSRKcnZ1RXFyMjIwM\nnDt3Du3atQMAdOrUCYcPH/67YRIR0f/RM11BfP/994bbsizj8uXLsLS0fOrPdO/eHTdv3jTc9/T0\nxIABA9CyZUt8/vnnWLVqFWxtbcstvqtataphtXZpl1bpY3/FwaEKLCzMDffTn+UPew6cnGwrfO6+\noBj+Kg4iov+NZ0oQR48eLXffwcEBy5Yt+1sNdevWDXZ2dobb8+bNQ9euXZGbm2t4TW5uLmxtbcuN\nN+Tm5hp+7mkyM/P+VjzPS3q6aRQtNJU4iOif5Wknl8+UICIiIqDT6ZCSkoLi4mK4u7vDwuKZftRg\n9OjRCAkJgaenJ3799Ve0aNECrVu3RmRkJEaPHo07d+5Ar9fD0dERHh4eOHr0KF588UUkJibipZde\n+lttERHR/90zfcufPXsWkyZNgr29PfR6Pe7du4dVq1ahVatWz9xQWFgY5s2bB0tLS9SoUQPz5s2D\njY0NvL294efnB71ej9DQUAAle02EhIRg6dKlcHNzQ/fu3f93fx0REf2vSXJpedan8Pf3R1BQkCEh\nnDp1CuHh4di+fbvRA3xWT3SxbP9BTMP9fSt8iuW+icjUPa2L6ZlmMeXl5ZW7WvDy8kJBQcH/PTIi\nIjJZz5QgqlWrhri4OMP9uLg47gVBRPQv90xjEPPmzcO4cePw0UcfGR779ttvjRYUERGp75muIBIT\nE1G5cmUkJCRg/fr1cHR0RFJSkrFjIyIiFT1Tgti2bRu2bt2KKlWqoFmzZvjuu++waRMHRYmI/s2e\nKUHodLpyK6f/ahU1ERH98z3TGISPjw+GDx+Onj17AgD27NmDrl27GjUwIiJS1zMliOnTp+Onn37C\nb7/9BgsLC7zzzjvw8fExdmxERKSiZ66X0aNHD/To0cOYsRARkQnhLjxERKSICYKIiBQxQRARkSIm\nCCIiUsQEQUREipggiIhIERMEEREpYoIgIiJFf29jafpH+uXHAULaefWtaCHtEJEYvIIgIiJFTBBE\nRKSICYKIiBQxQRARkSImCCIiUsQEQUREipggiIhIERMEEREpYoIgIiJFTBBERKTIqAni9OnTGDZs\nGADg+vXrGDRoEAYPHozZs2dDr9cDAFauXIn+/fvD398fZ86ceepriYhIHKMliLVr12LWrFkoKCgA\nAERERCAwMBBbtmyBLMuIj4/HuXPnkJSUhOjoaCxduhRz5syp8LVERCSW0RJEvXr1sGLFCsP9c+fO\noV27dgCATp064fDhwzh+/Dg6dOgASZLg7OyM4uJiZGRkKL6WiIjEMlo11+7du+PmzZuG+7IsQ5Ik\nAEDVqlWRnZ2NnJwc2NvbG15T+rjSa/+Kg0MVWFiYG+6nP68/5C84OdlW+Nx9QTH8VRxaioGInh9h\n5b7NzP57sZKbmws7OzvY2NggNze33OO2traKr/0rmZl5zzfgZ5Se/tfJSwRTiMMUYiCiv+dpJ3bC\nZjF5eHjg6NGjAIDExER4e3ujdevWOHjwIPR6PW7fvg29Xg9HR0fF1xIRkVjCriBmzJiBkJAQLF26\nFG5ubujevTvMzc3h7e0NPz8/6PV6hIaGVvhaIiISS5JlWVY7iOfhie6N7T+Iabi/b4VP3d82VEwM\nAKoP3FThc9xRjogqYhJdTERE9M/CBEFERIqYIIiISBETBBERKWKCICIiRUwQR
ESkSNg6CNK2VQn9\nhbU1ofN2YW0R/ZvxCoKIiBQxQRARkSImCCIiUsQEQUREipggiIhIERMEEREpYoIgIiJFXAdBmjL8\n4KdC2lnf4QMh7RAZE68giIhIERMEEREpYoIgIiJFTBBERKSICYKIiBQxQRARkSImCCIiUsQEQURE\nipggiIhIERMEEREpYqkNIsFG/rJLWFvfvNpbWFv078MrCCIiUsQEQUREioR3MfXt2xc2NjYAABcX\nF/j5+WH+/PkwNzdHhw4dEBAQAL1ej7CwMFy8eBFWVlYIDw9H/fr1RYdK9K/2buJpIe180amVkHbo\n+ROaIAoKCiDLMjZu3Gh4zNfXFytWrICrqyveffddnD9/Hjdv3kRhYSGioqJw6tQpLFy4EJ9//rnI\nUImINE9ogrhw4QLy8/MxatQoFBUVYeLEiSgsLES9evUAAB06dMDhw4eRnp6Ojh07AgC8vLxw9uzZ\nv/zdDg5VYGFhbrifbpw/4QlOTrYVPndfUAx/FYeWYgBMIw5TiAEwjThMIQb63xGaICpVqoTRo0dj\nwIABuHbtGsaOHQs7OzvD81WrVsWNGzeQk5Nj6IYCAHNzcxQVFcHCouJwMzPzjBp7RdLTs1Vp93Gm\nEIcpxACYRhymEANgGnGYQgxUsaclcKEJomHDhqhfvz4kSULDhg1ha2uLrKwsw/O5ubmws7PDo0eP\nkJuba3hcr9c/NTkQ0T/TtoPiTuwGdqgirK1/C6GzmLZv346FCxcCAO7evYv8/HxUqVIFf/75J2RZ\nxsGDB+Ht7Y3WrVsjMTERAHDq1Ck0adJEZJhERATBVxD9+/dHUFAQBg0aBEmSsGDBApiZmWHatGko\nLi5Ghw4d0KpVK7zwwgs4dOgQ/P39IcsyFixYIDJMIiKC4ARhZWWFJUuWPPH4tm3byt03MzPD3Llz\nRYVFREQKuFCOiIgUMUEQEZEiJggiIlLEBEFERIqYIIiISBETBBERKWKCICIiRUwQRESkiAmCiIgU\nMUEQEZEiJggiIlLEBEFERIqYIIiISBETBBERKWKCICIiRUwQRESkiAmCiIgUCd1RjojIFN38IV9I\nOy6+lYW087zwCoKIiBQxQRARkSImCCIiUsQEQUREipggiIhIERMEEREpYoIgIiJFXAdBRGQC9Jsv\nCWvLbEiTZ3udkeMgIqJ/KJO9gtDr9QgLC8PFixdhZWWF8PBw1K9fX+2wiIg0w2SvIOLi4lBYWIio\nqChMnToVCxcuVDskIiJNMdkEcfz4cXTs2BEA4OXlhbNnz6ocERGRtkiyLMtqB6Hko48+wuuvv45X\nX30VAPDaa68hLi4OFhYm2ytGRPSvYrJXEDY2NsjNzTXc1+v1TA5ERAKZbIJo3bo1EhMTAQCnTp1C\nkybPNi2LiIieD5PtYiqdxXTp0iXIsowFCxagUaNGaodFRKQZJpsgiIhIXSbbxUREROpigiAiIkVM\nEEREpIgJgshEFRUVlbv/8OFDlSIxDXq9HsXFxTh27BgKCwvVDkcTzMPCwsLUDkJNly5dwsSJE/HN\nN98gJycHDx8+RMOGDYXG8Nlnn6Ft27aG+0uWLMHLL78srP3t27fj/Pnziv95eHgIiwNQ/1gAQHFx\nMXbs2IG4uDhIkoQqVaqgcuXKwtpPT09HWloa3n//fbRt2xaZmZnIyMjAxIkTMXDgQGFxlMrJyYFO\np0NsbCzq1KmDSpUqCY9h/vz5SE1NRVxcHGJjY3H06FF069ZNeBymQOR3luZXns2fPx8RERGYNWsW\n+vfvjzFjxqBz585C2o6Ojsb27duRnJxsWPNRXFyMoqIiTJ06VUgMAHDz5k3FxyVJEhaDqRwLAAgN\nDUXNmjVx+PBhvPDCC5gxYwbWrl0rrP3Tp09j/fr1SElJQUhICADAzMwMHTp0EBZDqcmTJ+O1117D\nyZMnodfrsXfvXqxatUp4HL///js++ugjD
Bs2DBs3bsTw4cOFx1AqJycHkiRh79696Ny5M6pVqya0\nfZHfWZpPEABQv359SJIER0dHVK1aVVi7vr6+aN++PdasWYP33nsPQMkXQfXq1YXFAAD9+vUT2p4S\nUzkWAPDnn39i/vz5OH78OLp06YIvvvhCaPs+Pj7w8fHBL7/8Yig1o5a0tDT4+vpi+/bt2LhxI0aM\nGKFKHHq9HmfPnoWLiwsKCwvLVVkQyVQSpqjvLM0niGrVquHbb79Ffn4+YmJiYGdnJ6xtKysruLi4\nYPr06Th8+DAePXpkeK5Pnz7C4pgxYwYkSULpkpjS25IkYfPmzUJiMJVjAZRcuWRkZAAoOVs0M1Nn\nqE6WZYwbNw75+fmGxzZs2CA0Bp1Ohz179qBx48bIyMhQ7Yu5T58+mDNnDhYsWIDIyEj4+/urEocp\nJEyR31maTxALFizA6tWr4eDggLNnz2L+/PnCYwgICEDdunVRo0YNAGK7dgBgy5Ythts5OTlITU2F\ni4uL0H73UmofCwAIDAzEoEGDkJ6eDj8/PwQHBwuPAQA+/fRTBAUFGY6FGsaMGYOYmBgEBQVh48aN\nGD9+vCpx1KlTB9HR0QBKCnnGxsaqEocpJEyR31maX0l9+/btJx5zdnYWGkNpv6ra4uLisHz5cuj1\nevTo0QOWlpYYN26c0BhM5VgAQEZGBhwdHVVrf8SIEVi3bp1q7ZfKyclBQUGB4b7Ibr+EhAScOHEC\nMTExeOuttwCUdDfFx8dj9+7dwuIotWfPHsTGxmLmzJmIioqCp6ensDHLUr/99lu5+xYWFqhTpw5q\n16793NvS/BXE5MmTIUkS9Ho9bt68ifr162Pr1q1C2i6dqufq6oqTJ0+iRYsWhuesrKyExFDWl19+\niejoaIwZMwbjx49H//79hSUIUzoWr7/+OoqLiw33Sz+A06dPLxeXsURFRQEALC0tERISghYtWhiu\npPz8/IzeflkzZszA8ePHYWtra+h23Llzp7D2mzVrhqysLFhbWxtm6kiShDfffFNYDGW9/vrrcHd3\nx8WLF+Hn54datWoJj+GTTz7BvXv30KJFC5w/fx6WlpYoLCzEgAEDMGbMmOfaluYTROmHESiZZ146\na0SEHj16GPr7jxw5YnhckiTEx8cLi6OUubk5rK2tIUkSzMzMhHYxmdKxeOmll9CjRw94e3vj5MmT\niI6ORr9+/RAeHi7k5CE9PR0A0KpVKwDAvXv3jN5mRa5evYq4uDjV2q9Tpw769u0LX19f1caCytq0\naRP27t2LBw8eoG/fvrh+/TpCQ0OFxlCpUiXs2rUL1tbWKCwsxMSJE7FixQoMHTqUCcKYbG1tcePG\nDWHt7du3T1hbz8LLywvTp0/H3bt3MXfuXKFrIEzpWKSkpBjWXrz44ov47LPP0L59e6xcuVJI+wEB\nAQCUuxLu3LljlK6Einh6euLq1atwc3MT1qaStWvXYu3ateXWYBw8eFB4HDExMdi8eTOGDx+O4cOH\nqzIDMDMzE9bW1gBKrq4zMzNhZWUFvV7/3NvSfILw8/MznLnev38fr7zyivAY1O7SKDV9+nQkJCSg\ncePGcHNzU2UhkikcCysrK2zduhX/+c9/cPLkSVhZWeHs2bPl4hJBZFdCRWxsbNC/f39UqVLF8Jha\nX8wHDhxQZeJEWaXdbKVdfmp0BXft2hWDBg2Cp6cnfv/9d3Tp0gVbtmyBu7v7829M1ribN28a/ktP\nT1clhpCQEPnQoUNyQUGBfOTIEXnq1Kny4cOHZX9/fyHtf/DBB0LaeRZqHwtZluWMjAx5wYIF8ujR\no+VFixbJGRkZ8v79++UrV64Ii0GWZXnUqFHyo0ePZFmW5YKCAvndd9+VCwoK5AEDBgiLwc/PT9bp\ndMLaq8j7778v6/V6tcOQN27cKA8aNEh+9dVX5TFjxshffvmlKnH88ccfckxMjHzx4kVZlmX5/v37\nRjk+m
r+CyM7ORn5+PszMzLB06VK89957aN++vdAY1O7SKJ3zbwrUPhYA4ODggNGjR6OoqAiyLOPa\ntWuqLFgT2ZVQkQYNGuD+/fuqDMaWpdPp0KtXL8POkpIkYcmSJcLjGDp0KNq3b49Lly6hYcOGaNas\nmfAYrl+/jl9++QU6nQ5Xr17Fpk2bMHfuXKO0pfkEERYWhpCQEKxYsQKTJ09GZGSk8AShdpfGjRs3\nsHTpUsXnpkyZIiSGUmofCwAIDg7GqVOnkJ+fj0ePHsHV1RXbtm0T1n4poV0JFThx4gS6dOkCe3t7\nQ7eKGl1MY8eOFd6mktTUVOzbtw8FBQVITk5GXFycYcxIlKlTp6Jbt244ceIEatasiby8PKO1pfkE\nYWVlBXd3d+h0Onh5eakyU2Lx4sVYvXo14uPj0aRJE3z88cc4c+aMsEV7lSpVEl6gsCJqHwsAuHDh\nAmJiYhAaGorJkyfjgw8+ENZ2WRMmTEDXrl1x9epV9OvXD02aNEFGRgYGDRokLIY9e/YIa0tJQkIC\nOnfujJSUlCeea9eunfB4PvjgA7Rv3x516tQR3napKlWqYNy4cbh27RoiIiIwePBgo7Wl+QQhSRI+\n/PBDdOrUCbGxsbC0tBTWdumMlKysrHKlA7KysoR2adSoUQN9+/YV1p4SUzkWQEkXkyRJyMvLU2Wh\nXHR0NAYMGIAlS5YYztovXLgAQPwV3eXLlzF79mw8fPgQvXv3hru7u9CFYVlZWQD+O/VXbVWrVsXk\nyZNVjUGSJKSnpyM3Nxd5eXm8gjCmZcuW4ffff0enTp2QlJRUYVeLMXz99dcIDg5+Yh61JElCa+60\nbNlSWFsVMZVjAQAtWrTAV199hZo1a2Ly5MnlaiGJUDqNVe2ppQAQHh6uWrVjAIYTl4CAAKSlpRnG\nhdLS0oTFUJa7uztiYmLQvHlzQ/IWffUdEBCAvXv3wtfXFz4+PvD19TVaW5ovtXH37l1kZ2fD3Nwc\na9euxbBhw9C8eXO1wyKV5ebmwtraGomJiWjVqpXQ8hJP6+MXXfJ7+PDhWL9+Pd555x1s2LBBtVIo\npjIuNGzYsHL31TiBEUnzVxBTp05FQEAAtmzZgu7du2PBggXCPgBP+7CrMRCoJlM6FkFBQeXu79+/\nH7Vr18aQIUOE1P6PiYmp8DnRCULNasdlmcq4kCnUCevYsSMyMjLg4OCArKwsWFlZoUaNGpg9e/Zz\nX8el+QQhSRLatm2L1atX48033xR6VmJqSaCoqAgWFv99Szx8+FDYF4IpHYuCggK4urrC29sbp0+f\nxu+//w5HR0fMmDEDq1evNnr7gwcPxgsvvGD0dp6FKVQ7BtQfF5o0aRKWL1+umKBFv3fbtm2LgIAA\nuLm54c8//8TKlSsxYcIETJ8+/bknCPWLm6isqKgIkZGR8Pb2xpEjR6DT6YS1PWnSJMPtX375RVi7\nj0tPT0dKSgoGDx6Ma9euISUlBcnJyRg1apSwGEzlWAAl60ImT56Mjh07IiAgADqdDoGBgcjOzhbS\nfmRkpOF2eHi4kDYfl5KSgpSUFKSnp6Nfv34ICgrCwIEDkZmZqUo8ao8LLV++HEBJMnj8P9Hu3Llj\nGJ+qV68eUlNTUb9+fZibmz/3tjR/BREREYFDhw5hwIABiIuLw6JFi4S1XfbD9tVXX6m2e1jZLS5D\nQ0Mhy7LwLS5N5VgAJeWtk5OT0ahRIyQnJyM3NxeZmZlGnS1SVtlhwUuXLglp83FKkwXk/ykzoUaf\n+5QpU54YFxLdfkV7k4hesOfk5ITFixcb1grVqFEDhw4dMsoMTM0miLLzql9++WXcunULzZs3R1FR\nkSrxqDkOzgStAAAgAElEQVRXwJS2uATUPRZAyZfj9OnTkZaWhjp16iA0NBSxsbGGrVCNTY1Nkh5X\ntq89IyMDt27dQv369YWPQZSd6lvWqVOnhE75VWsHOyUff/wxoqKikJi
YiCZNmmDixIk4f/68UWZg\najZBVFSiV/QZkk6nM3whlr0tsghYly5d4OzsbLiMVospHAugpILpd999V+4xkWMCd+/eRVRUFGRZ\nNtwuJXo/iB07dmDt2rVo1KgRrl69iokTJ+KNN94Q1r4pTPUF/rso7/GSL5aWlkhNTcUbb7whbA3V\n7t27YWdnBy8vL0iShL1796J27dpG2XlQ89Nc1dSlSxfD2VHp5Xvp/0XugXDr1i04OjqqWinTVI4F\nIHaWiJKn1Z0SXdahf//+2Lx5M6ytrZGXl4fhw4cbtv4UKT8/H1FRUUhJSYG7uzv8/PyELmotNXHi\nRFhbWxsmMKSmpsLJyQlA+bEjYxo7diwePXoELy8vnDlzBgUFBTA3N0eLFi2e+/a4mr2CMAWmsgdC\n3bp1AQB//PEHoqKiym0vGRERISQGUzkWgNhZIkpEJ4Gnsbe3N8xsq1SpkmrTXKdOnQo3Nzd07NgR\nJ06cQFBQEBYvXiw8jocPH2L9+vUASrqdRo0ahcjISKHlT4qKirB+/XqYmZlBr9dj7Nix+Oqrr4zS\nDcYEQQYzZ87E0KFDhW5IY4pEzhIxVaWDshkZGXj77bfRqlUrnD9/vtyGPSJlZWVh2rRpAErGzIxZ\nf+hpsrOzDXuVZ2ZmIjs7GzqdDo8ePRIWQ1ZWFoqKimBlZYWioiI8ePAAwH+37X2emCBQMmvl5s2b\nqFevXrmNUbSmRo0aGDBggNphqE7kLBFT9fjZqCRJeOutt1SKBmjcuDGOHz+ONm3a4OLFi3B2djaM\nU4kco5o4cSIGDhwIGxsb5OXlYdasWfjmm2/Qv39/YTEMHjwYvXr1gru7O65evYoxY8Zg9erV6Nix\n43NvS/NjED/99BNWr16N4uJiw77I48ePVyWWrKws2Nvbq9I2UDJw7+LiUq7OjOiVu6agoKAAUVFR\nSE5ORpMmTdC/f3+cP38erq6uRhkIfNzt27crfM7Z2dno7ZeVk5ODVatWITk5GQ0aNMD48eNVeY++\n+eabyM/Ph6WlZbm1SmqMUen1emRkZKB69eqqzTjLzMzEn3/+iXr16sHBwQHFxcVGucLVfILw9/fH\nhg0bMHr0aGzYsAH9+vV7YgaLsSUlJWHu3LmGJOXs7KzKmfzjJSYAcWMQw4YNq/DDJnre/eN7QQMl\n4xKilM5UysrKQm5uLtzd3XHlyhXUqFEDO3fuFBYHULKAsW3btvD29kZSUhJ+/fVXIavJTdWhQ4ew\nbt26cuN0ot+fIj+nmu9iMjc3h5WVlWGfWTVm8nz66afYtGkTJk6ciPfeew+DBg1SJUE8/iYTWTFz\nzpw5AIBVq1aha9euaNOmDc6cOYOEhARhMZTaunUrJEmCXq/HlStXULduXaEJonRa64QJE7Bo0SJD\nd4boUt9AyZlqaYG65s2b4+effxYeAwB8++23T0ygiI2NFR5HREQEgoODVR2nK51mLMsyzp8/b9TP\nqeYTRJs2bTBlyhTcvXsXoaGhqtTAMTMzM+zYZW1tjapVqwqPAShJVFu3bjUMujVo0OCpheOep9JB\n4Xv37hk+AN26dVOlOFrZBUeFhYUIDAwUHgNQMlhuY2MDoGSTGDX2RCgoKEB6ejqcnJxw7949odud\nlrVhwwZ88cUXQoolPk2dOnUMW+KqpexYQ6dOnYxaEkfzCWLKlClITEyEh4cHGjVqJLTWfal69eph\nyZIlyMrKwhdffCG8n7nUvn37kJiYiAULFmDkyJGGs3rRoqOj4enpiZMnT6o+MFxcXIwbN26o0naH\nDh0wdOhQtGzZEmfOnIGPj4/wGAIDA+Hv7w9bW1vk5ORg3rx5wmMAgKZNm6JOnTqqzySrXr06QkND\n4eHhYegSFb14sWz9p/T0dNy7d89obWk+Qezbtw9nz57FpEmTMHr0aFhaWgofmJ0zZw6io6PRpk0b\nVK5cWbUCbU5OTrCyskJubi7q168
vtHBhqdItR3/66Sc0btxYlbnuZf/9i4qK8M477wiPAQAmT56M\ns2fP4tq1a+jTpw+aNWsmPIZ79+4hPj7eMLVTLS+99BJ8fHzg6uqqak0oFxcXADDql/JfKXtVb2Vl\nhQULFhitLc0PUvft2xcbNmyAra0tsrOzMXbsWHz77bdCY5g7d2650h8ffvghPv74Y6ExAMCsWbMM\nqzOrVauGxMRE/PDDD8LjOHz4MG7cuIFWrVqhYcOGsLa2Fh5DWfn5+aqMTV2/fh0//fSTIVGnpaVh\n7ty5QmMYOnQoNm3aJLRNJW+//TZmz54NW1tbw2NqleHYv38/Ll++jIYNG6pyVfe42NhYo5U/0fwV\nhIWFheFNZ2trCzMzcRXQN2/ejM8//xxZWVnlNodv1KiRsBjKmjt3LlJTU9GjRw/s3LlTeJVKoKT/\n/86dO0hOToaVlRW++OILodvAKhk2bBi2b98uvN2pU6eiW7duOHHiBGrWrCmsmmxZhYWF6NOnDxo2\nbGj4bKjxvqhVqxZeeOEFoZ9PJUuWLMH169fRunVrfP/99zh+/DhmzJihakxff/01E4SxeHp6YurU\nqYYzZw8PD2FtDxkyBEOGDMHq1auFVQqtSFRUFPr164e6devi2LFjsLCwQOPGjYXHcfz4cWzevBnD\nhg1D3759sXXrVuExPE6ti+wqVapg3LhxuHbtGiIiIlRZPVy6ellthYWF8PX1hbu7u6HvX41E9dtv\nvxl6GIYPH46BAwcKj+Fxxnx/aj5BhISEIC4uDlevXkXPnj3RpUsX4TH4+/vjxx9/LLch+7hx44S1\nv2LFCly+fBm9e/eGhYUFateujXXr1iEjIwMTJkwQFgdQMihcUFAASZJQXFys+hkjoF75bUmSkJ6e\njtzcXOTl5Qm9gkhPT8fXX3+NKlWqYPTo0apXGBD5eXiaoqIi6PV6Qx0kUyjNbswYNJsgEhIS0Llz\nZ8Oc82rVqiE9PR1RUVHCZyWUFoa7dOkSrK2thfd3JyYmYtu2bYY3mouLC5YtWwZ/f3/hCWL48OF4\n++23kZGRgQEDBmDEiBHC2lbae6C05LYaAgICsHfvXvj6+sLHxwe+vr7C2p45cyZ8fHzw4MEDREZG\nYvbs2cLaVuLh4fHEim41vPnmmxg0aBBatWqFM2fOCC19XtHkmaysLKO1qdkEUXpQ1Zhb/jhZljF3\n7lwEBQVh/vz5wrsSqlSp8sQXo6WlpSrrMXr27ImXX34Z169fh4uLi9CZMxUNeqqxQA0oWb3dtm1b\nFBYWIiEhQehgvU6nM1QoFZmkKxIcHIy2bduid+/eSEpKwsyZM4Wu6C578lCrVi0kJCSgefPmyMjI\nEBaDGtubajZB9O3bFwDw4MED+Pn5qdLfXsrc3BwFBQXIz883dK2IVKlSJdy4cQOurq6Gx27cuCH0\n8tkUtnQsfU+o7cKFC/jkk09QvXp1vPnmm5g8eTKAkhILffr0ERJD2X8LtRbHlaX2iu6yJw8NGzZU\nZb2UGjSbIEp5e3sjMjISubm5ePvtt/HGG28IL2k8ZMgQrF+/Hq+88gpeffVVtGnTRmj706ZNw/jx\n49G+fXu4urri9u3bOHjwoND9uU1pS0e1hYWFYeLEiXjw4AEmTJiAnTt3wtHREWPGjBGWIPLz83Ht\n2jXo9Xo8evQI165dMwyGNmzYUEgMZam9ottUTh5E0/w6iFJpaWmIiIjAgQMHcOzYMdXiyMnJwb17\n99CgQQOh7WZnZyM+Ph5paWlwdnbGa6+9ZijzIELZbTUfJ3pMSG3Dhg0zlBjx9/c3zJoZMWIE1q1b\nJywGJWotUDt06BBCQ0PLrehu37698Di0RvNXELdv38b333+Pn3/+GR4eHli7dq2q8djY2GDEiBHC\n593b2toKOztVYgpjQcXFxSguLsaUKVOwbNkyyLIMWZYxduxYoV+KZbt3yu51IPKsWY0aWE/zyiuv\
nmMSKbjWpUfFY8wli0qRJhn13RZ4xP40WL+rKbrP5+EpqUXbs2IHVq1fj3r176NGjB2RZhpmZGby9\nvYXFAABXrlzB1KlTIctyudvJyclC4zAFd+7cQWBgINasWYNq1arh0KFD2LhxI1asWIFatWqpHZ5Q\nalQ81nyCqFevnsn1f5vC3Gq1qLmSeuDAgRg4cCC2b98udIewx33yySeG22Xfm6b2PhVh9uzZGDNm\njKGKa69evWBhYYHZs2drbl8KNSoeaz5BFBUV4cKFC2jYsKHhi1nUFoZKM3dkWVateqgpMIWV1C1b\ntsTJkydhZmaGpUuX4r333hPa392uXTthbf2Vx3e3s7CwgIODg7Aqu7m5uU/UO+rZs6cq4yCmRFTF\nY80niJSUlHKLbkRuYVjRGaEWzxRLmcJK6rCwMISEhGDFihWYPHkyIiMjNTsgOm7cONy9excNGzbE\ntWvXULlyZRQVFWH69OlCFu5V1N2qxW7YUiIrHms+Qfy///f/AJTMsy7dtEcUUzpTNBVqrqQuZWVl\nBXd3d+h0Onh5eZlEuQ+1uLi4YP369XB0dMSDBw8wa9YszJs3D2PHjhWSIDw9PbFhw4ZyJdc3btyI\npk2bGr1tU+Xk5ISuXbsaxumMWQZF8wnit99+w5w5c1TfD5pK9OzZE15eXkhPT0eNGjVU2TxJkiR8\n+OGH6NSpE2JjY1XftEhN9+/fN8waqlatGu7duwd7e3thSXPy5MmYP38+OnbsCCcnJzx8+BAdOnRQ\n3JdZK0SO02l+HcSQIUOwatUqTJw4EV9++SUGDRqE7777Tu2wNGvlypUoLCzElClTMGnSJLRs2RLv\nvvuu0BgyMjLw+++/49VXX8XRo0fRtGlT2NvbC43BVMyZMwcPHjyAl5cXTp06BXt7e3h7e+PHH3/E\nZ599JiwOnU6HrKwsODg4wMJC2+e1Q4YMMYzTbdy4EQMHDsS2bduM0pa2jzRMZz9oKrFv3z5Dgl6+\nfDn8/f2FJwgrKyscOXIEmzdvRoMGDTTdnTF79mzEx8cjOTkZvr6+ePXVV3H16lXhpSYsLS3h5OQk\ntE1TJXKcTrudq//DVPaDphKSJKGwsBBAyVmjGhe4wcHBcHZ2xuTJk1G3bl3MnDlTeAymIicnBwUF\nBahZsyYyMzPx/fffw83NTZUd9qhE6Tjd5cuXMWDAAKMW99T8FUTZ/aCrVKmi2qbsVMLf3x+9evVC\nkyZNcPXqVYwZM0Z4DGoXhjMl48ePR82aNVGnTh0A6q3RKS3PX8qY22yaurIVj11dXeHg4GC0tjSd\nIC5cuIBmzZqhX79+2LZtG6ytrTXfv6m2AQMGGGZouLq6qlJWQe3CcKZElmWjTqP8KwkJCThx4gRi\nYmJw8uRJACVdLPv27dNsgni85IalpSVq166N999/Hy4uLs+1Lc1+G37zzTeIjY3F1q1b8fHHH+P2\n7dtwdnbGggULMGvWLLXD06w//vgDUVFRKCgoMDwWEREhNIYPPvgA/v7+5QrDaVXTpk1x+vRpNG/e\n3PCYqIWkANCsWTNkZWXB2traUHZFkiS89dZbwmIwNS4uLmjdujXatGmDU6dOISEhAV5eXvjoo4+w\nfv3659qWZmcx+fn5YcuWLZAkCS+//DL27NkDOzu7ctUzSTxfX18MHToUtWvXNjzWsWNHVWIpLQx3\n/fp11K9fX5UY1Na7d2/k5OQY7otcSFpW6TafVDIGUTYRjBw5Et988w2GDh2KTZs2Pde2NHsFUbVq\nVZibm+PcuXNwdXWFnZ0dAG2v0DQFNWrUMJl1KKXdW1OnThVeXddU7Nq1S+0QAABr167F2rVry+3V\nosYOa6ZAp9PhwIED+M9//oMTJ06gqKgIN27cQH5+/nNvS7MJQpIkpKSkYOfOnejSpQsA4Nq1azA3\nN1c5Mm2rW7cuvvjiCzRv3tzQz1rRXryiaPGkYe7cuQgNDYWfn
98TA9NqXGHHxMTgwIEDnD0FYOHC\nhfj444+xYMECNGnSBAsWLMCpU6eMsnhQswnigw8+wIcffogaNWpg8uTJSEpKwvTp0/Hpp5+qHZqm\n6XQ6pKSkICUlxfCY2glCi9V1S+uTPb5Ct3QKsmguLi7Cd3o0VfXq1cPKlSsN99PS0tCrVy+jtKXZ\nMYjHFRYWQpIkTZdVMEVpaWmoWbOmkLYqqq576NAhHD16VEgMpmbt2rUYO3YsAODSpUuYMWMGdu7c\nKTyOsWPHIjU1FU2aNAFQkrRF7VVuaj755BN8++230Ol0ePToERo0aICYmBijtKXZK4jHiZyZQRX7\n9NNPsXXrViFv/sexuu6TLl++jK1btyIvLw/ff/89wsLCVImjNElRydTfxMRELFiwACNHjjRsJGQM\nTBBkUvbt2yfszf84Vtd90sKFCzFt2jRkZGRgx44dqp1INWnSBAcPHkRRURFkWUZaWppm/72cnJxg\nZWWF3Nxc1K9fHzqdzmhtaT5BfPXVVxg9erTaYdD/EPnmp4qVHZzW6XS4ePGioeS2GoPUAQEBcHNz\nw6VLl2Btba3pweratWtj+/btqFy5MpYsWYKHDx8arS3NJ4hffvkFI0aM4OwlEyHyzU8VE7XN67OS\nZRlz585FUFAQ5s+fb9T6Q6Zu7ty5SE1NRY8ePbBz506j/ltpPkFkZmaiY8eOcHFxgSRJkCSJC+VU\n9PibX6sDkWqrW7cuAODOnTtYsGABkpOT0aBBA9X2YTA3N0dBQQHy8/MNVUy16tatW0hISDBUG9i3\nbx8aNWpklLY0nyC0tvG5KYuKikK/fv1Qt25dHDt2DBYWFmjcuLHaYWnarFmzMGjQILRt2xZJSUlG\nKefwLIYMGYJ169bhlVdewauvvoo2bdoIj8FUjB8/Hq+//rphca8xaT5BWFhYIDIyEhkZGejRowea\nNm1qOHsicVasWIHLly+jd+/esLCwQO3atbFu3TpkZGRgwoQJaoenWQUFBejatSsAwMfHB998840q\ncXTv3t1wu2fPnrCxsVElDlNQp04dTJw4UUhbmk8QISEhGDlyJD777DN4e3tj5syZRtudiSqWmJiI\nbdu2GQZGXVxcsGzZMvj7+zNBqKi4uBgXL15E06ZNcfHiReGLBh+vXFrWhg0bhMZiKjp37ozFixeX\nu7ru06ePUdrSfIJ49OgR2rdvj88//xxubm6wtrZWOyRNqlKlyhNfBJaWltzhT2WzZs1CcHAw0tPT\nUbNmTeGVbUunOa9atQpdu3ZFmzZtcObMGSQkJAiNw5TExsbCzc0NycnJAIy70l/zCcLa2hoHDhyA\nXq/HqVOnuGBOJZUqVTLsAVHqxo0bmixzYUo8PDzw5Zdf4saNG3BxcRG+P4ebmxsA4N69e4b9H7p1\n64aNGzcKjcOUWFlZCVsfpPkEMW/ePCxatAiZmZn4+uuvVVspqnXTpk3D+PHj0b59e7i6uuL27ds4\nePAgFi1apHZomhYbG4tPP/0UjRs3xqVLlxAQEABfX19VYomOjoanpydOnjyp6ZI4zs7OWLNmDTw8\nPIxe0JK1mPDffXdLVa9eXcVotCs7Oxvx8fFIS0uDs7MzXnvtNU0PRpoCPz8/fP3116hatSpycnIw\nfPhw7NixQ3gc6enpWL16Na5du4bGjRvjvffeM+pWm6ZMaaqxsTbV0vwVxIcffogTJ07A1tYWsixD\nkiRVipERYGtra7TBNvrfkSTJMA5kY2Oj2hidk5MTxo8fbziRy8/P12yCeDwZpKWlGa0tzSeIlJQU\nxMXFqR0GkUlydXXFwoUL4e3tjWPHjqFevXqqxBEWFobExETUrFnTcCKn1QWtIgtaaj5BeHp64urV\nq4bBMCL6r4iICERFReHw4cNo1KgRpk6dqkocZ86cQVxcHLcdhdiClppPEDY2Nujfvz+qVKlieEyr\nWxkSlZWRkYHKlStjyJAh2
LVrF4qKilSLpX79+igoKNB0kb5SrOYq0NGjR5GUlAQLC80fCiKDL7/8\nElFRUbC0tISXlxdSU1NRvXp1HD58GIsXLxYeT2pqKjp37oz69esDgKa7mFjNVaAGDRrg/v37qFWr\nltqhEJmMn376Cbt370ZeXh569uyJX375BRYWFhgyZIgq8bBo43/NnTsXd+7cEVLQUvMJ4sSJE+jS\npQvs7e0Nc4rZxURaV7lyZVhYWMDOzg5ubm6GK2y1rrRZM63EhQsX8PPPPyMzMxO1a9dGjx490KBB\nA6O1p/kRnz179uDcuXM4dOgQDh48yORA9D90Oh0KCwvL3dbr9arEEhISgn79+kGn08Hb2xvz589X\nJQ417d69G8HBwahTpw46duyIqlWrYtKkSUadhan5K4iLFy8iODgYd+/eRY0aNbBgwQJ4eHioHRaR\nqm7duoUePXqgdB1t6W21Sp+wZlpJccJNmzaVm1DTt29fvP/++/Dx8TFKm5pPEOHh4Zg/fz6aNWuG\nP/74A3PmzNHs4BdRqX379qkdQjmsmVbSzVY2OQAlszCNuRum5ruYAKBZs2YAgObNm3M2E5EJmjdv\nHr777jtDzTRRxepMSUVXb8bs9tP8t6GZmRkSEhLg7e2N3377TZNnJkSm7sCBA1i2bJnh/oYNG/DO\nO++oGJF4V65ceWKhoizLhrLfxqD5Yn23bt3CokWLcPXqVTRq1AgzZsyAs7Oz2mERmYSEhAR07tzZ\ncD82NtZQdluEH3/8Efv27cPRo0fx0ksvASg5Y7506ZLRykuYqqSkpAqfa9eunVHa1PwVxOHDh7F8\n+XLDfS2emRA9LiEhASdOnEBMTAxOnjwJoGR3uX379glNEB07doSTkxOysrLg5+cHoOSqv+y+IVph\nrCTwNJpNEGXPTI4cOQLgv2cmTBCkdc2aNUNWVhasra3RsGFDACV94G+99ZbQOKpVq4YXX3wRx48f\nL/cFuWTJEtXqQmmJZruYHjx4gAsXLmDNmjV47733APz3zISrqolK6PV6ZGVl4dGjR4bHRHbBRkdH\nY/v27UhOTjbswVxcXIyioiKW5RdAswmilCzLyM3NhSRJ2Lt3Lzp37oxq1aqpHRaRSQgNDcXhw4dR\no0YNVcpsFxYWIi0t7YkTuerVq3NCiQCaTxCTJ0/Ga6+9hpMnT0Kv1+P+/ftYtWqV2mERmYSBAwci\nKirKZPYGz8nJwc6dO7F161bExsaqHc6/nubXQaSlpcHX1xfJycmYO3cucnNz1Q6JyGTUrFnTJD4T\nV65cQVhYGHx8fHD58mUsXLhQ7ZA0QbOD1KV0Oh327NmDxo0bIyMjwyQ+DERq8/PzgyRJuH//Pl5/\n/XXDrCHRXUw///wzNm/eDJ1Oh7fffhspKSmYO3eusPa1TvNdTHv27EFMTAyCgoIQFRUFT0/PcvO+\nibTo1q1bFT4nsoqql5cX3nnnHYwcORIODg4YO3Ys1q5dK6x9rdN8giCiigUFBZW7b2lpidq1a2PI\nkCFCJnOkpaXhu+++w65du9CkSRPcvHkT27dvN3q7VELzCaJDhw4ASmYzPXjwAK6urti9e7fKURGZ\nhilTpsDV1RXe3t44ffo0fv/9dzRv3hwXLlzA6tWrhcby66+/Ytu2bTh9+jS6d++OGTNmCG1fizQ/\nBlF2/4dbt25h5cqVKkZDZFoyMjKwdOlSACWrmkeNGoXAwEBVdpZr37492rdvj4yMDOzatUt4+1qk\n+VlMZdWtWxdXr15VOwwik5GTk2MoBpecnIzc3FxkZmYiLy9PtZgcHR0xYsQI1drXEs13MU2ZMsUw\nxzstLQ22trb47LPPVI6KyDScOXMGYWFhSEtLQ506dRAaGoozZ86gRo0a6N69u9rhkZFpPkGUrZBo\nbW2Nli1bGnUDDiL6+9SuKqtVmh6DiIqKQr9+/WBhYYFjx47h/PnzaNWqldphEalu0qRJWL5
8uWES\nR1ki9203laqyWqXZBLFixQpcvnwZvXv3hoWFBWrXro1169bh/v37CAgIUDs8IlWVlsAXmQyUmEpV\nWa3SbBfTgAEDsG3btnI1ZnQ6Hfz9/bFjxw4VIyNSX9mxucctWbJEcDTqV5XVKs1eQVSpUuWJD4Cl\npSWqVq2qUkREpsPf31/tEMoJCwtTtaqsVmk2QVSqVAk3btwotzPVjRs3TKZqJZGajh07hvHjxwMo\nmd1Xs2ZNVeO5cOEC9u7dy8+nYJpNENOmTcP48ePRvn17uLq64vbt2zh48CAWLVqkdmhEqjty5Igh\nQUybNg0bNmxQNZ7SqrI2NjaqxqE1mk0Q7u7u2LJlC+Lj45GWloYWLVpgwoQJfAMSoaT0jNJt0Uyl\nqqxWaTZBAICtrS369OmjdhhEJqdsV46a3TqlZT5IHZqdxUREFWvTpg3c3d0hyzKuXLliuK3Wmbva\nVWW1StNXEESkzNSK4RUUFDxRVdbR0REzZswQXlVWS5ggiOgJIjcFehamVFVWS1jNlYhMnilWldUC\njkEQkcljVVl1MEEQEZEijkEQkckylaqyWsUrCCIiUsQrCCIyWaZWVVZrmCCIyGSZWlVZrWGCICKT\nZWpVZbWG6yCIyGQdOXLEcHvatGkqRqJNTBBEZLJMpaqsVjFBEJHJMpWqslrFaa5EZLJMraqs1jBB\nEJHJunXrVoXPmVpBwX8jJggiIlLEMQgiIlLEBEFERIqYIIj+pqioKPz4448AgE8//RTx8fEqR0Rk\nHFxJTfQ3nTx5Eu3atQMAfPDBBypHQ2Q8TBD0r3P06FGsWbMGlSpVQnJyMpo2bYrFixcjNjYW69ev\nh16vR4sWLTB79mxYW1sjNjYWy5cvR+XKleHh4YHi4mIsXLgQu3fvxjfffINHjx6hoKAA4eHh0Ol0\n2LdvH44cOQInJyfExMSgXbt2uHjxImrWrInRo0cDKClT/dZbb6F169YIDQ3FnTt3IEkSpk6dipdf\nfhkrVqzAqVOnkJqaiiFDhqBDhw4ICwtDVlYWKlWqhJCQEHh4eODSpUuYN28e8vLykJGRgZEjR+Kd\nd20urMYAAATgSURBVN7Br7/+isjISABAtWrVsGTJEjg6OmLHjh345ptvIEkSWrRogZCQEFStWhUd\nOnRA9+7dcfz4cZibm+OTTz6Bq6urmv9M9E8gE/3LHDlyRPby8pJTU1Pl4uJiuV+/fvK6devkQYMG\nyY8ePZJlWZYXL14sr1q1Sr5//778yiuvyHfu3JGLi4vlCRMmyDNmzJCLi4vld955R75//74sy7Ic\nHR0tjxs3TpZlWZ4xY4a8Y8eOcrfPnTsn9+3bV5ZlWc7OzpZfeeUVuaCgQA4MDJTj4uJkWZblu3fv\nyl27dpWzs7Pl5cuXy0OHDjXE7OfnJ587d06WZVm+fPmy/Prrr8uyLMvh4eHy4cOHZVmW5T///FP2\n8vKSZVmWhw4dKp8+fVqWZVlev369fODAAfnChQuyj4+PnJGRIcuyLIeFhckLFy6UZVmWmzRpIu/d\nu1eWZVmOiIiQIyIinvtxp38fXkHQv5K7uztq164NAGjUqBGys7Nx/fp1DBw4EACg0+ng4eGBY8eO\n4T//+Q9q1aoFAOjTpw/i4uJgZmaGVatWYd++fUhJSUFSUhLMzCoesvPw8EBhYSGuX7+OkydPonPn\nzrCyssLhw4dx9epVLF++HABQVFSEGzduAAA8PT0BALm5uTh79iyCgoIMvy8vLw+ZmZmYOXMmDhw4\ngDVr1uDixYuGPZi7du2KgIAA+Pj4oGvXrnjllVewadMmdO7cGQ4ODgAAPz+/cr+zY8eOhmNz7Nix\n//tBpn89Jgj6V7K2tjbcliQJtra26NmzJ2bNmgWg5Eu5uLgYSUlJ0Ov1T/x8bm4u+vXrB19fX7Rt\n2xZNmzbF5s2bn9pm7969ERsbi5MnT2Ls2LEAAL1ej/X
r18Pe3h4AcPfuXdSoUQNxcXGoVKmS4TVW\nVlb44YcfDL/rzp07sLe3x6RJk2BnZ4fOnTvjjTfeQExMDABgxIgR6Ny5MxISEhAZGYkzZ86gatWq\n5eKRZRlFRUVPHBNJkljXiJ4JZzGRZuzduxf379+HLMsICwvD+vXr0bp1a/z+++9IS0uDLMuIjY2F\nJEm4du0azMzM8N577+Gll15CYmIiiouLAQDm5uaG22X16tULsbGxuH79Ory9vQEAL730ErZs2QIA\nuHLlCnr37o38/PxyP2dra4sGDRoYEsShQ4cwZMgQw+1JkybBx8cHv/32GwCguLgYAwYMQG5uLkaM\nGIERI0bg/PnzaNeuHfbt24esrCwAwLZt2/Diiy8a4UiSVvAKgjTB1tYWAQEBGD58OPR6PZo3b453\n330X1tbWmDVrFkaNGgUrKyu4uLjAzs4OzZo1Q/PmzdGzZ09UqlQJbdu2xe3btwEAL7/8MpYuXQpb\nW9tybdSpUwcODg7w8vIyFJabNWsWQkND0atXLwDAxx9/DBsbmyfii4yMRFhYGL788ktYWlpi2bJl\nkCQJEydOxODBg2FnZ4eGDRuibt26uHnzJqZMmYKZM2fCwsIC1tbWmDNnDpo0aYJx48Zh2LBh0Ol0\naNGiBebMmWPkI0v/Ziy1QZqWmZmJjRs3IiAgAGZmZggPD0f9+vUxbNgwtUMjUh2vIEjT7O3t8fDh\nQ7z11lswNzdHixYtDAPZRFrHKwgiIlLEQWoiIlLEBEFERIqYIIiISBETBBERKWKCICIiRUwQRESk\n6P8DKibgbGxQUqIAAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# solution\n", "sns.countplot(df['negativereason'], order=df['negativereason'].value_counts().index)\n", "plt.xticks(rotation=90);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Featurization\n", "\n", "How are we going to turn our tweets into numbers? 
Well, first I want to do some quick preprocessing to replace Twitter handles, hashtags, and URLs with generic placeholders:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['lol USER and USER are like soo HASHTAG HASHTAG saw it on URL HASHTAG',\n", " 'omg I am never flying on Delta again',\n", " 'I love USER so much HASHTAG']" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "twitter_handle_pattern = r'@(\\w+)'\n", "# A hashtag at the start of the tweet or after whitespace (the whitespace is part of the match).\n", "hashtag_pattern = r'(?:^|\\s)#(\\w+)'\n", "# Match a URL up to the next whitespace, not only URLs ending in .com.\n", "url_pattern = r'https?://\\S+'\n", "\n", "def clean_tweets(tweets):\n", "    # ' HASHTAG' puts back the space that hashtag_pattern consumed.\n", "    tweets = [re.sub(hashtag_pattern, ' HASHTAG', t) for t in tweets]\n", "    tweets = [re.sub(twitter_handle_pattern, 'USER', t) for t in tweets]\n", "    return [re.sub(url_pattern, 'URL', t) for t in tweets]\n", "\n", "my_tweets = [\"lol @justinbeiber and @BillGates are like soo #yesterday #amiright saw it on https://twitter.com #yolo\",\n", "             'omg I am never flying on Delta again',\n", "             'I love @VirginAmerica so much #friendlystaff']\n", "\n", "clean_tweets(my_tweets)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\"><th></th><th>tweet_id</th><th>airline_sentiment</th><th>airline_sentiment_confidence</th><th>negativereason</th><th>negativereason_confidence</th><th>airline</th><th>airline_sentiment_gold</th><th>name</th><th>negativereason_gold</th><th>retweet_count</th><th>text</th><th>tweet_coord</th><th>tweet_created</th><th>tweet_location</th><th>user_timezone</th><th>clean_text</th></tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr><th>0</th><td>570306133677760513</td><td>neutral</td><td>1.0000</td><td>NaN</td><td>NaN</td><td>Virgin America</td><td>NaN</td><td>cairdin</td><td>NaN</td><td>0</td><td>@VirginAmerica What @dhepburn said.</td><td>NaN</td><td>2015-02-24 11:35:52 -0800</td><td>NaN</td><td>Eastern Time (US &amp; Canada)</td><td>USER What USER said.</td></tr>\n",
"    <tr><th>1</th><td>570301130888122368</td><td>positive</td><td>0.3486</td><td>NaN</td><td>0.0000</td><td>Virgin America</td><td>NaN</td><td>jnardino</td><td>NaN</td><td>0</td><td>@VirginAmerica plus you've added commercials t...</td><td>NaN</td><td>2015-02-24 11:15:59 -0800</td><td>NaN</td><td>Pacific Time (US &amp; Canada)</td><td>USER plus you've added commercials to the expe...</td></tr>\n",
"    <tr><th>2</th><td>570301083672813571</td><td>neutral</td><td>0.6837</td><td>NaN</td><td>NaN</td><td>Virgin America</td><td>NaN</td><td>yvonnalynn</td><td>NaN</td><td>0</td><td>@VirginAmerica I didn't today... Must mean I n...</td><td>NaN</td><td>2015-02-24 11:15:48 -0800</td><td>Lets Play</td><td>Central Time (US &amp; Canada)</td><td>USER I didn't today... Must mean I need to tak...</td></tr>\n",
"    <tr><th>3</th><td>570301031407624196</td><td>negative</td><td>1.0000</td><td>Bad Flight</td><td>0.7033</td><td>Virgin America</td><td>NaN</td><td>jnardino</td><td>NaN</td><td>0</td><td>@VirginAmerica it's really aggressive to blast...</td><td>NaN</td><td>2015-02-24 11:15:36 -0800</td><td>NaN</td><td>Pacific Time (US &amp; Canada)</td><td>USER it's really aggressive to blast obnoxious...</td></tr>\n",
"    <tr><th>4</th><td>570300817074462722</td><td>negative</td><td>1.0000</td><td>Can't Tell</td><td>1.0000</td><td>Virgin America</td><td>NaN</td><td>jnardino</td><td>NaN</td><td>0</td><td>@VirginAmerica and it's a really big bad thing...</td><td>NaN</td><td>2015-02-24 11:14:45 -0800</td><td>NaN</td><td>Pacific Time (US &amp; Canada)</td><td>USER and it's a really big bad thing about it</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " tweet_id airline_sentiment airline_sentiment_confidence \\\n", "0 570306133677760513 neutral 1.0000 \n", "1 570301130888122368 positive 0.3486 \n", "2 570301083672813571 neutral 0.6837 \n", "3 570301031407624196 negative 1.0000 \n", "4 570300817074462722 negative 1.0000 \n", "\n", " negativereason negativereason_confidence airline \\\n", "0 NaN NaN Virgin America \n", "1 NaN 0.0000 Virgin America \n", "2 NaN NaN Virgin America \n", "3 Bad Flight 0.7033 Virgin America \n", "4 Can't Tell 1.0000 Virgin America \n", "\n", " airline_sentiment_gold name negativereason_gold retweet_count \\\n", "0 NaN cairdin NaN 0 \n", "1 NaN jnardino NaN 0 \n", "2 NaN yvonnalynn NaN 0 \n", "3 NaN jnardino NaN 0 \n", "4 NaN jnardino NaN 0 \n", "\n", " text tweet_coord \\\n", "0 @VirginAmerica What @dhepburn said. NaN \n", "1 @VirginAmerica plus you've added commercials t... NaN \n", "2 @VirginAmerica I didn't today... Must mean I n... NaN \n", "3 @VirginAmerica it's really aggressive to blast... NaN \n", "4 @VirginAmerica and it's a really big bad thing... NaN \n", "\n", " tweet_created tweet_location user_timezone \\\n", "0 2015-02-24 11:35:52 -0800 NaN Eastern Time (US & Canada) \n", "1 2015-02-24 11:15:59 -0800 NaN Pacific Time (US & Canada) \n", "2 2015-02-24 11:15:48 -0800 Lets Play Central Time (US & Canada) \n", "3 2015-02-24 11:15:36 -0800 NaN Pacific Time (US & Canada) \n", "4 2015-02-24 11:14:45 -0800 NaN Pacific Time (US & Canada) \n", "\n", " clean_text \n", "0 USER What USER said. \n", "1 USER plus you've added commercials to the expe... \n", "2 USER I didn't today... Must mean I need to tak... \n", "3 USER it's really aggressive to blast obnoxious... 
\n", "4 USER and it's a really big bad thing about it " ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# regex=True is required in pandas 2.0+, where str.replace treats the pattern literally by default.\n", "# ' HASHTAG' keeps the space that hashtag_pattern consumes, matching clean_tweets above.\n", "df['clean_text'] = (df['text']\n", "                    .str.replace(hashtag_pattern, ' HASHTAG', regex=True)\n", "                    .str.replace(twitter_handle_pattern, 'USER', regex=True)\n", "                    .str.replace(url_pattern, 'URL', regex=True)\n", "                   )\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Bag of words\n", "\n", "Now that we've cleaned the text, we need to turn it into numbers for our classifier. We're going to use a \"bag of words\" as our features. A bag of words is just a count of how often each word appears in a tweet. It's called a bag because we ignore the order of the words; we only care about which words are in the tweet. To do this, we can use `scikit-learn`'s `CountVectorizer`. `CountVectorizer` replaces each tweet with a vector (think of a list) of counts. Each position in the vector represents a unique word in the corpus. The value at that position is the number of times that word appeared in the tweet. Below, we keep only the 5,000 most frequent words (`max_features=5000`), so each vector has at most 5,000 entries, and we set `binary=True` so that each entry is either 0 (the word is not in the tweet) or 1 (the word is in the tweet)."
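, "\n", "\n", "Before running this on the full dataset, it can help to see the idea on a tiny example. The next cell is a minimal sketch on three made-up tweets; `toy_tweets`, `toy_vectorizer`, and `toy_counts` are hypothetical names used only for illustration and are not part of the workshop dataset. Each row of the printed array is one tweet, and each column is one word of the vocabulary." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Illustrative sketch only: toy_tweets and the toy_* names are made up for this example.\n", "from sklearn.feature_extraction.text import CountVectorizer\n", "\n", "toy_tweets = ['I love this airline', 'I hate this airline', 'love love love']\n", "toy_vectorizer = CountVectorizer(binary=True)\n", "toy_counts = toy_vectorizer.fit_transform(toy_tweets)\n", "\n", "# Columns follow the alphabetically sorted vocabulary.\n", "# The default tokenizer drops one-letter words such as 'I'.\n", "print(sorted(toy_vectorizer.vocabulary_))\n", "\n", "# One row per tweet; binary=True caps every entry at 1, even for 'love love love'.\n", "print(toy_counts.toarray())"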
] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0, 0, 0, ..., 0, 0, 0],\n", " [0, 0, 0, ..., 0, 0, 0],\n", " [0, 0, 0, ..., 0, 0, 0],\n", " ...,\n", " [0, 0, 0, ..., 0, 0, 0],\n", " [0, 0, 0, ..., 0, 0, 0],\n", " [0, 0, 0, ..., 0, 0, 0]], dtype=int64)" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "countvectorizer = CountVectorizer(max_features=5000, binary=True)\n", "X = countvectorizer.fit_transform(df['clean_text'])\n", "features = X.toarray()\n", "features" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['neutral', 'positive', 'neutral', ..., 'neutral', 'negative',\n", " 'neutral'], dtype=object)" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "response = df['airline_sentiment'].values\n", "response" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Split into train/test datasets\n", "\n", "We don't want to train our classifier on the same data that we test it on, so let's split the dataset into a training set (80% of the tweets) and a test set (the remaining 20%)." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "X_train, X_test, y_train, y_test = train_test_split(features, response, test_size=0.2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Classification \n", "\n", "OK, so now that we've turned our data into numbers, we're ready to feed it into a classifier. We're not going to concentrate too much on the code below, but here's the big picture. In the `fit_logistic_regression` function defined below, we use logistic regression as a classifier that takes in the numerical representation of each tweet and predicts whether it's positive, neutral, or negative. Then we'll use `test_model` to measure the model's performance on our test data and print out some results."
] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "def fit_logistic_regression(X_train, y_train):\n", "    # L1-penalized logistic regression; the regularization strength is chosen by 3-fold cross-validation.\n", "    model = LogisticRegressionCV(Cs=5, penalty='l1', cv=3, solver='liblinear', refit=True)\n", "    model.fit(X_train, y_train)\n", "    return model\n", "\n", "def conmat(model, X_test, y_test):\n", "    \"\"\"Wrapper for sklearn's confusion matrix.\"\"\"\n", "    labels = model.classes_\n", "    y_pred = model.predict(X_test)\n", "    # Pass labels explicitly so the rows and columns match the tick labels.\n", "    c = confusion_matrix(y_test, y_pred, labels=labels)\n", "    sns.heatmap(c, annot=True, fmt='d',\n", "                xticklabels=labels,\n", "                yticklabels=labels,\n", "                cmap=\"YlGnBu\", cbar=False)\n", "    plt.ylabel('Ground truth')\n", "    plt.xlabel('Prediction')\n", "\n", "def test_model(model, X_test, y_test):\n", "    # Evaluate on the held-out test data passed in as arguments.\n", "    conmat(model, X_test, y_test)\n", "    print('Accuracy: ', model.score(X_test, y_test))" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "lr = fit_logistic_regression(X_train, y_train)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Accuracy: 0.805327868852459\n" ] }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAXsAAAEFCAYAAAACFke6AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XmcjeX/x/HXmTMbM8MwYxkMxjbGWr5ZWvBDIkvIUolK\nSZF9ZxhDJKUsCZFsWSrGPhqhlETaiEhmbI3MYsYwY5nlnN8fvp1vyjhjued0nPfz8fB4zLnPfd/X\n557l7TrXfd/XbbJarVZEROSu5uboAkRExHgKexERF6CwFxFxAQp7EREXoLAXEXEB7o4u4HoKlH3K\n0SXILUo7PtTRJcht8HTzc3QJcluq5PqOevYiIi5AYS8i4gIU9iIiLkBhLyLiAhT2IiIuQGEvIuIC\nFPYiIi5AYS8i4gIU9iIiLkBhLyLiAhT2IiIuQGEvIuICFPYiIi5AYS8i4gIU9iIiLkBhLyLiAhT2\nIiIuQGEvIuICFPYiIi5AYS8i4gIU9iIiLkBhLyLiAhT2IiIuQGEvIuICFPYiIi5AYS8i4gIU9iIi\nLkBhLyLiAhT2IiIuQGEvIuICFPYiIi5AYX8HzXvrZQb2am173at7c3Zteo0ft03lg+mv4Onpfs36\n/oV9+GXndDq0qmdb1rFNA76NmcK3MVPYvGIMFcuXzLf65Sqr1Ur4qLks+mDjNcvP/HGWZo1fITX1\nvG3ZgZ9j6d41kk4dRtHhsRFsWL8zv8sVO7Zu/YY6dboAkJmZRUTELB55pBft2w9gxowPsVgsDq4w\nfyjs74DQSqXYvGIMHds0sC1r17IuvZ9rQeuuk6jz8DAKeHvQv2era7Z7f1pvCvkVtL0uHliYma+9\nwOPPTaFeixGs+/Rbpr36XD4dhQDExcbTs8cktny6+5rl69d+ybPdxpOYmGpbZrVaGTRgOn36dmTV\nmsnMmTecqVM+5MTxP/K7bMnF8eOnmTJlIVarFYC5cz8mPj6JDRtmERU1jaSkVJYvj3ZwlflDYX8H\nvPzMIyz55AtWb/xfQDzdsSEz5m8iNS0Dq9VKv9ELWL76K9v7I/t34MChkxw8fMq2LDE5jXJ1Xub3\nP1Iwm90oWzqQlNT0fD0WV7di+Rbad2jMIy3/9x93YmIq27d9z+z3hl+zbmZmFr37PM79D9QEoGTJ\nAPyL+JGQkJKvNcv1Xbp0mWHD3mLkyBdsyw4ePErr1g3x8vLEzc2Nhx9uQEzM1w6sMv8o7O+AQRGL\nWBF17cf3ShWCKBZQiHVLRvJtzBTCB3Xi3PmLADRrWJOGDcKY8NYn/9hXdnYOdWpV4Oied3m+azPm\nLIrJl2OQq8LH9qBtu4bXLCtevAjT3xlExUplrlnu5eXJ452a2F5/8vE2Ll68TK3alfOlVrmxiIh3\neeKJloSGlrctq1UrlOjor8jIuERmZhYbNuy45tPa3czQsM/JyeGTTz5hxowZ7Nmzh5QU1+nxeLib\nadawJt36zODBNqMp6u/L+OFPEFwqgNfHduP5Ae9isVivu+0P++MIua833V+ZSdSi4RQuVPC668m/\nx/vz1zP7ndXMmj0Ub29PR5fj8pYt24S7u5lOnZpfs/zFFztSuXJZnnxyGD16jKVOnbB/nEu7Wxka\n9hEREZw+fZpdu3aRkZHBiBEjjGzuX+WPhHOsj9nLhfRLZGXlsGLNTurXqczjrRtQsIAX65aMZPfm\nydSpVYHXRj9Nz24PE1SiCA83qmXbx2c79nHhwiUqlCvhwCORG8nMzGL4kHfYvGkXH64cT2jVco4u\nSYA1a7bx88+/0a5df3r1Gs/ly5m0a9ef1NTz9OjRgQ0bZrFs2esUKVKIsmWDHF1uvjA07E+ePMmA\nAQPw8vKiadOmXLhwwcjm/lXWRO/h8dYN8PbyAKBti/v4fl8sM+ZvonrDgTR4dBQNHh3FD/vjGP3a\nMt7/cCteXh4sfbe/Ldwb3V8Nd3c3Dv8W78hDkRsYMnAG6emXW
Lo8ktKlizm6HPmvVaveZuPGd1m3\nbibz5o3D29uTdetm8sUXe4mIeBer1UpGxiUWLlxL27b/5+hy84Whn19ycnJsQzfp6em4ubnOKYL3\nlmyhiL8vu6Jfw+zmxk8HjtP31Q9vuM3xk4n0Hj6PFe8Nwmq1knb+Ih2fn8qly5n5VLXcjB9/+JUv\nPv+B8uWDeObp8bblg4Y8yYMP1XZgZZKbjh2bs2/fEdq0eYWcHAtdurSgZcsHHV1WvjBZ/7wmyQB7\n9+5lzJgxJCUlERQURHh4OA888IDd7QqUfcqoksRgaceHOroEuQ2ebn6OLkFuS5Vc3zG0Z+/n50dM\nTAwpKSkUKVIEk8lkZHMiIpILQ8dVpk+fzpNPPsnWrVu5dOmSkU2JiMgNGDqMA5CUlMS6devYunUr\nFStWZNKkSXa30TCO89IwjnPTMI6zy30Yx/AzptnZ2WRmZmKxWDCbzUY3JyIi12HomP0zzzxDZmYm\nnTp1YtGiRRQsqJuDREQcwdCwDw8PJzQ01MgmREQkDwwJ+wkTJhAREUFERITtChyr1YrJZGLlypVG\nNCkiIjdgSNj36dMHgClTpuDh4WFbnpaWZkRzIiJihyEnaK1WK8eOHWP48OFkZWWRmZnJ5cuXiYiI\nMKI5ERGxw5Ce/b59+1i8eDHHjh1j7NixALi5ufHQQw8Z0ZyIiNhh6HX2O3bsoHHjxje9na6zd166\nzt656Tp7Z+eg6RIKFy5MREQEWVlZACQmJrJgwQIjmxQRkesw9KaqyMhI6tWrR3p6OqVKlcLf39/I\n5kREJBeGhn2RIkVo06YNvr6+9OvXj4SEBCObExGRXBga9m5ubvz2229cunSJuLg4XXopIuIghob9\nyJEj+e233+jevTtDhw6lY8eORjYnIiK5MPRqnNOnT1/z2t3dnSJFilxzo9X16Goc56WrcZybrsZx\ndg66Guell14iISGBkJAQjh8/ToECBcjOzmbYsGG0a9fOyKZFROQvDB3GKVOmDJ9++ikfffQRW7Zs\noWbNmmzcuJEPP7zxs1hFROTOMjTsz549S9GiRYGr19wnJyfj7+/vUg8eFxH5NzB0GKd69eoMHjyY\ne+65h59++omwsDCio6MJCAgwslkREfkbwx9LuG3bNuLi4qhSpQqNGzcmLi6OoKAgChQokOs2OkHr\nvHSC1rnpBK2zc9BjCdPT09m/fz9xcXFcuXKFEydOUKFChRsGvYiI3HmGhv3o0aMJDg7mxIkTBAYG\nEh4ebmRzIiKSC0PD/ty5c3Tq1Al3d3fq1KmDxWIxsjkREcmF4ZfFxMbGAnDmzBnMZrPRzYmIyHUY\neoL2yJEjREREEBsbS4UKFRg3bhzVqlWzu51O0DovnaB1bjpB6+wcdIL2l19+IS0tDT8/P5KSkujX\nr5+RzYmISC4Mvc5+/vz5zJ07l6CgICObEREROwwN++DgYMqVK2dkEyIikgeGhr23tzc9e/YkLCwM\nk8kEwODBg41sUkRErsPQsL+Vh40DJMb2vMOVSH45fO4PR5cgt6FmUV9HlyC3wXSD9wwN+w4dOhi5\nexERySNNPyki4gIU9iIiLkBhLyLiAhT2IiIuQGEvIuICFPYiIi7A7qWXWVlZ7Nq1i9TU1GuWt2/f\n3rCiRETkzrIb9gMGDCApKYmKFSva7oIFhb2IiDOxG/ZxcXF8+umn+VGLiIgYxO6YfdmyZTl9+nR+\n1CIiIgbJtWffvXt3TCYTKSkptG3blqpVq17zpKklS5bkS4EiInL7cg17PWhEROTukWvY16tXD4BX\nX32VsWPHXvPeiBEjbO+LiMi/X65hHx4ezqlTpzhw4AC//fabbXlOTg7nz5/Pl+JEROTOyDXse/fu\nTXx8PJMmTaJv37625WazmYoVK+ZLcSIicmeYrFar9UYr5HYlTqlSpQwpCOBC1jbD9i3GOnbhkqNL\nkNtQs2hlR5cgt8FEaK7v2
b3Ovlu3bphMJqxWK9nZ2SQnJxMWFsbq1avvaJEiImIcu2G/ffv2a17v\n37+fZcuWGVaQiIjceTc9EVqtWrU4ePCgEbWIiIhB7PbsZ82adc3ro0ePEhAQYFhBIiJy5930A8fr\n1q1L69atjahFREQMYjfs4+PjmTx5cn7UIiIiBrE7Zn/kyBEyMjLyoxYRETGI3Z69yWSiSZMmhISE\n4OXlZVuuidBERJyH3bAfPnx4ftQhIiIGshv2MTExmghNRMTJaSI0EREXoInQRERcgN2J0BxBE6E5\nL02E5tw0EZpzu9FEaDc9XYKIiDgfhb2IiAvIdcw+t3ns/2TkfPbOzmq1Mn7MUipWCqJ7j+akpWXw\n+oQV/Prr7xQo4EXb9g148ukmAHz37a/MmLqG7OwcvLw9GDqqCzVqlnfsAbiwzZ/sZMuaXZhMJkqU\nDuDlkZ1xM5uZ/+Yqjv92Gm9vT5q0qcujnRsCcCHtIh+8HcXvxxLIvJLF4889TONH73PwUQjA668v\nIObTrylc2A+AkJDSTHljEBMnzmPP7v0ULOhNkyb16NvvKdzc7v5+b65h/+c89leuXOHs2bMEBwfj\n5ubGyZMnCQ4OJiYmJj/rdBrHYv9gyqSP+Hn/MSpWagPA21NWUaCgF5+si8BisTCk/3uULh1IgwfD\nGDV0Ae+814+qYcF89cXPRIxaRNTGSMcehIuKPXyKDcu/4M2lQ/DxLcCSmetZOe9TsjKz8S7gxbTl\nw7FYLLw5YiHFgwL4z0PVeHfiCsqUL8GA8d04m3iOId2mUuM/lQgo7u/ow3F5P/54mLfeHkadOmG2\nZTNnLud0fCLrN7yDh4c74yJms3x5NN26tXFgpfkj1//Otm/fzrZt26hbty5Lly5ly5YtfPrpp6xc\nuZLQ0NxPAri6j1d+Sdv299O8xX9syw79cpJWbetjNrvh4eHOQ41qsO2zH/HwcGfztslUDQvGarUS\n/3sy/oV9HFi9a6tYNZiZn4zCx7cAmVeySElOw6+wD3G//k7jR/9j+/nVeSCMbz7fx4W0i+z/9gid\nX3gEgIDi/rz2/gB8CxV08JFIZmYWh36JY+EHa2j3WH/69ZvM6dNJHDx4lFatG+Hl5YmbmxvNHq5P\nTMwuR5ebL+zeVBUbG8t99/3vY2mtWrU4duzYDbe50fshISE3UZ7zGRH+BAB79/xqW1ajZnmiN+zh\nnnsrkpmVxfbPfsTd3QyAu4eZs8nn6dZlMudSM5g89XmH1C1Xubub+XbHz8yd/DHuHu488WJLzqVc\nYMfm7wmtFUJWZja7v/gZd3c3zvyeTJHAQmxcsYMfvzlMVlY2j3X9P0qVLebow3B5iQlnadCgFoMG\nP0NISGk+WLCGV/pMpHnz+9kc/RUtWjyAh4c7Gzd8SVJiqqPLzRd2w75kyZLMmDGDVq1aYbFYWL9+\nPeXLl7/hNhEREdddbjKZXHJOnUHDOjJ9ahRdO79GYGBh6t9flf0/xdneDwgsxObtkzn8y0l695xB\nSMUgypUv4cCKXVu9xjWp17gmW9ftZuLAebz+wSA+fHcDw599G/8AP2rXrcKvPx8nJzuHxNMpFPDx\nZuK8fvxxKpmI3rMoGRxIxarBjj4Ml1YmuCTz5o+zvX7+hQ7Mnv0RLR99iAvpF3nyyeEULuTDo60a\n8uuR444rNB/ZDfs333yTmTNnMnjwYAAeeOABu1MeL1269LrLMzMzb6FE55eRcZn+QzpQ+L9DNIsW\nbKFM2eKkX7jE3j2/0uThewCoWq0slauU4ehvpxX2DvDHqWTOpZwnrHYFAJq0qce8N1Zx+eJlur3S\nFr/CV4dn1i7dTskygRQpVgiA/2tdF4Cg4ECq1grh6C+nFPYO9uvhYxw+fJx27ZvYllmtULCgNz16\ntGfEiKufoKOjv6Jc2SBHlZmv7J6CLly4MGPHjmXDhg1s2LCBUaNG4evrm6edr1y5khYtWtC
sWTOa\nNm1K27Ztb7tgZ7T6o6+YO2sjAGeTz7N21de0bHUfbmYTEyKW8tMPsQDEHj3NiWMJuhrHQc6dPc/0\nsR9y/lw6ADtjfqBshZJ8tnY3H83/9Oo6KRfYum43Dz1yLyVKBRASWoYd0Xtt7/3683EqVi3jsGOQ\nq0xubkyaNI/fT50BYMXyzYSGlmPHju8YFzEbq9VKRsYlFi1cR5u2jR1cbf6wewdtVFQUU6ZMsc2H\nY7VaMZlMHDp0yO7O27Zty4IFC5gzZw4tW7Zk8eLFzJ492+52d8MdtJHhS2yXXmZkXCZi1CJOnUwC\nKzzX8xFata0PwPd7jzBjahTZ2RY8PN3pO7Addes77wlwZ7+DNiZqFzGrv8bN7EbRwEK8MPRxChX2\n4Z0Jyznz+1msVisdnmlGo5ZXT8AnnUllwdQoEk6fxWqx0vqJRjTvcL+Dj+LW3U130K5f9znz568m\nJ8dCyZIBTJzUn+LFizIu4l327fuVnBwLnbs8wvPPd3B0qXfMje6gtRv2zZo1Y86cOVSpUuWmG37h\nhRdYsGABw4cP54033qB79+65DvH81d0Q9q7K2cPe1d1NYe+Kbmu6hBIlStxS0AP4+fmxdetWTCYT\nK1eu5Ny5c7e0HxERuT12e/aTJk0iISGBBx988JonVbVv397uztPT0zl58iQBAQEsXLiQJk2aUL9+\nfbvbqWfvvNSzd27q2Tu3G/Xs7V6Nk56ejo+PDz/99NM1y/MS9v379+eDDz4AYOTIkXbXFxERY9gN\ne3uXWd5IoUKF2Lp1KyEhIba5J+72m6pERP6N7IZ906ZNMZlM/1i+bZv9oZazZ8+yePFi22tXvalK\nRMTR7Ib9X6+eyc7O5rPPPsvzzVHPP/88TZr876aG6OjoWyhRRERu1y09qerxxx8nKioq1/c///xz\nfvjhBzZt2kSbNldnk7NYLGzbto3Nmzfb3b9O0DovnaB1bjpB69xu6wTt3r17bV9brVZ+++03rly5\ncsNtqlatyrlz5/Dy8rKN0ZtMJlq3bp3XmkVE5A6y27Pv3r37/1Y2mShSpAg9e/akZs2adnf+5922\nN0s9e+elnr1zU8/eud1Wz/7PMfv09HQsFguFChXKc8MNGza0fX3u3DmCg4PzNIwjIiJ3lt2wP3Xq\nFIMGDeLUqVNYrVZKlSrFtGnT8nQJ5c6dO21fx8fHM2vWrNurVkREbond6RIiIiLo2bMne/bs4dtv\nv6VXr165zld/I6VLlyYuLs7+iiIicsfZ7dmnpqbSsmVL2+tWrVoxZ86cPO188ODBtjH7xMREAgIC\nbrFMERG5HXbD3tPTk4MHD1K9enUADhw4QIECBfK08yeffNL2tZeXFzVq1LjFMkVE5HbYDfvw8HD6\n9euHv78/VquVtLQ0pk2blqedV6tWjfnz55OYmEiTJk3w9/enXLlyt120iIjcnDwN48TExHD8+HEs\nFgshISF4enrmaeejR4+mUaNG7N27l8DAQMLDw/nwww9vu2gREbk5dk/Qvvnmm3h4eFC5cmVCQ0Pz\nHPRw9XLLTp064e7uTp06dbBYLLdVrIiI3Bq7Pfvg4GBGjRpF7dq18fb2ti3PyxTHALGxV5+veubM\nGcxm8y2WKSIit8Nu2BcpUgSAffv2XbM8L2E/ZswYwsPDiY2NZcCAAYwbN+4WyxQRkdtxSxOh5dXa\ntWt57733bHPpmEymPE2NrOkSnJemS3Bumi7Bud3ydAnLly+nWLFiNG/enM6dO5OSkoLZbOb999+n\nbNmydhueP38+c+fOJSgo6OarFhGROybXE7TvvfceW7ZsoVKlSgBcvnyZJUuW8MwzzzB37tw87Tw4\nOJhy5crh6elp+yciIvkv15792rVrWbVqFT4+PgCYzWZKly5N165dadu2bZ527u3tTc+ePQkLC7Pd\nSTt48OA7ULaIiNyMXMPebDbbgh6gd+/eALi5ueW5h96
4cePbLE9ERO6EXMPeYrGQnp6Or68vAC1a\ntADgwoULed55hw4dbrM8ERG5E3Ids2/bti0jRowgPT3dtiwjI4PRo0fz2GOP5UtxIiJyZ+Qa9r16\n9aJo0aI0bNiQTp060blzZx566CECAgLo0aNHftYoIiK3ye519gkJCezfvx+A6tWrU6pUKcOL0nX2\nzkvX2Ts3XWfv3G7rsYQlSpSgefPmd7QgERHJX3YnQhMREeensBcRcQF2h3EcoYA50NElyC2qWVR3\nSTuzk+lHHF2C3IZyvrmP2atnLyLiAhT2IiIuQGEvIuICFPYiIi5AYS8i4gIU9iIiLkBhLyLiAhT2\nIiIuQGEvIuICFPYiIi5AYS8i4gIU9iIiLkBhLyLiAhT2IiIuQGEvIuICFPYiIi5AYS8i4gIU9iIi\nLkBhLyLiAhT2IiIuQGEvIuICFPYiIi5AYS8i4gIU9iIiLkBhLyLiAhT2IiIuQGEvIuICFPYiIi5A\nYS8i4gIU9iIiLkBhLyLiAtwdXcDdbtmHm1m+7FO8vD2pUKE0Y8b2xGx2Y+yYORyLO43FaqFdu8b0\nfLG9o0uVv/n11+NMnDif9AsXcXNzY/yE3tSoUcn2/uTXFnD8xB+8994YB1Ypf7Xuo51sXPUNmKBU\nmUAGjulEkaJ+ACSeOceA52Yyd8UQChfxAeCnvUeZN20DOTkWChUuyMtD21GxSilHHoJhFPYG2rPn\nAAveX8fylZMoWTKA9eu+JHLcexQvXoQSJQKYPmMIFy9epl3bIdx3XzXuubeKo0uW/7p06Qo9XxjP\nxEmv0LjxfWzbuodhQ6ex+dN3AdgcvZP163dQq7Z+Zv8WRw79zqqlO5i7YjA+fgWYN20Di+fEMDC8\nE59t/I4lc2M4m3Tetn7GhUtMGLaYsW88w731KnPyWCKRQxYyd+UQPD3vvmjUMI6BfjkYR4P7a1Ky\nZAAADzevxxeff8/QYd0ZNrw7AElJ58jMzMLXr6AjS5W/+frrHwkOLknjxvcB0LRZPaZPHwZAbOwp\n3n9/DX1e6eLIEuVvqoSVYeGakfj4FSDzShbJSWkUKlyQs0lp7PriABNnvnDN+vGnkvHx9ebeepUB\nKBtSnII+Xhzaf9wB1RtPYW+gmjUrsWfPAU7HJwGwZs0XZGVlk3YuHXd3MyOGz6T9Y0OoW68aISF3\n50dHZ3X82GkCi/kTPvodOj4+hOd7jCM7J4eMjEsMHzadya/3x8engKPLlL9x9zDz9ecH6Proq/z8\nQxwtHqtLQLHCjJv6HOUqlLxm3dJli3Hp4hW+++ZXAH49eJITsQmkJF9wROmGMzTs09PTmTZtGqNG\njWLLli2cOHHCyOb+de6rW40+fTrTv99UunQaiZvJROHCvnh4XP2IOOWN/uzctYC0tAzmzF7l4Grl\nr7Kzc/hyx/d0eaIFq6Peolu31rzU61VGDJ9Ot+6tqVKlnKNLlFw82KQGq7ZPoHuvRxjVdz4Wi+W6\n6/n4ejP+7R6sXLiNl598i62bvueeupVw9zDnc8X5w9CwHz16NMHBwZw4cYLAwEDCw8ONbO5fJyPj\nEvfVrcaqqCl8vOp1mj9SH4ADB2NJTEwBwMfHm1atH+SXX+IcWar8TbHiRalQoQy1/zsm3+zh+iQn\nn+Pbbw+weNF62rcbyDszV/D9d7/Q68UJDq5W4OqwzIEfj9let2hXj8Q/Ukk/f+m661ssFrwLejF1\nXh/mrhzCK8M7cPr3s5QKDsyvkvOVoWF/7tw5OnXqhLu7O3Xq1Mn1f9i7VWJiKj2ejSQ9/SIAc+es\nplXrB4nZ/A2z312F1WolMzOLmM3fUL9+DQdXK3/VqFEd4uMTOXDgKAB79x6kaNHCfLVzIWvXTWft\nuun06/8U/7mvGvPmRzi4WgFIST7Pa6M/JC01A4Dtm3+gfMWSFPL3ue76JpOJMf3f58gvpwD48rN9\nuLubqVA5KN9qzk+
Gn3KOjY0F4MyZM5jNd+fHo9yEhJTihRfb89QT4VgsFurUqUr42BfIzMxiQuR8\n2j82FJMJmjarS/dnWjm6XPmLYsWKMOvdUUwY/x6XLl3Bw9Odd94ZgZeXp6NLk1zUvLcCTz3fjKEv\nzcZsNhNQrBDj3nou1/VNJhOjJj3NtImfkJ2VQ9HAQkS+9Rwmkyn/is5HJqvVajVq50eOHGHs2LHE\nxsZSoUIFxo0bR/Xq1e1ul23ZZ1RJYjCzm8LQmZ1MP+roEuQ2lPNtm+t7hvbsT548yYoVK3Bz00U/\nIiKOZGgKf/PNN7Rr145p06Zx6tQpI5sSEZEbMHQYByAzM5Nt27YRFRVFVlYWixYtsruNhnGcl4Zx\nnJuGcZzbjYZxDB9f2b9/Pzt37uTs2bPcf//9RjcnIiLXYeiYfatWrahatSqdO3dm0qRJRjYlIiI3\nYGjYL1u2jCJFihjZhIiI5IEhYd+/f39mzpxJ27b/HD/auXOnEU2KiMgNGHqC9o8//iAo6H93o8XG\nxlKxYkW72+kErfPSCVrnphO0zi3fr7M/cuQICQkJTJ06leHDh2O1WrFYLLz11lusW7fOiCZFROQG\nDAn78+fPEx0dzdmzZ9m4cSNw9dbkrl27GtGciIjYYegwzsGDB/M0PcLfaRjHeWkYx7lpGMe55fsw\nzoQJE4iIiGDChAn/mFRo5cqVRjQpIiI3YEjPPjk5mcDAQOLj4//xXunSpe1ur56981LP3rmpZ+/c\n8v0O2sDAq5P/X7hwgcTERJKTkxk9ejQnT540ojkREbHD0OkSIiMj8fT0ZM6cOQwaNIhZs2YZ2ZyI\niOTC0LD39PSkcuXKZGVlcc8992iqYxERBzE0fU0mE8OHD6dRo0ZER0fj4eFhZHMiIpILQy+9TElJ\n4eeff6Zx48bs2bOH0NBQ/P397W6nE7TOSydonZtO0Do3hz2pytPTk927d7Ns2TLKly9PaGiokc2J\niEguDB3GGT16NKVKlWLQoEGULl2akSNHGtmciIjkwtCefWpqKt27dwcgLCyMmJgYI5sTEZFcGNqz\nv3LlCklJSQAkJSVhsViMbE5ERHJhaM9+4MCBPPXUU3h4eJCVlcWrr75qZHMiIpILQ3v26enpWCwW\nzGYzVquVnJwcI5sTEZFcGNqznz17Np988gkBAQEkJyfz8ssv89BDDxnZpIiIXIehPXt/f38CAgKA\nq/Pl+Pr6GtmciIjkwtCevY+PDy+88AJ169bl4MGDXL58mbfffhuAwYMHG9m0iIj8haFh//DDD9u+\nLlGihJGw9iF+AAAJoklEQVRNiYjIDRga9h06dDBy9yIikkeahlJExAUo7EVEXIDCXkTEBRg6xbGI\niPw7qGcvIuICFPYiIi5AYS8i4gIU9iIiLkBhLyLiAhT2IiIuQGEvIuICFPb/Ap999hkJCQkkJSUR\nGRnp6HIkD06fPs327dvzvH737t2JjY01sCKx569/X3v37uXw4cMA9O3b14FV5R+F/b/AkiVLSE9P\np1ixYgp7J7F7925++OEHR5chN+Gvf1+rV68mMTERgFmzZjmwqvxj6KyXd5uoqCh27NjB5cuXOXny\nJC+++CLVq1dn4sSJwNWHtbz22mv4+voyfvx4Dhw4QGBgIPHx8cyZM4eLFy/y+uuvk5OTQ2pqKpGR\nkZw/f55Dhw4xYsQI3nzzTUaMGMGECROYNGkSS5cuBeCll15iwIABpKenM23aNMxmM8HBwUyYMAEP\nDw9HfkucVl5/lr/88gsrV65k2rRpADz44IN8+eWXzJs3j8uXL3PvvfeyaNEiihYtSlpaGu+88w5j\nxozhwoULJCYm0rVrV7p27erIQ72rREVFsXXrVjIyMkhNTeWVV17B19eX6dOn4+XlZfu5ZWdnM3Dg\nQKxWK1euXGH8+PH4+fkxePBgIiIi+Oqrrzh48CCVKlWic+fObNiwgaeffpro6GhMJ
hMTJkzg/vvv\np2zZsv/4nfDz83Pwd+HWKOxvUnp6OgsWLOD48eO8/PLLFCpUiNdee41KlSrxySef8P7771OzZk3O\nnTvHqlWrSElJ4ZFHHgHg6NGjjBgxgtDQUDZs2EBUVBQTJ04kLCyMyMhIW3BXrVqVzMxM4uPj8fDw\nIDU1lbCwMFq2bMny5csJCAhg+vTprFmzhi5dujjy2+HU8vKzfOCBB/6xndlsplevXsTFxdGsWTMW\nLVpEmzZtaN68OQcPHqR169Y88sgjJCQk0L17d4X9HXbp0iUWLlxISkoKnTt3xmQysWLFCkqUKMHi\nxYuZM2cO9evXx9/fnzfeeIOjR49y8eJFW0jXqFGDhg0b0qpVK0qVKgVA0aJFCQ0N5bvvvqN27drs\n2bOH0aNH07Vr13/8TgwaNMiRh3/LFPY3qWrVqgAEBQWRmZlJbGws48ePByArK4vy5cvj4+PDPffc\nA1z9JapQoQIAxYsXZ/bs2Xh7e5ORkXHDxzR26tSJtWvX4unpyeOPP05KSgqJiYkMHDgQgMuXL183\niCTv8vKz/LvcppIKCQkBrj5+c/HixWzZsgVfX1+ys7ONKd6F1a1bFzc3NwIDAylYsCDZ2dm2hyPV\nrVuXt99+m2HDhnH8+HH69OmDu7s7vXv3trvfLl26sGbNGpKSkmjatCnu7u55+p1wFgr7m2Qyma55\nHRISwpQpUyhVqhTff/89SUlJeHl5sW7dOgDS0tI4fvw4AJMmTWLq1KlUrFiRmTNnEh8fb9vn30Ok\nVatWPPfcc7i5ubFgwQIKFixIyZIlmT17Nn5+fmzbto2CBQsaf8B3sbz+LJOSkgCIj48nLS0NADc3\nNywWyz/29cEHH3DPPffQtWtXdu/ezY4dO/LpaFzHwYMHAUhOTubSpUsAJCYmUrx4cb799lvKly/P\nnj17KF68OB988AE//vgjb7/9NpMnT7bt43p/c/fffz9vvvkmCQkJjBs3Drj+74SzUtjfpsjISEaM\nGEF2djYmk4lJkyZRvnx5vvzyS5588kkCAwPx9vbGw8ODxx57jAEDBlCoUCFKlixJamoqAPfeey/D\nhw/n1Vdfte3Xx8eHqlWrkp2dbfsEEB4eTq9evbBarfj4+PDGG2845JjvVtf7WQYHB+Pn50fnzp2p\nWLEiZcqUAaBKlSrMmTOH6tWrX7OPJk2aMHHiRKKjo/Hz88NsNpOZmemIw7lrJScn8+yzz3LhwgUi\nIyNxd3enX79+mEwmChcuzOTJkzGZTAwePJgVK1aQnZ3NK6+8cs0+ateuzdSpU20/T7j6H0CLFi3Y\ntWsXZcuWBa7/O+GsNMWxAWJjYzl8+DCtW7cmNTWVNm3a8Pnnn+Pp6eno0kScWlRUFHFxcQwdOtTR\npTgd9ewNEBQUxNSpU1m8eDE5OTkMHTpUQS8iDqWevYiIC9BNVSIiLkBhLyLiAhT2IiIuQGEvd53f\nf/+dGjVq0K5dO9q3b0/r1q3p0aMHZ86cuaX9RUVFMXLkSABefPFFEhIScl135syZfPfdd8DVS2V/\n/vnnW2pT5E5T2MtdqXjx4qxbt461a9eyadMmatSocc19DLdq/vz5trs1r2fv3r3k5OQAV2+iq1mz\n5m23KXIn6NJLcQn33Xcf27dvp2nTptSqVYtDhw6xfPlyvvrqKxYvXozFYqF69eqMGzcOLy8v1q5d\ny5w5c/D19aV06dK2u5WbNm3KkiVLKFasGOPHj+f777/Hw8ODPn36kJmZyYEDBxgzZgyzZs1i4sSJ\n9O3bl/r16zN37lzWr1+P2WzmwQcfZNiwYfzxxx/07duXypUrc+jQIQICApgxYwb+/v4O/m7J3Ug9\ne7nrZWVlsXnzZurUqQNAo0aNiImJISUlhY8//piVK1eybt06AgICWLBgAQkJCUydOpVly5bx0Ucf\nkZGR8Y99Ll26lIsXL7J582YWLlzIu+++S6tWr
ahRowYTJ04kNDTUtu6OHTvYvn07UVFRrFmzhhMn\nTrBy5UoADh8+TI8ePdi4cSOFChViw4YN+fNNEZejnr3clRITE2nXrh0AmZmZ1KpViyFDhvD1119T\nu3ZtAPbs2cOJEydsM4dmZWVRrVo1fvzxR+69914CAwMBaNu2Lbt3775m/3v37qVLly64ublRrFgx\nNm3alGstu3fvpnXr1nh7ewPQsWNH1q5dS+PGjQkICKBatWoAVK5c2Tb3jsidprCXu9KfY/bX4+Xl\nBUBOTg6PPvooY8aMASAjI4OcnBy++eabayY5c3f/55/J35edOHGCoKCg67b313396c/ZMP+sBa4/\nOZfInaJhHHFZ9evX57PPPuPs2bNYrVYiIyNZvHgx//nPf9i3bx8JCQlYLBaio6P/sW3dunXZvHkz\nVquVs2fP0q1bNzIzMzGbzbYTtH9q0KABmzZt4vLly2RnZ7N69WoaNGiQX4cpAqhnLy6satWq9O3b\nl2effRaLxUJYWBi9evXCy8uLMWPG8Nxzz1GgQAEqVar0j227du3KxIkTeeyxxwAYO3Ysvr6+NGzY\nkHHjxjFlyhTbuk2aNOHQoUN07NiR7OxsGjZsSLdu3W75UlCRW6G5cUREXICGcUREXIDCXkTEBSjs\nRURcgMJeRMQFKOxFRFyAwl5ExAUo7EVEXMD/AylL3LxQ1ryVAAAAAElFTkSuQmCC\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "test_model(lr, X_test, y_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Challenge\n", "\n", "Use the `fit_random_forest` function below to train a random forest classifier on the training set and test the model on the test set. Which performs better?" 
] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "def fit_random_forest(X_train, y_train):\n", " model = RandomForestClassifier()\n", " model.fit(X_train, y_train)\n", " return model" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Accuracy: 0.7414617486338798\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXsAAAEFCAYAAAACFke6AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XmcjeX/x/HXmR0zDDNoMHbGWLIUItJoUfY9ibJlKcKE\n0QxjnUpkZ0iyLyljyZJC8VURpUT60lizzGLGMMx+zu8Pv843ZcyEe47jvJ+PR4+Huc+57+tzz5l5\nz9V1X/d1mywWiwUREXmgOdm6ABERMZ7CXkTEASjsRUQcgMJeRMQBKOxFRByAi60LuJV8pV+0dQly\nh66eGmnrEuQuuDjls3UJclcqZ/uKevYiIg5AYS8i4gAU9iIiDkBhLyLiABT2IiIOQGEvIuIAFPYi\nIg5AYS8i4gAU9iIiDkBhLyLiABT2IiIOQGEvIuIAFPYiIg5AYS8i4gAU9iIiDkBhLyLiABT2IiIO\nQGEvIuIAFPYiIg5AYS8i4gAU9iIiDkBhLyLiABT2IiIOQGEvIuIAFPYiIg5AYS8i4gAU9iIiDkBh\nLyLiABT2IiIOQGEvIuIAFPYiIg7AxdYFPEg+eL8/v/73LNM/2AxA3+7P0KNLEPk83Dj4y0n6j5hP\nenomNQJLMyOiFwW98nP1agpjp6xh17dHbnssyXvbv/yeObPXYHIyUbCgJ+Mn9KN06YcAuHAhnq5d\nwohaP5nChQvauFK5leXLN7Fq1RZMJhP+/n5MnDiQAgXyMW7cPA4fPo7ZbObhhwMYM6Y/Hh7uti7X\ncOrZ3wMBFUuwddUoOrR8zLqtzXN1GdCjGS26RlDn6eHk83DljT7NAfjkw2EsWvUVjz4zgi79pjIz\nohfFixbK9liS91JT0xkZMovpM4cRtW4yQUGP8E7EIgA2rN/Fy93GEBubaOMqJTuHD//ORx+tY/Xq\nyWzaNIeyZf2YMWM5kZFryMrKYsOGmWzcOIu0tHTmz//E1uXmCfXs74H+Lz/L0k++5uz5eOu2lzo0\nZsaCzSQmXQNgUOhC3Fxd8CnsRakSPqxYuxuAmLgkDv92hmea1GT5p7tveSzJe1lZZiwWC8nJ1wG4\nfj0VN3dXYmMT2LljP5Hz36JNq2AbVynZqV69Itu2zcfV1YW0tHRiYhIoVao4detWo2TJ4jg53ejn\nBgaW5/ffz9i42ryhsL8HhoYvBiDo8erWbRXL+1H0p2g2LB2JX/HCfPP9b4S9vZLrKWmcOhtLt45N\nWLrma8qWLkbDulU4ePhUtseSvFeggAfhY17lpRdH4e3thdlsZtmKCRQrVoQZs4bZujzJBVdXF7Zv\n/46wsFm4ubnyxhsvUbZsCevr587FsmTJRiZMeN2GVeYdQ4dxsrKy+OSTT5gxYwb79u0jISHByObu\nK64uzjzVuAbdXpvB4y1DKeLtybgRLwDQsfcU2jWvx/4vJhEe3InPdx4kIz3TxhXLXx07dobIyE/Z\nuGkqX++eT99+7Rgy+H0sFoutS5N/4emnG7Bv30oGDepK797hmM1m4MYwz0svhdCtWwuCgurZ
uMq8\nYWjYh4eHc/78eb799luuXbtGSEiIkc3dVy7EXGbjtv1cTU4hIyOLVev2UL9OJQCcTCY69p5C3WdD\n6DVkDn7FCxN96qKNK5a/+mbPT9SuHWC9IPti1+f4/fgZLl++auPKJDdOnz7PgQP/m/TQocPTnD8f\nR1JSMps376ZXr9G8+eYr9O/f2YZV5i1Dw/7MmTMMHjwYd3d3mjZtytWrjvOLsm7LPtq3eAwPd1cA\nWjV7lB9+jgZgzruv0rrZowA89kglqgX4s3PPYZvVKv8UWLU8B/YfJT7+MgA7dnxPyVLFNPPGTsTF\nJRIcPJmEhCQAPvtsF5UqlWbfvl+YOPEDFi4cT6tWT9q2yDxm6Jh9VlaWdegmOTnZelHEEcxf+gWF\nvT35dsvbODs58dPhUwycsByA10cuIPK9voQO6cC1a6l0fvV9rqek2bhi+avHHqtOz16t6PnKOFxc\nXShUyJPZs0fYuizJpUcfrUb//p15+eVQnJ2dKVasCHPmhNG7dzgWi4VRo2ZZ31unTiBjxgywYbV5\nw2QxcBBy//79jBo1iri4OPz8/AgLC6Nhw4Y57pev9ItGlSQGu3pqpK1LkLvg4pTP1iXIXamc7SuG\n9uy9vLzYtm0bCQkJFC5cGJPJZGRzIiKSDUPHVaZPn06XLl3Yvn07KSkpRjYlIiK3YegwDkBcXBwb\nNmxg+/btVKhQgYiIiBz30TCO/dIwjn3TMI69y34Yx/ArppmZmaSnp2M2m3F2dja6ORERuQVDx+xf\nfvll0tPT6dixI4sXLyZ//vxGNiciItkwNOzDwsIICAgwsgkREckFQ8J+/PjxhIeHEx4ebp2BY7FY\nMJlMrF692ogmRUTkNgwJ+9deew2ASZMm4erqat2elJRkRHMiIpIDQy7QWiwWTp48yYgRI8jIyCA9\nPZ3U1FTCw8ONaE5ERHJgSM/+559/ZsmSJZw8eZLRo0cD4OTkRKNGjYxoTkREcmDoPPtdu3bRpEmT\nf72f5tnbL82zt2+aZ2/vbLRcQqFChQgPDycjIwOA2NhYFi5caGSTIiJyC4beVDV27Fjq1atHcnIy\nJUqUwNvb28jmREQkG4aGfeHChWnZsiWenp4MGjSImJgYI5sTEZFsGBr2Tk5OHD9+nJSUFE6cOKGp\nlyIiNmJo2I8cOZLjx4/TvXt3hg0bRocOHYxsTkREsmHobJzz58/f9LWLiwuFCxe+6UarW9FsHPul\n2Tj2TbNx7J2NZuP069ePmJgYypUrx6lTp8iXLx+ZmZkMHz6cNm3aGNm0iIj8haHDOKVKleLzzz/n\n448/5osvvqBGjRps2rSJ5cuXG9msiIj8jaFhf+nSJYoUKQLcmHMfHx+Pt7e3Qz14XETkfmDoME61\natUIDg6mVq1a/PTTTwQGBrJlyxZ8fHyMbFZERP7G8McS7tixgxMnTlC5cmWaNGnCiRMn8PPzI1++\n7C8E6QKt/dIFWvumC7T2zkaPJUxOTubQoUOcOHGCtLQ0Tp8+Tfny5W8b9CIicu8ZGvahoaH4+/tz\n+vRpfH19CQsLM7I5ERHJhqFhf/nyZTp27IiLiwt16tTBbDYb2ZyIiGTD8Gkx0dHRAFy8eBFnZ2ej\nmxMRkVsw9ALtsWPHCA8PJzo6mvLlyzNmzBiqVq2a4366QGu/dIHWvukCrb2z0QXaX3/9laSkJLy8\nvIiLi2PQoEFGNiciItkwdJ79ggULmDdvHn5+fkY2IyIiOTA07P39/SlTpoyRTYiISC4YGvYeHh70\n6dOHwMBATCYTAMHBwUY2KSIit2Bo2N/Jw8YBDv3y0j2uRPJKYvppW5cgd8HXo5KtS5C7YLrNa4aG\nfbt27Yw8vIiI5JKWnxQRcQAKexERB6CwFxFxAAp7EREHoLAXEXEACnsREQeQ49TLjIwMvv32WxIT\nE2/a3rZtW8OKEhGReyvHsB88eDBxcXFUqFDBehcsKOxF
ROxJjmF/4sQJPv/887yoRUREDJLjmH3p\n0qU5f/58XtQiIiIGybZn3717d0wmEwkJCbRq1YoqVarc9KSppUuX5kmBIiJy97INez1oRETkwZFt\n2NerVw+ACRMmMHr06JteCwkJsb4uIiL3v2zDPiwsjLNnz3L48GGOHz9u3Z6VlcWVK1fypDgREbk3\nsg37AQMGcO7cOSIiIhg4cKB1u7OzMxUqVMiT4kRE5N4wWSwWy+3ekN1MnBIlShhSEMDxpE2GHVuM\n5e1utnUJchf08BL7ZiIw29dynGffrVs3TCYTFouFzMxM4uPjCQwMZO3atfe0SBERMU6OYb9z586b\nvj506BArVqwwrCAREbn3/vVCaA8//DBHjhwxohYRETFIjj372bNn3/T177//jo+Pj2EFiYjIvfev\nHzhet25dWrRoYUQtIiJikBzD/ty5c7zzzjt5UYuIiBgkxzH7Y8eOce3atbyoRUREDJJjz95kMhEU\nFES5cuVwd3e3btdCaCIi9iPHsB8xYkRe1CEiIgbKMey3bdumhdBEROycFkITEXEAWghNRMQB5LgQ\nmi1oITT7pYXQ7JsWQrNvt1sI7V8vlyAiIvZHYS8i4gCyHbPPbh37Pxm5nr29+2zNHrau/RZMJvxK\n+TAotBPeRbysr0eMWEyRogUZMLw9AJfikpg+fjWJl65isVjo+HJTgp5/xFblO7Rtm35g5ZJdmEzg\n4eHGkJA2VKnmT68u00lPy8DF1RmAZ5vXoWuPJ0m+mkLrpuMpXa6o9RhvDGtNnXoVbXUK8jfbt+8l\nZMQMfvhxFQDt2weTlpqOq+uN+GvVqgm9+7SzZYl5Ituw/3Md+7S0NC5duoS/vz9OTk6cOXMGf39/\ntm3blpd12o3fj55l3YqvmbXiTQp45mPhjI0sn/85A9/qBMCnS3dy5KcTNH6mlnWfpXO3EFC9DN36\nPUd8bBIDOk+iVt1KFPYtaKvTcEhnTsUyd9pmFq4egm/Rgnz3n6OEBi9lxbrhnP/jEpu+GmsN+z8d\nOXSGmnXKMW1+XxtVLbdz6tR53pu0mD8vTV6/nsrZMxf59rul1rB3FNme7Z/r2A8dOpSXXnqJRx99\nFLixnv2HH36YN9XZoYqB/nyw9i1cXJxJT8vgUlwSxUvcWCX00IHf+fG7//J8+wYkX02x7mM2W7iW\nnIrFYiEtNR0nZydMTiZbnYLDcnV1IWRMJ3yL3vgjW6WqPwnxVzl08CT58rkzfOBCLsVf5dHHKtFv\n0PO4e7hy+OdTXLmSwoBX5pCakk7rjvVp17mhjc9EAFJS0hgxfBojR/Zi2LCpABw6dJz8+fPRr98E\n4uISadigJkODu+Hh4Z7D0exfjn/aoqOjrUEPN9azP3ny5G33ud3r5cqV+xfl2ScXF2e++/oXZkWs\nwcXNhZf6PseluCQ+mLqe8TP7sjXqu5ve/8przQnpO4dvdvxMUmIyvYe0vmnYR/KGX8ki+JUsAoDF\nYmHWlI00erIqGRmZ1KlbgeDQdri4OjP+rZXMm7mFwSPa4OzsxONNAnnl1adJiL/KoD7z8PEtyBNN\nq9v4bGRM+FxeeKEZlQPKWLddv5ZC/frVCR/TD1dXF4YNm8rU95cRGtbHhpXmjRzD/qGHHmLGjBk0\nb94cs9nMxo0bKVu27G33CQ8Pv+V2k8nkMGvqNHiyBg2erMHn6/cSPugDfIt78+rQNhS5xdDMlPAV\ndOgeRPOODTl3Jo63BswloHoZAqqVtkHlknI9nYjwj4m9eJn35/bBq2A+Gj1Zzfp69z5PERa8hMEj\n2tCj3zPW7UWLF6JNx8fYvfOwwt7GVq7YgrOLMx06Ps0ff8RYtzd9qh5Nn/rf3f/9+nXkjUGTFPYA\nkydPZubMmQQHBwPQsGHDHJc8XrZs2S23p6en30GJ9uX82XgSL12hWq3yADzTqh5z3/2UK0nX+HD6\nRgASL13FbDaTkZbB
KwNb8OvPJ4mY0x+AkqWLUrteZY4cjFbY28DFC4mEvLGIsuWKMevD/rh7uLLn\n61/x9PKg1iM3PlMsFlxcbozdf7pyD42CqvGQX+EbL/G/18R21q37itTUNNq2GUJGRiapqem0bTOE\nV3q0plSp4tSte+OPt8WCw3xeOYZ9oUKF/rE2Tm6tXr2aRYsWkZmZicViwdXV9YG/sJsQf4XJo5cz\nc3kwhbw9+frzHyld/iFmrxxmfc+KD7ZxJekaA4a3x2Kx4FPMmz07D9Hk2dokXU7m8METPNO6vg3P\nwjFdSbrOoF6RPN/mUXr1f9a6PS72Mks+OMDsjwbg4urM6mW7adqsJgCHDp7i7Ol4hr7VlitJ19m8\nbj9DQtrY6hTk/33y6WTrv//4I4bWrQazfsN0Vq3cynuTFrNs+URcXV1YvHgDzzdvZMNK806OYR8V\nFcWkSZOs6+FYLBZMJhNHjx7N8eArVqxg2bJlREZG8txzz7FkyZK7r/g+V712eV7o8TRv9Y/E2dmJ\nIkULMmpyz2zfbzKZGD2lF/OnrOPjhV9icjLRqcdTVK9dPg+rFoB1a74j5uJldu88zO6dh63bZ3zQ\nj/N/JNCry3SysszUrluBnv8/fDP0rbZMnrCWbu2mkJmZRYcuj1O3QWVbnYLk4IUuzTh7Nob27d4k\nKyuLevWr8/rrL9i6rDyR43IJTz31FJGRkVSu/O9/gHv37s3ChQsZMWIE7733Ht27d892iOevtFyC\n/dJyCfZNyyXYt7taLqF48eJ3FPQAXl5ebN++HZPJxOrVq7l8+fIdHUdERO5Ojj37iIgIYmJiePzx\nx296UlXbtm1zPHhycjJnzpzBx8eHRYsWERQURP36OY9Fq2dvv9Szt2/q2du32/XscxyzT05OpkCB\nAvz00083bc9N2L/xxht89NFHAIwcOTLH94uIiDFyDPucplneTsGCBdm+fTvlypXDyenGiJEj3FQl\nInK/yTHsmzZtisn0z1v3d+zYkePBL126dNMMHEe6qUpE5H6SY9j/dfZMZmYmX375Za5vjurVqxdB\nQUHWr7ds2XIHJYqIyN26oydVtW/fnqioqGxf/+qrr/jxxx/ZvHkzLVu2BMBsNrNjxw62bt2a4/F1\ngdZ+6QKtfdMFWvt2Vxdo9+/fb/23xWLh+PHjpKWl3XafKlWqcPnyZdzd3a1j9CaTiRYtWuS2ZhER\nuYdy7Nl37979f282mShcuDB9+vShRo0aOR78z7tt/y317O2Xevb2TT17+3a7nn2uh3GSk5Mxm80U\nLJj7B2o0avS/NScuX76Mv7+/hnEecAp7+6awt293NYxz9uxZhg4dytmzZ7FYLJQoUYJp06blagrl\nnj17rP8+d+4cs2fPzmXJIiJyL+W4XEJ4eDh9+vRh3759fP/99/Tt2zfb9epvp2TJkpw4ceKOihQR\nkbuTY88+MTGR5557zvp18+bNiYyMzNXBg4ODrWP2sbGx+Pj43GGZIiJyN3IMezc3N44cOUK1ajcW\n+z98+DD58uXL1cG7dOli/be7uzvVq+vpPSIitpBj2IeFhTFo0CC8vb2xWCwkJSUxbdq0XB28atWq\nLFiwgNjYWIKCgvD29qZMmTI57ygiIvdUroZxtm3bxqlTpzCbzZQrVw43N7dcHTw0NJQnnniC/fv3\n4+vrS1hYGMuXL7/rokVE5N/J8QLt5MmTcXV1pVKlSgQEBOQ66OHGdMuOHTvi4uJCnTp1MJs1LU9E\nxBZy7Nn7+/vz1ltvUbNmTTw8PKzbc7PEMUB0dDQAFy9exNnZMR7sKyJyv8kx7AsXLgzAzz//fNP2\n3IT9qFGjCAsLIzo6msGDBzNmzJg7LFNERO7GHS2Ellvr169n/vz51rV0TCZTrpZG1h209kt30No3\n3UFr3+74DtqVK1dStGhRnnnmGTp16kRCQgLOzs58+OGHlC5dOseGFyxYwLx58/Dz8/
v3VYuIyD2T\n7QXa+fPn88UXX1CxYkUAUlNTWbp0KS+//DLz5s3L1cH9/f0pU6YMbm5u1v9ERCTvZduzX79+PZ9+\n+ikFChQAwNnZmZIlS9K1a1datWqVq4N7eHjQp08fAgMDrXfSBgcH34OyRUTk38g27J2dna1BDzBg\nwAAAnJycct1Db9KkyV2WJyIi90K2YW82m0lOTsbT0xOAZs2aAXD16tVcH7xdu3Z3WZ6IiNwL2Y7Z\nt2rVipCQEJKTk63brl27RmhoKK1bt86T4kRE5N7INuz79u1LkSJFaNy4MR07dqRTp040atQIHx8f\nevbsmZc1iojIXcpxnn1MTAyHDh0CoFq1apQoUcLwojTP3n5pnr190zx7+3ZPHkuYlxT29kthb98U\n9vbtdmGf40JoIiJi/xT2IiIOIMeF0Gzhofy5exKW3H/yuxS3dQlyF5LST9q6BLkL3m4axhERcWgK\nexERB6CwFxFxAAp7EREHoLAXEXEACnsREQegsBcRcQAKexERB6CwFxFxAAp7EREHoLAXEXEACnsR\nEQegsBcRcQAKexERB6CwFxFxAAp7EREHoLAXEXEACnsREQegsBcRcQAKexERB6CwFxFxAAp7EREH\noLAXEXEACnsREQegsBcRcQAKexERB6CwFxFxAAp7EREHoLAXEXEACnsREQegsBcRcQAuti7gQWSx\nWBg3ahkVKvrRveczZGWZeS/iY348cByAxxtXY/Cw9phMJus+5/6Ip3vnd5n9wSCqVi9jq9LlL479\n9zQRExdyNfk6zk5OjB3Xj/Xrv+bAgV+t74mNSaBoUW/Wb5xmw0oFYOtnB1i++CtMJvDwcOPNt9oR\nEFiKOdM38c3uX3EyOeFfxpeR4Z0pXMST5KspPP9kOGXKFbMeY8iItjxar5INz8I4Cvt77GT0BSZF\nfMwvh05SoWJLALZ8to/Tp2JYvW4UFrOFXt0ms+OLgzzdrA4AaWkZjB65mIyMLFuWLn+RkpJGnz7j\nmTDxNZo0eYQdO75nxPDpbN46y/qec3/E0r3bKN6Z9IYNKxWA0ydjmTV1I0vXvIlv0UJ8s/tXQoYs\nonf/Zvz26x8sXTMMNzcXZk3dyIwpGxj79kscPnSaWo+UZ9YHA2xdfp5Q2N9ja1bvplXbBjzkV8S6\nLSvLTEpKOhnpmZgtZjIysnBz/9+3ftLE1bRq+xgfffC5LUqWW/jmm58o7f8QTZo8AkDTpnUpVarY\nTe8JD4/klR6tCAwsZ4sS5S9c3VwIHfcCvkULARBYzZ9L8VfxL+3LoOBWuLnd+H0LrOrPp6u/AeDQ\nTye5knSdV1+eSWpKOm07NqDDC4/b7ByMprC/x0LCXgBg/77/Wre1atuAHV8c5Pmn3iIr00z9hoE8\n8eTDAKz/9BsyM7No17GRwv4+cvrUBXx9vRkVNof//nYar4L5GTbsZevru3f/yMUL8XTr3tyGVcqf\nSpQsQomSNzpYFouFGZM30DioGnXqVrS+50rSdRbO/4L2nRoC4OzsTOMnq9Gz77Ncir/Ca73n4utb\nkCZP1bDJORjN0LBPTk5mwYIFxMbGEhQUREBAAGXKON549ILIzXgX9uSLXZNIS83gzTfmsXzxdh6t\nV5m1a/7DgiXBti5R/iYjM5Pdu39k0ZJx1KxZmR07vqdfvwh27JyHm5srSxdvos+r7XB2drZ1qfIX\nKdfTGD9qFTExl5kR2c+6/Y+z8Yx44yNq1i5HxxcbAdC7/7PW14sV96ZdpwZ8vfPQAxv2hs7GCQ0N\nxd/fn9OnT+Pr60tYWJiRzd23dm7/iTbtG+Dq6oKnVz5atnmMA98fY/PGfVy7lkqvblPo2uFt4mKT\nGDVyEbu+OmTrkh1esaJFKFe+JDVrVgbgqafqYc4yc/ZsDAkJSRw6dJxmzzW0cZXyVxcvJNKn+0yc\nnJ2Yu/A1vArmA+DA98fp020GzdvUZWR4Z+vEiD
UrdnPxQqJ1f4sFXFwe3D/ehob95cuX6dixIy4u\nLtSpUwez2Wxkc/etKoH+fPn5jwBkZmSx+6tDVK9ZjjdHdiJq81hWrg1l5dpQihYrxMR3e9Ik6GEb\nVyyNn6jNuXNxHDkcDcCB/UcwmaBUqWL8+ONvVK9Rgfz5PWxcpfwpKeka/XvOJujpGkRMfhkPDzfg\nxrh8yJCPGBPRlW49gm7a5+eDJ1m+aKd1/8+i9vJ0s9p5XnteMXzMPjr6xi/LxYsXHfZ/eYNDOjL5\n7TV0aDUOZycn6tYPoEevZ3PeUWymaNHCzJ49gvHjF5CSkoqbqyszZo3A3d2N06cvULJksZwPInkm\n6uNvibmQyNc7fuHrHb9Ytxcu7InFAnOmb2LO9E0AlCjpw3szejEstAPvjl9Dl7bvkplpptOLjajf\nMMBWp2A4k8VisRh18GPHjjF69Giio6MpX748Y8aMoVq1ajnudzVjh1ElicHyuxS3dQlyF65mnLF1\nCXIXvN2ynzBgaM/+zJkzrFq1Cicn3agrImJLhqbwd999R5s2bZg2bRpnz541sikREbkNQ4dxANLT\n09mxYwdRUVFkZGSwePHiHPfRMI790jCOfdMwjn273TCO4eMrhw4dYs+ePVy6dIkGDRoY3ZyIiNyC\noWP2zZs3p0qVKnTq1ImIiAgjmxIRkdswNOxXrFhB4cKFjWxCRERywZCwf+ONN5g5cyatWrX6x2t7\n9uwxokkREbkNQy/QXrhwAT8/P+vX0dHRVKhQIcf9dIHWfukCrX3TBVr7lufz7I8dO0ZMTAxTpkxh\nxIgRWCwWzGYz77//Phs2bDCiSRERuQ1Dwv7KlSts2bKFS5cusWnTjVuUTSYTXbt2NaI5ERHJgaHD\nOEeOHMnV8gh/p2Ec+6VhHPumYRz7lufDOOPHjyc8PJzx48ff9JxVgNWrVxvRpIiI3IYhPfv4+Hh8\nfX05d+7cP14rWbJkjvurZ2+/1LO3b+rZ27c8v4PW19cXgKtXrxIbG0t8fDyhoaGcOaMfJBERWzB0\nuYSxY8fi5uZGZGQkQ4cOZfbs2UY2JyIi2TA07N3c3KhUqRIZGRnUqlVLSx2LiNiIoelrMpkYMWIE\nTzzxBFu2bMHV1dXI5kREJBuGTr1MSEjgl19+oUmTJuzbt4+AgAC8vb1z3E8XaO2XLtDaN12gtW82\ne1KVm5sbe/fuZcWKFZQtW5aAgAf3+Y4iIvczQ4dxQkNDKVGiBEOHDqVkyZKMHDnSyOZERCQbhvbs\nExMT6d69OwCBgYFs27bNyOZERCQbhvbs09LSiIuLAyAuLg6z2WxkcyIikg1De/ZDhgzhxRdfxNXV\nlYyMDCZMmGBkcyIikg1De/bJycmYzWacnZ2xWCxkZWUZ2ZyIiGTD0J793Llz+eSTT/Dx8SE+Pp7+\n/fvTqFEjI5sUEZFbMLRn7+3tjY+PD3BjvRxPT08jmxMRkWwY2rMvUKAAvXv3pm7duhw5coTU1FSm\nTp0KQHBwsJFNi4jIXxga9k8//bT138WL685KERFbMTTs27VrZ+ThRUQkl7QMpYiIA1DYi4g4AIW9\niIgDMHSJYxERuT+oZy8i4gAU9iIiDkBhLyLiABT2IiIOQGEvIuIAFPYiIg5AYS8i4gAU9veBL7/8\nkpiYGOLi4hg7dqyty5FcOH/+PDt37sz1+7t37050dLSBFUlO/vr7tX//fn777TcABg4caMOq8o7C\n/j6wdOnMI+nNAAAJDUlEQVRSkpOTKVq0qMLeTuzdu5cff/zR1mXIv/DX36+1a9cSGxsLwOzZs21Y\nVd4xdNXLB01UVBS7du0iNTWVM2fO8Oqrr1KtWjUmTpwI3HhYy9tvv42npyfjxo3j8OHD+Pr6cu7c\nOSIjI7l+/TrvvvsuWVlZJCYmMnbsWK5cucLRo0cJCQlh8uTJhISEMH78eCIiIli2bBkA/fr1Y/Dg\nwSQnJzNt2j
ScnZ3x9/dn/PjxuLq62vJbYrdy+1n++uuvrF69mmnTpgHw+OOPs3v3bj744ANSU1Op\nXbs2ixcvpkiRIiQlJTFr1ixGjRrF1atXiY2NpWvXrnTt2tWWp/pAiYqKYvv27Vy7do3ExERef/11\nPD09mT59Ou7u7tbPLTMzkyFDhmCxWEhLS2PcuHF4eXkRHBxMeHg4//nPfzhy5AgVK1akU6dOfPbZ\nZ7z00kts2bIFk8nE+PHjadCgAaVLl/7Hz4SXl5eNvwt3RmH/LyUnJ7Nw4UJOnTpF//79KViwIG+/\n/TYVK1bkk08+4cMPP6RGjRpcvnyZTz/9lISEBJ599lkAfv/9d0JCQggICOCzzz4jKiqKiRMnEhgY\nyNixY63BXaVKFdLT0zl37hyurq4kJiYSGBjIc889x8qVK/Hx8WH69OmsW7eOzp072/LbYddy81k2\nbNjwH/s5OzvTt29fTpw4wVNPPcXixYtp2bIlzzzzDEeOHKFFixY8++yzxMTE0L17d4X9PZaSksKi\nRYtISEigU6dOmEwmVq1aRfHixVmyZAmRkZHUr18fb29v3nvvPX7//XeuX79uDenq1avTuHFjmjdv\nTokSJQAoUqQIAQEBHDhwgJo1a7Jv3z5CQ0Pp2rXrP34mhg4dasvTv2MK+3+pSpUqAPj5+ZGenk50\ndDTjxo0DICMjg7Jly1KgQAFq1aoF3PghKl++PADFihVj7ty5eHh4cO3atds+prFjx46sX78eNzc3\n2rdvT0JCArGxsQwZMgSA1NTUWwaR5F5uPsu/y24pqXLlygE3Hr+5ZMkSvvjiCzw9PcnMzDSmeAdW\nt25dnJyc8PX1JX/+/GRmZlofjlS3bl2mTp3K8OHDOXXqFK+99houLi4MGDAgx+N27tyZdevWERcX\nR9OmTXFxccnVz4S9UNj/SyaT6aavy5Urx6RJkyhRogQ//PADcXFxuLu7s2HDBgCSkpI4deoUABER\nEUyZMoUKFSowc+ZMzp07Zz3m30OkefPm9OjRAycnJxYuXEj+/Pl56KGHmDt3Ll5eXuzYsYP8+fMb\nf8IPsNx+lnFxcQCcO3eOpKQkAJycnDCbzf841kcffUStWrXo2rUre/fuZdeuXXl0No7jyJEjAMTH\nx5OSkgJAbGwsxYoV4/vvv6ds2bLs27ePYsWK8dFHH3Hw4EGmTp3KO++8Yz3GrX7nGjRowOTJk4mJ\niWHMmDHArX8m7JXC/i6NHTuWkJAQMjMzMZlMREREULZsWXbv3k2XLl3w9fXFw8MDV1dXWrduzeDB\ngylYsCAPPfQQiYmJANSuXZsRI0YwYcIE63ELFChAlSpVyMzMtP4fQFhYGH379sVisVCgQAHee+89\nm5zzg+pWn6W/vz9eXl506tSJChUqUKpUKQAqV65MZGQk1apVu+kYQUFBTJw4kS1btuDl5YWzszPp\n6em2OJ0HVnx8PK+88gpXr15l7NixuLi4MGjQIEwmE4UKFeKdd97BZDIRHBzMqlWryMzM5PXXX7/p\nGDVr1mTKlCnWzxNu/AFo1qwZ3377LaVLlwZu/TNhr7TEsQGio6P57bffaNGiBYmJibRs2ZKvvvoK\nNzc3W5cmYteioqI4ceIEw4YNs3Updkc9ewP4+fkxZcoUlixZQlZWFsOGDVPQi4hNqWcvIuIAdFOV\niIgDUNiLiDgAhb2IiANQ2MsD548//qB69eq0adOGtm3b0qJFC3r27MnFixfv6HhRUVGMHDkSgFdf\nfZWYmJhs3ztz5kwOHDgA3Jgq+8svv9xRmyL3msJeHkjFihVjw4YNrF+/ns2bN1O9evWb7mO4UwsW\nLLDerXkr+/fvJysrC7hxE12NGjXuuk2Re0FTL8UhPProo+zcuZOmTZvy8MMPc/ToUVauXMl//vMf\nlixZgtlsplq1aowZMwZ3d3fWr19PZGQknp6elCxZ0nq3ctOmTVm6dClFixZl
3Lhx/PDDD7i6uvLa\na6+Rnp7O4cOHGTVqFLNnz2bixIkMHDiQ+vXrM2/ePDZu3IizszOPP/44w4cP58KFCwwcOJBKlSpx\n9OhRfHx8mDFjBt7e3jb+bsmDSD17eeBlZGSwdetW6tSpA8ATTzzBtm3bSEhIYM2aNaxevZoNGzbg\n4+PDwoULiYmJYcqUKaxYsYKPP/6Ya9eu/eOYy5Yt4/r162zdupVFixYxZ84cmjdvTvXq1Zk4cSIB\nAQHW9+7atYudO3cSFRXFunXrOH36NKtXrwbgt99+o2fPnmzatImCBQvy2Wef5c03RRyOevbyQIqN\njaVNmzYApKen8/DDD/Pmm2/yzTffULNmTQD27dvH6dOnrSuHZmRkULVqVQ4ePEjt2rXx9fUFoFWr\nVuzdu/em4+/fv5/OnTvj5ORE0aJF2bx5c7a17N27lxYtWuDh4QFAhw4dWL9+PU2aNMHHx4eqVasC\nUKlSJevaOyL3msJeHkh/jtnfiru7OwBZWVk8//zzjBo1CoBr166RlZXFd999d9MiZy4u//w1+fu2\n06dP4+fnd8v2/nqsP/25GuaftcCtF+cSuVc0jCMOq379+nz55ZdcunQJi8XC2LFjWbJkCY888gg/\n//wzMTExmM1mtmzZ8o9969aty9atW7FYLFy6dIlu3bqRnp6Os7Oz9QLtnx577DE2b95MamoqmZmZ\nrF27lsceeyyvTlMEUM9eHFiVKlUYOHAgr7zyCmazmcDAQPr27Yu7uzujRo2iR48e5MuXj4oVK/5j\n365duzJx4kRat24NwOjRo/H09KRx48aMGTOGSZMmWd8bFBTE0aNH6dChA5mZmTRu3Jhu3brd8VRQ\nkTuhtXFERByAhnFERByAwl5ExAEo7EVEHIDCXkTEASjsRUQcgMJeRMQBKOxFRBzA/wGoyLLGBoM8\nEgAAAABJRU5ErkJggg==\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# solution\n", "rf = fit_random_forest(X_train, y_train)\n", "test_model(rf, X_test, y_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Challenge\n", "\n", "Use the `test_tweet` function below to test your classifier's performance on a list of tweets. 
Write your tweets as a list of strings, then pass that list and a fitted model to `test_tweets`, the function defined in the next cell." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "def clean_tweets(tweets):\n", "    # Replace hashtags, handles and URLs with placeholder tokens\n", "    tweets = [re.sub(hashtag_pattern, ' HASHTAG', t) for t in tweets]\n", "    tweets = [re.sub(twitter_handle_pattern, 'USER', t) for t in tweets]\n", "    return [re.sub(url_pattern, 'URL', t) for t in tweets]\n", "\n", "def test_tweets(tweets, model):\n", "    # Clean the raw tweets, featurize them with the fitted vectorizer, then predict\n", "    tweets = clean_tweets(tweets)\n", "    features = countvectorizer.transform(tweets)\n", "    predictions = model.predict(features)\n", "    return list(zip(tweets, predictions))" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "[('lol USER and USER are like soo HASHTAG HASHTAG saw it on URL HASHTAG',\n", " 'neutral'),\n", " ('omg I am never flying on Delta again', 'negative'),\n", " ('I love USER so much HASHTAG', 'positive')]" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# solution\n", "my_tweets = [\"lol @justinbeiber and @BillGates are like soo #yesterday #amiright saw it on https://twitter.com #yolo\",\n", "             'omg I am never flying on Delta again',\n", "             'I love @VirginAmerica so much #friendlystaff']\n", "\n", "test_tweets(my_tweets, lr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Interpret \n", "\n", "Now we can interpret the classifier by looking at the features it found most important. Each word gets one coefficient per class in `lr.coef_`: a large positive value pushes a tweet toward that class, and a large negative value pushes it away." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "# Sort (column index, word) pairs so the words line up with the columns of lr.coef_\n", "vocab = [(v, k) for k, v in countvectorizer.vocabulary_.items()]\n", "vocab = sorted(vocab, key=lambda x: x[0])\n", "vocab = [word for num, word in vocab]\n", "# Pair each word with its coefficient for the first class in lr.classes_\n", "coef = list(zip(vocab, lr.coef_[0]))\n", "# One row per word, one coefficient column per class\n", "important = pd.DataFrame(lr.coef_).T\n", "important.columns = lr.classes_\n", "important['word'] = vocab" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>negative</th><th>neutral</th><th>positive</th><th>word</th></tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr><th>4946</th><td>4.964530</td><td>-2.968088</td><td>-3.294442</td><td>worst</td></tr>\n",
"    <tr><th>4489</th><td>3.331741</td><td>-0.843596</td><td>-0.944500</td><td>third</td></tr>\n",
"    <tr><th>4728</th><td>2.919172</td><td>-0.085944</td><td>-0.007325</td><td>unless</td></tr>\n",
"    <tr><th>3787</th><td>2.867997</td><td>-2.220034</td><td>-0.722136</td><td>ridiculous</td></tr>\n",
"    <tr><th>4449</th><td>2.712353</td><td>-1.538292</td><td>-1.787034</td><td>terrible</td></tr>\n",
"    <tr><th>509</th><td>2.680487</td><td>-1.033508</td><td>-0.625724</td><td>alternate</td></tr>\n",
"    <tr><th>4338</th><td>2.639722</td><td>-1.549369</td><td>-1.465791</td><td>sucks</td></tr>\n",
"    <tr><th>32</th><td>2.601016</td><td>-1.019397</td><td>0.000000</td><td>140</td></tr>\n",
"    <tr><th>1927</th><td>2.574252</td><td>-1.348797</td><td>-0.788616</td><td>forced</td></tr>\n",
"    <tr><th>3917</th><td>2.564091</td><td>0.000000</td><td>-2.122057</td><td>screwed</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " negative neutral positive word\n", "4946 4.964530 -2.968088 -3.294442 worst\n", "4489 3.331741 -0.843596 -0.944500 third\n", "4728 2.919172 -0.085944 -0.007325 unless\n", "3787 2.867997 -2.220034 -0.722136 ridiculous\n", "4449 2.712353 -1.538292 -1.787034 terrible\n", "509 2.680487 -1.033508 -0.625724 alternate\n", "4338 2.639722 -1.549369 -1.465791 sucks\n", "32 2.601016 -1.019397 0.000000 140\n", "1927 2.574252 -1.348797 -0.788616 forced\n", "3917 2.564091 0.000000 -2.122057 screwed" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "important.sort_values(by='negative', ascending=False).head(10)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>negative</th><th>neutral</th><th>positive</th><th>word</th></tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr><th>4943</th><td>-3.807955</td><td>0.000000</td><td>4.271496</td><td>worries</td></tr>\n",
"    <tr><th>1680</th><td>-2.042055</td><td>-1.464192</td><td>4.088187</td><td>excellent</td></tr>\n",
"    <tr><th>3396</th><td>-2.238108</td><td>0.000000</td><td>3.882890</td><td>pleasure</td></tr>\n",
"    <tr><th>2620</th><td>-1.874195</td><td>-0.466127</td><td>3.604312</td><td>kudos</td></tr>\n",
"    <tr><th>4462</th><td>-2.593646</td><td>-1.304508</td><td>3.432240</td><td>thank</td></tr>\n",
"    <tr><th>711</th><td>-2.232277</td><td>-1.841948</td><td>3.316700</td><td>awesome</td></tr>\n",
"    <tr><th>1683</th><td>-2.061644</td><td>0.000000</td><td>3.227271</td><td>exceptional</td></tr>\n",
"    <tr><th>4929</th><td>-2.188021</td><td>-0.709158</td><td>3.120052</td><td>wonderful</td></tr>\n",
"    <tr><th>518</th><td>-2.387029</td><td>-0.792597</td><td>3.021397</td><td>amazing</td></tr>\n",
"    <tr><th>4465</th><td>-1.838381</td><td>-1.213982</td><td>2.923657</td><td>thanks</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " negative neutral positive word\n", "4943 -3.807955 0.000000 4.271496 worries\n", "1680 -2.042055 -1.464192 4.088187 excellent\n", "3396 -2.238108 0.000000 3.882890 pleasure\n", "2620 -1.874195 -0.466127 3.604312 kudos\n", "4462 -2.593646 -1.304508 3.432240 thank\n", "711 -2.232277 -1.841948 3.316700 awesome\n", "1683 -2.061644 0.000000 3.227271 exceptional\n", "4929 -2.188021 -0.709158 3.120052 wonderful\n", "518 -2.387029 -0.792597 3.021397 amazing\n", "4465 -1.838381 -1.213982 2.923657 thanks" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "important.sort_values(by='positive', ascending=False).head(10)" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.1" } }, "nbformat": 4, "nbformat_minor": 1 }