{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Preprocessing\n",
    "\n",
    "Like other data types, text data never comes clean. Moreover, most of our downstream methods only accept data structured in a particular way. Because of this, we will always need to perform some level of preprocessing before we apply any computational text analysis techniques. Text data has its own unique kinds of preprocessing. In this notebook, we will cover the core preprocessing methods in preparation for our next two weeks:\n",
    "\n",
    "- Reading in files\n",
    "- Character encoding\n",
    "- Tokenization\n",
    "- Sentence segmentation\n",
    "- Removing punctuation\n",
    "- Stripping whitespace\n",
    "- Text normalization\n",
    "- Stop words\n",
    "- Stemming/Lemmatizing\n",
    "- POS tagging\n",
    "- DTM/TF-IDF\n",
    "\n",
    "### Time\n",
    "- Teaching: 50 minutes\n",
    "- Exercises: 60 minutes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Reading in files\n",
    "\n",
    "The first step is to read in the files containing the data. As we discussed last week, the most common file types for text data are: `.txt`, `.csv`, `.json`, `.html` and `.xml`.\n",
    "\n",
    "#### Reading in `.txt` files\n",
    "\n",
    "Python has built-in support for reading in `.txt` files.\n",
    "\n",
    "- What type of object is `raw`?\n",
    "- How many characters are in `raw`?\n",
    "- How can we get the first 1000 characters of `raw`?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "DATA_DIR = 'data'\n",
    "fname = 'pride-and-prejudice.txt'\n",
    "fname = os.path.join(DATA_DIR, fname)\n",
    "with open(fname, encoding='utf-8') as f:\n",
    "    raw = f.read()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Reading in `.csv` files\n",
    "\n",
    "Python has a built-in module called `csv` for reading in csv files.\n",
    "\n",
    "- What type is `tweets`?\n",
    "- How many entries are in `tweets`?\n",
    "- Which entry is the header row?\n",
    "- How can we get the text of the first tweet?\n",
    "- How can we get a list of the texts of all tweets?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import csv\n",
    "fname = 'trump-tweets.csv'\n",
    "fname = os.path.join(DATA_DIR, fname)\n",
    "# newline='' lets the csv module handle line breaks inside quoted fields\n",
    "with open(fname, newline='') as f:\n",
    "    reader = csv.reader(f)\n",
    "    tweets = list(reader)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Reading in `.csv` with `pandas`\n",
    "\n",
    "`pandas` is a third-party library that makes working with tabular data much easier. This is the recommended way to read in a `.csv` file.\n",
    "\n",
    "- How many tweets are there?\n",
    "- What happened to the header row?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "fname = 'trump-tweets.csv'\n",
    "fname = os.path.join(DATA_DIR, fname)\n",
    "tweets = pd.read_csv(fname)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Date</th>\n",
       "      <th>Time</th>\n",
       "      <th>Tweet_Text</th>\n",
       "      <th>Type</th>\n",
       "      <th>Media_Type</th>\n",
       "      <th>Hashtags</th>\n",
       "      <th>Tweet_Id</th>\n",
       "      <th>Tweet_Url</th>\n",
       "      <th>twt_favourites_IS_THIS_LIKE_QUESTION_MARK</th>\n",
       "      <th>Retweets</th>\n",
       "      <th>Unnamed: 10</th>\n",
       "      <th>Unnamed: 11</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>16-11-11</td>\n",
       "      <td>15:26:37</td>\n",
       "      <td>Today we express our deepest gratitude to all ...</td>\n",
       "      <td>text</td>\n",
       "      <td>photo</td>\n",
       "      <td>ThankAVet</td>\n",
       "      <td>7.970000e+17</td>\n",
       "      <td>https://twitter.com/realDonaldTrump/status/797...</td>\n",
       "      <td>127213</td>\n",
       "      <td>41112</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>16-11-11</td>\n",
       "      <td>13:33:35</td>\n",
       "      <td>Busy day planned in New York. Will soon be mak...</td>\n",
       "      <td>text</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>7.970000e+17</td>\n",
       "      <td>https://twitter.com/realDonaldTrump/status/797...</td>\n",
       "      <td>141527</td>\n",
       "      <td>28654</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>16-11-11</td>\n",
       "      <td>11:14:20</td>\n",
       "      <td>Love the fact that the small groups of protest...</td>\n",
       "      <td>text</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>7.970000e+17</td>\n",
       "      <td>https://twitter.com/realDonaldTrump/status/797...</td>\n",
       "      <td>183729</td>\n",
       "      <td>50039</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       Date      Time                                         Tweet_Text  \\\n",
       "0  16-11-11  15:26:37  Today we express our deepest gratitude to all ...   \n",
       "1  16-11-11  13:33:35  Busy day planned in New York. Will soon be mak...   \n",
       "2  16-11-11  11:14:20  Love the fact that the small groups of protest...   \n",
       "\n",
       "   Type Media_Type   Hashtags      Tweet_Id  \\\n",
       "0  text      photo  ThankAVet  7.970000e+17   \n",
       "1  text        NaN        NaN  7.970000e+17   \n",
       "2  text        NaN        NaN  7.970000e+17   \n",
       "\n",
       "                                           Tweet_Url  \\\n",
       "0  https://twitter.com/realDonaldTrump/status/797...   \n",
       "1  https://twitter.com/realDonaldTrump/status/797...   \n",
       "2  https://twitter.com/realDonaldTrump/status/797...   \n",
       "\n",
       "   twt_favourites_IS_THIS_LIKE_QUESTION_MARK  Retweets  Unnamed: 10  \\\n",
       "0                                     127213     41112          NaN   \n",
       "1                                     141527     28654          NaN   \n",
       "2                                     183729     50039          NaN   \n",
       "\n",
       "   Unnamed: 11  \n",
       "0          NaN  \n",
       "1          NaN  \n",
       "2          NaN  "
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tweets.head(3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['Today we express our deepest gratitude to all those who have served in our armed forces. #ThankAVet https://t.co/wPk7QWpK8Z',\n",
       " 'Busy day planned in New York. Will soon be making some very important decisions on the people who will be running our government!',\n",
       " 'Love the fact that the small groups of protesters last night have passion for our great country. We will all come together and be proud!',\n",
       " 'Just had a very open and successful presidential election. Now professional protesters, incited by the media, are protesting. Very unfair!']"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tweet_text = list(tweets['Tweet_Text'])\n",
    "tweet_text[:4]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Reading in `.json` files\n",
    "\n",
    "Python has built-in support for reading in `.json` files.\n",
    "\n",
    "- How many questions are there in the dataset?\n",
    "- What data type is each question?\n",
    "- How can we access the question text of the first question?\n",
    "- How can we get a list of the texts of all questions?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "fname = 'jeopardy.json'\n",
    "fname = os.path.join(DATA_DIR, fname)\n",
    "with open(fname) as f:\n",
    "    data = json.load(f)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'air_date': '2004-12-31',\n",
       "  'answer': 'Copernicus',\n",
       "  'category': 'HISTORY',\n",
       "  'question': \"'For the last 8 years of his life, Galileo was under house arrest for espousing this man's theory'\",\n",
       "  'round': 'Jeopardy!',\n",
       "  'show_number': '4680',\n",
       "  'value': '$200'},\n",
       " {'air_date': '2004-12-31',\n",
       "  'answer': 'Jim Thorpe',\n",
       "  'category': \"ESPN's TOP 10 ALL-TIME ATHLETES\",\n",
       "  'question': \"'No. 2: 1912 Olympian; football star at Carlisle Indian School; 6 MLB seasons with the Reds, Giants & Braves'\",\n",
       "  'round': 'Jeopardy!',\n",
       "  'show_number': '4680',\n",
       "  'value': '$200'},\n",
       " {'air_date': '2004-12-31',\n",
       "  'answer': 'Arizona',\n",
       "  'category': 'EVERYBODY TALKS ABOUT IT...',\n",
       "  'question': \"'The city of Yuma in this state has a record average of 4,055 hours of sunshine each year'\",\n",
       "  'round': 'Jeopardy!',\n",
       "  'show_number': '4680',\n",
       "  'value': '$200'}]"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data[:3]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Reading in `.html` files\n",
    "\n",
    "A common way to read in `.html` files in Python is with the third-party `BeautifulSoup` package (imported as `bs4`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "from bs4 import BeautifulSoup\n",
    "fname = 'time.html'\n",
    "fname = os.path.join(DATA_DIR, fname)\n",
    "with open(fname) as f:\n",
    "    html = f.read()\n",
    "# Name the parser explicitly; 'lxml' must be installed ('html.parser' is the built-in alternative)\n",
    "soup = BeautifulSoup(html, 'lxml')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['html', '\\n', '\\n', '\\n', 'Time - Wikipedia']"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "texts = soup.find_all(string=True)\n",
    "# Alternative: soup.get_text() returns all of the text as a single string\n",
    "texts[:5]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Reading in `.xml` files\n",
    "\n",
    "We read in `.xml` files using the `ElementTree` module of Python's standard library. We can think of `.xml` files as trees where each branch has a tag name. We can find all the branches with a certain name as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "from xml.etree import ElementTree as ET\n",
    "fname = 'books.xml'\n",
    "fname = os.path.join(DATA_DIR, fname)\n",
    "e = ET.parse(fname)\n",
    "root = e.getroot()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['An in-depth look at creating applications \\n      with XML.',\n",
       " 'A former architect battles corporate zombies, \\n      an evil sorceress, and her own childhood to become queen \\n      of the world.',\n",
       " 'After the collapse of a nanotechnology \\n      society in England, the young survivors lay the \\n      foundation for a new society.']"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "descriptions = root.findall('*/description')\n",
    "text = [d.text for d in descriptions]\n",
    "text[:3]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Reading in multiple files\n",
    "\n",
    "Often, our text data is split across multiple files in a folder. We want to be able to read them all into a single variable.\n",
    "\n",
    "- What type is `austen`?\n",
    "- What type is `fnames` after it is first assigned a value?\n",
    "- What type is `fnames` after it is assigned a second value?\n",
    "- How "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "import glob\n",
    "fnames = os.path.join(DATA_DIR, 'austen', '*.txt')\n",
    "fnames = glob.glob(fnames)\n",
    "austen = ''\n",
    "for fname in fnames:\n",
    "    with open(fname) as f:\n",
    "        text = f.read()\n",
    "        austen += text"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Challenge\n",
    "\n",
    "Read in all the `.csv` files in the folder `amazon`. Extract only the text column from each file and store all the texts in a single list called `reviews`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Character encoding\n",
    "\n",
    "Character encoding caused more trouble in Python 2, and in earlier years generally. Python 3 strings are Unicode and most text files are now encoded in `UTF-8`, so we rarely need to think about it. However, when no encoding is given, `open` falls back to a platform-dependent default. If you get nonsense (or a `UnicodeDecodeError`) when reading in a file, try passing `encoding='utf-8'` to the `open` function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "fname = 'dante.txt'\n",
    "fname = os.path.join(DATA_DIR, fname)\n",
    "with open(fname) as f:\n",
    "    text = f.read()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'oglia.\\n\\n  Questi non ciberà terra né peltro,\\n  ma sapïenza, amore e virtute,\\n  e sua nazion sarà tra feltro e feltro.\\n\\n  Di quella umile Italia fia salute\\n  per cui morì la vergine Cammilla,\\n  Eurialo e Turno e Niso di ferute.\\n\\n  Questi la caccerà per ogne villa,\\n  fin che l’avrà rimessa ne lo ’nferno,\\n  là onde ’nvidia prima dipartilla.\\n\\n  Ond’ io per lo tuo me’ penso e discerno\\n  che tu mi segui, e io sarò tua guida,\\n  e trarrotti di qui per loco etterno;\\n\\n  ove udirai le disperate strida,\\n  vedrai li antichi spiriti dolenti,\\n  ch’a la seconda morte ciascun grida;\\n\\n  e vederai color che son contenti\\n  nel foco, perché speran di venire\\n  quando che sia a le beate genti.\\n\\n  A le quai poi se tu vorrai salire,\\n  anima fia a ciò più di me degna:\\n  con lei ti lascerò nel mio partire;\\n\\n  ché quello imperador che là sù regna,\\n  perch’ i’ fu’ ribellante a la sua legge,\\n  non vuol che ’n sua città per me si vegna.\\n\\n  In tutte parti impera e quivi regge;\\n  quivi è la sua città e l’alto seggio:\\n'"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "text[5000:6000]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "fname = 'akutagawa.txt'\n",
    "fname = os.path.join(DATA_DIR, fname)\n",
    "with open(fname) as f:\n",
    "    text = f.read()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'二人は屍骸の中で、暫、無言のまま、つかみ合った。しか\\nし勝負は、はじめから、わかっている。下人はとうとう、老婆の腕をつかんで、無理に\\nそこへねじ倒した。丁度、鶏（とり）の脚のような、骨と皮ばかりの腕である。\\n\\u3000「何をしていた。さあ何をしていた。云え。云わぬと\\u3000これだぞよ。」\\n\\u3000下人は、老婆をつき放すと、いきなり、太刀の鞘を払って、白い鋼（はがね）の色を\\nその眼の前へつきつけた。けれども、老婆は黙っている。両手をわなわなふるわせて、\\n肩で息を切りながら、眼を、眼球がまぶたの外へ出そうになる程、見開いて、唖のよう\\nに執拗（しゅうね）く黙っている。これを見ると、下人は始めて明白にこの老婆の生死\\nが、全然、自分の意志に支配されていると云う事を意識した。そうして、この意識は、\\n今まではげしく燃えていた憎悪の心を何時（いつ）の間にか冷ましてしまった。後に残っ\\nたのは、唯、或仕事をして、それが円満に成就した時の、安らかな得意と満足とがある\\nばかりである。そこで、下人は、老婆を、見下げながら、少し声を柔げてこう云った。\\n\\u3000「己は検非違使（けびいし）の庁の役人などではない。今し方この門の下を通りかかっ\\nた旅の者だ。だからお前に縄をかけて、どうしようと云うような事はない。唯今時分、\\nこの門の上で、何をしていたのだか、それを己に話さえすればいいのだ。」\\n\\u3000すると、老婆は、見開いた眼を、一層大きくして、じっとその下人の顔を見守った。\\nまぶたの赤くなった、肉食鳥のような、鋭い眼で見たのである。それから、皺で、殆、\\n鼻と一つになった唇を何か物でも噛んでいるように動かした。細い喉で、尖った喉仏の\\n動いているのが見える。その時、その喉から、鴉（からす）の啼くような声が、喘ぎ喘\\nぎ、下人の耳へ伝わって来た。\\n\\u3000「この髪を抜いてな、この女の髪を抜いてな、鬘（かつら）にしようと思うたの\\nじゃ。」\\n\\u3000下人は、老婆の答が存外、平凡なのに失望した。そうして失望すると同時に、又前の\\n憎悪が、冷な侮蔑と一しょに、心の中へはいって来た。すると\\u3000その気色（けしき）が、\\n先方へも通じたのであろう。老婆は、片手に、まだ屍骸の頭から奪（と）った長い抜け\\n毛を持ったなり、蟇（ひき）のつぶやくような声で、口ごもりながら、こんな事を云っ\\nた。\\n\\u3000成程、死人の髪の毛を抜くと云う事は、悪い事かね知れぬ。しかし、こう云う死人の\\n多くは'"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "text[5000:6000]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Tokenization\n",
    "\n",
    "Once we've read in the data, our next step is often to split it into words. This step is referred to as \"tokenization\". That's because each occurrence of a word is called a \"token\". Each distinct word used is called a word \"type\". So the word type \"the\" may correspond to multiple tokens of \"the\" in a text.\n",
    "\n",
    "#### Tokenizing by whitespace\n",
    "\n",
    "- What problems do you notice with tokenizing by whitespace?\n",
    "- What type is `text`?\n",
    "- What type is `tokens`?\n",
    "- What type is each element of `tokens`?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "fname = 'example1.txt'\n",
    "fname = os.path.join(DATA_DIR, fname)\n",
    "with open(fname) as f:\n",
    "    text = f.read()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['In',\n",
       " 'this',\n",
       " 'little',\n",
       " 'example,',\n",
       " \"we're\",\n",
       " 'going',\n",
       " 'to',\n",
       " 'see',\n",
       " 'some',\n",
       " 'of']"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tokens = text.split()\n",
    "tokens[:10]"
   ]
  },
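  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Tokenizing with regular expressions\n",
    "\n",
    "The pattern `\\w+` matches runs of letters, digits and underscores, so punctuation is dropped and contractions such as \"we're\" are split at the apostrophe."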
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['In', 'this', 'little', 'example', 'we', 're', 'going', 'to', 'see', 'some']"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import re\n",
    "word_pattern = r'\\w+'\n",
    "tokens = re.findall(word_pattern, text)\n",
    "tokens[:10]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Tokenizing with `nltk`\n",
    "\n",
    "The `nltk` word tokenizer is [just a bunch of regular expressions under the hood](https://github.com/nltk/nltk/blob/develop/nltk/tokenize/treebank.py)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['In', 'this', 'little', 'example', ',', 'we', \"'re\", 'going', 'to', 'see']"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from nltk.tokenize import word_tokenize\n",
    "tokens = word_tokenize(text)\n",
    "tokens[:10]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Challenge\n",
    "\n",
    "A while ago you read a bunch of Jane Austen books into a variable called `austen`. Tokenize that text using a method of your choice. Find all the unique word types (you might want the `set` function). Sort the resulting set object to create a vocabulary (you might want to use the `sorted` function)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'\\ufeffThe'"
      ]
     },
     "execution_count": 61,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tokens = word_tokenize(austen)\n",
    "tokens[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['\\ufeffThe',\n",
       " 'Project',\n",
       " 'Gutenberg',\n",
       " 'EBook',\n",
       " 'of',\n",
       " 'Emma',\n",
       " ',',\n",
       " 'by',\n",
       " 'Jane',\n",
       " 'Austen']"
      ]
     },
     "execution_count": 62,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tokens[:10]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Sentence segmentation\n",
    "\n",
    "Sentence segmentation involves identifying the boundaries of sentences.\n",
    "\n",
    "#### Sentence segmentation by splitting on punctuation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[\"In this little example, we're going to see some of the problems that regularly appear in tokenization\",\n",
       " \" Tokenization may seem simple, but it's harder than it first appears\",\n",
       " \" Why is it so hard? Punctuations, contractions (like don't, won't and would've) get in the way\",\n",
       " ' \\n']"
      ]
     },
     "execution_count": 66,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "text.split('.')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We could improve on this by using regular expressions, which let us split a string on any one of several characters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[\"In this little example, we're going to see some of the problems that regularly appear in tokenization\",\n",
       " \" Tokenization may seem simple, but it's harder than it first appears\",\n",
       " ' Why is it so hard',\n",
       " \" Punctuations, contractions (like don't, won't and would've) get in the way\",\n",
       " ' \\n']"
      ]
     },
     "execution_count": 67,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sent_boundary_pattern = r'[.?!]'\n",
    "re.split(sent_boundary_pattern, text)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Challenge\n",
    "\n",
    "The file `example2.txt1` has more punctuation problems. Read it in and see what the problems are. Try your best to modify the code from above to work for as many cases as you can."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Sentence segmentation by `nltk`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[\"In this little example, we're going to see some of the problems that regularly appear in tokenization.\",\n",
       " \"Tokenization may seem simple, but it's harder than it first appears.\",\n",
       " 'Why is it so hard?',\n",
       " \"Punctuations, contractions (like don't, won't and would've) get in the way.\",\n",
       " \"We can split text into sentences using punctuation, but unfortunately that's not always going to work.\",\n",
       " \"For example, if I wanted to tell you about Dr. Frankenstein, or Mrs. Doubtfire, we'd be in trouble.\",\n",
       " 'What if I wanted to write about U.C.',\n",
       " 'Berkeley?',\n",
       " 'When you think about it, URLs like www.google.com are troublesome too.',\n",
       " 'How would we settle on a price of $10.50?',\n",
       " 'The main point is that these punctuation characters serve a variety of purposes in writing.',\n",
       " 'Moreover, the functions they serve change depending on the domain (medical vs forum text) and language.']"
      ]
     },
     "execution_count": 68,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from nltk.tokenize import sent_tokenize\n",
    "fname = 'example2.txt'\n",
    "fname = os.path.join(DATA_DIR, fname)\n",
    "with open(fname) as f:\n",
    "    text = f.read()\n",
    "sent_tokenize(text)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Removing punctuation\n",
    "\n",
    "Sometimes (although admittedly less frequently than tokenizing and sentence segmentation), you might want to keep only the alphanumeric characters (i.e. the letters and numbers) and whitespace, and ditch the punctuation. Here's how we can do that. Note that `string.punctuation` covers only ASCII punctuation, so characters such as curly quotes are left in place.\n",
    "\n",
    "- What type is `punctuation`?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'!\"#$%&\\'()*+,-./:;<=>?@[\\\\]^_`{|}~'"
      ]
     },
     "execution_count": 70,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from string import punctuation\n",
    "punctuation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'In this little example were going to see some of the problems that regularly appear in tokenization Tokenization may seem simple but its harder than it first appears Why is it so hard Punctuations contractions like dont wont and wouldve get in the way \\n\\nWe can split text into sentences using punctuation but unfortunately thats not always going to work For example if I wanted to tell you about Dr Frankenstein or Mrs Doubtfire wed be in trouble What if I wanted to write about UC Berkeley When you think about it URLs like wwwgooglecom are troublesome too How would we settle on a price of 1050 The main point is that these punctuation characters serve a variety of purposes in writing Moreover the functions they serve change depending on the domain medical vs forum text and language'"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "no_punct = ''.join([ch for ch in text if ch not in punctuation])\n",
    "no_punct"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Stripping whitespace\n",
    "\n",
    "Stripping whitespace is an extremely common step, particularly for user-generated text (think survey forms). It is simple to perform, and Python's string methods handle the basic cases."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {},
   "outputs": [],
   "source": [
    "fname = 'example3.txt'\n",
    "fname = os.path.join(DATA_DIR, fname)\n",
    "with open(fname) as f:\n",
    "    text = f.read()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "This is a text file that has some extra whitespace at the start and end. Whitespace is a catch-all term for spaces, tabs, newlines, and a bunch of other things that computers distinguish but to us all look like spaces, tabs and newlines.\n",
      "\n",
      "\n",
      "The Python method called \"strip\" only catches whitespace at the start and end of a string. But it won't catch it in       the middle,\t\tfor example,\n",
      "\n",
      "in this sentence.\t\tOnce again, regular expressions will\n",
      "\n",
      "help\t\tus    with this.\n",
      "\n",
      "\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "print(text)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "This is a text file that has some extra whitespace at the start and end. Whitespace is a catch-all term for spaces, tabs, newlines, and a bunch of other things that computers distinguish but to us all look like spaces, tabs and newlines.\n",
      "\n",
      "\n",
      "The Python method called \"strip\" only catches whitespace at the start and end of a string. But it won't catch it in       the middle,\t\tfor example,\n",
      "\n",
      "in this sentence.\t\tOnce again, regular expressions will\n",
      "\n",
      "help\t\tus    with this.\n"
     ]
    }
   ],
   "source": [
    "stripped_text = text.strip()\n",
    "print(stripped_text)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "' This is a text file that has some extra whitespace at the start and end. Whitespace is a catch-all term for spaces, tabs, newlines, and a bunch of other things that computers distinguish but to us all look like spaces, tabs and newlines. The Python method called \"strip\" only catches whitespace at the start and end of a string. But it won\\'t catch it in the middle, for example, in this sentence. Once again, regular expressions will help us with this. '"
      ]
     },
     "execution_count": 75,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "whitespace_pattern = r'\\s+'\n",
    "clean_text = re.sub(whitespace_pattern, ' ', text)\n",
    "clean_text"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Text normalization\n",
    "\n",
    "Text normalization means making our text fit some standard patterns. Lots of steps come under this wide umbrella, but the most common are:\n",
    "\n",
    "- case folding\n",
    "- removing URLs, digits, hashtags\n",
    "- handling out-of-vocabulary (OOV) words (e.g. removing infrequent words)\n",
    "\n",
    "#### Case folding\n",
    "\n",
    "Case folding means removing the distinction between upper- and lower-case characters. This is usually done by converting all characters to lower case."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Upper and lower case characters can be annoying. Characters are the individual letters and numbers that we see on the page. Case folding is the generic term we use for dealing with upper and lower case characters. Lower case is often what people want. Title Case refers to a multi-word expression with the first character of every word in upper case. '"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fname = 'example4.txt'\n",
    "fname = os.path.join(DATA_DIR, fname)\n",
    "with open(fname, encoding='utf-8') as f:\n",
    "    text = f.read()\n",
    "text"
   ]
  },
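  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a minimal sketch of case folding on a single string, the cell below uses a made-up `sample` sentence (an illustrative name, not part of the data files). `str.lower()` is the usual choice. Python also provides `str.casefold()`, a more aggressive variant intended for caseless matching, which matters mostly for non-English text."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sample = 'Title Case and UPPER CASE both fold to lower case.'  # made-up example string\n",
    "print(sample.lower())\n",
    "\n",
    "# casefold() goes further than lower() for some characters, such as the German sharp s\n",
    "print('Straße'.lower())     # straße\n",
    "print('Straße'.casefold())  # strasse"
   ]
  },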
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {},
   "outputs": [
    {
     "ename": "AttributeError",
     "evalue": "'list' object has no attribute 'lower'",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mAttributeError\u001b[0m                            Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-77-07e32ce368b4>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;34m[\u001b[0m\u001b[0;34m'One'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'Two'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlower\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;31mAttributeError\u001b[0m: 'list' object has no attribute 'lower'"
     ]
    }
   ],
   "source": [
    "['One', 'Two'].lower()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Challenge\n",
    "\n",
    "The `lower` method is a string method: it works on strings but not on lists, as the error above shows. What if you want to lowercase every word in a list (say you've already tokenized the text)? Take the list of tokens below and make each one lower case."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "metadata": {},
   "outputs": [],
   "source": [
    "tokens = word_tokenize(text)\n",
    "lowercase_tokens = []\n",
    "for token in tokens:\n",
    "    lowercased_version = token.lower()\n",
    "    lowercase_tokens.append(lowercased_version)"
   ]
  },
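  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The loop above can also be written more compactly as a list comprehension. The sketch below uses a small hand-made token list (`example_tokens` is an illustrative name) so that it runs on its own without the tokenizer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "example_tokens = ['Upper', 'and', 'LOWER', 'Case']  # made-up tokens\n",
    "example_lowercase = [token.lower() for token in example_tokens]\n",
    "example_lowercase  # ['upper', 'and', 'lower', 'case']"
   ]
  },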
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Removing URLs, digits and hashtags\n",
    "\n",
    "We rarely care about the exact URL used in a tweet, or the exact number. We could remove them completely (think about how we'd do that), but it's often informative to know that there is a URL or a digit in the text. So we want to replace individual URLs and digits with a placeholder that preserves the fact that a URL or digit was there. It's standard to just use the strings \"URL\" and \"DIGIT\".\n",
    "\n",
    "How do we do this? Once again, regular expressions save the day."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Today we express our deepest gratitude to all those who have served in our armed forces. #ThankAVet https://t.co/wPk7QWpK8Z'"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "url_pattern = r'https?:\\/\\/.*[\\r\\n]*'\n",
    "single_tweet = tweet_text[0]\n",
    "single_tweet"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Today we express our deepest gratitude to all those who have served in our armed forces. #ThankAVet  URL '"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "URL_SIGN = ' URL '\n",
    "re.sub(url_pattern, URL_SIGN, single_tweet)"
   ]
  },
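  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Digits and hashtags can be handled the same way as URLs. The cell below is a sketch on a made-up string: the names `digit_pattern`, `hashtag_pattern`, `DIGIT_SIGN` and `HASHTAG_SIGN` are illustrative, and the `HASHTAG` placeholder is our assumption rather than a fixed standard. Hashtags are replaced first because a hashtag may itself contain digits."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "\n",
    "digit_pattern = r'\\d+'      # one or more consecutive digits\n",
    "hashtag_pattern = r'#\\w+'   # a '#' followed by word characters\n",
    "DIGIT_SIGN = ' DIGIT '\n",
    "HASHTAG_SIGN = ' HASHTAG '  # assumed placeholder name\n",
    "\n",
    "example = 'Rally at 7pm on 11/8 #ElectionDay'  # made-up example string\n",
    "example = re.sub(hashtag_pattern, HASHTAG_SIGN, example)\n",
    "example = re.sub(digit_pattern, DIGIT_SIGN, example)\n",
    "example"
   ]
  },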
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Challenge\n",
    "\n",
    "Above we replaced the URL in a single tweet. Now replace all the URLs in all tweets in `tweet_text`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 91,
   "metadata": {},
   "outputs": [],
   "source": [
    "url_pattern = r'https?:\\/\\/.*[\\r\\n]*'\n",
    "URL_SIGN = ' URL '\n",
    "list_of_url_less_tweets = []\n",
    "for tweet in tweet_text:\n",
    "    url_less_tweet = re.sub(url_pattern, URL_SIGN, tweet)\n",
    "    list_of_url_less_tweets.append(url_less_tweet)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 93,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Same result as the loop above, written as a list comprehension\n",
    "list_of_url_less_tweets = [re.sub(url_pattern, URL_SIGN, tweet) for tweet in tweet_text]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 94,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['Today we express our deepest gratitude to all those who have served in our armed forces. #ThankAVet  URL ',\n",
       " 'Busy day planned in New York. Will soon be making some very important decisions on the people who will be running our government!',\n",
       " 'Love the fact that the small groups of protesters last night have passion for our great country. We will all come together and be proud!',\n",
       " 'Just had a very open and successful presidential election. Now professional protesters, incited by the media, are protesting. Very unfair!',\n",
       " 'A fantastic day in D.C. Met with President Obama for first time. Really good meeting, great chemistry. Melania liked Mrs. O a lot!',\n",
       " 'Happy 241st birthday to the U.S. Marine Corps! Thank you for your service!!  URL ',\n",
       " 'Such a beautiful and important evening! The forgotten man and woman will never be forgotten again. We will all come together as never before',\n",
       " 'Watching the returns at 9:45pm.\\n#ElectionNight #MAGA__  URL ',\n",
       " 'RT @IvankaTrump: Such a surreal moment to vote for my father for President of the United States! Make your voice heard and vote! #Election2_',\n",
       " 'RT @EricTrump: Join my family in this incredible movement to #MakeAmericaGreatAgain!! Now it is up to you! Please #VOTE for America! https:_',\n",
       " 'RT @DonaldJTrumpJr: FINAL PUSH! Eric and I doing dozens of radio interviews. We can win this thing! GET OUT AND VOTE! #MAGA #ElectionDay ht_',\n",
       " 'Still time to #VoteTrump!\\n#iVoted #ElectionNight  URL ',\n",
       " 'Dont let up, keep getting out to vote - this election is FAR FROM OVER! We are doing well but there is much time left. GO FLORIDA!',\n",
       " 'Just out according to @CNN: \"Utah officials report voting machine problems across entire country\"',\n",
       " 'I will be watching the election results from Trump Tower in Manhattan with my family and friends. Very exciting!',\n",
       " '#ElectionDay  URL ',\n",
       " 'We need your vote. Go to the POLLS! Lets continue this MOVEMENT! Find your poll location:  URL ',\n",
       " 'VOTE TODAY! Go to  URL ',\n",
       " 'TODAY WE MAKE AMERICA GREAT AGAIN!',\n",
       " 'Today we are going to win the great state of MICHIGAN and we are going to WIN back the White House! Thank you MI!_  URL ',\n",
       " 'RT @EricTrump: Sean Hannity: If Hillary wins, you own it  URL ',\n",
       " 'RT @DonaldJTrumpJr: Thanks New Hampshire!!!\\n#NH #NewHampshire #MAGA  URL ',\n",
       " 'RT @detroitnews: .@IvankaTrump in Michigan: This is your movement۪  URL ',\n",
       " 'Unbelievable evening in New Hampshire - THANK YOU! Flying to Grand Rapids, Michigan now.\\nWatch NH rally here:_  URL ',\n",
       " 'Big news to share in New Hampshire tonight! Polls looking great! See you soon.',\n",
       " 'Today in Florida, I pledged to stand with the people of Cuba and Venezuela in their fight against oppression- cont:  URL ',\n",
       " 'Thank you Pennsylvania! Going to New Hampshire now and on to Michigan. Watch PA rally here:  URL ',\n",
       " 'LIVE on #Periscope: Join me for a few minutes in Pennsylvania. Get out &amp; VOTE tomorrow. LETS #MAGA!!  URL ',\n",
       " 'Hey Missouri lets defeat Crooked Hillary &amp; @koster4missouri! Koster supports Obamacare &amp; amnesty! Vote outsider Navy SEAL @EricGreitens!',\n",
       " 'America must decide between failed policies or fresh perspective, a corrupt system or an outsider\\n URL ',\n",
       " 'What I Like About Trump ... and Why You Need to Vote for Him\\n URL ',\n",
       " 'Why Trump  URL ',\n",
       " 'I love you North Carolina- thank you for your amazing support! Get out and  URL Watch:_  URL ',\n",
       " 'On my way!  URL ',\n",
       " 'Just landed in North Carolina- heading to the J.S. Dorton Arena. See you all soon! Lets #MakeAmericaGreatAgain!  URL ',\n",
       " 'Starting tomorrow its going to be #AmericaFirst! Thank you for a great morning Sarasota, Florida!\\nWatch here:_  URL ',\n",
       " 'Thank you for you support Virginia! In ONE DAY - get out and #VoteTrumpPence16! #ICYMI:  URL ',\n",
       " 'Thank you Pennsylvania- I am forever grateful for your amazing support. Lets MAKE AMERICA GREAT AGAIN! #MAGA_  URL ',\n",
       " 'Thank you Michigan! This is a MOVEMENT that will never be seen again- its our last chance to #DrainTheSwamp! Watch_  URL ',\n",
       " 'Our American comeback story begins 11/8/16. Together, we will MAKE AMERICA SAFE &amp; GREAT again for everyone! Watch:_  URL ',\n",
       " 'Thank you Minnesota! It is time to #DrainTheSwamp &amp; #MAGA!\\n#ICYMI- watch:  URL ',\n",
       " 'MONDAY - 11/7/2016\\n\\nScranton, Pennsylvania at 5:30pm.\\n URL Grand Rapids, Michigan at 11pm._  URL ',\n",
       " 'Thank you Iowa - Get out &amp; #VoteTrumpPence16!\\n URL ',\n",
       " 'RT @IvankaTrump: Thank you New Hampshire! __  URL ',\n",
       " 'Van Jones: There Is A Crack in the Blue Wall۪  It Has to Do With Trade:  URL ',\n",
       " 'Great night in Denver, Colorado- thank you! Together, we will MAKE AMERICA GREAT AGAIN!\\n#ICYMI watch rally here:_  URL ',\n",
       " 'RT @DanScavino: Join @realDonaldTrump LIVE in Denver, Colorado via his #Facebook page- we are here!!\\n#MakeAmericaGreatAgain__\\n URL ',\n",
       " 'Thank you Reno, Nevada.\\nNOTHING will stop us in our quest to MAKE AMERICA SAFE AND GREAT AGAIN! #AmericaFirst_  URL ',\n",
       " 'Join me live in Reno, Nevada!\\n URL ',\n",
       " 'JOIN ME TOMORROW!\\nMINNESOTA ۢ 2pm\\n URL MICHIGAN ۢ 6pm\\n URL VIRGINIA ۢ 9:30p_  URL ',\n",
       " '#DrainTheSwamp!\\n URL ',\n",
       " 'Top Clinton Aides Bemoan Campaign All Tactics,۪ No Vision:  URL ',\n",
       " 'Must Act Immediately۪: Clinton Charity Lawyer Told Execs They Were Breaking The Law\\n URL ',\n",
       " 'Watch Coach Mike Ditka- a great guy and supporter tonight at 8pmE on #WattersWorld with @jessebwatters @FoxNews.',\n",
       " 'Thank you Wilmington, North Carolina. We are 3 days away from the CHANGE youve been waiting for your entire life!_  URL ',\n",
       " 'Thank you for the incredible support this morning Tampa, Florida! #ICYMI- watch here:  URL ',\n",
       " 'Join me in Denver, Colorado tonight at 9:30pm:  URL NEW- Scranton, Pennsylvania Monday @ 5:30pm:  URL ',\n",
       " 'MAKE AMERICA GREAT AGAIN!',\n",
       " 'Thank you Hershey, Pennsylvania. Get out &amp; VOTE on November 8th &amp; we will #MAGA! #RallyForRiley\\n#ICYMI, watch here_  URL ',\n",
       " 'Join me in Denver, Colorado tomorrow at 9:30pm!\\nTickets:  URL ',\n",
       " 'Join me live in Hershey, Pennsylvania!\\n#MakeAmericaGreatAgain\\nLIVE:  URL ',\n",
       " 'The only thing that can stop this corrupt machine is YOU. The only force strong enough to save our country is US._  URL ',\n",
       " 'Thank you Ohio! VOTE so we can replace Obamacare and save healthcare for every family in the United States! Watch:_  URL ',\n",
       " 'Join me live in Wilmington, Ohio!\\n URL ',\n",
       " 'If Obama worked as hard on straightening out our country as he has trying to protect and elect Hillary, we would all be much better off!',\n",
       " 'ICE OFFICERS WARN HILLARY IMMIGRATION PLAN WILL UNLEASH GANGS, CARTELS &amp; DRUG VIOLENCE NATIONWIDE_  URL ',\n",
       " 'Thank you NH! We will end illegal immigration, stop the drugs, deport all criminal aliens&amp;save American lives! Watc_  URL ',\n",
       " 'Clinton Aides: Definitely۪ Not Releasing Some HRC Emails:\\n URL ',\n",
       " 'RT @TeamTrump: Mrs. Sauciers son is in prison for having classified info on an unsecured device. @HillaryClinton did FAR WORSE &amp; is runnin_',\n",
       " '\"The Clinton Campaign at Obama Justice\" #DrainTheSwamp\\n URL ',\n",
       " 'Join me today in Wilmington, Ohio at 4pm:  URL Tomorrow- Tampa, Florida at 10am:  URL ',\n",
       " 'There is no challenge too great, no dream outside of our reach! Thank you Selma, North Carolina!\\n#ICYMI, watch here_  URL ',\n",
       " 'RT @PaulaReidCBS: .@CBSNews confirms FBI found emails on #AnthonyWeiner computer, related to Hillary Clinton server, that are \"new\" &amp; not p_',\n",
       " 'Join me in Wilmington, Ohio tomorrow at 4:00pm! It is time to #DrainTheSwamp! Tickets:  URL ',\n",
       " '#CrookedHillary is unfit to serve.  URL ',\n",
       " 'Thank you Concord, North Carolina! When WE win on November 8th, we are going to Washington, D.C. and we are going t_  URL ',\n",
       " 'Watching my beautiful wife, Melania, speak about our love of country and family. We will make you all very proud._  URL ',\n",
       " 'Looking at Air Force One @ MIA. Why is he campaigning instead of creating jobs &amp; fixing Obamacare? Get back to work for the American people!',\n",
       " 'ObamaCare is a total disaster. Hillary Clinton wants to save it by making it even more expensive. Doesnt work, I will REPEAL AND REPLACE!',\n",
       " 'My wife, Melania, will be speaking in Pennsylvania this afternoon. So exciting, big crowds! I will be watching from North Carolina.',\n",
       " 'Thank you Arlene! We will MAKE AMERICA SAFE AND GREAT AGAIN!\\n#ImWithYou #DrainTheSwamp\\n URL ',\n",
       " 'RT @T_Lineberger: Thanks @IvankaTrump for coming to help win Michigan! More people here than a Hillary rally with less than 24 hours notice_',\n",
       " '#MakeAmericaGreatAgain #6Days  URL ',\n",
       " 'Thank you Orlando, Florida! We are just six days away from delivering justice for every forgotten man, woman and ch_  URL ',\n",
       " 'After decades of lies and scandal, Crooked Hillarys corruption is closing in. #DrainTheSwamp!  URL ',\n",
       " 'Clinton camp fumed when surrogate told supporters Clinton planned to betray labor on TPP post-election:\\n URL ',\n",
       " '\"It pays to have friends in high places- like the Justice Department. Clearly the Clintons do.\"\\n#DrainTheSwamp!  URL ',\n",
       " 'Thank you Miami! In 6 days, we are going to WIN the GREAT STATE of FLORIDA - and we are going to win back the White_  URL ',\n",
       " 'Praying for the families of the two Iowa police who were ambushed this morning. An attack on those who keep us safe is an attack on us all.',\n",
       " '\"@PYNance: Evangelical women live at #trumptower @pdpryor1 @CissieGLynch @SaysGabrielle  URL ',\n",
       " '\"@Ravenrantz: #Billygrahams grand daughter #SupportsTrump  URL ',\n",
       " 'Crooked Hillary Clinton deleted 33,000 e-mails AFTER they were subpoenaed by the United States Congress. Guilty - cannot run. Rigged system!',\n",
       " 'I am going to repeal and replace ObamaCare. We will have MUCH less expensive and MUCH better healthcare. With Hillary, costs will triple!',\n",
       " 'You can change your vote in six states. So, now that you see that Hillary was a big mistake, change your vote to MAKE AMERICA GREAT AGAIN!',\n",
       " 'Join me in Florida tomorrow!\\n\\nMIAMIۢ12pm\\n URL ORLANDOۢ4pm\\n URL PENSACOLAۢ7p_  URL ',\n",
       " 'Thank you for your incredible support Wisconsin and Governor @ScottWalker! It is time to #DrainTheSwamp &amp; #MAGA!_  URL ',\n",
       " 'RT @DanScavino: Join @realDonaldTrump LIVE in Wisconsin with Gov. @ScottWalker, @MayorRGiuliani, @Reince &amp; Coach Bobby Knight! LIVE: https:_',\n",
       " 'WikiLeaks emails reveal Podesta urging Clinton camp to dump emails.\\nTime to #DrainTheSwamp!\\n URL ',\n",
       " 'Podesta urged Clinton team to hand over emails after use of private server emerged  URL ',\n",
       " 'Mika Brzezinski: Dem Criticism of Comey Reinforcing Idea There۪s Something There\\n URL ',\n",
       " 'Hillary Advisers Wanted Her To Avoid Supporting Israel When Talking To Democrats:  URL ',\n",
       " 'Trump promises special session to repeal Obamacare:  URL ',\n",
       " 'Kept me out of jail: Top DOJ official involved in Clinton probe represented her campaign chairman:  URL ',\n",
       " '.@DarrellIssa is a very good man. Help him win his congressional seat in California.',\n",
       " '#ICYMI: Governor @mike_pence and I were in Valley Forge, Pennsylvania today. You can watch it here:_  URL ',\n",
       " 'So terrible that Crooked didnt report she got the debate questions from Donna Brazile, if that were me it would have been front page news!',\n",
       " 'JOIN ME TOMORROW IN FLORIDA!\\n\\nMIAMIۢ12pm\\n URL ORLANDOۢ4pm\\n URL PENSACOLAۢ7p_  URL ',\n",
       " 'Crooked Hillary should not be allowed to run for president. She deleted 33,000 e-mails AFTER getting a subpoena from U.S. Congress. RIGGED!',\n",
       " 'Wow, now leading in @ABC /@washingtonpost Poll 46 to 45. Gone up 12 points in two weeks, mostly before the Crooked Hillary blow-up!',\n",
       " 'Look at the way Crooked Hillary is handling the e-mail case and the total mess she is in. She is unfit to be president. Bad judgement!',\n",
       " 'Wow! I hear you Warren, Michigan. Streaming live - join us America. It is time to DRAIN THE SWAMP!\\nWatch:  URL ',\n",
       " '#ObamacareFail #HillarycareFail  URL ',\n",
       " 'Thank you Grand Rapids, Michigan! #ICYMI- watch:  URL ',\n",
       " '$25 Million+ raised online in just one week! RECORD WEEK. #DrainTheSwamp Today we set a bigger record. Contribute &gt; URL ',\n",
       " 'Legendary basketball coach Bobby Knight who has 900+ wins, many championships and a gold medal will be introducing_  URL ',\n",
       " 'Hillarys Two Official Favors To Morocco Resulted In $28 Million For Clinton Foundation #DrainTheSwamp\\n URL ',\n",
       " 'Join me tomorrow in Michigan!\\n\\nGrand Rapids at 12pm:\\n URL Warren at 3pm:\\n URL ',\n",
       " 'Beautiful rally in Albuquerque, New Mexico this evening - thank you. Get out &amp; VOTE! #DrainTheSwamp\\nWatch rally:_  URL ',\n",
       " '\"@slh: I follow Mr.Trump at all of his rallies by watching them on  URL ',\n",
       " 'Thank you Greeley, CO! REAL change means restoring honesty to the govt. Our plan will END govt. corruption! Watch:_  URL ',\n",
       " 'Thank you Las Vegas, Nevada- I love you! Departing for Greeley, Colorado now. Get out &amp; VOTE! #ICYMI- watch here:_  URL ',\n",
       " 'See you tomorrow Michigan!\\n\\nGrand Rapids, MI tomorrow at noon:\\n URL Warren, MI tomorrow at 3pm:_  URL ',\n",
       " 'Wow, Twitter, Google and Facebook are burying the FBI criminal investigation of Clinton. Very dishonest media!',\n",
       " 'Hillary and the Dems loved and praised FBI Director Comey just a few days ago. Original evidence was overwhelming, should not have delayed!',\n",
       " 'We are now leading in many polls, and many of these were taken before the criminal investigation announcement on Friday - great in states!',\n",
       " 'Great day in Colorado &amp; Arizona. Will be in Nevada, Colorado and New Mexico tomorrow - join me!\\nTickets:_  URL ',\n",
       " 'THANK YOU Phoenix, Arizona! Time for new POWERFUL leadership. Just imagine what WE can accomplish in our first 100_  URL ',\n",
       " 'So nice - great Americans outside Trump Tower right now. Thank you!  URL ',\n",
       " 'Departing Golden, CO. for Arizona now - after an unbelievable rally. Watch here:  URL Overflow:  URL ',\n",
       " '#ObamacareFail  URL ',\n",
       " 'Tomorrow!\\n\\nLas Vegas, NV- 11a:  URL Greeley, CO- 4p:  URL Albuquerque, NM- 7p:  URL ',\n",
       " '\"@DeplorableCBTP: \"In my mind, #DonaldTrump is the only way out of this mess.\" - #PhilRobertson of TVs #DuckDynasty\"   Thank you Phil!',\n",
       " 'I am in Colorado - big day planned - but nothing can be as big as yesterday!',\n",
       " 'Join me in Colorado at 12pm tomorrow - or Arizona at 3pm!\\n\\nTICKETS:\\nGolden:  URL Phoenix:_  URL ',\n",
       " 'Thank you Maine, New Hampshire and Iowa. The waiting is OVER! The time for change is NOW! We are going to_  URL ',\n",
       " '\"@piersmorgan: BOMBSHELL: FBI reopening its investigation into HillaryClintons email server after new discovery!',\n",
       " 'Just landed in Iowa - speaking soon!',\n",
       " 'We must not let #CrookedHillary take her CRIMINAL SCHEME into the Oval Office. #DrainTheSwamp  URL ',\n",
       " 'Just out: Neera Tanden, Hillary Clinton adviser said, Israel is depressing.\\u06dd I think Israel is inspiring!',\n",
       " 'RT @DRUDGE_REPORT: WSJ:  The Cold Clinton Reality...  URL ',\n",
       " 'Heading to New Hampshire. Will be talking about the disaster known as ObamaCare!',\n",
       " 'If my people said the things about me that Podesta &amp; Hillarys people said about her, I would fire them out of self respect. \"Bad instincts\"',\n",
       " 'Join me tonight in Cedar Rapids, Iowa at 7pm:  URL Phoenix, Arizona tomorrow night at 3pm:  URL ',\n",
       " '\"@Jmoschetti1363: @Johnatsrs1949 FBI must be outraged that their hands r tied she has no regard or t secret service, FBI, or (Dallas)police\"',\n",
       " 'RT @DRUDGE_REPORT: WSJ:  Grifters-in-Chief...   URL ',\n",
       " 'Thank you Geneva, Ohio.\\nIf I am elected President, I am going to keep RADICAL ISLAMIC TERRORISTS OUT of our countr_  URL ',\n",
       " 'Join @TeamTrump on Facebook &amp; watch tonights rally from Geneva, Ohio- our 3rd rally of the day. #AmericaFirst #MAGA  URL ',\n",
       " 'Crooked Hillary launched her political career by letting terrorists off the hook. #DrainTheSwamp_  URL ',\n",
       " '\"@KeithRowland: People in Arizona just got a taste of Obamacare with a 116% increase in premiums. @realDonaldTrump\" Repeal and replace!',\n",
       " 'I will be interviewed on @oreillyfactor tonight at 8:00 P.M. Enjoy!',\n",
       " 'I delivered a speech in Charlotte, North Carolina yesterday. I appreciate all of the feedback &amp; support. Lets #MAGA_  URL ',\n",
       " 'Join me live in Toledo, Ohio. Time to #DrainTheSwamp &amp; #MAGA!\\n URL ',\n",
       " 'Join me in Cedar Rapids, Iowa tomorrow at 7:00pm! #MAGA\\n URL ',\n",
       " '#ICYMI: I agree- To all Americans, I see you &amp; I hear you. I am your voice. Vote to #DrainTheSwamp with me on 11/8._  URL ',\n",
       " 'Thank you Springfield, Ohio. Get out and #VoteTrumpPence16!\\n#ICYMI - watch here:  URL ',\n",
       " 'Join me live in Springfield, Ohio!\\n URL ',\n",
       " 'Inside Bill Clinton Inc.۪: Hacked memo reveals intersection of charity and personal income. #DrainTheSwamp!\\n URL ',\n",
       " 'JOIN ME! #MAGA\\nTODAY:\\nSpringfield, OH\\nToledo, OH\\nGeneva, OH\\nFRIDAY:\\nManchester, NH\\nLisbon, ME\\nCedar Rapids, IA\\n URL ',\n",
       " 'I agree, @MMFlint- To all Americans, I see you &amp; I hear you. I am your voice. Vote to #DrainTheSwamp w/ me on 11/8.  URL ',\n",
       " '\"Clinton Foundation۪s Fundraisers Pressed Donors to Steer Business to Former President\"\\n URL ',\n",
       " 'Ron Fournier: \"Clinton Used Secret Server To Protect #CircleOfEnrichment\\u06dd\\n URL ',\n",
       " 'WikiLeaks Drip-Drop Releases Prove One Thing: Theres No Nov. 8 Deadline on Clintons Dishonesty and Scandals\\n URL ',\n",
       " 'A lot of call-ins about vote flipping at the voting booths in Texas. People are not happy. BIG lines. What is going on?',\n",
       " 'Obamacare is a disaster. We must REPEAL &amp; REPLACE. Tired of the lies, and want to #DrainTheSwamp? Get out &amp; VOTE_  URL ',\n",
       " 'RT @EricTrump: Tune into @GMA right now to catch a great interview with my father &amp; the entire family! __ #VoteTrumpPence16  URL ',\n",
       " 'Beautiful evening in Kinston, North Carolina - thank you! Get out and VOTE!! You can watch tonights rally here:_  URL ',\n",
       " 'Thank you Charlotte, North Carolina. Great afternoon! #ICYMI - I delivered a speech on urban renewal. Full speech:_  URL ',\n",
       " 'Hillary said she was under sniper fire (while surrounded by USSS.) Turned out to be a total lie. She is not fit to_  URL ',\n",
       " 'JOIN ME IN OHIO TOMORROW!\\nSpringfield-1pm:\\n URL Toledo-4pm:\\n URL Geneva-7pm_  URL ',\n",
       " 'REPEAL AND REPLACE OBAMACARE!',\n",
       " 'Thank you Tallahassee, Florida! A beautiful evening with the MOVEMENT! Get out &amp; VOTE!\\n#ICYMI- watch here:  URL ',\n",
       " 'Thank you Sanford, Florida. Get out &amp; VOTE #TrumpPence16! #ICYMI- watch this afternoons rally here:_  URL ',\n",
       " 'REPEAL AND REPLACE!!!\\n#ObamaCareInThreeWords',\n",
       " 'I have met &amp; spent a lot of time with families @ The Remembrance Project. I will fight for them everyday!_  URL ',\n",
       " 'Get your ballots in Colorado - I will see you soon -- and we will win!\\n#MakeAmericaGreatAgain  URL ',\n",
       " 'Obama Warned Of Rigged Elections In 2008. Time to #DrainTheSwamp\\n URL ',\n",
       " 'Obamacare is a disaster. Rates going through the sky - ready to explode. I will fix it. Hillary cant!\\n#ObamacareFailed',\n",
       " 'Truly honored to receive the first ever presidential endorsement from the Bay of Pigs Veterans Association. #MAGA_  URL ',\n",
       " 'Obamacare is a disaster - as Ive been saying from the beginning. Time to repeal &amp; replace!\\n#ObamacareFail  URL ',\n",
       " '#ObamacareFail  URL ',\n",
       " 'Obamacare is a disaster! Time to repeal &amp; replace! #ObamacareFail\\n URL ',\n",
       " '#ObamacareFail  URL ',\n",
       " '#ObamacareFail  URL ',\n",
       " 'Get out and vote! I am your voice and I will fight for you! We will make America great again!  URL ',\n",
       " 'Key Obamacare premiums to jump 25% next year:\\n URL ',\n",
       " '#Obamacare premiums are about to SKYROCKET --- again. Crooked H will only make it worse. We will repeal &amp; replace!  URL ',\n",
       " 'As election looms, some bad news for Clinton, Democrats:\\n URL ',\n",
       " 'Record crowd in Tampa, Florida- thank you! We will WIN FLORIDA, #DrainTheSwamp in Washington D.C. and MAKE AMERICA_  URL ',\n",
       " 'Thank you Bobby Bowden for the intro tonight and your support! I hope I can do as well for Florida as you have done!  URL ',\n",
       " 'My contract with the American voter will restore honesty, accountability &amp; CHANGE to Washington! #DrainTheSwamp  URL ',\n",
       " 'Peter Navarro: Trump the Bull vs. Clinton the Bear #DrainTheSwamp\\n URL ',\n",
       " 'Democratic operative caught on camera: Hillary PERSONALLY ordered Donald Duck troll campaign that broke the law\\n URL ',\n",
       " 'Join me tomorrow in Sanford or Tallahassee, Florida!\\n\\nSanford at 3pm:\\n URL Tallahassee at 6pm:\\n URL ',\n",
       " 'THANK YOU St. Augustine, Florida! Get out and VOTE! Join the MOVEMENT - and lets #DrainTheSwamp! Off to Tampa now!_  URL ',\n",
       " 'Join me LIVE on my Facebook page in St. Augustine, Florida! Lets #DrainTheSwamp &amp; MAKE AMERICA GREAT AGAIN!_  URL ',\n",
       " 'Honored to receive an endorsement from @SJSOPIO - thank you! Together, we are going to MAKE AMERICA SAFE &amp; GREAT AG_  URL ',\n",
       " 'Hillary Clinton Had Gun Control Supporters Planted In Town Hall Audience  URL ',\n",
       " 'Leaving West Palm Beach, Florida now - heading to St. Augustine for a 3pm rally. Will be in Tampa at 7pm - join me:_  URL ',\n",
       " 'The Clinton Foundation۪s Most Questionable Foreign Donations\\n#PayToPlay #DrainTheSwamp\\n URL ',\n",
       " 'Departing Farmers Round Table in Boynton Beach, Florida. Get out &amp; VOTE- lets #MAGA!\\nEARLY VOTING BY FL. COUNTY:_  URL ',\n",
       " 'Get out to VOTE on 11/8/2016- and we will #DrainTheSwamp!\\nRASMUSSEN NATIONAL\\nTrump 43%\\nClinton 41%  URL ',\n",
       " 'We are winning and the press is refusing to report it. Dont let them fool you- get out and vote! #DrainTheSwamp on November 8th!',\n",
       " 'Why has nobody asked Kaine about the horrible views emanated on WikiLeaks about Catholics? Media in the tank for Clinton but Trump will win!',\n",
       " 'Major story that the Dems are making up phony polls in order to suppress the the Trump . We are going to WIN!',\n",
       " 'Wow, just came out on secret tape that Crooked Hillary wants to take in as many Syrians as possible. We cannot let this happen - ISIS!',\n",
       " 'Clinton Ally Aided Campaign of FBI Official۪s Wife  URL ',\n",
       " 'Clinton Charity Got Up To $56 Million From Nations That Are Anti-Women, Gays #CrookedHillary\\n URL ',\n",
       " 'Thank you Naples, Florida! Get out and VOTE #TrumpPence16 on 11/8.\\nLets #MakeAmericaGreatAgain!\\nFull Naples rally_  URL ',\n",
       " 'The attack on Mosul is turning out to be a total disaster. We gave them months of notice. U.S. is looking so dumb. VOTE TRUMP and WIN AGAIN!',\n",
       " '#CrookedHillary #PayToPlay  URL ',\n",
       " 'Join me in Naples, Florida this evening at 6:00pm! Tickets:  URL ',\n",
       " 'Remember - get out on November 8th &amp; VOTE #TrumpPence16. It is time to #DrainTheSwamp -- this is our last chance!  URL ',\n",
       " 'Former Prosecutor: The Clintons Are So Corrupt, Everything They Touch Turns To Molten Lead۪\\n URL ',\n",
       " 'Thank you Las Vegas Review Journal!\\nEDITORIAL: Donald Trump for president  URL ',\n",
       " 'Well, Iran has done it again. Taken two of our people and asking for a fortune for their release. This doesnt happen if Im president!',\n",
       " '\"If you cant run your own house you certainly cant run the White House\" A statement made by Mrs. Obama about Crooked Hillary Clinton',\n",
       " 'Thank you for the massive turnout tonight- Cleveland, Ohio! Get out &amp; VOTE #TrumpPence16 on 11/8.\\nWatch rally here:_  URL ',\n",
       " 'WikiLeaks: Clinton-Kaine Even Lied About Timing of Veep Pick\\n URL ',\n",
       " 'Trump lays out policies for first 100 days in White House\\n URL ',\n",
       " 'Huma Abedin told Clinton her secret email account caused problems\\n URL ',\n",
       " 'Just arrived in Cleveland, Ohio- join Governor @Mike_Pence and I now, LIVE via:  URL ',\n",
       " 'Hillary Clinton: Architect of failure\\n#DrainTheSwamp #CrookedHillary\\n URL ',\n",
       " 'Unbelievable crowd of supporters in Virginia Beach, Virginia. Thank you! Next stop - Cleveland, Ohio._  URL ',\n",
       " 'In order to #DrainTheSwamp &amp; create a new GOVERNMENT of, by, &amp; for the PEOPLE, I need your VOTE! Go to  URL ',\n",
       " 'Today I introduced my Contract with the American Voter - our economy will be STRONG &amp; our people will be SAFE._  URL ',\n",
       " 'Change has to come from outside our very broken system. #MAGA  URL ',\n",
       " 'Crooked Hillary Clinton Tops Middle East Forum۪s Islamist Money List\\n URL ',\n",
       " 'Thank you Gettysburg, Pennsylvania! #DrainTheSwamp  URL ',\n",
       " 'Landing in Pennsylvania now. Great new poll this morning, thank you. Lets #DrainTheSwamp and #MakeAmericaGreatAgain_  URL ',\n",
       " 'Will be in Cleveland, Ohio w/ @mike_pence tonight- join us:  URL Naples, Florida-tomorrow @ 6pm:  URL ',\n",
       " '\"@AZTRUMPTRAIN: I #Voted for DonaldTrump! #Arizona  #Economy #Immigration #Jobs #Veterans #BorderControl #Trade_  URL ',\n",
       " '\"@jensen4law: Best way to pay Hillary back for what she did to @BernieSanders #DNCleak is a DonaldTrump LANDSLIDE  URL ',\n",
       " 'The media refuses to talk about the three new national polls that have me in first place. Biggest crowds ever - watch what happens!',\n",
       " 'Just returned from Pennsylvania where we will be bringing back their jobs. Amazing crowd. Will be going back tomorrow, to Gettysburg!',\n",
       " 'Governor @Mike_Pence and I will be in Cleveland, Ohio tomorrow night at 7pm - join us! #MAGA\\nTickets:_  URL ',\n",
       " 'Thank you to the great crowd of supporters in Newtown, Pennsylvania. Get out &amp; VOTE on 11/8/16. Lets #MAGA! Watch:_  URL ',\n",
       " '#CrookedHillary sending U.S. intelligence info. to Podesta۪s hacked email is unquestionably an OPSEC violation۪  URL ',\n",
       " 'WikiLeaks reveals Clinton camp۪s work with VERY friendly and malleable reporters۪\\n#DrainTheSwamp #CrookedHillary\\n URL ',\n",
       " 'Donna Brazile Shreds Obama Economy - Acting DNC chair says people are more in despair about how things are  URL ',\n",
       " '\"{Crooked Hillary Clinton} created this mess, and she knows it.\"\\n#DrainTheSwamp  URL ',\n",
       " 'Clinton Campaign And Harry Reid Worked With New York Times To Smear State Dept Watchdog\\nTime to #DrainTheSwamp!\\n URL ',\n",
       " 'VERY IRONIC: \"In 2010 video, Clinton lectured underlings on cybersecurity and guarding sensitive information۪\"\\n URL ',\n",
       " 'Great crowd in Johnstown, Pennsylvania- thank you. Get out &amp; VOTE on 11/8! Watch the MOVEMENT in PA. this afternoon_  URL ',\n",
       " 'A top Clinton Foundation official said he could name 500 different examples\\u06dd of conflicts of interest.\\n URL ',\n",
       " '#CrookedHillary was at center of negotiating $12M commitment from King Mohammed VI of Morocco\\u06dd to Clinton Fdn.  URL ',\n",
       " 'Great crowd in Fletcher, North Carolina- thank you! Heading to Johnstown, Pennsylvania now! Get out on November 8th_  URL ',\n",
       " 'The results are in on the final debate and it is almost unanimous, I WON! Thank you, these are very exciting times.',\n",
       " 'Huma calls it a \"MESS,\" the rest of us call it CORRUPT! WikiLeaks catches Crooked in the act - again.\\n#DrainTheSwamp  URL ',\n",
       " 'Hillary &amp; Obamas Broken Promises.\\n#RepealObamacare  URL ',\n",
       " 'In addition to those without health coverage- those that have disastrous #Obamacare are seeing MASSIVE PREMIUM INCR_  URL ',\n",
       " 'RT @EricTrump: On behalf of the entire family, we would truly be honored to have your vote! Lets #MakeAmericaGreatAgain #EarlyVote https:/_',\n",
       " 'RT @TeamTrump: When Obama took office in 2009 employer-provided premiums cost $13,375. Today they are $18,142. Thanks, Obama.',\n",
       " 'Crooked Hillary promised 200k jobs in NY and FAILED. Well create 25M jobs when Im president, and I will DELIVER!  URL ',\n",
       " 'Crooked took MILLIONS from oppressive ME countries. Will she give the $$$ back? Probably not. Dont forget her slog_  URL ',\n",
       " 'Trump won the third debate\\n URL ',\n",
       " 'UPCOMING RALLIES - JOIN ME!\\n\\nTOMORROW\\nFletcher, NC @ 12pm.\\n URL SATURDAY\\nCleveland, OH @ 7pm.\\n URL ',\n",
       " '#ICYMI - OHIO RALLY!\\nWatch here:  URL ',\n",
       " 'Want access to Crooked Hillary? Dont forget - its going to cost you!\\n#DrainTheSwamp #PayToPlay  URL ',\n",
       " 'Thank you Delaware County, Ohio! Remember- either we WIN this election, or we are going to LOSE this country!_  URL ',\n",
       " 'If elected POTUS - I will stop RADICAL ISLAMIC TERRORISM in this country! In order to do this, we need to_  URL ',\n",
       " 'Why didnt Hillary Clinton announce that she was inappropriately given the debate questions - she secretly used them! Crooked Hillary.',\n",
       " 'Thank you America! #MAGA\\n\\nRasmussen National Poll\\nDonald Trump 43%\\nHillary Clinton 40%  URL ',\n",
       " 'Just landed in Ohio. Thank you America- I am honored to win the final debate for our MOVEMENT. It is time to_  URL ',\n",
       " 'Totally dishonest Donna Brazile chokes on the truth. Highly illegal!\\nWatch:  URL ',\n",
       " 'The Washington Times Presidential Debate Poll:\\nTRUMP 77% (18,290)\\nCLINTON 17% (4,100)\\n#DrainTheSwamp #Debate  URL ',\n",
       " 'Join the MOVEMENT to #MAGA!\\n URL ',\n",
       " 'Great poll - thank you America! Once we #DrainTheSwamp, together we will #MAGA__#Debate  URL ',\n",
       " 'That was really exciting. Made all of my points. MAKE AMERICA GREAT AGAIN!',\n",
       " 'Join my team over on my Facebook page- live now! #Debates\\n URL ',\n",
       " 'The era of division is coming to an end. We will create a new future of #AmericanUnity. First, we need to_  URL ',\n",
       " 'We cannot take four more years of Barack Obama and that۪s what you۪ll get if you vote for Hillary. #BigLeagueTruth',\n",
       " 'I started this campaign to Make America Great Again. That۪s what I۪m going to do. #MAGA #debate',\n",
       " 'HILLARYS HEALTH CARE POLICIES\\n#DrainTheSwamp #Debate  URL ',\n",
       " 'RT @TeamTrump: .@realDonaldTrump is going to cut taxes BIG LEAGUE -- Crooked is going to raise taxes BIG LEAGUE! #DrainTheSwamp #Debate htt_',\n",
       " 'We have to repeal &amp; replace #Obamacare! Look at what is doing to people! #DrainTheSwamp  URL ',\n",
       " 'ISIS has infiltrated countries all over Europe by posing as refugees, and @HillaryClinton will allow it to happen h_  URL ',\n",
       " 'The economy cannot take four more years of these same failed policies.\\n#BigLeagueTruth #DrainTheSwamp  URL ',\n",
       " 'Together we can save American JOBS, American LIVES, and AMERICAN FUTURES! #Debates  URL ',\n",
       " 'USA has the greatest business people in the world but we let political hacks negotiate our deals. We need change! #BigLeagueTruth #Debate',\n",
       " 'I WILL DEFEAT ISIS. THEY HAVE BEEN AROUND TOO LONG! What has our leadership been doing?\\n#DrainTheSwamp  URL ',\n",
       " 'RT @TeamTrump: What They Are Saying About @realDonaldTrumps GREAT Debate and @HillaryClintons Bad Performance\\n URL ',\n",
       " 'After Crooked @HillaryClinton allowed ISIS to rise, she now claims shell defeat them? LAUGHABLE! Heres my plan:  URL ',\n",
       " 'RT @TeamTrump: \"Her instincts are suboptimal.\"\\n URL ',\n",
       " 'I opposed going into Iraq. Hillary voted for it. As with everything else shes supported, it was a DISASTER.  URL ',\n",
       " '#BigLeagueTruth #DrainTheSwamp  URL ',\n",
       " 'Bernie Sanders on HRC: Bad Judgement. John Podesta on HRC: Bad Instincts. #BigLeagueTruth #Debate',\n",
       " 'HILLARY FAILED ALL OVER THE WORLD. #BigLeagueTruth\\nLIBYA\\nSYRIA\\nIRAN\\nIRAQ\\nASIA PIVOT\\nRUSSIAN RESET\\nBENGHAZI_  URL ',\n",
       " 'Hillary says \"take back Mosul?\" We would have NEVER lost Mosul- if it wasnt for #CrookedHillary. #DrainTheSwamp  URL ',\n",
       " 'RT @TeamTrump: LIVE FACT-CHECK: Trumps RIGHT. The Clinton Foundation has taken MILLIONS from the Middle East. #DrainTheSwamp  URL ',\n",
       " 'You should give the money back @HillaryClinton! #DrainTheSwamp  URL ',\n",
       " 'Crooked۪s top aides were MIRED in massive conflicts of interests at the State Dept. We MUST #DrainTheSwamp  URL ',\n",
       " 'Crooked @HillaryClintons foundation is a CRIMINAL ENTERPRISE. Time to #DrainTheSwamp!   URL ',\n",
       " 'Moderator: Respectfully, you won۪t answer the pay-to-play question.\\u06dd #Debate #BigLeagueTruth',\n",
       " '.@HillaryClinton loves to lie. America has had enough of the CLINTONS! It is time to #DrainTheSwamp! Debates  URL ',\n",
       " 'Shell say anything and change NOTHING! #MAGA #BigLeagueTruth  URL ',\n",
       " 'I will do more in the first 30 days in office than Hillary has done in the last 30 years! #Debate #BigLeagueTruth  URL ',\n",
       " 'Crookeds camp incited violence at my rallies. These incidents werent \"spontaneous\" - like she claimed in Benghazi!  URL ',\n",
       " '#CrookedHillary is nothing more than a Wall Street PUPPET! #BigLeagueTruth #Debate  URL ',\n",
       " 'Brought to you by @HillaryClinton &amp; her campaign- in Chicago, Illinois.\\n#BigLeagueTruth #DrainTheSwamp  URL ',\n",
       " 'RT @TeamTrump: .@realDonaldTrump will do more in the first 30 days in office than Hillary has done in the last 30 years! #Debate #BigLeague_',\n",
       " 'Our country is stagnant. We۪ve lost jobs and business. We don۪t make things anymore b/c of the bill Hillary۪s husband signed and she blessed',\n",
       " 'Crooked Hillary has never created a job in her life. We will create 25 million jobs. Think she can do that? Not a c_  URL ',\n",
       " '.@HillaryClinton has been doing this for THIRTY YEARS....where has she been? #BigLeagueTruth',\n",
       " '#CrookedHillary gives Obama an A\\u06dd for an economic recovery that۪s the slowest since WWII... #BigLeagueTruth_  URL ',\n",
       " 'THE CHOICE IS CLEAR!\\n#BigLeagueTruth #DrainTheSwamp  URL ',\n",
       " 'RT @TeamTrump: #CrookedHillarys plan will add $1.15 TRILLION in new taxes. We cannot afford her! #DrainTheSwamp #Debate  URL ',\n",
       " 'I will renegotiate NAFTA. If I can۪t make a great deal, we۪re going to tear it up. We۪re going to get this economy running again. #Debate',\n",
       " 'This is what we can expect from #CrookedHillary. More Taxes. More Spending. #BigLeageTruth #DrainTheSwamp #Debates  URL ',\n",
       " '#BigLeagueTruth  URL ',\n",
       " '.@HillaryClintons tax hikes will CRUSH our economy. I will cut taxes -- BIG LEAGUE.  URL ',\n",
       " '.@HillaryClinton talking about jobs? Remember what she promised upstate New York. #BigLeagueTruth\\n#Debates  URL ',\n",
       " '.@HillaryClinton has been a foreign policy DISASTER for the American people. I will #MakeAmericaStrongAgain #Debate_  URL ',\n",
       " 'Moderator: Hillary plan calls for more regulation and more government spending. #Debate #BigLeagueTruth',\n",
       " '.@HillaryClinton- you have failed, failed, and failed. #BigLeagueTruth\\nTime to #DrainTheSwamp!  URL ',\n",
       " 'Hillary has called for 550% more Syrian immigrants, but won۪t even mention radical Islamic terrorists.\\u06dd #Debate_  URL ',\n",
       " 'Hey @POTUS - WE AGREE!\\n#BigLeagueTruth #DrainTheSwamp  URL ',\n",
       " 'Moderator: Hillary paid $225,000 by a Brazilian bank for a speech that called for open borders.\\u06dd That۪s a quote! #Debate #BigLeagueTruth',\n",
       " 'RT @TeamTrump: .@RealDonaldTrump wants a SAFE America w/ stronger borders, no amnesty, and an END to sanctuary cities. He is #AmericaFirst!_',\n",
       " 'TRUMP &amp; CLINTON ON IMMIGRATION\\n#Debate #BigLeagueTruth  URL ',\n",
       " 'Hillary is too weak to lead on border security-no solutions, no ideas, no credibility.She supported NAFTA, worst deal in US history. #Debate',\n",
       " 'Plain &amp; Simple: We should only admit into this country those who share our VALUES and RESPECT our people.  URL ',\n",
       " 'One of my first acts as President will be to deport the drug lords and then secure the border. #Debate #MAGA',\n",
       " 'Hillary Clinton will use American tax dollars to provide amnesty for thousands of illegals. I will put_  URL ',\n",
       " 'Drugs are pouring into this country. If we have no border, we have no country. That۪s why ICE endorsed me. #Debate #BigLeagueTruth',\n",
       " 'RT @TeamTrump: .@realDonaldTrump is PRO-LIFE, PRO-FAMILY #BigLeagueTruth #Debates2016  URL ',\n",
       " '#SecondAmendment #2A\\n#Debates  URL ',\n",
       " 'It is so imperative that we have the right justices. #DrainTheSwamp #Debates#BigLeagueTruth  URL ',\n",
       " '.@HillaryClinton lists litany of ways she plans to restrict gun rights. 2A will not survive a Hillary presidency. #Debate #BigLeagueTruth',\n",
       " 'RT @TeamTrump: .@realDonaldTrump will PROTECT and DEFEND the Constitution #Debate #BigLeagueTruth #DrainTheSwamp  URL ',\n",
       " 'The 2nd Amendment is under siege. We need SCOTUS judges who will uphold the US Constitution. #Debate #BigLeagueTruth',\n",
       " 'Hillary Clinton wants to create the most liberal Supreme Court in history #debate #DrainTheSwamp  URL ',\n",
       " 'Ready to lead. Ready to Make America Great Again. #Debate #MAGA',\n",
       " 'This is an incredible MOVEMENT- WE are going to take our country BACK! #November8th #BigLeagueTruth #Debate  URL ',\n",
       " 'Tune in at  URL ',\n",
       " 'I will be handing over my Twitter account to my team of deplorables for tonights #debate\\n#MakeAmericaGreatAgain',\n",
       " 'UNBELIEVABLE!\\nClinton campaign contractor caught in voter-fraud video is a felon who visited White House 342 times:  URL ',\n",
       " 'Over 250,000 to Lose Health Insurance in Battleground North Carolina Due to #Obamacare\\n URL ',\n",
       " 'Join my team tonight at 8:30pmE!\\n URL ',\n",
       " 'I will issue a lifetime ban against senior executive branch officials lobbying on behalf of a FOREIGN GOVERNMENT!_  URL ',\n",
       " 'I am going to expand the definition of LOBBYIST - so we close all the LOOPHOLES! #DrainTheSwamp  URL ',\n",
       " 'Obamacare premiums increasing 33% in Pennsylvania - a complete disaster. It must be repealed and replaced!_  URL ',\n",
       " 'Hillary Clinton Deleted Emails With Her Email Server Technician\\n URL ',\n",
       " 'Join me in Delaware, Ohio tomorrow at 12:30pm! #DrainTheSwamp\\nTickets:  URL ',\n",
       " 'Top Hillary Adviser Mocked, Plotted Attacks on Pro-Sanders Civil Rights Leader #DrainTheSwamp\\n URL ',\n",
       " 'Dem Operative Who Oversaw Trump Rally Agitators Visited White House 342 Times #DrainTheSwamp\\n URL ',\n",
       " 'State works hard, and illegally, for Clinton #DrainTheSwamp  URL ',\n",
       " 'Scandals surround Clintons gatekeeper at State\\n#DrainTheSwamp  URL ',\n",
       " 'The State Departments shadow government #DrainTheSwamp\\n URL ',\n",
       " 'More Anti-Catholic Emails From Team Clinton:  URL ',\n",
       " 'It is time to #DrainTheSwamp!\\n URL ',\n",
       " 'Food Groups۪  Emails Show Clinton Campaign Organized Potential VPs By Race And Gender:  URL ',\n",
       " '#DrainTheSwamp\\n URL ',\n",
       " 'Time to #DrainTheSwamp in Washington, D.C. and VOTE #TrumpPence16 on 11/8/2016. Together, we will MAKE AMERICA SAFE_  URL ',\n",
       " 'Hillary is the most corrupt person to ever run for the presidency of the United States. #DrainTheSwamp  URL ',\n",
       " 'Clinton Campaign Tried to Limit Damage From Classified Info on Email Server #DrainTheSwamp\\n URL ',\n",
       " 'Trump rally disrupter was once on Clintons payroll\\n URL ',\n",
       " 'RT @TeamTrump: It is time to #DrainTheSwamp in Washington, D.C! Vote Nov. 8th to take down the #RIGGED system!  URL ',\n",
       " 'Thank you Colorado Springs. If I۪m elected President I am going to keep Radical Islamic Terrorists out of our count_  URL ',\n",
       " 'FL, KS, ME, MD, MN, NJ, OR &amp; WV! Its the LAST DAY to mail in voter reg forms. Get the forms at_  URL ',\n",
       " 'RT @EricTrump: Nevada: A quick reminder that today is your last day to register to vote!  URL ',\n",
       " 'Hillary۪s Aides Urged Her to Take Foreign Lobbyist Donation And Deal With Attacks:  URL ',\n",
       " 'If we let Crooked run the govt, history will remember 2017 as the year America lost its independence. #DrainTheSwamp  URL ',\n",
       " 'Pay-to-play. Collusion. Cover-ups. And now bribery? So CROOKED. I will #DrainTheSwamp.  URL ',\n",
       " 'I will Make Our Government Honest Again -- believe me. But first, Im going to have to #DrainTheSwamp in DC.  URL ',\n",
       " '\"@THEREALMOGUL: 41% of American voters believe the  election could be \"stolen\" from DonaldTrump due to widespread voter fraud. - Politico\"',\n",
       " 'Great night in WI. I۪m going to fight for every person in this country who believes government should serve the PEO_  URL ',\n",
       " 'Donald J. Trump Ethics Reform Plan For Washington D.C.\\n URL ',\n",
       " 'EXCLUSIVE: FBI Agents Say Comey Stood In The Way۪ Of Clinton Email Investigation:\\n URL ',\n",
       " 'Get rich quick! Crooked Hillary Clintons pay to play guide:  URL ',\n",
       " 'Yet more evidence of a media-rigged election:  URL ',\n",
       " 'My wife, Melania, will be interviewed tonight at 8:00pm by Anderson Cooper on @CNN. I have no doubt she will do very well. Enjoy!',\n",
       " 'I will sign the first bill to repeal #Obamacare and give Americans many choices and much lower rates!',\n",
       " 'Trump Virginia Office Announces Statewide TV Ad Strategy and Leadership Team:  URL ',\n",
       " 'Join me in Colorado Springs, Colorado tomorrow at 1:00pm! #MAGA\\nTickets:  URL ',\n",
       " 'Crooked Hillary colluded w/FBI and DOJ and media is covering up to protect her. Its a #RiggedSystem! Our country d_  URL ',\n",
       " 'New polls are good because the media has deceived the public by putting women front and center with made-up stories and lies, and got caught',\n",
       " '\"Journalists shower Hillary Clinton with campaign cash\"\\n URL ',\n",
       " '\"State Department official accused of offering quid pro quo in Clinton email scandal\"  URL ',\n",
       " 'RT @TeamTrump: Our thoughts are with the forces fighting ISIS in Iraq. We must never back down against this extreme radical Islamic terrori_',\n",
       " 'Wow, new polls just came out from @CNN   Great numbers, especially after total media hit job. Leading Ohio 48 - 44.',\n",
       " 'RT @TeamTrump: CORRUPTION CONFIRMED: FBI confirms State Dept. offered quid pro quo to cover up classified emails\\n URL ',\n",
       " 'Voter fraud! Crooked Hillary Clinton even got the questions to a debate, and nobody says a word. Can you imagine if I got the questions?',\n",
       " 'Unbelievable.  URL ',\n",
       " 'RT @TeamTrump: __BREAKING__: \"State Departments Kennedy pressured FBI to unclassify Clinton emails: FBI documents\"\\n URL ',\n",
       " 'WikiLeaks proves even the Clinton campaign knew Crooked mishandled classified info, but no one gets charged? RIGGED!  URL ',\n",
       " 'We have all got to come together and win this election. We cant have four more years of Obama (or worse!).',\n",
       " 'Of course there is large scale voter fraud happening on and before election day. Why do Republican leaders deny what is going on? So naive!',\n",
       " '\"@RosieGray: Peter Thiel chooses now to give $1.25mil in support of Trump  URL ',\n",
       " '\"@PrisonPlanet: Trump accuser praised him in an email as recently as April! This is all yet another hoax.  URL ',\n",
       " '\"@MarkSimoneNY: Watch Joe Bidens Long History Of Grabbing, Kissing and Groping Women Who Are Cringing:  URL ',\n",
       " 'Cant believe these totally phoney stories, 100% made up by women (many already proven false) and pushed big time by press, have impact!',\n",
       " 'ALL SAFE IN ORANGE COUNTY, NORTH CAROLINA. With you all the way, will never forget. Now we have to win. Proud of you all!  @NCGOP',\n",
       " 'Animals representing Hillary Clinton and Dems in North Carolina just firebombed our office in Orange County because we are winning @NCGOP',\n",
       " 'Wow, interview released by Wikileakes shows \"quid pro quo\" in Crooked Hillary e-mail probe.Such a dishonest person - &amp; Paul Ryan does zilch!',\n",
       " 'Finally, in the new ABC News/Washington Post Poll, Hillary Clinton is down 11 points with WOMEN VOTERS and the election is close at 47-43!',\n",
       " 'Paul Ryan, a man who doesnt know how to win (including failed run four years ago), must start focusing on the budget, military, vets etc.',\n",
       " 'The Democrats have a corrupt political machine pushing crooked Hillary Clinton. We have Paul Ryan, always fighting the Republican nominee!',\n",
       " 'Join me in Wisconsin tomorrow or Colorado on Tuesday!\\nGreen Bay- 6pm\\n URL Colorado Springs- 1pm_  URL ',\n",
       " 'The vast majority felt she should be prosecuted...\" -- even senior FBI officials thought Crooked was guilty.\\n URL ',\n",
       " 'The election is absolutely being rigged by the dishonest and distorted media pushing Crooked Hillary - but also at many polling places - SAD',\n",
       " 'Weve all wondered how Hillary avoided prosecution for her email scheme. Wikileaks may have found the answer. Obama!  URL ',\n",
       " 'Hillarys staff thought her email scandal might just blow over. Who would trust these people with national security?  URL ',\n",
       " 'A country that Crooked Hillary says has funded ISIS also gave Wild Bill $1 million for his birthday? SO CORRUPT!  URL ',\n",
       " 'They let Crooked &amp; the Gang off the hook for the crime, but it looks like the cover-up is just as bad. Unbelievable!  URL ',\n",
       " 'Election is being rigged by the media, in a coordinated effort with the Clinton campaign, by putting stories that never happened into news!',\n",
       " 'Polls close, but can you believe I lost large numbers of women voters based on made up events THAT NEVER HAPPENED. Media rigging election!',\n",
       " 'Watched Saturday Night Live hit job on me.Time to retire the boring and unfunny show. Alec Baldwin portrayal stinks. Media rigging election!',\n",
       " '\"@davidshiloach: @realDonaldTrump Go Mr. Trump! Israel is behind you!\"',\n",
       " 'Thank you for sharing Amy.  URL ',\n",
       " 'A great day in New Hampshire and Maine. Fantastic crowds and energy! #MAGA',\n",
       " 'Thank you Bangor, Maine! Get out &amp; #VoteTrumpPence16 on 11/8/16- and together we will MAKE AMERICA SAFE AND GREAT A_  URL ',\n",
       " 'The failing @nytimes reporters dont even call us anymore, they just write whatever they want to write, making up sources along the way!',\n",
       " 'Nothing ever happened with any of these women. Totally made up nonsense to steal the election. Nobody has more respect for women than me!',\n",
       " 'The MOVEMENT in Portsmouth, New Hampshire w/ 7K supporters. THANK YOU! This is the biggest election of our lifetime_  URL ',\n",
       " 'Landing in New Hampshire soon to talk about the massive drug problem there, and all over the country.',\n",
       " 'The truth is a beautiful weapon.  URL ',\n",
       " 'RT @DanScavino: Mr. Trump removing the broken teleprompter in North Carolina-in front of a massive crowd. He goes on&amp;delivers the best spee_',\n",
       " 'Hillary Clinton should have been prosecuted and should be in jail. Instead she is running for president in what looks like a rigged election',\n",
       " 'Will be in Bangor, Maine today at 3pm- join me! #MAGA\\nTickets:  URL ',\n",
       " 'This election is being rigged by the media pushing false and unsubstantiated charges, and outright lies, in order to elect Crooked Hillary!',\n",
       " '100% fabricated and made-up charges, pushed strongly by the media and the Clinton Campaign, may poison the minds of the American Voter. FIX!',\n",
       " 'Thank you @TrumpWomensTour!\\n#MakeAmericaGreatAgain  URL ',\n",
       " 'Thank you Charlotte, North Carolina! We are going to have an AMAZING victory on November 8th...because this is all_  URL ',\n",
       " 'Make sure youre registered to vote! Lets #MakeAmericaGreatAgain! We cant afford more years of FAILURE! All info:_  URL ',\n",
       " 'Thank you for your support Greensboro, North Carolina. Next stop - Charlotte! #MAGA\\n URL ',\n",
       " 'WHAT THEY ARE SAYING ABOUT THE CLINTON CAMPAIGN۪S ANTI-CATHOLIC BIGOTRY:\\n URL ',\n",
       " 'Thank you to our U.S. Navy for protecting our country, both in times of peace &amp; war. Together, WE WILL MAKE AMERICA_  URL ',\n",
       " 'Join me live in Cincinnati, Ohio!\\n#TrumpRally #MAGA\\n URL ',\n",
       " 'Join me in Greensboro, North Carolina tomorrow at 2:00pm! #TrumpRally\\n URL ',\n",
       " 'RT @TeamTrump: \"This is a crossroads in the history of our civilization that will determine whether or not We The People reclaim control ov_',\n",
       " 'Dem Gov. of MN. just announced that the Affordable Care Act (Obamacare) is no longer affordable. Ive been saying this for years- disaster!',\n",
       " 'Great event in Columbus- taking off for Cincinnati now. Great new Ohio poll out- thank you!\\nOHIO NBC/WSJ/MARIST POLL\\nTrump 42%\\nClinton 41%',\n",
       " 'Just left a great rally in Florida - now heading to Ohio for two more. Will be there soon.',\n",
       " 'I am making a major speech in West Palm Beach, Florida at noon. Tune in!',\n",
       " 'Thank you! #MAGA #AmericaFirst  URL ',\n",
       " 'Join me in Ohio &amp; Maine!\\nCincinnati, Ohio- tonight @ 7:30pm:  URL Bangor, Maine - Saturday @ 3pm_  URL ',\n",
       " 'The phony story in the failing @nytimes is a TOTAL FABRICATION. Written by same people as last discredited story on women. WATCH!',\n",
       " 'Why didnt the writer of the twelve year old article in People Magazine mention the \"incident\" in her story. Because it did not happen!',\n",
       " 'I will be in Cincinnati, Ohio tomorrow night at 7:30pm- join me! #OhioVotesEarly #VoteTrumpPence16\\nTickets:_  URL ',\n",
       " 'The MOVEMENT in Lakeland, Florida. Voter registration extended to 10/18. REGISTER ASAP @  URL ',\n",
       " 'The people of Cuba have struggled too long. Will reverse Obamas Executive Orders and concessions towards Cuba until freedoms are restored.',\n",
       " 'PAY TO PLAY POLITICS.\\n#CrookedHillary  URL ',\n",
       " 'Very little pick-up by the dishonest media of incredible information provided by WikiLeaks. So dishonest! Rigged system!',\n",
       " 'Crooked Hillary Clinton likes to talk about the things she will do but she has been there for 30 years - why didnt she do them?',\n",
       " 'Thank you Florida- a MOVEMENT that has never been seen before and will never be seen again. Lets get out &amp;_  URL ',\n",
       " 'Join me Thursday in Florida &amp; Ohio!\\nWest Palm Beach, FL at noon:\\n URL Cincinnati, OH this 7:30pm:\\n URL ',\n",
       " 'Wow, @CNN Town Hall questions were given to Crooked Hillary Clinton in advance of big debates against Bernie Sanders. Hillary &amp; CNN FRAUD!',\n",
       " 'Thank you Texas! If you havent registered to VOTE- today is your last day. Go to:  URL ',\n",
       " 'VOTER REGISTRATION DEADLINES TODAY. You can register now at:  URL ',\n",
       " 'DONT LET HER FOOL US AGAIN.  URL ',\n",
       " 'Crookeds State Dept gave special attention to \"Friends of Bill\" after the Haiti Earthquake. Unbelievable!  URL ',\n",
       " 'RT @DonaldJTrumpJr: 13 states have voter registration deadlines TODAY: FL, OH, PA, MI, GA, TX, NM, IN, LA, TN, AR, KY, SC.\\n\\nRegister: https_',\n",
       " 'I hope people are looking at the disgraceful behavior of Hillary Clinton as exposed by WikiLeaks. She is unfit to run.',\n",
       " 'In Texas now, leaving soon for BIG rally in Florida!',\n",
       " 'The very foul mouthed Sen. John McCain begged for my support during his  primary (I gave, he won), then dropped me over locker room remarks!',\n",
       " 'Wow. Unbelievable.  URL ',\n",
       " 'Disloyal Rs are far more difficult than Crooked Hillary. They come at you from all sides. They don۪t know how to win - I will teach them!',\n",
       " 'With the exception of cheating Bernie out of the nom the Dems have always proven to be far more loyal to each other than the Republicans!',\n",
       " 'It is so nice that the shackles have been taken off me and I can now fight for America the way I want to.',\n",
       " 'RT @EricTrump: 13 states have voter registration deadlines TODAY: FL, OH, PA, MI, GA, TX, NM, IN, LA, TN, AR, KY, SC.\\n\\nRegister:  URL ',\n",
       " 'Our very weak and ineffective leader, Paul Ryan, had a bad conference call where his members went wild at his disloyalty.',\n",
       " 'Despite winning the second debate in a landslide (every poll), it is hard to do well when Paul Ryan and others give zero support!',\n",
       " 'Thank you Pennsylvania. This is a MOVEMENT like we have never seen before! #VoteTrumpPence16 on 11/8/16- together,_  URL ',\n",
       " 'Is this really America? Terrible!  URL ',\n",
       " 'Wow, @CNN got caught fixing their \"focus group\" in order to make Crooked Hillary look better. Really pathetic and totally dishonest!',\n",
       " 'Debate polls look great - thank you!\\n#MAGA #AmericaFirst  URL ',\n",
       " 'CNN is the worst - fortunately they have bad ratings because everyone knows they are biased.  URL ',\n",
       " 'Paul Ryan should spend more time on balancing the budget, jobs and illegal immigration and not waste his time on fighting Republican nominee',\n",
       " 'Thank you for all of the great comments on the debate last night. Very exciting!',\n",
       " 'Thank you St. Louis, Missouri!\\n#MakeAmericaGreatAgain\\n URL ',\n",
       " 'RT @mike_pence: Congrats to my running mate @realDonaldTrump on a big debate win! Proud to stand with you as we #MAGA.',\n",
       " 'RT @TeamTrump: RT if you agree @realDonaldTrump WON the #Debate- BIG LEAGUE! #MAGA  URL ',\n",
       " 'RT @DonaldJTrumpJr: Someone please fact check her coal comments. Give me a break. #debates',\n",
       " 'RT @TeamTrump: It۪s US vs. them! @realDonaldTrump will fight for you! #BigLeagueTruth #Debates',\n",
       " 'MY PRO-GROWTH Econ Plan:\\nEliminate excessive regulations!\\nLean government!\\nLower taxes!\\n#Debatesʉ_  URL ',\n",
       " 'Hypocrite: @HillaryClinton is the single biggest beneficiary of Citizens United in history, by far. #debate #bigleaguetruth',\n",
       " 'RT @DonaldJTrumpJr: Ironic since Hillary has gotten a lot more of that \"dark unaccountable money\" into her campaign. #debates',\n",
       " 'RT @TeamTrump: \"She calls our people deplorable and irredeemable. I will be a president for ALL of our people.\" - @RealDonaldTrump #BigLeag_',\n",
       " 'RT @KellyannePolls: After a decent first debate, @HillaryClinton is back to form: pedantic, lawyerly, technocratic, (woefully untruthful) r_',\n",
       " 'Our country has the slowest growth since 1929. #BigLeagueTruth #debate',\n",
       " 'RT @DanScavino: WE LOVE OUR DEPLORABLES!!!\\n#TrumpTrain #Debates2016  URL ',\n",
       " '@AC360: How can you unite a country if you۪ve written off tens of millions of Americans?\\u06dd #Deplorables #BigLeagueTruth #Debate',\n",
       " 'This country cannot take four more years of Barack Obama! #Debate',\n",
       " 'We agree @POTUS-\\n\\n\"SHELL (Hillary Clinton) SAY ANYTHING &amp; CHANGE NOTHING. ITS TIME TO TURN THE PAGE\" -President Obama',\n",
       " 'If @HillaryClinton is president, she۪ll be all talk and nothing will get done. #Debate #BigLeagueTruth',\n",
       " 'FACT ӕ on red line\\u06dd in Syria: HRC \"I wasn۪t there.\" Fact: line drawn in Aug ۪12. HRC Secy of State til Feb ۪13.  URL ',\n",
       " 'In my administration, EVERY American will be treated equally, protected equally, and honored equally #Debate #BigLeagueTruth',\n",
       " 'RT @TeamTrump: Its hard to fight terrorism when youre making cash payments to the worlds LARGEST state sponsor of TERROR. Under Trump: N_',\n",
       " 'RT @TeamTrump: .@HillaryClinton had her chance and she BLEW IT. #BigLeagueTruth #Debates  URL ',\n",
       " 'RT @TeamTrump: \"We are going to be THRIVING again.\" - @realDonaldTrump #BigLeagueTruth #Debates2016  URL ',\n",
       " 'We cannot let this evil continue! #Debates2016  URL ',\n",
       " 'This is the definition of ransom   URL ',\n",
       " 'RT @JasonMillerinDC: Is @realDonaldTrump debating Crooked @HillaryClinton or the moderators, @AC360 and @MarthaRaddatz? #rattledhillary',\n",
       " 'The world is most peaceful, and most prosperous when America is strongest.  URL ',\n",
       " 'Here are Hillary Clintons \"accomplishments\" at the State Department.\\n#Debates2016 #RattledHillary  URL ',\n",
       " 'RT @TeamTrump: #RattledHillary wants to talk about her 30 years in service. How about her 30 years of FLOPSӕFLOPS?! #BigLeagueTruth #Debat_',\n",
       " 'RT @TeamTrump: .@HillaryClinton is RAISING your taxes to a disastrous level. @realDonaldTrump is going to LOWER your taxes - BIG LEAGUE! #D_',\n",
       " 'History lesson: There۪s a big difference between Hillary Clinton and Abraham Lincoln. For one, his nickname is Hone_  URL ',\n",
       " 'Were going to cut taxes BIG LEAGUE for the middle class. Shes raising your taxes and Im lowering your taxes!\\n URL ',\n",
       " '\"YOU NEED BOTH A PUBLIC AND A PRIVATE POSITION\"\\n@HillaryClinton #Debates2016  URL ',\n",
       " 'Hypocrite! @HillaryClinton claims she needs a public and a private stance\\u06dd in discussions with Wall Street banks. #Debate',\n",
       " 'I hope when the MSM runs its interruption counters\\u06dd they consider the # of times the moderators interrupted me com_  URL ',\n",
       " '.@HillaryClinton - ITS CALLED EXTREME VETTING! #Debates2016  URL ',\n",
       " '.@HillaryClinton #ICYMI- \"WE ARE NOT IN A NARRATIVE FIGHT.\"\\n@Mike_Pence #MAGA  URL ',\n",
       " 'RT @TeamTrump: ONLY @realDonaldTrump will end what even @BillClinton called a CRAZY SYSTEM. #BigLeagueTruth #Debate  URL ',\n",
       " '#CrookedHillary has FAILED all over the world!\\n#BigLeagueTruth #Debates2016  URL ',\n",
       " 'RT @realDonaldTrump: ATTN: @HillaryClinton - Why did five of your staffers need FBI IMMUNITY?! #BigLeagueTruth #Debates',\n",
       " '.@HillaryClinton is NOT above the law!\\n#Debates2016  URL ',\n",
       " 'RT @TeamTrump: We agree with Bill, ObamaCare is the craziest thing in the world.\\u06dd #BigLeagueTruth #Debates2016  URL ',\n",
       " '.@HillaryClinton : Bill clarified\\u06dd what he meant when calling Obamacare a disaster.\\u06dd Actually disaster\\u06dd is pretty clear. #Debate',\n",
       " 'We must repeal Obamacare and replace it with a much more competitive, comprehensive, affordable system. #debate #MAGA',\n",
       " 'Obama and Clinton told the same lie to sell #ObamaCare. #Debates2016  URL ',\n",
       " 'ATTN: @HillaryClinton - Why did five of your staffers need FBI IMMUNITY?! #BigLeagueTruth #Debates',\n",
       " 'Hillary۪s 33,000 deleted emails about her daughter۪s wedding. That۪s a lot of wedding emails. #debate',\n",
       " 'Basically nothing Hillary has said about her secret server has been true. #CrookedHillary',\n",
       " 'If I win-I am going to instruct my AG to get a special prosecutor to look into your situation bc theres never been anything like your lies.',\n",
       " 'RT @TeamTrump: RT if you believe @HillaryClinton is the one who owes America an apology! #BigLeagueTruth #Debates  URL ',\n",
       " 'Donald J. Trumps History Of Empowering Women #BigLeagueTruth\\n URL ',\n",
       " 'There۪s never been anyone more abusive to women in politics than Bill Clinton.My words were unfortunate-the Clintons۪ actions were far worse',\n",
       " 'RT @TeamTrump: Quite simply, @HillaryClinton mistreats women. #BigLeagueTruth #Debate2016\\n URL ',\n",
       " 'I۪m not proud of my locker room talk. But this world has serious problems. We need serious leaders. #debate #BigLeagueTruth',\n",
       " 'RT @TeamTrump: .@realDonaldTrump is here to talk about the REAL issues #BigLeagueTruth #Debates2016  URL ',\n",
       " 'It۪s this simple. Make America Great Again.\\u06dd #debate #BigLeagueTruth',\n",
       " 'RT @TeamTrump: .@HillaryClinton just claimed she has a \"positive, optimistic view\" for America. #Debates  URL ',\n",
       " 'My team of deplorables will be taking over my Twitter account for tonights #debate\\n#MakeAmericaGreatAgain',\n",
       " 'Join me on #FacebookLive as I conclude my final #debate preparations.  URL ',\n",
       " 'The Palestinian terror attack today reminds the world of the grievous perils facing Israeli citizens....continued:\\n URL ',\n",
       " 'Exclusive VideoBroaddrick, Willey, Jones to Bills Defenders: These Are Crimes,۪ Terrified۪ of Enabler۪ Hillary\\n URL ',\n",
       " 'LA Times- USC Dornsife Sunday Poll: Donald Trump Retains 2 Point Lead Over Hillary:\\n URL ',\n",
       " 'So many self-righteous hypocrites. Watch their poll numbers - and elections - go down!',\n",
       " '\"@HenryLeledog: @realDonaldTrump This Black Democrat is on the \"TRUMP TRAIN\"!!\"',\n",
       " '\"@maidaa17: @realDonaldTrump GOP traitors! Not supporting U is voting for her, destroying America.',\n",
       " '\"@CharleneOsbor17: @realDonaldTrump politicians dont count. Its the people. We are behind trump all the way to White House.\"',\n",
       " '\"@eericmyers: @realDonaldTrump  \"Republican leadership\" should have only one job: Help elect the nominee we voted for, Donald J. Trump.\"',\n",
       " '\"@Jodygirl1010: @realDonaldTrump I am a woman who continues to support &amp; stand with #Trump! #dtmag  URL ',\n",
       " 'EXCLUSIVE  Video Interview: Bill Clinton Accuser Juanita Broaddrick Relives Brutal Rapes:\\n URL ',\n",
       " 'Tremendous support (except for some Republican \"leadership\"). Thank you.',\n",
       " 'Thank you to my great supporters in Wisconsin. I heard that the crowd and enthusiasm was unreal!',\n",
       " 'RT @atensnut: Hillary calls Trumps remarks \"horrific\" while she lives with and protects a \"Rapist\".  Her actions are horrific.',\n",
       " 'RT @atensnut: How many times must it be said? Actions speak louder than words. DT said bad things!HRC threatened me after BC raped me.',\n",
       " 'The media and establishment want me out of the race so badly -  I WILL NEVER DROP OUT OF THE RACE, WILL NEVER LET MY SUPPORTERS DOWN! #MAGA',\n",
       " 'Certainly has been an interesting 24 hours!',\n",
       " 'Here is my statement.  URL ',\n",
       " 'Thoughts &amp; prayers with the millions of people in the path of Hurricane Matthew. Look out for neighbors, and listen_  URL ',\n",
       " '\"@kevcirilli: Trump speaking in exact same tone he did in Waterville Valley on 12/1. The night I first realized he was gonna be GOP nominee\"',\n",
       " 'New National Rasmussen Poll:  URL ',\n",
       " 'Thank you Tennessee! #MAGA  URL ',\n",
       " 'RT @JasonMillerinDC: Rasmussen national poll: Trump leads 43-41. White House Watch - Rasmussen Reports\\u06dd  URL ',\n",
       " 'VOTE #TrumpPence16 on 11/8/16!  URL ',\n",
       " 'Donald Trump: A President for All Americans  URL ',\n",
       " 'Volunteer to be a Trump Election Observer. Sign up today!\\n#MakeAmericaGreatAgain\\n URL ',\n",
       " 'RT @DonaldJTrumpJr: Great group at our Victory Office in Columbus, Ohio. Im incredibly grateful to have so many_  URL ',\n",
       " 'RT @IvankaTrump: Thank you Angie Phillips for inviting me to tour your plant Middletown Tube Works. #Ohio  URL ',\n",
       " 'Praying for everyone in Florida. Hoping the hurricane dissipates, but in any event, please be careful.',\n",
       " 'New Virginia poll- thank you! We are going to show the whole world that America is back  BIGGER, and BETTER, and S_  URL ',\n",
       " 'Pennsylvania poll just released. Two rallies there on Mon- join me!\\nAmbridge:  URL Wilkes-Barre:_  URL ',\n",
       " 'Nations Immigration And Customs Enforcement Officers (ICE) Make First-Ever Presidential Endorsement:\\n URL ',\n",
       " 'Such a great honor!\\n URL ',\n",
       " 'Amazing rally in Reno, Nevada- thank you. Make sure you get out on 11/8 &amp; VOTE #TrumpPence16. Together, we will put_  URL ',\n",
       " 'Reuters polling just out- thank you!\\n#MakeAmericaGreatAgain  URL ',\n",
       " 'Thank you South Carolina! Everyone has to get out and VOTE on 11/8/16.\\n#MakeAmericaGreatAgain_  URL ',\n",
       " 'EARLY VOTING: MN &amp; IA already underway, more states coming up in the next week: OH, ME, AZ, IN  check w/local officials for details &amp; VOTE!',\n",
       " 'Small business says Trump is their pick for president\\n URL ',\n",
       " 'Thank you @SenJohnMcCain for your kind remarks on the important issue of PTSD and the dishonest media. Great to be in Arizona yesterday!',\n",
       " 'Bill Clinton is right: Obamacare is crazy, doesnt work and doesnt make sense.  Thanks Bill for telling the truth.',\n",
       " 'Thank you Henderson, NV. This is a MOVEMENT like never seen before! Watch some of the rally via my Facebook page:_  URL ',\n",
       " 'About to begin a rally here in Henderson, Nevada. New Reuters poll just out- thank you! Join the MOVEMENT:_  URL ',\n",
       " 'Beautiful morning- thank you @ICLV!  URL ',\n",
       " 'The constant interruptions last night by Tim Kaine should not have been allowed. Mike Pence won big!',\n",
       " 'Mike Pence won big. We should all be proud of Mike!',\n",
       " 'RT @mike_pence: History teaches us that weakness arouses evil. America needs to be strong on the world stage. #VPDebate  URL ',\n",
       " 'RT @TeamTrump: RT if you agree - @HillaryClinton &amp; @timkaine are WRONG for America! #VPDebate #MAGA  URL ',\n",
       " 'RT @TeamTrump: .@timkaines Abortion Flip-Flops: From Valuing The Sanctity of Life --&gt; Pro-Abortion Demagogue #VPdebate  URL ',\n",
       " '\"@AnyoneTennis: @timkaine Cannot believe how often the moderator interrupts #Pence vs the other guy...so obvious @FoxNews\"  So true!',\n",
       " '\"@Gsimmons03Ginny: @realDonaldTrump ..Kaine is awful, Trump and Pence are the ticket..no more lies, we are ready to see America Great Again!',\n",
       " 'Clinton۪s Top Aides Were Mired In Conflict Of Interest At The State Department:  URL #VPDebate #BigLeagueTruth',\n",
       " '\"@FLifeforce: @_CFJ_ @vine That is a reason to NOT Vote for Hillary Clinton. Vote for Liberty! Vote for @realDonaldTrump\"',\n",
       " '.@HillaryClinton۪s Careless Use Of A Secret Server Put National Security At Risk:  URL #BigLeagueTruth',\n",
       " 'CLINTON IS WEAK ON NORTH KOREA:\\n URL ',\n",
       " 'RT @TeamTrump: Obama-Clinton FAILED foreign policy:\\n-Bad nuclear deal\\n-Ransom payment to leading state sponsor of terror\\n-Sharing classifie_',\n",
       " 'RT @TeamTrump: .@HillaryClinton &amp; @timkaine think youre #Deplorables &amp; #BasementDwellers. @realDonaldTrump &amp; @mike_pence think youre PATR_',\n",
       " 'CLINTON۪S CLOSE TIES TO PUTIN DESERVE SCRUTINY:\\n URL ',\n",
       " 'Sanctions Relief From Clinton-Obama Iran Nuclear Deal Likely Go to Terrorists:\\n URL ',\n",
       " '.@timkaine is wrong for defense:\\n URL #BigLeagueTruth #VPDebate',\n",
       " 'ICYMI: PENCE: I RAN A STATE THAT WORKED; KAINE RAN A STATE THAT FAILED.  URL ',\n",
       " '.@timkaine is the ANTI-DEFENSE SENATOR. #VPDebate #BigLeagueTruth  URL ',\n",
       " 'RT @TeamTrump: We need STRONG, BROAD-SHOULDERED leadership like @mike_pence &amp; @realDonaldTrump in the White House! #VPDebate #BigLeagueTrut_',\n",
       " 'CLINTON۪S FLAILING SYRIA POLICY WAS JUDGED A FAILURE:\\n URL ',\n",
       " 'RT @GOP: In @timkaines own words  #Debates2016  URL ',\n",
       " '.@mike_pence and I will defeat #ISIS.\\n URL ',\n",
       " 'WHAT THEY ARE SAYING ABOUT MIKE PENCE DOMINATING\\u06dd THE DEBATE:\\n URL ',\n",
       " 'I agree Mike - thank you to all of our law enforcement officers! #VPDebate\\n\\n\"Police officers are the best of us...\"\\n@Mike_Pence',\n",
       " '.@HillaryClinton Sneers At Millions Of Average Americans.\\n URL #VPDebate #BigLeagueTruth',\n",
       " 'RT @TeamTrump: Police officers are the BEST of us. Law enforcement in this country is a force for GOOD.\" - @mike_pence #VPDebate #BigLeagu_',\n",
       " '\"@GeeVeeM: @realDonaldTrump @Susiesentinel Pence is so prepared! He did his homework to outperform Kaine.\"',\n",
       " 'RT @TeamTrump: Law enforcement officers bring communities together &amp; keep us safe. @mike_pence &amp; @realDonaldTrump RESPECT &amp; stand by them!_',\n",
       " 'RT @seanspicer: .@timkaine wants to tough on crime - fails to talk about defending rapists and murders #VPDebate',\n",
       " '.@timkaine oversaw unemployment INCREASE by 179,249 while @mike_pence DECREASED unemployment in Indiana by 113,826._  URL ',\n",
       " '\"@aldonturnaolco1: @FrankLuntz @marthamaccallum @realDonaldTrump good!!\"',\n",
       " '\"@bcuzimdamomma: @FreeDavidKing No she only gets #Americans killed #Benghazi - we need @realDonaldTrump #MAGA\"',\n",
       " '\"@ifdanyt: @realDonaldTrump Loving @mike_pence hes so likeable and sensible. Kaine is just talking bull!',\n",
       " '\"@carol_lcnixon67: @realDonaldTrump Kaine says Hillary and he have plans. She could care less what Kaine thinks.\"',\n",
       " '\"@ARSenMissyIrvin: I want a \"youre fired\" president with people in Govt who are WASTING my tax $s. @realDonaldTrump\"',\n",
       " '.@mike_pence is doing a great job - so far, no contest!',\n",
       " '\"@TeamTrump: .@mike_pence &amp; @realDonaldTrump are PROVEN job creators and are prepared to bring JOBS BACK to the American people!',\n",
       " '\"@Jnelson52722: @realDonaldTrump @Susiesentinel Kaine looks like an evil crook out of the Batman movies\"',\n",
       " '\"@elisac006: @nycmia @realDonaldTrump I agree. Kaine looks like a fool!!\"',\n",
       " 'RT @TeamTrump: .@timkaine has a pay-to-play problem just like Crooked @HillaryClinton #VPDebates #BigLeagueTruth  URL ',\n",
       " '\"@bigdog_joey: @realDonaldTrump @timkaine is so angry. Our @mike_pence looks great. kaine cant defend all those lies #makeamericagreatagain',\n",
       " 'RT @mike_pence: There۪s one clear choice in this election to create jobs and grow the American economy. #VPDebate  URL ',\n",
       " 'RT @joshrogin: Pence is right. Clinton &amp; Obama tried to negotiate an Iraq troop extension but failed. Bush admin always anticipated such an_',\n",
       " '\"@Susiesentinel: #pence is so much more likeable than Kaine #cbsnews @realDonaldTrump\"',\n",
       " '\"@lainey34210: @realDonaldTrump Great opening Pence_ե\"',\n",
       " '\"@RoadkingL: @mike_pence Wow, Kaine couldnt go 12 seconds without a lie. Marines and military are scared of the liar running. #bengazi\"',\n",
       " '.@megynkelly- I am in Nevada. Sorry to inform you Kellyanne is in the audience. Better luck next time.',\n",
       " 'Both are looking good! Now we begin!',\n",
       " 'Here we go - Enjoy!',\n",
       " 'I will be live-tweeting the V.P. Debate. Very exciting! MAKE AMERICA GREAT AGAIN!',\n",
       " 'Wow, @CNN is so negative. Their panel is a joke, biased and very dumb. Im turning to @FoxNews where we get a fair shake! Mike will do great',\n",
       " 'Wow, did you just hear Bill Clintons statement on how bad ObamaCare is. Hillary not happy. As I have been saying, REPEAL AND REPLACE!',\n",
       " 'Join the MOVEMENT!\\n URL ',\n",
       " 'Thank you ARIZONA! This is a MOVEMENT like nobody has ever seen before. Together, we are going to MAKE AMERICA SAFE_  URL ',\n",
       " 'My childcare plan makes a difference for working families - more money, more freedom. #AmericaFirst means_  URL ',\n",
       " 'I will be watching the great Governor @Mike_Pence and live tweeting the VP debate tonight starting at 8:30pm est! Enjoy!',\n",
       " 'Join me in Reno, Nevada tomorrow at 3:30pm! #AmericaFirst #MAGA\\nTickets:  URL ',\n",
       " 'Join me in Reno, Nevada on Wednesday at 3:30pm at the Reno-Sparks Convention Center! #MAGA\\nTickets:_  URL ',\n",
       " 'Thank you Colorado! #MAGA\\n URL  URL ',\n",
       " 'We must bring the truth directly to hard-working Americans who want to take our country back. #BigLeagueTruth_  URL ',\n",
       " 'Thank you Pueblo, Colorado!\\n#TrumpRally #AmericaFirst\\n URL ',\n",
       " 'Join me in Henderson, Nevada on Wednesday at 11:30am! #MAGA\\nTickets:  URL ',\n",
       " 'Just announced that Iraq (U.S.) is preparing for battle to reclaim Mosul. Why do they have to announce this? Makes mission much harder!',\n",
       " 'Melania and I extend our warmest greetings to those observing Rosh Hashanah here in the United States, in Israel, and around the world.',\n",
       " 'Bernie should pull his endorsement of Crooked Hillary after she decieved him and then attacked him and his supporters.',\n",
       " '\"@trumplican2016: .@realDonaldTrump  There will be MASSIVE turnout for you,  Mr. Trump - These polls dont register the pulse of the PEOPLE!',\n",
       " 'I have created tens of thousands of jobs and will bring back great American prosperity. Hillary has only created jobs at the FBI and DOJ!',\n",
       " 'I know our complex tax laws better than anyone who has ever run for president and am the only one who can fix them. #failing@nytimes',\n",
       " 'Heading to Pennsylvania for a big rally tonight. We will MAKE AMERICA GREAT AGAIN!',\n",
       " 'Wow, just saw the really bad @CNN ratings. People dont want to watch bad product that only builds up Crooked Hillary.',\n",
       " 'The so-called Commission on Presidential Debates admitted to us that the DJT audio &amp; sound level was very bad. So why didnt they fix it?',\n",
       " 'I won the debate if you decide without watching the totally one-sided \"spin\" that followed. This despite the really bad microphone.',\n",
       " 'Crooked H is nasty to Sanders supporters behind closed doors. Owned by Wall St and Politicians, HRC is not with you.  URL ',\n",
       " 'I believe in #AmericaFirst and that means FAMILY FIRST! My childcare plan reflects the needs of modern working-clas_  URL ',\n",
       " 'Thank you Novi, Michigan! Get out and VOTE #TrumpPence16 on 11/8. Together, WE WILL MAKE AMERICA GREAT AGAIN!_  URL ',\n",
       " 'Thank you for your support - on my way now! See you soon. #TrumpTrain  URL ',\n",
       " 'Join me in Pueblo, Colorado on Monday afternoon at 3pm! #TrumpRally\\n URL ',\n",
       " 'For those few people knocking me for tweeting at three oclock in the morning, at least you know I will be there, awake, to answer the call!',\n",
       " 'Why isnt Hillary 50 points ahead? Maybe its the email scandal, policies that spread ISIS, or calling millions of_  URL ',\n",
       " 'The people are really smart in cancelling subscriptions to the Dallas &amp; Arizona papers &amp; now USA Today will lose readers! The people get it!',\n",
       " 'Remember, dont believe \"sources said\" by the VERY dishonest media. If they dont name the sources, the sources dont exist.',\n",
       " 'Did Crooked Hillary help disgusting (check out sex tape and past) Alicia M become a U.S. citizen so she could use her in the debate?',\n",
       " 'Using Alicia M in the debate as a paragon of virtue just shows that Crooked Hillary suffers from BAD JUDGEMENT! Hillary was set up by a con.',\n",
       " 'Wow, Crooked Hillary was duped and used by my worst Miss U. Hillary floated her as an \"angel\" without checking her past, which is terrible!',\n",
       " 'Anytime you see a story about me or my campaign saying \"sources said,\" DO NOT believe it. There are no sources, they are just made up lies!',\n",
       " 'Wow, did you see how badly @CNN (Clinton News Network) is doing in the ratings. With people like @donlemon, who could expect any more?',\n",
       " 'While Hillary profits off the rigged system, I am fighting for you! Remember the simple phrase: #FollowTheMoney_  URL ',\n",
       " 'Thank you for joining me this afternoon, New Hampshire! Will be back soon. #FollowTheMoney\\nSpeech transcript:_  URL ',\n",
       " 'Join me in Manheim, Pennsylvania on Saturday at 7pm! #TrumpRally\\nTickets:  URL ',\n",
       " 'My condolences to those involved in todays horrible accident in NJ and my deepest gratitude to all of the amazing first responders.',\n",
       " 'Will be in Novi, Michigan this Friday at 5:00pm. Join the MOVEMENT! Tickets available at:  URL ',\n",
       " 'Join me in Bedford, New Hampshire- tomorrow at 3:00pm. Cant wait to see everyone! #AmericaFirst #MAGA_  URL ',\n",
       " 'Thank you Waukesha, Wisconsin!\\nFull transcript of my speech, #FollowTheMoney:\\n URL ',\n",
       " 'Joining @oreillyfactor from Waukesha, Wisconsin - now, live! Enjoy!',\n",
       " 'Join me live in Waukesha, Wisconsin for an 8pmE rally! #AmericaFirst #MAGA\\n URL ',\n",
       " 'Thank you Council Bluffs, Iowa! Will be back soon. Remember- everything you need to know about Hillary -- just_  URL ',\n",
       " 'RT @TeamTrump: \"She put the office of Sec of State up for sale. If she ever got the chance, she۪d put the Oval Office up for sale too.\" #Fo_',\n",
       " 'An honor to meet with the Polish American Congress in Chicago this morning! #ImWithYou\\nVideo:_  URL ',\n",
       " 'Melania and I extend our deepest condolences to the family of Shimon Peres... URL ',\n",
       " 'Join me in Council Bluffs, Iowa- today at 3pm! #MakeAmericaGreatAgain\\nTickets:  URL ',\n",
       " 'Every on-line poll, Time Magazine, Drudge etc., has me winning the debate. Thank you to Fox &amp; Friends for so reporting!',\n",
       " 'My supporters are the best! $18 million from hard-working people who KNOW what we can be again! Shatter the record:  URL ',\n",
       " 'Unbelievable evening in Melbourne, Florida w/ 15,000 supporters- and an additional 12,000 who could not get in. Tha_  URL ',\n",
       " 'Join me for a 3pm rally - tomorrow at the Mid-America Center in Council Bluffs, Iowa! Tickets:_  URL ',\n",
       " 'Once again, we will have a government of, by and for the people. Join the MOVEMENT today!  URL ',\n",
       " 'RT @GOP: On National #VoterRegistrationDay, make sure youre registered to vote so we can #MakeAmericaGreatAgain  URL ',\n",
       " 'Hillary Clintons Campaign Continues To Make False Claims About Foundation Disclosure:\\n URL ',\n",
       " 'CNBC, Time magazine online polls say Donald Trump won the first presidential debate via @WashTimes. #MAGA\\n URL ',\n",
       " 'Great afternoon in Little Havana with Hispanic community leaders. Thank you for your support! #ImWithYou  URL ',\n",
       " 'In the last 24 hrs. we have raised over $13M from online donations and National Call Day, and we۪re still going! Thank you America! #MAGA',\n",
       " 'Well, now theyre saying that I not only won the NBC Presidential Forum, but last night the big debate. Nice!',\n",
       " 'Thank you for your endorsement, @GovernorSununu. #MAGA\\n URL ',\n",
       " 'Such a great honor. Final debate polls are in - and the MOVEMENT wins!\\n#AmericaFirst #MAGA #ImWithYou_  URL ',\n",
       " 'U.S. Murders Increased 10.8% in 2015 via @WSJ:  URL ',\n",
       " 'Thank you! #TrumpWon #MAGA\\n URL ',\n",
       " 'Hillarys been failing for 30 years in not getting the job done - it will never change.',\n",
       " 'True blue-collar billionaire Donald Trump shows Hillary Clinton is out of touch  URL ',\n",
       " 'The #1 trend on Twitter right now is #TrumpWon - thank you!',\n",
       " 'I won every poll from last nights Presidential Debate - except for the little watched @CNN poll.',\n",
       " 'How Trump won over a bar full of undecideds and Democrats\\n URL ',\n",
       " 'I really enjoyed the debate last night.Crooked Hillary says she is going to do so many things.Why hasnt she done them in her last 30 years?',\n",
       " 'Great debate poll numbers - I will be on @foxandfriends at 7:00 to discuss. Enjoy!',\n",
       " 'Thank you! Four new #DebateNight polls with the MOVEMENT winning. Together, we will MAKE AMERICA SAFE &amp; GREAT AGAIN_  URL ',\n",
       " '.@DRUDGE_REPORTs First Presidential Debate Poll:\\nTrump: 80%\\nClinton: 20%\\nJoin the MOVEMENT today &amp; lets #MAGA!_  URL ',\n",
       " 'Thank you! CNBC #DebateNight poll with over 400,000 votes.\\nTrump 61%\\nClinton 39%\\n#AmericaFirst #ImWithYou_  URL ',\n",
       " 'TIME #DebateNight poll - over 800,000 votes. Thank you!\\n#AmericaFirst #MAGA  URL ',\n",
       " '.@newtgingrich just said \"a historic victory for Trump.\" NICE!',\n",
       " 'Wow, did great in the debate polls (except for @CNN - which I dont watch). Thank you!',\n",
       " 'Thank you Governor @TerryBranstad!\\n#AmericaFirst #Debates2016  URL ',\n",
       " 'Thank you Governor @Mike_Pence!\\nLets MAKE AMERICA SAFE AND GREAT AGAIN with the American people.\\n#AmericaFirst_  URL ',\n",
       " 'Thank you Senator @TedCruz!\\n#Debates2016 #MAGA  URL ',\n",
       " '.@HillaryClinton۪s Nuclear Agreement Paved The Way For The $400 Million Ransom Payment #DebateNight\\n URL ',\n",
       " 'Nothing on emails. Nothing on the corrupt Clinton Foundation. And nothing on #Benghazi. #Debates2016 #debatenight',\n",
       " '.@HillaryClinton - Obama #ISIS Strategy Has Allowed It To Expand To Become A Global Threat #DebateNight  URL ',\n",
       " 'RT @TeamTrump: .@realDonaldTrump calling out @HillaryClintons support for NAFTA = most searched moment during tonights debate. #Debates20_',\n",
       " 'Russia has more warheads than ever, N Korea is testing nukes, and Iran got a sweetheart deal to keep theirs. Thanks, @HillaryClinton.',\n",
       " 'Hillary Clinton failed all over the world.\\nLIBYA\\nSYRIA\\nIRAN\\nIRAQ\\nASIA PIVOT\\nRUSSIAN RESET\\nBENGHAZI_  URL ',\n",
       " 'RT @TeamTrump: 100% TRUE --&gt; @realDonaldTrump is right - @HillaryClinton did call TPP the gold standard۪ #Debates2016\\n URL ',\n",
       " 'Hillary Clinton is the only candidate on stage who voted for the Iraq War. #Debates2016 #MAGA  URL ',\n",
       " '.@HillaryClintons 2008 Campaign And Supporters Trafficked In Rumors About Obamas Heritage #DebateNight\\n URL ',\n",
       " 'RT @TeamTrump: Hillarys policies have made America less safe, thats why 200+ general and military leaders have endorsed @realDonaldTrump!_',\n",
       " 'RT @DanScavino: Jesse Jackson on @realDonaldTrump - when he donated space for the Rainbow/Push Coalition.\\n#DebateNight  URL ',\n",
       " 'I will stand with police and protect ALL Americans! #Debates2016 #MAGA  URL ',\n",
       " 'RT @TeamTrump: When @realDonaldTrump is POTUS, families are going to be safe and secure. Law and order will be RESTORED! #MAGA #Debates #De_',\n",
       " 'RT @TeamTrump: WATCH: @realDonaldTrump on the stakes in this election #Debates2016  URL ',\n",
       " 'This is the simple fact about @HillaryClinton: she is a typical politician - all talk, no action. #Debates2016',\n",
       " 'HILLARYS BAD TAX HABIT!  URL ',\n",
       " 'A Clinton economy = more taxes and more spending! #DebateNight  URL ',\n",
       " '.@HillaryClinton has been part of the rigged DC system for 30 years? Why would we take policy advice from her? #Debates2016',\n",
       " 'Instead of driving jobs and wealth away, AMERICA will become the worlds great magnet for innovation and job creati_  URL ',\n",
       " '.@HillaryClinton channels John Kerry on trade: she was for bad trade deals before she was against them. #TPP #Debates2016',\n",
       " '.@HillaryClinton and Obama policies increased debt by $9trillion over the last 8 years',\n",
       " 'RT @TeamTrump: A @realDonaldTrump Administration will bring JOBS BACK! #Debates2016  URL ',\n",
       " 'Why isnt Hillary Clinton 50 points ahead?\\n#DebateNight  URL ',\n",
       " 'RT @DanScavino: Join @realDonaldTrump on his official social media platforms during tonights debate ~ as @TeamTrump manages rapid response_',\n",
       " 'My team of deplorables will be managing my Twitter account for this evenings debate. Tune in!\\n#DebateNight #TrumpPence16',\n",
       " 'RT @KellyannePolls: #Polls showing @realDonaldTrump surging, @hillaryclinton #slipping, have HER camp on defense/lowering expectations, goi_',\n",
       " 'New national Bloomberg poll just released - thank you! Join the MOVEMENT:  URL #TrumpTrain_  URL ',\n",
       " 'Really sad news: The great Arnold Palmer, the \"King,\" has died. There was no-one like him - a true champion! He will be truly missed.',\n",
       " 'Five people killed in Washington State by a Middle Eastern immigrant. Many people died this weekend in Ohio from drug overdoses. N.C. riots!',\n",
       " 'Readout of my meeting with Israeli Prime Minister Benjamin Netanyahu:\\n URL ',\n",
       " 'Looking forward to my meeting with Benjamin Netanyahu in Trump Tower at 10:00 A.M.',\n",
       " 'Bernie Sanders gave Hillary the Dem nomination when he gave up on the e-mails. That issue has only gotten bigger!',\n",
       " 'Many on the team and staff of Bernie Sanders have been treated badly by the Hillary Clinton campaign - and they like Trump on trade, a lot!',\n",
       " 'Thank you Roanoke, Virginia - this a MOVEMENT - join us today!\\nSign up:  URL #AmericaFirst_  URL ',\n",
       " 'If dopey Mark Cuban of failed Benefactor fame wants to sit in the front row, perhaps I will put Gennifer Flowers right alongside of him!',\n",
       " 'Will be back in Virginia tonight- for a 6pm rally at the Berglund Center in Roanoke. Join me! Tickets:_  URL ',\n",
       " '\"@KellyannePolls: Trump is headed for a win, says professor who has predicted 30 years of presidential outcomes    URL ',\n",
       " 'The @SenTedCruz endorsement was a wonderful surprise. I greatly appreciate his support! We will have a tremendous victory on November 8th.',\n",
       " 'Today is the day! Knock on doors and make calls with us on National Day of Action! #TrumpTrain #MAGA_  URL ',\n",
       " 'INTERVIEW on @seanhannity NOW! Enjoy.',\n",
       " 'Crooked Hillarys bad judgement forced her to announce that she would go to Charlotte on Saturday to grandstand. Dem pols said no way, dumb!',\n",
       " 'Join me in Roanoke, Virginia tomorrow at the Berglund Center- Coliseum ~ 6pm! Tickets available at:_  URL ',\n",
       " 'How Trump Would Stimulate the U.S. Economy\\n URL ',\n",
       " 'Hillary Clinton just lost every Republican she ever had, including Never Trump, all farmers &amp; sm. biz, by saying she۪ll tax estates at 65%.',\n",
       " 'Tomorrows the day! Knock on doors and make calls with us on National Day of Action! #TrumpTrain #MAGA_  URL ',\n",
       " 'RT @dcexaminer: EXCLUSIVE: How Donald Trumps 30 million followers are crashing the Internet  URL ',\n",
       " 'Spoke with Governor @PatMcCroryNC of North Carolina today. He is doing a tremendous job under tough circumstances.',\n",
       " 'This is more than a campaign- it is a movement. #MakeAmericaGreatAgain\\nSIGN UP TODAY &amp; WE WILL WIN!  URL ',\n",
       " 'Join me in Roanoke, Virginia on Saturday evening at 6pm! #MAGA\\n URL ',\n",
       " 'Will be on @foxandfriends now.',\n",
       " 'I will be interviewed from Cleveland, Ohio, on @seanhannity - Tonight at 10:00 P.M. Enjoy!',\n",
       " '\"@ThAllenSBoucher: @DiamondandSilk @realDonaldTrump @seanhannity I love those beautiful gals.\" D + S = Two amazing women!',\n",
       " '.@YoungDems4Trump  Thank you!',\n",
       " 'Great new polls! Thank you Nevada, North Carolina &amp; Ohio. Join the MOVEMENT today &amp; lets #MAGA!_  URL ',\n",
       " 'Thank you Toledo, Ohio! It is so important for you to get out and VOTE on November 8, 2016! Lets MAKE AMERICA SAFE_  URL ',\n",
       " 'RT @GMA: WATCH: @IvankaTrump on \"women who work;\" empowering campaign celebrates modern women.   URL ',\n",
       " 'Hopefully the violence &amp; unrest in Charlotte will come to an immediate end. To those injured, get well soon. We need unity &amp; leadership.',\n",
       " 'The situations in Tulsa and Charlotte are tragic. We must come together to make America safe again.',\n",
       " 'It is a MOVEMENT - not a campaign. Leaving the past behind, changing our future. Together, we will MAKE AMERICA SAF_  URL ',\n",
       " 'Thank you Kenansville, North Carolina! Remember- on November 8th, that special interest gravy train is coming to a_  URL ',\n",
       " 'Thank you High Point, NC! I will fight for every neglected part of this nation &amp; I will fight to bring us together_  URL ',\n",
       " 'Hillary Clinton is taking the day off again, she needs the rest. Sleep well Hillary - see you at the debate!',\n",
       " 'Heading to North Carolina for two big rallies. Will be there soon. We will bring jobs back where they belong!',\n",
       " 'Do people notice Hillary is copying my airplane rallies - she puts the plane behind her like I have been doing from the beginning.',\n",
       " 'Thank you Nevada! #AmericaFirst\\n#MakeAmericaGreatAgain\\n URL ',\n",
       " 'Thank you Georgia! #AmericaFirst\\n#MakeAmericaGreatAgain\\n URL ',\n",
       " 'Crooked Hillary has been fighting ISIS, or whatever she has been doing, for years. Now she has new ideas. It is time for change.',\n",
       " 'Amazing rally in Florida - this is a MOVEMENT! Join us today at  URL ',\n",
       " 'Together, we will MAKE AMERICA SAFE AND GREAT AGAIN! #ImWithYou #AmericaFirst  URL ',\n",
       " 'I will be interviewed on the @oreillyfactor - tonight from Florida, now. Enjoy!',\n",
       " 'Philly FOP Chief On Presidential Endorsement: Clinton Blew The Police Off  URL ',\n",
       " 'Hillary Clintons weakness while she was Secretary of State, has emboldened terrorists all over the world..cont:  URL ',\n",
       " 'Once again someone we were told is ok turns out to be a terrorist who wants to destroy our country &amp; its people- how did he get thru system?',\n",
       " 'Great job once again by law enforcement! We are proud of them and should embrace them - without them, we dont have a country!',\n",
       " '\"@TarukMatuk: @CNN @FoxNews @realDonaldTrump @RogerRice10 Refugees from Syria over 10k plus more coming. Lots young males, poorly vetted.',\n",
       " '\"@AngPiazza: @foxandfriends  @realDonaldTrump hes the ONLY candidate that will keep us safe!\"',\n",
       " 'Will be on @foxandfriends at 7:02 A.M.  Enjoy.',\n",
       " 'Terrible attacks in NY, NJ and MN this weekend. Thinking of victims, their families and all Americans! We need to be strong!',\n",
       " 'Saturday۪s attacks show that failed Obama/Hillary Clinton polices won۪t keep us safe! I will Make America Safe Again!',\n",
       " 'Under the leadership of Obama &amp; Clinton, Americans have experienced more attacks at home than victories abroad. Time to change the playbook!',\n",
       " 'HAPPY BIRTHDAY - to the United States Air Force!!  URL ',\n",
       " 'RT @KellyannePolls: This updates @Reuters/Ipsos Electoral Map shows more than nearly anything how much has changed in just a month. https:/_',\n",
       " 'RT @KellyannePolls: #polls, continued. Hillary averaging 40% in three states Pres. Obama won.  URL ',\n",
       " 'RT @KellyannePolls: more media #polls showing @realDonaldTrump ahead in states Pres Obama won twice.  URL ',\n",
       " 'I would like to express my warmest regards, best wishes and condolences to all of the families and victims of the horrible bombing in NYC.',\n",
       " 'Never met but never liked dopey Robert Gates. Look at the mess the U.S. is in. Always speaks badly of his many bosses, including Obama.',\n",
       " 'Heading to Colorado for a big rally. Massive crowd, great people! Will be there soon - the polls are looking good.',\n",
       " 'My lawyers want to sue the failing @nytimes so badly for irresponsible intent. I said no (for now), but they are watching. Really disgusting',\n",
       " 'The failing @nytimes has gone nuts that Crooked Hillary is doing so badly. They are willing to say anything, has become a laughingstock rag!',\n",
       " 'Crazy Maureen Dowd, the wacky columnist for the failing @nytimes, pretends she knows me well--wrong!',\n",
       " 'Wacky @NYTimesDowd, who hardly knows me, makes up things that I never said for her boring interviews and column. A neurotic dope!',\n",
       " '.@CNN just doesnt get it, and thats why their ratings are so low - and getting worse. Boring anti-Trump panelists, mostly losers in life!',\n",
       " 'I never met former Defense Secretary Robert Gates. He knows nothing about me. But look at the results under his guidance - a total disaster!',\n",
       " 'Crooked Hillary wants to take your 2nd Amendment rights away. Will guns be taken from her heavily armed Secret Service detail? Maybe not!',\n",
       " 'My thoughts and prayers go out to the @PhillyPolice &amp; @Penn police officers- in Philadelphia.  URL ',\n",
       " '\"Donald Trump۪s birther event is the greatest trick he۪s ever pulled\"\\n URL ',\n",
       " 'A very interesting take from @KatiePavlich:  URL ',\n",
       " 'Just arrived in Texas - have been informed two @fortworthpd officers have been shot. My thoughts and prayers are with them.',\n",
       " 'Just leaving Miami for Houston, Oklahoma and Colorado. Miami crowd was fantastic!',\n",
       " 'Great parade in The Villages- I love you all. We will #MAGA. Thank you for the incredible support-I will not forget!  URL ',\n",
       " 'I am truly honored and grateful for receiving SO much support from our American heroes... URL ',\n",
       " 'I am now going to the brand new Trump International, Hotel D.C. for a major statement.',\n",
       " 'Thank you for a great evening - Laconia, New Hampshire -- will be back soon! #AmericaFirst\\n URL ',\n",
       " '\"@AK_TWEET: #TheDonalds hair gets the #JimmyFallon treatment on #TheTonightShow #TrumpPence16  URL ',\n",
       " '\"@jimmyfallon: Tonight: @realDonaldTrump, @normmacdonald, a performance by Kiiara,and your funniest #MyTeacherIsWeird tweets. #FallonTonight',\n",
       " 'Instead of driving jobs and wealth away, AMERICA will become the WORLDS great magnet for innovation &amp; job creation!  URL ',\n",
       " 'Will be joining @jimmyfallon on @FallonTonight at 11:35pmE tonight. Enjoy!',\n",
       " 'RT @EricTrump: What a scary statistic! Americans are working harder and making less! We need competent leadership!  URL ',\n",
       " 'I will be interviewed by @jessebwatters on @oreillyfactor tonight at 8pm. Enjoy!',\n",
       " 'Full transcript of economic plan- delivered to the Economic Club of New York. #MAGA  URL ',\n",
       " 'Thank you @JerryJrFalwell!  URL ',\n",
       " 'Thank you to all of our law enforcement officers - across America! #LESM #MAGA\\n URL ',\n",
       " 'Thank you for having me! I enjoyed the tour and spending time with everyone. See you soon. #MAGA  URL ',\n",
       " 'Will be on @foxandfriends at 7:00 A.M. Enjoy!',\n",
       " '\"@ghfanlovessonny: @realDonaldTrump you have my vote in Pennsylvania. Trump 2016\" Thank you!',\n",
       " 'I was never a fan of Colin Powell after his weak understanding of weapons of mass destruction in Iraq = disaster. We can do much better!',\n",
       " 'I will be interviewed on @foxandfriends tomorrow at 7am. Enjoy!',\n",
       " 'Great poll out of Nevada- thank you! See you soon. #MAGA #AmericaFirst\\n URL ',\n",
       " 'Great evening in Canton, Ohio-thank you! We are going to MAKE AMERICA GREAT AGAIN! Join us:\\n URL ',\n",
       " 'Honor to have been interviewed by the very wonderful @bishopwtjackson in Detroit last week - tune in at 9pmE. Enjoy!  URL ',\n",
       " 'Thank you Ohio! Just landed in Canton for a rally at the Civic Center. Join me at 7pm:  URL ',\n",
       " 'Thank you Florida- cant wait to see you Friday in Miami! Join me:\\n URL ',\n",
       " 'Thank you @ATFD17! #ImWithYou\\nVideo:  URL ',\n",
       " 'Great poll Florida - thank you!\\n#ImWithYou #AmericaFirst  URL ',\n",
       " 'Thank you Ohio - see you tonight!  URL ',\n",
       " 'Russia took Crimea during the so-called Obama years. Who wouldnt know this and why does Obama get a free pass?',\n",
       " 'Why isnt President Obama working instead of campaigning for Hillary Clinton?',\n",
       " 'Thank you Rep. @CynthiaLummis!\\n URL ',\n",
       " 'Thank you Rep. @MarshaBlackburn!\\n URL ',\n",
       " 'Thank you @RepReneeEllmers!\\n URL ',\n",
       " 'RT @LouDobbs: Trump outlines new child-care policy proposals via the @FoxNews App @realDonaldTrump seems a candidate of destiny  URL ',\n",
       " 'CHILD CARE REFORMS THAT WILL MAKE AMERICA GREAT AGAIN!\\nTranscript:  URL  URL ',\n",
       " 'RT @IvankaTrump: Ivanka penned an Op-Ed that ran in the @WSJ this afternoon, read it here.  URL ',\n",
       " '#ImWithYou #AmericaFirst  URL ',\n",
       " 'RT @IvankaTrump: Ivanka is joining @realDonaldTrump to outline an innovative new child care policy to support American families. Tune in to_',\n",
       " 'Thank you Clive, Iowa!\\n URL ',\n",
       " 'Join us today! Together, we will\\n#MakeAmericaGreatAgain!\\n URL ',\n",
       " 'Heading to Iowa- join me today at noon! #MakeAmericaGreatAgain\\nTickets:  URL ',\n",
       " 'Join me in Clive, Iowa tomorrow at noon! #AmericaFirst #MAGA\\nTickets:  URL ',\n",
       " '\"@brimyers813: Saw ur speech on Twitter. U give me hope and optimism. I feel as though I am in the room with u. I pray 4 ur/our success.\"',\n",
       " 'Just got back from Asheville,  North Carolina, where we had a massive rally. The spirit of the crowd was unbelievable. Thank you!  #MAGA',\n",
       " 'Stopped by @TrumpDC to thank all of the tremendous men &amp; women for their hard work!  URL ',\n",
       " 'Will be on @CNBC at @7:22. Enjoy!',\n",
       " 'I will be interviewed on @foxandfriends at 7:00 A.M.',\n",
       " '#NeverForget\\n URL ',\n",
       " 'The seriously failing @nytimes, despite so much winning and poll numbers that will soon put me in first place, only writes dishonest hits!',\n",
       " 'Hillary Clinton just had her 47% moment. What a terrible thing she said about so many great Americans!',\n",
       " 'RT @BarackObama: RT if you agree: We need a President who is fighting for all Americans, not one who writes off nearly half the country.',\n",
       " 'While Hillary said horrible things about my supporters, and while many of her supporters will never vote for me, I still respect them all!',\n",
       " 'Really sad that Republicans would allow themselves to be used in a Clinton ad. Lindsey Graham, Romney, Flake, Sass. SUPREME COURT, REMEMBER!',\n",
       " 'Wow, Hillary Clinton was SO INSULTING to my supporters, millions of amazing, hard working people. I think it will cost her at the Polls!',\n",
       " 'Will be in Missouri today with Melania for the funeral of a wonderful and truly respected woman, Phyllis S!',\n",
       " '\"@Stvzbnk: Just Watched @tonyschwartz. Obviously Tony is a Total Whack Job @realDonaldTrump\"',\n",
       " 'Just returned from Pensacola, Florida, where the crowd was incredible.',\n",
       " 'I havnt seen @tonyschwartz in many years, he hardly knows me. Never liked his style. Super lib, Crooked H supporter. Irrelevant dope!',\n",
       " 'Dummy writer @tonyschwartz, who wanted to do a second book with me for years (I said no), is now a hostile basket case who feels jilted!',\n",
       " 'Thank you Florida - we are going to MAKE AMERICA GREAT AGAIN! Join us:  URL #AmericaFirst  URL ',\n",
       " 'Will be delivering a major speech tonight - live on @oreillyfactor at 8:10pm from Pensacola, Florida.',\n",
       " 'Thank you Ohio. Together, we will MAKE AMERICA GREAT AGAIN!\\n URL ',\n",
       " 'Great honor to be endorsed by popular &amp; successful @gov_gilmore of VA. A state that I very much want to win-THX Jim!  URL ',\n",
       " 'MAKE AMERICA GREAT AGAIN!\\n#AmericaFirst #ImWithYou  URL ',\n",
       " 'Henry McMaster, Lt. Governor of South Carolina who endorsed me, beat failed @CNN announcer Bakari Sellers, so badly. Funny!',\n",
       " 'RT @EricTrump: Join @TeamTrump on Saturday for National Day of Action as we work to #MakeAmericaGreatAgain!  URL ',\n",
       " 'Jeff Zucker failed @NBC and he is now failing @CNN.',\n",
       " '.@CNN is unwatchable. Their news on me is fiction. They\\nare a disgrace to the broadcasting industry and an arm of the Clinton campaign.',\n",
       " 'The documentary of me that @CNN just aired is a total waste of time. I dont even know many of the people who spoke about me. A joke!',\n",
       " 'Final poll results from NBC on last nights Commander-in-Chief Forum. Thank you! #ImWithYou #MAGA  URL ',\n",
       " 'It wasnt Matt Lauer that hurt Hillary last night. It was her very dumb answer about emails &amp; the veteran who said she should be in jail.',\n",
       " 'More poll results from last nights Commander-in-Chief Forum.\\n#AmericaFirst #TrumpTrain  URL ',\n",
       " 'Last nights results - in poll taken by NBC. #AmericaFirst #ImWithYou  URL ',\n",
       " 'With Luis, Mexico and the United States would have made wonderful deals together - where both Mexico and the US would have benefitted.',\n",
       " 'Mexico has lost a brilliant finance minister and wonderful man who I know is highly respected by President Pe̱a Nieto.',\n",
       " 'Hillary Clinton answered email questions differently last night than she has in the past. She is totally confused. Unfit to serve as #POTUS.',\n",
       " 'Hillary just gave a disastrous news conference on the tarmac to make up for poor performance last night. Shes being decimated by the media!',\n",
       " 'RT @DanScavino: Last nights winner was clear &amp; it will be proven time &amp; time again - lets #MAGA!! Lets WIN!! #TrumpTrain  URL ',\n",
       " '\"A rough night for Hillary Clinton\"  ABC News.',\n",
       " 'Wow, reviews are in - THANK YOU!',\n",
       " 'COMING UP @GenFlynn @newtgingrich on @foxandfriends',\n",
       " 'Thank you to @foxandfriends for the nice reviews of last night.',\n",
       " 'Thank you America - great #CommanderInChiefForum polls!  URL ',\n",
       " 'Thank you to our fantastic veterans. The reviews and polls from almost everyone of my Commander-in-Chief presentation were great. Nice!',\n",
       " 'Thank you Peter - if elected, I will think big for our country &amp; never let the American people down! #AmericaFirst  URL ',\n",
       " 'Wow - thank you Pensacola, FL. See you Friday at 7pm -- join me!\\n URL ',\n",
       " '#AmericaFirst!  URL ',\n",
       " 'Donald Trump leads Hillary Clinton by 19 points among military, veteran voters: poll #AmericaFirst #MAGA\\n URL ',\n",
       " '\"@adhd_fa:Kudos to @PARISDENNARD for standing up to CNNs attempt to bully you and shout you down for defending @realDonaldTrump #media bias',\n",
       " 'I will be interviewed on @oreillyfactor tonight at 11pmE @FoxNews. Enjoy!',\n",
       " 'Thank you North Carolina- get out &amp; #VoteTrump on 11/8/2016!\\n#MakeAmericaGreatAgain  URL ',\n",
       " 'Great meeting with military spouses in Virginia- joined by @IvankaTrump, @LaraLeaTrump, @GenFlynn &amp; @MayorRGiuliani.  URL ',\n",
       " 'Thank you to all of our amazing military families, service members, and veterans. #ImWithYou  URL ',\n",
       " 'Join me in Pensacola, Florida this Friday at 7pm! #VoteTrump\\n URL ',\n",
       " 'Thank you! #VoteTrump #ImWithYou  URL ',\n",
       " 'Mainstream media never covered Hillary۪s massive hacking\\u06dd\\nor coughing attack, yet it is #1 trending. What۪s up?',\n",
       " '\"@Ler: Message for undecided voters: Please wake up and vote DonaldTrump now! Trump/Pence very important save our America before too late!\"',\n",
       " 'Thank you! #AmericaFirst  URL ',\n",
       " 'As a tribute to the late, great Phyllis Schlafly, I hope everybody can go out and get her latest book, THE CONSERVATIVE CASE FOR TRUMP.',\n",
       " 'China wouldnt provide a red carpet stairway from Air Force One and then Philippines President calls Obama \"the son of a whore.\" Terrible!',\n",
       " 'The truly great Phyllis Schlafly, who honored me with her strong endorsement for president, has passed away at 92. She was very special!',\n",
       " 'Thank you Ohio! #AmericaFirst  URL ',\n",
       " 'Heading to Youngstown, Ohio now- some great polls. #AmericaFirst  URL ',\n",
       " 'Thank you American Legion Post 610- for hosting @Mike_Pence &amp; I for a roundtable with labor leaders. #LaborDay #MAGA  URL ',\n",
       " '#LaborDay #AmericaFirst\\nVideo:  URL ',\n",
       " 'Can you believe that the Chinese would not give Obama the proper stairway to get off his plane - fight on tarmac!  URL ',\n",
       " 'President Obama &amp; Putin fail to reach deal on Syria - so what else is new? Obama is not a natural deal maker. Only makes bad deals!',\n",
       " '\"@OSPREY675: @Miami4Trump I followed you because you are a patriot &amp; support @realDonaldTrump, as do I. #MAGA by sticking together.',\n",
       " '\"@tweak626: Im at a biker rally in Perry, Kansas...and everyone is a @realDonaldTrump fan. Love it.\"',\n",
       " '\"@ronnieclemmons: @ChrisCJackson @TakouiS @realDonaldTrump  Trump now leads her by 2 - get real, she will lose big\"',\n",
       " '\"@lblackvelvet: @realDonaldTrump We need to show Americans that Hillary will KILL our Country !! Vote for Trump !!\"',\n",
       " '\"@CherNuna: @realDonaldTrump It defies belief the Web of Lies Hillary is spinning! One excuse after another. Then its this, then its that.',\n",
       " 'Lyin Hillary Clinton told the FBI that she did not know the \"C\" markings on documents stood for CLASSIFIED. How can this be happening?',\n",
       " 'To the African-American community: The Democrats have failed you for fifty years, high crime, poor schools, no jobs. I will fix it, VOTE \"T\"',\n",
       " 'The polls are close so Crooked Hillary is getting out of bed and will campaign tomorrow.Why did she hammer 13 devices and acid-wash e-mails?',\n",
       " 'The Great State of Arizona, where I just had a massive rally (amazing people), has a very weak and ineffective Senator, Jeff Flake. Sad!',\n",
       " 'The Republican Party needs strong and committed leaders, not weak people such as @JeffFlake, if it is going to stop illegal immigration.',\n",
       " 'RT @DanScavino: Doesnt fit the MSM narrative - so they wont share what @realDonaldTrump did for Jesse Jackson in 1999 - so I will! https:/_',\n",
       " '\"@AnneBellar: @realDonaldTrump @CNN CNN is so biased. Never ever watch them. Trump 2016!!\"',\n",
       " 'Crooked Hillarys V.P. pick said this morning that I was not aware that Russia took over Crimea. A total lie - and taken over during O term!',\n",
       " 'Wow, the failing @nytimes has not reported properly on Crookeds FBI release. They are at the back of the pack - no longer a credible source',\n",
       " '.@CNN is so disgusting in their bias, but they are having a hard time promoting Crooked Hillary in light of the new e-mail scandals.',\n",
       " 'Great visit to Detroit church, fantastic reception, and all @CNN talks about is a small protest outside. Inside a large and wonderful crowd!',\n",
       " 'I am returning to the Pensacola Bay Center in Florida- Friday, 9/9/16 at 7pm. Join me!  URL ',\n",
       " 'Thank you Great Faith Ministries International, Bishop Wayne T. Jackson, and Detroit!\\n URL ',\n",
       " '#ImWithYou  URL ',\n",
       " '#AmericaFirst #ImWithYou  URL ',\n",
       " 'Great new poll Iowa - thank you!\\n#MakeAmericaGreatAgain #ImWithYou  URL ',\n",
       " 'I visited our Trump Tower campaign headquarters last night, after returning from Ohio and Arizona, and it was packed with great pros - WIN!',\n",
       " 'People will be very surprised by our ground game on Nov. 8. We have an army of volunteers and people with GREAT SPIRIT! They want to #MAGA!',\n",
       " 'Just heard that crazy and very dumb @morningmika had a mental breakdown while talking about me on the low ratings @Morning_Joe. Joe a mess!',\n",
       " 'I will be interviewed by @ericbolling tonight at 8pm on the @oreillyfactor. Enjoy!',\n",
       " 'I am promising you a new legacy for America. Were going to create a new American future. Thank you OHIO! #ImWithYou  URL ',\n",
       " 'Thank you for having me this morning @AmericanLegion. I enjoyed my time with everyone! #ALConvention2016  URL ',\n",
       " 'Poll numbers way up - making big progress!',\n",
       " 'Thank you to @foxandfriends for the great review of the speech on immigration last night. Thank you also to the great people of Arizona!',\n",
       " 'Mexico will pay for the wall!',\n",
       " 'Under a Trump administration, its called #AmericaFirst! #ImWithYou\\n URL ',\n",
       " 'Hillary Clinton doesnt have the strength or the stamina to MAKE AMERICA GREAT AGAIN! #AmericaFirst\\n URL ',\n",
       " 'There will be no amnesty!\\n#MakeAmericaGreatAgain #ImWithYou\\n URL ',\n",
       " 'Mexico will pay for the wall - 100%!\\n#MakeAmericaGreatAgain #ImWithYou\\n URL ',\n",
       " 'RT @LouDobbs: We are Watching A Leader Who for the First Time in Three Presidencies Will Put America and Americans First! @realDonaldTrump_',\n",
       " 'RT @AnnCoulter: I hear Churchill had a nice turn of phrase, but Trumps immigration speech is the most magnificent speech ever given.',\n",
       " 'Just arrived in Arizona! #ImWithYou\\n URL ',\n",
       " 'Great trip to Mexico today  - wonderful leadership and high quality people! Look forward to our next meeting.',\n",
       " 'Hillary Clinton didnt go to Louisiana, and now she didnt go to Mexico. She doesnt have the drive or stamina to MAKE AMERICA GREAT AGAIN!',\n",
       " 'Former President Vicente Fox, who is railing against my visit to Mexico today, also invited me when he apologized for using the \"f bomb.\"',\n",
       " 'Thank you Washington! Together, WE will MAKE AMERICA SAFE AND GREAT AGAIN! #ImWithYou #AmericaFirst  URL ',\n",
       " 'I have accepted the invitation of President Enrique Pena Nieto, of Mexico, and look very much forward to meeting him tomorrow.',\n",
       " 'RT @RSBNetwork: We are ALREADY LIVE in Everett, WA for the Trump Rally. Come join us- our cameras tonight! #TrumpinEverett\\n\\n URL ',\n",
       " 'RT @DRUDGE_REPORT: REUTERS POLL:  CLINTON, TRUMP ALL TIED UP...  URL ',\n",
       " 'Thank you North Carolina! #MAGA  URL ',\n",
       " 'Thank you America! #MAGA\\n URL ',\n",
       " '\"@meequalsfree: Looking forward to seeing you again! Everett to be a packed house! @realDonaldTrump  @mike_pence\"',\n",
       " 'From day one I said that I was going to build a great wall on the SOUTHERN BORDER, and much more. Stop illegal immigration. Watch Wednesday!',\n",
       " 'Join me this Thursday in Wilmington, Ohio at noon! #ImWithYou\\nTickets:  URL ',\n",
       " 'Join me this Wednesday in Phoenix, Arizona at 6pm! #ImWithYou\\nTickets:  URL ',\n",
       " '#MakeAmericaGreatAgain #ImWithYou  URL ',\n",
       " 'We will repeal and replace the horrible disaster known as #Obamacare!  URL ',\n",
       " '#CrookedHillary  URL ',\n",
       " 'Now that African-Americans are seeing what a bad job Hillary type policy and management has done to the inner-cities, they want TRUMP!',\n",
       " 'Crooked Hillarys brainpower is highly overrated.Probably why her decision making is so bad or, as stated by Bernie S, she has BAD JUDGEMENT',\n",
       " 'Does anyone know that Crooked Hillary, who tried so hard, was unable to pass the Bar Exams in Washington D.C. She was forced to go elsewhere',\n",
       " '\"@PMNOrlando: @realDonaldTrump I know of NO ONE voting for Crooked Hillary! Her rallies are held in (blank)  &amp; she still has room.',\n",
       " '\"@RhondaR: Thank-You Clarence Henderson for telling @cnn you know racism &amp; its not DonaldTrump  URL ',\n",
       " '\"@Patrici: Crowd at Trump Rally in Akron, Ohio is a Sea of Women, Minorities, Independents, Dems  URL ',\n",
       " 'Inner-city crime is reaching record levels. African-Americans will vote for Trump because they know I will stop the slaughter going on!',\n",
       " 'Look how bad it is getting! How much more crime, how many more shootings, will it take for African-Americans and Latinos to vote Trump=SAFE!',\n",
       " 'I will be making a major speech on ILLEGAL IMMIGRATION on Wednesday in the GREAT State of Arizona. Big crowds, looking for a larger venue.',\n",
       " 'I think that both candidates, Crooked Hillary and myself, should release detailed medical records. I have no problem in doing so! Hillary?',\n",
       " 'Today is the 53rd anniversary of the March on Washington - today we honor the enduring fight for justice, equality and opportunity.',\n",
       " 'RT @FoxNews: Poll: @realDonaldTrump vs. @HillaryClinton among white Evangelicals.  URL ',\n",
       " 'Thank you Arizona! #VoteTrump  URL ',\n",
       " 'Join me Tuesday in Everett, Washington at the Xfinity Arena! Tickets:  URL ',\n",
       " '\"@LindaHarden: @realDonaldTrump America loves Trump and @mike_pence -- praying for you every day. Stay strong. #TrumpPence2016 #NeverHillary',\n",
       " 'NATIONAL DEBT\\nJanuary 2009 = $10.6 TRILLION\\nAugust 2016 = $19.4 TRILLION  URL ',\n",
       " 'It was an honor to have the amazing Root family join me in Iowa. I have been so inspired by their courage &amp; bravery.  URL ',\n",
       " 'Thank you Iowa! #ImWithYou  URL ',\n",
       " 'Just landed in Iowa to attend a great event in honor of wonderful Senator @JoniErnst. Look forward to being with all of my friends.',\n",
       " 'My condolences to Dwyane Wade and his family, on the loss of Nykea Aldridge. They are in my thoughts and prayers.',\n",
       " 'Dwyane Wades cousin was just shot and killed walking her baby in Chicago. Just what I have been saying. African-Americans will VOTE TRUMP!',\n",
       " '\"@GoldJazz559: #BlackMenForBernie Leader: #Hillary2016 No Regard For Black Race۪  URL ',\n",
       " '\"@DiamondandSilk: Crooked Hillary getting desperate. On TV bashing Trump. @CNN, she forgot how she said a KKK member was her mentor.',\n",
       " 'Heroin overdoses are taking over our children and others in the MIDWEST. Coming in from our southern border. We need strong border &amp; WALL!',\n",
       " 'New polls - join the MOVEMENT today.\\n URL ',\n",
       " 'Join us via our new #AmericaFirst APP! #TrumpPence16  URL ',\n",
       " 'Will be in Phoenix, Arizona on Wednesday. Changing venue to much larger one. Demand is unreal. Polls looking great! #ImWithYou',\n",
       " 'Thank you @TeamTrump Florida. Keep me updated, and lets get those 100,000 registered voters!\\n#MakeAmericaGreatAgain   URL ',\n",
       " 'I will be interviewed by @kimguilfoyle\\nat 7pm on @FoxNews. #Enjoy!',\n",
       " 'Army training slide lists Hillary Clinton as insider threat:  URL ',\n",
       " 'Meet the Trumpocrats۪: Lifelong Democrats Breaking w/ Party Over Hillary to Support Donald Trump for President:  URL ',\n",
       " '\"Hillary Clinton Deleted Emails Using Program Intended To Prevent Recovery\" #CrookedHillary  URL ',\n",
       " 'RT @DRUDGE: Watched Clinton Cash last night. Scariest movie since The Invitation!  Youve been warned   URL ',\n",
       " 'How quickly people forget that Crooked Hillary called African-American youth \"SUPER PREDATORS\" - Has she apologized?',\n",
       " 'I am very proud to have brought the subject of illegal immigration back into the discussion. Such a big problem for our country-I will solve',\n",
       " 'Wonderful @pastormarkburns was attacked viciously and unfairly on @MSNBC by crazy @morningmika on low ratings @Morning_Joe. Apologize!',\n",
       " 'What do African-Americans and Hispanics have to lose by going with me. Look at the poverty, crime and educational statistics. I will fix it!',\n",
       " 'Crooked Hillary will NEVER be able to solve the problems of poverty, education and safety within the African-American &amp; Hispanic communities',\n",
       " 'The Clintons are the real predators...\\n URL ',\n",
       " '\"@Lewenskimo: Your opponent has run out of ideas, now resorts to personal attacks on you. Every Amercan knows, you represent HOPE!!\"',\n",
       " '\"Hillary Clinton needs to address the racist undertones of her 2008 campaign.\" #FlashbackFriday  URL ',\n",
       " '\"@DonaldJTrumpJr: Company Gouging Price Of EpiPens Is A Clinton Foundation Donor And Partner  URL ',\n",
       " '\"@foxnation: Flashback: Hillary Clinton Praised Former KKK Member Robert Byrd as Friend and Mentor:  URL ',\n",
       " 'CLINTON CORRUPTION AND HER SABOTAGE OF THE INNER CITIES.\\nFull speech transcript:  URL ',\n",
       " ...]"
      ]
     },
     "execution_count": 94,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "list_of_url_less_tweets"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Challenge\n",
    "\n",
    "Use the regular expression for hashtags below to replace every hashtag in every tweet in `tweet_text` with `HASHTAG_SIGN`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [],
   "source": [
    "hashtag_pattern = r'(?:^|\\s)[＃#]{1}(\\w+)'\n",
    "HASHTAG_SIGN = ' HASHTAG '\n",
    "digit_pattern = r'\\d+'\n",
    "DIGIT_SIGN = ' DIGIT '"
   ]
  },
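  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see what the hashtag pattern matches, here is a minimal sketch on a made-up string (the `sample` text is an assumption for illustration, not taken from `tweet_text`). The pattern only matches a `#` at the start of the string or right after whitespace, and it consumes that whitespace, which is why the replacement sign is padded with spaces."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "\n",
    "hashtag_pattern = r'(?:^|\\s)[＃#]{1}(\\w+)'\n",
    "HASHTAG_SIGN = ' HASHTAG '\n",
    "\n",
    "# Hypothetical example string, not from the tweet data\n",
    "sample = 'Thank you! #MAGA #AmericaFirst'\n",
    "replaced = re.sub(hashtag_pattern, HASHTAG_SIGN, sample)\n",
    "print(replaced.split())\n",
    "\n",
    "# A '#' inside a word is left alone: it is neither at the start nor after whitespace\n",
    "unchanged = re.sub(hashtag_pattern, HASHTAG_SIGN, 'C#sharp')\n",
    "print(unchanged)"
   ]
  },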
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### OOV words\n",
    "\n",
    "Sometimes it's best to remove infrequent words (sometimes not!). When we do, it's usually because a downstream method (like classification) is sensitive to rare words. A common approach is to replace every rare word with a single placeholder token, here `OOV` (short for out of vocabulary)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['Today',\n",
       " 'we',\n",
       " 'express',\n",
       " 'our',\n",
       " 'deepest',\n",
       " 'gratitude',\n",
       " 'to',\n",
       " 'all',\n",
       " 'those',\n",
       " 'who',\n",
       " 'have',\n",
       " 'served',\n",
       " 'in',\n",
       " 'our',\n",
       " 'armed',\n",
       " 'forces',\n",
       " 'HASHTAG',\n",
       " 'URL',\n",
       " 'HASHTAG',\n",
       " 'HASHTAG']"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "all_tweets = ' '.join(tweet_text)\n",
    "clean = re.sub(url_pattern, URL_SIGN, all_tweets)\n",
    "clean = re.sub(hashtag_pattern, HASHTAG_SIGN, clean)\n",
    "clean = re.sub(digit_pattern, DIGIT_SIGN, clean)\n",
    "tokens = word_tokenize(clean)\n",
    "tokens = [token for token in tokens if token not in punctuation]\n",
    "tokens[:20]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can count the frequency of each word type with Python's built-in `Counter` (from the `collections` module). Given the list of tokens, it builds a special Python dictionary whose keys are the word types (the same set we calculated above as `vocabulary`) and whose values are the number of times each type appears in the list. We can ask that dictionary for the most common words, or for the frequency of an individual word type."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[('URL', 932),\n",
       " ('HASHTAG', 717),\n",
       " ('DIGIT', 258),\n",
       " ('the', 87),\n",
       " ('in', 76),\n",
       " ('to', 72),\n",
       " ('of', 61),\n",
       " ('you', 57),\n",
       " ('I', 56),\n",
       " ('is', 54)]"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from collections import Counter\n",
    "freq = Counter(tokens)\n",
    "freq.most_common(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "freq['unleashed']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Replace every token that occurs only once with the OOV placeholder\n",
    "OOV = 'OOV'\n",
    "new_tokens = []\n",
    "for token in tokens:\n",
    "    if freq[token] == 1:\n",
    "        new_tokens.append(OOV)\n",
    "    else:\n",
    "        new_tokens.append(token)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['OOV',\n",
       " 'we',\n",
       " 'OOV',\n",
       " 'our',\n",
       " 'OOV',\n",
       " 'OOV',\n",
       " 'to',\n",
       " 'all',\n",
       " 'those',\n",
       " 'who',\n",
       " 'have',\n",
       " 'OOV',\n",
       " 'in',\n",
       " 'our',\n",
       " 'OOV',\n",
       " 'OOV',\n",
       " 'HASHTAG',\n",
       " 'URL',\n",
       " 'HASHTAG',\n",
       " 'HASHTAG']"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "new_tokens[:20]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Challenge\n",
    "\n",
    "Below, I've read some of the Amazon reviews from earlier into a list called `reviews`. Each element of the list is a string, representing the text of a single review. Try to:\n",
    "- Tokenize each review\n",
    "- Separate each review into sentences\n",
    "- Strip all whitespace\n",
    "- Make all characters lower case\n",
    "- Replace any URLs and digits\n",
    "\n",
    "Then find the 50 most common words."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Collect the review text from the first two Amazon CSV files\n",
    "fnames = os.path.join(DATA_DIR, 'amazon', '*.csv')\n",
    "fnames = glob.glob(fnames)\n",
    "reviews = []\n",
    "column_names = ['id', 'product_id', 'user_id', 'profile_name', 'helpfulness_num', 'helpfulness_denom',\n",
    "               'score', 'time', 'summary', 'text']\n",
    "for fname in fnames[:2]:\n",
    "    # Passing `names` without `header=0` keeps each file's header row as data,\n",
    "    # which is why the string 'Text' shows up among the reviews below\n",
    "    df = pd.read_csv(fname, names=column_names)\n",
    "    text = list(df['text'])\n",
    "    reviews.extend(text)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['Text',\n",
       " 'I have bought several of the Vitality canned dog food products and have found them all to be of good quality. The product looks more like a stew than a processed meat and it smells better. My Labrador is finicky and she appreciates this product better than  most.',\n",
       " 'Product arrived labeled as Jumbo Salted Peanuts...the peanuts were actually small sized unsalted. Not sure if this was an error or if the vendor intended to represent the product as \"Jumbo\".']"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "reviews[:3]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Removing stop words\n",
    "\n",
    "You might have noticed that the most common words above aren't terribly exciting. They're words like \"am\", \"i\", \"the\" and \"a\": stop words. These are rarely useful to us in computational text analysis, so it's very common to remove them completely.\n",
    "\n",
    "- What other stop words do you think there are?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['i',\n",
       " 'me',\n",
       " 'my',\n",
       " 'myself',\n",
       " 'we',\n",
       " 'our',\n",
       " 'ours',\n",
       " 'ourselves',\n",
       " 'you',\n",
       " 'your',\n",
       " 'yours',\n",
       " 'yourself',\n",
       " 'yourselves',\n",
       " 'he',\n",
       " 'him',\n",
       " 'his',\n",
       " 'himself',\n",
       " 'she',\n",
       " 'her',\n",
       " 'hers',\n",
       " 'herself',\n",
       " 'it',\n",
       " 'its',\n",
       " 'itself',\n",
       " 'they',\n",
       " 'them',\n",
       " 'their',\n",
       " 'theirs',\n",
       " 'themselves',\n",
       " 'what',\n",
       " 'which',\n",
       " 'who',\n",
       " 'whom',\n",
       " 'this',\n",
       " 'that',\n",
       " 'these',\n",
       " 'those',\n",
       " 'am',\n",
       " 'is',\n",
       " 'are',\n",
       " 'was',\n",
       " 'were',\n",
       " 'be',\n",
       " 'been',\n",
       " 'being',\n",
       " 'have',\n",
       " 'has',\n",
       " 'had',\n",
       " 'having',\n",
       " 'do',\n",
       " 'does',\n",
       " 'did',\n",
       " 'doing',\n",
       " 'a',\n",
       " 'an',\n",
       " 'the',\n",
       " 'and',\n",
       " 'but',\n",
       " 'if',\n",
       " 'or',\n",
       " 'because',\n",
       " 'as',\n",
       " 'until',\n",
       " 'while',\n",
       " 'of',\n",
       " 'at',\n",
       " 'by',\n",
       " 'for',\n",
       " 'with',\n",
       " 'about',\n",
       " 'against',\n",
       " 'between',\n",
       " 'into',\n",
       " 'through',\n",
       " 'during',\n",
       " 'before',\n",
       " 'after',\n",
       " 'above',\n",
       " 'below',\n",
       " 'to',\n",
       " 'from',\n",
       " 'up',\n",
       " 'down',\n",
       " 'in',\n",
       " 'out',\n",
       " 'on',\n",
       " 'off',\n",
       " 'over',\n",
       " 'under',\n",
       " 'again',\n",
       " 'further',\n",
       " 'then',\n",
       " 'once',\n",
       " 'here',\n",
       " 'there',\n",
       " 'when',\n",
       " 'where',\n",
       " 'why',\n",
       " 'how',\n",
       " 'all',\n",
       " 'any',\n",
       " 'both',\n",
       " 'each',\n",
       " 'few',\n",
       " 'more',\n",
       " 'most',\n",
       " 'other',\n",
       " 'some',\n",
       " 'such',\n",
       " 'no',\n",
       " 'nor',\n",
       " 'not',\n",
       " 'only',\n",
       " 'own',\n",
       " 'same',\n",
       " 'so',\n",
       " 'than',\n",
       " 'too',\n",
       " 'very',\n",
       " 's',\n",
       " 't',\n",
       " 'can',\n",
       " 'will',\n",
       " 'just',\n",
       " 'don',\n",
       " 'should',\n",
       " 'now',\n",
       " 'd',\n",
       " 'll',\n",
       " 'm',\n",
       " 'o',\n",
       " 're',\n",
       " 've',\n",
       " 'y',\n",
       " 'ain',\n",
       " 'aren',\n",
       " 'couldn',\n",
       " 'didn',\n",
       " 'doesn',\n",
       " 'hadn',\n",
       " 'hasn',\n",
       " 'haven',\n",
       " 'isn',\n",
       " 'ma',\n",
       " 'mightn',\n",
       " 'mustn',\n",
       " 'needn',\n",
       " 'shan',\n",
       " 'shouldn',\n",
       " 'wasn',\n",
       " 'weren',\n",
       " 'won',\n",
       " 'wouldn']"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from nltk.corpus import stopwords\n",
    "stop = stopwords.words('english')\n",
    "stop"
   ]
  },
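  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that every entry in `stop` is lower case, so a capitalised token such as \"The\" only matches if we lower-case it before the comparison. Here is a minimal sketch of the filtering pattern. To keep it self-contained it uses a tiny hand-picked set `toy_stop` (a hypothetical stand-in for the NLTK list) and a made-up token list.\n",
    "\n",
    "```python\n",
    "# A tiny hand-picked stand-in for the NLTK stop word list\n",
    "toy_stop = {'the', 'is', 'a', 'of', 'in'}\n",
    "\n",
    "# A made-up token list, only for illustration\n",
    "toy_tokens = ['The', 'capital', 'of', 'France', 'is', 'Paris']\n",
    "\n",
    "# Lower-case each token for the comparison only, so the kept tokens retain their case\n",
    "content_tokens = [t for t in toy_tokens if t.lower() not in toy_stop]\n",
    "print(content_tokens)   # ['capital', 'France', 'Paris']\n",
    "```\n",
    "\n",
    "Storing the stop words in a `set` rather than a list keeps the membership test fast, which starts to matter on larger corpora."
   ]
  },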
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Challenge\n",
    "\n",
    "Use the list `stop` of English stop words to remove stop words from our dataset of tweets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['Today',\n",
       " 'we',\n",
       " 'express',\n",
       " 'our',\n",
       " 'deepest',\n",
       " 'gratitude',\n",
       " 'to',\n",
       " 'all',\n",
       " 'those',\n",
       " 'who',\n",
       " 'have',\n",
       " 'served',\n",
       " 'in',\n",
       " 'our',\n",
       " 'armed',\n",
       " 'forces',\n",
       " 'HASHTAG',\n",
       " 'URL',\n",
       " 'HASHTAG',\n",
       " 'HASHTAG']"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Rebuild the tweet tokens as before; the stop words are still in this list\n",
    "all_tweets = ' '.join(tweet_text)\n",
    "clean = re.sub(url_pattern, URL_SIGN, all_tweets)\n",
    "clean = re.sub(hashtag_pattern, HASHTAG_SIGN, clean)\n",
    "clean = re.sub(digit_pattern, DIGIT_SIGN, clean)\n",
    "tokens = word_tokenize(clean)\n",
    "tokens = [token for token in tokens if token not in punctuation]\n",
    "tokens[:20]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Stemming/lemmatization\n",
    "\n",
    "Stemming and lemmatization both refer to removing morphological affixes from words. For example, if we stem the word \"grows\", we get \"grow\". If we stem the word \"running\", we get \"run\". We do this because we often care more about the core content of the word (i.e. that it has something to do with growth or running) than about the fact that it's a third person present tense verb or a present participle. A stemmer strips suffixes by rule and can return something that isn't a word (\"leaves\" becomes \"leav\"), whereas a lemmatizer looks up the dictionary form (\"leaves\" becomes \"leaf\").\n",
    "\n",
    "NLTK provides many algorithms for stemming. For English, a great baseline is the [Porter](https://github.com/nltk/nltk/blob/develop/nltk/stem/porter.py) algorithm, which in spirit isn't that far from a bunch of regular expressions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [],
   "source": [
    "from nltk.stem import PorterStemmer\n",
    "stemmer = PorterStemmer()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'grow'"
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "stemmer.stem('grows')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'run'"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "stemmer.stem('running')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'leav'"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "stemmer.stem('leaves')"
   ]
  },
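  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To give a feel for the \"bunch of regular expressions\" intuition, here is a toy suffix stripper. This is a deliberately naive sketch and **not** the Porter algorithm: the function `toy_stem` and its three rules are made up for illustration, and the real algorithm applies many more rules, with conditions on what remains of the word.\n",
    "\n",
    "```python\n",
    "import re\n",
    "\n",
    "def toy_stem(word):\n",
    "    # Strip a few common English suffixes (a toy, not the Porter algorithm)\n",
    "    # '-ing' after a doubled consonant: 'running' -> 'run'\n",
    "    word = re.sub(r'([b-df-hj-np-tv-z])\\1ing$', r'\\1', word)\n",
    "    # Plain '-ing': 'growing' -> 'grow'\n",
    "    word = re.sub(r'ing$', '', word)\n",
    "    # Plural or third person '-s' (but not '-ss'): 'grows' -> 'grow'\n",
    "    word = re.sub(r'(?<!s)s$', '', word)\n",
    "    return word\n",
    "\n",
    "for w in ['grows', 'running', 'growing', 'leaves', 'thing']:\n",
    "    print(w, '->', toy_stem(w))\n",
    "```\n",
    "\n",
    "The toy rules turn \"leaves\" into \"leave\" and cut \"thing\" down to \"th\", which is why a real stemmer needs extra conditions before it strips a suffix."
   ]
  },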
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [],
   "source": [
    "from nltk.stem import SnowballStemmer, WordNetLemmatizer\n",
    "snowballer_stemmer = SnowballStemmer('english')\n",
    "lemmatizer = WordNetLemmatizer()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "run\n",
      "leav\n"
     ]
    }
   ],
   "source": [
    "print(snowballer_stemmer.stem('running'))\n",
    "print(snowballer_stemmer.stem('leaves'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "leaf\n"
     ]
    }
   ],
   "source": [
    "print(lemmatizer.lemmatize('leaves'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Challenge\n",
    "\n",
    "Use the Porter stemmer to stem each word in the tweet dataset after having removed stop words."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## POS tagging\n",
    "\n",
    "POS tagging means assigning each token a part of speech (e.g. noun, verb, adjective). Again, there are many different [alternatives](https://github.com/nltk/nltk/tree/develop/nltk/tag), but NLTK makes its recommended POS tagger available through the function `pos_tag`. The tagger expects a list of tokens as input. When doing POS tagging, it is advisable **not** to remove stop words beforehand, since the tagger uses the surrounding words as context (although you are free to remove them afterwards)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'This is a confection that has been around a few centuries.  It is a light, pillowy citrus gelatin with nuts - in this case Filberts. And it is cut into tiny squares and then liberally coated with powdered sugar.  And it is a tiny mouthful of heaven.  Not too chewy, and very flavorful.  I highly recommend this yummy treat.  If you are familiar with the story of C.S. Lewis\\' \"The Lion, The Witch, and The Wardrobe\" - this is the treat that seduces Edmund into selling out his Brother and Sisters to the Witch.'"
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from nltk import pos_tag\n",
    "single_review = reviews[3]\n",
    "single_review"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[('This', 'DT'),\n",
       " ('is', 'VBZ'),\n",
       " ('a', 'DT'),\n",
       " ('confection', 'NN'),\n",
       " ('that', 'WDT'),\n",
       " ('has', 'VBZ'),\n",
       " ('been', 'VBN'),\n",
       " ('around', 'IN'),\n",
       " ('a', 'DT'),\n",
       " ('few', 'JJ'),\n",
       " ('centuries', 'NNS'),\n",
       " ('.', '.'),\n",
       " ('It', 'PRP'),\n",
       " ('is', 'VBZ'),\n",
       " ('a', 'DT'),\n",
       " ('light', 'JJ'),\n",
       " (',', ','),\n",
       " ('pillowy', 'JJ'),\n",
       " ('citrus', 'NN'),\n",
       " ('gelatin', 'NN'),\n",
       " ('with', 'IN'),\n",
       " ('nuts', 'NNS'),\n",
       " ('-', ':'),\n",
       " ('in', 'IN'),\n",
       " ('this', 'DT'),\n",
       " ('case', 'NN'),\n",
       " ('Filberts', 'NNP'),\n",
       " ('.', '.'),\n",
       " ('And', 'CC'),\n",
       " ('it', 'PRP'),\n",
       " ('is', 'VBZ'),\n",
       " ('cut', 'VBN'),\n",
       " ('into', 'IN'),\n",
       " ('tiny', 'JJ'),\n",
       " ('squares', 'NNS'),\n",
       " ('and', 'CC'),\n",
       " ('then', 'RB'),\n",
       " ('liberally', 'RB'),\n",
       " ('coated', 'VBN'),\n",
       " ('with', 'IN'),\n",
       " ('powdered', 'JJ'),\n",
       " ('sugar', 'NN'),\n",
       " ('.', '.'),\n",
       " ('And', 'CC'),\n",
       " ('it', 'PRP'),\n",
       " ('is', 'VBZ'),\n",
       " ('a', 'DT'),\n",
       " ('tiny', 'JJ'),\n",
       " ('mouthful', 'NN'),\n",
       " ('of', 'IN'),\n",
       " ('heaven', 'NN'),\n",
       " ('.', '.'),\n",
       " ('Not', 'RB'),\n",
       " ('too', 'RB'),\n",
       " ('chewy', 'JJ'),\n",
       " (',', ','),\n",
       " ('and', 'CC'),\n",
       " ('very', 'RB'),\n",
       " ('flavorful', 'JJ'),\n",
       " ('.', '.'),\n",
       " ('I', 'PRP'),\n",
       " ('highly', 'RB'),\n",
       " ('recommend', 'VBP'),\n",
       " ('this', 'DT'),\n",
       " ('yummy', 'JJ'),\n",
       " ('treat', 'NN'),\n",
       " ('.', '.'),\n",
       " ('If', 'IN'),\n",
       " ('you', 'PRP'),\n",
       " ('are', 'VBP'),\n",
       " ('familiar', 'JJ'),\n",
       " ('with', 'IN'),\n",
       " ('the', 'DT'),\n",
       " ('story', 'NN'),\n",
       " ('of', 'IN'),\n",
       " ('C.S', 'NNP'),\n",
       " ('.', '.'),\n",
       " ('Lewis', 'NNP'),\n",
       " (\"'\", 'POS'),\n",
       " ('``', '``'),\n",
       " ('The', 'DT'),\n",
       " ('Lion', 'NNP'),\n",
       " (',', ','),\n",
       " ('The', 'DT'),\n",
       " ('Witch', 'NNP'),\n",
       " (',', ','),\n",
       " ('and', 'CC'),\n",
       " ('The', 'DT'),\n",
       " ('Wardrobe', 'NNP'),\n",
       " (\"''\", \"''\"),\n",
       " ('-', ':'),\n",
       " ('this', 'DT'),\n",
       " ('is', 'VBZ'),\n",
       " ('the', 'DT'),\n",
       " ('treat', 'NN'),\n",
       " ('that', 'WDT'),\n",
       " ('seduces', 'VBZ'),\n",
       " ('Edmund', 'NNP'),\n",
       " ('into', 'IN'),\n",
       " ('selling', 'VBG'),\n",
       " ('out', 'RP'),\n",
       " ('his', 'PRP$'),\n",
       " ('Brother', 'NN'),\n",
       " ('and', 'CC'),\n",
       " ('Sisters', 'NNP'),\n",
       " ('to', 'TO'),\n",
       " ('the', 'DT'),\n",
       " ('Witch', 'NNP'),\n",
       " ('.', '.')]"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tokens = word_tokenize(single_review)\n",
    "tagged_review = pos_tag(tokens)\n",
    "tagged_review"
   ]
  },
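  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each element of the tagged result is a `(token, tag)` tuple, so we can filter tokens by their tag with a list comprehension. Here is a minimal sketch, using the first few pairs copied from the output above (the names `sample_tagged` and `nouns` are just for illustration). The tags follow the Penn Treebank convention, in which all noun tags start with `NN`.\n",
    "\n",
    "```python\n",
    "# The first few (token, tag) pairs from the tagged review above\n",
    "sample_tagged = [('This', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('confection', 'NN'),\n",
    "                 ('that', 'WDT'), ('has', 'VBZ'), ('been', 'VBN'), ('around', 'IN'),\n",
    "                 ('a', 'DT'), ('few', 'JJ'), ('centuries', 'NNS'), ('.', '.')]\n",
    "\n",
    "# Keep only the tokens whose tag marks them as nouns\n",
    "nouns = [token for token, tag in sample_tagged if tag.startswith('NN')]\n",
    "print(nouns)   # ['confection', 'centuries']\n",
    "```"
   ]
  },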
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Challenge\n",
    "\n",
    "Below I've read in the text of Austen's _Pride and Prejudice_ into a variable called `pride`. Preprocess using the following steps:\n",
    "\n",
    "- Strip whitespace\n",
    "- Replace all numbers with '0'\n",
    "- Tokenize\n",
    "- Tag each token with a POS tag\n",
    "\n",
    "Make sure you know:\n",
    "- What type is the result?\n",
    "- What type is each element of the result?\n",
    "- What type are the elements of the elements of the result?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [],
   "source": [
    "fname = 'pride-and-prejudice.txt'\n",
    "fname = os.path.join(DATA_DIR, fname)\n",
    "with open(fname, encoding='utf-8') as f:\n",
    "    raw = f.read()\n",
    "pride = raw[679:684814]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## DTM/TF-IDF\n",
    "\n",
    "A document-term matrix (DTM) and term frequency-inverse document frequency (TF-IDF) weighting are common ways of turning texts into numerical features, ready for machine learning models. A DTM has one row per document and one column per word type, with each cell holding a count; TF-IDF reweights those counts so that words occurring in many documents count for less. Scikit-learn is the standard library for building both in Python. It has two main classes for this: [CountVectorizer](http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html) and [TfidfVectorizer](http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#sklearn.feature_extraction.text.TfidfVectorizer)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['Today we express our deepest gratitude to all those who have served in our armed forces. HASHTAG URL ',\n",
       " 'Busy day planned in New York. Will soon be making some very important decisions on the people who will be running our government!',\n",
       " 'Love the fact that the small groups of protesters last night have passion for our great country. We will all come together and be proud!',\n",
       " 'Just had a very open and successful presidential election. Now professional protesters, incited by the media, are protesting. Very unfair!']"
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "clean = [re.sub(url_pattern, URL_SIGN, t) for t in tweet_text]\n",
    "clean = [re.sub(hashtag_pattern, HASHTAG_SIGN, t) for t in clean]\n",
    "clean = [re.sub(digit_pattern, DIGIT_SIGN, t) for t in clean]\n",
    "clean = [re.sub(whitespace_pattern, ' ', t) for t in clean]\n",
    "clean[:4]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<7375x10046 sparse matrix of type '<class 'numpy.int64'>'\n",
       "\twith 113679 stored elements in Compressed Sparse Row format>"
      ]
     },
     "execution_count": 57,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer\n",
    "count = CountVectorizer()\n",
    "X = count.fit_transform(clean)\n",
    "X"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[0, 0, 0, 0, 0],\n",
       "       [0, 0, 0, 0, 0],\n",
       "       [0, 0, 0, 0, 0],\n",
       "       [0, 0, 0, 0, 0],\n",
       "       [0, 0, 0, 0, 0]], dtype=int64)"
      ]
     },
     "execution_count": 58,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X.toarray()[:5,:5]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<7375x10046 sparse matrix of type '<class 'numpy.float64'>'\n",
       "\twith 113679 stored elements in Compressed Sparse Row format>"
      ]
     },
     "execution_count": 59,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tfidf = TfidfVectorizer()\n",
    "X = tfidf.fit_transform(clean)\n",
    "X"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[0., 0., 0., 0., 0.],\n",
       "       [0., 0., 0., 0., 0.],\n",
       "       [0., 0., 0., 0., 0.],\n",
       "       [0., 0., 0., 0., 0.],\n",
       "       [0., 0., 0., 0., 0.]])"
      ]
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X.toarray()[:5,:5]"
   ]
  },
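  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see what these matrices contain, here is a minimal sketch that builds a document-term matrix and a textbook TF-IDF weighting by hand for a made-up three-document corpus. It is an illustration only: `docs`, `vocab`, `dtm` and `tfidf_rows` are hypothetical names, tokenization is a plain `split()`, and the weight is the raw count times `log(N / df)`. Scikit-learn's `TfidfVectorizer` by default uses a smoothed IDF and L2-normalizes each row, so its numbers will differ.\n",
    "\n",
    "```python\n",
    "import math\n",
    "from collections import Counter\n",
    "\n",
    "# A made-up corpus, only for illustration\n",
    "docs = ['the cat sat', 'the dog sat', 'the cat saw the dog']\n",
    "tokenized = [d.split() for d in docs]\n",
    "\n",
    "# Columns of the matrix: the sorted vocabulary\n",
    "vocab = sorted(set(t for doc in tokenized for t in doc))\n",
    "\n",
    "# Document-term matrix: one row per document, one count per vocabulary entry\n",
    "dtm = []\n",
    "for doc in tokenized:\n",
    "    counts = Counter(doc)\n",
    "    dtm.append([counts[term] for term in vocab])\n",
    "\n",
    "# Document frequency: in how many documents does each term occur?\n",
    "n_docs = len(docs)\n",
    "df = [sum(1 for row in dtm if row[j] > 0) for j in range(len(vocab))]\n",
    "\n",
    "# Textbook TF-IDF: term count * log(N / df)\n",
    "tfidf_rows = [[row[j] * math.log(n_docs / df[j]) for j in range(len(vocab))]\n",
    "              for row in dtm]\n",
    "\n",
    "print(vocab)\n",
    "for row in dtm:\n",
    "    print(row)\n",
    "for row in tfidf_rows:\n",
    "    print([round(x, 2) for x in row])\n",
    "```\n",
    "\n",
    "Because \"the\" occurs in every document, its IDF is `log(3 / 3) = 0` and it gets zero weight everywhere, while \"saw\", which occurs in only one document, gets the largest weight."
   ]
  },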
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Things we didn't cover\n",
    "\n",
    "- Named entity recognition\n",
    "- Syntactic parsing\n",
    "- Information extraction\n",
    "- Removing markup from HTML\n",
    "- Extracting numerical features\n",
    "- spaCy"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}