{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Assignment 01 - Mongo\n",
    "*DBMS for Analytics*\n",
    "\n",
    "**Due: Tuesday, February 10th, at midnight**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Notes:*\n",
    " - All code should follow the PEP 8 Style Guide for Python\n",
    " - Assignment should be submitted using jupyter notebooks\n",
    "  - File name should follow “{Your Name} – Assignment_XX_Submission”\n",
    "  - Each Task should be contained in its own cell\n",
    "  - Each Task should be properly commented\n",
    "  - Each Task should print out the answer to the Task if appropriate\n",
    "  - Sample submission can be found on blackboard\n",
    "  - **Not following these standards may result in lost points**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "## Assignment Description\n",
    "For this assignment we are going to use our Mongo database of Elon Musk tweets to identify whether Elon is more/less active on twitter during the weeks leading up to a major event.\n",
    "\n",
    "To accomplish this, we will need to implement the following:\n",
    " 1. Create a method to pull tweets that occured within a set timerange from a given date\n",
    " 2. Create a method to randomly pull ranges of tweets (when blacking out certain ranges)\n",
    " 3. Pull data from each respective approach to create event_tweets and non_event_tweet datasets\n",
    " 4. Use a variety of analysis tools to determine the relationship between the two datasets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from datetime import datetime, timedelta\n",
    "\n",
    "from pymongo import MongoClient\n",
    "\n",
    "client = MongoClient(host='18.219.151.47', #host is the hostname for the database\n",
    "                     port=27017, #port is the port number that mongo is running on\n",
    "                     username='student', #username for the db\n",
    "                     password='emse6992pass', #password for the db\n",
    "                     authSource='emse6992') #Since our user only exists for the emse6992 db, we need to specify this\n",
    "\n",
    "db = client.emse6992\n",
    "stats_coll = db.twitter_statuses"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Task 1\n",
    "Write a function, ***get_tweets_daterange(screen_name, date, days_before, days_after)***, that takes a user's **screen_name** and will pull all tweets that occur *x* **days_before** and *y* **days_after** the provided **date** from the ***twitter_statuses*** collection for the given user.\n",
    "\n",
    "Example:\n",
    "`get_tweets_daterange('elonmusk', \"2020-10-28\", 14, 14)` would get all tweets made by *elonmusk* between 2020-10-14 - 2020-11-11 from the ***twitter_statuses*** collection, which should return 33 tweets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_tweets_daterange(screen_name, date, days_before, days_after):\n",
    "    \"\"\"This function returns a list of tweets that fall inbetween date-days_before\n",
    "    and date + days_after\n",
    "    \n",
    "    Args:\n",
    "        screen_name (str): screen name of twitter user\n",
    "        date (str): Date in the 'YYYY-MM-DD' format\n",
    "        days_before (int): number of days prior to `date` to consider\n",
    "        days_after (int): number of days after `date` to consider\n",
    "        \n",
    "    Returns:\n",
    "        list: A list of all the tweets that meet the conditions\n",
    "    \"\"\"\n",
    "    \n",
    "    # Your code\n",
    "    date_input = datetime.strptime(date, '%Y-%m-%d')\n",
    "    date_start = date_input - timedelta(days=days_before)\n",
    "    date_end = date_input + timedelta(days=days_after)\n",
    "\n",
    "    doc_date = stats_coll.find({\n",
    "        \"$and\": [\n",
    "            {'user.screen_name': screen_name},\n",
    "            {\"created_at\": {\"$gte\": date_start}},\n",
    "            {\"created_at\": {\"$lte\": date_end}}\n",
    "        ]\n",
    "    })\n",
    "    \n",
    "    return doc_date"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "33"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fi = get_tweets_daterange('elonmusk', \"2020-10-28\", 14, 14)\n",
    "fi.count()"
   ]
  },
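  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The window arithmetic in Task 1 can be checked without a database connection. The sketch below is a minimal illustration, not part of the required solution: `build_daterange_filter` is a hypothetical helper name, and it assumes, as the query above does, that `created_at` is stored as a datetime and that the screen name lives under `user.screen_name`. It folds both bounds into a single `created_at` condition, which for a scalar date field should match the same documents as the explicit `$and` form.\n",
    "\n",
    "```python\n",
    "from datetime import datetime, timedelta\n",
    "\n",
    "\n",
    "def build_daterange_filter(screen_name, date, days_before, days_after):\n",
    "    # Parse the 'YYYY-MM-DD' string into a datetime at midnight\n",
    "    center = datetime.strptime(date, '%Y-%m-%d')\n",
    "    start = center - timedelta(days=days_before)\n",
    "    end = center + timedelta(days=days_after)\n",
    "    # Both bounds sit on one field, so no explicit $and is needed\n",
    "    return {\n",
    "        'user.screen_name': screen_name,\n",
    "        'created_at': {'$gte': start, '$lte': end},\n",
    "    }\n",
    "\n",
    "\n",
    "query = build_daterange_filter('elonmusk', '2020-10-28', 14, 14)\n",
    "print(query['created_at']['$gte'].date())  # 2020-10-14\n",
    "print(query['created_at']['$lte'].date())  # 2020-11-11\n",
    "```\n",
    "\n",
    "A filter built this way could be passed to `stats_coll.find(...)` or to `stats_coll.count_documents(...)`, the replacement that pymongo's deprecation warning suggests for `count()`."
   ]
  },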
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Task 2\n",
    "Write a function, ***get_random_date(min_date, max_date)***, that will generate a random date that falls within **min_date** and **max_date**.\n",
    "\n",
    "Ensure the output of this function conforms to the _\"YYYY-MM-DD\"_ format used for the **get_tweets_daterange()**."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "import random"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Space work\n",
    "random.seed(10) # To provide consistent randomness\n",
    "\n",
    "def get_random_date(min_date, max_date):\n",
    "    \"\"\"This function returns a random date inbetween min_date and max_date\n",
    "    \n",
    "    Args:\n",
    "        min_date (str): Earliset date to consider - in the 'YYYY-MM-DD' format\n",
    "        max_date (str): Latest date to consider - in the 'YYYY-MM-DD' format\n",
    "        \n",
    "    Returns:\n",
    "        str: Random date - in the 'YYYY-MM-DD' format\n",
    "    \"\"\"\n",
    "    \n",
    "    # Your code\n",
    "    start_date = datetime.strptime(min_date, '%Y-%m-%d')\n",
    "    end_date = datetime.strptime(max_date, '%Y-%m-%d')\n",
    "\n",
    "    num_dates = (end_date - start_date).days\n",
    "    \n",
    "    random_number = random.randrange(num_dates)\n",
    "    random_date = start_date + timedelta(days=random_number)\n",
    "    \n",
    "    date_str = random_date.strftime(\"%Y-%m-%d\")\n",
    "    \n",
    "    return date_str"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'2020-12-03'"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "b = get_random_date(\"2020-10-28\", \"2020-12-28\")\n",
    "b"
   ]
  },
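  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One detail of the Task 2 solution is worth noting: `random.randrange(num_dates)` draws an offset from 0 to `num_dates - 1`, so `max_date` itself is never returned, and the call raises `ValueError` when both dates are equal. If the upper bound should be reachable, a variant along the following lines might be used. This is only a sketch: `random_date_inclusive` and its optional `rng` parameter are hypothetical names, not part of the assignment.\n",
    "\n",
    "```python\n",
    "from datetime import datetime, timedelta\n",
    "import random\n",
    "\n",
    "\n",
    "def random_date_inclusive(min_date, max_date, rng=random):\n",
    "    # Hypothetical variant of get_random_date that can also return max_date\n",
    "    start = datetime.strptime(min_date, '%Y-%m-%d')\n",
    "    end = datetime.strptime(max_date, '%Y-%m-%d')\n",
    "    if end < start:\n",
    "        raise ValueError('max_date must not be earlier than min_date')\n",
    "    # randint includes both endpoints, unlike randrange\n",
    "    offset = rng.randint(0, (end - start).days)\n",
    "    return (start + timedelta(days=offset)).strftime('%Y-%m-%d')\n",
    "\n",
    "\n",
    "rng = random.Random(10)  # local generator leaves the global seed untouched\n",
    "sample = [\n",
    "    random_date_inclusive('2020-10-28', '2020-10-30', rng)\n",
    "    for _ in range(50)\n",
    "]\n",
    "print(sorted(set(sample)))\n",
    "```\n",
    "\n",
    "Passing a dedicated `random.Random` instance keeps the draws reproducible without reseeding the module-level generator that other cells rely on."
   ]
  },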
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Task 3\n",
    "Using the two functions from task 1 & 2, create two datasets using `days_before = 14`, `dayse_after = 7` and `screen_name = 'elonmusk'`.\n",
    " - `major_events_dataset` - a collection of **get_tweets_daterange()** for the provided **major_events**\n",
    " - `random_events_dataset` - a collection of **get_tweets_daterange()** for at least 10 randomly generated dats\n",
    "   - Use `min_date = \"2020-01-01\"` and `max_date = \"2021-01-01\"` for all calls to **get_random_date()**\n",
    "   \n",
    "Each dataset should be a python list where each element in the list is a result from **get_tweets_daterange()**\n",
    "\n",
    "Example:\n",
    "```python\n",
    "major_events_dataset = [\n",
    "    get_tweets_daterange(...,\"2020-10-28\",...),\n",
    "    get_tweets_daterange(...,\"2020-05-30\",...),\n",
    "    ....\n",
    "]\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "major_events = [\"2020-10-28\", \"2020-05-30\", \"2021-01-07\"]\n",
    "major_events_dataset=[]\n",
    "random_events_dataset=[]\n",
    "\n",
    "for i in major_events:\n",
    "    major_events_dataset += get_tweets_daterange('elonmusk', i, 14, 7)\n",
    "\n",
    "list_date = []\n",
    "for i in range(10):\n",
    "    list_date.append(get_random_date(\"2020-01-01\", \"2021-01-01\"))\n",
    "\n",
    "for i in list_date:\n",
    "    random_events_dataset += get_tweets_daterange('elonmusk', i, 14, 7)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['2020-10-22',\n",
       " '2020-01-08',\n",
       " '2020-04-15',\n",
       " '2020-08-24',\n",
       " '2020-09-08',\n",
       " '2020-05-22',\n",
       " '2020-11-30',\n",
       " '2020-03-23',\n",
       " '2020-01-18',\n",
       " '2020-09-23']"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "list_date"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Task 4\n",
    "From each dataset, create a histogram of the average tweets/week for each daterange.\n",
    "\n",
    "This means the histogram for the major_events_dataset would only comprise of 3 values:\n",
    " - The avg. tweets/week for 2020-10-28, 2020-05-30, and 20201-01-07.\n",
    " \n",
    "For the major events you should get `[5.666666666666667, 14.333333333333334, 7.666666666666667]`\n",
    "\n",
    "_Note: For this you are welcome to use matplotlib, seaborn, or any other python plotting package._"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[5.666666666666667, 14.333333333333334, 7.666666666666667]"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "avg_tweet_final = []\n",
    "for i in major_events:\n",
    "    avg_tweet_num = get_tweets_daterange('elonmusk', i, 14, 7)\n",
    "    avg_tweet_final += [avg_tweet_num.count()/3]\n",
    "\n",
    "avg_tweet_final"
   ]
  },
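  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The division by 3 above is tied to the 14 + 7 = 21 day window. As a rough sketch, the number of weeks could instead be derived from the window itself, so the average stays correct if `days_before` or `days_after` change. `tweets_per_week` is a hypothetical helper name, and the counts 17, 43, and 23 are back-calculated from the averages shown above (each average times 3), not read from the database.\n",
    "\n",
    "```python\n",
    "def tweets_per_week(tweet_count, days_before, days_after):\n",
    "    # Length of the window in weeks, e.g. (14 + 7) / 7 = 3\n",
    "    weeks = (days_before + days_after) / 7\n",
    "    if weeks <= 0:\n",
    "        raise ValueError('the window must span at least one day')\n",
    "    return tweet_count / weeks\n",
    "\n",
    "\n",
    "# Counts implied by the major-event averages above\n",
    "for count in (17, 43, 23):\n",
    "    print(tweets_per_week(count, 14, 7))\n",
    "```\n",
    "\n",
    "With a 14-day window on each side, the same helper would divide by 4 rather than 3."
   ]
  },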
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[6.0,\n",
       " 0.0,\n",
       " 9.333333333333334,\n",
       " 13.666666666666666,\n",
       " 11.333333333333334,\n",
       " 16.666666666666668,\n",
       " 9.333333333333334,\n",
       " 11.0,\n",
       " 0.0,\n",
       " 10.333333333333334]"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "avg_tweet_final_2 = []\n",
    "for i in list_date:\n",
    "    avg_tweet_num_2 = get_tweets_daterange('elonmusk', i, 14, 7)\n",
    "    avg_tweet_final_2 += [avg_tweet_num_2.count()/3]\n",
    "\n",
    "avg_tweet_final_2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<AxesSubplot:>"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAd4AAAD3CAYAAACzSjWJAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAQGElEQVR4nO3df4wc9XnH8c8n3JlrwA5gb5NcDmIjUFRMgFRnfqSkJtgKyFXc0jYt/ZGkAsdqhFJXLQgkqlIq1BRVFU1DIuX+qKACQhW1QJNUEAxcsPlhctA2ElZTFLkR55bo8C9Ik6vN8fSPnYPN3e7denbuufP4/ZJWNzsz35lnj+f2s7MzHhwRAgAAOd6x2AUAAHA8IXgBAEhE8AIAkIjgBQAgEcELAECivoXewapVq2L16tULvRsAAJaM559//tWIaLRbtuDBu3r1ao2NjS30bgAAWDJs/6DTMr5qBgAgEcELAEAighcAgEQLfo4XAICjdeTIEY2Pj2tycnKxS5nTwMCAhoaG1N/f3/UYghcAsOSMj49r+fLlWr16tWwvdjltRYT27dun8fFxrVmzputxfNUMAFhyJicntXLlyiUbupJkWytXrjzqo3KCFwCwJC3l0J1WpkaCFwCARJzjBQAsefatlW4v4pau1nv44Ye1bds2TU1NacuWLbrpppt63jfBC3RQ9R86jl63b47AQpiamtJ1112nRx99VENDQ1q3bp02b96sc845p6ft8lUzAABtPPfcczrrrLN05plnatmyZbr66qv10EMP9bzdeYPX9qDtF2xP2u5rmf9rtl/uuQIAAJagvXv36vTTT3/r+dDQkPbu3dvzdrs54t0vaYOkZ2fM/3VJBC8AoJYiYta8Kq60njd4I2IyIg7M2PEvSXpU0ps9VwAAwBI0NDSkl19++/hyfHxcg4ODPW+37DneT0u6p9NC21ttj9kem5iYKLkLAAAWz7p16/TSSy9pz549Onz4sO6//35t3ry55+0e9VXNti+X9HREHO50yB0RI5JGJGl4eHj2sToAAEdhMa5w7+vr05133qkrrrhCU1NTuuaaa7R27dret1tizLmSNtu+UtJa27dFxJ/0XAkAAEvMpk2btGnTpkq32c1Vzf22t0s6X9IjknZFxOURcaWkFwldAAC6N+8Rb0QckbSxw7JLK68IAIAa4wYaAIAlqd0/51lqytRI8AIAlpyBgQHt27dvSYfv9P+Pd2Bg4KjGca9mAMCSMzQ0pPHxcS31f5I6MDCgoaGhoxpD8AIAlpz+/n6tWbNmsctYEHzVDABAIoIXAIBEBC8AAIkIXgAAEhG8AAAkIngBAEhE8AIAkIjgBQAgEcELAEAighcAgEQELwAAiQheAAASEbwAACQieAEASETwAgCQiOAFACARwQsAQKJ5g9f2oO0XbE/a7rO9xvYO20/avs/2CRmFAgBQB90c8e6XtEHSs8Xzg5I+HhG/KGmPpE0LUxoAAPXTN98KETEpadL29PMDLYvfkDQ1c4ztrZK2StIZZ5xRSaEAANRB6XO8tgclbZT0rZnLImIkIoYjYrjRaPRSHwAAtVIqeG2fKOluSZ+JiDeqLQkAgPoqe8Q7IunLEbG7ymIAAKi7bq5q7re9XdL5kh6xvV7Sr0raZnvU9lULXSQAAHXRzcVVR9Q8l9tq+cKUAwBAvXEDDQAAEhG8AAAkIngBAEhE8AIAkIjgBQAgEcELAEAighcAgEQELwAAiQheAAASEbwAACQieAEASETwAgCQiOAFACARwQsAQCKCFwCARAQvAACJCF4AABIRvAAAJCJ4AQBIRPACAJCI4AUAING8wWt70PYLtidt9xXzbrC90/a9tvsXvkwAAOqhmyPe/ZI2SHpWkmw3JH00Ii6V9F1Jv7Jg1QEAUDPzBm9ETEbEgZZZF0oaLaa3S7p45hjbW22P2R6bmJiopFAAAOqgzDneUyS9VkwfknTqzBUiYiQihiNiuNFo
9FAeAAD1UiZ4D0paUUyvKJ4DAIAulAne70haX0xvVHHuFwAAzK+bq5r7bW+XdL6kRyStkfSk7Z2SLpD04EIWCABAnfTNt0JEHFHzyLbVLkm3L0hFAADUGDfQAAAgEcELAEAighcAgEQELwAAiQheAAASEbwAACQieAEASETwAgCQiOAFACARwQsAQCKCFwCARAQvAACJCF4AABIRvAAAJCJ4AQBIRPACAJCI4AUAIBHBCwBAIoIXAIBEBC8AAIkIXgAAEvWVGWT7nZK+JukkSYck/UZE/F+VhQEAUEdlj3ivlLQrIi6T9FzxHAAAzKNs8H5f0onF9CmS9rUutL3V9pjtsYmJiR7KAwCgXsoG70uSLrL9oqRhSU+3LoyIkYgYjojhRqPRa40AANRG2eD9tKRHImKtpG9K+t3qSgIAoL7KBq8l7S+mX5X0rmrKAQCg3kpd1SzpPkn/YPuTko5I+s3qSgIAoL5KBW9EHJR0RbWlAMCxyb51sUuApIhbFruErnADDQAAEhG8AAAkIngBAEhE8AIAkIjgBQAgEcELAEAighcAgEQELwAAiQheAAASEbwAACQieAEASETwAgCQiOAFACARwQsAQCKCFwCARAQvAACJCF4AABIRvAAAJCJ4AQBIRPACAJCodPDa/pTtx2yP2n5flUUBAFBXfWUGFUG7PiI2VFwPAAC1VvaI9wpJJxRHvF+0fUKVRQEAUFdlg/fdkpYVR7w/lvTLrQttb7U9ZntsYmKi1xoBAKiNssF7SNK3i+nHJf1c68KIGImI4YgYbjQavdQHAECtlA3epyWdV0xfIGlPJdUAAFBzpS6uioh/s/0T26OSXpV0R6VVAQBQU6WCV5Ii4voqCwEA4HhQOniPRfati10CJEXcstglAMCi4c5VAAAkIngBAEhE8AIAkIjgBQAgEcELAEAighcAgEQELwAAiQheAAASEbwAACQieAEASETwAgCQiOAFACARwQsAQCKCFwCARAQvAACJCF4AABIRvAAAJCJ4AQBIRPACAJCI4AUAIFFPwWv7j2zvrKoYAADqrnTw2j5R0vkV1gIAQO31csS7RdLdVRUCAMDxoFTw2u6XtD4iHu+wfKvtMdtjExMTPRUIAECdlD3i/aSk+zotjIiRiBiOiOFGo1FyFwAA1E/Z4P2ApM/afljSWtufq7AmAABqq6/MoIi4cXra9s6I+GJ1JQEAUF89/zveiLi0ikIAADgecAMNAAASEbwAACQieAEASETwAgCQiOAFACARwQsAQCKCFwCARAQvAACJCF4AABIRvAAAJCJ4AQBIRPACAJCI4AUAIBHBCwBAIoIXAIBEBC8AAIkIXgAAEhG8AAAkIngBAEhE8AIAkIjgBQAgUangtX2R7adt77B9R9VFAQBQV2WPeH8g6fKI+Iikn7X9wQprAgCgtvrKDIqIV1qeviFpqppyAACot57O8do+T9KqiNg9Y/5W22O2xyYmJnoqEACAOikdvLZPk3SnpGtnLouIkYgYjojhRqPRS30AANRK2Yur+iTdI+mGGV87AwCAOZQ94v2EpHWSbrc9avuSCmsCAKC2yl5c9VVJX624FgAAao8baAAAkIjgBQAgEcELAEAighcAgEQELwAAiQheAAASEbwAACQieAEASETwAgCQiOAFACARwQsAQCKCFwCARAQvAACJCF4AABIRvAAAJCJ4AQBIRPACAJCI4AUAIBHBCwBAIoIXAIBEBC8AAIlKB6/tO2zvsP2FKgsCAKDOSgWv7Z+XdFJEfETSMtvrqi0LAIB6KnvEe4mk7cX0dkkXV1MOAAD11ldy3CmSvl9MH5K0tnWh7a2SthZPf2T7eyX3g9lWSXp1sYvohf1ni13C8eSY7hd6JdUx3SvSkuuX93daUDZ4D0paUUyvKJ6/JSJGJI2U3DbmYHssIoYXuw4cG+gXdIteyVP2q+ZnJG0opjdKeraacgAAqLdSwRsRL0iatL1D0psR8Vy1ZQEAUE9lv2pW
RGyrshB0ja/wcTToF3SLXkniiFjsGgAAOG5w5yoAABIRvAAAJCJ4S7J9ve0P2b7I9tPF7TPvaFl+g+2dtu+13W97ue3HbD9p+xu2lxfrXW77GdtP2B5qs58v256wvaVl3rnFtp+yfV6bMV8plu2cXm57ve1dtp+1/ftzvK5B248Xr2ljMW+b7Qt7+40dvxJ7ZdZy23cV/91Hbf92mzFfsP3tYp1fKObN6oEOr4teWQAV9sus944Z+2nXLzfb/m/bt3UYM+u9x/bfFP01avvAHK+LfpkWETyO8qHmB5YHiun3SBoopu+V9EFJDUn/Usy7UdInJA1Iem8x7zOSPldMPyFpuaSLJH2pzb7eK+n3JG1pmfeApNMlvU/SQ23GrCl+ni3pH4vpf5Z0RlH7rjle299K+rCkkyWNFvNOlXTXYv/ej8VHcq/MWi7pLklnzVFff/Hz/ZK+2akH6JVjsl9mvXd00S/vlvRRSbd1GNPxvUfShyTdQ7/M/+CIt5zzJY1LUkS8EhGTxfw3JE1JulDSaDFvu6SLI2IyIv6ndT3b75T0k4h4PSJ2STpn5o5axrQ6LSJejoi9kt7VZsyeYvJIUY8kvVise6Kk/53jtZ0n6ZmI+JGk120vj4gDkgZte45xaC+lV+ZYHpL+3vbXbc+6k05EHCkmT5b078X0rB7o8NrolepV0i/F+HbvHZI690tE/FDNnulkrveeqyT90xxj6ZcCwVvO2ZL+q3VG8bXLqojYreYtNV8rFh1S81Pd9Honq3k7zfuK+a+1bOaELvf/jg7TM31ezU+ZkvSgpG9I+g81Pz13ckIUH0X107XvV/MTNI5OVq90Wv7HEfFhSbdL+ut2Bdp+QNK39Pb91zv1wEz0SvWq6pf5LMR7z5WSHp5jLP1SIHjL+alPZ7ZPk3SnpGuLWQfV5paaxae6v5N0c0QclHSgZT1JetN2Y/p8yRz7f3PmdHGOZ9T2e4rnfyhpd0TsLNb7K0mXqvmH/aniE287Uy3TrbcDteb+JIz2snpl1nJJioj9xc+dan51OatXIuIqNf9HJ39RjO3UAzPRK9Wrql9mb7iLfukwrrVfZr33FOucLWlvRPx4jtdGvxQI3nL+U9JqSbLdJ+keSTdExCvF8u9IWl9Mt95S888lPRURj0tS0aQ/Y/vk4gKD3RExERGXRcRlc+x/v+0h24NqfnJURGwoxr1i+2NqnktpvUBiStLBiDis5h/M9EUZK2Zs+7u2L7F9kqQVETH9qfg0ST/s8veDt6X0SrvlxT5XFD8/oOKNbkavnFjs73W9fQpiVg/QK2kq6Zd2uumXDuPe6he1ee8pXKXm+V8VtdMvc1nsk8zH4kPNDywPFtO/JWlCzfMuo5IuKebfKGmnml/7LJM0KOlwy3qfLdbbqOa9r5+QdEabfd2s5vnZ3ZL+tJh3XrHtpyRd0GbM99T8Ax2V9JVi3pWSdhX7mt7OFkm/M2PskKTHi/U+Vsw7VdLdi/17PxYfyb0ya7mkrxfb3iHp3DZjHizW3yHp0jl6gF459vpl1ntHF/1yraTnJe1R+wv42r73SHpS0qktz+mXOR7cuaok29dLeiwi/nWxaynL9ucl/WVEHJpnvT9Q80roXTmV1Qu9gqNBv9QfwQsAQCLO8QIAkIjgBQAgEcELAEAighcAgEQELwAAiQheAAAS/T+05C409iJ4OAAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 576x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    },
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 576x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import pandas as pd\n",
    "avg_tweet_final = []\n",
    "avg_tweet_final_2 = []\n",
    "\n",
    "for i in major_events:\n",
    "    avg_tweet_num = get_tweets_daterange('elonmusk', i, 14, 7)\n",
    "    avg_tweet_final += [avg_tweet_num.count()/3]\n",
    "\n",
    "for i in list_date:\n",
    "    avg_tweet_num_2 = get_tweets_daterange('elonmusk', i, 14, 7)\n",
    "    avg_tweet_final_2 += [avg_tweet_num_2.count()/3]\n",
    "\n",
    "data_major = [major_events, avg_tweet_final]\n",
    "column_names_m = data_major.pop(0)\n",
    "df_major_event = pd.DataFrame(data_major, columns=column_names_m)\n",
    "df_lists_major = df_major_event.unstack().apply(pd.Series)\n",
    "df_lists_major.plot.bar(rot=0, cmap=plt.cm.jet, fontsize=8, width=0.7, figsize=(8,4))\n",
    "\n",
    "data_random = [list_date, avg_tweet_final_2]\n",
    "column_names_r = data_random.pop(0)\n",
    "df_random_event = pd.DataFrame(data_random, columns=column_names_r)\n",
    "df_lists_random = df_random_event.unstack().apply(pd.Series)\n",
    "df_lists_random.plot.bar(rot=40, cmap=plt.cm.jet, fontsize=8, width=0.7, figsize=(8,4))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Task 5\n",
    "Repeat the steps for Task 4, but this time plot the data based on average characters/week for each daterange.\n",
    "\n",
    "For the major events you should get `[435.0, 1259.0, 575.6666666666666]`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "def count_chars(txt):\n",
    "    result = 0\n",
    "    for char in txt:\n",
    "        result += 1\n",
    "    return result"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[435.0, 1259.0, 575.6666666666666]"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "avg_tweet_char_num = []\n",
    "\n",
    "for i in major_events:\n",
    "    avg_tweet_char = []\n",
    "    avg_tweet_char += get_tweets_daterange('elonmusk', i, 14, 7)\n",
    "    \n",
    "    num_char = 0\n",
    "    for doc in range(len(avg_tweet_char)):\n",
    "        ab = count_chars(avg_tweet_char[doc]['text'])\n",
    "        num_char += ab \n",
    "\n",
    "    avg_tweet_char_num += [num_char/3]\n",
    "\n",
    "avg_tweet_char_num"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[608.0,\n",
       " 0.0,\n",
       " 736.0,\n",
       " 1007.0,\n",
       " 950.3333333333334,\n",
       " 1489.0,\n",
       " 642.3333333333334,\n",
       " 1045.6666666666667,\n",
       " 0.0,\n",
       " 970.0]"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "avg_tweet_char_num_ran = []\n",
    "\n",
    "for i in list_date:\n",
    "    avg_tweet_char_ran = []\n",
    "    avg_tweet_char_ran += get_tweets_daterange('elonmusk', i, 14, 7)\n",
    "    \n",
    "    num_char_ran = 0\n",
    "    for doc in range(len(avg_tweet_char_ran)):\n",
    "        ab_ran = count_chars(avg_tweet_char_ran[doc]['text'])\n",
    "        num_char_ran += ab_ran \n",
    "\n",
    "    avg_tweet_char_num_ran += [num_char_ran/3]\n",
    "\n",
    "avg_tweet_char_num_ran"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<AxesSubplot:>"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAegAAAD3CAYAAADWp8f2AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAASm0lEQVR4nO3df6xf9X3f8eertsFJ4RYH7to4l8ZECZWAkXYyhHQEh2CFCKmITKqWgdJMCQGqaGNag0IWdVmqZh2aqpCWVqqRqqQOafbHlrCSlRSHuLEDNnG6DVS0klWscNmobsA/aBML27z3x/fj5Jv7y9ffe6/vh+vnQ/rqfs7nfD7nfL7XH5/X93zP8XGqCkmS1JefWOkBSJKkmQxoSZI6ZEBLktQhA1qSpA4Z0JIkdWjtSg9g2HnnnVebNm1a6WFIknRKfOc73/leVY3Ptq6rgN60aRP79u1b6WFIknRKJPmbudb5FbckSR0yoCVJ6pABLUlSh7q6Bi1J0sk6cuQIk5OTHD58eKWHMqf169czMTHBunXrFtzHgJYkvapNTk5y9tlns2nTJpKs9HBmqCpeeOEFJicnueCCCxbcz6+4JUmvaocPH+bcc8/tMpwBknDuueee9Bm+AS1JetXrNZyPG2V8BrQkSR3yGrQkaVVJPrWk26v65AnbPPjgg9x+++0cO3aMm2++mTvvvHPR+zWgpUVa6oOBRrOQg6i0HI4dO8ZHPvIRHnroISYmJrjsssu4/vrrueiiixa1Xb/iliRpER577DHe/OY386Y3vYkzzjiD973vfdx///2L3q4BLUnSIjz33HOcf/75P1yemJjgueeeW/R2DWhJkhahqmbULcVd5Qa0JEmLMDExwbPPPvvD5cnJSTZu3Ljo7RrQkiQtwmWXXcZ3v/tdnn76aV5++WW+9KUvcf311y96u97FLUlaVU71Hf1r167lnnvu4dprr+XYsWN88IMf5OKLL178dk/UIMlG4AHgIuAs4Hzgj4ACJoH3V9WxJDcBHwFeBG6sqkNJ3gV8Gjjc2k0uesSSJHXmuuuu47rrrlvSbS7kK+4XgWuAPW35APBLVXUV8DRwXZJ1wG3AVcB24NbW9teBdwN3Ah9fumFLkrS6nTCgq+pwVe0fWt5fVQfa4lHgGHAh8ERVHQV2AFckeS3wg6p6qar2MjgDnyHJLUn2Jdk3NTW1yLcjSdLqMPJNYu2r763AnwHnAIfaqoPAhvY6NNRlzWzbqaptVbW5qjaPj4+POhxJ0mlstn/q1JNRxjdSQCc5E/g88OF21nwAGGurx9ry/qE6gFdG2ZckSfNZv349L7zwQrchffz/g16/fv1J9Rv1Lu5twO9X1ZNt+SngkiRrGJxV76mq7yd5TZKzGHy9/eQc25IkaWQTExNMTk7S82XS9evXMzExcVJ9FnIX9zrgT4G3Al9L8hvAPwHemOR24LNV9eUk9wK7GJw539i6fxp4iMFd3B84qZFJkrQA69at44ILLljpYSy5EwZ0VR1hcFY87OxZ2m1ncAf3cN0OBjeNSZKkk+CTxCRJ6pABLUlShwxoSZI6ZEBLktQhA1qSpA4Z0JIkdciAliSpQwa0JEkdMqAlSeqQAS1JUocMaEmSOmRAS5LUIQNakqQOGdCSJHXIgJYkqUMGtCRJHTKgJUnqkAEtSVKHDGhJkjpkQEuS1CEDWpKkDhnQkiR1yICWJKlDBrQkSR0yoCVJ6tAJAzrJxiR/keRwkrWt7o4ku5Pcl2Rdq7spySNJHkgy1ureleTRJN9IMrG8b0WSpNVjIWfQLwLXAHsAkowDV1fVlcDjwA0tpG8DrgK2A7e2vr8OvBu4E/j40g5dkqTV64QBXVWHq2r/UNXlwM5W3gFcAVwIPFFVR4/XJXkt8IOqeqmq9gIXzbb9JLck2Zdk39TU1CLeiiRJq8co16DPAQ618kFgwxx1G4bqANbMtrGq2lZVm6tq8/j4+AjDkSRp9Rkl
oA8AY6081pZnq9s/VAfwygj7kiTptDRKQH8b2NLKWxlcm34KuCTJmuN1VfV94DVJzkpyOfDkUgxYkqTTwdoTNWg3gP0p8Fbga8C/Ab6ZZDfwDHB3VR1Jci+wi8GZ842t+6eBh4DDwAeWfviSJK1OJwzoqjrC4Kx42F7grmnttjO4g3u4bgeDm8YkSdJJ8EElkiR1yICWJKlDBrQkSR0yoCVJ6pABLUlShwxoSZI6ZEBLktQhA1qSpA4Z0JIkdciAliSpQwa0JEkdMqAlSeqQAS1JUocMaEmSOmRAS5LUIQNakqQOGdCSJHXIgJYkqUMGtCRJHTKgJUnqkAEtSVKHDGhJkjpkQEuS1CEDWpKkDhnQkiR1aKSATvLaJF9NsjPJ/UnOTHJHkt1J7kuyrrW7KckjSR5IMra0Q5ckafUa9Qz6PcDeqnon8BjwPuDqqroSeBy4oYX0bcBVwHbg1sUPV5Kk08OoAf3XwJmtfA6wCdjZlncAVwAXAk9U1dGhuhmS3JJkX5J9U1NTIw5HkqTVZdSA/i7wtiR/CWwG/jdwqK07CGxgENzT62aoqm1VtbmqNo+Pj484HEmSVpdRA/oDwNeq6mLgq8Ba4Pg15jHgQHtNr5MkSQswakAHeLGVv8fgK+4tbXkrsAd4CrgkyZqhOkmStABrR+z3ReA/JXk/cAT4p8CHk+wGngHurqojSe4FdgH7gRuXYsCSJJ0ORgroqjoAXDut+q72Gm63ncEd3JIk6ST4oBJJkjpkQEuS1CEDWpKkDhnQkiR1yICWJKlDBrQkSR0yoCVJ6pABLUlShwxoSZI6ZEBLktQhA1qSpA4Z0JIkdciAliSpQwa0JEkdMqAlSeqQAS1JUocMaEmSOmRAS5LUIQNakqQOGdCSJHXIgJYkqUMGtCRJHTKgJUnqkAEtSVKHRg7oJL+S5OtJdiZ5Q5I7kuxOcl+Sda3NTUkeSfJAkrGlG7YkSavbSAGd5A3Alqq6pqreCbwMXF1VVwKPAze0kL4NuArYDty6NEOWJGn1G/UM+lpgTTuD/l3gcmBnW7cDuAK4EHiiqo4O1UmSpAUYNaB/Gjijqq4Bvg+cAxxq6w4CG+aomyHJLUn2Jdk3NTU14nAkSVpdRg3og8Cft/LDwCbg+DXmMeBAe02vm6GqtlXV5qraPD4+PuJwJElaXUYN6EeAS1v554FngS1teSuwB3gKuCTJmqE6SZK0AGtH6VRV/yPJD5LsBL4H3Ai8Pslu4Bng7qo6kuReYBewv7WRJEkLMFJAA1TVR6dV3dVew222M7iDW5IEJJ9a6SGc9qo+udJDWBAfVCJJUocMaEmSOmRAS5LUIQNakqQOGdCSJHXIgJYkqUMGtCRJHTKgJUnqkAEtSVKHDGhJkjpkQEuS1CEDWpKkDhnQkiR1yICWJKlDBrQkSR0yoCVJ6pABLUlShwxoSZI6ZEBLktQhA1qSpA4Z0JIkdciAliSpQ2tXegC9Sj610kM47VV9cqWHIEkrxjNoSZI6ZEBLktShRQV0kn+dZHcr35Fkd5L7kqxrdTcleSTJA0nGlmLAkiSdDkYO6CRnAm9t5XHg6qq6EngcuKGF9G3AVcB24NbFD1eSpNPDYs6gbwY+38qXAztbeQdwBXAh8ERVHR2qkyRJCzBSQLez4y1V9XCrOgc41MoHgQ1z1M22rVuS7Euyb2pqapThSJK06ox6Bv1+4ItDyweA49eYx9rybHUzVNW2qtpcVZvHx8dHHI4kSavLqAH9c8CvJnkQuBjYDGxp67YCe4CngEuSrBmqkyRJCzDSg0qq6mPHy0l2V9Wnknys3dH9DHB3VR1Jci+wC9gP3LgkI5Yk6TSw6CeJtTu3qaq7gLumrdvO4A5uSZJ0EnxQiSRJHTKgJUnqkAEtSVKHDGhJkjpkQEuS1CEDWpKkDhnQkiR1yICWJKlDBrQkSR0yoCVJ6pABLUlShwxoSZI6ZEBLktQhA1qSpA4Z0JIkdciAliSpQwa0JEkdMqAl
SeqQAS1JUocMaEmSOmRAS5LUIQNakqQOGdCSJHXIgJYkqUMGtCRJHRopoJO8LckjSXYl+UyruyPJ7iT3JVnX6m5q7R5IMraUA5ckaTUb9Qz6b4B3VdU7gH+Q5B3A1VV1JfA4cEML6duAq4DtwK1LMWBJkk4HIwV0VT1fVYfb4lHgUmBnW94BXAFcCDxRVUeH6iRJ0gIs6hp0kkuB84ADwKFWfRDYAJwzS91s27glyb4k+6amphYzHEmSVo2RAzrJ64B7gA8xCOjj15jH2vJsdTNU1baq2lxVm8fHx0cdjiRJq8qoN4mtBb4A3FFVzwPfBra01VuBPcBTwCVJ1gzVSZKkBVg7Yr9fBi4D7koC8HHgm0l2A88Ad1fVkST3AruA/cCNSzBeSZJOCyMFdFX9MfDH06ofBe6a1m47gzu4JUnSSfBBJZIkdciAliSpQwa0JEkdMqAlSeqQAS1JUocMaEmSOmRAS5LUIQNakqQOGdCSJHXIgJYkqUMGtCRJHTKgJUnqkAEtSVKHDGhJkjpkQEuS1CEDWpKkDhnQkiR1yICWJKlDBrQkSR0yoCVJ6pABLUlShwxoSZI6ZEBLktQhA1qSpA4Z0JIkdWjZAzrJZ5LsSvLZ5d6XJEmrxbIGdJJ/BPxkVb0DOCPJZcu5P0mSVovlPoN+O7CjlXcAVyzz/iRJWhXWLvP2zwH+upUPAhdPb5DkFuCWtvh3Sf5qmcd0ujgP+N5KD2Ixkn+30kM4nThftFDOlaX1xrlWLHdAHwDGWnmsLf+YqtoGbFvmcZx2kuyrqs0rPQ69OjhftFDOlVNnub/ifhS4ppW3AnuWeX+SJK0KyxrQVfUXwOEku4BXquqx5dyfJEmrxXJ/xU1V3b7c+9CsvGygk+F80UI5V06RVNVKj0GSJE3jk8QkSeqQAS1JUocM6GWW5KNJfiHJ25I80h57+pmh9Xck2Z3kviTrkpyd5OtJvpnkgSRnt3bvSvJokm8kmZhlP7+fZCrJzUN1l7RtfyvJpbP0+YO2bvfx9Um2JNmbZE+S2+Z5XxuTPNze09ZWd3uSyxf3Gzt9ncK5MmN9ks+1P/edSW6cpc9nk/x5a/OPW92MOTDH+3KuLIMlnC8zjh3T9jPbfPlEkv+b5Dfn6DPj2JPk7ja/dibZP8/7cr4cV1W+lunF4APQl1v5Z4D1rXwf8A+BceC/tbqPAb8MrAde3+o+DPyLVv4GcDbwNuD3ZtnX64F/Dtw8VPdl4HzgDcD9s/S5oP18C/CfW/m/Aj/bxr53nvf2O8AvAmcBO1vdBuBzK/17fzW+TvFcmbEe+Bzw5nnGt679fCPw1bnmgHPlVTlfZhw7FjBffhq4GvjNOfrMeewBfgH4gvPlxC/PoJfXW4FJgKp6vqoOt/qjwDHgcmBnq9sBXFFVh6vq/w23S/Ja4AdV9VJV7QUumr6joT7DXldVz1bVc8BPzdLn6VY80sYD8Jet7ZnA38/z3i4FHq2qvwNeSnJ2Ve0HNibJPP00u1MyV+ZZX8AfJfmTJDOebFRVR1rxLOB/tvKMOTDHe3OuLL0lmS+t/2zHDmDu+VJVf8tgzsxlvmPPe4H/Mk9f50tjQC+vtwD/Z7iifd1zXlU9yeBRqIfaqoMMPiUeb3cWg0egfrHVHxrazJoF7v8n5ihP91sMPrUCfAV4APhfDD6Nz2VNtY+2/PjYX2TwiVwn51TNlbnW/1pV/SJwF/Dbsw0wyZeBP+NHz9efaw5M51xZeks1X05kOY497wEenKev86UxoJfXj33aS/I64B7gQ63qALM8CrV9SvxD4BNVdQDYP9QO4JUk48ev58yz/1eml9s1qJ1JfqYt/yvgyara3dr9R+BKBgeAX2mfoGdzbKg8/BjXMP8na83uVM2VGesBqurF9nM3g69MZ8yVqnovg//w5t+3vnPNgemcK0tvqebLzA0vYL7M0W94vsw49rQ2bwGeq6rvz/PenC+NAb28ngI2ASRZC3wBuKOqnm/rvw1saeXh
R6H+BvCtqnoYoE3m1yQ5q90o8WRVTVXVO6vqnfPs/8UkE0k2MvgkSlVd0/o9n+TdDK71DN/ocQw4UFUvM/iLdfzmkrFp2348yduT/CQwVlXHP2W/DvjbBf5+9COnZK7Mtr7tc6z9/DnaAXHaXDmz7e8lfnTpY8YccK6cMksyX2azkPkyR78fzhdmOfY072VwfZo2dufLfFb6IvhqfjH4APSVVv5nwBSD60I7gbe3+o8Buxl83XQGsBF4eajdr7Z2Wxk82/wbwM/Osq9PMLh+/CTwb1vdpW3b3wJ+fpY+f8XgL/JO4A9a3XuAvW1fx7dzM3DTtL4TwMOt3btb3Qbg8yv9e381vk7xXJmxHviTtu1dwCWz9PlKa78LuHKeOeBcefXNlxnHjgXMlw8B3wGeZvYbEWc99gDfBDYMLTtf5nn5JLFlluSjwNer6r+v9FhGleS3gP9QVQdP0O5fMrjze++pGdnq4lzRyXC+rH4GtCRJHfIatCRJHTKgJUnqkAEtSVKHDGhJkjpkQEuS1CEDWpKkDv1/LvRi2nZJtpEAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 576x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    },
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 576x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "data_major_char = [major_events, avg_tweet_char_num]\n",
    "column_names_m_char = data_major_char.pop(0)\n",
    "df_major_event_char = pd.DataFrame(data_major_char, columns=column_names_m_char)\n",
    "df_lists_major_char = df_major_event_char.unstack().apply(pd.Series)\n",
    "df_lists_major_char.plot.bar(rot=0, cmap=plt.cm.jet, fontsize=8, width=0.7, figsize=(8,4))\n",
    "\n",
    "data_random_char = [list_date, avg_tweet_char_num_ran]\n",
    "column_names_r_char = data_random_char.pop(0)\n",
    "df_random_event_char = pd.DataFrame(data_random_char, columns=column_names_r_char)\n",
    "df_lists_random_char = df_random_event_char.unstack().apply(pd.Series)\n",
    "df_lists_random_char.plot.bar(rot=40, cmap=plt.cm.jet, fontsize=8, width=0.7, figsize=(8,4))"
   ]
  },
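  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The per-window averages from Task 4 can also be compared numerically, which may help when interpreting the plots. The sketch below is one simple way to do that, assuming the values printed in the Task 4 outputs; the variable names are illustrative only. It compares the two means and counts, for each major event, how many random windows were at least as busy.\n",
    "\n",
    "```python\n",
    "from statistics import mean\n",
    "\n",
    "# Average tweets/week copied from the Task 4 outputs above\n",
    "major_avgs = [5.666666666666667, 14.333333333333334, 7.666666666666667]\n",
    "random_avgs = [6.0, 0.0, 9.333333333333334, 13.666666666666666,\n",
    "               11.333333333333334, 16.666666666666668, 9.333333333333334,\n",
    "               11.0, 0.0, 10.333333333333334]\n",
    "\n",
    "major_mean = mean(major_avgs)\n",
    "random_mean = mean(random_avgs)\n",
    "\n",
    "# For each event window, the number of random windows at least as busy\n",
    "at_least_as_busy = [sum(r >= m for r in random_avgs) for m in major_avgs]\n",
    "\n",
    "print(round(major_mean, 2), round(random_mean, 2))\n",
    "print(at_least_as_busy)\n",
    "```\n",
    "\n",
    "With only three event windows and ten random windows, a summary like this is descriptive at best; it is not a significance test."
   ]
  },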
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Task 6\n",
    "Answer the following questions in the cell below using markdown.\n",
    " 1. From the data, do you believe Elon was more/less active during major events? Please support your position using evidence from the previous tasks.\n",
    " 2. What could we do to improve this experiment?\n",
    "   - Your response could talk about data collection/aggregation, method of analysis, etc."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Your response:\n",
    "Base on the comparison with the ten random daterange's average tweet counts, Elon Musk's tweeting patterns do not change much. Since the major event dates have the count of (4, 14, 8) is in the range of (0 - 17), there is no clear sign of changing in his behavior.  \n",
    "\n",
    "From the data both on the average tweets count and range_date count, it is hard to dertermine if he is more active or not during the major events. Since only using the data from the 10 random datarange is not sufficient to support any conclusion, we will need to have more data of the year to discover if he tweets more on major events. For example, we can calculate the average number of tweets' count of one week instead of three weeks. Then we can compare the figures to see if the average tweet count on the major event weeks have a much bigger numbers than other normal weeks. Similarly, we can also collect the tweet count on daily bases from a year range to discover Elon Musk's tweeting pattern. "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}