{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Wrangle and Analyze Data\n", "\n", "Wrangling of WeRateDogs Twitter data to create interesting and trustworthy analyses and visualizations. \n", "\n", "## Table of Contents\n", "- [Introduction](#intro)\n", "- [Gathering](#gathering)\n", "- [Assessing](#assessing)\n", " - [The WeRateDogs Twitter archive](#weratedogstwitter)\n", " - [The tweet image predictions](#tweetimagepreductions)\n", " - [Data retrieved via the Twitter API](#datafromtwitter)\n", " - [Synthesis](#assesssynthesis)\n", "- [Cleaning](#cleaning)\n", "- [Storing Cleaned Data](#storing)\n", "- [Analyzing and Visualizing](#analysis)\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<a id='intro'></a>\n", "### Introduction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The WeRateDogs Twitter archive is great, but it contains only basic tweet information. We therefore gather additional data from a URL and through the Twitter API, then assess and clean the data as required for analysis and visualization." ] }, { "cell_type": "code", "execution_count": 784, "metadata": {}, "outputs": [], "source": [ "# Import required packages\n", "import pandas as pd\n", "import numpy as np\n", "import requests\n", "import tweepy\n", "import json\n", "import re\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "from timeit import default_timer as timer" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<a id='gathering'></a>\n", "### Gathering\n", "\n", "Based on the project motivation and details, we gather data from three sources:\n", "1. The WeRateDogs Twitter archive, available as `twitter-archive-enhanced.csv`\n", "2. The tweet image predictions, i.e., what breed of dog (or other object, animal, etc.) 
is present in each tweet according to a neural network, available at the following URL: https://d17h27t6h515a5.cloudfront.net/topher/2017/August/599fd2ad_image-predictions/image-predictions.tsv\n", "3. A pandas DataFrame created from data we will be collecting directly from twitter" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1. Gather data from the archive available as `twitter-archive-enhanced.csv`" ] }, { "cell_type": "code", "execution_count": 785, "metadata": {}, "outputs": [], "source": [ "# Load the csv file as a dataframe\n", "df_archive = pd.read_csv('twitter-archive-enhanced.csv')" ] }, { "cell_type": "code", "execution_count": 786, "metadata": {}, "outputs": [], "source": [ "# make a copy, as we will be using the copy\n", "df_archive_clean = df_archive.copy()" ] }, { "cell_type": "code", "execution_count": 787, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>tweet_id</th>\n", " <th>in_reply_to_status_id</th>\n", " <th>in_reply_to_user_id</th>\n", " <th>timestamp</th>\n", " <th>source</th>\n", " <th>text</th>\n", " <th>retweeted_status_id</th>\n", " <th>retweeted_status_user_id</th>\n", " <th>retweeted_status_timestamp</th>\n", " <th>expanded_urls</th>\n", " <th>rating_numerator</th>\n", " <th>rating_denominator</th>\n", " <th>name</th>\n", " <th>doggo</th>\n", " <th>floofer</th>\n", " <th>pupper</th>\n", " <th>puppo</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>892420643555336193</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>2017-08-01 16:23:56 +0000</td>\n", " <td><a 
href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a></td>\n", " <td>This is Phineas. He's a mystical boy. Only ever appears in the hole of a donut. 13/10 https://t.co/MgUWQ76dJU</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>https://twitter.com/dog_rates/status/892420643555336193/photo/1</td>\n", " <td>13</td>\n", " <td>10</td>\n", " <td>Phineas</td>\n", " <td>None</td>\n", " <td>None</td>\n", " <td>None</td>\n", " <td>None</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>892177421306343426</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>2017-08-01 00:17:27 +0000</td>\n", " <td><a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a></td>\n", " <td>This is Tilly. She's just checking pup on you. Hopes you're doing ok. If not, she's available for pats, snugs, boops, the whole bit. 13/10 https://t.co/0Xxu71qeIV</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>https://twitter.com/dog_rates/status/892177421306343426/photo/1</td>\n", " <td>13</td>\n", " <td>10</td>\n", " <td>Tilly</td>\n", " <td>None</td>\n", " <td>None</td>\n", " <td>None</td>\n", " <td>None</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>891815181378084864</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>2017-07-31 00:18:03 +0000</td>\n", " <td><a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a></td>\n", " <td>This is Archie. He is a rare Norwegian Pouncing Corgo. Lives in the tall grass. You never know when one may strike. 
12/10 https://t.co/wUnZnhtVJB</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>https://twitter.com/dog_rates/status/891815181378084864/photo/1</td>\n", " <td>12</td>\n", " <td>10</td>\n", " <td>Archie</td>\n", " <td>None</td>\n", " <td>None</td>\n", " <td>None</td>\n", " <td>None</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>891689557279858688</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>2017-07-30 15:58:51 +0000</td>\n", " <td><a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a></td>\n", " <td>This is Darla. She commenced a snooze mid meal. 13/10 happens to the best of us https://t.co/tD36da7qLQ</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>https://twitter.com/dog_rates/status/891689557279858688/photo/1</td>\n", " <td>13</td>\n", " <td>10</td>\n", " <td>Darla</td>\n", " <td>None</td>\n", " <td>None</td>\n", " <td>None</td>\n", " <td>None</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>891327558926688256</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>2017-07-29 16:00:24 +0000</td>\n", " <td><a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a></td>\n", " <td>This is Franklin. He would like you to stop calling him \"cute.\" He is a very fierce shark and should be respected as such. 
12/10 #BarkWeek https://t.co/AtUZn91f7f</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>https://twitter.com/dog_rates/status/891327558926688256/photo/1,https://twitter.com/dog_rates/status/891327558926688256/photo/1</td>\n", " <td>12</td>\n", " <td>10</td>\n", " <td>Franklin</td>\n", " <td>None</td>\n", " <td>None</td>\n", " <td>None</td>\n", " <td>None</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " tweet_id in_reply_to_status_id in_reply_to_user_id \\\n", "0 892420643555336193 NaN NaN \n", "1 892177421306343426 NaN NaN \n", "2 891815181378084864 NaN NaN \n", "3 891689557279858688 NaN NaN \n", "4 891327558926688256 NaN NaN \n", "\n", " timestamp \\\n", "0 2017-08-01 16:23:56 +0000 \n", "1 2017-08-01 00:17:27 +0000 \n", "2 2017-07-31 00:18:03 +0000 \n", "3 2017-07-30 15:58:51 +0000 \n", "4 2017-07-29 16:00:24 +0000 \n", "\n", " source \\\n", "0 <a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a> \n", "1 <a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a> \n", "2 <a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a> \n", "3 <a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a> \n", "4 <a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a> \n", "\n", " text \\\n", "0 This is Phineas. He's a mystical boy. Only ever appears in the hole of a donut. 13/10 https://t.co/MgUWQ76dJU \n", "1 This is Tilly. She's just checking pup on you. Hopes you're doing ok. If not, she's available for pats, snugs, boops, the whole bit. 13/10 https://t.co/0Xxu71qeIV \n", "2 This is Archie. He is a rare Norwegian Pouncing Corgo. Lives in the tall grass. You never know when one may strike. 12/10 https://t.co/wUnZnhtVJB \n", "3 This is Darla. She commenced a snooze mid meal. 13/10 happens to the best of us https://t.co/tD36da7qLQ \n", "4 This is Franklin. 
He would like you to stop calling him \"cute.\" He is a very fierce shark and should be respected as such. 12/10 #BarkWeek https://t.co/AtUZn91f7f \n", "\n", " retweeted_status_id retweeted_status_user_id retweeted_status_timestamp \\\n", "0 NaN NaN NaN \n", "1 NaN NaN NaN \n", "2 NaN NaN NaN \n", "3 NaN NaN NaN \n", "4 NaN NaN NaN \n", "\n", " expanded_urls \\\n", "0 https://twitter.com/dog_rates/status/892420643555336193/photo/1 \n", "1 https://twitter.com/dog_rates/status/892177421306343426/photo/1 \n", "2 https://twitter.com/dog_rates/status/891815181378084864/photo/1 \n", "3 https://twitter.com/dog_rates/status/891689557279858688/photo/1 \n", "4 https://twitter.com/dog_rates/status/891327558926688256/photo/1,https://twitter.com/dog_rates/status/891327558926688256/photo/1 \n", "\n", " rating_numerator rating_denominator name doggo floofer pupper puppo \n", "0 13 10 Phineas None None None None \n", "1 13 10 Tilly None None None None \n", "2 12 10 Archie None None None None \n", "3 13 10 Darla None None None None \n", "4 12 10 Franklin None None None None " ] }, "execution_count": 787, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get a view on the dataframe\n", "df_archive_clean.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "According to the project details, the loaded data are probably not all correct: some ratings, dog names, and dog stages are likely wrong. \n", "So we will need to assess and clean those columns prior to any analysis and visualization." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 2. 
Gather the tweet image predictions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As `image-predictions.tsv` is available at the following URL https://d17h27t6h515a5.cloudfront.net/topher/2017/August/599fd2ad_image-predictions/image-predictions.tsv, we first download it with the Requests library and then load it as a DataFrame." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Set the images file URL\n", "images_url = 'https://d17h27t6h515a5.cloudfront.net/topher/2017/August/599fd2ad_image-predictions/image-predictions.tsv'" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# Get the content \n", "response = requests.get(images_url)\n", "# Stop early if the download failed, so an error page is never saved as data\n", "response.raise_for_status()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# Write it locally as a file\n", "with open('tweet_images_predictions.tsv', mode='wb') as file:\n", " file.write(response.content)" ] }, { "cell_type": "code", "execution_count": 788, "metadata": {}, "outputs": [], "source": [ "# load the images prediction file as a dataframe\n", "df_image = pd.read_csv('tweet_images_predictions.tsv', sep='\\t')" ] }, { "cell_type": "code", "execution_count": 789, "metadata": {}, "outputs": [], "source": [ "# create a copy for our further cleaning\n", "df_image_clean = df_image.copy()" ] }, { "cell_type": "code", "execution_count": 790, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>tweet_id</th>\n", " <th>jpg_url</th>\n", " <th>img_num</th>\n", " <th>p1</th>\n", " 
<th>p1_conf</th>\n", " <th>p1_dog</th>\n", " <th>p2</th>\n", " <th>p2_conf</th>\n", " <th>p2_dog</th>\n", " <th>p3</th>\n", " <th>p3_conf</th>\n", " <th>p3_dog</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>666020888022790149</td>\n", " <td>https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg</td>\n", " <td>1</td>\n", " <td>Welsh_springer_spaniel</td>\n", " <td>0.465074</td>\n", " <td>True</td>\n", " <td>collie</td>\n", " <td>0.156665</td>\n", " <td>True</td>\n", " <td>Shetland_sheepdog</td>\n", " <td>0.061428</td>\n", " <td>True</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>666029285002620928</td>\n", " <td>https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg</td>\n", " <td>1</td>\n", " <td>redbone</td>\n", " <td>0.506826</td>\n", " <td>True</td>\n", " <td>miniature_pinscher</td>\n", " <td>0.074192</td>\n", " <td>True</td>\n", " <td>Rhodesian_ridgeback</td>\n", " <td>0.072010</td>\n", " <td>True</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>666033412701032449</td>\n", " <td>https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg</td>\n", " <td>1</td>\n", " <td>German_shepherd</td>\n", " <td>0.596461</td>\n", " <td>True</td>\n", " <td>malinois</td>\n", " <td>0.138584</td>\n", " <td>True</td>\n", " <td>bloodhound</td>\n", " <td>0.116197</td>\n", " <td>True</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>666044226329800704</td>\n", " <td>https://pbs.twimg.com/media/CT5Dr8HUEAA-lEu.jpg</td>\n", " <td>1</td>\n", " <td>Rhodesian_ridgeback</td>\n", " <td>0.408143</td>\n", " <td>True</td>\n", " <td>redbone</td>\n", " <td>0.360687</td>\n", " <td>True</td>\n", " <td>miniature_pinscher</td>\n", " <td>0.222752</td>\n", " <td>True</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>666049248165822465</td>\n", " <td>https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg</td>\n", " <td>1</td>\n", " <td>miniature_pinscher</td>\n", " <td>0.560311</td>\n", " <td>True</td>\n", " <td>Rottweiler</td>\n", " <td>0.243682</td>\n", " 
<td>True</td>\n", " <td>Doberman</td>\n", " <td>0.154629</td>\n", " <td>True</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " tweet_id jpg_url \\\n", "0 666020888022790149 https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg \n", "1 666029285002620928 https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg \n", "2 666033412701032449 https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg \n", "3 666044226329800704 https://pbs.twimg.com/media/CT5Dr8HUEAA-lEu.jpg \n", "4 666049248165822465 https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg \n", "\n", " img_num p1 p1_conf p1_dog p2 \\\n", "0 1 Welsh_springer_spaniel 0.465074 True collie \n", "1 1 redbone 0.506826 True miniature_pinscher \n", "2 1 German_shepherd 0.596461 True malinois \n", "3 1 Rhodesian_ridgeback 0.408143 True redbone \n", "4 1 miniature_pinscher 0.560311 True Rottweiler \n", "\n", " p2_conf p2_dog p3 p3_conf p3_dog \n", "0 0.156665 True Shetland_sheepdog 0.061428 True \n", "1 0.074192 True Rhodesian_ridgeback 0.072010 True \n", "2 0.138584 True bloodhound 0.116197 True \n", "3 0.360687 True miniature_pinscher 0.222752 True \n", "4 0.243682 True Doberman 0.154629 True " ] }, "execution_count": 790, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get a view to this dataframe\n", "df_image_clean.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3. Gather additional information directly from Twitter" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Twitter archive data loaded as `df_archive_clean` lacks the retweet count and the favorite (\"like\") count. \n", "We gather this missing information by querying the Twitter API with the tweet IDs in `df_archive_clean`. 
\n", "The gathering process is the following:\n", "* For each tweet ID in the WeRateDogs archive (`df_archive_clean`), we query the Twitter API and get the tweet's JSON data\n", "* We write each tweet's JSON data to its own line in a file called `tweet_json.txt`\n", "* Once `tweet_json.txt` is complete, we read it line by line into a pandas DataFrame (with at least tweet ID, retweet count, and favorite count)" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [], "source": [ "# Create twitter API object to gather twitter data\n", "# Get the key, token and secrets\n", "# - keys, token and secrets are hidden here; `auth` is built from them -\n", "\n", "\n", "# Create the api object\n", "# set the wait_on_rate_limit to True to automatically wait for rate limits to refill\n", "# set wait_on_rate_limit_notify to True to print a notification when Tweepy is waiting for rate limits to refill\n", "api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next cell queries the Twitter API and writes the content into the `tweet_json.txt` file. \n", "Because of Twitter's rate limit, this code runs for roughly 30 minutes." 
] }, { "cell_type": "code", "execution_count": 58, "metadata": { "collapsed": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Successfully queried - Tweet ID: 892420643555336193\n", "Successfully queried - Tweet ID: 892177421306343426\n", "Successfully queried - Tweet ID: 891815181378084864\n", "Successfully queried - Tweet ID: 891689557279858688\n", "Successfully queried - Tweet ID: 891327558926688256\n", "Successfully queried - Tweet ID: 891087950875897856\n", "Successfully queried - Tweet ID: 890971913173991426\n", "Successfully queried - Tweet ID: 890729181411237888\n", "Successfully queried - Tweet ID: 890609185150312448\n", "Successfully queried - Tweet ID: 890240255349198849\n", "Successfully queried - Tweet ID: 890006608113172480\n", "Successfully queried - Tweet ID: 889880896479866881\n", "Successfully queried - Tweet ID: 889665388333682689\n", "Successfully queried - Tweet ID: 889638837579907072\n", "Successfully queried - Tweet ID: 889531135344209921\n", "Successfully queried - Tweet ID: 889278841981685760\n", "Successfully queried - Tweet ID: 888917238123831296\n", "Successfully queried - Tweet ID: 888804989199671297\n", "Successfully queried - Tweet ID: 888554962724278272\n", "Error Tweet_Id: 888202515573088257 : [{'code': 144, 'message': 'No status found with that ID.'}]\n", "Successfully queried - Tweet ID: 888202515573088257\n", "Successfully queried - Tweet ID: 888078434458587136\n", "Successfully queried - Tweet ID: 887705289381826560\n", "Successfully queried - Tweet ID: 887517139158093824\n", "Successfully queried - Tweet ID: 887473957103951883\n", "Successfully queried - Tweet ID: 887343217045368832\n", "Successfully queried - Tweet ID: 887101392804085760\n", "Successfully queried - Tweet ID: 886983233522544640\n", "Successfully queried - Tweet ID: 886736880519319552\n", "Successfully queried - Tweet ID: 886680336477933568\n", "Successfully queried - Tweet ID: 886366144734445568\n", "Successfully queried - Tweet ID: 
886267009285017600\n", "Successfully queried - Tweet ID: 886258384151887873\n", "Successfully queried - Tweet ID: 886054160059072513\n", "Successfully queried - Tweet ID: 885984800019947520\n", "Successfully queried - Tweet ID: 885528943205470208\n", "Successfully queried - Tweet ID: 885518971528720385\n", "Successfully queried - Tweet ID: 885311592912609280\n", "Successfully queried - Tweet ID: 885167619883638784\n", "Successfully queried - Tweet ID: 884925521741709313\n", "Successfully queried - Tweet ID: 884876753390489601\n", "Successfully queried - Tweet ID: 884562892145688576\n", "Successfully queried - Tweet ID: 884441805382717440\n", "Successfully queried - Tweet ID: 884247878851493888\n", "Successfully queried - Tweet ID: 884162670584377345\n", "Successfully queried - Tweet ID: 883838122936631299\n", "Successfully queried - Tweet ID: 883482846933004288\n", "Successfully queried - Tweet ID: 883360690899218434\n", "Successfully queried - Tweet ID: 883117836046086144\n", "Successfully queried - Tweet ID: 882992080364220416\n", "Successfully queried - Tweet ID: 882762694511734784\n", "Successfully queried - Tweet ID: 882627270321602560\n", "Successfully queried - Tweet ID: 882268110199369728\n", "Successfully queried - Tweet ID: 882045870035918850\n", "Successfully queried - Tweet ID: 881906580714921986\n", "Successfully queried - Tweet ID: 881666595344535552\n", "Successfully queried - Tweet ID: 881633300179243008\n", "Successfully queried - Tweet ID: 881536004380872706\n", "Successfully queried - Tweet ID: 881268444196462592\n", "Successfully queried - Tweet ID: 880935762899988482\n", "Successfully queried - Tweet ID: 880872448815771648\n", "Successfully queried - Tweet ID: 880465832366813184\n", "Successfully queried - Tweet ID: 880221127280381952\n", "Successfully queried - Tweet ID: 880095782870896641\n", "Successfully queried - Tweet ID: 879862464715927552\n", "Successfully queried - Tweet ID: 879674319642796034\n", "Successfully queried - Tweet ID: 
879492040517615616\n", "Successfully queried - Tweet ID: 879415818425184262\n", "Successfully queried - Tweet ID: 879376492567855104\n", "Successfully queried - Tweet ID: 879130579576475649\n", "Successfully queried - Tweet ID: 879050749262655488\n", "Successfully queried - Tweet ID: 879008229531029506\n", "Successfully queried - Tweet ID: 878776093423087618\n", "Successfully queried - Tweet ID: 878604707211726852\n", "Successfully queried - Tweet ID: 878404777348136964\n", "Successfully queried - Tweet ID: 878316110768087041\n", "Successfully queried - Tweet ID: 878281511006478336\n", "Successfully queried - Tweet ID: 878057613040115712\n", "Successfully queried - Tweet ID: 877736472329191424\n", "Successfully queried - Tweet ID: 877611172832227328\n", "Successfully queried - Tweet ID: 877556246731214848\n", "Successfully queried - Tweet ID: 877316821321428993\n", "Successfully queried - Tweet ID: 877201837425926144\n", "Successfully queried - Tweet ID: 876838120628539392\n", "Successfully queried - Tweet ID: 876537666061221889\n", "Successfully queried - Tweet ID: 876484053909872640\n", "Successfully queried - Tweet ID: 876120275196170240\n", "Successfully queried - Tweet ID: 875747767867523072\n", "Successfully queried - Tweet ID: 875144289856114688\n", "Successfully queried - Tweet ID: 875097192612077568\n", "Successfully queried - Tweet ID: 875021211251597312\n", "Successfully queried - Tweet ID: 874680097055178752\n", "Successfully queried - Tweet ID: 874434818259525634\n", "Successfully queried - Tweet ID: 874296783580663808\n", "Successfully queried - Tweet ID: 874057562936811520\n", "Successfully queried - Tweet ID: 874012996292530176\n", "Error Tweet_Id: 873697596434513921 : [{'code': 144, 'message': 'No status found with that ID.'}]\n", "Successfully queried - Tweet ID: 873697596434513921\n", "Successfully queried - Tweet ID: 873580283840344065\n", "Successfully queried - Tweet ID: 873337748698140672\n", "Successfully queried - Tweet ID: 
873213775632977920\n", "Successfully queried - Tweet ID: 872967104147763200\n", "Successfully queried - Tweet ID: 872820683541237760\n", "Error Tweet_Id: 872668790621863937 : [{'code': 144, 'message': 'No status found with that ID.'}]\n", "Successfully queried - Tweet ID: 872668790621863937\n", "Successfully queried - Tweet ID: 872620804844003328\n", "Successfully queried - Tweet ID: 872486979161796608\n", "Error Tweet_Id: 872261713294495745 : [{'code': 144, 'message': 'No status found with that ID.'}]\n", "Successfully queried - Tweet ID: 872261713294495745\n", "Successfully queried - Tweet ID: 872122724285648897\n", "Successfully queried - Tweet ID: 871879754684805121\n", "Successfully queried - Tweet ID: 871762521631449091\n", "Successfully queried - Tweet ID: 871515927908634625\n", "Successfully queried - Tweet ID: 871166179821445120\n", "Successfully queried - Tweet ID: 871102520638267392\n", "Successfully queried - Tweet ID: 871032628920680449\n", "Successfully queried - Tweet ID: 870804317367881728\n", "Successfully queried - Tweet ID: 870726314365509632\n", "Successfully queried - Tweet ID: 870656317836468226\n", "Successfully queried - Tweet ID: 870374049280663552\n", "Successfully queried - Tweet ID: 870308999962521604\n", "Successfully queried - Tweet ID: 870063196459192321\n", "Error Tweet_Id: 869988702071779329 : [{'code': 144, 'message': 'No status found with that ID.'}]\n", "Successfully queried - Tweet ID: 869988702071779329\n", "Successfully queried - Tweet ID: 869772420881756160\n", "Successfully queried - Tweet ID: 869702957897576449\n", "Successfully queried - Tweet ID: 869596645499047938\n", "Successfully queried - Tweet ID: 869227993411051520\n", "Successfully queried - Tweet ID: 868880397819494401\n", "Successfully queried - Tweet ID: 868639477480148993\n", "Successfully queried - Tweet ID: 868622495443632128\n", "Successfully queried - Tweet ID: 868552278524837888\n", "Successfully queried - Tweet ID: 867900495410671616\n", "Successfully 
queried - Tweet ID: 867774946302451713\n", "Successfully queried - Tweet ID: 867421006826221569\n", "Successfully queried - Tweet ID: 867072653475098625\n", "Successfully queried - Tweet ID: 867051520902168576\n", "Error Tweet_Id: 866816280283807744 : [{'code': 144, 'message': 'No status found with that ID.'}]\n", "Successfully queried - Tweet ID: 866816280283807744\n", "Successfully queried - Tweet ID: 866720684873056260\n", "Successfully queried - Tweet ID: 866686824827068416\n", "Successfully queried - Tweet ID: 866450705531457537\n", "Successfully queried - Tweet ID: 866334964761202691\n", "Successfully queried - Tweet ID: 866094527597207552\n", "Successfully queried - Tweet ID: 865718153858494464\n", "Successfully queried - Tweet ID: 865359393868664832\n", "Successfully queried - Tweet ID: 865006731092295680\n", "Successfully queried - Tweet ID: 864873206498414592\n", "Successfully queried - Tweet ID: 864279568663928832\n", "Successfully queried - Tweet ID: 864197398364647424\n", "Successfully queried - Tweet ID: 863907417377173506\n", "Successfully queried - Tweet ID: 863553081350529029\n", "Successfully queried - Tweet ID: 863471782782697472\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Successfully queried - Tweet ID: 863432100342583297\n", "Successfully queried - Tweet ID: 863427515083354112\n", "Successfully queried - Tweet ID: 863079547188785154\n", "Successfully queried - Tweet ID: 863062471531167744\n", "Successfully queried - Tweet ID: 862831371563274240\n", "Successfully queried - Tweet ID: 862722525377298433\n", "Successfully queried - Tweet ID: 862457590147678208\n", "Successfully queried - Tweet ID: 862096992088072192\n", "Error Tweet_Id: 861769973181624320 : [{'code': 144, 'message': 'No status found with that ID.'}]\n", "Successfully queried - Tweet ID: 861769973181624320\n", "Successfully queried - Tweet ID: 861383897657036800\n", "Successfully queried - Tweet ID: 861288531465048066\n", "Successfully queried - Tweet ID: 
861005113778896900\n", "Successfully queried - Tweet ID: 860981674716409858\n", "Successfully queried - Tweet ID: 860924035999428608\n", "Successfully queried - Tweet ID: 860563773140209665\n", "Successfully queried - Tweet ID: 860524505164394496\n", "Successfully queried - Tweet ID: 860276583193509888\n", "Successfully queried - Tweet ID: 860184849394610176\n", "Successfully queried - Tweet ID: 860177593139703809\n", "Successfully queried - Tweet ID: 859924526012018688\n", "Successfully queried - Tweet ID: 859851578198683649\n", "Successfully queried - Tweet ID: 859607811541651456\n", "Successfully queried - Tweet ID: 859196978902773760\n", "Successfully queried - Tweet ID: 859074603037188101\n", "Successfully queried - Tweet ID: 858860390427611136\n", "Successfully queried - Tweet ID: 858843525470990336\n", "Successfully queried - Tweet ID: 858471635011153920\n", "Successfully queried - Tweet ID: 858107933456039936\n", "Successfully queried - Tweet ID: 857989990357356544\n", "Successfully queried - Tweet ID: 857746408056729600\n", "Successfully queried - Tweet ID: 857393404942143489\n", "Successfully queried - Tweet ID: 857263160327368704\n", "Successfully queried - Tweet ID: 857214891891077121\n", "Successfully queried - Tweet ID: 857062103051644929\n", "Successfully queried - Tweet ID: 857029823797047296\n", "Error Tweet_Id: 856602993587888130 : [{'code': 144, 'message': 'No status found with that ID.'}]\n", "Successfully queried - Tweet ID: 856602993587888130\n", "Successfully queried - Tweet ID: 856543823941562368\n", "Successfully queried - Tweet ID: 856526610513747968\n", "Successfully queried - Tweet ID: 856330835276025856\n", "Successfully queried - Tweet ID: 856288084350160898\n", "Successfully queried - Tweet ID: 856282028240666624\n", "Successfully queried - Tweet ID: 855862651834028034\n", "Successfully queried - Tweet ID: 855860136149123072\n", "Successfully queried - Tweet ID: 855857698524602368\n", "Successfully queried - Tweet ID: 
855851453814013952\n", "Successfully queried - Tweet ID: 855818117272018944\n", "Successfully queried - Tweet ID: 855459453768019968\n", "Successfully queried - Tweet ID: 855245323840757760\n", "Successfully queried - Tweet ID: 855138241867124737\n", "Successfully queried - Tweet ID: 854732716440526848\n", "Successfully queried - Tweet ID: 854482394044301312\n", "Successfully queried - Tweet ID: 854365224396361728\n", "Successfully queried - Tweet ID: 854120357044912130\n", "Successfully queried - Tweet ID: 854010172552949760\n", "Successfully queried - Tweet ID: 853760880890318849\n", "Successfully queried - Tweet ID: 853639147608842240\n", "Successfully queried - Tweet ID: 853299958564483072\n", "Successfully queried - Tweet ID: 852936405516943360\n", "Successfully queried - Tweet ID: 852912242202992640\n", "Successfully queried - Tweet ID: 852672615818899456\n", "Successfully queried - Tweet ID: 852553447878664193\n", "Successfully queried - Tweet ID: 852311364735569921\n", "Successfully queried - Tweet ID: 852226086759018497\n", "Successfully queried - Tweet ID: 852189679701164033\n", "Successfully queried - Tweet ID: 851953902622658560\n", "Successfully queried - Tweet ID: 851861385021730816\n", "Successfully queried - Tweet ID: 851591660324737024\n", "Successfully queried - Tweet ID: 851464819735769094\n", "Successfully queried - Tweet ID: 851224888060895234\n", "Successfully queried - Tweet ID: 850753642995093505\n", "Successfully queried - Tweet ID: 850380195714523136\n", "Successfully queried - Tweet ID: 850333567704068097\n", "Successfully queried - Tweet ID: 850145622816686080\n", "Successfully queried - Tweet ID: 850019790995546112\n", "Successfully queried - Tweet ID: 849776966551130114\n", "Successfully queried - Tweet ID: 849668094696017920\n", "Successfully queried - Tweet ID: 849412302885593088\n", "Successfully queried - Tweet ID: 849336543269576704\n", "Successfully queried - Tweet ID: 849051919805034497\n", "Successfully queried - Tweet ID: 
848690551926992896\n", "Successfully queried - Tweet ID: 848324959059550208\n", "Successfully queried - Tweet ID: 848213670039564288\n", "Successfully queried - Tweet ID: 848212111729840128\n", "Successfully queried - Tweet ID: 847978865427394560\n", "Successfully queried - Tweet ID: 847971574464610304\n", "Successfully queried - Tweet ID: 847962785489326080\n", "Successfully queried - Tweet ID: 847842811428974592\n", "Successfully queried - Tweet ID: 847617282490613760\n", "Successfully queried - Tweet ID: 847606175596138505\n", "Successfully queried - Tweet ID: 847251039262605312\n", "Successfully queried - Tweet ID: 847157206088847362\n", "Successfully queried - Tweet ID: 847116187444137987\n", "Successfully queried - Tweet ID: 846874817362120707\n", "Successfully queried - Tweet ID: 846514051647705089\n", "Successfully queried - Tweet ID: 846505985330044928\n", "Successfully queried - Tweet ID: 846153765933735936\n", "Successfully queried - Tweet ID: 846139713627017216\n", "Successfully queried - Tweet ID: 846042936437604353\n", "Successfully queried - Tweet ID: 845812042753855489\n", "Successfully queried - Tweet ID: 845677943972139009\n", "Error Tweet_Id: 845459076796616705 : [{'code': 144, 'message': 'No status found with that ID.'}]\n", "Successfully queried - Tweet ID: 845459076796616705\n", "Successfully queried - Tweet ID: 845397057150107648\n", "Successfully queried - Tweet ID: 845306882940190720\n", "Successfully queried - Tweet ID: 845098359547420673\n", "Successfully queried - Tweet ID: 844979544864018432\n", "Successfully queried - Tweet ID: 844973813909606400\n", "Error Tweet_Id: 844704788403113984 : [{'code': 144, 'message': 'No status found with that ID.'}]\n", "Successfully queried - Tweet ID: 844704788403113984\n", "Successfully queried - Tweet ID: 844580511645339650\n", "Successfully queried - Tweet ID: 844223788422217728\n", "Successfully queried - Tweet ID: 843981021012017153\n", "Successfully queried - Tweet ID: 843856843873095681\n", 
"Successfully queried - Tweet ID: 843604394117681152\n", "Successfully queried - Tweet ID: 843235543001513987\n", "Error Tweet_Id: 842892208864923648 : [{'code': 144, 'message': 'No status found with that ID.'}]\n", "Successfully queried - Tweet ID: 842892208864923648\n", "Successfully queried - Tweet ID: 842846295480000512\n", "Successfully queried - Tweet ID: 842765311967449089\n", "Successfully queried - Tweet ID: 842535590457499648\n", "Successfully queried - Tweet ID: 842163532590374912\n", "Successfully queried - Tweet ID: 842115215311396866\n", "Successfully queried - Tweet ID: 841833993020538882\n", "Successfully queried - Tweet ID: 841680585030541313\n", "Successfully queried - Tweet ID: 841439858740625411\n", "Successfully queried - Tweet ID: 841320156043304961\n", "Successfully queried - Tweet ID: 841314665196081154\n", "Successfully queried - Tweet ID: 841077006473256960\n", "Successfully queried - Tweet ID: 840761248237133825\n", "Successfully queried - Tweet ID: 840728873075638272\n", "Successfully queried - Tweet ID: 840698636975636481\n", "Successfully queried - Tweet ID: 840696689258311684\n", "Successfully queried - Tweet ID: 840632337062862849\n", "Successfully queried - Tweet ID: 840370681858686976\n", "Successfully queried - Tweet ID: 840268004936019968\n", "Successfully queried - Tweet ID: 839990271299457024\n", "Successfully queried - Tweet ID: 839549326359670784\n", "Successfully queried - Tweet ID: 839290600511926273\n", "Successfully queried - Tweet ID: 839239871831150596\n", "Successfully queried - Tweet ID: 838952994649550848\n", "Successfully queried - Tweet ID: 838921590096166913\n", "Successfully queried - Tweet ID: 838916489579200512\n", "Successfully queried - Tweet ID: 838831947270979586\n", "Successfully queried - Tweet ID: 838561493054533637\n", "Successfully queried - Tweet ID: 838476387338051585\n", "Successfully queried - Tweet ID: 838201503651401729\n", "Successfully queried - Tweet ID: 838150277551247360\n", "Successfully 
queried - Tweet ID: 838085839343206401\n", "Successfully queried - Tweet ID: 838083903487373313\n", "Successfully queried - Tweet ID: 837820167694528512\n", "Successfully queried - Tweet ID: 837482249356513284\n", "Successfully queried - Tweet ID: 837471256429613056\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Successfully queried - Tweet ID: 837366284874571778\n", "Successfully queried - Tweet ID: 837110210464448512\n", "Error Tweet_Id: 837012587749474308 : [{'code': 144, 'message': 'No status found with that ID.'}]\n", "Successfully queried - Tweet ID: 837012587749474308\n", "Successfully queried - Tweet ID: 836989968035819520\n", "Successfully queried - Tweet ID: 836753516572119041\n", "Successfully queried - Tweet ID: 836677758902222849\n", "Successfully queried - Tweet ID: 836648853927522308\n", "Successfully queried - Tweet ID: 836397794269200385\n", "Successfully queried - Tweet ID: 836380477523124226\n", "Successfully queried - Tweet ID: 836260088725786625\n", "Successfully queried - Tweet ID: 836001077879255040\n", "Successfully queried - Tweet ID: 835685285446955009\n", "Successfully queried - Tweet ID: 835574547218894849\n", "Successfully queried - Tweet ID: 835536468978302976\n", "Successfully queried - Tweet ID: 835309094223372289\n", "Successfully queried - Tweet ID: 835297930240217089\n", "Successfully queried - Tweet ID: 835264098648616962\n", "Successfully queried - Tweet ID: 835246439529840640\n", "Successfully queried - Tweet ID: 835172783151792128\n", "Successfully queried - Tweet ID: 835152434251116546\n", "Successfully queried - Tweet ID: 834931633769889797\n", "Successfully queried - Tweet ID: 834786237630337024\n", "Successfully queried - Tweet ID: 834574053763584002\n", "Successfully queried - Tweet ID: 834477809192075265\n", "Successfully queried - Tweet ID: 834458053273591808\n", "Successfully queried - Tweet ID: 834209720923721728\n", "Successfully queried - Tweet ID: 834167344700198914\n", "Successfully queried - 
Tweet ID: 834089966724603904\n", "Successfully queried - Tweet ID: 834086379323871233\n", "Successfully queried - Tweet ID: 833863086058651648\n", "Successfully queried - Tweet ID: 833826103416520705\n", "Successfully queried - Tweet ID: 833732339549220864\n", "Successfully queried - Tweet ID: 833722901757046785\n", "Successfully queried - Tweet ID: 833479644947025920\n", "Successfully queried - Tweet ID: 833124694597443584\n", "Successfully queried - Tweet ID: 832998151111966721\n", "Successfully queried - Tweet ID: 832769181346996225\n", "Successfully queried - Tweet ID: 832757312314028032\n", "Successfully queried - Tweet ID: 832682457690300417\n", "Successfully queried - Tweet ID: 832645525019123713\n", "Successfully queried - Tweet ID: 832636094638288896\n", "Successfully queried - Tweet ID: 832397543355072512\n", "Successfully queried - Tweet ID: 832369877331693569\n", "Successfully queried - Tweet ID: 832273440279240704\n", "Successfully queried - Tweet ID: 832215909146226688\n", "Successfully queried - Tweet ID: 832215726631055365\n", "Successfully queried - Tweet ID: 832088576586297345\n", "Successfully queried - Tweet ID: 832040443403784192\n", "Successfully queried - Tweet ID: 832032802820481025\n", "Successfully queried - Tweet ID: 831939777352105988\n", "Successfully queried - Tweet ID: 831926988323639298\n", "Successfully queried - Tweet ID: 831911600680497154\n", "Successfully queried - Tweet ID: 831670449226514432\n", "Successfully queried - Tweet ID: 831650051525054464\n", "Successfully queried - Tweet ID: 831552930092285952\n", "Successfully queried - Tweet ID: 831322785565769729\n", "Successfully queried - Tweet ID: 831315979191906304\n", "Successfully queried - Tweet ID: 831309418084069378\n", "Successfully queried - Tweet ID: 831262627380748289\n", "Successfully queried - Tweet ID: 830956169170665475\n", "Successfully queried - Tweet ID: 830583320585068544\n", "Successfully queried - Tweet ID: 830173239259324417\n", "Successfully queried - 
Tweet ID: 830097400375152640\n", "Successfully queried - Tweet ID: 829878982036299777\n", "Successfully queried - Tweet ID: 829861396166877184\n", "Successfully queried - Tweet ID: 829501995190984704\n", "Successfully queried - Tweet ID: 829449946868879360\n", "Successfully queried - Tweet ID: 829374341691346946\n", "Successfully queried - Tweet ID: 829141528400556032\n", "Successfully queried - Tweet ID: 829011960981237760\n", "Successfully queried - Tweet ID: 828801551087042563\n", "Successfully queried - Tweet ID: 828770345708580865\n", "Successfully queried - Tweet ID: 828708714936930305\n", "Successfully queried - Tweet ID: 828650029636317184\n", "Successfully queried - Tweet ID: 828409743546925057\n", "Successfully queried - Tweet ID: 828408677031882754\n", "Successfully queried - Tweet ID: 828381636999917570\n", "Successfully queried - Tweet ID: 828376505180889089\n", "Successfully queried - Tweet ID: 828372645993398273\n", "Successfully queried - Tweet ID: 828361771580813312\n", "Successfully queried - Tweet ID: 828046555563323392\n", "Successfully queried - Tweet ID: 828011680017821696\n", "Successfully queried - Tweet ID: 827933404142436356\n", "Successfully queried - Tweet ID: 827653905312006145\n", "Successfully queried - Tweet ID: 827600520311402496\n", "Successfully queried - Tweet ID: 827324948884643840\n", "Error Tweet_Id: 827228250799742977 : [{'code': 144, 'message': 'No status found with that ID.'}]\n", "Successfully queried - Tweet ID: 827228250799742977\n", "Successfully queried - Tweet ID: 827199976799354881\n", "Successfully queried - Tweet ID: 826958653328592898\n", "Successfully queried - Tweet ID: 826848821049180160\n", "Successfully queried - Tweet ID: 826615380357632002\n", "Successfully queried - Tweet ID: 826598799820865537\n", "Successfully queried - Tweet ID: 826598365270007810\n", "Successfully queried - Tweet ID: 826476773533745153\n", "Successfully queried - Tweet ID: 826240494070030336\n", "Successfully queried - Tweet ID: 
826204788643753985\n", "Successfully queried - Tweet ID: 826115272272650244\n", "Successfully queried - Tweet ID: 825876512159186944\n", "Successfully queried - Tweet ID: 825829644528148480\n", "Successfully queried - Tweet ID: 825535076884762624\n", "Successfully queried - Tweet ID: 825147591692263424\n", "Successfully queried - Tweet ID: 825120256414846976\n", "Successfully queried - Tweet ID: 825026590719483904\n", "Successfully queried - Tweet ID: 824796380199809024\n", "Successfully queried - Tweet ID: 824775126675836928\n", "Successfully queried - Tweet ID: 824663926340194305\n", "Successfully queried - Tweet ID: 824325613288833024\n", "Successfully queried - Tweet ID: 824297048279236611\n", "Successfully queried - Tweet ID: 824025158776213504\n", "Successfully queried - Tweet ID: 823939628516474880\n", "Successfully queried - Tweet ID: 823719002937630720\n", "Successfully queried - Tweet ID: 823699002998870016\n", "Successfully queried - Tweet ID: 823581115634085888\n", "Successfully queried - Tweet ID: 823333489516937216\n", "Successfully queried - Tweet ID: 823322678127919110\n", "Successfully queried - Tweet ID: 823269594223824897\n", "Successfully queried - Tweet ID: 822975315408461824\n", "Successfully queried - Tweet ID: 822872901745569793\n", "Successfully queried - Tweet ID: 822859134160621569\n", "Successfully queried - Tweet ID: 822647212903690241\n", "Successfully queried - Tweet ID: 822610361945911296\n", "Successfully queried - Tweet ID: 822489057087389700\n", "Successfully queried - Tweet ID: 822462944365645825\n", "Successfully queried - Tweet ID: 822244816520155136\n", "Successfully queried - Tweet ID: 822163064745328640\n", "Successfully queried - Tweet ID: 821886076407029760\n", "Successfully queried - Tweet ID: 821813639212650496\n", "Successfully queried - Tweet ID: 821765923262631936\n", "Successfully queried - Tweet ID: 821522889702862852\n", "Successfully queried - Tweet ID: 821421320206483457\n", "Successfully queried - Tweet ID: 
821407182352777218\n", "Successfully queried - Tweet ID: 821153421864615936\n", "Successfully queried - Tweet ID: 821149554670182400\n", "Successfully queried - Tweet ID: 821107785811234820\n", "Successfully queried - Tweet ID: 821044531881721856\n", "Successfully queried - Tweet ID: 820837357901512704\n", "Successfully queried - Tweet ID: 820749716845686786\n", "Successfully queried - Tweet ID: 820690176645140481\n", "Successfully queried - Tweet ID: 820494788566847489\n", "Successfully queried - Tweet ID: 820446719150292993\n", "Successfully queried - Tweet ID: 820314633777061888\n", "Successfully queried - Tweet ID: 820078625395449857\n", "Successfully queried - Tweet ID: 820013781606658049\n", "Successfully queried - Tweet ID: 819952236453363712\n", "Successfully queried - Tweet ID: 819924195358416896\n", "Successfully queried - Tweet ID: 819711362133872643\n", "Successfully queried - Tweet ID: 819588359383371776\n", "Successfully queried - Tweet ID: 819347104292290561\n", "Successfully queried - Tweet ID: 819238181065359361\n", "Successfully queried - Tweet ID: 819227688460238848\n", "Successfully queried - Tweet ID: 819015337530290176\n", "Successfully queried - Tweet ID: 819015331746349057\n", "Successfully queried - Tweet ID: 819006400881917954\n", "Successfully queried - Tweet ID: 819004803107983360\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Successfully queried - Tweet ID: 818646164899774465\n", "Successfully queried - Tweet ID: 818627210458333184\n", "Successfully queried - Tweet ID: 818614493328580609\n", "Successfully queried - Tweet ID: 818588835076603904\n", "Successfully queried - Tweet ID: 818536468981415936\n", "Successfully queried - Tweet ID: 818307523543449600\n", "Successfully queried - Tweet ID: 818259473185828864\n", "Successfully queried - Tweet ID: 818145370475810820\n", "Successfully queried - Tweet ID: 817908911860748288\n", "Successfully queried - Tweet ID: 817827839487737858\n", "Successfully queried - Tweet ID: 
817777686764523521\n", "Successfully queried - Tweet ID: 817536400337801217\n", "Successfully queried - Tweet ID: 817502432452313088\n", "Successfully queried - Tweet ID: 817423860136083457\n", "Successfully queried - Tweet ID: 817415592588222464\n", "Successfully queried - Tweet ID: 817181837579653120\n", "Successfully queried - Tweet ID: 817171292965273600\n", "Successfully queried - Tweet ID: 817120970343411712\n", "Successfully queried - Tweet ID: 817056546584727552\n", "Successfully queried - Tweet ID: 816829038950027264\n", "Successfully queried - Tweet ID: 816816676327063552\n", "Successfully queried - Tweet ID: 816697700272001025\n", "Successfully queried - Tweet ID: 816450570814898180\n", "Successfully queried - Tweet ID: 816336735214911488\n", "Successfully queried - Tweet ID: 816091915477250048\n", "Successfully queried - Tweet ID: 816062466425819140\n", "Successfully queried - Tweet ID: 816014286006976512\n", "Successfully queried - Tweet ID: 815990720817401858\n", "Successfully queried - Tweet ID: 815966073409433600\n", "Successfully queried - Tweet ID: 815745968457060357\n", "Successfully queried - Tweet ID: 815736392542261248\n", "Successfully queried - Tweet ID: 815639385530101762\n", "Successfully queried - Tweet ID: 815390420867969024\n", "Successfully queried - Tweet ID: 814986499976527872\n", "Successfully queried - Tweet ID: 814638523311648768\n", "Successfully queried - Tweet ID: 814578408554463233\n", "Successfully queried - Tweet ID: 814530161257443328\n", "Successfully queried - Tweet ID: 814153002265309185\n", "Successfully queried - Tweet ID: 813944609378369540\n", "Successfully queried - Tweet ID: 813910438903693312\n", "Successfully queried - Tweet ID: 813812741911748608\n", "Successfully queried - Tweet ID: 813800681631023104\n", "Successfully queried - Tweet ID: 813217897535406080\n", "Successfully queried - Tweet ID: 813202720496779264\n", "Successfully queried - Tweet ID: 813187593374461952\n", "Successfully queried - Tweet ID: 
813172488309972993\n", "Successfully queried - Tweet ID: 813157409116065792\n", "Successfully queried - Tweet ID: 813142292504645637\n", "Successfully queried - Tweet ID: 813130366689148928\n", "Successfully queried - Tweet ID: 813127251579564032\n", "Successfully queried - Tweet ID: 813112105746448384\n", "Successfully queried - Tweet ID: 813096984823349248\n", "Successfully queried - Tweet ID: 813081950185472002\n", "Successfully queried - Tweet ID: 813066809284972545\n", "Successfully queried - Tweet ID: 813051746834595840\n", "Successfully queried - Tweet ID: 812781120811126785\n", "Error Tweet_Id: 812747805718642688 : [{'code': 144, 'message': 'No status found with that ID.'}]\n", "Successfully queried - Tweet ID: 812747805718642688\n", "Successfully queried - Tweet ID: 812709060537683968\n", "Successfully queried - Tweet ID: 812503143955202048\n", "Successfully queried - Tweet ID: 812466873996607488\n", "Successfully queried - Tweet ID: 812372279581671427\n", "Successfully queried - Tweet ID: 811985624773361665\n", "Successfully queried - Tweet ID: 811744202451197953\n", "Successfully queried - Tweet ID: 811647686436880384\n", "Successfully queried - Tweet ID: 811627233043480576\n", "Successfully queried - Tweet ID: 811386762094317568\n", "Successfully queried - Tweet ID: 810984652412424192\n", "Successfully queried - Tweet ID: 810896069567610880\n", "Successfully queried - Tweet ID: 810657578271330305\n", "Successfully queried - Tweet ID: 810284430598270976\n", "Successfully queried - Tweet ID: 810254108431155201\n", "Successfully queried - Tweet ID: 809920764300447744\n", "Successfully queried - Tweet ID: 809808892968534016\n", "Successfully queried - Tweet ID: 809448704142938112\n", "Successfully queried - Tweet ID: 809220051211603969\n", "Successfully queried - Tweet ID: 809084759137812480\n", "Successfully queried - Tweet ID: 808838249661788160\n", "Successfully queried - Tweet ID: 808733504066486276\n", "Successfully queried - Tweet ID: 
808501579447930884\n", "Successfully queried - Tweet ID: 808344865868283904\n", "Successfully queried - Tweet ID: 808134635716833280\n", "Successfully queried - Tweet ID: 808106460588765185\n", "Successfully queried - Tweet ID: 808001312164028416\n", "Successfully queried - Tweet ID: 807621403335917568\n", "Successfully queried - Tweet ID: 807106840509214720\n", "Successfully queried - Tweet ID: 807059379405148160\n", "Successfully queried - Tweet ID: 807010152071229440\n", "Successfully queried - Tweet ID: 806629075125202948\n", "Successfully queried - Tweet ID: 806620845233815552\n", "Successfully queried - Tweet ID: 806576416489959424\n", "Successfully queried - Tweet ID: 806542213899489280\n", "Successfully queried - Tweet ID: 806242860592926720\n", "Successfully queried - Tweet ID: 806219024703037440\n", "Successfully queried - Tweet ID: 805958939288408065\n", "Successfully queried - Tweet ID: 805932879469572096\n", "Successfully queried - Tweet ID: 805826884734976000\n", "Successfully queried - Tweet ID: 805823200554876929\n", "Successfully queried - Tweet ID: 805520635690676224\n", "Successfully queried - Tweet ID: 805487436403003392\n", "Successfully queried - Tweet ID: 805207613751304193\n", "Successfully queried - Tweet ID: 804738756058218496\n", "Successfully queried - Tweet ID: 804475857670639616\n", "Successfully queried - Tweet ID: 804413760345620481\n", "Successfully queried - Tweet ID: 804026241225523202\n", "Successfully queried - Tweet ID: 803773340896923648\n", "Successfully queried - Tweet ID: 803692223237865472\n", "Successfully queried - Tweet ID: 803638050916102144\n", "Successfully queried - Tweet ID: 803380650405482500\n", "Successfully queried - Tweet ID: 803321560782307329\n", "Successfully queried - Tweet ID: 803276597545603072\n", "Successfully queried - Tweet ID: 802952499103731712\n", "Successfully queried - Tweet ID: 802624713319034886\n", "Successfully queried - Tweet ID: 802600418706604034\n", "Successfully queried - Tweet ID: 
802572683846291456\n", "Successfully queried - Tweet ID: 802323869084381190\n", "Successfully queried - Tweet ID: 802265048156610565\n", "Error Tweet_Id: 802247111496568832 : [{'code': 144, 'message': 'No status found with that ID.'}]\n", "Successfully queried - Tweet ID: 802247111496568832\n", "Successfully queried - Tweet ID: 802239329049477120\n", "Successfully queried - Tweet ID: 802185808107208704\n", "Successfully queried - Tweet ID: 801958328846974976\n", "Successfully queried - Tweet ID: 801854953262350336\n", "Successfully queried - Tweet ID: 801538201127157760\n", "Successfully queried - Tweet ID: 801285448605831168\n", "Successfully queried - Tweet ID: 801167903437357056\n", "Successfully queried - Tweet ID: 801127390143516673\n", "Successfully queried - Tweet ID: 801115127852503040\n", "Successfully queried - Tweet ID: 800859414831898624\n", "Successfully queried - Tweet ID: 800855607700029440\n", "Successfully queried - Tweet ID: 800751577355128832\n", "Successfully queried - Tweet ID: 800513324630806528\n", "Successfully queried - Tweet ID: 800459316964663297\n", "Successfully queried - Tweet ID: 800443802682937345\n", "Successfully queried - Tweet ID: 800388270626521089\n", "Successfully queried - Tweet ID: 800188575492947969\n", "Successfully queried - Tweet ID: 800141422401830912\n", "Successfully queried - Tweet ID: 800018252395122689\n", "Successfully queried - Tweet ID: 799774291445383169\n", "Successfully queried - Tweet ID: 799757965289017345\n", "Successfully queried - Tweet ID: 799422933579902976\n", "Successfully queried - Tweet ID: 799308762079035393\n", "Successfully queried - Tweet ID: 799297110730567681\n", "Successfully queried - Tweet ID: 799063482566066176\n", "Successfully queried - Tweet ID: 798933969379225600\n", "Successfully queried - Tweet ID: 798925684722855936\n", "Successfully queried - Tweet ID: 798705661114773508\n", "Successfully queried - Tweet ID: 798701998996647937\n", "Successfully queried - Tweet ID: 
798697898615730177\n", "Successfully queried - Tweet ID: 798694562394996736\n", "Successfully queried - Tweet ID: 798686750113755136\n", "Successfully queried - Tweet ID: 798682547630837760\n", "Successfully queried - Tweet ID: 798673117451325440\n", "Successfully queried - Tweet ID: 798665375516884993\n", "Successfully queried - Tweet ID: 798644042770751489\n", "Successfully queried - Tweet ID: 798628517273620480\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Successfully queried - Tweet ID: 798585098161549313\n", "Successfully queried - Tweet ID: 798576900688019456\n", "Successfully queried - Tweet ID: 798340744599797760\n", "Successfully queried - Tweet ID: 798209839306514432\n", "Successfully queried - Tweet ID: 797971864723324932\n", "Successfully queried - Tweet ID: 797545162159308800\n", "Successfully queried - Tweet ID: 797236660651966464\n", "Successfully queried - Tweet ID: 797165961484890113\n", "Successfully queried - Tweet ID: 796904159865868288\n", "Successfully queried - Tweet ID: 796865951799083009\n", "Successfully queried - Tweet ID: 796759840936919040\n", "Successfully queried - Tweet ID: 796563435802726400\n", "Successfully queried - Tweet ID: 796484825502875648\n", "Successfully queried - Tweet ID: 796387464403357696\n", "Successfully queried - Tweet ID: 796177847564038144\n", "Successfully queried - Tweet ID: 796149749086875649\n", "Successfully queried - Tweet ID: 796125600683540480\n", "Successfully queried - Tweet ID: 796116448414461957\n", "Successfully queried - Tweet ID: 796080075804475393\n", "Successfully queried - Tweet ID: 796031486298386433\n", "Successfully queried - Tweet ID: 795464331001561088\n", "Successfully queried - Tweet ID: 795400264262053889\n", "Successfully queried - Tweet ID: 795076730285391872\n", "Successfully queried - Tweet ID: 794983741416415232\n", "Successfully queried - Tweet ID: 794926597468000259\n", "Successfully queried - Tweet ID: 794355576146903043\n", "Successfully queried - Tweet ID: 
794332329137291264\n", "Successfully queried - Tweet ID: 794205286408003585\n", "Successfully queried - Tweet ID: 793962221541933056\n", "Successfully queried - Tweet ID: 793845145112371200\n", "Successfully queried - Tweet ID: 793614319594401792\n", "Successfully queried - Tweet ID: 793601777308463104\n", "Successfully queried - Tweet ID: 793500921481273345\n", "Successfully queried - Tweet ID: 793286476301799424\n", "Successfully queried - Tweet ID: 793271401113350145\n", "Successfully queried - Tweet ID: 793256262322548741\n", "Successfully queried - Tweet ID: 793241302385262592\n", "Successfully queried - Tweet ID: 793226087023144960\n", "Successfully queried - Tweet ID: 793210959003287553\n", "Successfully queried - Tweet ID: 793195938047070209\n", "Successfully queried - Tweet ID: 793180763617361921\n", "Successfully queried - Tweet ID: 793165685325201412\n", "Successfully queried - Tweet ID: 793150605191548928\n", "Successfully queried - Tweet ID: 793135492858580992\n", "Successfully queried - Tweet ID: 793120401413079041\n", "Successfully queried - Tweet ID: 792913359805018113\n", "Successfully queried - Tweet ID: 792883833364439040\n", "Successfully queried - Tweet ID: 792773781206999040\n", "Successfully queried - Tweet ID: 792394556390137856\n", "Successfully queried - Tweet ID: 792050063153438720\n", "Successfully queried - Tweet ID: 791821351946420224\n", "Successfully queried - Tweet ID: 791784077045166082\n", "Successfully queried - Tweet ID: 791780927877898241\n", "Successfully queried - Tweet ID: 791774931465953280\n", "Successfully queried - Tweet ID: 791672322847637504\n", "Successfully queried - Tweet ID: 791406955684368384\n", "Successfully queried - Tweet ID: 791312159183634433\n", "Successfully queried - Tweet ID: 791026214425268224\n", "Successfully queried - Tweet ID: 790987426131050500\n", "Successfully queried - Tweet ID: 790946055508652032\n", "Successfully queried - Tweet ID: 790723298204217344\n", "Successfully queried - Tweet ID: 
790698755171364864\n", "Successfully queried - Tweet ID: 790581949425475584\n", "Successfully queried - Tweet ID: 790337589677002753\n", "Successfully queried - Tweet ID: 790277117346975746\n", "Successfully queried - Tweet ID: 790227638568808452\n", "Successfully queried - Tweet ID: 789986466051088384\n", "Successfully queried - Tweet ID: 789960241177853952\n", "Successfully queried - Tweet ID: 789903600034189313\n", "Successfully queried - Tweet ID: 789628658055020548\n", "Successfully queried - Tweet ID: 789599242079838210\n", "Successfully queried - Tweet ID: 789530877013393408\n", "Successfully queried - Tweet ID: 789314372632018944\n", "Successfully queried - Tweet ID: 789280767834746880\n", "Successfully queried - Tweet ID: 789268448748703744\n", "Successfully queried - Tweet ID: 789137962068021249\n", "Successfully queried - Tweet ID: 788908386943430656\n", "Successfully queried - Tweet ID: 788765914992902144\n", "Successfully queried - Tweet ID: 788552643979468800\n", "Successfully queried - Tweet ID: 788412144018661376\n", "Successfully queried - Tweet ID: 788178268662984705\n", "Successfully queried - Tweet ID: 788150585577050112\n", "Successfully queried - Tweet ID: 788070120937619456\n", "Successfully queried - Tweet ID: 788039637453406209\n", "Successfully queried - Tweet ID: 787810552592695296\n", "Successfully queried - Tweet ID: 787717603741622272\n", "Successfully queried - Tweet ID: 787397959788929025\n", "Successfully queried - Tweet ID: 787322443945877504\n", "Successfully queried - Tweet ID: 787111942498508800\n", "Successfully queried - Tweet ID: 786963064373534720\n", "Successfully queried - Tweet ID: 786729988674449408\n", "Successfully queried - Tweet ID: 786709082849828864\n", "Successfully queried - Tweet ID: 786664955043049472\n", "Successfully queried - Tweet ID: 786595970293370880\n", "Successfully queried - Tweet ID: 786363235746385920\n", "Successfully queried - Tweet ID: 786286427768250368\n", "Successfully queried - Tweet ID: 
786233965241827333\n", "Successfully queried - Tweet ID: 786051337297522688\n", "Successfully queried - Tweet ID: 786036967502913536\n", "Successfully queried - Tweet ID: 785927819176054784\n", "Successfully queried - Tweet ID: 785872687017132033\n", "Successfully queried - Tweet ID: 785639753186217984\n", "Successfully queried - Tweet ID: 785533386513321988\n", "Successfully queried - Tweet ID: 785515384317313025\n", "Successfully queried - Tweet ID: 785264754247995392\n", "Successfully queried - Tweet ID: 785170936622350336\n", "Successfully queried - Tweet ID: 784826020293709826\n", "Successfully queried - Tweet ID: 784517518371221505\n", "Successfully queried - Tweet ID: 784431430411685888\n", "Successfully queried - Tweet ID: 784183165795655680\n", "Successfully queried - Tweet ID: 784057939640352768\n", "Successfully queried - Tweet ID: 783839966405230592\n", "Successfully queried - Tweet ID: 783821107061198850\n", "Successfully queried - Tweet ID: 783695101801398276\n", "Successfully queried - Tweet ID: 783466772167098368\n", "Successfully queried - Tweet ID: 783391753726550016\n", "Successfully queried - Tweet ID: 783347506784731136\n", "Successfully queried - Tweet ID: 783334639985389568\n", "Successfully queried - Tweet ID: 783085703974514689\n", "Successfully queried - Tweet ID: 782969140009107456\n", "Successfully queried - Tweet ID: 782747134529531904\n", "Successfully queried - Tweet ID: 782722598790725632\n", "Successfully queried - Tweet ID: 782598640137187329\n", "Successfully queried - Tweet ID: 782305867769217024\n", "Successfully queried - Tweet ID: 782021823840026624\n", "Successfully queried - Tweet ID: 781955203444699136\n", "Successfully queried - Tweet ID: 781661882474196992\n", "Successfully queried - Tweet ID: 781655249211752448\n", "Successfully queried - Tweet ID: 781524693396357120\n", "Successfully queried - Tweet ID: 781308096455073793\n", "Successfully queried - Tweet ID: 781251288990355457\n", "Successfully queried - Tweet ID: 
781163403222056960\n", "Successfully queried - Tweet ID: 780931614150983680\n", "Successfully queried - Tweet ID: 780858289093574656\n", "Successfully queried - Tweet ID: 780800785462489090\n", "Successfully queried - Tweet ID: 780601303617732608\n", "Successfully queried - Tweet ID: 780543529827336192\n", "Successfully queried - Tweet ID: 780496263422808064\n", "Successfully queried - Tweet ID: 780476555013349377\n", "Successfully queried - Tweet ID: 780459368902959104\n", "Successfully queried - Tweet ID: 780192070812196864\n", "Successfully queried - Tweet ID: 780092040432480260\n", "Successfully queried - Tweet ID: 780074436359819264\n", "Successfully queried - Tweet ID: 779834332596887552\n", "Successfully queried - Tweet ID: 779377524342161408\n", "Successfully queried - Tweet ID: 779124354206535695\n", "Successfully queried - Tweet ID: 779123168116150273\n", "Successfully queried - Tweet ID: 779056095788752897\n", "Successfully queried - Tweet ID: 778990705243029504\n", "Successfully queried - Tweet ID: 778774459159379968\n", "Successfully queried - Tweet ID: 778764940568104960\n", "Successfully queried - Tweet ID: 778748913645780993\n", "Successfully queried - Tweet ID: 778650543019483137\n", "Successfully queried - Tweet ID: 778624900596654080\n", "Successfully queried - Tweet ID: 778408200802557953\n", "Successfully queried - Tweet ID: 778396591732486144\n", "Successfully queried - Tweet ID: 778383385161035776\n", "Successfully queried - Tweet ID: 778286810187399168\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Successfully queried - Tweet ID: 778039087836069888\n", "Successfully queried - Tweet ID: 778027034220126208\n", "Successfully queried - Tweet ID: 777953400541634568\n", "Successfully queried - Tweet ID: 777885040357281792\n", "Successfully queried - Tweet ID: 777684233540206592\n", "Successfully queried - Tweet ID: 777641927919427584\n", "Successfully queried - Tweet ID: 777621514455814149\n", "Successfully queried - Tweet ID: 
777189768882946048\n", "Successfully queried - Tweet ID: 776819012571455488\n", "Successfully queried - Tweet ID: 776813020089548800\n", "Successfully queried - Tweet ID: 776477788987613185\n", "Successfully queried - Tweet ID: 776249906839351296\n", "Successfully queried - Tweet ID: 776218204058357768\n", "Successfully queried - Tweet ID: 776201521193218049\n", "Successfully queried - Tweet ID: 776113305656188928\n", "Successfully queried - Tweet ID: 776088319444877312\n", "Successfully queried - Tweet ID: 775898661951791106\n", "Successfully queried - Tweet ID: 775842724423557120\n", "Successfully queried - Tweet ID: 775733305207554048\n", "Successfully queried - Tweet ID: 775729183532220416\n", "Successfully queried - Tweet ID: 775364825476165632\n", "Successfully queried - Tweet ID: 775350846108426240\n", "Error Tweet_Id: 775096608509886464 : [{'code': 144, 'message': 'No status found with that ID.'}]\n", "Successfully queried - Tweet ID: 775096608509886464\n", "Successfully queried - Tweet ID: 775085132600442880\n", "Successfully queried - Tweet ID: 774757898236878852\n", "Successfully queried - Tweet ID: 774639387460112384\n", "Successfully queried - Tweet ID: 774314403806253056\n", "Successfully queried - Tweet ID: 773985732834758656\n", "Successfully queried - Tweet ID: 773922284943896577\n", "Successfully queried - Tweet ID: 773704687002451968\n", "Successfully queried - Tweet ID: 773670353721753600\n", "Successfully queried - Tweet ID: 773547596996571136\n", "Successfully queried - Tweet ID: 773336787167145985\n", "Successfully queried - Tweet ID: 773308824254029826\n", "Successfully queried - Tweet ID: 773247561583001600\n", "Successfully queried - Tweet ID: 773191612633579521\n", "Successfully queried - Tweet ID: 772877495989305348\n", "Successfully queried - Tweet ID: 772826264096874500\n", "Successfully queried - Tweet ID: 772615324260794368\n", "Successfully queried - Tweet ID: 772581559778025472\n", "Successfully queried - Tweet ID: 
772193107915964416\n", "Successfully queried - Tweet ID: 772152991789019136\n", "Successfully queried - Tweet ID: 772117678702071809\n", "Successfully queried - Tweet ID: 772114945936949249\n", "Successfully queried - Tweet ID: 772102971039580160\n", "Successfully queried - Tweet ID: 771908950375665664\n", "Successfully queried - Tweet ID: 771770456517009408\n", "Successfully queried - Tweet ID: 771500966810099713\n", "Successfully queried - Tweet ID: 771380798096281600\n", "Successfully queried - Tweet ID: 771171053431250945\n", "Successfully queried - Tweet ID: 771136648247640064\n", "Successfully queried - Tweet ID: 771102124360998913\n", "Successfully queried - Tweet ID: 771014301343748096\n", "Successfully queried - Tweet ID: 771004394259247104\n", "Successfully queried - Tweet ID: 770787852854652928\n", "Successfully queried - Tweet ID: 770772759874076672\n", "Error Tweet_Id: 770743923962707968 : [{'code': 144, 'message': 'No status found with that ID.'}]\n", "Successfully queried - Tweet ID: 770743923962707968\n", "Successfully queried - Tweet ID: 770655142660169732\n", "Successfully queried - Tweet ID: 770414278348247044\n", "Successfully queried - Tweet ID: 770293558247038976\n", "Successfully queried - Tweet ID: 770093767776997377\n", "Successfully queried - Tweet ID: 770069151037685760\n", "Successfully queried - Tweet ID: 769940425801170949\n", "Successfully queried - Tweet ID: 769695466921623552\n", "Successfully queried - Tweet ID: 769335591808995329\n", "Successfully queried - Tweet ID: 769212283578875904\n", "Successfully queried - Tweet ID: 768970937022709760\n", "Successfully queried - Tweet ID: 768909767477751808\n", "Successfully queried - Tweet ID: 768855141948723200\n", "Successfully queried - Tweet ID: 768609597686943744\n", "Successfully queried - Tweet ID: 768596291618299904\n", "Successfully queried - Tweet ID: 768554158521745409\n", "Successfully queried - Tweet ID: 768473857036525572\n", "Successfully queried - Tweet ID: 
768193404517830656\n", "Successfully queried - Tweet ID: 767884188863397888\n", "Successfully queried - Tweet ID: 767754930266464257\n", "Successfully queried - Tweet ID: 767500508068192258\n", "Successfully queried - Tweet ID: 767191397493538821\n", "Successfully queried - Tweet ID: 767122157629476866\n", "Successfully queried - Tweet ID: 766864461642756096\n", "Successfully queried - Tweet ID: 766793450729734144\n", "Successfully queried - Tweet ID: 766714921925144576\n", "Successfully queried - Tweet ID: 766693177336135680\n", "Successfully queried - Tweet ID: 766423258543644672\n", "Successfully queried - Tweet ID: 766313316352462849\n", "Successfully queried - Tweet ID: 766078092750233600\n", "Successfully queried - Tweet ID: 766069199026450432\n", "Successfully queried - Tweet ID: 766008592277377025\n", "Successfully queried - Tweet ID: 765719909049503744\n", "Successfully queried - Tweet ID: 765669560888528897\n", "Successfully queried - Tweet ID: 765395769549590528\n", "Successfully queried - Tweet ID: 765371061932261376\n", "Successfully queried - Tweet ID: 765222098633691136\n", "Successfully queried - Tweet ID: 764857477905154048\n", "Successfully queried - Tweet ID: 764259802650378240\n", "Successfully queried - Tweet ID: 763956972077010945\n", "Successfully queried - Tweet ID: 763837565564780549\n", "Successfully queried - Tweet ID: 763183847194451968\n", "Successfully queried - Tweet ID: 763167063695355904\n", "Successfully queried - Tweet ID: 763103485927849985\n", "Successfully queried - Tweet ID: 762699858130116608\n", "Successfully queried - Tweet ID: 762471784394268675\n", "Successfully queried - Tweet ID: 762464539388485633\n", "Successfully queried - Tweet ID: 762316489655476224\n", "Successfully queried - Tweet ID: 762035686371364864\n", "Successfully queried - Tweet ID: 761976711479193600\n", "Successfully queried - Tweet ID: 761750502866649088\n", "Successfully queried - Tweet ID: 761745352076779520\n", "Successfully queried - Tweet ID: 
761672994376806400\n", "Successfully queried - Tweet ID: 761599872357261312\n", "Successfully queried - Tweet ID: 761371037149827077\n", "Successfully queried - Tweet ID: 761334018830917632\n", "Successfully queried - Tweet ID: 761292947749015552\n", "Successfully queried - Tweet ID: 761227390836215808\n", "Successfully queried - Tweet ID: 761004547850530816\n", "Successfully queried - Tweet ID: 760893934457552897\n", "Successfully queried - Tweet ID: 760656994973933572\n", "Successfully queried - Tweet ID: 760641137271070720\n", "Successfully queried - Tweet ID: 760539183865880579\n", "Successfully queried - Tweet ID: 760521673607086080\n", "Successfully queried - Tweet ID: 760290219849637889\n", "Successfully queried - Tweet ID: 760252756032651264\n", "Successfully queried - Tweet ID: 760190180481531904\n", "Successfully queried - Tweet ID: 760153949710192640\n", "Successfully queried - Tweet ID: 759943073749200896\n", "Successfully queried - Tweet ID: 759923798737051648\n", "Successfully queried - Tweet ID: 759846353224826880\n", "Successfully queried - Tweet ID: 759793422261743616\n", "Successfully queried - Tweet ID: 759566828574212096\n", "Successfully queried - Tweet ID: 759557299618865152\n", "Successfully queried - Tweet ID: 759447681597108224\n", "Successfully queried - Tweet ID: 759446261539934208\n", "Successfully queried - Tweet ID: 759197388317847553\n", "Successfully queried - Tweet ID: 759159934323924993\n", "Successfully queried - Tweet ID: 759099523532779520\n", "Successfully queried - Tweet ID: 759047813560868866\n", "Successfully queried - Tweet ID: 758854675097526272\n", "Successfully queried - Tweet ID: 758828659922702336\n", "Successfully queried - Tweet ID: 758740312047005698\n", "Successfully queried - Tweet ID: 758474966123810816\n", "Successfully queried - Tweet ID: 758467244762497024\n", "Successfully queried - Tweet ID: 758405701903519748\n", "Successfully queried - Tweet ID: 758355060040593408\n", "Successfully queried - Tweet ID: 
758099635764359168\n", "Successfully queried - Tweet ID: 758041019896193024\n", "Successfully queried - Tweet ID: 757741869644341248\n", "Successfully queried - Tweet ID: 757729163776290825\n", "Successfully queried - Tweet ID: 757725642876129280\n", "Successfully queried - Tweet ID: 757611664640446465\n", "Successfully queried - Tweet ID: 757597904299253760\n", "Successfully queried - Tweet ID: 757596066325864448\n", "Successfully queried - Tweet ID: 757400162377592832\n", "Successfully queried - Tweet ID: 757393109802180609\n", "Successfully queried - Tweet ID: 757354760399941633\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Successfully queried - Tweet ID: 756998049151549440\n", "Successfully queried - Tweet ID: 756939218950160384\n", "Successfully queried - Tweet ID: 756651752796094464\n", "Successfully queried - Tweet ID: 756526248105566208\n", "Successfully queried - Tweet ID: 756303284449767430\n", "Successfully queried - Tweet ID: 756288534030475264\n", "Successfully queried - Tweet ID: 756275833623502848\n", "Successfully queried - Tweet ID: 755955933503782912\n", "Successfully queried - Tweet ID: 755206590534418437\n", "Successfully queried - Tweet ID: 755110668769038337\n", "Successfully queried - Tweet ID: 754874841593970688\n", "Successfully queried - Tweet ID: 754856583969079297\n", "Successfully queried - Tweet ID: 754747087846248448\n", "Successfully queried - Tweet ID: 754482103782404096\n", "Successfully queried - Tweet ID: 754449512966619136\n", "Successfully queried - Tweet ID: 754120377874386944\n", "Error Tweet_Id: 754011816964026368 : [{'code': 144, 'message': 'No status found with that ID.'}]\n", "Successfully queried - Tweet ID: 754011816964026368\n", "Successfully queried - Tweet ID: 753655901052166144\n", "Successfully queried - Tweet ID: 753420520834629632\n", "Successfully queried - Tweet ID: 753398408988139520\n", "Successfully queried - Tweet ID: 753375668877008896\n", "Successfully queried - Tweet ID: 
753298634498793472\n", "Successfully queried - Tweet ID: 753294487569522689\n", "Successfully queried - Tweet ID: 753039830821511168\n", "Successfully queried - Tweet ID: 753026973505581056\n", "Successfully queried - Tweet ID: 752932432744185856\n", "Successfully queried - Tweet ID: 752917284578922496\n", "Successfully queried - Tweet ID: 752701944171524096\n", "Successfully queried - Tweet ID: 752682090207055872\n", "Successfully queried - Tweet ID: 752660715232722944\n", "Successfully queried - Tweet ID: 752568224206688256\n", "Successfully queried - Tweet ID: 752519690950500352\n", "Successfully queried - Tweet ID: 752334515931054080\n", "Successfully queried - Tweet ID: 752309394570878976\n", "Successfully queried - Tweet ID: 752173152931807232\n", "Successfully queried - Tweet ID: 751950017322246144\n", "Successfully queried - Tweet ID: 751937170840121344\n", "Successfully queried - Tweet ID: 751830394383790080\n", "Successfully queried - Tweet ID: 751793661361422336\n", "Successfully queried - Tweet ID: 751598357617971201\n", "Successfully queried - Tweet ID: 751583847268179968\n", "Successfully queried - Tweet ID: 751538714308972544\n", "Successfully queried - Tweet ID: 751456908746354688\n", "Successfully queried - Tweet ID: 751251247299190784\n", "Successfully queried - Tweet ID: 751205363882532864\n", "Successfully queried - Tweet ID: 751132876104687617\n", "Successfully queried - Tweet ID: 750868782890057730\n", "Successfully queried - Tweet ID: 750719632563142656\n", "Successfully queried - Tweet ID: 750506206503038976\n", "Successfully queried - Tweet ID: 750429297815552001\n", "Successfully queried - Tweet ID: 750383411068534784\n", "Successfully queried - Tweet ID: 750381685133418496\n", "Successfully queried - Tweet ID: 750147208377409536\n", "Successfully queried - Tweet ID: 750132105863102464\n", "Successfully queried - Tweet ID: 750117059602808832\n", "Successfully queried - Tweet ID: 750101899009982464\n", "Successfully queried - Tweet ID: 
750086836815486976\n", "Successfully queried - Tweet ID: 750071704093859840\n", "Successfully queried - Tweet ID: 750056684286914561\n", "Successfully queried - Tweet ID: 750041628174217216\n", "Successfully queried - Tweet ID: 750026558547456000\n", "Successfully queried - Tweet ID: 750011400160841729\n", "Successfully queried - Tweet ID: 749996283729883136\n", "Successfully queried - Tweet ID: 749981277374128128\n", "Successfully queried - Tweet ID: 749774190421639168\n", "Successfully queried - Tweet ID: 749417653287129088\n", "Successfully queried - Tweet ID: 749403093750648834\n", "Successfully queried - Tweet ID: 749395845976588288\n", "Successfully queried - Tweet ID: 749317047558017024\n", "Successfully queried - Tweet ID: 749075273010798592\n", "Successfully queried - Tweet ID: 749064354620928000\n", "Successfully queried - Tweet ID: 749036806121881602\n", "Successfully queried - Tweet ID: 748977405889503236\n", "Successfully queried - Tweet ID: 748932637671223296\n", "Successfully queried - Tweet ID: 748705597323898880\n", "Successfully queried - Tweet ID: 748699167502000129\n", "Successfully queried - Tweet ID: 748692773788876800\n", "Successfully queried - Tweet ID: 748575535303884801\n", "Successfully queried - Tweet ID: 748568946752774144\n", "Successfully queried - Tweet ID: 748346686624440324\n", "Successfully queried - Tweet ID: 748337862848962560\n", "Successfully queried - Tweet ID: 748324050481647620\n", "Successfully queried - Tweet ID: 748307329658011649\n", "Successfully queried - Tweet ID: 748220828303695873\n", "Successfully queried - Tweet ID: 747963614829678593\n", "Successfully queried - Tweet ID: 747933425676525569\n", "Successfully queried - Tweet ID: 747885874273214464\n", "Successfully queried - Tweet ID: 747844099428986880\n", "Successfully queried - Tweet ID: 747816857231626240\n", "Successfully queried - Tweet ID: 747651430853525504\n", "Successfully queried - Tweet ID: 747648653817413632\n", "Successfully queried - Tweet ID: 
747600769478692864\n", "Successfully queried - Tweet ID: 747594051852075008\n", "Successfully queried - Tweet ID: 747512671126323200\n", "Successfully queried - Tweet ID: 747461612269887489\n", "Successfully queried - Tweet ID: 747439450712596480\n", "Successfully queried - Tweet ID: 747242308580548608\n", "Successfully queried - Tweet ID: 747219827526344708\n", "Successfully queried - Tweet ID: 747204161125646336\n", "Successfully queried - Tweet ID: 747103485104099331\n", "Successfully queried - Tweet ID: 746906459439529985\n", "Successfully queried - Tweet ID: 746872823977771008\n", "Successfully queried - Tweet ID: 746818907684614144\n", "Successfully queried - Tweet ID: 746790600704425984\n", "Successfully queried - Tweet ID: 746757706116112384\n", "Successfully queried - Tweet ID: 746726898085036033\n", "Successfully queried - Tweet ID: 746542875601690625\n", "Successfully queried - Tweet ID: 746521445350707200\n", "Successfully queried - Tweet ID: 746507379341139972\n", "Successfully queried - Tweet ID: 746369468511756288\n", "Successfully queried - Tweet ID: 746131877086527488\n", "Successfully queried - Tweet ID: 746056683365994496\n", "Successfully queried - Tweet ID: 745789745784041472\n", "Successfully queried - Tweet ID: 745712589599014916\n", "Successfully queried - Tweet ID: 745433870967832576\n", "Successfully queried - Tweet ID: 745422732645535745\n", "Successfully queried - Tweet ID: 745314880350101504\n", "Successfully queried - Tweet ID: 745074613265149952\n", "Successfully queried - Tweet ID: 745057283344719872\n", "Successfully queried - Tweet ID: 744995568523612160\n", "Successfully queried - Tweet ID: 744971049620602880\n", "Successfully queried - Tweet ID: 744709971296780288\n", "Successfully queried - Tweet ID: 744334592493166593\n", "Successfully queried - Tweet ID: 744234799360020481\n", "Successfully queried - Tweet ID: 744223424764059648\n", "Successfully queried - Tweet ID: 743980027717509120\n", "Successfully queried - Tweet ID: 
743895849529389061\n", "Successfully queried - Tweet ID: 743835915802583040\n", "Successfully queried - Tweet ID: 743609206067040256\n", "Successfully queried - Tweet ID: 743595368194129920\n", "Successfully queried - Tweet ID: 743545585370791937\n", "Successfully queried - Tweet ID: 743510151680958465\n", "Successfully queried - Tweet ID: 743253157753532416\n", "Successfully queried - Tweet ID: 743222593470234624\n", "Successfully queried - Tweet ID: 743210557239623680\n", "Successfully queried - Tweet ID: 742534281772302336\n", "Successfully queried - Tweet ID: 742528092657332225\n", "Successfully queried - Tweet ID: 742465774154047488\n", "Successfully queried - Tweet ID: 742423170473463808\n", "Successfully queried - Tweet ID: 742385895052087300\n", "Successfully queried - Tweet ID: 742161199639494656\n", "Successfully queried - Tweet ID: 742150209887731712\n", "Successfully queried - Tweet ID: 741793263812808706\n", "Successfully queried - Tweet ID: 741743634094141440\n", "Successfully queried - Tweet ID: 741438259667034112\n", "Successfully queried - Tweet ID: 741303864243200000\n", "Successfully queried - Tweet ID: 741099773336379392\n", "Successfully queried - Tweet ID: 741067306818797568\n", "Successfully queried - Tweet ID: 740995100998766593\n", "Successfully queried - Tweet ID: 740711788199743490\n", "Successfully queried - Tweet ID: 740699697422163968\n", "Successfully queried - Tweet ID: 740676976021798912\n", "Successfully queried - Tweet ID: 740373189193256964\n", "Successfully queried - Tweet ID: 740365076218183684\n", "Successfully queried - Tweet ID: 740359016048689152\n", "Successfully queried - Tweet ID: 740214038584557568\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Successfully queried - Tweet ID: 739979191639244800\n", "Successfully queried - Tweet ID: 739932936087216128\n", "Successfully queried - Tweet ID: 739844404073074688\n", "Successfully queried - Tweet ID: 739623569819336705\n", "Successfully queried - Tweet ID: 
739606147276148736\n", "Successfully queried - Tweet ID: 739544079319588864\n", "Successfully queried - Tweet ID: 739485634323156992\n", "Successfully queried - Tweet ID: 739238157791694849\n", "Successfully queried - Tweet ID: 738891149612572673\n", "Successfully queried - Tweet ID: 738885046782832640\n", "Successfully queried - Tweet ID: 738883359779196928\n", "Successfully queried - Tweet ID: 738537504001953792\n", "Successfully queried - Tweet ID: 738402415918125056\n", "Successfully queried - Tweet ID: 738184450748633089\n", "Successfully queried - Tweet ID: 738166403467907072\n", "Successfully queried - Tweet ID: 738156290900254721\n", "Successfully queried - Tweet ID: 737826014890496000\n", "Successfully queried - Tweet ID: 737800304142471168\n", "Successfully queried - Tweet ID: 737678689543020544\n", "Successfully queried - Tweet ID: 737445876994609152\n", "Successfully queried - Tweet ID: 737322739594330112\n", "Successfully queried - Tweet ID: 737310737551491075\n", "Successfully queried - Tweet ID: 736736130620620800\n", "Successfully queried - Tweet ID: 736392552031657984\n", "Successfully queried - Tweet ID: 736365877722001409\n", "Successfully queried - Tweet ID: 736225175608430592\n", "Successfully queried - Tweet ID: 736010884653420544\n", "Successfully queried - Tweet ID: 735991953473572864\n", "Successfully queried - Tweet ID: 735648611367784448\n", "Successfully queried - Tweet ID: 735635087207878657\n", "Successfully queried - Tweet ID: 735274964362878976\n", "Successfully queried - Tweet ID: 735256018284875776\n", "Successfully queried - Tweet ID: 735137028879360001\n", "Successfully queried - Tweet ID: 734912297295085568\n", "Successfully queried - Tweet ID: 734787690684657664\n", "Successfully queried - Tweet ID: 734776360183431168\n", "Successfully queried - Tweet ID: 734559631394082816\n", "Successfully queried - Tweet ID: 733828123016450049\n", "Successfully queried - Tweet ID: 733822306246479872\n", "Successfully queried - Tweet ID: 
733482008106668032\n", "Successfully queried - Tweet ID: 733460102733135873\n", "Successfully queried - Tweet ID: 733109485275860992\n", "Successfully queried - Tweet ID: 732732193018155009\n", "Successfully queried - Tweet ID: 732726085725589504\n", "Successfully queried - Tweet ID: 732585889486888962\n", "Successfully queried - Tweet ID: 732375214819057664\n", "Successfully queried - Tweet ID: 732005617171337216\n", "Successfully queried - Tweet ID: 731285275100512256\n", "Successfully queried - Tweet ID: 731156023742988288\n", "Successfully queried - Tweet ID: 730924654643314689\n", "Successfully queried - Tweet ID: 730573383004487680\n", "Successfully queried - Tweet ID: 730427201120833536\n", "Successfully queried - Tweet ID: 730211855403241472\n", "Successfully queried - Tweet ID: 730196704625098752\n", "Successfully queried - Tweet ID: 729854734790754305\n", "Successfully queried - Tweet ID: 729838605770891264\n", "Successfully queried - Tweet ID: 729823566028484608\n", "Successfully queried - Tweet ID: 729463711119904772\n", "Successfully queried - Tweet ID: 729113531270991872\n", "Successfully queried - Tweet ID: 728986383096946689\n", "Successfully queried - Tweet ID: 728760639972315136\n", "Successfully queried - Tweet ID: 728751179681943552\n", "Successfully queried - Tweet ID: 728653952833728512\n", "Successfully queried - Tweet ID: 728409960103686147\n", "Successfully queried - Tweet ID: 728387165835677696\n", "Successfully queried - Tweet ID: 728046963732717569\n", "Successfully queried - Tweet ID: 728035342121635841\n", "Successfully queried - Tweet ID: 728015554473250816\n", "Successfully queried - Tweet ID: 727685679342333952\n", "Successfully queried - Tweet ID: 727644517743104000\n", "Successfully queried - Tweet ID: 727524757080539137\n", "Successfully queried - Tweet ID: 727314416056803329\n", "Successfully queried - Tweet ID: 727286334147182592\n", "Successfully queried - Tweet ID: 727175381690781696\n", "Successfully queried - Tweet ID: 
727155742655025152\n", "Successfully queried - Tweet ID: 726935089318363137\n", "Successfully queried - Tweet ID: 726887082820554753\n", "Successfully queried - Tweet ID: 726828223124897792\n", "Successfully queried - Tweet ID: 726224900189511680\n", "Successfully queried - Tweet ID: 725842289046749185\n", "Successfully queried - Tweet ID: 725786712245440512\n", "Successfully queried - Tweet ID: 725729321944506368\n", "Successfully queried - Tweet ID: 725458796924002305\n", "Successfully queried - Tweet ID: 724983749226668032\n", "Successfully queried - Tweet ID: 724771698126512129\n", "Successfully queried - Tweet ID: 724405726123311104\n", "Successfully queried - Tweet ID: 724049859469295616\n", "Successfully queried - Tweet ID: 724046343203856385\n", "Successfully queried - Tweet ID: 724004602748780546\n", "Successfully queried - Tweet ID: 723912936180330496\n", "Successfully queried - Tweet ID: 723688335806480385\n", "Successfully queried - Tweet ID: 723673163800948736\n", "Successfully queried - Tweet ID: 723179728551723008\n", "Successfully queried - Tweet ID: 722974582966214656\n", "Successfully queried - Tweet ID: 722613351520608256\n", "Successfully queried - Tweet ID: 721503162398597120\n", "Successfully queried - Tweet ID: 721001180231503872\n", "Successfully queried - Tweet ID: 720785406564900865\n", "Successfully queried - Tweet ID: 720775346191278080\n", "Successfully queried - Tweet ID: 720415127506415616\n", "Successfully queried - Tweet ID: 720389942216527872\n", "Successfully queried - Tweet ID: 720340705894408192\n", "Successfully queried - Tweet ID: 720059472081784833\n", "Successfully queried - Tweet ID: 720043174954147842\n", "Successfully queried - Tweet ID: 719991154352222208\n", "Successfully queried - Tweet ID: 719704490224398336\n", "Successfully queried - Tweet ID: 719551379208073216\n", "Successfully queried - Tweet ID: 719367763014393856\n", "Successfully queried - Tweet ID: 719339463458033665\n", "Successfully queried - Tweet ID: 
719332531645071360\n", "Successfully queried - Tweet ID: 718971898235854848\n", "Successfully queried - Tweet ID: 718939241951195136\n", "Successfully queried - Tweet ID: 718631497683582976\n", "Successfully queried - Tweet ID: 718613305783398402\n", "Successfully queried - Tweet ID: 718540630683709445\n", "Successfully queried - Tweet ID: 718460005985447936\n", "Successfully queried - Tweet ID: 718454725339934721\n", "Successfully queried - Tweet ID: 718246886998687744\n", "Successfully queried - Tweet ID: 718234618122661888\n", "Successfully queried - Tweet ID: 717841801130979328\n", "Successfully queried - Tweet ID: 717790033953034240\n", "Successfully queried - Tweet ID: 717537687239008257\n", "Successfully queried - Tweet ID: 717428917016076293\n", "Successfully queried - Tweet ID: 717421804990701568\n", "Successfully queried - Tweet ID: 717047459982213120\n", "Successfully queried - Tweet ID: 717009362452090881\n", "Successfully queried - Tweet ID: 716802964044845056\n", "Successfully queried - Tweet ID: 716791146589110272\n", "Successfully queried - Tweet ID: 716730379797970944\n", "Successfully queried - Tweet ID: 716447146686459905\n", "Successfully queried - Tweet ID: 716439118184652801\n", "Successfully queried - Tweet ID: 716285507865542656\n", "Successfully queried - Tweet ID: 716080869887381504\n", "Successfully queried - Tweet ID: 715928423106027520\n", "Successfully queried - Tweet ID: 715758151270801409\n", "Successfully queried - Tweet ID: 715733265223708672\n", "Successfully queried - Tweet ID: 715704790270025728\n", "Successfully queried - Tweet ID: 715696743237730304\n", "Successfully queried - Tweet ID: 715680795826982913\n", "Successfully queried - Tweet ID: 715360349751484417\n", "Successfully queried - Tweet ID: 715342466308784130\n", "Successfully queried - Tweet ID: 715220193576927233\n", "Successfully queried - Tweet ID: 715200624753819648\n", "Successfully queried - Tweet ID: 715009755312439296\n", "Successfully queried - Tweet ID: 
714982300363173890\n", "Successfully queried - Tweet ID: 714962719905021952\n", "Successfully queried - Tweet ID: 714957620017307648\n", "Successfully queried - Tweet ID: 714631576617938945\n", "Successfully queried - Tweet ID: 714606013974974464\n", "Successfully queried - Tweet ID: 714485234495041536\n", "Successfully queried - Tweet ID: 714258258790387713\n", "Successfully queried - Tweet ID: 714251586676113411\n", "Successfully queried - Tweet ID: 714214115368108032\n", "Successfully queried - Tweet ID: 714141408463036416\n", "Successfully queried - Tweet ID: 713919462244790272\n", "Successfully queried - Tweet ID: 713909862279876608\n", "Successfully queried - Tweet ID: 713900603437621249\n", "Successfully queried - Tweet ID: 713761197720473600\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Successfully queried - Tweet ID: 713411074226274305\n", "Successfully queried - Tweet ID: 713177543487135744\n", "Successfully queried - Tweet ID: 713175907180089344\n", "Successfully queried - Tweet ID: 712809025985978368\n", "Successfully queried - Tweet ID: 712717840512598017\n", "Successfully queried - Tweet ID: 712668654853337088\n", "Successfully queried - Tweet ID: 712438159032893441\n", "Successfully queried - Tweet ID: 712309440758808576\n", "Successfully queried - Tweet ID: 712097430750289920\n", "Successfully queried - Tweet ID: 712092745624633345\n", "Successfully queried - Tweet ID: 712085617388212225\n", "Successfully queried - Tweet ID: 712065007010385924\n", "Successfully queried - Tweet ID: 711998809858043904\n", "Successfully queried - Tweet ID: 711968124745228288\n", "Successfully queried - Tweet ID: 711743778164514816\n", "Successfully queried - Tweet ID: 711732680602345472\n", "Successfully queried - Tweet ID: 711694788429553666\n", "Successfully queried - Tweet ID: 711652651650457602\n", "Successfully queried - Tweet ID: 711363825979756544\n", "Successfully queried - Tweet ID: 711306686208872448\n", "Successfully queried - Tweet ID: 
711008018775851008\n", "Successfully queried - Tweet ID: 710997087345876993\n", "Successfully queried - Tweet ID: 710844581445812225\n", "Successfully queried - Tweet ID: 710833117892898816\n", "Successfully queried - Tweet ID: 710658690886586372\n", "Successfully queried - Tweet ID: 710609963652087808\n", "Successfully queried - Tweet ID: 710588934686908417\n", "Successfully queried - Tweet ID: 710296729921429505\n", "Successfully queried - Tweet ID: 710283270106132480\n", "Successfully queried - Tweet ID: 710272297844797440\n", "Successfully queried - Tweet ID: 710269109699739648\n", "Successfully queried - Tweet ID: 710153181850935296\n", "Successfully queried - Tweet ID: 710140971284037632\n", "Successfully queried - Tweet ID: 710117014656950272\n", "Successfully queried - Tweet ID: 709918798883774466\n", "Successfully queried - Tweet ID: 709901256215666688\n", "Successfully queried - Tweet ID: 709852847387627521\n", "Successfully queried - Tweet ID: 709566166965075968\n", "Successfully queried - Tweet ID: 709556954897764353\n", "Successfully queried - Tweet ID: 709519240576036864\n", "Successfully queried - Tweet ID: 709449600415961088\n", "Successfully queried - Tweet ID: 709409458133323776\n", "Successfully queried - Tweet ID: 709225125749587968\n", "Successfully queried - Tweet ID: 709207347839836162\n", "Successfully queried - Tweet ID: 709198395643068416\n", "Successfully queried - Tweet ID: 709179584944730112\n", "Successfully queried - Tweet ID: 709158332880297985\n", "Successfully queried - Tweet ID: 709042156699303936\n", "Successfully queried - Tweet ID: 708853462201716736\n", "Successfully queried - Tweet ID: 708845821941387268\n", "Successfully queried - Tweet ID: 708834316713893888\n", "Successfully queried - Tweet ID: 708810915978854401\n", "Successfully queried - Tweet ID: 708738143638450176\n", "Successfully queried - Tweet ID: 708711088997666817\n", "Successfully queried - Tweet ID: 708479650088034305\n", "Successfully queried - Tweet ID: 
708469915515297792\n", "Successfully queried - Tweet ID: 708400866336894977\n", "Successfully queried - Tweet ID: 708356463048204288\n", "Successfully queried - Tweet ID: 708349470027751425\n", "Successfully queried - Tweet ID: 708149363256774660\n", "Successfully queried - Tweet ID: 708130923141795840\n", "Successfully queried - Tweet ID: 708119489313951744\n", "Successfully queried - Tweet ID: 708109389455101952\n", "Successfully queried - Tweet ID: 708026248782585858\n", "Successfully queried - Tweet ID: 707995814724026368\n", "Successfully queried - Tweet ID: 707983188426153984\n", "Successfully queried - Tweet ID: 707969809498152960\n", "Successfully queried - Tweet ID: 707776935007539200\n", "Successfully queried - Tweet ID: 707741517457260545\n", "Successfully queried - Tweet ID: 707738799544082433\n", "Successfully queried - Tweet ID: 707693576495472641\n", "Successfully queried - Tweet ID: 707629649552134146\n", "Successfully queried - Tweet ID: 707610948723478529\n", "Successfully queried - Tweet ID: 707420581654872064\n", "Successfully queried - Tweet ID: 707411934438625280\n", "Successfully queried - Tweet ID: 707387676719185920\n", "Successfully queried - Tweet ID: 707377100785885184\n", "Successfully queried - Tweet ID: 707315916783140866\n", "Successfully queried - Tweet ID: 707297311098011648\n", "Successfully queried - Tweet ID: 707059547140169728\n", "Successfully queried - Tweet ID: 707038192327901184\n", "Successfully queried - Tweet ID: 707021089608753152\n", "Successfully queried - Tweet ID: 707014260413456384\n", "Successfully queried - Tweet ID: 706904523814649856\n", "Successfully queried - Tweet ID: 706901761596989440\n", "Successfully queried - Tweet ID: 706681918348251136\n", "Successfully queried - Tweet ID: 706644897839910912\n", "Successfully queried - Tweet ID: 706593038911545345\n", "Successfully queried - Tweet ID: 706538006853918722\n", "Successfully queried - Tweet ID: 706516534877929472\n", "Successfully queried - Tweet ID: 
706346369204748288\n", "Successfully queried - Tweet ID: 706310011488698368\n", "Successfully queried - Tweet ID: 706291001778950144\n", "Successfully queried - Tweet ID: 706265994973601792\n", "Successfully queried - Tweet ID: 706169069255446529\n", "Successfully queried - Tweet ID: 706166467411222528\n", "Successfully queried - Tweet ID: 706153300320784384\n", "Successfully queried - Tweet ID: 705975130514706432\n", "Successfully queried - Tweet ID: 705970349788291072\n", "Successfully queried - Tweet ID: 705898680587526145\n", "Successfully queried - Tweet ID: 705786532653883392\n", "Successfully queried - Tweet ID: 705591895322394625\n", "Successfully queried - Tweet ID: 705475953783398401\n", "Successfully queried - Tweet ID: 705442520700944385\n", "Successfully queried - Tweet ID: 705428427625635840\n", "Successfully queried - Tweet ID: 705239209544720384\n", "Successfully queried - Tweet ID: 705223444686888960\n", "Successfully queried - Tweet ID: 705102439679201280\n", "Successfully queried - Tweet ID: 705066031337840642\n", "Successfully queried - Tweet ID: 704871453724954624\n", "Successfully queried - Tweet ID: 704859558691414016\n", "Successfully queried - Tweet ID: 704847917308362754\n", "Successfully queried - Tweet ID: 704819833553219584\n", "Successfully queried - Tweet ID: 704761120771465216\n", "Successfully queried - Tweet ID: 704499785726889984\n", "Successfully queried - Tweet ID: 704491224099647488\n", "Successfully queried - Tweet ID: 704480331685040129\n", "Successfully queried - Tweet ID: 704364645503647744\n", "Successfully queried - Tweet ID: 704347321748819968\n", "Successfully queried - Tweet ID: 704134088924532736\n", "Successfully queried - Tweet ID: 704113298707505153\n", "Successfully queried - Tweet ID: 704054845121142784\n", "Successfully queried - Tweet ID: 703774238772166656\n", "Successfully queried - Tweet ID: 703769065844768768\n", "Successfully queried - Tweet ID: 703631701117943808\n", "Successfully queried - Tweet ID: 
703611486317502464\n", "Successfully queried - Tweet ID: 703425003149250560\n", "Successfully queried - Tweet ID: 703407252292673536\n", "Successfully queried - Tweet ID: 703382836347330562\n", "Successfully queried - Tweet ID: 703356393781329922\n", "Successfully queried - Tweet ID: 703268521220972544\n", "Successfully queried - Tweet ID: 703079050210877440\n", "Successfully queried - Tweet ID: 703041949650034688\n", "Successfully queried - Tweet ID: 702932127499816960\n", "Successfully queried - Tweet ID: 702899151802126337\n", "Successfully queried - Tweet ID: 702684942141153280\n", "Successfully queried - Tweet ID: 702671118226825216\n", "Successfully queried - Tweet ID: 702598099714314240\n", "Successfully queried - Tweet ID: 702539513671897089\n", "Successfully queried - Tweet ID: 702332542343577600\n", "Successfully queried - Tweet ID: 702321140488925184\n", "Successfully queried - Tweet ID: 702276748847800320\n", "Successfully queried - Tweet ID: 702217446468493312\n", "Successfully queried - Tweet ID: 701981390485725185\n", "Successfully queried - Tweet ID: 701952816642965504\n", "Successfully queried - Tweet ID: 701889187134500865\n", "Successfully queried - Tweet ID: 701805642395348998\n", "Successfully queried - Tweet ID: 701601587219795968\n", "Successfully queried - Tweet ID: 701570477911896070\n", "Successfully queried - Tweet ID: 701545186879471618\n", "Successfully queried - Tweet ID: 701214700881756160\n", "Successfully queried - Tweet ID: 700890391244103680\n", "Successfully queried - Tweet ID: 700864154249383937\n", "Successfully queried - Tweet ID: 700847567345688576\n", "Successfully queried - Tweet ID: 700796979434098688\n", "Successfully queried - Tweet ID: 700747788515020802\n", "Successfully queried - Tweet ID: 700518061187723268\n", "Successfully queried - Tweet ID: 700505138482569216\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Successfully queried - Tweet ID: 700462010979500032\n", "Successfully queried - Tweet ID: 
700167517596164096\n", "Successfully queried - Tweet ID: 700151421916807169\n", "Successfully queried - Tweet ID: 700143752053182464\n", "Successfully queried - Tweet ID: 700062718104104960\n", "Successfully queried - Tweet ID: 700029284593901568\n", "Successfully queried - Tweet ID: 700002074055016451\n", "Successfully queried - Tweet ID: 699801817392291840\n", "Successfully queried - Tweet ID: 699788877217865730\n", "Successfully queried - Tweet ID: 699779630832685056\n", "Successfully queried - Tweet ID: 699775878809702401\n", "Successfully queried - Tweet ID: 699691744225525762\n", "Successfully queried - Tweet ID: 699446877801091073\n", "Successfully queried - Tweet ID: 699434518667751424\n", "Successfully queried - Tweet ID: 699423671849451520\n", "Successfully queried - Tweet ID: 699413908797464576\n", "Successfully queried - Tweet ID: 699370870310113280\n", "Successfully queried - Tweet ID: 699323444782047232\n", "Successfully queried - Tweet ID: 699088579889332224\n", "Successfully queried - Tweet ID: 699079609774645248\n", "Successfully queried - Tweet ID: 699072405256409088\n", "Successfully queried - Tweet ID: 699060279947165696\n", "Successfully queried - Tweet ID: 699036661657767936\n", "Successfully queried - Tweet ID: 698989035503689728\n", "Successfully queried - Tweet ID: 698953797952008193\n", "Successfully queried - Tweet ID: 698907974262222848\n", "Successfully queried - Tweet ID: 698710712454139905\n", "Successfully queried - Tweet ID: 698703483621523456\n", "Successfully queried - Tweet ID: 698635131305795584\n", "Successfully queried - Tweet ID: 698549713696649216\n", "Successfully queried - Tweet ID: 698355670425473025\n", "Successfully queried - Tweet ID: 698342080612007937\n", "Successfully queried - Tweet ID: 698262614669991936\n", "Successfully queried - Tweet ID: 698195409219559425\n", "Successfully queried - Tweet ID: 698178924120031232\n", "Successfully queried - Tweet ID: 697995514407682048\n", "Successfully queried - Tweet ID: 
697990423684476929\n", "Successfully queried - Tweet ID: 697943111201378304\n", "Successfully queried - Tweet ID: 697881462549430272\n", "Successfully queried - Tweet ID: 697630435728322560\n", "Successfully queried - Tweet ID: 697616773278015490\n", "Successfully queried - Tweet ID: 697596423848730625\n", "Successfully queried - Tweet ID: 697575480820686848\n", "Successfully queried - Tweet ID: 697516214579523584\n", "Successfully queried - Tweet ID: 697482927769255936\n", "Successfully queried - Tweet ID: 697463031882764288\n", "Successfully queried - Tweet ID: 697270446429966336\n", "Successfully queried - Tweet ID: 697259378236399616\n", "Successfully queried - Tweet ID: 697255105972801536\n", "Successfully queried - Tweet ID: 697242256848379904\n", "Successfully queried - Tweet ID: 696900204696625153\n", "Successfully queried - Tweet ID: 696894894812565505\n", "Successfully queried - Tweet ID: 696886256886657024\n", "Successfully queried - Tweet ID: 696877980375769088\n", "Successfully queried - Tweet ID: 696754882863349760\n", "Successfully queried - Tweet ID: 696744641916489729\n", "Successfully queried - Tweet ID: 696713835009417216\n", "Successfully queried - Tweet ID: 696518437233913856\n", "Successfully queried - Tweet ID: 696490539101908992\n", "Successfully queried - Tweet ID: 696488710901260288\n", "Successfully queried - Tweet ID: 696405997980676096\n", "Successfully queried - Tweet ID: 696100768806522880\n", "Successfully queried - Tweet ID: 695816827381944320\n", "Successfully queried - Tweet ID: 695794761660297217\n", "Successfully queried - Tweet ID: 695767669421768709\n", "Successfully queried - Tweet ID: 695629776980148225\n", "Successfully queried - Tweet ID: 695446424020918272\n", "Successfully queried - Tweet ID: 695409464418041856\n", "Successfully queried - Tweet ID: 695314793360662529\n", "Successfully queried - Tweet ID: 695095422348574720\n", "Successfully queried - Tweet ID: 695074328191332352\n", "Successfully queried - Tweet ID: 
695064344191721472\n", "Successfully queried - Tweet ID: 695051054296211456\n", "Successfully queried - Tweet ID: 694925794720792577\n", "Successfully queried - Tweet ID: 694905863685980160\n", "Successfully queried - Tweet ID: 694669722378485760\n", "Successfully queried - Tweet ID: 694356675654983680\n", "Successfully queried - Tweet ID: 694352839993344000\n", "Successfully queried - Tweet ID: 694342028726001664\n", "Successfully queried - Tweet ID: 694329668942569472\n", "Successfully queried - Tweet ID: 694206574471057408\n", "Successfully queried - Tweet ID: 694183373896572928\n", "Successfully queried - Tweet ID: 694001791655137281\n", "Successfully queried - Tweet ID: 693993230313091072\n", "Successfully queried - Tweet ID: 693942351086120961\n", "Successfully queried - Tweet ID: 693647888581312512\n", "Successfully queried - Tweet ID: 693644216740769793\n", "Successfully queried - Tweet ID: 693642232151285760\n", "Successfully queried - Tweet ID: 693629975228977152\n", "Successfully queried - Tweet ID: 693622659251335168\n", "Successfully queried - Tweet ID: 693590843962331137\n", "Successfully queried - Tweet ID: 693582294167244802\n", "Successfully queried - Tweet ID: 693486665285931008\n", "Successfully queried - Tweet ID: 693280720173801472\n", "Successfully queried - Tweet ID: 693267061318012928\n", "Successfully queried - Tweet ID: 693262851218264065\n", "Successfully queried - Tweet ID: 693231807727280129\n", "Successfully queried - Tweet ID: 693155686491000832\n", "Successfully queried - Tweet ID: 693109034023534592\n", "Successfully queried - Tweet ID: 693095443459342336\n", "Successfully queried - Tweet ID: 692919143163629568\n", "Successfully queried - Tweet ID: 692905862751522816\n", "Successfully queried - Tweet ID: 692901601640583168\n", "Successfully queried - Tweet ID: 692894228850999298\n", "Successfully queried - Tweet ID: 692828166163931137\n", "Successfully queried - Tweet ID: 692752401762250755\n", "Successfully queried - Tweet ID: 
692568918515392513\n", "Successfully queried - Tweet ID: 692535307825213440\n", "Successfully queried - Tweet ID: 692530551048294401\n", "Successfully queried - Tweet ID: 692423280028966913\n", "Successfully queried - Tweet ID: 692417313023332352\n", "Successfully queried - Tweet ID: 692187005137076224\n", "Successfully queried - Tweet ID: 692158366030913536\n", "Successfully queried - Tweet ID: 692142790915014657\n", "Successfully queried - Tweet ID: 692041934689402880\n", "Successfully queried - Tweet ID: 692017291282812928\n", "Successfully queried - Tweet ID: 691820333922455552\n", "Successfully queried - Tweet ID: 691793053716221953\n", "Successfully queried - Tweet ID: 691756958957883396\n", "Successfully queried - Tweet ID: 691675652215414786\n", "Successfully queried - Tweet ID: 691483041324204033\n", "Successfully queried - Tweet ID: 691459709405118465\n", "Successfully queried - Tweet ID: 691444869282295808\n", "Successfully queried - Tweet ID: 691416866452082688\n", "Successfully queried - Tweet ID: 691321916024623104\n", "Successfully queried - Tweet ID: 691096613310316544\n", "Successfully queried - Tweet ID: 691090071332753408\n", "Successfully queried - Tweet ID: 690989312272396288\n", "Successfully queried - Tweet ID: 690959652130045952\n", "Successfully queried - Tweet ID: 690938899477221376\n", "Successfully queried - Tweet ID: 690932576555528194\n", "Successfully queried - Tweet ID: 690735892932222976\n", "Successfully queried - Tweet ID: 690728923253055490\n", "Successfully queried - Tweet ID: 690690673629138944\n", "Successfully queried - Tweet ID: 690649993829576704\n", "Successfully queried - Tweet ID: 690607260360429569\n", "Successfully queried - Tweet ID: 690597161306841088\n", "Successfully queried - Tweet ID: 690400367696297985\n", "Successfully queried - Tweet ID: 690374419777196032\n", "Successfully queried - Tweet ID: 690360449368465409\n", "Successfully queried - Tweet ID: 690348396616552449\n", "Successfully queried - Tweet ID: 
690248561355657216\n", "Successfully queried - Tweet ID: 690021994562220032\n", "Successfully queried - Tweet ID: 690015576308211712\n", "Successfully queried - Tweet ID: 690005060500217858\n", "Successfully queried - Tweet ID: 689999384604450816\n", "Successfully queried - Tweet ID: 689993469801164801\n", "Successfully queried - Tweet ID: 689977555533848577\n", "Successfully queried - Tweet ID: 689905486972461056\n", "Successfully queried - Tweet ID: 689877686181715968\n", "Successfully queried - Tweet ID: 689835978131935233\n", "Successfully queried - Tweet ID: 689661964914655233\n", "Successfully queried - Tweet ID: 689659372465688576\n", "Successfully queried - Tweet ID: 689623661272240129\n", "Successfully queried - Tweet ID: 689599056876867584\n", "Successfully queried - Tweet ID: 689557536375177216\n", "Successfully queried - Tweet ID: 689517482558820352\n", "Successfully queried - Tweet ID: 689289219123089408\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Successfully queried - Tweet ID: 689283819090870273\n", "Successfully queried - Tweet ID: 689280876073582592\n", "Successfully queried - Tweet ID: 689275259254616065\n", "Successfully queried - Tweet ID: 689255633275777024\n", "Successfully queried - Tweet ID: 689154315265683456\n", "Successfully queried - Tweet ID: 689143371370250240\n", "Successfully queried - Tweet ID: 688916208532455424\n", "Successfully queried - Tweet ID: 688908934925697024\n", "Successfully queried - Tweet ID: 688898160958271489\n", "Successfully queried - Tweet ID: 688894073864884227\n", "Successfully queried - Tweet ID: 688828561667567616\n", "Successfully queried - Tweet ID: 688804835492233216\n", "Successfully queried - Tweet ID: 688789766343622656\n", "Successfully queried - Tweet ID: 688547210804498433\n", "Successfully queried - Tweet ID: 688519176466644993\n", "Successfully queried - Tweet ID: 688385280030670848\n", "Successfully queried - Tweet ID: 688211956440801280\n", "Successfully queried - Tweet ID: 
688179443353796608\n", "Successfully queried - Tweet ID: 688116655151435777\n", "Successfully queried - Tweet ID: 688064179421470721\n", "Successfully queried - Tweet ID: 687841446767013888\n", "Successfully queried - Tweet ID: 687826841265172480\n", "Successfully queried - Tweet ID: 687818504314159109\n", "Successfully queried - Tweet ID: 687807801670897665\n", "Successfully queried - Tweet ID: 687732144991551489\n", "Successfully queried - Tweet ID: 687704180304273409\n", "Successfully queried - Tweet ID: 687664829264453632\n", "Successfully queried - Tweet ID: 687494652870668288\n", "Successfully queried - Tweet ID: 687480748861947905\n", "Successfully queried - Tweet ID: 687476254459715584\n", "Successfully queried - Tweet ID: 687460506001633280\n", "Successfully queried - Tweet ID: 687399393394311168\n", "Successfully queried - Tweet ID: 687317306314240000\n", "Successfully queried - Tweet ID: 687312378585812992\n", "Successfully queried - Tweet ID: 687127927494963200\n", "Successfully queried - Tweet ID: 687124485711986689\n", "Successfully queried - Tweet ID: 687109925361856513\n", "Successfully queried - Tweet ID: 687102708889812993\n", "Successfully queried - Tweet ID: 687096057537363968\n", "Successfully queried - Tweet ID: 686947101016735744\n", "Successfully queried - Tweet ID: 686760001961103360\n", "Successfully queried - Tweet ID: 686749460672679938\n", "Successfully queried - Tweet ID: 686730991906516992\n", "Successfully queried - Tweet ID: 686683045143953408\n", "Successfully queried - Tweet ID: 686618349602762752\n", "Successfully queried - Tweet ID: 686606069955735556\n", "Successfully queried - Tweet ID: 686394059078897668\n", "Successfully queried - Tweet ID: 686386521809772549\n", "Successfully queried - Tweet ID: 686377065986265092\n", "Successfully queried - Tweet ID: 686358356425093120\n", "Successfully queried - Tweet ID: 686286779679375361\n", "Successfully queried - Tweet ID: 686050296934563840\n", "Successfully queried - Tweet ID: 
686035780142297088\n", "Successfully queried - Tweet ID: 686034024800862208\n", "Successfully queried - Tweet ID: 686007916130873345\n", "Successfully queried - Tweet ID: 686003207160610816\n", "Successfully queried - Tweet ID: 685973236358713344\n", "Successfully queried - Tweet ID: 685943807276412928\n", "Successfully queried - Tweet ID: 685906723014619143\n", "Successfully queried - Tweet ID: 685681090388975616\n", "Successfully queried - Tweet ID: 685667379192414208\n", "Successfully queried - Tweet ID: 685663452032069632\n", "Successfully queried - Tweet ID: 685641971164143616\n", "Successfully queried - Tweet ID: 685547936038666240\n", "Successfully queried - Tweet ID: 685532292383666176\n", "Successfully queried - Tweet ID: 685325112850124800\n", "Successfully queried - Tweet ID: 685321586178670592\n", "Successfully queried - Tweet ID: 685315239903100929\n", "Successfully queried - Tweet ID: 685307451701334016\n", "Successfully queried - Tweet ID: 685268753634967552\n", "Successfully queried - Tweet ID: 685198997565345792\n", "Successfully queried - Tweet ID: 685169283572338688\n", "Successfully queried - Tweet ID: 684969860808454144\n", "Successfully queried - Tweet ID: 684959798585110529\n", "Successfully queried - Tweet ID: 684940049151070208\n", "Successfully queried - Tweet ID: 684926975086034944\n", "Successfully queried - Tweet ID: 684914660081053696\n", "Successfully queried - Tweet ID: 684902183876321280\n", "Successfully queried - Tweet ID: 684880619965411328\n", "Successfully queried - Tweet ID: 684830982659280897\n", "Successfully queried - Tweet ID: 684800227459624960\n", "Successfully queried - Tweet ID: 684594889858887680\n", "Successfully queried - Tweet ID: 684588130326986752\n", "Successfully queried - Tweet ID: 684567543613382656\n", "Successfully queried - Tweet ID: 684538444857667585\n", "Successfully queried - Tweet ID: 684481074559381504\n", "Successfully queried - Tweet ID: 684460069371654144\n", "Successfully queried - Tweet ID: 
684241637099323392\n", "Successfully queried - Tweet ID: 684225744407494656\n", "Successfully queried - Tweet ID: 684222868335505415\n", "Successfully queried - Tweet ID: 684200372118904832\n", "Successfully queried - Tweet ID: 684195085588783105\n", "Successfully queried - Tweet ID: 684188786104872960\n", "Successfully queried - Tweet ID: 684177701129875456\n", "Successfully queried - Tweet ID: 684147889187209216\n", "Successfully queried - Tweet ID: 684122891630342144\n", "Successfully queried - Tweet ID: 684097758874210310\n", "Successfully queried - Tweet ID: 683857920510050305\n", "Successfully queried - Tweet ID: 683852578183077888\n", "Successfully queried - Tweet ID: 683849932751646720\n", "Successfully queried - Tweet ID: 683834909291606017\n", "Successfully queried - Tweet ID: 683828599284170753\n", "Successfully queried - Tweet ID: 683773439333797890\n", "Successfully queried - Tweet ID: 683742671509258241\n", "Successfully queried - Tweet ID: 683515932363329536\n", "Successfully queried - Tweet ID: 683498322573824003\n", "Successfully queried - Tweet ID: 683481228088049664\n", "Successfully queried - Tweet ID: 683462770029932544\n", "Successfully queried - Tweet ID: 683449695444799489\n", "Successfully queried - Tweet ID: 683391852557561860\n", "Successfully queried - Tweet ID: 683357973142474752\n", "Successfully queried - Tweet ID: 683142553609318400\n", "Successfully queried - Tweet ID: 683111407806746624\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Rate limit reached. 
Sleeping for: 571\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Successfully queried - Tweet ID: 683098815881154561\n", "Successfully queried - Tweet ID: 683078886620553216\n", "Successfully queried - Tweet ID: 683030066213818368\n", "Successfully queried - Tweet ID: 682962037429899265\n", "Successfully queried - Tweet ID: 682808988178739200\n", "Successfully queried - Tweet ID: 682788441537560576\n", "Successfully queried - Tweet ID: 682750546109968385\n", "Successfully queried - Tweet ID: 682697186228989953\n", "Successfully queried - Tweet ID: 682662431982772225\n", "Successfully queried - Tweet ID: 682638830361513985\n", "Successfully queried - Tweet ID: 682429480204398592\n", "Successfully queried - Tweet ID: 682406705142087680\n", "Successfully queried - Tweet ID: 682393905736888321\n", "Successfully queried - Tweet ID: 682389078323662849\n", "Successfully queried - Tweet ID: 682303737705140231\n", "Successfully queried - Tweet ID: 682259524040966145\n", "Successfully queried - Tweet ID: 682242692827447297\n", "Successfully queried - Tweet ID: 682088079302213632\n", "Successfully queried - Tweet ID: 682059653698686977\n", "Successfully queried - Tweet ID: 682047327939461121\n", "Successfully queried - Tweet ID: 682032003584274432\n", "Successfully queried - Tweet ID: 682003177596559360\n", "Successfully queried - Tweet ID: 681981167097122816\n", "Successfully queried - Tweet ID: 681891461017812993\n", "Successfully queried - Tweet ID: 681694085539872773\n", "Successfully queried - Tweet ID: 681679526984871937\n", "Successfully queried - Tweet ID: 681654059175129088\n", "Successfully queried - Tweet ID: 681610798867845120\n", "Successfully queried - Tweet ID: 681579835668455424\n", "Successfully queried - Tweet ID: 681523177663676416\n", "Successfully queried - Tweet ID: 681340665377193984\n", "Successfully queried - Tweet ID: 681339448655802368\n", "Successfully queried - Tweet ID: 681320187870711809\n", "Successfully queried - Tweet ID: 
681302363064414209\n", "Successfully queried - Tweet ID: 681297372102656000\n", "Successfully queried - Tweet ID: 681281657291280384\n", "Successfully queried - Tweet ID: 681261549936340994\n", "Successfully queried - Tweet ID: 681242418453299201\n", "Successfully queried - Tweet ID: 681231109724700672\n", "Successfully queried - Tweet ID: 681193455364796417\n", "Successfully queried - Tweet ID: 680970795137544192\n", "Successfully queried - Tweet ID: 680959110691590145\n", "Successfully queried - Tweet ID: 680940246314430465\n", "Successfully queried - Tweet ID: 680934982542561280\n", "Successfully queried - Tweet ID: 680913438424612864\n", "Successfully queried - Tweet ID: 680889648562991104\n", "Successfully queried - Tweet ID: 680836378243002368\n", "Successfully queried - Tweet ID: 680805554198020098\n", "Successfully queried - Tweet ID: 680801747103793152\n", "Successfully queried - Tweet ID: 680798457301471234\n", "Successfully queried - Tweet ID: 680609293079592961\n", "Successfully queried - Tweet ID: 680583894916304897\n", "Successfully queried - Tweet ID: 680497766108381184\n", "Successfully queried - Tweet ID: 680494726643068929\n", "Successfully queried - Tweet ID: 680473011644985345\n", "Successfully queried - Tweet ID: 680440374763077632\n", "Successfully queried - Tweet ID: 680221482581123072\n", "Successfully queried - Tweet ID: 680206703334408192\n", "Successfully queried - Tweet ID: 680191257256136705\n", "Successfully queried - Tweet ID: 680176173301628928\n", "Successfully queried - Tweet ID: 680161097740095489\n", "Successfully queried - Tweet ID: 680145970311643136\n", "Successfully queried - Tweet ID: 680130881361686529\n", "Successfully queried - Tweet ID: 680115823365742593\n", "Successfully queried - Tweet ID: 680100725817409536\n", "Successfully queried - Tweet ID: 680085611152338944\n", "Successfully queried - Tweet ID: 680070545539371008\n", "Error Tweet_Id: 680055455951884288 : [{'code': 144, 'message': 'No status found with that 
ID.'}]\n", "Successfully queried - Tweet ID: 680055455951884288\n", "Successfully queried - Tweet ID: 679877062409191424\n", "Successfully queried - Tweet ID: 679872969355714560\n", "Successfully queried - Tweet ID: 679862121895714818\n", "Successfully queried - Tweet ID: 679854723806179328\n", "Successfully queried - Tweet ID: 679844490799091713\n", "Successfully queried - Tweet ID: 679828447187857408\n", "Successfully queried - Tweet ID: 679777920601223168\n", "Successfully queried - Tweet ID: 679736210798047232\n", "Successfully queried - Tweet ID: 679729593985699840\n", "Successfully queried - Tweet ID: 679722016581222400\n", "Successfully queried - Tweet ID: 679530280114372609\n", "Successfully queried - Tweet ID: 679527802031484928\n", "Successfully queried - Tweet ID: 679511351870550016\n", "Successfully queried - Tweet ID: 679503373272485890\n", "Successfully queried - Tweet ID: 679475951516934144\n", "Successfully queried - Tweet ID: 679462823135686656\n", "Successfully queried - Tweet ID: 679405845277462528\n", "Successfully queried - Tweet ID: 679158373988876288\n", "Successfully queried - Tweet ID: 679148763231985668\n", "Successfully queried - Tweet ID: 679132435750195208\n", "Successfully queried - Tweet ID: 679111216690831360\n", "Successfully queried - Tweet ID: 679062614270468097\n", "Successfully queried - Tweet ID: 679047485189439488\n", "Successfully queried - Tweet ID: 679001094530465792\n", "Successfully queried - Tweet ID: 678991772295516161\n", "Successfully queried - Tweet ID: 678969228704284672\n", "Successfully queried - Tweet ID: 678800283649069056\n", "Successfully queried - Tweet ID: 678798276842360832\n", "Successfully queried - Tweet ID: 678774928607469569\n", "Successfully queried - Tweet ID: 678767140346941444\n", "Successfully queried - Tweet ID: 678764513869611008\n", "Successfully queried - Tweet ID: 678755239630127104\n", "Successfully queried - Tweet ID: 678740035362037760\n", "Successfully queried - Tweet ID: 
678708137298427904\n", "Successfully queried - Tweet ID: 678675843183484930\n", "Successfully queried - Tweet ID: 678643457146150913\n", "Successfully queried - Tweet ID: 678446151570427904\n", "Successfully queried - Tweet ID: 678424312106393600\n", "Successfully queried - Tweet ID: 678410210315247616\n", "Successfully queried - Tweet ID: 678399652199309312\n", "Successfully queried - Tweet ID: 678396796259975168\n", "Successfully queried - Tweet ID: 678389028614488064\n", "Successfully queried - Tweet ID: 678380236862578688\n", "Successfully queried - Tweet ID: 678341075375947776\n", "Successfully queried - Tweet ID: 678334497360859136\n", "Successfully queried - Tweet ID: 678278586130948096\n", "Successfully queried - Tweet ID: 678255464182861824\n", "Successfully queried - Tweet ID: 678023323247357953\n", "Successfully queried - Tweet ID: 678021115718029313\n", "Successfully queried - Tweet ID: 677961670166224897\n", "Successfully queried - Tweet ID: 677918531514703872\n", "Successfully queried - Tweet ID: 677895101218201600\n", "Successfully queried - Tweet ID: 677716515794329600\n", "Successfully queried - Tweet ID: 677700003327029250\n", "Successfully queried - Tweet ID: 677698403548192770\n", "Successfully queried - Tweet ID: 677687604918272002\n", "Successfully queried - Tweet ID: 677673981332312066\n", "Successfully queried - Tweet ID: 677662372920729601\n", "Successfully queried - Tweet ID: 677644091929329666\n", "Successfully queried - Tweet ID: 677573743309385728\n", "Successfully queried - Tweet ID: 677565715327688705\n", "Successfully queried - Tweet ID: 677557565589463040\n", "Successfully queried - Tweet ID: 677547928504967168\n", "Successfully queried - Tweet ID: 677530072887205888\n", "Successfully queried - Tweet ID: 677335745548390400\n", "Successfully queried - Tweet ID: 677334615166730240\n", "Successfully queried - Tweet ID: 677331501395156992\n", "Successfully queried - Tweet ID: 677328882937298944\n", "Successfully queried - Tweet ID: 
677314812125323265\n", "Successfully queried - Tweet ID: 677301033169788928\n", "Successfully queried - Tweet ID: 677269281705472000\n", "Successfully queried - Tweet ID: 677228873407442944\n", "Successfully queried - Tweet ID: 677187300187611136\n", "Successfully queried - Tweet ID: 676975532580409345\n", "Successfully queried - Tweet ID: 676957860086095872\n", "Successfully queried - Tweet ID: 676949632774234114\n", "Successfully queried - Tweet ID: 676948236477857792\n", "Successfully queried - Tweet ID: 676946864479084545\n", "Successfully queried - Tweet ID: 676942428000112642\n", "Successfully queried - Tweet ID: 676936541936185344\n", "Successfully queried - Tweet ID: 676916996760600576\n", "Successfully queried - Tweet ID: 676897532954456065\n", "Successfully queried - Tweet ID: 676864501615042560\n", "Successfully queried - Tweet ID: 676821958043033607\n", "Successfully queried - Tweet ID: 676819651066732545\n", "Successfully queried - Tweet ID: 676811746707918848\n", "Successfully queried - Tweet ID: 676776431406465024\n", "Successfully queried - Tweet ID: 676617503762681856\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Successfully queried - Tweet ID: 676613908052996102\n", "Successfully queried - Tweet ID: 676606785097199616\n", "Successfully queried - Tweet ID: 676603393314578432\n", "Successfully queried - Tweet ID: 676593408224403456\n", "Successfully queried - Tweet ID: 676590572941893632\n", "Successfully queried - Tweet ID: 676588346097852417\n", "Successfully queried - Tweet ID: 676582956622721024\n", "Successfully queried - Tweet ID: 676575501977128964\n", "Successfully queried - Tweet ID: 676533798876651520\n", "Successfully queried - Tweet ID: 676496375194980353\n", "Successfully queried - Tweet ID: 676470639084101634\n", "Successfully queried - Tweet ID: 676440007570247681\n", "Successfully queried - Tweet ID: 676430933382295552\n", "Successfully queried - Tweet ID: 676263575653122048\n", "Successfully queried - Tweet ID: 
676237365392908289\n", "Successfully queried - Tweet ID: 676219687039057920\n", "Successfully queried - Tweet ID: 676215927814406144\n", "Successfully queried - Tweet ID: 676191832485810177\n", "Successfully queried - Tweet ID: 676146341966438401\n", "Successfully queried - Tweet ID: 676121918416756736\n", "Successfully queried - Tweet ID: 676101918813499392\n", "Successfully queried - Tweet ID: 676098748976615425\n", "Successfully queried - Tweet ID: 676089483918516224\n", "Successfully queried - Tweet ID: 675898130735476737\n", "Successfully queried - Tweet ID: 675891555769696257\n", "Successfully queried - Tweet ID: 675888385639251968\n", "Successfully queried - Tweet ID: 675878199931371520\n", "Successfully queried - Tweet ID: 675870721063669760\n", "Successfully queried - Tweet ID: 675853064436391936\n", "Successfully queried - Tweet ID: 675849018447167488\n", "Successfully queried - Tweet ID: 675845657354215424\n", "Successfully queried - Tweet ID: 675822767435051008\n", "Successfully queried - Tweet ID: 675820929667219457\n", "Successfully queried - Tweet ID: 675798442703122432\n", "Successfully queried - Tweet ID: 675781562965868544\n", "Successfully queried - Tweet ID: 675740360753160193\n", "Successfully queried - Tweet ID: 675710890956750848\n", "Successfully queried - Tweet ID: 675707330206547968\n", "Successfully queried - Tweet ID: 675706639471788032\n", "Successfully queried - Tweet ID: 675534494439489536\n", "Successfully queried - Tweet ID: 675531475945709568\n", "Successfully queried - Tweet ID: 675522403582218240\n", "Successfully queried - Tweet ID: 675517828909424640\n", "Successfully queried - Tweet ID: 675501075957489664\n", "Successfully queried - Tweet ID: 675497103322386432\n", "Successfully queried - Tweet ID: 675489971617296384\n", "Successfully queried - Tweet ID: 675483430902214656\n", "Successfully queried - Tweet ID: 675432746517426176\n", "Successfully queried - Tweet ID: 675372240448454658\n", "Successfully queried - Tweet ID: 
675362609739206656\n", "Successfully queried - Tweet ID: 675354435921575936\n", "Successfully queried - Tweet ID: 675349384339542016\n", "Successfully queried - Tweet ID: 675334060156301312\n", "Successfully queried - Tweet ID: 675166823650848770\n", "Successfully queried - Tweet ID: 675153376133427200\n", "Successfully queried - Tweet ID: 675149409102012420\n", "Successfully queried - Tweet ID: 675147105808306176\n", "Successfully queried - Tweet ID: 675146535592706048\n", "Successfully queried - Tweet ID: 675145476954566656\n", "Successfully queried - Tweet ID: 675135153782571009\n", "Successfully queried - Tweet ID: 675113801096802304\n", "Successfully queried - Tweet ID: 675111688094527488\n", "Successfully queried - Tweet ID: 675109292475830276\n", "Successfully queried - Tweet ID: 675047298674663426\n", "Successfully queried - Tweet ID: 675015141583413248\n", "Successfully queried - Tweet ID: 675006312288268288\n", "Successfully queried - Tweet ID: 675003128568291329\n", "Successfully queried - Tweet ID: 674999807681908736\n", "Successfully queried - Tweet ID: 674805413498527744\n", "Successfully queried - Tweet ID: 674800520222154752\n", "Successfully queried - Tweet ID: 674793399141146624\n", "Successfully queried - Tweet ID: 674790488185167872\n", "Successfully queried - Tweet ID: 674788554665512960\n", "Successfully queried - Tweet ID: 674781762103414784\n", "Successfully queried - Tweet ID: 674774481756377088\n", "Successfully queried - Tweet ID: 674767892831932416\n", "Successfully queried - Tweet ID: 674764817387900928\n", "Successfully queried - Tweet ID: 674754018082705410\n", "Successfully queried - Tweet ID: 674752233200820224\n", "Successfully queried - Tweet ID: 674743008475090944\n", "Successfully queried - Tweet ID: 674742531037511680\n", "Successfully queried - Tweet ID: 674739953134403584\n", "Successfully queried - Tweet ID: 674737130913071104\n", "Successfully queried - Tweet ID: 674690135443775488\n", "Successfully queried - Tweet ID: 
674670581682434048\n", "Successfully queried - Tweet ID: 674664755118911488\n", "Successfully queried - Tweet ID: 674646392044941312\n", "Successfully queried - Tweet ID: 674644256330530816\n", "Successfully queried - Tweet ID: 674638615994089473\n", "Successfully queried - Tweet ID: 674632714662858753\n", "Successfully queried - Tweet ID: 674606911342424069\n", "Successfully queried - Tweet ID: 674468880899788800\n", "Successfully queried - Tweet ID: 674447403907457024\n", "Successfully queried - Tweet ID: 674436901579923456\n", "Successfully queried - Tweet ID: 674422304705744896\n", "Successfully queried - Tweet ID: 674416750885273600\n", "Successfully queried - Tweet ID: 674410619106390016\n", "Successfully queried - Tweet ID: 674394782723014656\n", "Successfully queried - Tweet ID: 674372068062928900\n", "Successfully queried - Tweet ID: 674330906434379776\n", "Successfully queried - Tweet ID: 674318007229923329\n", "Successfully queried - Tweet ID: 674307341513269249\n", "Successfully queried - Tweet ID: 674291837063053312\n", "Successfully queried - Tweet ID: 674271431610523648\n", "Successfully queried - Tweet ID: 674269164442398721\n", "Successfully queried - Tweet ID: 674265582246694913\n", "Successfully queried - Tweet ID: 674262580978937856\n", "Successfully queried - Tweet ID: 674255168825880576\n", "Successfully queried - Tweet ID: 674082852460433408\n", "Successfully queried - Tweet ID: 674075285688614912\n", "Successfully queried - Tweet ID: 674063288070742018\n", "Successfully queried - Tweet ID: 674053186244734976\n", "Successfully queried - Tweet ID: 674051556661161984\n", "Successfully queried - Tweet ID: 674045139690631169\n", "Successfully queried - Tweet ID: 674042553264685056\n", "Successfully queried - Tweet ID: 674038233588723717\n", "Successfully queried - Tweet ID: 674036086168010753\n", "Successfully queried - Tweet ID: 674024893172875264\n", "Successfully queried - Tweet ID: 674019345211760640\n", "Successfully queried - Tweet ID: 
674014384960745472\n", "Successfully queried - Tweet ID: 674008982932058114\n", "Successfully queried - Tweet ID: 673956914389192708\n", "Successfully queried - Tweet ID: 673919437611909120\n", "Successfully queried - Tweet ID: 673906403526995968\n", "Successfully queried - Tweet ID: 673887867907739649\n", "Successfully queried - Tweet ID: 673716320723169284\n", "Successfully queried - Tweet ID: 673715861853720576\n", "Successfully queried - Tweet ID: 673711475735838725\n", "Successfully queried - Tweet ID: 673709992831262724\n", "Successfully queried - Tweet ID: 673708611235921920\n", "Successfully queried - Tweet ID: 673707060090052608\n", "Successfully queried - Tweet ID: 673705679337693185\n", "Successfully queried - Tweet ID: 673700254269775872\n", "Successfully queried - Tweet ID: 673697980713705472\n", "Successfully queried - Tweet ID: 673689733134946305\n", "Successfully queried - Tweet ID: 673688752737402881\n", "Successfully queried - Tweet ID: 673686845050527744\n", "Successfully queried - Tweet ID: 673680198160809984\n", "Successfully queried - Tweet ID: 673662677122719744\n", "Successfully queried - Tweet ID: 673656262056419329\n", "Successfully queried - Tweet ID: 673636718965334016\n", "Successfully queried - Tweet ID: 673612854080196609\n", "Successfully queried - Tweet ID: 673583129559498752\n", "Successfully queried - Tweet ID: 673580926094458881\n", "Successfully queried - Tweet ID: 673576835670777856\n", "Successfully queried - Tweet ID: 673363615379013632\n", "Successfully queried - Tweet ID: 673359818736984064\n", "Successfully queried - Tweet ID: 673355879178194945\n", "Successfully queried - Tweet ID: 673352124999274496\n", "Successfully queried - Tweet ID: 673350198937153538\n", "Successfully queried - Tweet ID: 673345638550134785\n", "Successfully queried - Tweet ID: 673343217010679808\n", "Successfully queried - Tweet ID: 673342308415348736\n", "Successfully queried - Tweet ID: 673320132811366400\n", "Successfully queried - Tweet ID: 
673317986296586240\n", "Successfully queried - Tweet ID: 673295268553605120\n", "Successfully queried - Tweet ID: 673270968295534593\n", "Successfully queried - Tweet ID: 673240798075449344\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Successfully queried - Tweet ID: 673213039743795200\n", "Successfully queried - Tweet ID: 673148804208660480\n", "Successfully queried - Tweet ID: 672997845381865473\n", "Successfully queried - Tweet ID: 672995267319328768\n", "Successfully queried - Tweet ID: 672988786805112832\n", "Successfully queried - Tweet ID: 672984142909456390\n", "Successfully queried - Tweet ID: 672980819271634944\n", "Successfully queried - Tweet ID: 672975131468300288\n", "Successfully queried - Tweet ID: 672970152493887488\n", "Successfully queried - Tweet ID: 672968025906282496\n", "Successfully queried - Tweet ID: 672964561327235073\n", "Successfully queried - Tweet ID: 672902681409806336\n", "Successfully queried - Tweet ID: 672898206762672129\n", "Successfully queried - Tweet ID: 672884426393653248\n", "Successfully queried - Tweet ID: 672877615439593473\n", "Successfully queried - Tweet ID: 672834301050937345\n", "Successfully queried - Tweet ID: 672828477930868736\n", "Successfully queried - Tweet ID: 672640509974827008\n", "Successfully queried - Tweet ID: 672622327801233409\n", "Successfully queried - Tweet ID: 672614745925664768\n", "Successfully queried - Tweet ID: 672609152938721280\n", "Successfully queried - Tweet ID: 672604026190569472\n", "Successfully queried - Tweet ID: 672594978741354496\n", "Successfully queried - Tweet ID: 672591762242805761\n", "Successfully queried - Tweet ID: 672591271085670400\n", "Successfully queried - Tweet ID: 672538107540070400\n", "Successfully queried - Tweet ID: 672523490734551040\n", "Successfully queried - Tweet ID: 672488522314567680\n", "Successfully queried - Tweet ID: 672482722825261057\n", "Successfully queried - Tweet ID: 672481316919734272\n", "Successfully queried - Tweet ID: 
672475084225949696\n", "Successfully queried - Tweet ID: 672466075045466113\n", "Successfully queried - Tweet ID: 672272411274932228\n", "Successfully queried - Tweet ID: 672267570918129665\n", "Successfully queried - Tweet ID: 672264251789176834\n", "Successfully queried - Tweet ID: 672256522047614977\n", "Successfully queried - Tweet ID: 672254177670729728\n", "Successfully queried - Tweet ID: 672248013293752320\n", "Successfully queried - Tweet ID: 672245253877968896\n", "Successfully queried - Tweet ID: 672239279297454080\n", "Successfully queried - Tweet ID: 672231046314901505\n", "Successfully queried - Tweet ID: 672222792075620352\n", "Successfully queried - Tweet ID: 672205392827572224\n", "Successfully queried - Tweet ID: 672169685991993344\n", "Successfully queried - Tweet ID: 672160042234327040\n", "Successfully queried - Tweet ID: 672139350159835138\n", "Successfully queried - Tweet ID: 672125275208069120\n", "Successfully queried - Tweet ID: 672095186491711488\n", "Successfully queried - Tweet ID: 672082170312290304\n", "Successfully queried - Tweet ID: 672068090318987265\n", "Successfully queried - Tweet ID: 671896809300709376\n", "Successfully queried - Tweet ID: 671891728106971137\n", "Successfully queried - Tweet ID: 671882082306625538\n", "Successfully queried - Tweet ID: 671879137494245376\n", "Successfully queried - Tweet ID: 671874878652489728\n", "Successfully queried - Tweet ID: 671866342182637568\n", "Successfully queried - Tweet ID: 671855973984772097\n", "Successfully queried - Tweet ID: 671789708968640512\n", "Successfully queried - Tweet ID: 671768281401958400\n", "Successfully queried - Tweet ID: 671763349865160704\n", "Successfully queried - Tweet ID: 671744970634719232\n", "Successfully queried - Tweet ID: 671743150407421952\n", "Successfully queried - Tweet ID: 671735591348891648\n", "Successfully queried - Tweet ID: 671729906628341761\n", "Successfully queried - Tweet ID: 671561002136281088\n", "Successfully queried - Tweet ID: 
671550332464455680\n", "Successfully queried - Tweet ID: 671547767500775424\n", "Successfully queried - Tweet ID: 671544874165002241\n", "Successfully queried - Tweet ID: 671542985629241344\n", "Successfully queried - Tweet ID: 671538301157904385\n", "Successfully queried - Tweet ID: 671536543010570240\n", "Successfully queried - Tweet ID: 671533943490011136\n", "Successfully queried - Tweet ID: 671528761649688577\n", "Successfully queried - Tweet ID: 671520732782923777\n", "Successfully queried - Tweet ID: 671518598289059840\n", "Successfully queried - Tweet ID: 671511350426865664\n", "Successfully queried - Tweet ID: 671504605491109889\n", "Successfully queried - Tweet ID: 671497587707535361\n", "Successfully queried - Tweet ID: 671488513339211776\n", "Successfully queried - Tweet ID: 671486386088865792\n", "Successfully queried - Tweet ID: 671485057807351808\n", "Successfully queried - Tweet ID: 671390180817915904\n", "Successfully queried - Tweet ID: 671362598324076544\n", "Successfully queried - Tweet ID: 671357843010908160\n", "Successfully queried - Tweet ID: 671355857343524864\n", "Successfully queried - Tweet ID: 671347597085433856\n", "Successfully queried - Tweet ID: 671186162933985280\n", "Successfully queried - Tweet ID: 671182547775299584\n", "Successfully queried - Tweet ID: 671166507850801152\n", "Successfully queried - Tweet ID: 671163268581498880\n", "Successfully queried - Tweet ID: 671159727754231808\n", "Successfully queried - Tweet ID: 671154572044468225\n", "Successfully queried - Tweet ID: 671151324042559489\n", "Successfully queried - Tweet ID: 671147085991960577\n", "Successfully queried - Tweet ID: 671141549288370177\n", "Successfully queried - Tweet ID: 671138694582165504\n", "Successfully queried - Tweet ID: 671134062904504320\n", "Successfully queried - Tweet ID: 671122204919246848\n", "Successfully queried - Tweet ID: 671115716440031232\n", "Successfully queried - Tweet ID: 671109016219725825\n", "Successfully queried - Tweet ID: 
670995969505435648\n", "Successfully queried - Tweet ID: 670842764863651840\n", "Successfully queried - Tweet ID: 670840546554966016\n", "Successfully queried - Tweet ID: 670838202509447168\n", "Successfully queried - Tweet ID: 670833812859932673\n", "Successfully queried - Tweet ID: 670832455012716544\n", "Successfully queried - Tweet ID: 670826280409919488\n", "Successfully queried - Tweet ID: 670823764196741120\n", "Successfully queried - Tweet ID: 670822709593571328\n", "Successfully queried - Tweet ID: 670815497391357952\n", "Successfully queried - Tweet ID: 670811965569282048\n", "Successfully queried - Tweet ID: 670807719151067136\n", "Successfully queried - Tweet ID: 670804601705242624\n", "Successfully queried - Tweet ID: 670803562457407488\n", "Successfully queried - Tweet ID: 670797304698376195\n", "Successfully queried - Tweet ID: 670792680469889025\n", "Successfully queried - Tweet ID: 670789397210615808\n", "Successfully queried - Tweet ID: 670786190031921152\n", "Successfully queried - Tweet ID: 670783437142401025\n", "Successfully queried - Tweet ID: 670782429121134593\n", "Successfully queried - Tweet ID: 670780561024270336\n", "Successfully queried - Tweet ID: 670778058496974848\n", "Successfully queried - Tweet ID: 670764103623966721\n", "Successfully queried - Tweet ID: 670755717859713024\n", "Successfully queried - Tweet ID: 670733412878163972\n", "Successfully queried - Tweet ID: 670727704916926465\n", "Successfully queried - Tweet ID: 670717338665226240\n", "Successfully queried - Tweet ID: 670704688707301377\n", "Successfully queried - Tweet ID: 670691627984359425\n", "Successfully queried - Tweet ID: 670679630144274432\n", "Successfully queried - Tweet ID: 670676092097810432\n", "Successfully queried - Tweet ID: 670668383499735048\n", "Successfully queried - Tweet ID: 670474236058800128\n", "Successfully queried - Tweet ID: 670468609693655041\n", "Successfully queried - Tweet ID: 670465786746662913\n", "Successfully queried - Tweet ID: 
670452855871037440\n", "Successfully queried - Tweet ID: 670449342516494336\n", "Successfully queried - Tweet ID: 670444955656130560\n", "Successfully queried - Tweet ID: 670442337873600512\n", "Successfully queried - Tweet ID: 670435821946826752\n", "Successfully queried - Tweet ID: 670434127938719744\n", "Successfully queried - Tweet ID: 670433248821026816\n", "Successfully queried - Tweet ID: 670428280563085312\n", "Successfully queried - Tweet ID: 670427002554466305\n", "Successfully queried - Tweet ID: 670421925039075328\n", "Successfully queried - Tweet ID: 670420569653809152\n", "Successfully queried - Tweet ID: 670417414769758208\n", "Successfully queried - Tweet ID: 670411370698022913\n", "Successfully queried - Tweet ID: 670408998013820928\n", "Successfully queried - Tweet ID: 670403879788544000\n", "Successfully queried - Tweet ID: 670385711116361728\n", "Successfully queried - Tweet ID: 670374371102445568\n", "Successfully queried - Tweet ID: 670361874861563904\n", "Successfully queried - Tweet ID: 670338931251150849\n", "Successfully queried - Tweet ID: 670319130621435904\n", "Successfully queried - Tweet ID: 670303360680108032\n", "Successfully queried - Tweet ID: 670290420111441920\n", "Successfully queried - Tweet ID: 670093938074779648\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Successfully queried - Tweet ID: 670086499208155136\n", "Successfully queried - Tweet ID: 670079681849372674\n", "Successfully queried - Tweet ID: 670073503555706880\n", "Successfully queried - Tweet ID: 670069087419133954\n", "Successfully queried - Tweet ID: 670061506722140161\n", "Successfully queried - Tweet ID: 670055038660800512\n", "Successfully queried - Tweet ID: 670046952931721218\n", "Successfully queried - Tweet ID: 670040295598354432\n", "Successfully queried - Tweet ID: 670037189829525505\n", "Successfully queried - Tweet ID: 670003130994700288\n", "Successfully queried - Tweet ID: 669993076832759809\n", "Successfully queried - Tweet ID: 
669972011175813120\n", "Successfully queried - Tweet ID: 669970042633789440\n", "Successfully queried - Tweet ID: 669942763794931712\n", "Successfully queried - Tweet ID: 669926384437997569\n", "Successfully queried - Tweet ID: 669923323644657664\n", "Successfully queried - Tweet ID: 669753178989142016\n", "Successfully queried - Tweet ID: 669749430875258880\n", "Successfully queried - Tweet ID: 669684865554620416\n", "Successfully queried - Tweet ID: 669683899023405056\n", "Successfully queried - Tweet ID: 669682095984410625\n", "Successfully queried - Tweet ID: 669680153564442624\n", "Successfully queried - Tweet ID: 669661792646373376\n", "Successfully queried - Tweet ID: 669625907762618368\n", "Successfully queried - Tweet ID: 669603084620980224\n", "Successfully queried - Tweet ID: 669597912108789760\n", "Successfully queried - Tweet ID: 669583744538451968\n", "Successfully queried - Tweet ID: 669573570759163904\n", "Successfully queried - Tweet ID: 669571471778410496\n", "Successfully queried - Tweet ID: 669567591774625800\n", "Successfully queried - Tweet ID: 669564461267722241\n", "Successfully queried - Tweet ID: 669393256313184256\n", "Successfully queried - Tweet ID: 669375718304980992\n", "Successfully queried - Tweet ID: 669371483794317312\n", "Successfully queried - Tweet ID: 669367896104181761\n", "Successfully queried - Tweet ID: 669363888236994561\n", "Successfully queried - Tweet ID: 669359674819481600\n", "Successfully queried - Tweet ID: 669354382627049472\n", "Successfully queried - Tweet ID: 669353438988365824\n", "Successfully queried - Tweet ID: 669351434509529089\n", "Successfully queried - Tweet ID: 669328503091937280\n", "Successfully queried - Tweet ID: 669327207240699904\n", "Successfully queried - Tweet ID: 669324657376567296\n", "Successfully queried - Tweet ID: 669216679721873412\n", "Successfully queried - Tweet ID: 669214165781868544\n", "Successfully queried - Tweet ID: 669203728096960512\n", "Successfully queried - Tweet ID: 
669037058363662336\n", "Successfully queried - Tweet ID: 669015743032369152\n", "Successfully queried - Tweet ID: 669006782128353280\n", "Successfully queried - Tweet ID: 669000397445533696\n", "Successfully queried - Tweet ID: 668994913074286592\n", "Successfully queried - Tweet ID: 668992363537309700\n", "Successfully queried - Tweet ID: 668989615043424256\n", "Successfully queried - Tweet ID: 668988183816871936\n", "Successfully queried - Tweet ID: 668986018524233728\n", "Successfully queried - Tweet ID: 668981893510119424\n", "Successfully queried - Tweet ID: 668979806671884288\n", "Successfully queried - Tweet ID: 668975677807423489\n", "Successfully queried - Tweet ID: 668967877119254528\n", "Successfully queried - Tweet ID: 668960084974809088\n", "Successfully queried - Tweet ID: 668955713004314625\n", "Successfully queried - Tweet ID: 668932921458302977\n", "Successfully queried - Tweet ID: 668902994700836864\n", "Successfully queried - Tweet ID: 668892474547511297\n", "Successfully queried - Tweet ID: 668872652652679168\n", "Successfully queried - Tweet ID: 668852170888998912\n", "Successfully queried - Tweet ID: 668826086256599040\n", "Successfully queried - Tweet ID: 668815180734689280\n", "Successfully queried - Tweet ID: 668779399630725120\n", "Successfully queried - Tweet ID: 668655139528511488\n", "Successfully queried - Tweet ID: 668645506898350081\n", "Successfully queried - Tweet ID: 668643542311546881\n", "Successfully queried - Tweet ID: 668641109086707712\n", "Successfully queried - Tweet ID: 668636665813057536\n", "Successfully queried - Tweet ID: 668633411083464705\n", "Successfully queried - Tweet ID: 668631377374486528\n", "Successfully queried - Tweet ID: 668627278264475648\n", "Successfully queried - Tweet ID: 668625577880875008\n", "Successfully queried - Tweet ID: 668623201287675904\n", "Successfully queried - Tweet ID: 668620235289837568\n", "Successfully queried - Tweet ID: 668614819948453888\n", "Successfully queried - Tweet ID: 
668587383441514497\n", "Successfully queried - Tweet ID: 668567822092664832\n", "Successfully queried - Tweet ID: 668544745690562560\n", "Successfully queried - Tweet ID: 668542336805281792\n", "Successfully queried - Tweet ID: 668537837512433665\n", "Successfully queried - Tweet ID: 668528771708952576\n", "Successfully queried - Tweet ID: 668507509523615744\n", "Successfully queried - Tweet ID: 668496999348633600\n", "Successfully queried - Tweet ID: 668484198282485761\n", "Successfully queried - Tweet ID: 668480044826800133\n", "Successfully queried - Tweet ID: 668466899341221888\n", "Successfully queried - Tweet ID: 668297328638447616\n", "Successfully queried - Tweet ID: 668291999406125056\n", "Successfully queried - Tweet ID: 668286279830867968\n", "Successfully queried - Tweet ID: 668274247790391296\n", "Successfully queried - Tweet ID: 668268907921326080\n", "Successfully queried - Tweet ID: 668256321989451776\n", "Successfully queried - Tweet ID: 668248472370458624\n", "Successfully queried - Tweet ID: 668237644992782336\n", "Successfully queried - Tweet ID: 668226093875376128\n", "Successfully queried - Tweet ID: 668221241640230912\n", "Successfully queried - Tweet ID: 668204964695683073\n", "Successfully queried - Tweet ID: 668190681446379520\n", "Successfully queried - Tweet ID: 668171859951755264\n", "Successfully queried - Tweet ID: 668154635664932864\n", "Successfully queried - Tweet ID: 668142349051129856\n", "Successfully queried - Tweet ID: 668113020489474048\n", "Successfully queried - Tweet ID: 667937095915278337\n", "Successfully queried - Tweet ID: 667924896115245057\n", "Successfully queried - Tweet ID: 667915453470232577\n", "Successfully queried - Tweet ID: 667911425562669056\n", "Successfully queried - Tweet ID: 667902449697558528\n", "Successfully queried - Tweet ID: 667886921285246976\n", "Successfully queried - Tweet ID: 667885044254572545\n", "Successfully queried - Tweet ID: 667878741721415682\n", "Successfully queried - Tweet ID: 
667873844930215936\n", "Successfully queried - Tweet ID: 667866724293877760\n", "Successfully queried - Tweet ID: 667861340749471744\n", "Successfully queried - Tweet ID: 667832474953625600\n", "Successfully queried - Tweet ID: 667806454573760512\n", "Successfully queried - Tweet ID: 667801013445750784\n", "Successfully queried - Tweet ID: 667793409583771648\n", "Successfully queried - Tweet ID: 667782464991965184\n", "Successfully queried - Tweet ID: 667773195014021121\n", "Successfully queried - Tweet ID: 667766675769573376\n", "Successfully queried - Tweet ID: 667728196545200128\n", "Successfully queried - Tweet ID: 667724302356258817\n", "Successfully queried - Tweet ID: 667550904950915073\n", "Successfully queried - Tweet ID: 667550882905632768\n", "Successfully queried - Tweet ID: 667549055577362432\n", "Successfully queried - Tweet ID: 667546741521195010\n", "Successfully queried - Tweet ID: 667544320556335104\n", "Successfully queried - Tweet ID: 667538891197542400\n", "Successfully queried - Tweet ID: 667534815156183040\n", "Successfully queried - Tweet ID: 667530908589760512\n", "Successfully queried - Tweet ID: 667524857454854144\n", "Successfully queried - Tweet ID: 667517642048163840\n", "Successfully queried - Tweet ID: 667509364010450944\n", "Successfully queried - Tweet ID: 667502640335572993\n", "Successfully queried - Tweet ID: 667495797102141441\n", "Successfully queried - Tweet ID: 667491009379606528\n", "Successfully queried - Tweet ID: 667470559035432960\n", "Successfully queried - Tweet ID: 667455448082227200\n", "Successfully queried - Tweet ID: 667453023279554560\n", "Successfully queried - Tweet ID: 667443425659232256\n", "Successfully queried - Tweet ID: 667437278097252352\n", "Successfully queried - Tweet ID: 667435689202614272\n", "Successfully queried - Tweet ID: 667405339315146752\n", "Successfully queried - Tweet ID: 667393430834667520\n", "Successfully queried - Tweet ID: 667369227918143488\n", "Successfully queried - Tweet ID: 
667211855547486208\n", "Successfully queried - Tweet ID: 667200525029539841\n", "Successfully queried - Tweet ID: 667192066997374976\n", "Successfully queried - Tweet ID: 667188689915760640\n", "Successfully queried - Tweet ID: 667182792070062081\n", "Successfully queried - Tweet ID: 667177989038297088\n", "Successfully queried - Tweet ID: 667176164155375616\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Successfully queried - Tweet ID: 667174963120574464\n", "Successfully queried - Tweet ID: 667171260800061440\n", "Successfully queried - Tweet ID: 667165590075940865\n", "Successfully queried - Tweet ID: 667160273090932737\n", "Successfully queried - Tweet ID: 667152164079423490\n", "Successfully queried - Tweet ID: 667138269671505920\n", "Successfully queried - Tweet ID: 667119796878725120\n", "Successfully queried - Tweet ID: 667090893657276420\n", "Successfully queried - Tweet ID: 667073648344346624\n", "Successfully queried - Tweet ID: 667070482143944705\n", "Successfully queried - Tweet ID: 667065535570550784\n", "Successfully queried - Tweet ID: 667062181243039745\n", "Successfully queried - Tweet ID: 667044094246576128\n", "Successfully queried - Tweet ID: 667012601033924608\n", "Successfully queried - Tweet ID: 666996132027977728\n", "Successfully queried - Tweet ID: 666983947667116034\n", "Successfully queried - Tweet ID: 666837028449972224\n", "Successfully queried - Tweet ID: 666835007768551424\n", "Successfully queried - Tweet ID: 666826780179869698\n", "Successfully queried - Tweet ID: 666817836334096384\n", "Successfully queried - Tweet ID: 666804364988780544\n", "Successfully queried - Tweet ID: 666786068205871104\n", "Successfully queried - Tweet ID: 666781792255496192\n", "Successfully queried - Tweet ID: 666776908487630848\n", "Successfully queried - Tweet ID: 666739327293083650\n", "Successfully queried - Tweet ID: 666701168228331520\n", "Successfully queried - Tweet ID: 666691418707132416\n", "Successfully queried - Tweet ID: 
666649482315059201\n", "Successfully queried - Tweet ID: 666644823164719104\n", "Successfully queried - Tweet ID: 666454714377183233\n", "Successfully queried - Tweet ID: 666447344410484738\n", "Successfully queried - Tweet ID: 666437273139982337\n", "Successfully queried - Tweet ID: 666435652385423360\n", "Successfully queried - Tweet ID: 666430724426358785\n", "Successfully queried - Tweet ID: 666428276349472768\n", "Successfully queried - Tweet ID: 666421158376562688\n", "Successfully queried - Tweet ID: 666418789513326592\n", "Successfully queried - Tweet ID: 666411507551481857\n", "Successfully queried - Tweet ID: 666407126856765440\n", "Successfully queried - Tweet ID: 666396247373291520\n", "Successfully queried - Tweet ID: 666373753744588802\n", "Successfully queried - Tweet ID: 666362758909284353\n", "Successfully queried - Tweet ID: 666353288456101888\n", "Successfully queried - Tweet ID: 666345417576210432\n", "Successfully queried - Tweet ID: 666337882303524864\n", "Successfully queried - Tweet ID: 666293911632134144\n", "Successfully queried - Tweet ID: 666287406224695296\n", "Successfully queried - Tweet ID: 666273097616637952\n", "Successfully queried - Tweet ID: 666268910803644416\n", "Successfully queried - Tweet ID: 666104133288665088\n", "Successfully queried - Tweet ID: 666102155909144576\n", "Successfully queried - Tweet ID: 666099513787052032\n", "Successfully queried - Tweet ID: 666094000022159362\n", "Successfully queried - Tweet ID: 666082916733198337\n", "Successfully queried - Tweet ID: 666073100786774016\n", "Successfully queried - Tweet ID: 666071193221509120\n", "Successfully queried - Tweet ID: 666063827256086533\n", "Successfully queried - Tweet ID: 666058600524156928\n", "Successfully queried - Tweet ID: 666057090499244032\n", "Successfully queried - Tweet ID: 666055525042405380\n", "Successfully queried - Tweet ID: 666051853826850816\n", "Successfully queried - Tweet ID: 666050758794694657\n", "Successfully queried - Tweet ID: 
666049248165822465\n", "Successfully queried - Tweet ID: 666044226329800704\n", "Successfully queried - Tweet ID: 666033412701032449\n", "Successfully queried - Tweet ID: 666029285002620928\n", "Successfully queried - Tweet ID: 666020888022790149\n" ] } ], "source": [ "# Keep track of the tweet IDs for which no status was found\n", "no_status_tweet_ids = []\n", "\n", "# Start code timer\n", "start = timer()\n", "\n", "# Create and open a tweet_json.txt file\n", "with open('tweet_json.txt', mode='w') as json_file:\n", "    \n", "    # Iterate over the \"tweet_id\" column from WeRateDogs twitter archive \"df_archive_clean\"\n", "    for tweet_id in np.nditer(df_archive_clean.tweet_id):\n", "        \n", "        # Try...except in case some tweet IDs do not lead to any status\n", "        try:\n", "            # Get the object associated with tweet_id\n", "            a_tweet = api.get_status(tweet_id, tweet_mode='extended')\n", "            # Get the tweet JSON info from the object, and write it into a file\n", "            json_file.write(json.dumps(a_tweet._json)+'\\n')\n", "        except tweepy.TweepError as e:\n", "            print(\"Error Tweet_Id: \" + str(tweet_id) + \" : \" + str(e))\n", "            # Keep the IDs with no status found\n", "            no_status_tweet_ids.append(str(tweet_id))\n", "        else:\n", "            # Print out the tweet ID only when the query succeeded\n", "            print(\"Successfully queried - Tweet ID: \" + str(tweet_id))\n", "\n", "# End code timer\n", "stop = timer()" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Twitter Query and file writing duration: 1430.7260451979819\n" ] } ], "source": [ "# Get the duration in seconds for the whole data collection from twitter\n", "print(\"Twitter Query and file writing duration: \" + str(stop - start))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that the `tweet_json.txt` file has been created, we can read it and build the associated pandas DataFrame with, at a minimum, \"tweet ID\", \"retweet count\", and \"favorite count\" as columns." 
] }, { "cell_type": "code", "execution_count": 791, "metadata": {}, "outputs": [], "source": [ "# A list of dictionaries to build and convert to a DataFrame later\n", "df_tweet_json = []" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We got a basic understanding of the tweet JSON object from here: https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object.html. We choose to collect \"created_at\", \"id\", \"retweet_count\", \"favorite_count\" and \"full_text\"." ] }, { "cell_type": "code", "execution_count": 792, "metadata": {}, "outputs": [], "source": [ "# Load \"tweet_json.txt\"\n", "with open('tweet_json.txt') as tweet_json:\n", "    \n", "    # Get each line and load it as a json object\n", "    for a_line in tweet_json:\n", "        line_data_json = json.loads(a_line)\n", "        \n", "        # Get the tweet ID\n", "        tweet_id = line_data_json['id']\n", "        \n", "        # Get the tweet retweet count\n", "        retweet_count = line_data_json['retweet_count']\n", "        \n", "        # Get the tweet favorites count\n", "        favorite_count = line_data_json['favorite_count']\n", "        \n", "        # Get the tweet creation date\n", "        creation_date = line_data_json['created_at']\n", "        \n", "        # Get the tweet full text\n", "        tweet_full_text = line_data_json['full_text']\n", "        \n", "        \n", "        # Add the extracted data to the list\n", "        df_tweet_json.append({'tweet_id': tweet_id,\n", "                              'created_at': creation_date,\n", "                              'retweet_count': retweet_count,\n", "                              'favorite_count': favorite_count,\n", "                              'tweet_full_text': tweet_full_text})" ] }, { "cell_type": "code", "execution_count": 793, "metadata": {}, "outputs": [], "source": [ "# Convert the list of dictionaries to a DataFrame\n", "df_twitter = pd.DataFrame(df_tweet_json, columns=['tweet_id', 'created_at', 'retweet_count', 'favorite_count', 'tweet_full_text'])" ] }, { "cell_type": "code", "execution_count": 794, "metadata": {}, "outputs": [], "source": [ "# Create a copy to use for assessing and cleaning\n", "df_twitter_clean = df_twitter.copy()" ] }, { 
"cell_type": "code", "execution_count": 795, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>tweet_id</th>\n", " <th>created_at</th>\n", " <th>retweet_count</th>\n", " <th>favorite_count</th>\n", " <th>tweet_full_text</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>892420643555336193</td>\n", " <td>Tue Aug 01 16:23:56 +0000 2017</td>\n", " <td>8204</td>\n", " <td>37636</td>\n", " <td>This is Phineas. He's a mystical boy. Only ever appears in the hole of a donut. 13/10 https://t.co/MgUWQ76dJU</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>892177421306343426</td>\n", " <td>Tue Aug 01 00:17:27 +0000 2017</td>\n", " <td>6071</td>\n", " <td>32337</td>\n", " <td>This is Tilly. She's just checking pup on you. Hopes you're doing ok. If not, she's available for pats, snugs, boops, the whole bit. 13/10 https://t.co/0Xxu71qeIV</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>891815181378084864</td>\n", " <td>Mon Jul 31 00:18:03 +0000 2017</td>\n", " <td>4011</td>\n", " <td>24363</td>\n", " <td>This is Archie. He is a rare Norwegian Pouncing Corgo. Lives in the tall grass. You never know when one may strike. 12/10 https://t.co/wUnZnhtVJB</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>891689557279858688</td>\n", " <td>Sun Jul 30 15:58:51 +0000 2017</td>\n", " <td>8366</td>\n", " <td>40959</td>\n", " <td>This is Darla. She commenced a snooze mid meal. 
13/10 happens to the best of us https://t.co/tD36da7qLQ</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>891327558926688256</td>\n", " <td>Sat Jul 29 16:00:24 +0000 2017</td>\n", " <td>9063</td>\n", " <td>39161</td>\n", " <td>This is Franklin. He would like you to stop calling him \"cute.\" He is a very fierce shark and should be respected as such. 12/10 #BarkWeek https://t.co/AtUZn91f7f</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " tweet_id created_at retweet_count \\\n", "0 892420643555336193 Tue Aug 01 16:23:56 +0000 2017 8204 \n", "1 892177421306343426 Tue Aug 01 00:17:27 +0000 2017 6071 \n", "2 891815181378084864 Mon Jul 31 00:18:03 +0000 2017 4011 \n", "3 891689557279858688 Sun Jul 30 15:58:51 +0000 2017 8366 \n", "4 891327558926688256 Sat Jul 29 16:00:24 +0000 2017 9063 \n", "\n", " favorite_count \\\n", "0 37636 \n", "1 32337 \n", "2 24363 \n", "3 40959 \n", "4 39161 \n", "\n", " tweet_full_text \n", "0 This is Phineas. He's a mystical boy. Only ever appears in the hole of a donut. 13/10 https://t.co/MgUWQ76dJU \n", "1 This is Tilly. She's just checking pup on you. Hopes you're doing ok. If not, she's available for pats, snugs, boops, the whole bit. 13/10 https://t.co/0Xxu71qeIV \n", "2 This is Archie. He is a rare Norwegian Pouncing Corgo. Lives in the tall grass. You never know when one may strike. 12/10 https://t.co/wUnZnhtVJB \n", "3 This is Darla. She commenced a snooze mid meal. 13/10 happens to the best of us https://t.co/tD36da7qLQ \n", "4 This is Franklin. He would like you to stop calling him \"cute.\" He is a very fierce shark and should be respected as such. 
12/10 #BarkWeek https://t.co/AtUZn91f7f " ] }, "execution_count": 795, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get a view on the dataframe\n", "df_twitter_clean.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "At this point, we have three DataFrames for the assessing and cleaning steps:\n", "* `df_archive_clean`: the WeRateDogs Twitter enhanced archive. We know that some ratings, dog names, and dog stages are probably incorrect.\n", "* `df_image_clean`: the tweet image predictions, from `image-predictions.tsv`, which we downloaded from a URL.\n", "* `df_twitter_clean`: additional information (retweet counts and favorite counts) gathered from Twitter because it is missing from `df_archive_clean`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<a id='assessing'></a>\n", "### Assessing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We focus first on quality issues: missing data, completeness, and format. Then we examine tidiness." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<a id='weratedogstwitter'></a>\n", "#### 1. The WeRateDogs twitter archive" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is what we already know about `df_archive_clean`: some ratings, dog names, and dog stages are probably incorrect. So we need to assess those columns." 
] }, { "cell_type": "code", "execution_count": 796, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>tweet_id</th>\n", " <th>in_reply_to_status_id</th>\n", " <th>in_reply_to_user_id</th>\n", " <th>timestamp</th>\n", " <th>source</th>\n", " <th>text</th>\n", " <th>retweeted_status_id</th>\n", " <th>retweeted_status_user_id</th>\n", " <th>retweeted_status_timestamp</th>\n", " <th>expanded_urls</th>\n", " <th>rating_numerator</th>\n", " <th>rating_denominator</th>\n", " <th>name</th>\n", " <th>doggo</th>\n", " <th>floofer</th>\n", " <th>pupper</th>\n", " <th>puppo</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>2147</th>\n", " <td>669753178989142016</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>2015-11-26 05:42:55 +0000</td>\n", " <td><a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a></td>\n", " <td>Meet Chester. He just ate a lot and now he can't move. 
10/10 that's going to be me in about 17 hours https://t.co/63jh1tYZa5</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>https://twitter.com/dog_rates/status/669753178989142016/photo/1</td>\n", " <td>10</td>\n", " <td>10</td>\n", " <td>Chester</td>\n", " <td>None</td>\n", " <td>None</td>\n", " <td>None</td>\n", " <td>None</td>\n", " </tr>\n", " <tr>\n", " <th>603</th>\n", " <td>798628517273620480</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>2016-11-15 20:47:30 +0000</td>\n", " <td><a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a></td>\n", " <td>RT @dog_rates: This a Norwegian Pewterschmidt named Tickles. Ears for days. 12/10 I care deeply for Tickles https://t.co/0aDF62KVP7</td>\n", " <td>6.675094e+17</td>\n", " <td>4.196984e+09</td>\n", " <td>2015-11-20 01:06:48 +0000</td>\n", " <td>https://twitter.com/dog_rates/status/667509364010450944/photo/1,https://twitter.com/dog_rates/status/667509364010450944/photo/1</td>\n", " <td>12</td>\n", " <td>10</td>\n", " <td>None</td>\n", " <td>None</td>\n", " <td>None</td>\n", " <td>None</td>\n", " <td>None</td>\n", " </tr>\n", " <tr>\n", " <th>2308</th>\n", " <td>666817836334096384</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>2015-11-18 03:18:55 +0000</td>\n", " <td><a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a></td>\n", " <td>This is Jeph. He is a German Boston Shuttlecock. Enjoys couch. Lost body during French Revolution. 
True hero 9/10 https://t.co/8whlkYw3mO</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>https://twitter.com/dog_rates/status/666817836334096384/photo/1</td>\n", " <td>9</td>\n", " <td>10</td>\n", " <td>Jeph</td>\n", " <td>None</td>\n", " <td>None</td>\n", " <td>None</td>\n", " <td>None</td>\n", " </tr>\n", " <tr>\n", " <th>468</th>\n", " <td>817056546584727552</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>2017-01-05 17:13:55 +0000</td>\n", " <td><a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a></td>\n", " <td>This is Chloe. She fell asleep at the wheel. Absolute menace on the roadways. Sneaky tongue slip tho. 11/10 https://t.co/r6SLVN2VUH</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>https://twitter.com/dog_rates/status/817056546584727552/photo/1</td>\n", " <td>11</td>\n", " <td>10</td>\n", " <td>Chloe</td>\n", " <td>None</td>\n", " <td>None</td>\n", " <td>None</td>\n", " <td>None</td>\n", " </tr>\n", " <tr>\n", " <th>726</th>\n", " <td>782598640137187329</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>2016-10-02 15:10:30 +0000</td>\n", " <td><a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a></td>\n", " <td>This is Timmy. He's quite large. According to a trusted source it's actually a dog wearing a dog suit. 
11/10 https://t.co/BIUchFwHqn</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>https://twitter.com/dog_rates/status/782598640137187329/photo/1</td>\n", " <td>11</td>\n", " <td>10</td>\n", " <td>Timmy</td>\n", " <td>None</td>\n", " <td>None</td>\n", " <td>None</td>\n", " <td>None</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " tweet_id in_reply_to_status_id in_reply_to_user_id \\\n", "2147 669753178989142016 NaN NaN \n", "603 798628517273620480 NaN NaN \n", "2308 666817836334096384 NaN NaN \n", "468 817056546584727552 NaN NaN \n", "726 782598640137187329 NaN NaN \n", "\n", " timestamp \\\n", "2147 2015-11-26 05:42:55 +0000 \n", "603 2016-11-15 20:47:30 +0000 \n", "2308 2015-11-18 03:18:55 +0000 \n", "468 2017-01-05 17:13:55 +0000 \n", "726 2016-10-02 15:10:30 +0000 \n", "\n", " source \\\n", "2147 <a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a> \n", "603 <a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a> \n", "2308 <a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a> \n", "468 <a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a> \n", "726 <a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a> \n", "\n", " text \\\n", "2147 Meet Chester. He just ate a lot and now he can't move. 10/10 that's going to be me in about 17 hours https://t.co/63jh1tYZa5 \n", "603 RT @dog_rates: This a Norwegian Pewterschmidt named Tickles. Ears for days. 12/10 I care deeply for Tickles https://t.co/0aDF62KVP7 \n", "2308 This is Jeph. He is a German Boston Shuttlecock. Enjoys couch. Lost body during French Revolution. True hero 9/10 https://t.co/8whlkYw3mO \n", "468 This is Chloe. She fell asleep at the wheel. Absolute menace on the roadways. Sneaky tongue slip tho. 11/10 https://t.co/r6SLVN2VUH \n", "726 This is Timmy. He's quite large. 
According to a trusted source it's actually a dog wearing a dog suit. 11/10 https://t.co/BIUchFwHqn \n", "\n", " retweeted_status_id retweeted_status_user_id \\\n", "2147 NaN NaN \n", "603 6.675094e+17 4.196984e+09 \n", "2308 NaN NaN \n", "468 NaN NaN \n", "726 NaN NaN \n", "\n", " retweeted_status_timestamp \\\n", "2147 NaN \n", "603 2015-11-20 01:06:48 +0000 \n", "2308 NaN \n", "468 NaN \n", "726 NaN \n", "\n", " expanded_urls \\\n", "2147 https://twitter.com/dog_rates/status/669753178989142016/photo/1 \n", "603 https://twitter.com/dog_rates/status/667509364010450944/photo/1,https://twitter.com/dog_rates/status/667509364010450944/photo/1 \n", "2308 https://twitter.com/dog_rates/status/666817836334096384/photo/1 \n", "468 https://twitter.com/dog_rates/status/817056546584727552/photo/1 \n", "726 https://twitter.com/dog_rates/status/782598640137187329/photo/1 \n", "\n", " rating_numerator rating_denominator name doggo floofer pupper puppo \n", "2147 10 10 Chester None None None None \n", "603 12 10 None None None None None \n", "2308 9 10 Jeph None None None None \n", "468 11 10 Chloe None None None None \n", "726 11 10 Timmy None None None None " ] }, "execution_count": 796, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get a view on the dataset\n", "df_archive_clean.sample(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Elements from a first visual observation:\n", "* \"source\" column contains html tags. We do not consider this as an issue regarding the analysis we plan to conduct.\n", "* \"expanded_urls\" contains some duplicated values. 
" ] }, { "cell_type": "code", "execution_count": 797, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(2356, 17)" ] }, "execution_count": 797, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get the dataset size\n", "df_archive_clean.shape" ] }, { "cell_type": "code", "execution_count": 798, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 798, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Do we have duplicates ?\n", "sum(df_archive_clean.duplicated())" ] }, { "cell_type": "code", "execution_count": 799, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "tweet_id 0 \n", "in_reply_to_status_id 2278\n", "in_reply_to_user_id 2278\n", "timestamp 0 \n", "source 0 \n", "text 0 \n", "retweeted_status_id 2175\n", "retweeted_status_user_id 2175\n", "retweeted_status_timestamp 2175\n", "expanded_urls 59 \n", "rating_numerator 0 \n", "rating_denominator 0 \n", "name 0 \n", "doggo 0 \n", "floofer 0 \n", "pupper 0 \n", "puppo 0 \n", "dtype: int64" ] }, "execution_count": 799, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# What about missing data ?\n", "df_archive_clean.isnull().sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Basically, the rows with data in the \"retweeted_status\" columns are the ones related to a retweet. As not having retweet is a prerequisite, I will delete all the rows with values in the \"retweeted_status*\" columns. I will do the same for the \"reply_to\" columns. Following to that, I will delete those columns because those will not be useful anymore." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Quality Issue 1 : The WeRateDgos twitter archive enhanced contains retweets, which could not be part or our study " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We observe 59 missing expanded_urls. 
\n", "#### Quality Issue 2 : The WeRateDogs twitter archive enhanced has 59 missing values in the \"expanded_urls\" column" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, let's check the unique values of the columns we are most likely to use later." ] }, { "cell_type": "code", "execution_count": 800, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/plain": [ "array([ 13, 12, 14, 5, 17, 11, 10, 420, 666, 6, 15,\n", " 182, 960, 0, 75, 7, 84, 9, 24, 8, 1, 27,\n", " 3, 4, 165, 1776, 204, 50, 99, 80, 45, 60, 44,\n", " 143, 121, 20, 26, 2, 144, 88])" ] }, "execution_count": 800, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get the unique values for rating numerator, as we knew we might have issues there\n", "df_archive_clean.rating_numerator.unique()" ] }, { "cell_type": "code", "execution_count": 801, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>rating_numerator</th>\n", " <th>rating_denominator</th>\n", " <th>text</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>315</th>\n", " <td>0</td>\n", " <td>10</td>\n", " <td>When you're so blinded by your systematic plagiarism that you forget what day it is. 0/10 https://t.co/YbEJPkg4Ag</td>\n", " </tr>\n", " <tr>\n", " <th>1016</th>\n", " <td>0</td>\n", " <td>10</td>\n", " <td>PUPDATE: can't see any. Even if I could, I couldn't reach them to pet. 
0/10 much disappointment https://t.co/c7WXaB2nqX</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " rating_numerator rating_denominator \\\n", "315 0 10 \n", "1016 0 10 \n", "\n", " text \n", "315 When you're so blinded by your systematic plagiarism that you forget what day it is. 0/10 https://t.co/YbEJPkg4Ag \n", "1016 PUPDATE: can't see any. Even if I could, I couldn't reach them to pet. 0/10 much disappointment https://t.co/c7WXaB2nqX " ] }, "execution_count": 801, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Is the 0 value a normal situation?\n", "df_archive_clean.query('rating_numerator == 0')[['rating_numerator', 'rating_denominator', 'text']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We've got some very high values! Let's check the context to see whether they really make sense." ] }, { "cell_type": "code", "execution_count": 802, "metadata": {}, "outputs": [], "source": [ "# Remove the column width limit so we can view the full text of each cell\n", "# (None means no limit in pandas >= 1.0, where the older -1 is deprecated)\n", "pd.set_option('display.max_colwidth', None)" ] }, { "cell_type": "code", "execution_count": 803, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>rating_numerator</th>\n", " <th>rating_denominator</th>\n", " <th>text</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>313</th>\n", " <td>960</td>\n", " <td>0</td>\n", " <td>@jonnysun @Lin_Manuel ok jomny I know you're excited but 960/00 isn't a valid rating, 13/10 is tho</td>\n", " </tr>\n", " <tr>\n", " <th>342</th>\n", " <td>11</td>\n", " <td>15</td>\n", " 
<td>@docmisterio account started on 11/15/15</td>\n", " </tr>\n", " <tr>\n", " <th>433</th>\n", " <td>84</td>\n", " <td>70</td>\n", " <td>The floofs have been released I repeat the floofs have been released. 84/70 https://t.co/NIYC820tmd</td>\n", " </tr>\n", " <tr>\n", " <th>516</th>\n", " <td>24</td>\n", " <td>7</td>\n", " <td>Meet Sam. She smiles 24/7 &amp; secretly aspires to be a reindeer. \\nKeep Sam smiling by clicking and sharing this link:\\nhttps://t.co/98tB8y7y7t https://t.co/LouL5vdvxx</td>\n", " </tr>\n", " <tr>\n", " <th>784</th>\n", " <td>9</td>\n", " <td>11</td>\n", " <td>RT @dog_rates: After so many requests, this is Bretagne. She was the last surviving 9/11 search dog, and our second ever 14/10. RIP https:/…</td>\n", " </tr>\n", " <tr>\n", " <th>902</th>\n", " <td>165</td>\n", " <td>150</td>\n", " <td>Why does this never happen at my front door... 165/150 https://t.co/HmwrdfEfUE</td>\n", " </tr>\n", " <tr>\n", " <th>1068</th>\n", " <td>9</td>\n", " <td>11</td>\n", " <td>After so many requests, this is Bretagne. She was the last surviving 9/11 search dog, and our second ever 14/10. RIP https://t.co/XAVDNDaVgQ</td>\n", " </tr>\n", " <tr>\n", " <th>1120</th>\n", " <td>204</td>\n", " <td>170</td>\n", " <td>Say hello to this unbelievably well behaved squad of doggos. 204/170 would try to pet all at once https://t.co/yGQI3He3xv</td>\n", " </tr>\n", " <tr>\n", " <th>1165</th>\n", " <td>4</td>\n", " <td>20</td>\n", " <td>Happy 4/20 from the squad! 13/10 for all https://t.co/eV1diwds8a</td>\n", " </tr>\n", " <tr>\n", " <th>1202</th>\n", " <td>50</td>\n", " <td>50</td>\n", " <td>This is Bluebert. He just saw that both #FinalFur match ups are split 50/50. Amazed af. 11/10 https://t.co/Kky1DPG4iq</td>\n", " </tr>\n", " <tr>\n", " <th>1228</th>\n", " <td>99</td>\n", " <td>90</td>\n", " <td>Happy Saturday here's 9 puppers on a bench. 
99/90 good work everybody https://t.co/mpvaVxKmc1</td>\n", " </tr>\n", " <tr>\n", " <th>1254</th>\n", " <td>80</td>\n", " <td>80</td>\n", " <td>Here's a brigade of puppers. All look very prepared for whatever happens next. 80/80 https://t.co/0eb7R1Om12</td>\n", " </tr>\n", " <tr>\n", " <th>1274</th>\n", " <td>45</td>\n", " <td>50</td>\n", " <td>From left to right:\\nCletus, Jerome, Alejandro, Burp, &amp; Titson\\nNone know where camera is. 45/50 would hug all at once https://t.co/sedre1ivTK</td>\n", " </tr>\n", " <tr>\n", " <th>1351</th>\n", " <td>60</td>\n", " <td>50</td>\n", " <td>Here is a whole flock of puppers. 60/50 I'll take the lot https://t.co/9dpcw6MdWa</td>\n", " </tr>\n", " <tr>\n", " <th>1433</th>\n", " <td>44</td>\n", " <td>40</td>\n", " <td>Happy Wednesday here's a bucket of pups. 44/40 would pet all at once https://t.co/HppvrYuamZ</td>\n", " </tr>\n", " <tr>\n", " <th>1598</th>\n", " <td>4</td>\n", " <td>20</td>\n", " <td>Yes I do realize a rating of 4/20 would've been fitting. However, it would be unjust to give these cooperative pups that low of a rating</td>\n", " </tr>\n", " <tr>\n", " <th>1634</th>\n", " <td>143</td>\n", " <td>130</td>\n", " <td>Two sneaky puppers were not initially seen, moving the rating to 143/130. Please forgive us. Thank you https://t.co/kRK51Y5ac3</td>\n", " </tr>\n", " <tr>\n", " <th>1635</th>\n", " <td>121</td>\n", " <td>110</td>\n", " <td>Someone help the girl is being mugged. Several are distracting her while two steal her shoes. Clever puppers 121/110 https://t.co/1zfnTJLt55</td>\n", " </tr>\n", " <tr>\n", " <th>1662</th>\n", " <td>7</td>\n", " <td>11</td>\n", " <td>This is Darrel. He just robbed a 7/11 and is in a high speed police chase. Was just spotted by the helicopter 10/10 https://t.co/7EsP8LmSp5</td>\n", " </tr>\n", " <tr>\n", " <th>1663</th>\n", " <td>20</td>\n", " <td>16</td>\n", " <td>I'm aware that I could've said 20/16, but here at WeRateDogs we are very professional. 
An inconsistent rating scale is simply irresponsible</td>\n", " </tr>\n", " <tr>\n", " <th>1779</th>\n", " <td>144</td>\n", " <td>120</td>\n", " <td>IT'S PUPPERGEDDON. Total of 144/120 ...I think https://t.co/ZanVtAtvIq</td>\n", " </tr>\n", " <tr>\n", " <th>1843</th>\n", " <td>88</td>\n", " <td>80</td>\n", " <td>Here we have an entire platoon of puppers. Total score: 88/80 would pet all at once https://t.co/y93p6FLvVw</td>\n", " </tr>\n", " <tr>\n", " <th>2335</th>\n", " <td>1</td>\n", " <td>2</td>\n", " <td>This is an Albanian 3 1/2 legged Episcopalian. Loves well-polished hardwood flooring. Penis on the collar. 9/10 https://t.co/d9NcXFKwLv</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " rating_numerator rating_denominator \\\n", "313 960 0 \n", "342 11 15 \n", "433 84 70 \n", "516 24 7 \n", "784 9 11 \n", "902 165 150 \n", "1068 9 11 \n", "1120 204 170 \n", "1165 4 20 \n", "1202 50 50 \n", "1228 99 90 \n", "1254 80 80 \n", "1274 45 50 \n", "1351 60 50 \n", "1433 44 40 \n", "1598 4 20 \n", "1634 143 130 \n", "1635 121 110 \n", "1662 7 11 \n", "1663 20 16 \n", "1779 144 120 \n", "1843 88 80 \n", "2335 1 2 \n", "\n", " text \n", "313 @jonnysun @Lin_Manuel ok jomny I know you're excited but 960/00 isn't a valid rating, 13/10 is tho \n", "342 @docmisterio account started on 11/15/15 \n", "433 The floofs have been released I repeat the floofs have been released. 84/70 https://t.co/NIYC820tmd \n", "516 Meet Sam. She smiles 24/7 & secretly aspires to be a reindeer. \\nKeep Sam smiling by clicking and sharing this link:\\nhttps://t.co/98tB8y7y7t https://t.co/LouL5vdvxx \n", "784 RT @dog_rates: After so many requests, this is Bretagne. She was the last surviving 9/11 search dog, and our second ever 14/10. RIP https:/… \n", "902 Why does this never happen at my front door... 165/150 https://t.co/HmwrdfEfUE \n", "1068 After so many requests, this is Bretagne. She was the last surviving 9/11 search dog, and our second ever 14/10. 
RIP https://t.co/XAVDNDaVgQ \n", "1120 Say hello to this unbelievably well behaved squad of doggos. 204/170 would try to pet all at once https://t.co/yGQI3He3xv \n", "1165 Happy 4/20 from the squad! 13/10 for all https://t.co/eV1diwds8a \n", "1202 This is Bluebert. He just saw that both #FinalFur match ups are split 50/50. Amazed af. 11/10 https://t.co/Kky1DPG4iq \n", "1228 Happy Saturday here's 9 puppers on a bench. 99/90 good work everybody https://t.co/mpvaVxKmc1 \n", "1254 Here's a brigade of puppers. All look very prepared for whatever happens next. 80/80 https://t.co/0eb7R1Om12 \n", "1274 From left to right:\\nCletus, Jerome, Alejandro, Burp, & Titson\\nNone know where camera is. 45/50 would hug all at once https://t.co/sedre1ivTK \n", "1351 Here is a whole flock of puppers. 60/50 I'll take the lot https://t.co/9dpcw6MdWa \n", "1433 Happy Wednesday here's a bucket of pups. 44/40 would pet all at once https://t.co/HppvrYuamZ \n", "1598 Yes I do realize a rating of 4/20 would've been fitting. However, it would be unjust to give these cooperative pups that low of a rating \n", "1634 Two sneaky puppers were not initially seen, moving the rating to 143/130. Please forgive us. Thank you https://t.co/kRK51Y5ac3 \n", "1635 Someone help the girl is being mugged. Several are distracting her while two steal her shoes. Clever puppers 121/110 https://t.co/1zfnTJLt55 \n", "1662 This is Darrel. He just robbed a 7/11 and is in a high speed police chase. Was just spotted by the helicopter 10/10 https://t.co/7EsP8LmSp5 \n", "1663 I'm aware that I could've said 20/16, but here at WeRateDogs we are very professional. An inconsistent rating scale is simply irresponsible \n", "1779 IT'S PUPPERGEDDON. Total of 144/120 ...I think https://t.co/ZanVtAtvIq \n", "1843 Here we have an entire platoon of puppers. Total score: 88/80 would pet all at once https://t.co/y93p6FLvVw \n", "2335 This is an Albanian 3 1/2 legged Episcopalian. Loves well-polished hardwood flooring. 
Penis on the collar. 9/10 https://t.co/d9NcXFKwLv " ] }, "execution_count": 803, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get the cases with a suspicious denominator so we can observe them visually\n", "df_denominator_issues = df_archive_clean.query('rating_denominator != 10')[['rating_numerator', 'rating_denominator', 'text']]\n", "df_denominator_issues" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We observe:\n", "* Cases where the extracted value is not a rating at all, for example 24/7 (index 516) or the date 11/15/15 (index 342) \n", "* Normal situations, as previously observed: the denominator is high but in line with the numerator, as in 99/90 for 9 puppers (index 1228)\n", "* Other situations where the first rating-like value in the text was extracted instead of the actual rating that follows it. For example, at index 1202, 50/50 was extracted instead of 11/10, which is the real rating in the same tweet. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Quality Issue 4 : WeRateDogs twitter archive - in some cases, the wrong values were extracted from the text, so rating_numerator and rating_denominator do not hold the actual rating." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We continue to check unique values for some columns. \n", "Visually, we saw \"None\" as a dog name. Let's dig into that." ] }, { "cell_type": "code", "execution_count": 804, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>name</th>\n", " <th>text</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>1928</th>\n", " <td>None</td>\n", " <td>Herd of wild dogs here. 
Not sure what they're trying to do. No real goals in life. 3/10 find your purpose puppers https://t.co/t5ih0VrK02</td>\n", " </tr>\n", " <tr>\n", " <th>2013</th>\n", " <td>None</td>\n", " <td>Exotic underwater dog here. Very shy. Wont return tennis balls I toss him. Never been petted. 5/10 I bet he's soft https://t.co/WH7Nzc5IBA</td>\n", " </tr>\n", " <tr>\n", " <th>1783</th>\n", " <td>None</td>\n", " <td>Endangered triangular pup here. Could be a wizard. Caught mid-laugh. No legs. Just fluff. Probably a wizard. 9/10 https://t.co/GFVIHIod0Z</td>\n", " </tr>\n", " <tr>\n", " <th>2067</th>\n", " <td>None</td>\n", " <td>Neat pup here. Enjoys lettuce. Long af ears. Short lil legs. Hops surprisingly high for dog. 9/10 still very petable https://t.co/HYR611wiA4</td>\n", " </tr>\n", " <tr>\n", " <th>494</th>\n", " <td>None</td>\n", " <td>We only rate dogs. Please don't send in other things like this very good Christmas tree. Thank you... 13/10 https://t.co/rvSANEsQZJ</td>\n", " </tr>\n", " <tr>\n", " <th>1408</th>\n", " <td>None</td>\n", " <td>ERMAHGERD 12/10 please enjoy https://t.co/7WrAWKdBac</td>\n", " </tr>\n", " <tr>\n", " <th>212</th>\n", " <td>None</td>\n", " <td>RT @eddie_coe98: Thanks @dog_rates completed my laptop. 10/10 would buy again https://t.co/bO0rThDlXI</td>\n", " </tr>\n", " <tr>\n", " <th>179</th>\n", " <td>None</td>\n", " <td>@Marc_IRL pixelated af 12/10</td>\n", " </tr>\n", " <tr>\n", " <th>2131</th>\n", " <td>None</td>\n", " <td>\"Hi yes this is dog. I can't help with that s- sir please... the manager isn't in right n- well that was rude\"\\n10/10 https://t.co/DuQXATW27f</td>\n", " </tr>\n", " <tr>\n", " <th>1935</th>\n", " <td>None</td>\n", " <td>This pup is sad bc he didn't get to be the toy car. Also he has shitty money management skills. 10/10 still cute tho https://t.co/PiSXXZjDSJ</td>\n", " </tr>\n", " <tr>\n", " <th>268</th>\n", " <td>None</td>\n", " <td>Here we have some incredible doggos for #K9VeteransDay. All brave as h*ck. 
Salute your dog in solidarity. 14/10 for all https://t.co/SVNMdFqKDL</td>\n", " </tr>\n", " <tr>\n", " <th>1901</th>\n", " <td>None</td>\n", " <td>Two gorgeous dogs here. Little waddling dog is a rebel. Refuses to look at camera. Must be a preteen. 5/10 &amp; 8/10 https://t.co/YPfw7oahbD</td>\n", " </tr>\n", " <tr>\n", " <th>189</th>\n", " <td>None</td>\n", " <td>@s8n You tried very hard to portray this good boy as not so good, but you have ultimately failed. His goodness shines through. 666/10</td>\n", " </tr>\n", " <tr>\n", " <th>1068</th>\n", " <td>None</td>\n", " <td>After so many requests, this is Bretagne. She was the last surviving 9/11 search dog, and our second ever 14/10. RIP https://t.co/XAVDNDaVgQ</td>\n", " </tr>\n", " <tr>\n", " <th>493</th>\n", " <td>None</td>\n", " <td>Here's a doggo who has concluded that Christmas is entirely too bright. Requests you tone it down a notch. 11/10 https://t.co/cD967DjnIn</td>\n", " </tr>\n", " <tr>\n", " <th>2089</th>\n", " <td>None</td>\n", " <td>Two obedient dogs here. Left one has extra leg sticking out of its back. They each get 9/10. Would pet both at once https://t.co/RGcNPsmAfY</td>\n", " </tr>\n", " <tr>\n", " <th>1532</th>\n", " <td>None</td>\n", " <td>\"I'm the only one that ever does anything in this household\" 10/10 https://t.co/V8HcVIh4jt</td>\n", " </tr>\n", " <tr>\n", " <th>1248</th>\n", " <td>None</td>\n", " <td>\"Please, no puparazzi\" 11/10 https://t.co/nJIXSPfedK</td>\n", " </tr>\n", " <tr>\n", " <th>1234</th>\n", " <td>None</td>\n", " <td>Please don't send in any more polar bears. We only rate dogs. Thank you... 10/10 https://t.co/83RGhdIQz2</td>\n", " </tr>\n", " <tr>\n", " <th>568</th>\n", " <td>None</td>\n", " <td>RT @ChinoChinako: They're good products, Brent\\n\\nMug holds drinks; hoodie is comfy af. 13/10 \\n\\nPuppy Aika h*cking agrees. 
@dog_rates https:/…</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " name \\\n", "1928 None \n", "2013 None \n", "1783 None \n", "2067 None \n", "494 None \n", "1408 None \n", "212 None \n", "179 None \n", "2131 None \n", "1935 None \n", "268 None \n", "1901 None \n", "189 None \n", "1068 None \n", "493 None \n", "2089 None \n", "1532 None \n", "1248 None \n", "1234 None \n", "568 None \n", "\n", " text \n", "1928 Herd of wild dogs here. Not sure what they're trying to do. No real goals in life. 3/10 find your purpose puppers https://t.co/t5ih0VrK02 \n", "2013 Exotic underwater dog here. Very shy. Wont return tennis balls I toss him. Never been petted. 5/10 I bet he's soft https://t.co/WH7Nzc5IBA \n", "1783 Endangered triangular pup here. Could be a wizard. Caught mid-laugh. No legs. Just fluff. Probably a wizard. 9/10 https://t.co/GFVIHIod0Z \n", "2067 Neat pup here. Enjoys lettuce. Long af ears. Short lil legs. Hops surprisingly high for dog. 9/10 still very petable https://t.co/HYR611wiA4 \n", "494 We only rate dogs. Please don't send in other things like this very good Christmas tree. Thank you... 13/10 https://t.co/rvSANEsQZJ \n", "1408 ERMAHGERD 12/10 please enjoy https://t.co/7WrAWKdBac \n", "212 RT @eddie_coe98: Thanks @dog_rates completed my laptop. 10/10 would buy again https://t.co/bO0rThDlXI \n", "179 @Marc_IRL pixelated af 12/10 \n", "2131 \"Hi yes this is dog. I can't help with that s- sir please... the manager isn't in right n- well that was rude\"\\n10/10 https://t.co/DuQXATW27f \n", "1935 This pup is sad bc he didn't get to be the toy car. Also he has shitty money management skills. 10/10 still cute tho https://t.co/PiSXXZjDSJ \n", "268 Here we have some incredible doggos for #K9VeteransDay. All brave as h*ck. Salute your dog in solidarity. 14/10 for all https://t.co/SVNMdFqKDL \n", "1901 Two gorgeous dogs here. Little waddling dog is a rebel. Refuses to look at camera. Must be a preteen. 
5/10 & 8/10 https://t.co/YPfw7oahbD \n", "189 @s8n You tried very hard to portray this good boy as not so good, but you have ultimately failed. His goodness shines through. 666/10 \n", "1068 After so many requests, this is Bretagne. She was the last surviving 9/11 search dog, and our second ever 14/10. RIP https://t.co/XAVDNDaVgQ \n", "493 Here's a doggo who has concluded that Christmas is entirely too bright. Requests you tone it down a notch. 11/10 https://t.co/cD967DjnIn \n", "2089 Two obedient dogs here. Left one has extra leg sticking out of its back. They each get 9/10. Would pet both at once https://t.co/RGcNPsmAfY \n", "1532 \"I'm the only one that ever does anything in this household\" 10/10 https://t.co/V8HcVIh4jt \n", "1248 \"Please, no puparazzi\" 11/10 https://t.co/nJIXSPfedK \n", "1234 Please don't send in any more polar bears. We only rate dogs. Thank you... 10/10 https://t.co/83RGhdIQz2 \n", "568 RT @ChinoChinako: They're good products, Brent\\n\\nMug holds drinks; hoodie is comfy af. 13/10 \\n\\nPuppy Aika h*cking agrees. @dog_rates https:/… " ] }, "execution_count": 804, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Check the dog names, as we might have issues there. We get 20 rows for a visual observation\n", "df_name_issues = df_archive_clean.query('name == \"None\"')[['name', 'text']]\n", "df_name_issues.sample(20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We observe that the text sometimes contains a name, so it is still possible to extract it. For example, at index 72 we could have \"Martha\", and at index 2237 \"Oliver\". \n", "The string \"None\" could be confusing, but we do not consider it an issue: we only need to distinguish rows that have a name from rows that do not. \n", "#### Quality Issue 5 : WeRateDogs twitter archive - We have \"None\" as name even if, sometimes, there is an available name in the text. 
So a few names are missing and could be retrieved." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We observe \"None\" values in the dog stage columns (\"doggo\", \"floofer\", \"pupper\", \"puppo\")." ] }, { "cell_type": "code", "execution_count": 805, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(1976, 5)" ] }, "execution_count": 805, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get all the rows where the dog stage has not been identified, meaning set to \"None\"\n", "df_dogtionary_issue = df_archive_clean.query('doggo == \"None\" and floofer == \"None\" and pupper == \"None\" and puppo == \"None\"')[['text', 'doggo', 'floofer', 'pupper', 'puppo']]\n", "df_dogtionary_issue.shape" ] }, { "cell_type": "code", "execution_count": 806, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False 1969\n", "True 7 \n", "Name: text, dtype: int64" ] }, "execution_count": 806, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Do we have any puppo in the associated text content ? meaning not identified ?\n", "df_dogtionary_issue['text'].apply(lambda x: ('puppo' in x)).value_counts()" ] }, { "cell_type": "code", "execution_count": 807, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False 1966\n", "True 10 \n", "Name: text, dtype: int64" ] }, "execution_count": 807, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Do we have any doggo in the associated text content ? meaning not identified ?\n", "df_dogtionary_issue['text'].apply(lambda x: ('doggo' in x)).value_counts()" ] }, { "cell_type": "code", "execution_count": 808, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False 1976\n", "Name: text, dtype: int64" ] }, "execution_count": 808, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Do we have any floofer in the associated text content ? 
meaning not identified ?\n", "df_dogtionary_issue['text'].apply(lambda x: ('floofer' in x)).value_counts()" ] }, { "cell_type": "code", "execution_count": 809, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False 1952\n", "True 24 \n", "Name: text, dtype: int64" ] }, "execution_count": 809, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Do we have any pupper in the associated text content ? meaning not identified ?\n", "df_dogtionary_issue['text'].apply(lambda x: ('pupper' in x)).value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In conclusion, we have a quality issue: among the rows where no stage was extracted, the text mentions \"puppo\" in 7 rows, \"doggo\" in 10 rows and \"pupper\" in 24 rows. A single text can also mention several dog stages, so the extracted stage might sometimes not be the right one. We do not consider this second point an issue, because few rows are likely to be affected. \n", "#### Quality Issue 6 : WeRateDogs twitter archive - We have missing dog stages, meaning stages not properly extracted. This is the case for \"puppo\", \"doggo\" and \"pupper\"." ] }, { "cell_type": "code", "execution_count": 810, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False 1658\n", "True 639 \n", "Name: expanded_urls, dtype: int64" ] }, "execution_count": 810, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Check the duplicated expanded_urls we observed previously\n", "df_archive_clean['expanded_urls'].str.contains(',').value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Quality Issue 7: WeRateDogs twitter archive - \"expanded_urls\" column contains duplicated urls" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we check the column types. 
" ] }, { "cell_type": "code", "execution_count": 811, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "tweet_id int64 \n", "in_reply_to_status_id float64\n", "in_reply_to_user_id float64\n", "timestamp object \n", "source object \n", "text object \n", "retweeted_status_id float64\n", "retweeted_status_user_id float64\n", "retweeted_status_timestamp object \n", "expanded_urls object \n", "rating_numerator int64 \n", "rating_denominator int64 \n", "name object \n", "doggo object \n", "floofer object \n", "pupper object \n", "puppo object \n", "dtype: object" ] }, "execution_count": 811, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Examine the columns type\n", "df_archive_clean.dtypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From the previous investigations, we already know that \"source\", \"text\", \"name\" and dog stages are all string typed." ] }, { "cell_type": "code", "execution_count": 812, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "str" ] }, "execution_count": 812, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# So we get more detailed visibility on timestamp. \n", "type(df_archive_clean.timestamp[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Quality Issue 8: WeRateDgos twitter archive - timestamp is using string type." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From observation: the dog stages have been spread as differents columns. This is a tidiness issue.\n", "#### Tidiness Issue 1 : WeRateDgos twitter archive - Dog stages have been spread as columns." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<a id='tweetimagepreductions'></a>\n", "#### 2. 
The tweet image predictions" ] }, { "cell_type": "code", "execution_count": 813, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>tweet_id</th>\n", " <th>jpg_url</th>\n", " <th>img_num</th>\n", " <th>p1</th>\n", " <th>p1_conf</th>\n", " <th>p1_dog</th>\n", " <th>p2</th>\n", " <th>p2_conf</th>\n", " <th>p2_dog</th>\n", " <th>p3</th>\n", " <th>p3_conf</th>\n", " <th>p3_dog</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>666020888022790149</td>\n", " <td>https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg</td>\n", " <td>1</td>\n", " <td>Welsh_springer_spaniel</td>\n", " <td>0.465074</td>\n", " <td>True</td>\n", " <td>collie</td>\n", " <td>0.156665</td>\n", " <td>True</td>\n", " <td>Shetland_sheepdog</td>\n", " <td>0.061428</td>\n", " <td>True</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>666029285002620928</td>\n", " <td>https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg</td>\n", " <td>1</td>\n", " <td>redbone</td>\n", " <td>0.506826</td>\n", " <td>True</td>\n", " <td>miniature_pinscher</td>\n", " <td>0.074192</td>\n", " <td>True</td>\n", " <td>Rhodesian_ridgeback</td>\n", " <td>0.072010</td>\n", " <td>True</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>666033412701032449</td>\n", " <td>https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg</td>\n", " <td>1</td>\n", " <td>German_shepherd</td>\n", " <td>0.596461</td>\n", " <td>True</td>\n", " <td>malinois</td>\n", " <td>0.138584</td>\n", " <td>True</td>\n", " <td>bloodhound</td>\n", " <td>0.116197</td>\n", " <td>True</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " 
<td>666044226329800704</td>\n", " <td>https://pbs.twimg.com/media/CT5Dr8HUEAA-lEu.jpg</td>\n", " <td>1</td>\n", " <td>Rhodesian_ridgeback</td>\n", " <td>0.408143</td>\n", " <td>True</td>\n", " <td>redbone</td>\n", " <td>0.360687</td>\n", " <td>True</td>\n", " <td>miniature_pinscher</td>\n", " <td>0.222752</td>\n", " <td>True</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>666049248165822465</td>\n", " <td>https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg</td>\n", " <td>1</td>\n", " <td>miniature_pinscher</td>\n", " <td>0.560311</td>\n", " <td>True</td>\n", " <td>Rottweiler</td>\n", " <td>0.243682</td>\n", " <td>True</td>\n", " <td>Doberman</td>\n", " <td>0.154629</td>\n", " <td>True</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " tweet_id jpg_url \\\n", "0 666020888022790149 https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg \n", "1 666029285002620928 https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg \n", "2 666033412701032449 https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg \n", "3 666044226329800704 https://pbs.twimg.com/media/CT5Dr8HUEAA-lEu.jpg \n", "4 666049248165822465 https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg \n", "\n", " img_num p1 p1_conf p1_dog p2 \\\n", "0 1 Welsh_springer_spaniel 0.465074 True collie \n", "1 1 redbone 0.506826 True miniature_pinscher \n", "2 1 German_shepherd 0.596461 True malinois \n", "3 1 Rhodesian_ridgeback 0.408143 True redbone \n", "4 1 miniature_pinscher 0.560311 True Rottweiler \n", "\n", " p2_conf p2_dog p3 p3_conf p3_dog \n", "0 0.156665 True Shetland_sheepdog 0.061428 True \n", "1 0.074192 True Rhodesian_ridgeback 0.072010 True \n", "2 0.138584 True bloodhound 0.116197 True \n", "3 0.360687 True miniature_pinscher 0.222752 True \n", "4 0.243682 True Doberman 0.154629 True " ] }, "execution_count": 813, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get a view on the dataframe\n", "df_image_clean.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ 
"Observations: we have several prediction columns, each with an associated confidence column. A column indicating whether the prediction is a dog is unnecessary if we only keep dog predictions. \n", "#### Quality Issue 11 : Tweets images predictions - predictions and the associated confidences are spread over several columns. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Observations: this file also holds tweet information, so tweet information is spread across several files. This is a tidiness issue.\n", "#### Tidiness Issue 2 : Tweet information is spread over several files and dataframes. " ] }, { "cell_type": "code", "execution_count": 814, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(2075, 12)" ] }, "execution_count": 814, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get the dataset size\n", "df_image_clean.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With 2075 rows, this dataset is smaller than the twitter archive dataset. " ] }, { "cell_type": "code", "execution_count": 815, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 815, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Check the duplicated rows\n", "sum(df_image_clean.duplicated())" ] }, { "cell_type": "code", "execution_count": 816, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tweet_id 0\n", "jpg_url 0\n", "img_num 0\n", "p1 0\n", "p1_conf 0\n", "p1_dog 0\n", "p2 0\n", "p2_conf 0\n", "p2_dog 0\n", "p3 0\n", "p3_conf 0\n", "p3_dog 0\n", "dtype: int64" ] }, "execution_count": 816, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Do we have missing data somewhere ?\n", "df_image_clean.isnull().sum()" ] }, { "cell_type": "code", "execution_count": 817, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tweet_id int64 \n", "jpg_url object \n", "img_num int64 \n", "p1 object \n", "p1_conf float64\n", "p1_dog bool \n", "p2 object \n", "p2_conf float64\n", "p2_dog bool 
\n", "p3 object \n", "p3_conf float64\n", "p3_dog bool \n", "dtype: object" ] }, "execution_count": 817, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get a view on the types\n", "df_image_clean.dtypes" ] }, { "cell_type": "code", "execution_count": 818, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(True 1532\n", " False 543 \n", " Name: p1_dog, dtype: int64, True 1553\n", " False 522 \n", " Name: p2_dog, dtype: int64, True 1553\n", " False 522 \n", " Name: p2_dog, dtype: int64)" ] }, "execution_count": 818, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get the distinct values for p1_dog, p2_dog and p3_dog\n", "df_image_clean.p1_dog.value_counts(), df_image_clean.p2_dog.value_counts(), df_image_clean.p3_dog.value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Quality Issue 9 : Tweets images predictions - Some rows are not dog image predictions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<a id='datafromtwitter'></a>\n", "#### 3. Data retrieved via twitter API" ] }, { "cell_type": "code", "execution_count": 819, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>tweet_id</th>\n", " <th>created_at</th>\n", " <th>retweet_count</th>\n", " <th>favorite_count</th>\n", " <th>tweet_full_text</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>892420643555336193</td>\n", " <td>Tue Aug 01 16:23:56 +0000 2017</td>\n", " <td>8204</td>\n", " <td>37636</td>\n", " <td>This is Phineas. He's a mystical boy. 
Only ever appears in the hole of a donut. 13/10 https://t.co/MgUWQ76dJU</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>892177421306343426</td>\n", " <td>Tue Aug 01 00:17:27 +0000 2017</td>\n", " <td>6071</td>\n", " <td>32337</td>\n", " <td>This is Tilly. She's just checking pup on you. Hopes you're doing ok. If not, she's available for pats, snugs, boops, the whole bit. 13/10 https://t.co/0Xxu71qeIV</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>891815181378084864</td>\n", " <td>Mon Jul 31 00:18:03 +0000 2017</td>\n", " <td>4011</td>\n", " <td>24363</td>\n", " <td>This is Archie. He is a rare Norwegian Pouncing Corgo. Lives in the tall grass. You never know when one may strike. 12/10 https://t.co/wUnZnhtVJB</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>891689557279858688</td>\n", " <td>Sun Jul 30 15:58:51 +0000 2017</td>\n", " <td>8366</td>\n", " <td>40959</td>\n", " <td>This is Darla. She commenced a snooze mid meal. 13/10 happens to the best of us https://t.co/tD36da7qLQ</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>891327558926688256</td>\n", " <td>Sat Jul 29 16:00:24 +0000 2017</td>\n", " <td>9063</td>\n", " <td>39161</td>\n", " <td>This is Franklin. He would like you to stop calling him \"cute.\" He is a very fierce shark and should be respected as such. 12/10 #BarkWeek https://t.co/AtUZn91f7f</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " tweet_id created_at retweet_count \\\n", "0 892420643555336193 Tue Aug 01 16:23:56 +0000 2017 8204 \n", "1 892177421306343426 Tue Aug 01 00:17:27 +0000 2017 6071 \n", "2 891815181378084864 Mon Jul 31 00:18:03 +0000 2017 4011 \n", "3 891689557279858688 Sun Jul 30 15:58:51 +0000 2017 8366 \n", "4 891327558926688256 Sat Jul 29 16:00:24 +0000 2017 9063 \n", "\n", " favorite_count \\\n", "0 37636 \n", "1 32337 \n", "2 24363 \n", "3 40959 \n", "4 39161 \n", "\n", " tweet_full_text \n", "0 This is Phineas. He's a mystical boy. 
Only ever appears in the hole of a donut. 13/10 https://t.co/MgUWQ76dJU \n", "1 This is Tilly. She's just checking pup on you. Hopes you're doing ok. If not, she's available for pats, snugs, boops, the whole bit. 13/10 https://t.co/0Xxu71qeIV \n", "2 This is Archie. He is a rare Norwegian Pouncing Corgo. Lives in the tall grass. You never know when one may strike. 12/10 https://t.co/wUnZnhtVJB \n", "3 This is Darla. She commenced a snooze mid meal. 13/10 happens to the best of us https://t.co/tD36da7qLQ \n", "4 This is Franklin. He would like you to stop calling him \"cute.\" He is a very fierce shark and should be respected as such. 12/10 #BarkWeek https://t.co/AtUZn91f7f " ] }, "execution_count": 819, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get a view of the information gathered via the twitter API\n", "df_twitter_clean.head()" ] }, { "cell_type": "code", "execution_count": 820, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(2337, 5)" ] }, "execution_count": 820, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get the size\n", "df_twitter_clean.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It has 19 fewer rows than `df_archive_clean`, which is the WeRateDogs twitter archive. " ] }, { "cell_type": "code", "execution_count": 821, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 821, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Check the duplicated rows\n", "sum(df_twitter_clean.duplicated())" ] }, { "cell_type": "code", "execution_count": 822, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "tweet_id 0\n", "created_at 0\n", "retweet_count 0\n", "favorite_count 0\n", "tweet_full_text 0\n", "dtype: int64" ] }, "execution_count": 822, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Do we have missing data somewhere ? 
- normally no, considering the way we collected the data\n", "df_twitter_clean.isnull().sum()" ] }, { "cell_type": "code", "execution_count": 823, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tweet_id int64 \n", "created_at object\n", "retweet_count int64 \n", "favorite_count int64 \n", "tweet_full_text object\n", "dtype: object" ] }, "execution_count": 823, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get a view on the types\n", "df_twitter_clean.dtypes" ] }, { "cell_type": "code", "execution_count": 824, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "str" ] }, "execution_count": 824, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Inspect the type of created_at, which should be a timestamp\n", "type(df_twitter_clean.created_at[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Quality Issue 10 : Data retrieved via twitter API - \"created_at\" column type is not timestamp" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<a id='assesssynthesis'></a>\n", "#### 4. Synthesis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Tidiness issues identified :\n", "1. WeRateDogs twitter archive - Dog stages have been spread as columns.\n", "2. Tweet information is spread over several files and dataframes." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Quality issues identified :\n", "1. The WeRateDogs twitter archive enhanced contains retweets, which should not be part of our study\n", "2. The WeRateDogs twitter archive enhanced has 59 missing values in the \"expanded_urls\" column\n", "3. REMOVED!\n", "4. WeRateDogs twitter archive - in some cases, the relevant rating information (and consequently rating_numerator and rating_denominator) has not been extracted.\n", "5. WeRateDogs twitter archive - We have \"None\" as name even if, sometimes, there is an available name in the text. So a few names are missing and could be retrieved.\n", "6. 
WeRateDogs twitter archive - We have missing dog stages, meaning stages not properly extracted. This is the case for \"puppo\", \"doggo\" and \"pupper\".\n", "7. WeRateDogs twitter archive - \"expanded_urls\" column contains duplicated URLs.\n", "8. WeRateDogs twitter archive - timestamp is using string type.\n", "9. Tweets images predictions - Some rows are not dog image predictions.\n", "10. Data retrieved via twitter API - \"created_at\" column type is not timestamp.\n", "11. Tweets images predictions - predictions and the associated confidences are spread over several columns." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<a id='cleaning'></a>\n", "### Cleaning" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Approach\n", "1. First, we address the missing data and completeness issues (Quality issues 1, 2, 4, 5, 6, 9)\n", "2. Then we resolve the tidiness problems we identified.\n", "3. Finally we correct the remaining quality issues (Quality issues 7, 8, 10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Cleaning - Quality Issue 1 : The WeRateDogs twitter archive enhanced contains retweets, which should not be part of our study" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We found previously that some rows have values in the retweeted_status_id, retweeted_status_user_id and retweeted_status_timestamp columns. We checked their meaning in the documentation here : https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object.html \n", "The requirements only concern retweets. For simplicity, we choose to remove the replies as well." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Define : we remove the rows that have a value in \"retweeted_status_id\" or \"in_reply_to_status_id\"" ] }, { "cell_type": "code", "execution_count": 825, "metadata": {}, "outputs": [], "source": [ "# Code : It is easier to keep the rows with NaN. 
\n", "# Use value != value as an isnan check, \n", "# my source (from internet search) : https://stackoverflow.com/questions/26535563/querying-for-nan-and-other-names-in-pandas\n", "df_archive_clean = df_archive_clean.query('retweeted_status_id != retweeted_status_id and in_reply_to_status_id != in_reply_to_status_id')" ] }, { "cell_type": "code", "execution_count": 826, "metadata": {}, "outputs": [], "source": [ "# Test : the number of null values is the number of rows\n", "assert df_archive_clean.shape[0] == df_archive_clean.retweeted_status_id.isnull().sum()\n", "assert df_archive_clean.shape[0] == df_archive_clean.in_reply_to_status_id.isnull().sum()" ] }, { "cell_type": "code", "execution_count": 827, "metadata": {}, "outputs": [], "source": [ "# Closure : the following columns are not needed anymore : in_reply_to_status_id, in_reply_to_user_id,\n", "# retweeted_status_id, retweeted_status_user_id and retweeted_status_timestamp\n", "columns_to_drop = ['in_reply_to_status_id', 'in_reply_to_user_id', 'retweeted_status_id',\n", "                   'retweeted_status_user_id', 'retweeted_status_timestamp']\n", "df_archive_clean.drop(columns_to_drop, axis=1, inplace=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Cleaning - Quality Issue 2 : The WeRateDogs twitter archive enhanced has 59 missing values in the \"expanded_urls\" column" ] }, { "cell_type": "code", "execution_count": 828, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 828, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# We still have missing values after removing the retweets and replies\n", "df_archive_clean.expanded_urls.isnull().sum()" ] }, { "cell_type": "code", "execution_count": 829, "metadata": {}, "outputs": [], "source": [ "# Define and Code : we drop the rows with missing values\n", "df_archive_clean.dropna(inplace=True)" 
] }, { "cell_type": "code", "execution_count": 830, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 830, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Test : do we still have missing values ?\n", "df_archive_clean.isnull().sum().any()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Cleaning - Quality Issue 4 : WeRateDogs twitter archive - in some cases, the relevant rating information (and consequently rating_numerator and rating_denominator) has not been extracted." ] }, { "cell_type": "code", "execution_count": 831, "metadata": {}, "outputs": [], "source": [ "# Define : for every 'text' cell, we find all the ratings and always take the last one. \n", "# Each match holds two groups: the numerator and the denominator. \n", "\n", "# for the findall syntax: https://docs.python.org/3/library/re.html#re.findall\n", "# for the regex, we use https://regex101.com/ with a few examples\n", "\n", "# Code : Extract the numerator, or keep the current rating when the text contains no rating\n", "# The new regular expression has been provided by the reviewer\n", "def extract_numerator_rating(row):\n", "    match = re.findall(r'((?:\\d+\\.)?\\d+)\\/(\\d+)', row['text'])\n", "    if match:\n", "        return float(match[-1][0])\n", "    return float(row['rating_numerator'])\n", "\n", "# Extract the denominator, or keep the current rating when the text contains no rating\n", "def extract_denominator_rating(row):\n", "    match = re.findall(r'((?:\\d+\\.)?\\d+)\\/(\\d+)', row['text'])\n", "    if match:\n", "        return float(match[-1][1])\n", "    return float(row['rating_denominator'])" ] }, { "cell_type": "code", "execution_count": 832, "metadata": {}, "outputs": [], "source": [ "# Code : apply the previous functions to all the rows in the dataframe\n", "# Source : looking for \"function every row pandas\" on a search engine leads to \n", "# 
http://jonathansoma.com/lede/foundations/classes/pandas%20columns%20and%20functions/apply-a-function-to-every-row-in-a-pandas-dataframe/\n", "df_archive_clean['rating_numerator'] = df_archive_clean.apply(extract_numerator_rating, axis=1)\n", "df_archive_clean['rating_denominator'] = df_archive_clean.apply(extract_denominator_rating, axis=1)" ] }, { "cell_type": "code", "execution_count": 833, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>tweet_id</th>\n", " <th>rating_numerator</th>\n", " <th>rating_denominator</th>\n", " <th>text</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>433</th>\n", " <td>820690176645140481</td>\n", " <td>84.0</td>\n", " <td>70.0</td>\n", " <td>The floofs have been released I repeat the floofs have been released. 84/70 https://t.co/NIYC820tmd</td>\n", " </tr>\n", " <tr>\n", " <th>516</th>\n", " <td>810984652412424192</td>\n", " <td>24.0</td>\n", " <td>7.0</td>\n", " <td>Meet Sam. She smiles 24/7 &amp; secretly aspires to be a reindeer. \\nKeep Sam smiling by clicking and sharing this link:\\nhttps://t.co/98tB8y7y7t https://t.co/LouL5vdvxx</td>\n", " </tr>\n", " <tr>\n", " <th>902</th>\n", " <td>758467244762497024</td>\n", " <td>165.0</td>\n", " <td>150.0</td>\n", " <td>Why does this never happen at my front door... 165/150 https://t.co/HmwrdfEfUE</td>\n", " </tr>\n", " <tr>\n", " <th>1120</th>\n", " <td>731156023742988288</td>\n", " <td>204.0</td>\n", " <td>170.0</td>\n", " <td>Say hello to this unbelievably well behaved squad of doggos. 
204/170 would try to pet all at once https://t.co/yGQI3He3xv</td>\n", " </tr>\n", " <tr>\n", " <th>1228</th>\n", " <td>713900603437621249</td>\n", " <td>99.0</td>\n", " <td>90.0</td>\n", " <td>Happy Saturday here's 9 puppers on a bench. 99/90 good work everybody https://t.co/mpvaVxKmc1</td>\n", " </tr>\n", " <tr>\n", " <th>1254</th>\n", " <td>710658690886586372</td>\n", " <td>80.0</td>\n", " <td>80.0</td>\n", " <td>Here's a brigade of puppers. All look very prepared for whatever happens next. 80/80 https://t.co/0eb7R1Om12</td>\n", " </tr>\n", " <tr>\n", " <th>1274</th>\n", " <td>709198395643068416</td>\n", " <td>45.0</td>\n", " <td>50.0</td>\n", " <td>From left to right:\\nCletus, Jerome, Alejandro, Burp, &amp; Titson\\nNone know where camera is. 45/50 would hug all at once https://t.co/sedre1ivTK</td>\n", " </tr>\n", " <tr>\n", " <th>1351</th>\n", " <td>704054845121142784</td>\n", " <td>60.0</td>\n", " <td>50.0</td>\n", " <td>Here is a whole flock of puppers. 60/50 I'll take the lot https://t.co/9dpcw6MdWa</td>\n", " </tr>\n", " <tr>\n", " <th>1433</th>\n", " <td>697463031882764288</td>\n", " <td>44.0</td>\n", " <td>40.0</td>\n", " <td>Happy Wednesday here's a bucket of pups. 44/40 would pet all at once https://t.co/HppvrYuamZ</td>\n", " </tr>\n", " <tr>\n", " <th>1635</th>\n", " <td>684222868335505415</td>\n", " <td>121.0</td>\n", " <td>110.0</td>\n", " <td>Someone help the girl is being mugged. Several are distracting her while two steal her shoes. Clever puppers 121/110 https://t.co/1zfnTJLt55</td>\n", " </tr>\n", " <tr>\n", " <th>1779</th>\n", " <td>677716515794329600</td>\n", " <td>144.0</td>\n", " <td>120.0</td>\n", " <td>IT'S PUPPERGEDDON. Total of 144/120 ...I think https://t.co/ZanVtAtvIq</td>\n", " </tr>\n", " <tr>\n", " <th>1843</th>\n", " <td>675853064436391936</td>\n", " <td>88.0</td>\n", " <td>80.0</td>\n", " <td>Here we have an entire platoon of puppers. 
Total score: 88/80 would pet all at once https://t.co/y93p6FLvVw</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " tweet_id rating_numerator rating_denominator \\\n", "433 820690176645140481 84.0 70.0 \n", "516 810984652412424192 24.0 7.0 \n", "902 758467244762497024 165.0 150.0 \n", "1120 731156023742988288 204.0 170.0 \n", "1228 713900603437621249 99.0 90.0 \n", "1254 710658690886586372 80.0 80.0 \n", "1274 709198395643068416 45.0 50.0 \n", "1351 704054845121142784 60.0 50.0 \n", "1433 697463031882764288 44.0 40.0 \n", "1635 684222868335505415 121.0 110.0 \n", "1779 677716515794329600 144.0 120.0 \n", "1843 675853064436391936 88.0 80.0 \n", "\n", " text \n", "433 The floofs have been released I repeat the floofs have been released. 84/70 https://t.co/NIYC820tmd \n", "516 Meet Sam. She smiles 24/7 & secretly aspires to be a reindeer. \\nKeep Sam smiling by clicking and sharing this link:\\nhttps://t.co/98tB8y7y7t https://t.co/LouL5vdvxx \n", "902 Why does this never happen at my front door... 165/150 https://t.co/HmwrdfEfUE \n", "1120 Say hello to this unbelievably well behaved squad of doggos. 204/170 would try to pet all at once https://t.co/yGQI3He3xv \n", "1228 Happy Saturday here's 9 puppers on a bench. 99/90 good work everybody https://t.co/mpvaVxKmc1 \n", "1254 Here's a brigade of puppers. All look very prepared for whatever happens next. 80/80 https://t.co/0eb7R1Om12 \n", "1274 From left to right:\\nCletus, Jerome, Alejandro, Burp, & Titson\\nNone know where camera is. 45/50 would hug all at once https://t.co/sedre1ivTK \n", "1351 Here is a whole flock of puppers. 60/50 I'll take the lot https://t.co/9dpcw6MdWa \n", "1433 Happy Wednesday here's a bucket of pups. 44/40 would pet all at once https://t.co/HppvrYuamZ \n", "1635 Someone help the girl is being mugged. Several are distracting her while two steal her shoes. Clever puppers 121/110 https://t.co/1zfnTJLt55 \n", "1779 IT'S PUPPERGEDDON. 
Total of 144/120 ...I think https://t.co/ZanVtAtvIq \n", "1843 Here we have an entire platoon of puppers. Total score: 88/80 would pet all at once https://t.co/y93p6FLvVw " ] }, "execution_count": 833, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Test: Observe all the rows with denominator != 10\n", "df_archive_clean.query('rating_denominator != 10')[['tweet_id', 'rating_numerator', 'rating_denominator', 'text']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One row has a wrong value, 24/7, which is an expression (\"smiles 24/7\") and not a rating. As the associated text contains no rating, we simply delete the row." ] }, { "cell_type": "code", "execution_count": 834, "metadata": {}, "outputs": [], "source": [ "# Delete the row with 24/7 as rating\n", "df_archive_clean.drop([516], inplace=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Cleaning - Quality Issue 5 : WeRateDogs twitter archive - We have \"None\" as name even if, sometimes, there is an available name in the text. So a few names are missing and could be retrieved." ] }, { "cell_type": "code", "execution_count": 835, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>tweet_id</th>\n", " <th>name</th>\n", " <th>text</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>5</th>\n", " <td>891087950875897856</td>\n", " <td>None</td>\n", " <td>Here we have a majestic great white breaching off South Africa's coast. Absolutely h*ckin breathtaking. 
13/10 (IG: tucker_marlo) #BarkWeek https://t.co/kQ04fDDRmh</td>\n", " </tr>\n", " <tr>\n", " <th>7</th>\n", " <td>890729181411237888</td>\n", " <td>None</td>\n", " <td>When you watch your owner call another dog a good boy but then they turn back to you and say you're a great boy. 13/10 https://t.co/v0nONBcwxq</td>\n", " </tr>\n", " <tr>\n", " <th>12</th>\n", " <td>889665388333682689</td>\n", " <td>None</td>\n", " <td>Here's a puppo that seems to be on the fence about something haha no but seriously someone help her. 13/10 https://t.co/BxvuXk0UCm</td>\n", " </tr>\n", " <tr>\n", " <th>24</th>\n", " <td>887343217045368832</td>\n", " <td>None</td>\n", " <td>You may not have known you needed to see this today. 13/10 please enjoy (IG: emmylouroo) https://t.co/WZqNqygEyV</td>\n", " </tr>\n", " <tr>\n", " <th>25</th>\n", " <td>887101392804085760</td>\n", " <td>None</td>\n", " <td>This... is a Jubilant Antarctic House Bear. We only rate dogs. Please only send dogs. Thank you... 12/10 would suffocate in floof https://t.co/4Ad1jzJSdp</td>\n", " </tr>\n", " <tr>\n", " <th>35</th>\n", " <td>885518971528720385</td>\n", " <td>None</td>\n", " <td>I have a new hero and his name is Howard. 14/10 https://t.co/gzLHboL7Sk</td>\n", " </tr>\n", " <tr>\n", " <th>37</th>\n", " <td>885167619883638784</td>\n", " <td>None</td>\n", " <td>Here we have a corgi undercover as a malamute. Pawbably doing important investigative work. Zero control over tongue happenings. 13/10 https://t.co/44ItaMubBf</td>\n", " </tr>\n", " <tr>\n", " <th>41</th>\n", " <td>884441805382717440</td>\n", " <td>None</td>\n", " <td>I present to you, Pup in Hat. Pup in Hat is great for all occasions. Extremely versatile. Compact as h*ck. 
14/10 (IG: itselizabethgales) https://t.co/vvBOcC2VdC</td>\n", " </tr>\n", " <tr>\n", " <th>42</th>\n", " <td>884247878851493888</td>\n", " <td>None</td>\n", " <td>OMG HE DIDN'T MEAN TO HE WAS JUST TRYING A LITTLE BARKOUR HE'S SUPER SORRY 13/10 WOULD FORGIVE IMMEDIATE https://t.co/uF3pQ8Wubj</td>\n", " </tr>\n", " <tr>\n", " <th>47</th>\n", " <td>883117836046086144</td>\n", " <td>None</td>\n", " <td>Please only send dogs. We don't rate mechanics, no matter how h*ckin good. Thank you... 13/10 would sneak a pat https://t.co/Se5fZ9wp5E</td>\n", " </tr>\n", " <tr>\n", " <th>59</th>\n", " <td>880872448815771648</td>\n", " <td>None</td>\n", " <td>Ugh not again. We only rate dogs. Please don't send in well-dressed floppy-tongued street penguins. Dogs only please. Thank you... 12/10 https://t.co/WiAMbTkDPf</td>\n", " </tr>\n", " <tr>\n", " <th>62</th>\n", " <td>880095782870896641</td>\n", " <td>None</td>\n", " <td>Please don't send in photos without dogs in them. We're not @porch_rates. Insubordinate and churlish. Pretty good porch tho 11/10 https://t.co/HauE8M3Bu4</td>\n", " </tr>\n", " <tr>\n", " <th>72</th>\n", " <td>878604707211726852</td>\n", " <td>None</td>\n", " <td>Martha is stunning how h*ckin dare you. 13/10 https://t.co/9uABQXgjwa</td>\n", " </tr>\n", " <tr>\n", " <th>83</th>\n", " <td>876537666061221889</td>\n", " <td>None</td>\n", " <td>I can say with the pupmost confidence that the doggos who assisted with this search are heroic as h*ck. 14/10 for all https://t.co/8yoc1CNTsu</td>\n", " </tr>\n", " <tr>\n", " <th>88</th>\n", " <td>875097192612077568</td>\n", " <td>None</td>\n", " <td>You'll get your package when that precious man is done appreciating the pups. 13/10 for everyone https://t.co/PFp4MghzBW</td>\n", " </tr>\n", " <tr>\n", " <th>89</th>\n", " <td>875021211251597312</td>\n", " <td>None</td>\n", " <td>Guys please stop sending pictures without any dogs in th- oh never mind hello excuse me sir. 
12/10 stealthy as h*ck https://t.co/brCQoqc8AW</td>\n", " </tr>\n", " <tr>\n", " <th>93</th>\n", " <td>874057562936811520</td>\n", " <td>None</td>\n", " <td>I can't believe this keeps happening. This, is a birb taking a bath. We only rate dogs. Please only send dogs. Thank you... 12/10 https://t.co/pwY9PQhtP2</td>\n", " </tr>\n", " <tr>\n", " <th>96</th>\n", " <td>873580283840344065</td>\n", " <td>None</td>\n", " <td>We usually don't rate Deck-bound Saskatoon Black Bears, but this one is h*ckin flawless. Sneaky tongue slip too. 13/10 would hug firmly https://t.co/mNuMH9400n</td>\n", " </tr>\n", " <tr>\n", " <th>99</th>\n", " <td>872967104147763200</td>\n", " <td>None</td>\n", " <td>Here's a very large dog. He has a date later. Politely asked this water person to check if his breath is bad. 12/10 good to go doggo https://t.co/EMYIdoblMR</td>\n", " </tr>\n", " <tr>\n", " <th>100</th>\n", " <td>872820683541237760</td>\n", " <td>None</td>\n", " <td>Here are my favorite #dogsatpollingstations \\nMost voted for a more consistent walking schedule and to increase daily pats tenfold. All 13/10 https://t.co/17FVMl4VZ5</td>\n", " </tr>\n", " <tr>\n", " <th>103</th>\n", " <td>872486979161796608</td>\n", " <td>None</td>\n", " <td>We. Only. Rate. Dogs. Do not send in other things like this fluffy floor shark clearly ready to attack. Get it together guys... 12/10 https://t.co/BZHiKx3FpQ</td>\n", " </tr>\n", " <tr>\n", " <th>110</th>\n", " <td>871102520638267392</td>\n", " <td>None</td>\n", " <td>Never doubt a doggo 14/10 https://t.co/AbBLh2FZCH</td>\n", " </tr>\n", " <tr>\n", " <th>112</th>\n", " <td>870804317367881728</td>\n", " <td>None</td>\n", " <td>Real funny guys. Sending in a pic without a dog in it. Hilarious. We'll rate the rug tho because it's giving off a very good vibe. 11/10 https://t.co/GCD1JccCyi</td>\n", " </tr>\n", " <tr>\n", " <th>125</th>\n", " <td>868622495443632128</td>\n", " <td>None</td>\n", " <td>Here's a h*ckin peaceful boy. 
Unbothered by the comings and goings. 13/10 please reveal your wise ways https://t.co/yeaH8Ej5eM</td>\n", " </tr>\n", " <tr>\n", " <th>127</th>\n", " <td>867900495410671616</td>\n", " <td>None</td>\n", " <td>Unbelievable. We only rate dogs. Please don't send in non-canines like the \"I\" from Pixar's opening credits. Thank you... 12/10 https://t.co/JMhDNv5wXZ</td>\n", " </tr>\n", " <tr>\n", " <th>131</th>\n", " <td>867051520902168576</td>\n", " <td>None</td>\n", " <td>Oh my this spooked me up. We only rate dogs, not happy ghosts. Please send dogs only. It's a very simple premise. Thank you... 13/10 https://t.co/M5Rz0R8SIQ</td>\n", " </tr>\n", " <tr>\n", " <th>133</th>\n", " <td>866720684873056260</td>\n", " <td>None</td>\n", " <td>He was providing for his family 13/10 how dare you https://t.co/Q8mVwWN3f4</td>\n", " </tr>\n", " <tr>\n", " <th>141</th>\n", " <td>864873206498414592</td>\n", " <td>None</td>\n", " <td>We only rate dogs. Please don't send in Jesus. We're trying to remain professional and legitimate. Thank you... 14/10 https://t.co/wr3xsjeCIR</td>\n", " </tr>\n", " <tr>\n", " <th>154</th>\n", " <td>862096992088072192</td>\n", " <td>None</td>\n", " <td>We only rate dogs. Please don't send perfectly toasted marshmallows attempting to drive. Thank you... 13/10 https://t.co/nvZyyrp0kd</td>\n", " </tr>\n", " <tr>\n", " <th>157</th>\n", " <td>861288531465048066</td>\n", " <td>None</td>\n", " <td>HI. MY. NAME. IS. BOOMER. AND. I. WANT. TO. SAY. IT'S. H*CKIN. RIDICULOUS. THAT. DOGS. CAN'T VOTE. ABSOLUTE. CODSWALLUP. THANK. YOU. 13/10 https://t.co/SqKJPwbQ2g</td>\n", " </tr>\n", " <tr>\n", " <th>...</th>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " </tr>\n", " <tr>\n", " <th>2288</th>\n", " <td>667176164155375616</td>\n", " <td>None</td>\n", " <td>These are strange dogs. All have toupees. Long neck for dogs. In a shed of sorts? Work in groups? 
4/10 still petable https://t.co/PZxSarAfSN</td>\n", " </tr>\n", " <tr>\n", " <th>2294</th>\n", " <td>667138269671505920</td>\n", " <td>None</td>\n", " <td>Extremely intelligent dog here. Has learned to walk like human. Even has his own dog. Very impressive 10/10 https://t.co/0DvHAMdA4V</td>\n", " </tr>\n", " <tr>\n", " <th>2299</th>\n", " <td>667065535570550784</td>\n", " <td>None</td>\n", " <td>Here we have a Hufflepuff. Loves vest. Eyes wide af. Flaccid tail. Matches carpet. Always a little blurry. 8/10 https://t.co/7JdgVqDnvR</td>\n", " </tr>\n", " <tr>\n", " <th>2301</th>\n", " <td>667044094246576128</td>\n", " <td>None</td>\n", " <td>12/10 gimme now https://t.co/QZAnwgnOMB</td>\n", " </tr>\n", " <tr>\n", " <th>2305</th>\n", " <td>666837028449972224</td>\n", " <td>None</td>\n", " <td>My goodness. Very rare dog here. Large. Tail dangerous. Kinda fat. Only eats leaves. Doesn't come when called 3/10 https://t.co/xYGdBrMS9h</td>\n", " </tr>\n", " <tr>\n", " <th>2306</th>\n", " <td>666835007768551424</td>\n", " <td>None</td>\n", " <td>These are Peruvian Feldspars. Their names are Cupit and Prencer. Both resemble Rand Paul. Sick outfits 10/10 &amp; 10/10 https://t.co/ZnEMHBsAs1</td>\n", " </tr>\n", " <tr>\n", " <th>2307</th>\n", " <td>666826780179869698</td>\n", " <td>None</td>\n", " <td>12/10 simply brilliant pup https://t.co/V6ZzG45zzG</td>\n", " </tr>\n", " <tr>\n", " <th>2310</th>\n", " <td>666786068205871104</td>\n", " <td>None</td>\n", " <td>Unfamiliar with this breed. Ears pointy af. Won't let go of seashell. Won't eat kibble. Not very fast. Bad dog 2/10 https://t.co/EIn5kElY1S</td>\n", " </tr>\n", " <tr>\n", " <th>2316</th>\n", " <td>666649482315059201</td>\n", " <td>None</td>\n", " <td>Cool dog. Enjoys couch. Low monotone bark. Very nice kicks. Pisses milk (must be rare). Can't go down stairs. 
4/10 https://t.co/vXMKrJC81s</td>\n", " </tr>\n", " <tr>\n", " <th>2320</th>\n", " <td>666437273139982337</td>\n", " <td>None</td>\n", " <td>Here we see a lone northeastern Cumberbatch. Half ladybug. Only builds with bricks. Very confident with body. 7/10 https://t.co/7LtjBS0GPK</td>\n", " </tr>\n", " <tr>\n", " <th>2321</th>\n", " <td>666435652385423360</td>\n", " <td>None</td>\n", " <td>\"Can you behave? You're ruining my wedding day\"\\nDOG: idgaf this flashlight tastes good as hell\\n\\n10/10 https://t.co/GlFZPzqcEU</td>\n", " </tr>\n", " <tr>\n", " <th>2322</th>\n", " <td>666430724426358785</td>\n", " <td>None</td>\n", " <td>Oh boy what a pup! Sunglasses take this one to the next level. Weirdly folds front legs. Pretty big. 6/10 https://t.co/yECbFrSArM</td>\n", " </tr>\n", " <tr>\n", " <th>2323</th>\n", " <td>666428276349472768</td>\n", " <td>None</td>\n", " <td>Here we have an Austrian Pulitzer. Collectors edition. Levitates (?). 7/10 would garden with https://t.co/NMQq6HIglK</td>\n", " </tr>\n", " <tr>\n", " <th>2324</th>\n", " <td>666421158376562688</td>\n", " <td>None</td>\n", " <td>*internally screaming* 12/10 https://t.co/YMcrXC2Y6R</td>\n", " </tr>\n", " <tr>\n", " <th>2328</th>\n", " <td>666396247373291520</td>\n", " <td>None</td>\n", " <td>Oh goodness. A super rare northeast Qdoba kangaroo mix. Massive feet. No pouch (disappointing). Seems alert. 9/10 https://t.co/Dc7b0E8qFE</td>\n", " </tr>\n", " <tr>\n", " <th>2329</th>\n", " <td>666373753744588802</td>\n", " <td>None</td>\n", " <td>Those are sunglasses and a jean jacket. 11/10 dog cool af https://t.co/uHXrPkUEyl</td>\n", " </tr>\n", " <tr>\n", " <th>2330</th>\n", " <td>666362758909284353</td>\n", " <td>None</td>\n", " <td>Unique dog here. Very small. Lives in container of Frosted Flakes (?). Short legs. 
Must be rare 6/10 would still pet https://t.co/XMD9CwjEnM</td>\n", " </tr>\n", " <tr>\n", " <th>2331</th>\n", " <td>666353288456101888</td>\n", " <td>None</td>\n", " <td>Here we have a mixed Asiago from the Galápagos Islands. Only one ear working. Big fan of marijuana carpet. 8/10 https://t.co/tltQ5w9aUO</td>\n", " </tr>\n", " <tr>\n", " <th>2332</th>\n", " <td>666345417576210432</td>\n", " <td>None</td>\n", " <td>Look at this jokester thinking seat belt laws don't apply to him. Great tongue tho 10/10 https://t.co/VFKG1vxGjB</td>\n", " </tr>\n", " <tr>\n", " <th>2336</th>\n", " <td>666273097616637952</td>\n", " <td>None</td>\n", " <td>Can take selfies 11/10 https://t.co/ws2AMaNwPW</td>\n", " </tr>\n", " <tr>\n", " <th>2337</th>\n", " <td>666268910803644416</td>\n", " <td>None</td>\n", " <td>Very concerned about fellow dog trapped in computer. 10/10 https://t.co/0yxApIikpk</td>\n", " </tr>\n", " <tr>\n", " <th>2338</th>\n", " <td>666104133288665088</td>\n", " <td>None</td>\n", " <td>Not familiar with this breed. No tail (weird). Only 2 legs. Doesn't bark. Surprisingly quick. Shits eggs. 1/10 https://t.co/Asgdc6kuLX</td>\n", " </tr>\n", " <tr>\n", " <th>2339</th>\n", " <td>666102155909144576</td>\n", " <td>None</td>\n", " <td>Oh my. Here you are seeing an Adobe Setter giving birth to twins!!! The world is an amazing place. 11/10 https://t.co/11LvqN4WLq</td>\n", " </tr>\n", " <tr>\n", " <th>2340</th>\n", " <td>666099513787052032</td>\n", " <td>None</td>\n", " <td>Can stand on stump for what seems like a while. Built that birdhouse? Impressive. Made friends with a squirrel. 8/10 https://t.co/Ri4nMTLq5C</td>\n", " </tr>\n", " <tr>\n", " <th>2341</th>\n", " <td>666094000022159362</td>\n", " <td>None</td>\n", " <td>This appears to be a Mongolian Presbyterian mix. Very tired. Tongue slip confirmed. 
9/10 would lie down with https://t.co/mnioXo3IfP</td>\n", " </tr>\n", " <tr>\n", " <th>2342</th>\n", " <td>666082916733198337</td>\n", " <td>None</td>\n", " <td>Here we have a well-established sunblockerspaniel. Lost his other flip-flop. 6/10 not very waterproof https://t.co/3RU6x0vHB7</td>\n", " </tr>\n", " <tr>\n", " <th>2343</th>\n", " <td>666073100786774016</td>\n", " <td>None</td>\n", " <td>Let's hope this flight isn't Malaysian (lol). What a dog! Almost completely camouflaged. 10/10 I trust this pilot https://t.co/Yk6GHE9tOY</td>\n", " </tr>\n", " <tr>\n", " <th>2344</th>\n", " <td>666071193221509120</td>\n", " <td>None</td>\n", " <td>Here we have a northern speckled Rhododendron. Much sass. Gives 0 fucks. Good tongue. 9/10 would caress sensually https://t.co/ZoL8kq2XFx</td>\n", " </tr>\n", " <tr>\n", " <th>2351</th>\n", " <td>666049248165822465</td>\n", " <td>None</td>\n", " <td>Here we have a 1949 1st generation vulpix. Enjoys sweat tea and Fox News. Cannot be phased. 5/10 https://t.co/4B7cOc1EDq</td>\n", " </tr>\n", " <tr>\n", " <th>2355</th>\n", " <td>666020888022790149</td>\n", " <td>None</td>\n", " <td>Here we have a Japanese Irish Setter. Lost eye in Vietnam (?). Big fan of relaxing on stair. 
8/10 would pet https://t.co/BLDqew2Ijj</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "<p>600 rows × 3 columns</p>\n", "</div>" ], "text/plain": [ " tweet_id name \\\n", "5 891087950875897856 None \n", "7 890729181411237888 None \n", "12 889665388333682689 None \n", "24 887343217045368832 None \n", "25 887101392804085760 None \n", "35 885518971528720385 None \n", "37 885167619883638784 None \n", "41 884441805382717440 None \n", "42 884247878851493888 None \n", "47 883117836046086144 None \n", "59 880872448815771648 None \n", "62 880095782870896641 None \n", "72 878604707211726852 None \n", "83 876537666061221889 None \n", "88 875097192612077568 None \n", "89 875021211251597312 None \n", "93 874057562936811520 None \n", "96 873580283840344065 None \n", "99 872967104147763200 None \n", "100 872820683541237760 None \n", "103 872486979161796608 None \n", "110 871102520638267392 None \n", "112 870804317367881728 None \n", "125 868622495443632128 None \n", "127 867900495410671616 None \n", "131 867051520902168576 None \n", "133 866720684873056260 None \n", "141 864873206498414592 None \n", "154 862096992088072192 None \n", "157 861288531465048066 None \n", "... ... ... 
\n", "2288 667176164155375616 None \n", "2294 667138269671505920 None \n", "2299 667065535570550784 None \n", "2301 667044094246576128 None \n", "2305 666837028449972224 None \n", "2306 666835007768551424 None \n", "2307 666826780179869698 None \n", "2310 666786068205871104 None \n", "2316 666649482315059201 None \n", "2320 666437273139982337 None \n", "2321 666435652385423360 None \n", "2322 666430724426358785 None \n", "2323 666428276349472768 None \n", "2324 666421158376562688 None \n", "2328 666396247373291520 None \n", "2329 666373753744588802 None \n", "2330 666362758909284353 None \n", "2331 666353288456101888 None \n", "2332 666345417576210432 None \n", "2336 666273097616637952 None \n", "2337 666268910803644416 None \n", "2338 666104133288665088 None \n", "2339 666102155909144576 None \n", "2340 666099513787052032 None \n", "2341 666094000022159362 None \n", "2342 666082916733198337 None \n", "2343 666073100786774016 None \n", "2344 666071193221509120 None \n", "2351 666049248165822465 None \n", "2355 666020888022790149 None \n", "\n", " text \n", "5 Here we have a majestic great white breaching off South Africa's coast. Absolutely h*ckin breathtaking. 13/10 (IG: tucker_marlo) #BarkWeek https://t.co/kQ04fDDRmh \n", "7 When you watch your owner call another dog a good boy but then they turn back to you and say you're a great boy. 13/10 https://t.co/v0nONBcwxq \n", "12 Here's a puppo that seems to be on the fence about something haha no but seriously someone help her. 13/10 https://t.co/BxvuXk0UCm \n", "24 You may not have known you needed to see this today. 13/10 please enjoy (IG: emmylouroo) https://t.co/WZqNqygEyV \n", "25 This... is a Jubilant Antarctic House Bear. We only rate dogs. Please only send dogs. Thank you... 12/10 would suffocate in floof https://t.co/4Ad1jzJSdp \n", "35 I have a new hero and his name is Howard. 14/10 https://t.co/gzLHboL7Sk \n", "37 Here we have a corgi undercover as a malamute. Pawbably doing important investigative work. 
Zero control over tongue happenings. 13/10 https://t.co/44ItaMubBf \n", "41 I present to you, Pup in Hat. Pup in Hat is great for all occasions. Extremely versatile. Compact as h*ck. 14/10 (IG: itselizabethgales) https://t.co/vvBOcC2VdC \n", "42 OMG HE DIDN'T MEAN TO HE WAS JUST TRYING A LITTLE BARKOUR HE'S SUPER SORRY 13/10 WOULD FORGIVE IMMEDIATE https://t.co/uF3pQ8Wubj \n", "47 Please only send dogs. We don't rate mechanics, no matter how h*ckin good. Thank you... 13/10 would sneak a pat https://t.co/Se5fZ9wp5E \n", "59 Ugh not again. We only rate dogs. Please don't send in well-dressed floppy-tongued street penguins. Dogs only please. Thank you... 12/10 https://t.co/WiAMbTkDPf \n", "62 Please don't send in photos without dogs in them. We're not @porch_rates. Insubordinate and churlish. Pretty good porch tho 11/10 https://t.co/HauE8M3Bu4 \n", "72 Martha is stunning how h*ckin dare you. 13/10 https://t.co/9uABQXgjwa \n", "83 I can say with the pupmost confidence that the doggos who assisted with this search are heroic as h*ck. 14/10 for all https://t.co/8yoc1CNTsu \n", "88 You'll get your package when that precious man is done appreciating the pups. 13/10 for everyone https://t.co/PFp4MghzBW \n", "89 Guys please stop sending pictures without any dogs in th- oh never mind hello excuse me sir. 12/10 stealthy as h*ck https://t.co/brCQoqc8AW \n", "93 I can't believe this keeps happening. This, is a birb taking a bath. We only rate dogs. Please only send dogs. Thank you... 12/10 https://t.co/pwY9PQhtP2 \n", "96 We usually don't rate Deck-bound Saskatoon Black Bears, but this one is h*ckin flawless. Sneaky tongue slip too. 13/10 would hug firmly https://t.co/mNuMH9400n \n", "99 Here's a very large dog. He has a date later. Politely asked this water person to check if his breath is bad. 
12/10 good to go doggo https://t.co/EMYIdoblMR \n", "100 Here are my favorite #dogsatpollingstations \\nMost voted for a more consistent walking schedule and to increase daily pats tenfold. All 13/10 https://t.co/17FVMl4VZ5 \n", "103 We. Only. Rate. Dogs. Do not send in other things like this fluffy floor shark clearly ready to attack. Get it together guys... 12/10 https://t.co/BZHiKx3FpQ \n", "110 Never doubt a doggo 14/10 https://t.co/AbBLh2FZCH \n", "112 Real funny guys. Sending in a pic without a dog in it. Hilarious. We'll rate the rug tho because it's giving off a very good vibe. 11/10 https://t.co/GCD1JccCyi \n", "125 Here's a h*ckin peaceful boy. Unbothered by the comings and goings. 13/10 please reveal your wise ways https://t.co/yeaH8Ej5eM \n", "127 Unbelievable. We only rate dogs. Please don't send in non-canines like the \"I\" from Pixar's opening credits. Thank you... 12/10 https://t.co/JMhDNv5wXZ \n", "131 Oh my this spooked me up. We only rate dogs, not happy ghosts. Please send dogs only. It's a very simple premise. Thank you... 13/10 https://t.co/M5Rz0R8SIQ \n", "133 He was providing for his family 13/10 how dare you https://t.co/Q8mVwWN3f4 \n", "141 We only rate dogs. Please don't send in Jesus. We're trying to remain professional and legitimate. Thank you... 14/10 https://t.co/wr3xsjeCIR \n", "154 We only rate dogs. Please don't send perfectly toasted marshmallows attempting to drive. Thank you... 13/10 https://t.co/nvZyyrp0kd \n", "157 HI. MY. NAME. IS. BOOMER. AND. I. WANT. TO. SAY. IT'S. H*CKIN. RIDICULOUS. THAT. DOGS. CAN'T VOTE. ABSOLUTE. CODSWALLUP. THANK. YOU. 13/10 https://t.co/SqKJPwbQ2g \n", "... ... \n", "2288 These are strange dogs. All have toupees. Long neck for dogs. In a shed of sorts? Work in groups? 4/10 still petable https://t.co/PZxSarAfSN \n", "2294 Extremely intelligent dog here. Has learned to walk like human. Even has his own dog. Very impressive 10/10 https://t.co/0DvHAMdA4V \n", "2299 Here we have a Hufflepuff. 
Loves vest. Eyes wide af. Flaccid tail. Matches carpet. Always a little blurry. 8/10 https://t.co/7JdgVqDnvR \n", "2301 12/10 gimme now https://t.co/QZAnwgnOMB \n", "2305 My goodness. Very rare dog here. Large. Tail dangerous. Kinda fat. Only eats leaves. Doesn't come when called 3/10 https://t.co/xYGdBrMS9h \n", "2306 These are Peruvian Feldspars. Their names are Cupit and Prencer. Both resemble Rand Paul. Sick outfits 10/10 & 10/10 https://t.co/ZnEMHBsAs1 \n", "2307 12/10 simply brilliant pup https://t.co/V6ZzG45zzG \n", "2310 Unfamiliar with this breed. Ears pointy af. Won't let go of seashell. Won't eat kibble. Not very fast. Bad dog 2/10 https://t.co/EIn5kElY1S \n", "2316 Cool dog. Enjoys couch. Low monotone bark. Very nice kicks. Pisses milk (must be rare). Can't go down stairs. 4/10 https://t.co/vXMKrJC81s \n", "2320 Here we see a lone northeastern Cumberbatch. Half ladybug. Only builds with bricks. Very confident with body. 7/10 https://t.co/7LtjBS0GPK \n", "2321 \"Can you behave? You're ruining my wedding day\"\\nDOG: idgaf this flashlight tastes good as hell\\n\\n10/10 https://t.co/GlFZPzqcEU \n", "2322 Oh boy what a pup! Sunglasses take this one to the next level. Weirdly folds front legs. Pretty big. 6/10 https://t.co/yECbFrSArM \n", "2323 Here we have an Austrian Pulitzer. Collectors edition. Levitates (?). 7/10 would garden with https://t.co/NMQq6HIglK \n", "2324 *internally screaming* 12/10 https://t.co/YMcrXC2Y6R \n", "2328 Oh goodness. A super rare northeast Qdoba kangaroo mix. Massive feet. No pouch (disappointing). Seems alert. 9/10 https://t.co/Dc7b0E8qFE \n", "2329 Those are sunglasses and a jean jacket. 11/10 dog cool af https://t.co/uHXrPkUEyl \n", "2330 Unique dog here. Very small. Lives in container of Frosted Flakes (?). Short legs. Must be rare 6/10 would still pet https://t.co/XMD9CwjEnM \n", "2331 Here we have a mixed Asiago from the Galápagos Islands. Only one ear working. Big fan of marijuana carpet. 
8/10 https://t.co/tltQ5w9aUO \n", "2332 Look at this jokester thinking seat belt laws don't apply to him. Great tongue tho 10/10 https://t.co/VFKG1vxGjB \n", "2336 Can take selfies 11/10 https://t.co/ws2AMaNwPW \n", "2337 Very concerned about fellow dog trapped in computer. 10/10 https://t.co/0yxApIikpk \n", "2338 Not familiar with this breed. No tail (weird). Only 2 legs. Doesn't bark. Surprisingly quick. Shits eggs. 1/10 https://t.co/Asgdc6kuLX \n", "2339 Oh my. Here you are seeing an Adobe Setter giving birth to twins!!! The world is an amazing place. 11/10 https://t.co/11LvqN4WLq \n", "2340 Can stand on stump for what seems like a while. Built that birdhouse? Impressive. Made friends with a squirrel. 8/10 https://t.co/Ri4nMTLq5C \n", "2341 This appears to be a Mongolian Presbyterian mix. Very tired. Tongue slip confirmed. 9/10 would lie down with https://t.co/mnioXo3IfP \n", "2342 Here we have a well-established sunblockerspaniel. Lost his other flip-flop. 6/10 not very waterproof https://t.co/3RU6x0vHB7 \n", "2343 Let's hope this flight isn't Malaysian (lol). What a dog! Almost completely camouflaged. 10/10 I trust this pilot https://t.co/Yk6GHE9tOY \n", "2344 Here we have a northern speckled Rhododendron. Much sass. Gives 0 fucks. Good tongue. 9/10 would caress sensually https://t.co/ZoL8kq2XFx \n", "2351 Here we have a 1949 1st generation vulpix. Enjoys sweat tea and Fox News. Cannot be phased. 5/10 https://t.co/4B7cOc1EDq \n", "2355 Here we have a Japanese Irish Setter. Lost eye in Vietnam (?). Big fan of relaxing on stair. 
8/10 would pet https://t.co/BLDqew2Ijj \n", "\n", "[600 rows x 3 columns]" ] }, "execution_count": 835, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# List the rows with missing names.\n", "df_nametofind = df_archive_clean.query('name == \"None\"')[['tweet_id', 'name', 'text']]\n", "df_nametofind" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From a visual inspection of the \"text\" column, we identify three situations:\n", "* Two cases where the name follows \"name is\" or \"IS.\" (\"Howard\" and \"BOOMER\")\n", "* One case where the name \"Martha\" starts the text\n", "* All other cases, where the text really contains no name\n", "\n", "Of the 600 rows affected, the name is genuinely missing in 597, so we see no benefit in building a large regular expression to recover three names. We therefore leave this issue uncleaned; fixing it would add nothing to our later analysis and insights." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Cleaning - Quality Issue 6 : WeRateDogs twitter archive - Some dog stages are missing because they were not properly extracted. This is the case for \"puppo\", \"doggo\" and \"pupper\"."
] }, { "cell_type": "code", "execution_count": 836, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "None 2069\n", "puppo 24 \n", "Name: puppo, dtype: int64\n", "None 2010\n", "doggo 83 \n", "Name: doggo, dtype: int64\n", "None 1864\n", "pupper 229 \n", "Name: pupper, dtype: int64\n", "None 2083\n", "floofer 10 \n", "Name: floofer, dtype: int64\n" ] } ], "source": [ "# First we count the missing stages\n", "print(df_archive_clean.puppo.value_counts())\n", "print(df_archive_clean.doggo.value_counts())\n", "print(df_archive_clean.pupper.value_counts())\n", "print(df_archive_clean.floofer.value_counts())" ] }, { "cell_type": "code", "execution_count": 837, "metadata": {}, "outputs": [], "source": [ "# Define : For each stage, we go through the associated text to find the stage\n", "# (case-insensitive substring search); rows without a match get None\n", "\n", "# Code : function which extracts \"puppo\" from the text\n", "def extract_puppo(row):\n", "    found = row['text'].lower().find(\"puppo\")\n", "    if found != -1:\n", "        return \"puppo\"\n", "    else:\n", "        return None\n", "\n", "# Code : function which extracts \"doggo\" from the text\n", "def extract_doggo(row):\n", "    found = row['text'].lower().find(\"doggo\")\n", "    if found != -1:\n", "        return \"doggo\"\n", "    else:\n", "        return None\n", "\n", "# Code : function which extracts \"pupper\" from the text\n", "def extract_pupper(row):\n", "    found = row['text'].lower().find(\"pupper\")\n", "    if found != -1:\n", "        return \"pupper\"\n", "    else:\n", "        return None" ] }, { "cell_type": "code", "execution_count": 838, "metadata": {}, "outputs": [], "source": [ "# Code : apply the detection on the dataset\n", "df_archive_clean['puppo'] = df_archive_clean.apply(extract_puppo, axis=1)\n", "df_archive_clean['doggo'] = df_archive_clean.apply(extract_doggo, axis=1)\n", "df_archive_clean['pupper'] = df_archive_clean.apply(extract_pupper, axis=1)" ] }, { "cell_type": "code", "execution_count": 839, "metadata": {}, "outputs": [ { "name": "stdout", 
"output_type": "stream", "text": [ "puppo 30\n", "Name: puppo, dtype: int64\n", "doggo 91\n", "Name: doggo, dtype: int64\n", "pupper 254\n", "Name: pupper, dtype: int64\n" ] } ], "source": [ "# Test : count the detected stages again\n", "print(df_archive_clean.puppo.value_counts())\n", "print(df_archive_clean.doggo.value_counts())\n", "print(df_archive_clean.pupper.value_counts())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Cleaning - Quality issue 9 : Tweet image predictions - Some rows are not predictions of dog images." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This cleaning also covers the correction for\n", "#### Quality issue 11 : Tweet image predictions - predictions and their associated confidences are spread over several columns.\n", "Define : if p1 is not a dog prediction (`p1_dog` is False), we take the first of p2 and p3 that is a dog prediction. If none of them is a dog prediction, we keep p1, the highest-confidence prediction, whatever it is. We know we are looking for dogs, so we prefer a dog prediction even when its confidence is lower."
] }, { "cell_type": "code", "execution_count": 840, "metadata": {}, "outputs": [], "source": [ "# Code : set prediction - among p1, p2 and p3, we take the first prediction which is a dog,\n", "# otherwise we stay on p1, which has the highest confidence\n", "def set_prediction(row):\n", "    if row['p1_dog']:\n", "        return row['p1']\n", "    if row['p2_dog']:\n", "        return row['p2']\n", "    if row['p3_dog']:\n", "        return row['p3']\n", "    # No dog prediction at all: keep the top prediction\n", "    return row['p1']\n", "\n", "# Code : set confidence - we stay on p1_conf unless we find a dog on p2 or p3,\n", "# and in that case we take the associated confidence\n", "def set_confidence(row):\n", "    if row['p1_dog']:\n", "        return row['p1_conf']\n", "    if row['p2_dog']:\n", "        return row['p2_conf']\n", "    if row['p3_dog']:\n", "        return row['p3_conf']\n", "    # No dog prediction at all: keep the confidence of the top prediction\n", "    return row['p1_conf']\n", "\n", "# Code : set the type - as we might not know all the predictions which are not dogs,\n", "# we still need an easy way to tell whether we have a dog or something else\n", "def set_detectiontype(row):\n", "    if row['p1_dog'] or row['p2_dog'] or row['p3_dog']:\n", "        return \"dog\"\n", "    return \"other\"" ] }, { "cell_type": "code", "execution_count": 841, "metadata": {}, "outputs": [], "source": [ "# Code : create the new columns with the values\n", "df_image_clean['prediction'] = df_image_clean.apply(set_prediction, axis=1)\n", "df_image_clean['confidence'] = df_image_clean.apply(set_confidence, axis=1)\n", "df_image_clean['detectiontype'] = df_image_clean.apply(set_detectiontype, axis=1)" ] }, { "cell_type": "code", "execution_count": 842, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe 
tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>p1</th>\n", " <th>p1_conf</th>\n", " <th>p1_dog</th>\n", " <th>p2</th>\n", " <th>p2_conf</th>\n", " <th>p2_dog</th>\n", " <th>p3</th>\n", " <th>p3_conf</th>\n", " <th>p3_dog</th>\n", " <th>prediction</th>\n", " <th>confidence</th>\n", " <th>detectiontype</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>132</th>\n", " <td>shopping_basket</td>\n", " <td>0.398361</td>\n", " <td>False</td>\n", " <td>hamper</td>\n", " <td>0.363222</td>\n", " <td>False</td>\n", " <td>bassinet</td>\n", " <td>0.084173</td>\n", " <td>False</td>\n", " <td>shopping_basket</td>\n", " <td>0.398361</td>\n", " <td>other</td>\n", " </tr>\n", " <tr>\n", " <th>1979</th>\n", " <td>home_theater</td>\n", " <td>0.168290</td>\n", " <td>False</td>\n", " <td>sandbar</td>\n", " <td>0.098040</td>\n", " <td>False</td>\n", " <td>television</td>\n", " <td>0.079729</td>\n", " <td>False</td>\n", " <td>home_theater</td>\n", " <td>0.168290</td>\n", " <td>other</td>\n", " </tr>\n", " <tr>\n", " <th>1097</th>\n", " <td>alp</td>\n", " <td>0.320126</td>\n", " <td>False</td>\n", " <td>lawn_mower</td>\n", " <td>0.080808</td>\n", " <td>False</td>\n", " <td>viaduct</td>\n", " <td>0.065321</td>\n", " <td>False</td>\n", " <td>alp</td>\n", " <td>0.320126</td>\n", " <td>other</td>\n", " </tr>\n", " <tr>\n", " <th>1902</th>\n", " <td>pencil_box</td>\n", " <td>0.662183</td>\n", " <td>False</td>\n", " <td>purse</td>\n", " <td>0.066505</td>\n", " <td>False</td>\n", " <td>pillow</td>\n", " <td>0.044725</td>\n", " <td>False</td>\n", " <td>pencil_box</td>\n", " <td>0.662183</td>\n", " <td>other</td>\n", " </tr>\n", " <tr>\n", " <th>453</th>\n", " <td>seashore</td>\n", " <td>0.352321</td>\n", " <td>False</td>\n", " <td>promontory</td>\n", " <td>0.131753</td>\n", " 
<td>False</td>\n", " <td>wreck</td>\n", " <td>0.095597</td>\n", " <td>False</td>\n", " <td>seashore</td>\n", " <td>0.352321</td>\n", " <td>other</td>\n", " </tr>\n", " <tr>\n", " <th>1036</th>\n", " <td>espresso</td>\n", " <td>0.430135</td>\n", " <td>False</td>\n", " <td>coffee_mug</td>\n", " <td>0.418483</td>\n", " <td>False</td>\n", " <td>cup</td>\n", " <td>0.088391</td>\n", " <td>False</td>\n", " <td>espresso</td>\n", " <td>0.430135</td>\n", " <td>other</td>\n", " </tr>\n", " <tr>\n", " <th>1142</th>\n", " <td>doormat</td>\n", " <td>0.359586</td>\n", " <td>False</td>\n", " <td>china_cabinet</td>\n", " <td>0.053901</td>\n", " <td>False</td>\n", " <td>passenger_car</td>\n", " <td>0.052665</td>\n", " <td>False</td>\n", " <td>doormat</td>\n", " <td>0.359586</td>\n", " <td>other</td>\n", " </tr>\n", " <tr>\n", " <th>1937</th>\n", " <td>lakeside</td>\n", " <td>0.312299</td>\n", " <td>False</td>\n", " <td>dock</td>\n", " <td>0.159842</td>\n", " <td>False</td>\n", " <td>canoe</td>\n", " <td>0.070795</td>\n", " <td>False</td>\n", " <td>lakeside</td>\n", " <td>0.312299</td>\n", " <td>other</td>\n", " </tr>\n", " <tr>\n", " <th>45</th>\n", " <td>snail</td>\n", " <td>0.999888</td>\n", " <td>False</td>\n", " <td>slug</td>\n", " <td>0.000055</td>\n", " <td>False</td>\n", " <td>acorn</td>\n", " <td>0.000026</td>\n", " <td>False</td>\n", " <td>snail</td>\n", " <td>0.999888</td>\n", " <td>other</td>\n", " </tr>\n", " <tr>\n", " <th>832</th>\n", " <td>washbasin</td>\n", " <td>0.272451</td>\n", " <td>False</td>\n", " <td>doormat</td>\n", " <td>0.165871</td>\n", " <td>False</td>\n", " <td>bathtub</td>\n", " <td>0.066368</td>\n", " <td>False</td>\n", " <td>washbasin</td>\n", " <td>0.272451</td>\n", " <td>other</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " p1 p1_conf p1_dog p2 p2_conf p2_dog \\\n", "132 shopping_basket 0.398361 False hamper 0.363222 False \n", "1979 home_theater 0.168290 False sandbar 0.098040 False \n", "1097 alp 0.320126 False 
lawn_mower 0.080808 False \n", "1902 pencil_box 0.662183 False purse 0.066505 False \n", "453 seashore 0.352321 False promontory 0.131753 False \n", "1036 espresso 0.430135 False coffee_mug 0.418483 False \n", "1142 doormat 0.359586 False china_cabinet 0.053901 False \n", "1937 lakeside 0.312299 False dock 0.159842 False \n", "45 snail 0.999888 False slug 0.000055 False \n", "832 washbasin 0.272451 False doormat 0.165871 False \n", "\n", " p3 p3_conf p3_dog prediction confidence \\\n", "132 bassinet 0.084173 False shopping_basket 0.398361 \n", "1979 television 0.079729 False home_theater 0.168290 \n", "1097 viaduct 0.065321 False alp 0.320126 \n", "1902 pillow 0.044725 False pencil_box 0.662183 \n", "453 wreck 0.095597 False seashore 0.352321 \n", "1036 cup 0.088391 False espresso 0.430135 \n", "1142 passenger_car 0.052665 False doormat 0.359586 \n", "1937 canoe 0.070795 False lakeside 0.312299 \n", "45 acorn 0.000026 False snail 0.999888 \n", "832 bathtub 0.066368 False washbasin 0.272451 \n", "\n", " detectiontype \n", "132 other \n", "1979 other \n", "1097 other \n", "1902 other \n", "453 other \n", "1036 other \n", "1142 other \n", "1937 other \n", "45 other \n", "832 other " ] }, "execution_count": 842, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Test : Visual verification\n", "df_image_clean.query('detectiontype == \"other\"')[['p1', 'p1_conf', 'p1_dog', 'p2', 'p2_conf', 'p2_dog', 'p3', 'p3_conf', 'p3_dog', 'prediction', 'confidence', 'detectiontype']].sample(10)" ] }, { "cell_type": "code", "execution_count": 843, "metadata": {}, "outputs": [], "source": [ "# Clean the p columns\n", "df_image_clean.drop('p1', axis=1, inplace=True)\n", "df_image_clean.drop('p1_conf', axis=1, inplace=True)\n", "df_image_clean.drop('p1_dog', axis=1, inplace=True)\n", "df_image_clean.drop('p2', axis=1, inplace=True)\n", "df_image_clean.drop('p2_conf', axis=1, inplace=True)\n", "df_image_clean.drop('p2_dog', axis=1, inplace=True)\n", 
"df_image_clean.drop('p3', axis=1, inplace=True)\n", "df_image_clean.drop('p3_conf', axis=1, inplace=True)\n", "df_image_clean.drop('p3_dog', axis=1, inplace=True)" ] }, { "cell_type": "code", "execution_count": 844, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>tweet_id</th>\n", " <th>jpg_url</th>\n", " <th>img_num</th>\n", " <th>prediction</th>\n", " <th>confidence</th>\n", " <th>detectiontype</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>689</th>\n", " <td>684188786104872960</td>\n", " <td>https://pbs.twimg.com/media/CX66EiJWkAAVjA-.jpg</td>\n", " <td>1</td>\n", " <td>kelpie</td>\n", " <td>0.537782</td>\n", " <td>dog</td>\n", " </tr>\n", " <tr>\n", " <th>972</th>\n", " <td>706644897839910912</td>\n", " <td>https://pbs.twimg.com/ext_tw_video_thumb/706644797256241152/pu/img/NTqvmIUQExGmKFSR.jpg</td>\n", " <td>1</td>\n", " <td>Chihuahua</td>\n", " <td>0.132928</td>\n", " <td>dog</td>\n", " </tr>\n", " <tr>\n", " <th>233</th>\n", " <td>670420569653809152</td>\n", " <td>https://pbs.twimg.com/media/CU3P82RWEAAIVrE.jpg</td>\n", " <td>1</td>\n", " <td>bow_tie</td>\n", " <td>0.268759</td>\n", " <td>other</td>\n", " </tr>\n", " <tr>\n", " <th>1867</th>\n", " <td>843856843873095681</td>\n", " <td>https://pbs.twimg.com/media/C7X7Ui0XgAA3m19.jpg</td>\n", " <td>1</td>\n", " <td>Labrador_retriever</td>\n", " <td>0.922540</td>\n", " <td>dog</td>\n", " </tr>\n", " <tr>\n", " <th>488</th>\n", " <td>675517828909424640</td>\n", " <td>https://pbs.twimg.com/media/CV_r3v4VAAALvwg.jpg</td>\n", " <td>1</td>\n", " <td>Scottish_deerhound</td>\n", " 
<td>0.240591</td>\n", " <td>dog</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " tweet_id \\\n", "689 684188786104872960 \n", "972 706644897839910912 \n", "233 670420569653809152 \n", "1867 843856843873095681 \n", "488 675517828909424640 \n", "\n", " jpg_url \\\n", "689 https://pbs.twimg.com/media/CX66EiJWkAAVjA-.jpg \n", "972 https://pbs.twimg.com/ext_tw_video_thumb/706644797256241152/pu/img/NTqvmIUQExGmKFSR.jpg \n", "233 https://pbs.twimg.com/media/CU3P82RWEAAIVrE.jpg \n", "1867 https://pbs.twimg.com/media/C7X7Ui0XgAA3m19.jpg \n", "488 https://pbs.twimg.com/media/CV_r3v4VAAALvwg.jpg \n", "\n", " img_num prediction confidence detectiontype \n", "689 1 kelpie 0.537782 dog \n", "972 1 Chihuahua 0.132928 dog \n", "233 1 bow_tie 0.268759 other \n", "1867 1 Labrador_retriever 0.922540 dog \n", "488 1 Scottish_deerhound 0.240591 dog " ] }, "execution_count": 844, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# View on the dataset\n", "df_image_clean.sample(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The \"detectiontype\" column is not strictly necessary here. We added it to see at a glance whether the retained prediction is a kind of dog or not. It might be useful for the analysis." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Cleaning - Tidiness Issue 1 : WeRateDogs twitter archive - Dog stages are spread across several columns."
] }, { "cell_type": "code", "execution_count": 845, "metadata": {}, "outputs": [], "source": [ "# Define : for each of the doggo, floofer, pupper and puppo columns, we collect the stage we encounter\n", "# If a stage was already collected, we append the new one, separated by a \",\"\n", "# Code : Get the stage from each column\n", "def set_stage(row):\n", "    stage = None\n", "\n", "    # Collect doggo; it is the first column checked, so no stage has been collected yet\n", "    if row['doggo'] == \"doggo\":\n", "        stage = \"doggo\"\n", "\n", "    # Collect floofer, appended to any stage encountered earlier, separated by a \",\"\n", "    if row['floofer'] == \"floofer\":\n", "        if stage is not None:\n", "            stage = stage + \",\" + \"floofer\"\n", "            print(\"more floofer : \" + stage)\n", "        else:\n", "            stage = \"floofer\"\n", "\n", "    # Collect pupper, appended to any stage encountered earlier, separated by a \",\"\n", "    if row['pupper'] == \"pupper\":\n", "        if stage is not None:\n", "            stage = stage + \",\" + \"pupper\"\n", "            print(\"more pupper : \" + stage)\n", "        else:\n", "            stage = \"pupper\"\n", "\n", "    # Collect puppo, appended to any stage encountered earlier, separated by a \",\"\n", "    if row['puppo'] == \"puppo\":\n", "        if stage is not None:\n", "            stage = stage + \",\" + \"puppo\"\n", "            print(\"more puppo : \" + stage)\n", "        else:\n", "            stage = \"puppo\"\n", "\n", "    return stage" ] }, { "cell_type": "code", "execution_count": 846, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "more puppo : doggo,puppo\n", "more puppo : doggo,puppo\n", "more floofer : doggo,floofer\n", "more pupper : doggo,pupper\n", "more pupper : doggo,pupper\n", "more pupper : doggo,pupper\n", "more pupper : doggo,pupper\n", "more pupper : doggo,pupper\n", "more pupper : doggo,pupper\n", "more pupper : doggo,pupper\n", "more pupper : doggo,pupper\n", "more pupper : doggo,pupper\n" ] } ], "source": [ "# Code 
: set the new column using the function above\n", "df_archive_clean['stage'] = df_archive_clean.apply(set_stage, axis=1)" ] }, { "cell_type": "code", "execution_count": 847, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([None, 'doggo', 'puppo', 'pupper', 'floofer', 'doggo,puppo',\n", " 'doggo,floofer', 'doggo,pupper'], dtype=object)" ] }, "execution_count": 847, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Test : get the unique values from \"stage\" column\n", "df_archive_clean['stage'].unique()" ] }, { "cell_type": "code", "execution_count": 848, "metadata": {}, "outputs": [], "source": [ "# Drop the four stage columns, now replaced by the \"stage\" column\n", "df_archive_clean.drop(['puppo', 'doggo', 'floofer', 'pupper'], axis=1, inplace=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Cleaning - Quality Issue 7 : WeRateDogs twitter archive - \"expanded_urls\" column contains duplicated urls." ] }, { "cell_type": "code", "execution_count": 849, "metadata": {}, "outputs": [], "source": [ "# Define : we keep only the 1st url in \"expanded_urls\".\n", "# The reason: from previous observation, the urls differ for the same tweet only in rare cases.\n", "# We do not plan to use the 2nd or the 3rd urls for our analysis\n", "\n", "# Code : function to split the current \"expanded_urls\" content and keep the 1st part\n", "def split_expanded_urls(row):\n", "    # The urls are separated by a \",\"; keep the first one\n", "    return row['expanded_urls'].split(',')[0]" ] }, { "cell_type": "code", "execution_count": 850, "metadata": {}, "outputs": [], "source": [ "# Code : apply the function above\n", "df_archive_clean['expanded_urls'] = df_archive_clean.apply(split_expanded_urls, axis=1)" ] }, { "cell_type": "code", "execution_count": 851, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False 2093\n", "Name: expanded_urls, dtype: int64" ] }, "execution_count": 851, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Test : Do we still have multiple urls ?\n", "df_archive_clean['expanded_urls'].str.contains(',').value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Cleaning - Quality Issue 8 : WeRateDogs twitter archive - timestamp is stored as a string." ] }, { "cell_type": "code", "execution_count": 852, "metadata": {}, "outputs": [], "source": [ "# Define and Code: Convert timestamp to datetime using pd.to_datetime function\n", "df_archive_clean['timestamp'] = pd.to_datetime(df_archive_clean['timestamp'])" ] }, { "cell_type": "code", "execution_count": 853, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pandas._libs.tslibs.timestamps.Timestamp" ] }, "execution_count": 853, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Test: get the \"timestamp\" column type\n", "type(df_archive_clean['timestamp'][0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Cleaning - Quality Issue 10 : Data retrieved via twitter API - \"created_at\" column type is not timestamp."
] }, { "cell_type": "code", "execution_count": 854, "metadata": {}, "outputs": [], "source": [ "# Define and Code: Convert timestamp to datetime using pd.to_datetime function\n", "df_twitter_clean['created_at'] = pd.to_datetime(df_twitter_clean['created_at'])" ] }, { "cell_type": "code", "execution_count": 855, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pandas._libs.tslibs.timestamps.Timestamp" ] }, "execution_count": 855, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Test: get the \"created_at\" column type\n", "type(df_twitter_clean['created_at'][0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Requirement : we do not keep the tweets beyond August 1st 2017, because we won't be able to have the associated images predictions" ] }, { "cell_type": "code", "execution_count": 856, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>tweet_id</th>\n", " <th>timestamp</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>892420643555336193</td>\n", " <td>2017-08-01 16:23:56</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>892177421306343426</td>\n", " <td>2017-08-01 00:17:27</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " tweet_id timestamp\n", "0 892420643555336193 2017-08-01 16:23:56\n", "1 892177421306343426 2017-08-01 00:17:27" ] }, "execution_count": 856, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# twitter enhanced archive : Which lines are beyond August 1st 2017\n", "df_archive_clean.query('timestamp > \"2017-08-01 
00:00:00\"')[['tweet_id', 'timestamp']]" ] }, { "cell_type": "code", "execution_count": 857, "metadata": {}, "outputs": [], "source": [ "# Delete the two lines found above\n", "df_archive_clean.drop([0, 1], inplace=True)" ] }, { "cell_type": "code", "execution_count": 858, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>tweet_id</th>\n", " <th>timestamp</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ "Empty DataFrame\n", "Columns: [tweet_id, timestamp]\n", "Index: []" ] }, "execution_count": 858, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Test : ensure the removal\n", "df_archive_clean.query('timestamp > \"2017-08-01 00:00:00\"')[['tweet_id', 'timestamp']]" ] }, { "cell_type": "code", "execution_count": 859, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>tweet_id</th>\n", " <th>created_at</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>892420643555336193</td>\n", " <td>2017-08-01 16:23:56</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>892177421306343426</td>\n", " <td>2017-08-01 00:17:27</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", 
"</div>" ], "text/plain": [ " tweet_id created_at\n", "0 892420643555336193 2017-08-01 16:23:56\n", "1 892177421306343426 2017-08-01 00:17:27" ] }, "execution_count": 859, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Dataframe from twitter API : Which lines are beyond August 1st 2017\n", "df_twitter_clean.query('created_at > \"2017-08-01 00:00:00\"')[['tweet_id', 'created_at']]" ] }, { "cell_type": "code", "execution_count": 860, "metadata": {}, "outputs": [], "source": [ "# Delete the two lines found above\n", "df_twitter_clean.drop([0, 1], inplace=True)" ] }, { "cell_type": "code", "execution_count": 861, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>tweet_id</th>\n", " <th>created_at</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ "Empty DataFrame\n", "Columns: [tweet_id, created_at]\n", "Index: []" ] }, "execution_count": 861, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Check the removal\n", "df_twitter_clean.query('created_at > \"2017-08-01 00:00:00\"')[['tweet_id', 'created_at']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Cleaning Tidiness issue 2 : Merging the twitter enhanced archive dataframe, the twitter dataframe built from the APIs and the images predictions dataframe" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Merge `df_archive_clean` with `df_twitter_clean`, and then merge the result with `df_image_clean`" ] }, { "cell_type": "code", "execution_count": 862, "metadata": {}, "outputs": [ { 
"data": { "text/plain": [ "((2091, 9), (2335, 5), (2075, 6))" ] }, "execution_count": 862, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get the sizes of the dataframes to merge\n", "df_archive_clean.shape, df_twitter_clean.shape, df_image_clean.shape" ] }, { "cell_type": "code", "execution_count": 863, "metadata": {}, "outputs": [], "source": [ "# This approach does the job, but some rows disappeared during the merge (1962 rows as result).\n", "# Most likely cause: pd.merge defaults to an inner join, which drops any tweet_id missing from one of the dataframes.\n", "# So we abandon it and use explicit left merges instead\n", "# https://stackoverflow.com/questions/23668427/pandas-three-way-joining-multiple-dataframes-on-columns\n", "# Merge 3 dataframes \n", "\n", "# Import the python reduce function from the functools package\n", "#from functools import reduce\n", "\n", "# Set the list of the dataframes we want to merge\n", "#datasets_list = [df_image_clean, df_twitter_clean[['tweet_id', 'retweet_count', 'favorite_count']], df_archive_clean]\n", "\n", "# Merge to create df_master\n", "#df_master = reduce(lambda left,right: pd.merge(left,right,on='tweet_id'), datasets_list)\n", "\n", "# Check the size\n", "#df_master.shape" ] }, { "cell_type": "code", "execution_count": 864, "metadata": {}, "outputs": [], "source": [ "# Code : Merge df_archive_clean and df_twitter_clean\n", "df_temp_tweets = df_archive_clean.merge(df_twitter_clean[['tweet_id', 'retweet_count', 'favorite_count']], how='left', on='tweet_id')" ] }, { "cell_type": "code", "execution_count": 865, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(2091, 11)" ] }, "execution_count": 865, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Test : we should have the same number of rows as df_archive_clean\n", "df_temp_tweets.shape" ] }, { "cell_type": "code", "execution_count": 866, "metadata": {}, "outputs": [], "source": [ "# Code : merge the temporary dataframe with df_image_clean\n", "df_master = df_image_clean.merge(df_temp_tweets, how='left', on='tweet_id')" ] }, { "cell_type": 
"code", "execution_count": 867, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "(2075, 16)" ] }, "execution_count": 867, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Test: check the size\n", "df_master.shape" ] }, { "cell_type": "code", "execution_count": 868, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/plain": [ "tweet_id int64 \n", "jpg_url object \n", "img_num int64 \n", "prediction object \n", "confidence float64 \n", "detectiontype object \n", "timestamp datetime64[ns]\n", "source object \n", "text object \n", "expanded_urls object \n", "rating_numerator float64 \n", "rating_denominator float64 \n", "name object \n", "stage object \n", "retweet_count float64 \n", "favorite_count float64 \n", "dtype: object" ] }, "execution_count": 868, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Check df_master's columns names and types\n", "df_master.dtypes" ] }, { "cell_type": "code", "execution_count": 869, "metadata": {}, "outputs": [], "source": [ "# For our analysis, we prefer to have only one column for the rating\n", "\n", "# Calculate the rating by rating_numerator / rating_denominator, as a float\n", "def calculate_rating(row):\n", " return float(row['rating_numerator']/row['rating_denominator'])\n", "\n", "# Apply the rating function to each row\n", "df_master['rating'] = df_master.apply(calculate_rating, axis=1)" ] }, { "cell_type": "code", "execution_count": 870, "metadata": {}, "outputs": [], "source": [ "# Remove rating numerator and denominator columns\n", "df_master.drop('rating_numerator', axis=1, inplace=True)\n", "df_master.drop('rating_denominator', axis=1, inplace=True)" ] }, { "cell_type": "code", "execution_count": 871, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", 
"\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>tweet_id</th>\n", " <th>jpg_url</th>\n", " <th>img_num</th>\n", " <th>prediction</th>\n", " <th>confidence</th>\n", " <th>detectiontype</th>\n", " <th>timestamp</th>\n", " <th>source</th>\n", " <th>text</th>\n", " <th>expanded_urls</th>\n", " <th>name</th>\n", " <th>stage</th>\n", " <th>retweet_count</th>\n", " <th>favorite_count</th>\n", " <th>rating</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>617</th>\n", " <td>680206703334408192</td>\n", " <td>https://pbs.twimg.com/media/CXCUYcRW8AAObYM.jpg</td>\n", " <td>1</td>\n", " <td>Christmas_stocking</td>\n", " <td>0.149758</td>\n", " <td>other</td>\n", " <td>2015-12-25 02:01:30</td>\n", " <td><a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a></td>\n", " <td>I hope everyone enjoys this picture as much as I do. This is Toby. 12/10 https://t.co/vHnu1g9EJm</td>\n", " <td>https://twitter.com/dog_rates/status/680206703334408192/photo/1</td>\n", " <td>Toby</td>\n", " <td>None</td>\n", " <td>1246.0</td>\n", " <td>2941.0</td>\n", " <td>1.2</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>666049248165822465</td>\n", " <td>https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg</td>\n", " <td>1</td>\n", " <td>miniature_pinscher</td>\n", " <td>0.560311</td>\n", " <td>dog</td>\n", " <td>2015-11-16 00:24:50</td>\n", " <td><a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a></td>\n", " <td>Here we have a 1949 1st generation vulpix. Enjoys sweat tea and Fox News. Cannot be phased. 
5/10 https://t.co/4B7cOc1EDq</td>\n", " <td>https://twitter.com/dog_rates/status/666049248165822465/photo/1</td>\n", " <td>None</td>\n", " <td>None</td>\n", " <td>42.0</td>\n", " <td>105.0</td>\n", " <td>0.5</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " tweet_id jpg_url \\\n", "617 680206703334408192 https://pbs.twimg.com/media/CXCUYcRW8AAObYM.jpg \n", "4 666049248165822465 https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg \n", "\n", " img_num prediction confidence detectiontype \\\n", "617 1 Christmas_stocking 0.149758 other \n", "4 1 miniature_pinscher 0.560311 dog \n", "\n", " timestamp \\\n", "617 2015-12-25 02:01:30 \n", "4 2015-11-16 00:24:50 \n", "\n", " source \\\n", "617 <a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a> \n", "4 <a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a> \n", "\n", " text \\\n", "617 I hope everyone enjoys this picture as much as I do. This is Toby. 12/10 https://t.co/vHnu1g9EJm \n", "4 Here we have a 1949 1st generation vulpix. Enjoys sweat tea and Fox News. Cannot be phased. 
5/10 https://t.co/4B7cOc1EDq \n", "\n", " expanded_urls name \\\n", "617 https://twitter.com/dog_rates/status/680206703334408192/photo/1 Toby \n", "4 https://twitter.com/dog_rates/status/666049248165822465/photo/1 None \n", "\n", " stage retweet_count favorite_count rating \n", "617 None 1246.0 2941.0 1.2 \n", "4 None 42.0 105.0 0.5 " ] }, "execution_count": 871, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Observe the new dataframe\n", "df_master.sample(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<a id='storing'></a>\n", "### Storing Cleaned Data" ] }, { "cell_type": "code", "execution_count": 872, "metadata": {}, "outputs": [], "source": [ "# Store the cleaned data in a csv file\n", "df_master.to_csv('twitter_archive_master.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Store the dataframe in an SQLite database. \n", "We learned how to do this from here : https://stackoverflow.com/questions/50803109/how-to-store-pandas-dataframe-in-sqlite-db#50803252" ] }, { "cell_type": "code", "execution_count": 873, "metadata": {}, "outputs": [], "source": [ "# Remove the database in case it already exists (-f : no error if the file is missing)\n", "!rm -f weratedogs.sqlite" ] }, { "cell_type": "code", "execution_count": 874, "metadata": {}, "outputs": [], "source": [ "# Import the required package\n", "from sqlalchemy import create_engine\n", "\n", "# Create the engine, backed by the weratedogs.sqlite file\n", "engine = create_engine('sqlite:///weratedogs.sqlite', echo=False)\n", "\n", "# Store the master dataframe in one table\n", "df_master.to_sql('twitter_archive_master', con=engine)" ] }, { "cell_type": "code", "execution_count": 875, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "None 42.0\n", "Atticus 177.6\n" ] } ], "source": [ "# Test: select information from the table\n", "# Send the request\n", "my_request = engine.execute(\"SELECT twitter_archive_master.name, twitter_archive_master.rating FROM twitter_archive_master WHERE 
twitter_archive_master.rating > 1.4\")\n", "\n", "# Display the response\n", "for name, rating in my_request:\n", "    print(name, rating)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<a id='analysis'></a>\n", "### Analyzing and Visualizing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We would like to develop an understanding of what makes a dog picture a success. \n", "We use categorical plots, as described here : https://seaborn.pydata.org/tutorial/categorical.html#categorical-scatterplots" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, we would like to understand which stages are the most successful" ] }, { "cell_type": "code", "execution_count": 876, "metadata": {}, "outputs": [], "source": [ "# make a copy of the master dataset\n", "df_stages = df_master.copy()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A cell in the `stage` column may contain several values separated by a \",\" character. We need to have one stage per line. Here is the approach we use:\n", "\n", "1. for each cell we remove any \",\" character at the beginning or at the end\n", "2. we split the content on each \",\" character we encounter, in order to have one column per stage\n", "3. we stack all the stage columns into one single column. So for each additional stage, we get an additional line in the dataset\n", "4. we rename the new column to `stacked_stages`\n", "5. 
we add this new column to the original dataset\n", "\n", "This way of splitting and stacking comes from the Udacity student forum, where we found it while working on a previous project.\n" ] }, { "cell_type": "code", "execution_count": 877, "metadata": {}, "outputs": [], "source": [ "# stack the copy on stage column, and get all the stages in a new column\n", "df_stages = df_stages.join(df_stages.stage.str.strip(',').str.split(',', expand=True).stack().reset_index(level=1, drop=True).rename('stacked_stages'))" ] }, { "cell_type": "code", "execution_count": 878, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pupper 233\n", "doggo 79 \n", "puppo 29 \n", "floofer 8 \n", "Name: stacked_stages, dtype: int64" ] }, "execution_count": 878, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Do we have a single stage per line in df_stages ?\n", "df_stages.stacked_stages.value_counts()" ] }, { "cell_type": "code", "execution_count": 879, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "<Figure size 576x576 with 1 Axes>" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Plot favorite_count per stage, as a swarm plot\n", "\n", "# height=8 for better visibility\n", "myplot = sns.catplot(x=\"stacked_stages\", y=\"favorite_count\", kind=\"swarm\", data=df_stages, height=8)\n", "# Set the titles\n", "myplot.set(title='Popular stages and their favourite counts', xlabel='Dog stages', ylabel='Number of favourites');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Insight 1 :\n", "Beyond 80 000 favourites, we have more doggo (3) than puppo (1) and pupper (1). So being a doggo seems to help on the road to success here !\n" ] }, { "cell_type": "code", "execution_count": 880, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pupper 73\n", "Name: stacked_stages, dtype: int64" ] }, "execution_count": 880, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Stage counts in 2015\n", "df_stages.query('timestamp > \"2015-01-01 00:00:00\" and timestamp <= \"2016-01-01 00:00:00\"')['stacked_stages'].value_counts()" ] }, { "cell_type": "code", "execution_count": 881, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "pupper 138\n", "doggo 46 \n", "puppo 13 \n", "floofer 6 \n", "Name: stacked_stages, dtype: int64" ] }, "execution_count": 881, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Stage counts in 2016\n", "df_stages.query('timestamp > \"2016-01-01 00:00:00\" and timestamp <= \"2017-01-01 00:00:00\"')['stacked_stages'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The doggo stage appears in 2016, but the majority of the stages were still pupper." ] }, { "cell_type": "code", "execution_count": 882, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "doggo 33\n", "pupper 22\n", "puppo 16\n", "floofer 2 \n", "Name: stacked_stages, dtype: int64" ] }, "execution_count": 882, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Stage counts in 2017\n", "df_stages.query('timestamp > \"2017-01-01 00:00:00\"')['stacked_stages'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then, in 2017, doggo is on the rise and overtakes pupper." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Insight 2 :\n", "Overall we have more pupper pictures posted, because pupper was the only stage recorded in 2015 and remained the majority in 2016. This is strange, but might be linked to people's awareness of the WeRateDogs twitter account at the beginning." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Is the rating correlated with being among the most favorited ?"
] }, { "cell_type": "code", "execution_count": 883, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.3 7\n", "1.2 2\n", "1.4 1\n", "Name: rating, dtype: int64" ] }, "execution_count": 883, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get the rating of the most favorited\n", "df_most_favorited = df_stages.query('favorite_count > 80000')\n", "df_most_favorited.rating.value_counts()" ] }, { "cell_type": "code", "execution_count": 884, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>tweet_id</th>\n", " <th>img_num</th>\n", " <th>confidence</th>\n", " <th>retweet_count</th>\n", " <th>favorite_count</th>\n", " <th>rating</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>count</th>\n", " <td>2.075000e+03</td>\n", " <td>2075.000000</td>\n", " <td>2075.000000</td>\n", " <td>1964.000000</td>\n", " <td>1964.000000</td>\n", " <td>1968.000000</td>\n", " </tr>\n", " <tr>\n", " <th>mean</th>\n", " <td>7.384514e+17</td>\n", " <td>1.203855</td>\n", " <td>0.547955</td>\n", " <td>2625.843177</td>\n", " <td>8642.184318</td>\n", " <td>1.163556</td>\n", " </tr>\n", " <tr>\n", " <th>std</th>\n", " <td>6.785203e+16</td>\n", " <td>0.561875</td>\n", " <td>0.297842</td>\n", " <td>4693.838058</td>\n", " <td>12689.882348</td>\n", " <td>4.090735</td>\n", " </tr>\n", " <tr>\n", " <th>min</th>\n", " <td>6.660209e+17</td>\n", " <td>1.000000</td>\n", " <td>0.000010</td>\n", " <td>11.000000</td>\n", " <td>77.000000</td>\n", " <td>0.000000</td>\n", " </tr>\n", " <tr>\n", " <th>25%</th>\n", " <td>6.764835e+17</td>\n", " <td>1.000000</td>\n", " 
<td>0.299295</td>\n", " <td>584.750000</td>\n", " <td>1854.750000</td>\n", " <td>1.000000</td>\n", " </tr>\n", " <tr>\n", " <th>50%</th>\n", " <td>7.119988e+17</td>\n", " <td>1.000000</td>\n", " <td>0.541780</td>\n", " <td>1259.000000</td>\n", " <td>3911.000000</td>\n", " <td>1.100000</td>\n", " </tr>\n", " <tr>\n", " <th>75%</th>\n", " <td>7.932034e+17</td>\n", " <td>1.000000</td>\n", " <td>0.820962</td>\n", " <td>3004.250000</td>\n", " <td>10806.750000</td>\n", " <td>1.200000</td>\n", " </tr>\n", " <tr>\n", " <th>max</th>\n", " <td>8.924206e+17</td>\n", " <td>4.000000</td>\n", " <td>1.000000</td>\n", " <td>82678.000000</td>\n", " <td>162549.000000</td>\n", " <td>177.600000</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " tweet_id img_num confidence retweet_count favorite_count \\\n", "count 2.075000e+03 2075.000000 2075.000000 1964.000000 1964.000000 \n", "mean 7.384514e+17 1.203855 0.547955 2625.843177 8642.184318 \n", "std 6.785203e+16 0.561875 0.297842 4693.838058 12689.882348 \n", "min 6.660209e+17 1.000000 0.000010 11.000000 77.000000 \n", "25% 6.764835e+17 1.000000 0.299295 584.750000 1854.750000 \n", "50% 7.119988e+17 1.000000 0.541780 1259.000000 3911.000000 \n", "75% 7.932034e+17 1.000000 0.820962 3004.250000 10806.750000 \n", "max 8.924206e+17 4.000000 1.000000 82678.000000 162549.000000 \n", "\n", " rating \n", "count 1968.000000 \n", "mean 1.163556 \n", "std 4.090735 \n", "min 0.000000 \n", "25% 1.000000 \n", "50% 1.100000 \n", "75% 1.200000 \n", "max 177.600000 " ] }, "execution_count": 884, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# How is this rating positioned among all the others ?\n", "df_master.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Insight 3 :\n", "The most favorited tweets (meaning images) are rated high: their ratings (1.2 to 1.4) are above the mean rating (about 1.16) and at or above the third quartile (1.2). \n", "So, basically the most favorited tweets are highly rated, which is the expected situation here." ] }, { "cell_type": "code", "execution_count": 885, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/jlcossi/anaconda/lib/python3.6/site-packages/scipy/stats/stats.py:1713: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.\n", " return np.add.reduce(sorted[indexer] * weights, axis=axis) / sumval\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "<Figure size 360x360 with 1 Axes>" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Are they part of the most retweeted ?\n", "retweetplot = sns.catplot(x=\"stacked_stages\", y=\"retweet_count\", kind=\"bar\", data=df_most_favorited)\n", "retweetplot.set(title='Retweet count for the most favorited tweets', xlabel='Dog stages', ylabel='Number of retweets');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Insight 4 : \n", "Reading this plot together with the summary statistics above, we confirm that the most favorited tweets also rank among the most retweeted." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Which breeds are associated with success ?"
] }, { "cell_type": "code", "execution_count": 886, "metadata": { "scrolled": false }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "<Figure size 1093x1008 with 1 Axes>" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Observe the breeds\n", "breedplot = sns.catplot(x=\"prediction\", y=\"favorite_count\", hue=\"stacked_stages\", kind=\"bar\", data=df_most_favorited, height=14);\n", "breedplot.set(title='Favorited tweets and their breeds', xlabel='Predicted dog breed', ylabel='Number of favorites');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Not every breed identified here has a dog stage recorded. \n", "Even with this limitation, the plot suggests that a doggo predicted to be a Labrador_retriever, an Eskimo_dog, or a standard_poodle has a better chance of being among the tweets most favorited by WeRateDogs followers. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the dataset, 2 of the 3 successful doggos were shared as videos. This may be an important factor, since videos convey emotion. \n", "There might be other factors that are outside the scope of this study, such as colors and the quality of the images or videos." 
] }, { "cell_type": "code", "execution_count": 887, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>tweet_id</th>\n", " <th>jpg_url</th>\n", " <th>img_num</th>\n", " <th>prediction</th>\n", " <th>confidence</th>\n", " <th>detectiontype</th>\n", " <th>timestamp</th>\n", " <th>source</th>\n", " <th>text</th>\n", " <th>expanded_urls</th>\n", " <th>name</th>\n", " <th>stage</th>\n", " <th>retweet_count</th>\n", " <th>favorite_count</th>\n", " <th>rating</th>\n", " <th>stacked_stages</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>571</th>\n", " <td>678399652199309312</td>\n", " <td>https://pbs.twimg.com/ext_tw_video_thumb/678399528077250560/pu/img/BOjUNHRsYLeSo0hl.jpg</td>\n", " <td>1</td>\n", " <td>Bedlington_terrier</td>\n", " <td>0.015047</td>\n", " <td>dog</td>\n", " <td>2015-12-20 02:20:55</td>\n", " <td><a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a></td>\n", " <td>This made my day. 
12/10 please enjoy https://t.co/VRTbo3aAcm</td>\n", " <td>https://twitter.com/dog_rates/status/678399652199309312/video/1</td>\n", " <td>None</td>\n", " <td>None</td>\n", " <td>33355.0</td>\n", " <td>81543.0</td>\n", " <td>1.2</td>\n", " <td>NaN</td>\n", " </tr>\n", " <tr>\n", " <th>1186</th>\n", " <td>739238157791694849</td>\n", " <td>https://pbs.twimg.com/ext_tw_video_thumb/739238016737267712/pu/img/-tLpyiuIzD5zR1et.jpg</td>\n", " <td>1</td>\n", " <td>Eskimo_dog</td>\n", " <td>0.503372</td>\n", " <td>dog</td>\n", " <td>2016-06-04 23:31:25</td>\n", " <td><a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a></td>\n", " <td>Here's a doggo blowing bubbles. It's downright legendary. 13/10 would watch on repeat forever (vid by Kent Duryee) https://t.co/YcXgHfp1EC</td>\n", " <td>https://twitter.com/dog_rates/status/739238157791694849/video/1</td>\n", " <td>None</td>\n", " <td>doggo</td>\n", " <td>61211.0</td>\n", " <td>120128.0</td>\n", " <td>1.3</td>\n", " <td>doggo</td>\n", " </tr>\n", " <tr>\n", " <th>1221</th>\n", " <td>744234799360020481</td>\n", " <td>https://pbs.twimg.com/ext_tw_video_thumb/744234667679821824/pu/img/1GaWmtJtdqzZV7jy.jpg</td>\n", " <td>1</td>\n", " <td>Labrador_retriever</td>\n", " <td>0.825333</td>\n", " <td>dog</td>\n", " <td>2016-06-18 18:26:18</td>\n", " <td><a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a></td>\n", " <td>Here's a doggo realizing you can stand in a pool. 
13/10 enlightened af (vid by Tina Conrad) https://t.co/7wE9LTEXC4</td>\n", " <td>https://twitter.com/dog_rates/status/744234799360020481/video/1</td>\n", " <td>None</td>\n", " <td>doggo</td>\n", " <td>82678.0</td>\n", " <td>162549.0</td>\n", " <td>1.3</td>\n", " <td>doggo</td>\n", " </tr>\n", " <tr>\n", " <th>1641</th>\n", " <td>807106840509214720</td>\n", " <td>https://pbs.twimg.com/ext_tw_video_thumb/807106774843039744/pu/img/8XZg1xW35Xp2J6JW.jpg</td>\n", " <td>1</td>\n", " <td>Chihuahua</td>\n", " <td>0.505370</td>\n", " <td>dog</td>\n", " <td>2016-12-09 06:17:20</td>\n", " <td><a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a></td>\n", " <td>This is Stephan. He just wants to help. 13/10 such a good boy https://t.co/DkBYaCAg2d</td>\n", " <td>https://twitter.com/dog_rates/status/807106840509214720/video/1</td>\n", " <td>Stephan</td>\n", " <td>None</td>\n", " <td>60299.0</td>\n", " <td>125650.0</td>\n", " <td>1.3</td>\n", " <td>NaN</td>\n", " </tr>\n", " <tr>\n", " <th>1715</th>\n", " <td>819004803107983360</td>\n", " <td>https://pbs.twimg.com/media/C12whDoVEAALRxa.jpg</td>\n", " <td>1</td>\n", " <td>standard_poodle</td>\n", " <td>0.351308</td>\n", " <td>dog</td>\n", " <td>2017-01-11 02:15:36</td>\n", " <td><a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a></td>\n", " <td>This is Bo. He was a very good First Doggo. 
14/10 would be an absolute honor to pet https://t.co/AdPKrI8BZ1</td>\n", " <td>https://twitter.com/dog_rates/status/819004803107983360/photo/1</td>\n", " <td>Bo</td>\n", " <td>doggo</td>\n", " <td>39660.0</td>\n", " <td>91294.0</td>\n", " <td>1.4</td>\n", " <td>doggo</td>\n", " </tr>\n", " <tr>\n", " <th>1744</th>\n", " <td>822872901745569793</td>\n", " <td>https://pbs.twimg.com/media/C2tugXLXgAArJO4.jpg</td>\n", " <td>1</td>\n", " <td>Lakeland_terrier</td>\n", " <td>0.196015</td>\n", " <td>dog</td>\n", " <td>2017-01-21 18:26:02</td>\n", " <td><a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a></td>\n", " <td>Here's a super supportive puppo participating in the Toronto #WomensMarch today. 13/10 https://t.co/nTz3FtorBc</td>\n", " <td>https://twitter.com/dog_rates/status/822872901745569793/photo/1</td>\n", " <td>None</td>\n", " <td>puppo</td>\n", " <td>47136.0</td>\n", " <td>138899.0</td>\n", " <td>1.3</td>\n", " <td>puppo</td>\n", " </tr>\n", " <tr>\n", " <th>1932</th>\n", " <td>859196978902773760</td>\n", " <td>https://pbs.twimg.com/ext_tw_video_thumb/859196962498805762/pu/img/-yBpr4-o4GJZECYE.jpg</td>\n", " <td>1</td>\n", " <td>malamute</td>\n", " <td>0.216163</td>\n", " <td>dog</td>\n", " <td>2017-05-02 00:04:57</td>\n", " <td><a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a></td>\n", " <td>We only rate dogs. This is quite clearly a smol broken polar bear. We'd appreciate if you only send dogs. Thank you... 
12/10 https://t.co/g2nSyGenG9</td>\n", " <td>https://twitter.com/dog_rates/status/859196978902773760/video/1</td>\n", " <td>quite</td>\n", " <td>None</td>\n", " <td>30403.0</td>\n", " <td>89782.0</td>\n", " <td>1.2</td>\n", " <td>NaN</td>\n", " </tr>\n", " <tr>\n", " <th>1961</th>\n", " <td>866450705531457537</td>\n", " <td>https://pbs.twimg.com/media/DAZAUfBXcAAG_Nn.jpg</td>\n", " <td>2</td>\n", " <td>French_bulldog</td>\n", " <td>0.905334</td>\n", " <td>dog</td>\n", " <td>2017-05-22 00:28:40</td>\n", " <td><a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a></td>\n", " <td>This is Jamesy. He gives a kiss to every other pupper he sees on his walk. 13/10 such passion, much tender https://t.co/wk7TfysWHr</td>\n", " <td>https://twitter.com/dog_rates/status/866450705531457537/photo/1</td>\n", " <td>Jamesy</td>\n", " <td>pupper</td>\n", " <td>35037.0</td>\n", " <td>120747.0</td>\n", " <td>1.3</td>\n", " <td>pupper</td>\n", " </tr>\n", " <tr>\n", " <th>1977</th>\n", " <td>870374049280663552</td>\n", " <td>https://pbs.twimg.com/media/DBQwlFCXkAACSkI.jpg</td>\n", " <td>1</td>\n", " <td>golden_retriever</td>\n", " <td>0.841001</td>\n", " <td>dog</td>\n", " <td>2017-06-01 20:18:38</td>\n", " <td><a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a></td>\n", " <td>This is Zoey. She really likes the planet. Would hate to see willful ignorance and the denial of fairly elemental science destroy it. 
13/10 https://t.co/T1xlgaPujm</td>\n", " <td>https://twitter.com/dog_rates/status/870374049280663552/photo/1</td>\n", " <td>Zoey</td>\n", " <td>None</td>\n", " <td>25805.0</td>\n", " <td>81213.0</td>\n", " <td>1.3</td>\n", " <td>NaN</td>\n", " </tr>\n", " <tr>\n", " <th>2014</th>\n", " <td>879415818425184262</td>\n", " <td>https://pbs.twimg.com/ext_tw_video_thumb/879415784908390401/pu/img/cX7XI1TnUsseGET5.jpg</td>\n", " <td>1</td>\n", " <td>English_springer</td>\n", " <td>0.383404</td>\n", " <td>dog</td>\n", " <td>2017-06-26 19:07:24</td>\n", " <td><a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a></td>\n", " <td>This is Duddles. He did an attempt. 13/10 someone help him (vid by Georgia Felici) https://t.co/UDT7ZkcTgY</td>\n", " <td>https://twitter.com/dog_rates/status/879415818425184262/video/1</td>\n", " <td>Duddles</td>\n", " <td>None</td>\n", " <td>42962.0</td>\n", " <td>103018.0</td>\n", " <td>1.3</td>\n", " <td>NaN</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " tweet_id \\\n", "571 678399652199309312 \n", "1186 739238157791694849 \n", "1221 744234799360020481 \n", "1641 807106840509214720 \n", "1715 819004803107983360 \n", "1744 822872901745569793 \n", "1932 859196978902773760 \n", "1961 866450705531457537 \n", "1977 870374049280663552 \n", "2014 879415818425184262 \n", "\n", " jpg_url \\\n", "571 https://pbs.twimg.com/ext_tw_video_thumb/678399528077250560/pu/img/BOjUNHRsYLeSo0hl.jpg \n", "1186 https://pbs.twimg.com/ext_tw_video_thumb/739238016737267712/pu/img/-tLpyiuIzD5zR1et.jpg \n", "1221 https://pbs.twimg.com/ext_tw_video_thumb/744234667679821824/pu/img/1GaWmtJtdqzZV7jy.jpg \n", "1641 https://pbs.twimg.com/ext_tw_video_thumb/807106774843039744/pu/img/8XZg1xW35Xp2J6JW.jpg \n", "1715 https://pbs.twimg.com/media/C12whDoVEAALRxa.jpg \n", "1744 https://pbs.twimg.com/media/C2tugXLXgAArJO4.jpg \n", "1932 https://pbs.twimg.com/ext_tw_video_thumb/859196962498805762/pu/img/-yBpr4-o4GJZECYE.jpg 
\n", "1961 https://pbs.twimg.com/media/DAZAUfBXcAAG_Nn.jpg \n", "1977 https://pbs.twimg.com/media/DBQwlFCXkAACSkI.jpg \n", "2014 https://pbs.twimg.com/ext_tw_video_thumb/879415784908390401/pu/img/cX7XI1TnUsseGET5.jpg \n", "\n", " img_num prediction confidence detectiontype \\\n", "571 1 Bedlington_terrier 0.015047 dog \n", "1186 1 Eskimo_dog 0.503372 dog \n", "1221 1 Labrador_retriever 0.825333 dog \n", "1641 1 Chihuahua 0.505370 dog \n", "1715 1 standard_poodle 0.351308 dog \n", "1744 1 Lakeland_terrier 0.196015 dog \n", "1932 1 malamute 0.216163 dog \n", "1961 2 French_bulldog 0.905334 dog \n", "1977 1 golden_retriever 0.841001 dog \n", "2014 1 English_springer 0.383404 dog \n", "\n", " timestamp \\\n", "571 2015-12-20 02:20:55 \n", "1186 2016-06-04 23:31:25 \n", "1221 2016-06-18 18:26:18 \n", "1641 2016-12-09 06:17:20 \n", "1715 2017-01-11 02:15:36 \n", "1744 2017-01-21 18:26:02 \n", "1932 2017-05-02 00:04:57 \n", "1961 2017-05-22 00:28:40 \n", "1977 2017-06-01 20:18:38 \n", "2014 2017-06-26 19:07:24 \n", "\n", " source \\\n", "571 <a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a> \n", "1186 <a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a> \n", "1221 <a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a> \n", "1641 <a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a> \n", "1715 <a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a> \n", "1744 <a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a> \n", "1932 <a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a> \n", "1961 <a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a> \n", "1977 <a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a> \n", "2014 <a href=\"http://twitter.com/download/iphone\" 
rel=\"nofollow\">Twitter for iPhone</a> \n", "\n", " text \\\n", "571 This made my day. 12/10 please enjoy https://t.co/VRTbo3aAcm \n", "1186 Here's a doggo blowing bubbles. It's downright legendary. 13/10 would watch on repeat forever (vid by Kent Duryee) https://t.co/YcXgHfp1EC \n", "1221 Here's a doggo realizing you can stand in a pool. 13/10 enlightened af (vid by Tina Conrad) https://t.co/7wE9LTEXC4 \n", "1641 This is Stephan. He just wants to help. 13/10 such a good boy https://t.co/DkBYaCAg2d \n", "1715 This is Bo. He was a very good First Doggo. 14/10 would be an absolute honor to pet https://t.co/AdPKrI8BZ1 \n", "1744 Here's a super supportive puppo participating in the Toronto #WomensMarch today. 13/10 https://t.co/nTz3FtorBc \n", "1932 We only rate dogs. This is quite clearly a smol broken polar bear. We'd appreciate if you only send dogs. Thank you... 12/10 https://t.co/g2nSyGenG9 \n", "1961 This is Jamesy. He gives a kiss to every other pupper he sees on his walk. 13/10 such passion, much tender https://t.co/wk7TfysWHr \n", "1977 This is Zoey. She really likes the planet. Would hate to see willful ignorance and the denial of fairly elemental science destroy it. 13/10 https://t.co/T1xlgaPujm \n", "2014 This is Duddles. He did an attempt. 
13/10 someone help him (vid by Georgia Felici) https://t.co/UDT7ZkcTgY \n", "\n", " expanded_urls \\\n", "571 https://twitter.com/dog_rates/status/678399652199309312/video/1 \n", "1186 https://twitter.com/dog_rates/status/739238157791694849/video/1 \n", "1221 https://twitter.com/dog_rates/status/744234799360020481/video/1 \n", "1641 https://twitter.com/dog_rates/status/807106840509214720/video/1 \n", "1715 https://twitter.com/dog_rates/status/819004803107983360/photo/1 \n", "1744 https://twitter.com/dog_rates/status/822872901745569793/photo/1 \n", "1932 https://twitter.com/dog_rates/status/859196978902773760/video/1 \n", "1961 https://twitter.com/dog_rates/status/866450705531457537/photo/1 \n", "1977 https://twitter.com/dog_rates/status/870374049280663552/photo/1 \n", "2014 https://twitter.com/dog_rates/status/879415818425184262/video/1 \n", "\n", " name stage retweet_count favorite_count rating stacked_stages \n", "571 None None 33355.0 81543.0 1.2 NaN \n", "1186 None doggo 61211.0 120128.0 1.3 doggo \n", "1221 None doggo 82678.0 162549.0 1.3 doggo \n", "1641 Stephan None 60299.0 125650.0 1.3 NaN \n", "1715 Bo doggo 39660.0 91294.0 1.4 doggo \n", "1744 None puppo 47136.0 138899.0 1.3 puppo \n", "1932 quite None 30403.0 89782.0 1.2 NaN \n", "1961 Jamesy pupper 35037.0 120747.0 1.3 pupper \n", "1977 Zoey None 25805.0 81213.0 1.3 NaN \n", "2014 Duddles None 42962.0 103018.0 1.3 NaN " ] }, "execution_count": 887, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The favorited tweets, including the breeds\n", "df_most_favorited" ] }, { "cell_type": "code", "execution_count": 888, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 888, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Generate the HTML version of this notebook\n", "from subprocess import call\n", "call(['python', '-m', 'nbconvert', 'Wrangle_act.ipynb'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": 
[], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" } }, "nbformat": 4, "nbformat_minor": 2 }