{ "cells": [ { "cell_type": "markdown", "metadata": { "extensions": { "jupyter_dashboards": { "version": 1, "views": { "grid_default": { "col": 0, "height": 4, "hidden": false, "row": 0, "width": 4 }, "report_default": { "hidden": false } } } } }, "source": [ "# Project: Wrangling and Analysis of 'WeRateDogs' tweet archive data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Table of contents\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Introduction\n", "\n", "This notebook wrangles and analyzes the tweet archive of 'WeRateDogs', a popular dog-rating account on Twitter. The wrangling process covers gathering the data from three sources, assessing it, and cleaning it. The cleaned data is then analyzed, and the resulting insights and visualizations are documented.\n", "\n", "### Data description\n", "This project uses the following data:\n", "\n", "#### 1. A file on hand containing tweet archive data including:\n", "\n", "##### tweet_id: \n", "unique identifier of a particular Tweet.\n", "\n", "##### in_reply_to_status_id: \n", "if the represented Tweet is a reply, this field will contain the integer representation of the original Tweet's ID.\n", "\n", "##### in_reply_to_user_id:\n", "if the represented Tweet is a reply, this field will contain the integer representation of the original Tweet's author ID.\n", "\n", "##### timestamp: \n", "the date and time at which the Tweet was posted.\n", "\n", "##### source:\n", "the utility used to post the Tweet, as an HTML-formatted string.\n", "\n", "##### text: \n", "the actual UTF-8 text of the Tweet.\n", "\n", "##### retweeted_status_id:\n", "if the represented Tweet is a retweet, this field will contain the integer representation of the original Tweet's ID\n", "\n", "##### retweeted_status_user_id:\n", "if the represented Tweet is a retweet, this field will contain the integer representation of the original 
Tweet's author ID\n", "\n", "##### retweeted_status_timestamp:\n", "if the represented Tweet is a retweet, the date and time at which the original Tweet was posted\n", "\n", "##### expanded_urls: \n", "the full URL of the Tweet\n", "\n", "##### rating_numerator: \n", "the numerator of the dog's rating, as an integer.\n", "\n", "##### rating_denominator:\n", "the denominator of the dog's rating, as an integer (the value the rating is out of)\n", "\n", "##### name: \n", "the name of the dog\n", "\n", "##### doggo: \n", "a big pupper, usually older\n", "\n", "##### floofer:\n", "label given to a dog that is excessively furry\n", "\n", "##### pupper:\n", "a small doggo, usually younger.\n", "\n", "##### puppo:\n", "a transitional phase between pupper and doggo\n", "\n", "#### 2. A tweet image file containing tweet image prediction data including:\n", "\n", "##### tweet_id:\n", "unique identifier of a particular Tweet.\n", "\n", "##### jpg_url:\n", "URL of the image associated with the given Tweet.\n", "##### img_num:\n", "since a Tweet can have multiple images, this indicates the number of the image corresponding to the most confident prediction.\n", "##### p1:\n", "p1 is the algorithm's #1 prediction for the image in the Tweet\n", "##### p1_conf:\n", "p1_conf is how confident the algorithm is in its #1 prediction \n", "##### p1_dog:\n", "p1_dog is whether or not the #1 prediction is a breed of dog \n", "##### p2:\n", "p2 is the algorithm's second most likely prediction\n", "##### p2_conf:\n", "p2_conf is how confident the algorithm is in its #2 prediction\n", "##### p2_dog:\n", "p2_dog is whether or not the #2 prediction is a breed of dog\n", "##### p3:\n", "p3 is the algorithm's third most likely prediction\n", "##### p3_conf:\n", "p3_conf is how confident the algorithm is in its #3 prediction\n", "##### p3_dog:\n", "p3_dog is whether or not the #3 prediction is a breed of dog\n", "\n", "#### 3. 
Tweet retweet count and favorite count data including:\n", "\n", "##### tweet_id:\n", "unique identifier of a particular Tweet.\n", "##### retweet_count:\n", "the number of times a Tweet has been retweeted.\n", "##### favorite_count:\n", "the number of times a Tweet has been favorited.\n", "\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: tweepy in /opt/conda/lib/python3.6/site-packages (3.5.0)\n", "Requirement already satisfied: requests>=2.4.3 in /opt/conda/lib/python3.6/site-packages (from tweepy) (2.18.4)\n", "Requirement already satisfied: requests_oauthlib>=0.4.1 in /opt/conda/lib/python3.6/site-packages (from tweepy) (0.8.0)\n", "Requirement already satisfied: six>=1.7.3 in /opt/conda/lib/python3.6/site-packages (from tweepy) (1.11.0)\n", "Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /opt/conda/lib/python3.6/site-packages (from requests>=2.4.3->tweepy) (3.0.4)\n", "Requirement already satisfied: idna<2.7,>=2.5 in /opt/conda/lib/python3.6/site-packages (from requests>=2.4.3->tweepy) (2.6)\n", "Requirement already satisfied: urllib3<1.23,>=1.21.1 in /opt/conda/lib/python3.6/site-packages (from requests>=2.4.3->tweepy) (1.22)\n", "Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.6/site-packages (from requests>=2.4.3->tweepy) (2019.11.28)\n", "Requirement already satisfied: oauthlib>=0.6.2 in /opt/conda/lib/python3.6/site-packages (from requests_oauthlib>=0.4.1->tweepy) (2.0.6)\n" ] } ], "source": [ "# install tweepy into the environment\n", "!pip install tweepy" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [], "source": [ "# import all the packages that will be required\n", "import pandas as pd\n", "import requests\n", "import tweepy\n", "import json\n", "import numpy as np\n", "import re\n", "import functools\n", "import matplotlib.pyplot as plt\n", 
"%matplotlib inline\n", "import seaborn as sns\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Data Gathering\n", "\n", "1. Directly downloading the WeRateDogs Twitter archive data (twitter_archive_enhanced.csv)" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "extensions": { "jupyter_dashboards": { "version": 1, "views": { "grid_default": { "hidden": true }, "report_default": { "hidden": true } } } } }, "outputs": [ { "data": { "text/html": [ "
\n", "
| tweet_id | in_reply_to_status_id | in_reply_to_user_id | timestamp | source | text | retweeted_status_id | retweeted_status_user_id | retweeted_status_timestamp | expanded_urls | rating_numerator | rating_denominator | name | doggo | floofer | pupper | puppo
0 | 892420643555336193 | NaN | NaN | 2017-08-01 16:23:56 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | This is Phineas. He's a mystical boy. Only eve... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/892420643... | 13 | 10 | Phineas | None | None | None | None
1 | 892177421306343426 | NaN | NaN | 2017-08-01 00:17:27 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | This is Tilly. She's just checking pup on you.... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/892177421... | 13 | 10 | Tilly | None | None | None | None
2 | 891815181378084864 | NaN | NaN | 2017-07-31 00:18:03 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | This is Archie. He is a rare Norwegian Pouncin... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/891815181... | 12 | 10 | Archie | None | None | None | None
\n", "
" ], "text/plain": [ " tweet_id in_reply_to_status_id in_reply_to_user_id \\\n", "0 892420643555336193 NaN NaN \n", "1 892177421306343426 NaN NaN \n", "2 891815181378084864 NaN NaN \n", "\n", " timestamp \\\n", "0 2017-08-01 16:23:56 +0000 \n", "1 2017-08-01 00:17:27 +0000 \n", "2 2017-07-31 00:18:03 +0000 \n", "\n", " source \\\n", "0 \n", "
| tweet_id | jpg_url | img_num | p1 | p1_conf | p1_dog | p2 | p2_conf | p2_dog | p3 | p3_conf | p3_dog
0 | 666020888022790149 | https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg | 1 | Welsh_springer_spaniel | 0.465074 | True | collie | 0.156665 | True | Shetland_sheepdog | 0.061428 | True
1 | 666029285002620928 | https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg | 1 | redbone | 0.506826 | True | miniature_pinscher | 0.074192 | True | Rhodesian_ridgeback | 0.072010 | True
2 | 666033412701032449 | https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg | 1 | German_shepherd | 0.596461 | True | malinois | 0.138584 | True | bloodhound | 0.116197 | True
\n", "" ], "text/plain": [ " tweet_id jpg_url \\\n", "0 666020888022790149 https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg \n", "1 666029285002620928 https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg \n", "2 666033412701032449 https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg \n", "\n", " img_num p1 p1_conf p1_dog p2 \\\n", "0 1 Welsh_springer_spaniel 0.465074 True collie \n", "1 1 redbone 0.506826 True miniature_pinscher \n", "2 1 German_shepherd 0.596461 True malinois \n", "\n", " p2_conf p2_dog p3 p3_conf p3_dog \n", "0 0.156665 True Shetland_sheepdog 0.061428 True \n", "1 0.074192 True Rhodesian_ridgeback 0.072010 True \n", "2 0.138584 True bloodhound 0.116197 True " ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "image_df = pd.read_csv('image_predictions.tsv', sep = '\\t')\n", "image_df.head(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "3. Using the Tweepy library to query additional data via the Twitter API (tweet_json.txt)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Twitter API credentials (replace the placeholders with your own keys)\n", "api_key = \"YOUR API KEY HERE\"\n", "api_secrets = \"YOUR API SECRET KEY HERE\"\n", "access_token = \"YOUR ACCESS TOKEN KEY HERE\"\n", "access_secret = \"YOUR ACCESS TOKEN SECRET HERE\"\n", "\n", "# Authenticate to Twitter\n", "auth = tweepy.OAuthHandler(api_key, api_secrets)\n", "auth.set_access_token(access_token, access_secret)\n", "\n", "# wait automatically (and print a notice) whenever the rate limit is reached\n", "api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# create a list of tweet ids (as strings) from the tweet_archive tweet_id column\n", "tweet_ids = []\n", "for tweet_id in tweet_archive.tweet_id:\n", "    tweet_ids.append(str(tweet_id))" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2356\n" ] } ], "source": [ "print(len(tweet_ids))" ] }, { "cell_type": "code", 
"execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Rate limit reached. Sleeping for: 743\n", "Rate limit reached. Sleeping for: 743\n" ] } ], "source": [ "# tweets whose data could be retrieved\n", "tweets = []\n", "# ids of tweets whose data could not be retrieved\n", "unavailable_tweets = []\n", "# gather each tweet's JSON data by id\n", "for tweet_id in tweet_ids:\n", "    try:\n", "        tweet = api.get_status(tweet_id)._json\n", "        tweets.append({'tweet_id': tweet['id'], 'retweet_count': tweet['retweet_count'], 'favorite_count': tweet['favorite_count']})\n", "    except tweepy.TweepError:\n", "        # the API returned an error for this id, so record it as unavailable\n", "        unavailable_tweets.append(tweet_id)\n", "\n", "# write one tab-separated line per tweet: tweet_id, retweet_count, favorite_count\n", "with open('tweet_json.txt', mode = 'w') as file:\n", "    for record in tweets:\n", "        file.write(json.dumps(record['tweet_id']))\n", "        file.write('\\t')\n", "        file.write(json.dumps(record['retweet_count']))\n", "        file.write('\\t')\n", "        file.write(json.dumps(record['favorite_count']))\n", "        file.write('\\n')" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['created_at',\n", " 'id',\n", " 'id_str',\n", " 'text',\n", " 'truncated',\n", " 'entities',\n", " 'extended_entities',\n", " 'source',\n", " 'in_reply_to_status_id',\n", " 'in_reply_to_status_id_str',\n", " 'in_reply_to_user_id',\n", " 'in_reply_to_user_id_str',\n", " 'in_reply_to_screen_name',\n", " 'user',\n", " 'geo',\n", " 'coordinates',\n", " 'place',\n", " 'contributors',\n", " 'is_quote_status',\n", " 'retweet_count',\n", " 'favorite_count',\n", " 'favorited',\n", " 'retweeted',\n", " 'possibly_sensitive',\n", " 'possibly_sensitive_appealable',\n", " 'lang']" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# check for available attributes of the json data retrieved from the api\n", "list(tweet.keys())" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { 
"text/html": [ "
\n", "
| tweet_id | retweet_count | favorite_count
0 | 892420643555336193 | 6979 | 33728
1 | 892177421306343426 | 5280 | 29255
2 | 891815181378084864 | 3466 | 21987
3 | 891689557279858688 | 7198 | 36823
4 | 891327558926688256 | 7723 | 35207
\n", "
" ], "text/plain": [ " tweet_id retweet_count favorite_count\n", "0 892420643555336193 6979 33728\n", "1 892177421306343426 5280 29255\n", "2 891815181378084864 3466 21987\n", "3 891689557279858688 7198 36823\n", "4 891327558926688256 7723 35207" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#reading the 'tweet_json.txt' file into a dataframe\n", "tweet_counts = pd.read_csv('tweet_json.txt', sep ='\\t', header = None, names = ['tweet_id', 'retweet_count','favorite_count'])\n", "tweet_counts.head()" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(2326, 3)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tweet_counts.shape" ] }, { "cell_type": "markdown", "metadata": { "extensions": { "jupyter_dashboards": { "version": 1, "views": { "grid_default": { "col": 4, "height": 4, "hidden": false, "row": 28, "width": 4 }, "report_default": { "hidden": false } } } } }, "source": [ "
\n", "## Assessing Data\n", "#### 1. tweet_archive dataframe" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
\n", "
| tweet_id | in_reply_to_status_id | in_reply_to_user_id | timestamp | source | text | retweeted_status_id | retweeted_status_user_id | retweeted_status_timestamp | expanded_urls | rating_numerator | rating_denominator | name | doggo | floofer | pupper | puppo
0 | 892420643555336193 | NaN | NaN | 2017-08-01 16:23:56 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | This is Phineas. He's a mystical boy. Only eve... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/892420643... | 13 | 10 | Phineas | None | None | None | None
1 | 892177421306343426 | NaN | NaN | 2017-08-01 00:17:27 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | This is Tilly. She's just checking pup on you.... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/892177421... | 13 | 10 | Tilly | None | None | None | None
2 | 891815181378084864 | NaN | NaN | 2017-07-31 00:18:03 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | This is Archie. He is a rare Norwegian Pouncin... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/891815181... | 12 | 10 | Archie | None | None | None | None
3 | 891689557279858688 | NaN | NaN | 2017-07-30 15:58:51 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | This is Darla. She commenced a snooze mid meal... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/891689557... | 13 | 10 | Darla | None | None | None | None
4 | 891327558926688256 | NaN | NaN | 2017-07-29 16:00:24 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | This is Franklin. He would like you to stop ca... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/891327558... | 12 | 10 | Franklin | None | None | None | None
5 | 891087950875897856 | NaN | NaN | 2017-07-29 00:08:17 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | Here we have a majestic great white breaching ... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/891087950... | 13 | 10 | None | None | None | None | None
6 | 890971913173991426 | NaN | NaN | 2017-07-28 16:27:12 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | Meet Jax. He enjoys ice cream so much he gets ... | NaN | NaN | NaN | https://gofundme.com/ydvmve-surgery-for-jax,ht... | 13 | 10 | Jax | None | None | None | None
7 | 890729181411237888 | NaN | NaN | 2017-07-28 00:22:40 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | When you watch your owner call another dog a g... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/890729181... | 13 | 10 | None | None | None | None | None
8 | 890609185150312448 | NaN | NaN | 2017-07-27 16:25:51 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | This is Zoey. She doesn't want to be one of th... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/890609185... | 13 | 10 | Zoey | None | None | None | None
9 | 890240255349198849 | NaN | NaN | 2017-07-26 15:59:51 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | This is Cassie. She is a college pup. Studying... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/890240255... | 14 | 10 | Cassie | doggo | None | None | None
10 | 890006608113172480 | NaN | NaN | 2017-07-26 00:31:25 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | This is Koda. He is a South Australian decksha... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/890006608... | 13 | 10 | Koda | None | None | None | None
11 | 889880896479866881 | NaN | NaN | 2017-07-25 16:11:53 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | This is Bruno. He is a service shark. Only get... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/889880896... | 13 | 10 | Bruno | None | None | None | None
12 | 889665388333682689 | NaN | NaN | 2017-07-25 01:55:32 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | Here's a puppo that seems to be on the fence a... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/889665388... | 13 | 10 | None | None | None | None | puppo
13 | 889638837579907072 | NaN | NaN | 2017-07-25 00:10:02 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | This is Ted. He does his best. Sometimes that'... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/889638837... | 12 | 10 | Ted | None | None | None | None
14 | 889531135344209921 | NaN | NaN | 2017-07-24 17:02:04 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | This is Stuart. He's sporting his favorite fan... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/889531135... | 13 | 10 | Stuart | None | None | None | puppo
15 | 889278841981685760 | NaN | NaN | 2017-07-24 00:19:32 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | This is Oliver. You're witnessing one of his m... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/889278841... | 13 | 10 | Oliver | None | None | None | None
16 | 888917238123831296 | NaN | NaN | 2017-07-23 00:22:39 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | This is Jim. He found a fren. Taught him how t... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/888917238... | 12 | 10 | Jim | None | None | None | None
17 | 888804989199671297 | NaN | NaN | 2017-07-22 16:56:37 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | This is Zeke. He has a new stick. Very proud o... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/888804989... | 13 | 10 | Zeke | None | None | None | None
18 | 888554962724278272 | NaN | NaN | 2017-07-22 00:23:06 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | This is Ralphus. He's powering up. Attempting ... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/888554962... | 13 | 10 | Ralphus | None | None | None | None
19 | 888202515573088257 | NaN | NaN | 2017-07-21 01:02:36 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | RT @dog_rates: This is Canela. She attempted s... | 8.874740e+17 | 4.196984e+09 | 2017-07-19 00:47:34 +0000 | https://twitter.com/dog_rates/status/887473957... | 13 | 10 | Canela | None | None | None | None
20 | 888078434458587136 | NaN | NaN | 2017-07-20 16:49:33 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | This is Gerald. He was just told he didn't get... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/888078434... | 12 | 10 | Gerald | None | None | None | None
21 | 887705289381826560 | NaN | NaN | 2017-07-19 16:06:48 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | This is Jeffrey. He has a monopoly on the pool... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/887705289... | 13 | 10 | Jeffrey | None | None | None | None
22 | 887517139158093824 | NaN | NaN | 2017-07-19 03:39:09 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | I've yet to rate a Venezuelan Hover Wiener. Th... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/887517139... | 14 | 10 | such | None | None | None | None
23 | 887473957103951883 | NaN | NaN | 2017-07-19 00:47:34 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | This is Canela. She attempted some fancy porch... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/887473957... | 13 | 10 | Canela | None | None | None | None
24 | 887343217045368832 | NaN | NaN | 2017-07-18 16:08:03 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | You may not have known you needed to see this ... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/887343217... | 13 | 10 | None | None | None | None | None
25 | 887101392804085760 | NaN | NaN | 2017-07-18 00:07:08 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | This... is a Jubilant Antarctic House Bear. We... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/887101392... | 12 | 10 | None | None | None | None | None
26 | 886983233522544640 | NaN | NaN | 2017-07-17 16:17:36 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | This is Maya. She's very shy. Rarely leaves he... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/886983233... | 13 | 10 | Maya | None | None | None | None
27 | 886736880519319552 | NaN | NaN | 2017-07-16 23:58:41 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | This is Mingus. He's a wonderful father to his... | NaN | NaN | NaN | https://www.gofundme.com/mingusneedsus,https:/... | 13 | 10 | Mingus | None | None | None | None
28 | 886680336477933568 | NaN | NaN | 2017-07-16 20:14:00 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | This is Derek. He's late for a dog meeting. 13... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/886680336... | 13 | 10 | Derek | None | None | None | None
29 | 886366144734445568 | NaN | NaN | 2017-07-15 23:25:31 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | This is Roscoe. Another pupper fallen victim t... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/886366144... | 12 | 10 | Roscoe | None | None | pupper | None
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
2326 | 666411507551481857 | NaN | NaN | 2015-11-17 00:24:19 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | This is quite the dog. Gets really excited whe... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666411507... | 2 | 10 | quite | None | None | None | None
2327 | 666407126856765440 | NaN | NaN | 2015-11-17 00:06:54 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | This is a southern Vesuvius bumblegruff. Can d... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666407126... | 7 | 10 | a | None | None | None | None
2328 | 666396247373291520 | NaN | NaN | 2015-11-16 23:23:41 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | Oh goodness. A super rare northeast Qdoba kang... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666396247... | 9 | 10 | None | None | None | None | None
2329 | 666373753744588802 | NaN | NaN | 2015-11-16 21:54:18 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | Those are sunglasses and a jean jacket. 11/10 ... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666373753... | 11 | 10 | None | None | None | None | None
2330 | 666362758909284353 | NaN | NaN | 2015-11-16 21:10:36 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | Unique dog here. Very small. Lives in containe... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666362758... | 6 | 10 | None | None | None | None | None
2331 | 666353288456101888 | NaN | NaN | 2015-11-16 20:32:58 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | Here we have a mixed Asiago from the Galápagos... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666353288... | 8 | 10 | None | None | None | None | None
2332 | 666345417576210432 | NaN | NaN | 2015-11-16 20:01:42 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | Look at this jokester thinking seat belt laws ... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666345417... | 10 | 10 | None | None | None | None | None
2333 | 666337882303524864 | NaN | NaN | 2015-11-16 19:31:45 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | This is an extremely rare horned Parthenon. No... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666337882... | 9 | 10 | an | None | None | None | None
2334 | 666293911632134144 | NaN | NaN | 2015-11-16 16:37:02 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | This is a funny dog. Weird toes. Won't come do... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666293911... | 3 | 10 | a | None | None | None | None
2335 | 666287406224695296 | NaN | NaN | 2015-11-16 16:11:11 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | This is an Albanian 3 1/2 legged Episcopalian... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666287406... | 1 | 2 | an | None | None | None | None
2336 | 666273097616637952 | NaN | NaN | 2015-11-16 15:14:19 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | Can take selfies 11/10 https://t.co/ws2AMaNwPW | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666273097... | 11 | 10 | None | None | None | None | None
2337 | 666268910803644416 | NaN | NaN | 2015-11-16 14:57:41 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | Very concerned about fellow dog trapped in com... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666268910... | 10 | 10 | None | None | None | None | None
2338 | 666104133288665088 | NaN | NaN | 2015-11-16 04:02:55 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | Not familiar with this breed. No tail (weird).... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666104133... | 1 | 10 | None | None | None | None | None
2339 | 666102155909144576 | NaN | NaN | 2015-11-16 03:55:04 +0000 | <a href=\"http://twitter.com/download/iphone\" r... | Oh my. Here you are seeing an Adobe Setter giv... | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666102155... | 11 | 10 | None | None | None | None | None
2340666099513787052032NaNNaN2015-11-16 03:44:34 +0000<a href=\"http://twitter.com/download/iphone\" r...Can stand on stump for what seems like a while...NaNNaNNaNhttps://twitter.com/dog_rates/status/666099513...810NoneNoneNoneNoneNone
2341666094000022159362NaNNaN2015-11-16 03:22:39 +0000<a href=\"http://twitter.com/download/iphone\" r...This appears to be a Mongolian Presbyterian mi...NaNNaNNaNhttps://twitter.com/dog_rates/status/666094000...910NoneNoneNoneNoneNone
2342666082916733198337NaNNaN2015-11-16 02:38:37 +0000<a href=\"http://twitter.com/download/iphone\" r...Here we have a well-established sunblockerspan...NaNNaNNaNhttps://twitter.com/dog_rates/status/666082916...610NoneNoneNoneNoneNone
2343666073100786774016NaNNaN2015-11-16 01:59:36 +0000<a href=\"http://twitter.com/download/iphone\" r...Let's hope this flight isn't Malaysian (lol). ...NaNNaNNaNhttps://twitter.com/dog_rates/status/666073100...1010NoneNoneNoneNoneNone
2344666071193221509120NaNNaN2015-11-16 01:52:02 +0000<a href=\"http://twitter.com/download/iphone\" r...Here we have a northern speckled Rhododendron....NaNNaNNaNhttps://twitter.com/dog_rates/status/666071193...910NoneNoneNoneNoneNone
2345666063827256086533NaNNaN2015-11-16 01:22:45 +0000<a href=\"http://twitter.com/download/iphone\" r...This is the happiest dog you will ever see. Ve...NaNNaNNaNhttps://twitter.com/dog_rates/status/666063827...1010theNoneNoneNoneNone
2346666058600524156928NaNNaN2015-11-16 01:01:59 +0000<a href=\"http://twitter.com/download/iphone\" r...Here is the Rand Paul of retrievers folks! He'...NaNNaNNaNhttps://twitter.com/dog_rates/status/666058600...810theNoneNoneNoneNone
2347666057090499244032NaNNaN2015-11-16 00:55:59 +0000<a href=\"http://twitter.com/download/iphone\" r...My oh my. This is a rare blond Canadian terrie...NaNNaNNaNhttps://twitter.com/dog_rates/status/666057090...910aNoneNoneNoneNone
2348666055525042405380NaNNaN2015-11-16 00:49:46 +0000<a href=\"http://twitter.com/download/iphone\" r...Here is a Siberian heavily armored polar bear ...NaNNaNNaNhttps://twitter.com/dog_rates/status/666055525...1010aNoneNoneNoneNone
2349666051853826850816NaNNaN2015-11-16 00:35:11 +0000<a href=\"http://twitter.com/download/iphone\" r...This is an odd dog. Hard on the outside but lo...NaNNaNNaNhttps://twitter.com/dog_rates/status/666051853...210anNoneNoneNoneNone
2350666050758794694657NaNNaN2015-11-16 00:30:50 +0000<a href=\"http://twitter.com/download/iphone\" r...This is a truly beautiful English Wilson Staff...NaNNaNNaNhttps://twitter.com/dog_rates/status/666050758...1010aNoneNoneNoneNone
2351666049248165822465NaNNaN2015-11-16 00:24:50 +0000<a href=\"http://twitter.com/download/iphone\" r...Here we have a 1949 1st generation vulpix. Enj...NaNNaNNaNhttps://twitter.com/dog_rates/status/666049248...510NoneNoneNoneNoneNone
2352666044226329800704NaNNaN2015-11-16 00:04:52 +0000<a href=\"http://twitter.com/download/iphone\" r...This is a purebred Piers Morgan. Loves to Netf...NaNNaNNaNhttps://twitter.com/dog_rates/status/666044226...610aNoneNoneNoneNone
2353666033412701032449NaNNaN2015-11-15 23:21:54 +0000<a href=\"http://twitter.com/download/iphone\" r...Here is a very happy pup. Big fan of well-main...NaNNaNNaNhttps://twitter.com/dog_rates/status/666033412...910aNoneNoneNoneNone
2354666029285002620928NaNNaN2015-11-15 23:05:30 +0000<a href=\"http://twitter.com/download/iphone\" r...This is a western brown Mitsubishi terrier. Up...NaNNaNNaNhttps://twitter.com/dog_rates/status/666029285...710aNoneNoneNoneNone
2355666020888022790149NaNNaN2015-11-15 22:32:08 +0000<a href=\"http://twitter.com/download/iphone\" r...Here we have a Japanese Irish Setter. Lost eye...NaNNaNNaNhttps://twitter.com/dog_rates/status/666020888...810NoneNoneNoneNoneNone
\n", "

2356 rows × 17 columns

\n", "
" ], "text/plain": [ " tweet_id in_reply_to_status_id in_reply_to_user_id \\\n", "0 892420643555336193 NaN NaN \n", "1 892177421306343426 NaN NaN \n", "2 891815181378084864 NaN NaN \n", "3 891689557279858688 NaN NaN \n", "4 891327558926688256 NaN NaN \n", "5 891087950875897856 NaN NaN \n", "6 890971913173991426 NaN NaN \n", "7 890729181411237888 NaN NaN \n", "8 890609185150312448 NaN NaN \n", "9 890240255349198849 NaN NaN \n", "10 890006608113172480 NaN NaN \n", "11 889880896479866881 NaN NaN \n", "12 889665388333682689 NaN NaN \n", "13 889638837579907072 NaN NaN \n", "14 889531135344209921 NaN NaN \n", "15 889278841981685760 NaN NaN \n", "16 888917238123831296 NaN NaN \n", "17 888804989199671297 NaN NaN \n", "18 888554962724278272 NaN NaN \n", "19 888202515573088257 NaN NaN \n", "20 888078434458587136 NaN NaN \n", "21 887705289381826560 NaN NaN \n", "22 887517139158093824 NaN NaN \n", "23 887473957103951883 NaN NaN \n", "24 887343217045368832 NaN NaN \n", "25 887101392804085760 NaN NaN \n", "26 886983233522544640 NaN NaN \n", "27 886736880519319552 NaN NaN \n", "28 886680336477933568 NaN NaN \n", "29 886366144734445568 NaN NaN \n", "... ... ... ... 
\n", "2326 666411507551481857 NaN NaN \n", "2327 666407126856765440 NaN NaN \n", "2328 666396247373291520 NaN NaN \n", "2329 666373753744588802 NaN NaN \n", "2330 666362758909284353 NaN NaN \n", "2331 666353288456101888 NaN NaN \n", "2332 666345417576210432 NaN NaN \n", "2333 666337882303524864 NaN NaN \n", "2334 666293911632134144 NaN NaN \n", "2335 666287406224695296 NaN NaN \n", "2336 666273097616637952 NaN NaN \n", "2337 666268910803644416 NaN NaN \n", "2338 666104133288665088 NaN NaN \n", "2339 666102155909144576 NaN NaN \n", "2340 666099513787052032 NaN NaN \n", "2341 666094000022159362 NaN NaN \n", "2342 666082916733198337 NaN NaN \n", "2343 666073100786774016 NaN NaN \n", "2344 666071193221509120 NaN NaN \n", "2345 666063827256086533 NaN NaN \n", "2346 666058600524156928 NaN NaN \n", "2347 666057090499244032 NaN NaN \n", "2348 666055525042405380 NaN NaN \n", "2349 666051853826850816 NaN NaN \n", "2350 666050758794694657 NaN NaN \n", "2351 666049248165822465 NaN NaN \n", "2352 666044226329800704 NaN NaN \n", "2353 666033412701032449 NaN NaN \n", "2354 666029285002620928 NaN NaN \n", "2355 666020888022790149 NaN NaN \n", "\n", " timestamp \\\n", "0 2017-08-01 16:23:56 +0000 \n", "1 2017-08-01 00:17:27 +0000 \n", "2 2017-07-31 00:18:03 +0000 \n", "3 2017-07-30 15:58:51 +0000 \n", "4 2017-07-29 16:00:24 +0000 \n", "5 2017-07-29 00:08:17 +0000 \n", "6 2017-07-28 16:27:12 +0000 \n", "7 2017-07-28 00:22:40 +0000 \n", "8 2017-07-27 16:25:51 +0000 \n", "9 2017-07-26 15:59:51 +0000 \n", "10 2017-07-26 00:31:25 +0000 \n", "11 2017-07-25 16:11:53 +0000 \n", "12 2017-07-25 01:55:32 +0000 \n", "13 2017-07-25 00:10:02 +0000 \n", "14 2017-07-24 17:02:04 +0000 \n", "15 2017-07-24 00:19:32 +0000 \n", "16 2017-07-23 00:22:39 +0000 \n", "17 2017-07-22 16:56:37 +0000 \n", "18 2017-07-22 00:23:06 +0000 \n", "19 2017-07-21 01:02:36 +0000 \n", "20 2017-07-20 16:49:33 +0000 \n", "21 2017-07-19 16:06:48 +0000 \n", "22 2017-07-19 03:39:09 +0000 \n", "23 2017-07-19 00:47:34 +0000 \n", 
"24 2017-07-18 16:08:03 +0000 \n", "25 2017-07-18 00:07:08 +0000 \n", "26 2017-07-17 16:17:36 +0000 \n", "27 2017-07-16 23:58:41 +0000 \n", "28 2017-07-16 20:14:00 +0000 \n", "29 2017-07-15 23:25:31 +0000 \n", "... ... \n", "2326 2015-11-17 00:24:19 +0000 \n", "2327 2015-11-17 00:06:54 +0000 \n", "2328 2015-11-16 23:23:41 +0000 \n", "2329 2015-11-16 21:54:18 +0000 \n", "2330 2015-11-16 21:10:36 +0000 \n", "2331 2015-11-16 20:32:58 +0000 \n", "2332 2015-11-16 20:01:42 +0000 \n", "2333 2015-11-16 19:31:45 +0000 \n", "2334 2015-11-16 16:37:02 +0000 \n", "2335 2015-11-16 16:11:11 +0000 \n", "2336 2015-11-16 15:14:19 +0000 \n", "2337 2015-11-16 14:57:41 +0000 \n", "2338 2015-11-16 04:02:55 +0000 \n", "2339 2015-11-16 03:55:04 +0000 \n", "2340 2015-11-16 03:44:34 +0000 \n", "2341 2015-11-16 03:22:39 +0000 \n", "2342 2015-11-16 02:38:37 +0000 \n", "2343 2015-11-16 01:59:36 +0000 \n", "2344 2015-11-16 01:52:02 +0000 \n", "2345 2015-11-16 01:22:45 +0000 \n", "2346 2015-11-16 01:01:59 +0000 \n", "2347 2015-11-16 00:55:59 +0000 \n", "2348 2015-11-16 00:49:46 +0000 \n", "2349 2015-11-16 00:35:11 +0000 \n", "2350 2015-11-16 00:30:50 +0000 \n", "2351 2015-11-16 00:24:50 +0000 \n", "2352 2015-11-16 00:04:52 +0000 \n", "2353 2015-11-15 23:21:54 +0000 \n", "2354 2015-11-15 23:05:30 +0000 \n", "2355 2015-11-15 22:32:08 +0000 \n", "\n", " source \\\n", "0 \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
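<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>tweet_id</th><th>in_reply_to_status_id</th><th>in_reply_to_user_id</th><th>timestamp</th><th>source</th><th>text</th><th>retweeted_status_id</th><th>retweeted_status_user_id</th><th>retweeted_status_timestamp</th><th>expanded_urls</th><th>rating_numerator</th><th>rating_denominator</th><th>name</th><th>doggo</th><th>floofer</th><th>pupper</th><th>puppo</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>1062</th><td>741099773336379392</td><td>NaN</td><td>NaN</td><td>2016-06-10 02:48:49 +0000</td><td>&lt;a href=\"http://vine.co\" rel=\"nofollow\"&gt;Vine -...</td><td>This is Ted. He's given up. 11/10 relatable af...</td><td>NaN</td><td>NaN</td><td>NaN</td><td>https://vine.co/v/ixHYvdxUx1L</td><td>11</td><td>10</td><td>Ted</td><td>None</td><td>None</td><td>None</td><td>None</td></tr>\n",
"    <tr><th>2347</th><td>666057090499244032</td><td>NaN</td><td>NaN</td><td>2015-11-16 00:55:59 +0000</td><td>&lt;a href=\"http://twitter.com/download/iphone\" r...</td><td>My oh my. This is a rare blond Canadian terrie...</td><td>NaN</td><td>NaN</td><td>NaN</td><td>https://twitter.com/dog_rates/status/666057090...</td><td>9</td><td>10</td><td>a</td><td>None</td><td>None</td><td>None</td><td>None</td></tr>\n",
"    <tr><th>555</th><td>803692223237865472</td><td>NaN</td><td>NaN</td><td>2016-11-29 20:08:52 +0000</td><td>&lt;a href=\"http://twitter.com/download/iphone\" r...</td><td>RT @dog_rates: I present to you... Dog Jesus. ...</td><td>6.914169e+17</td><td>4.196984e+09</td><td>2016-01-25 00:26:41 +0000</td><td>https://twitter.com/dog_rates/status/691416866...</td><td>13</td><td>10</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td></tr>\n",
"    <tr><th>1305</th><td>707387676719185920</td><td>NaN</td><td>NaN</td><td>2016-03-09 02:08:59 +0000</td><td>&lt;a href=\"http://twitter.com/download/iphone\" r...</td><td>Meet Clarkus. He's a Skinny Eastern Worcesters...</td><td>NaN</td><td>NaN</td><td>NaN</td><td>https://twitter.com/dog_rates/status/707387676...</td><td>10</td><td>10</td><td>Clarkus</td><td>None</td><td>None</td><td>None</td><td>None</td></tr>\n",
"    <tr><th>1060</th><td>741438259667034112</td><td>NaN</td><td>NaN</td><td>2016-06-11 01:13:51 +0000</td><td>&lt;a href=\"http://twitter.com/download/iphone\" r...</td><td>This is Tucker. He's still figuring out couche...</td><td>NaN</td><td>NaN</td><td>NaN</td><td>https://twitter.com/dog_rates/status/741438259...</td><td>9</td><td>10</td><td>Tucker</td><td>None</td><td>None</td><td>None</td><td>None</td></tr>\n",
"    <tr><th>149</th><td>863079547188785154</td><td>6.671522e+17</td><td>4.196984e+09</td><td>2017-05-12 17:12:53 +0000</td><td>&lt;a href=\"http://twitter.com/download/iphone\" r...</td><td>Ladies and gentlemen... I found Pipsy. He may ...</td><td>NaN</td><td>NaN</td><td>NaN</td><td>https://twitter.com/dog_rates/status/863079547...</td><td>14</td><td>10</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td></tr>\n",
"    <tr><th>1073</th><td>739932936087216128</td><td>NaN</td><td>NaN</td><td>2016-06-06 21:32:13 +0000</td><td>&lt;a href=\"http://twitter.com/download/iphone\" r...</td><td>Say hello to Rorie. She's zen af. Just enjoyin...</td><td>NaN</td><td>NaN</td><td>NaN</td><td>https://twitter.com/dog_rates/status/739932936...</td><td>10</td><td>10</td><td>Rorie</td><td>None</td><td>None</td><td>None</td><td>None</td></tr>\n",
"    <tr><th>1828</th><td>676263575653122048</td><td>NaN</td><td>NaN</td><td>2015-12-14 04:52:55 +0000</td><td>&lt;a href=\"http://twitter.com/download/iphone\" r...</td><td>All this pupper wanted to do was go skiing. No...</td><td>NaN</td><td>NaN</td><td>NaN</td><td>https://twitter.com/dog_rates/status/676263575...</td><td>10</td><td>10</td><td>None</td><td>None</td><td>None</td><td>pupper</td><td>None</td></tr>\n",
"    <tr><th>1938</th><td>673906403526995968</td><td>NaN</td><td>NaN</td><td>2015-12-07 16:46:21 +0000</td><td>&lt;a href=\"http://twitter.com/download/iphone\" r...</td><td>Guys I'm getting real tired of this. We only r...</td><td>NaN</td><td>NaN</td><td>NaN</td><td>https://twitter.com/dog_rates/status/673906403...</td><td>3</td><td>10</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td></tr>\n",
"    <tr><th>1054</th><td>742423170473463808</td><td>NaN</td><td>NaN</td><td>2016-06-13 18:27:32 +0000</td><td>&lt;a href=\"http://twitter.com/download/iphone\" r...</td><td>This is Bell. She likes holding hands. 12/10 w...</td><td>NaN</td><td>NaN</td><td>NaN</td><td>https://twitter.com/dog_rates/status/742423170...</td><td>12</td><td>10</td><td>Bell</td><td>None</td><td>None</td><td>None</td><td>None</td></tr>\n",
"  </tbody>\n",
"</table>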
\n", "" ], "text/plain": [ " tweet_id in_reply_to_status_id in_reply_to_user_id \\\n", "1062 741099773336379392 NaN NaN \n", "2347 666057090499244032 NaN NaN \n", "555 803692223237865472 NaN NaN \n", "1305 707387676719185920 NaN NaN \n", "1060 741438259667034112 NaN NaN \n", "149 863079547188785154 6.671522e+17 4.196984e+09 \n", "1073 739932936087216128 NaN NaN \n", "1828 676263575653122048 NaN NaN \n", "1938 673906403526995968 NaN NaN \n", "1054 742423170473463808 NaN NaN \n", "\n", " timestamp \\\n", "1062 2016-06-10 02:48:49 +0000 \n", "2347 2015-11-16 00:55:59 +0000 \n", "555 2016-11-29 20:08:52 +0000 \n", "1305 2016-03-09 02:08:59 +0000 \n", "1060 2016-06-11 01:13:51 +0000 \n", "149 2017-05-12 17:12:53 +0000 \n", "1073 2016-06-06 21:32:13 +0000 \n", "1828 2015-12-14 04:52:55 +0000 \n", "1938 2015-12-07 16:46:21 +0000 \n", "1054 2016-06-13 18:27:32 +0000 \n", "\n", " source \\\n", "1062
Vine -... \n", "2347 \n", "RangeIndex: 2356 entries, 0 to 2355\n", "Data columns (total 17 columns):\n", "tweet_id 2356 non-null int64\n", "in_reply_to_status_id 78 non-null float64\n", "in_reply_to_user_id 78 non-null float64\n", "timestamp 2356 non-null object\n", "source 2356 non-null object\n", "text 2356 non-null object\n", "retweeted_status_id 181 non-null float64\n", "retweeted_status_user_id 181 non-null float64\n", "retweeted_status_timestamp 181 non-null object\n", "expanded_urls 2297 non-null object\n", "rating_numerator 2356 non-null int64\n", "rating_denominator 2356 non-null int64\n", "name 2356 non-null object\n", "doggo 2356 non-null object\n", "floofer 2356 non-null object\n", "pupper 2356 non-null object\n", "puppo 2356 non-null object\n", "dtypes: float64(4), int64(3), object(10)\n", "memory usage: 313.0+ KB\n" ] } ], "source": [ "tweet_archive.info()" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
tweet_idin_reply_to_status_idin_reply_to_user_idtimestampsourcetextretweeted_status_idretweeted_status_user_idretweeted_status_timestampexpanded_urlsrating_numeratorrating_denominatornamedoggoflooferpupperpuppo
0FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
1FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
2FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
3FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
4FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
5FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
6FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
7FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
8FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
9FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
10FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
11FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
12FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
13FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
14FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
15FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
16FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
17FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
18FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
19FalseTrueTrueFalseFalseFalseFalseFalseFalseFalseFalseFalseFalseFalseFalseFalseFalse
20FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
21FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
22FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
23FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
24FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
25FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
26FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
27FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
28FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
29FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
......................................................
2326FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
2327FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
2328FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
2329FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
2330FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
2331FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
2332FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
2333FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
2334FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
2335FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
2336FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
2337FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
2338FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
2339FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
2340FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
2341FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
2342FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
2343FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
2344FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
2345FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
2346FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
2347FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
2348FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
2349FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
2350FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
2351FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
2352FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
2353FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
2354FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
2355FalseTrueTrueFalseFalseFalseTrueTrueTrueFalseFalseFalseFalseFalseFalseFalseFalse
\n", "

2356 rows × 17 columns

\n", "
" ], "text/plain": [ " tweet_id in_reply_to_status_id in_reply_to_user_id timestamp source \\\n", "0 False True True False False \n", "1 False True True False False \n", "2 False True True False False \n", "3 False True True False False \n", "4 False True True False False \n", "5 False True True False False \n", "6 False True True False False \n", "7 False True True False False \n", "8 False True True False False \n", "9 False True True False False \n", "10 False True True False False \n", "11 False True True False False \n", "12 False True True False False \n", "13 False True True False False \n", "14 False True True False False \n", "15 False True True False False \n", "16 False True True False False \n", "17 False True True False False \n", "18 False True True False False \n", "19 False True True False False \n", "20 False True True False False \n", "21 False True True False False \n", "22 False True True False False \n", "23 False True True False False \n", "24 False True True False False \n", "25 False True True False False \n", "26 False True True False False \n", "27 False True True False False \n", "28 False True True False False \n", "29 False True True False False \n", "... ... ... ... ... ... 
\n", "2326 False True True False False \n", "2327 False True True False False \n", "2328 False True True False False \n", "2329 False True True False False \n", "2330 False True True False False \n", "2331 False True True False False \n", "2332 False True True False False \n", "2333 False True True False False \n", "2334 False True True False False \n", "2335 False True True False False \n", "2336 False True True False False \n", "2337 False True True False False \n", "2338 False True True False False \n", "2339 False True True False False \n", "2340 False True True False False \n", "2341 False True True False False \n", "2342 False True True False False \n", "2343 False True True False False \n", "2344 False True True False False \n", "2345 False True True False False \n", "2346 False True True False False \n", "2347 False True True False False \n", "2348 False True True False False \n", "2349 False True True False False \n", "2350 False True True False False \n", "2351 False True True False False \n", "2352 False True True False False \n", "2353 False True True False False \n", "2354 False True True False False \n", "2355 False True True False False \n", "\n", " text retweeted_status_id retweeted_status_user_id \\\n", "0 False True True \n", "1 False True True \n", "2 False True True \n", "3 False True True \n", "4 False True True \n", "5 False True True \n", "6 False True True \n", "7 False True True \n", "8 False True True \n", "9 False True True \n", "10 False True True \n", "11 False True True \n", "12 False True True \n", "13 False True True \n", "14 False True True \n", "15 False True True \n", "16 False True True \n", "17 False True True \n", "18 False True True \n", "19 False False False \n", "20 False True True \n", "21 False True True \n", "22 False True True \n", "23 False True True \n", "24 False True True \n", "25 False True True \n", "26 False True True \n", "27 False True True \n", "28 False True True \n", "29 False True True \n", "... ... ... ... 
\n", "2326 False True True \n", "2327 False True True \n", "2328 False True True \n", "2329 False True True \n", "2330 False True True \n", "2331 False True True \n", "2332 False True True \n", "2333 False True True \n", "2334 False True True \n", "2335 False True True \n", "2336 False True True \n", "2337 False True True \n", "2338 False True True \n", "2339 False True True \n", "2340 False True True \n", "2341 False True True \n", "2342 False True True \n", "2343 False True True \n", "2344 False True True \n", "2345 False True True \n", "2346 False True True \n", "2347 False True True \n", "2348 False True True \n", "2349 False True True \n", "2350 False True True \n", "2351 False True True \n", "2352 False True True \n", "2353 False True True \n", "2354 False True True \n", "2355 False True True \n", "\n", " retweeted_status_timestamp expanded_urls rating_numerator \\\n", "0 True False False \n", "1 True False False \n", "2 True False False \n", "3 True False False \n", "4 True False False \n", "5 True False False \n", "6 True False False \n", "7 True False False \n", "8 True False False \n", "9 True False False \n", "10 True False False \n", "11 True False False \n", "12 True False False \n", "13 True False False \n", "14 True False False \n", "15 True False False \n", "16 True False False \n", "17 True False False \n", "18 True False False \n", "19 False False False \n", "20 True False False \n", "21 True False False \n", "22 True False False \n", "23 True False False \n", "24 True False False \n", "25 True False False \n", "26 True False False \n", "27 True False False \n", "28 True False False \n", "29 True False False \n", "... ... ... ... 
\n", "2326 True False False \n", "2327 True False False \n", "2328 True False False \n", "2329 True False False \n", "2330 True False False \n", "2331 True False False \n", "2332 True False False \n", "2333 True False False \n", "2334 True False False \n", "2335 True False False \n", "2336 True False False \n", "2337 True False False \n", "2338 True False False \n", "2339 True False False \n", "2340 True False False \n", "2341 True False False \n", "2342 True False False \n", "2343 True False False \n", "2344 True False False \n", "2345 True False False \n", "2346 True False False \n", "2347 True False False \n", "2348 True False False \n", "2349 True False False \n", "2350 True False False \n", "2351 True False False \n", "2352 True False False \n", "2353 True False False \n", "2354 True False False \n", "2355 True False False \n", "\n", " rating_denominator name doggo floofer pupper puppo \n", "0 False False False False False False \n", "1 False False False False False False \n", "2 False False False False False False \n", "3 False False False False False False \n", "4 False False False False False False \n", "5 False False False False False False \n", "6 False False False False False False \n", "7 False False False False False False \n", "8 False False False False False False \n", "9 False False False False False False \n", "10 False False False False False False \n", "11 False False False False False False \n", "12 False False False False False False \n", "13 False False False False False False \n", "14 False False False False False False \n", "15 False False False False False False \n", "16 False False False False False False \n", "17 False False False False False False \n", "18 False False False False False False \n", "19 False False False False False False \n", "20 False False False False False False \n", "21 False False False False False False \n", "22 False False False False False False \n", "23 False False False False False False \n", "24 False False 
False False False False \n", "25 False False False False False False \n", "26 False False False False False False \n", "27 False False False False False False \n", "28 False False False False False False \n", "29 False False False False False False \n", "... ... ... ... ... ... ... \n", "2326 False False False False False False \n", "2327 False False False False False False \n", "2328 False False False False False False \n", "2329 False False False False False False \n", "2330 False False False False False False \n", "2331 False False False False False False \n", "2332 False False False False False False \n", "2333 False False False False False False \n", "2334 False False False False False False \n", "2335 False False False False False False \n", "2336 False False False False False False \n", "2337 False False False False False False \n", "2338 False False False False False False \n", "2339 False False False False False False \n", "2340 False False False False False False \n", "2341 False False False False False False \n", "2342 False False False False False False \n", "2343 False False False False False False \n", "2344 False False False False False False \n", "2345 False False False False False False \n", "2346 False False False False False False \n", "2347 False False False False False False \n", "2348 False False False False False False \n", "2349 False False False False False False \n", "2350 False False False False False False \n", "2351 False False False False False False \n", "2352 False False False False False False \n", "2353 False False False False False False \n", "2354 False False False False False False \n", "2355 False False False False False False \n", "\n", "[2356 rows x 17 columns]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tweet_archive.isnull()" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 Phineas\n", "1 Tilly\n", "2 Archie\n", 
"3 Darla\n", "4 Franklin\n", "5 None\n", "6 Jax\n", "7 None\n", "8 Zoey\n", "9 Cassie\n", "10 Koda\n", "11 Bruno\n", "12 None\n", "13 Ted\n", "14 Stuart\n", "15 Oliver\n", "16 Jim\n", "17 Zeke\n", "18 Ralphus\n", "19 Canela\n", "20 Gerald\n", "21 Jeffrey\n", "22 such\n", "23 Canela\n", "24 None\n", "25 None\n", "26 Maya\n", "27 Mingus\n", "28 Derek\n", "29 Roscoe\n", " ... \n", "2326 quite\n", "2327 a\n", "2328 None\n", "2329 None\n", "2330 None\n", "2331 None\n", "2332 None\n", "2333 an\n", "2334 a\n", "2335 an\n", "2336 None\n", "2337 None\n", "2338 None\n", "2339 None\n", "2340 None\n", "2341 None\n", "2342 None\n", "2343 None\n", "2344 None\n", "2345 the\n", "2346 the\n", "2347 a\n", "2348 a\n", "2349 an\n", "2350 a\n", "2351 None\n", "2352 a\n", "2353 a\n", "2354 a\n", "2355 None\n", "Name: name, Length: 2356, dtype: object" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tweet_archive.name" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "None 745\n", "a 55\n", "Charlie 12\n", "Lucy 11\n", "Oliver 11\n", "Cooper 11\n", "Penny 10\n", "Tucker 10\n", "Lola 10\n", "Winston 9\n", "Bo 9\n", "the 8\n", "Sadie 8\n", "an 7\n", "Bailey 7\n", "Buddy 7\n", "Daisy 7\n", "Toby 7\n", "Scout 6\n", "Dave 6\n", "Jack 6\n", "Oscar 6\n", "Bella 6\n", "Koda 6\n", "Jax 6\n", "Rusty 6\n", "Milo 6\n", "Leo 6\n", "Stanley 6\n", "Chester 5\n", " ... 
\n", "Thor 1\n", "Rueben 1\n", "by 1\n", "Jeremy 1\n", "Bobble 1\n", "Liam 1\n", "Stella 1\n", "General 1\n", "Cheryl 1\n", "Lilli 1\n", "Travis 1\n", "Berkeley 1\n", "Banditt 1\n", "Ralphie 1\n", "Dixie 1\n", "Grizzwald 1\n", "Strudel 1\n", "Orion 1\n", "Kona 1\n", "Lupe 1\n", "Ace 1\n", "Zuzu 1\n", "Chloe 1\n", "Remy 1\n", "Rufio 1\n", "Vinscent 1\n", "Sprinkles 1\n", "Eevee 1\n", "Pancake 1\n", "Mosby 1\n", "Name: name, Length: 957, dtype: int64" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tweet_archive.name.value_counts()" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['a', 'actually', 'all', 'an', 'by', 'getting', 'his', 'incredibly', 'infuriating', 'just', 'life', 'light', 'mad', 'my', 'not', 'officially', 'old', 'one', 'quite', 'space', 'such', 'the', 'this', 'unacceptable', 'very']\n" ] } ], "source": [ "# Collect 'name' entries that start with a lowercase letter (ordinary words, not dog names)\n", "words = []\n", "for n in tweet_archive.name:\n", "    if n[0].islower():\n", "        words.append(n)\n", "\n", "other_words = list(np.unique(words))\n", "print(other_words)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "None 2099\n", "pupper 257\n", "Name: pupper, dtype: int64" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tweet_archive.pupper.value_counts()" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "None 2259\n", "doggo 97\n", "Name: doggo, dtype: int64" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tweet_archive.doggo.value_counts()" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sum(tweet_archive.duplicated())" ] }, { "cell_type": "markdown",
"metadata": {}, "source": [ "#### 2. image_df dataframe" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr style=\"text-align: right;\"><th></th><th>tweet_id</th><th>jpg_url</th><th>img_num</th><th>p1</th><th>p1_conf</th><th>p1_dog</th><th>p2</th><th>p2_conf</th><th>p2_dog</th><th>p3</th><th>p3_conf</th><th>p3_dog</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>666020888022790149</td><td>https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg</td><td>1</td><td>Welsh_springer_spaniel</td><td>0.465074</td><td>True</td><td>collie</td><td>0.156665</td><td>True</td><td>Shetland_sheepdog</td><td>0.061428</td><td>True</td></tr>\n",
"<tr><th>1</th><td>666029285002620928</td><td>https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg</td><td>1</td><td>redbone</td><td>0.506826</td><td>True</td><td>miniature_pinscher</td><td>0.074192</td><td>True</td><td>Rhodesian_ridgeback</td><td>0.072010</td><td>True</td></tr>\n",
"<tr><th>2</th><td>666033412701032449</td><td>https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg</td><td>1</td><td>German_shepherd</td><td>0.596461</td><td>True</td><td>malinois</td><td>0.138584</td><td>True</td><td>bloodhound</td><td>0.116197</td><td>True</td></tr>\n",
"<tr><th>3</th><td>666044226329800704</td><td>https://pbs.twimg.com/media/CT5Dr8HUEAA-lEu.jpg</td><td>1</td><td>Rhodesian_ridgeback</td><td>0.408143</td><td>True</td><td>redbone</td><td>0.360687</td><td>True</td><td>miniature_pinscher</td><td>0.222752</td><td>True</td></tr>\n",
"<tr><th>4</th><td>666049248165822465</td><td>https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg</td><td>1</td><td>miniature_pinscher</td><td>0.560311</td><td>True</td><td>Rottweiler</td><td>0.243682</td><td>True</td><td>Doberman</td><td>0.154629</td><td>True</td></tr>\n",
"<tr><th>5</th><td>666050758794694657</td><td>https://pbs.twimg.com/media/CT5Jof1WUAEuVxN.jpg</td><td>1</td><td>Bernese_mountain_dog</td><td>0.651137</td><td>True</td><td>English_springer</td><td>0.263788</td><td>True</td><td>Greater_Swiss_Mountain_dog</td><td>0.016199</td><td>True</td></tr>\n",
"<tr><th>6</th><td>666051853826850816</td><td>https://pbs.twimg.com/media/CT5KoJ1WoAAJash.jpg</td><td>1</td><td>box_turtle</td><td>0.933012</td><td>False</td><td>mud_turtle</td><td>0.045885</td><td>False</td><td>terrapin</td><td>0.017885</td><td>False</td></tr>\n",
"<tr><th>7</th><td>666055525042405380</td><td>https://pbs.twimg.com/media/CT5N9tpXIAAifs1.jpg</td><td>1</td><td>chow</td><td>0.692517</td><td>True</td><td>Tibetan_mastiff</td><td>0.058279</td><td>True</td><td>fur_coat</td><td>0.054449</td><td>False</td></tr>\n",
"<tr><th>8</th><td>666057090499244032</td><td>https://pbs.twimg.com/media/CT5PY90WoAAQGLo.jpg</td><td>1</td><td>shopping_cart</td><td>0.962465</td><td>False</td><td>shopping_basket</td><td>0.014594</td><td>False</td><td>golden_retriever</td><td>0.007959</td><td>True</td></tr>\n",
"<tr><th>9</th><td>666058600524156928</td><td>https://pbs.twimg.com/media/CT5Qw94XAAA_2dP.jpg</td><td>1</td><td>miniature_poodle</td><td>0.201493</td><td>True</td><td>komondor</td><td>0.192305</td><td>True</td><td>soft-coated_wheaten_terrier</td><td>0.082086</td><td>True</td></tr>\n",
"<tr><th>10</th><td>666063827256086533</td><td>https://pbs.twimg.com/media/CT5Vg_wXIAAXfnj.jpg</td><td>1</td><td>golden_retriever</td><td>0.775930</td><td>True</td><td>Tibetan_mastiff</td><td>0.093718</td><td>True</td><td>Labrador_retriever</td><td>0.072427</td><td>True</td></tr>\n",
"<tr><th>11</th><td>666071193221509120</td><td>https://pbs.twimg.com/media/CT5cN_3WEAAlOoZ.jpg</td><td>1</td><td>Gordon_setter</td><td>0.503672</td><td>True</td><td>Yorkshire_terrier</td><td>0.174201</td><td>True</td><td>Pekinese</td><td>0.109454</td><td>True</td></tr>\n",
"<tr><th>12</th><td>666073100786774016</td><td>https://pbs.twimg.com/media/CT5d9DZXAAALcwe.jpg</td><td>1</td><td>Walker_hound</td><td>0.260857</td><td>True</td><td>English_foxhound</td><td>0.175382</td><td>True</td><td>Ibizan_hound</td><td>0.097471</td><td>True</td></tr>\n",
"<tr><th>13</th><td>666082916733198337</td><td>https://pbs.twimg.com/media/CT5m4VGWEAAtKc8.jpg</td><td>1</td><td>pug</td><td>0.489814</td><td>True</td><td>bull_mastiff</td><td>0.404722</td><td>True</td><td>French_bulldog</td><td>0.048960</td><td>True</td></tr>\n",
"<tr><th>14</th><td>666094000022159362</td><td>https://pbs.twimg.com/media/CT5w9gUW4AAsBNN.jpg</td><td>1</td><td>bloodhound</td><td>0.195217</td><td>True</td><td>German_shepherd</td><td>0.078260</td><td>True</td><td>malinois</td><td>0.075628</td><td>True</td></tr>\n",
"<tr><th>15</th><td>666099513787052032</td><td>https://pbs.twimg.com/media/CT51-JJUEAA6hV8.jpg</td><td>1</td><td>Lhasa</td><td>0.582330</td><td>True</td><td>Shih-Tzu</td><td>0.166192</td><td>True</td><td>Dandie_Dinmont</td><td>0.089688</td><td>True</td></tr>\n",
"<tr><th>16</th><td>666102155909144576</td><td>https://pbs.twimg.com/media/CT54YGiWUAEZnoK.jpg</td><td>1</td><td>English_setter</td><td>0.298617</td><td>True</td><td>Newfoundland</td><td>0.149842</td><td>True</td><td>borzoi</td><td>0.133649</td><td>True</td></tr>\n",
"<tr><th>17</th><td>666104133288665088</td><td>https://pbs.twimg.com/media/CT56LSZWoAAlJj2.jpg</td><td>1</td><td>hen</td><td>0.965932</td><td>False</td><td>cock</td><td>0.033919</td><td>False</td><td>partridge</td><td>0.000052</td><td>False</td></tr>\n",
"<tr><th>18</th><td>666268910803644416</td><td>https://pbs.twimg.com/media/CT8QCd1WEAADXws.jpg</td><td>1</td><td>desktop_computer</td><td>0.086502</td><td>False</td><td>desk</td><td>0.085547</td><td>False</td><td>bookcase</td><td>0.079480</td><td>False</td></tr>\n",
"<tr><th>19</th><td>666273097616637952</td><td>https://pbs.twimg.com/media/CT8T1mtUwAA3aqm.jpg</td><td>1</td><td>Italian_greyhound</td><td>0.176053</td><td>True</td><td>toy_terrier</td><td>0.111884</td><td>True</td><td>basenji</td><td>0.111152</td><td>True</td></tr>\n",
"<tr><th>20</th><td>666287406224695296</td><td>https://pbs.twimg.com/media/CT8g3BpUEAAuFjg.jpg</td><td>1</td><td>Maltese_dog</td><td>0.857531</td><td>True</td><td>toy_poodle</td><td>0.063064</td><td>True</td><td>miniature_poodle</td><td>0.025581</td><td>True</td></tr>\n",
"<tr><th>21</th><td>666293911632134144</td><td>https://pbs.twimg.com/media/CT8mx7KW4AEQu8N.jpg</td><td>1</td><td>three-toed_sloth</td><td>0.914671</td><td>False</td><td>otter</td><td>0.015250</td><td>False</td><td>great_grey_owl</td><td>0.013207</td><td>False</td></tr>\n",
"<tr><th>22</th><td>666337882303524864</td><td>https://pbs.twimg.com/media/CT9OwFIWEAMuRje.jpg</td><td>1</td><td>ox</td><td>0.416669</td><td>False</td><td>Newfoundland</td><td>0.278407</td><td>True</td><td>groenendael</td><td>0.102643</td><td>True</td></tr>\n",
"<tr><th>23</th><td>666345417576210432</td><td>https://pbs.twimg.com/media/CT9Vn7PWoAA_ZCM.jpg</td><td>1</td><td>golden_retriever</td><td>0.858744</td><td>True</td><td>Chesapeake_Bay_retriever</td><td>0.054787</td><td>True</td><td>Labrador_retriever</td><td>0.014241</td><td>True</td></tr>\n",
"<tr><th>24</th><td>666353288456101888</td><td>https://pbs.twimg.com/media/CT9cx0tUEAAhNN_.jpg</td><td>1</td><td>malamute</td><td>0.336874</td><td>True</td><td>Siberian_husky</td><td>0.147655</td><td>True</td><td>Eskimo_dog</td><td>0.093412</td><td>True</td></tr>\n",
"<tr><th>25</th><td>666362758909284353</td><td>https://pbs.twimg.com/media/CT9lXGsUcAAyUFt.jpg</td><td>1</td><td>guinea_pig</td><td>0.996496</td><td>False</td><td>skunk</td><td>0.002402</td><td>False</td><td>hamster</td><td>0.000461</td><td>False</td></tr>\n",
"<tr><th>26</th><td>666373753744588802</td><td>https://pbs.twimg.com/media/CT9vZEYWUAAlZ05.jpg</td><td>1</td><td>soft-coated_wheaten_terrier</td><td>0.326467</td><td>True</td><td>Afghan_hound</td><td>0.259551</td><td>True</td><td>briard</td><td>0.206803</td><td>True</td></tr>\n",
"<tr><th>27</th><td>666396247373291520</td><td>https://pbs.twimg.com/media/CT-D2ZHWIAA3gK1.jpg</td><td>1</td><td>Chihuahua</td><td>0.978108</td><td>True</td><td>toy_terrier</td><td>0.009397</td><td>True</td><td>papillon</td><td>0.004577</td><td>True</td></tr>\n",
"<tr><th>28</th><td>666407126856765440</td><td>https://pbs.twimg.com/media/CT-NvwmW4AAugGZ.jpg</td><td>1</td><td>black-and-tan_coonhound</td><td>0.529139</td><td>True</td><td>bloodhound</td><td>0.244220</td><td>True</td><td>flat-coated_retriever</td><td>0.173810</td><td>True</td></tr>\n",
"<tr><th>29</th><td>666411507551481857</td><td>https://pbs.twimg.com/media/CT-RugiWIAELEaq.jpg</td><td>1</td><td>coho</td><td>0.404640</td><td>False</td><td>barracouta</td><td>0.271485</td><td>False</td><td>gar</td><td>0.189945</td><td>False</td></tr>\n",
"<tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"<tr><th>2045</th><td>886366144734445568</td><td>https://pbs.twimg.com/media/DE0BTnQUwAApKEH.jpg</td><td>1</td><td>French_bulldog</td><td>0.999201</td><td>True</td><td>Chihuahua</td><td>0.000361</td><td>True</td><td>Boston_bull</td><td>0.000076</td><td>True</td></tr>\n",
"<tr><th>2046</th><td>886680336477933568</td><td>https://pbs.twimg.com/media/DE4fEDzWAAAyHMM.jpg</td><td>1</td><td>convertible</td><td>0.738995</td><td>False</td><td>sports_car</td><td>0.139952</td><td>False</td><td>car_wheel</td><td>0.044173</td><td>False</td></tr>\n",
"<tr><th>2047</th><td>886736880519319552</td><td>https://pbs.twimg.com/media/DE5Se8FXcAAJFx4.jpg</td><td>1</td><td>kuvasz</td><td>0.309706</td><td>True</td><td>Great_Pyrenees</td><td>0.186136</td><td>True</td><td>Dandie_Dinmont</td><td>0.086346</td><td>True</td></tr>\n",
"<tr><th>2048</th><td>886983233522544640</td><td>https://pbs.twimg.com/media/DE8yicJW0AAAvBJ.jpg</td><td>2</td><td>Chihuahua</td><td>0.793469</td><td>True</td><td>toy_terrier</td><td>0.143528</td><td>True</td><td>can_opener</td><td>0.032253</td><td>False</td></tr>\n",
"<tr><th>2049</th><td>887101392804085760</td><td>https://pbs.twimg.com/media/DE-eAq6UwAA-jaE.jpg</td><td>1</td><td>Samoyed</td><td>0.733942</td><td>True</td><td>Eskimo_dog</td><td>0.035029</td><td>True</td><td>Staffordshire_bullterrier</td><td>0.029705</td><td>True</td></tr>\n",
"<tr><th>2050</th><td>887343217045368832</td><td>https://pbs.twimg.com/ext_tw_video_thumb/88734...</td><td>1</td><td>Mexican_hairless</td><td>0.330741</td><td>True</td><td>sea_lion</td><td>0.275645</td><td>False</td><td>Weimaraner</td><td>0.134203</td><td>True</td></tr>\n",
"<tr><th>2051</th><td>887473957103951883</td><td>https://pbs.twimg.com/media/DFDw2tyUQAAAFke.jpg</td><td>2</td><td>Pembroke</td><td>0.809197</td><td>True</td><td>Rhodesian_ridgeback</td><td>0.054950</td><td>True</td><td>beagle</td><td>0.038915</td><td>True</td></tr>\n",
"<tr><th>2052</th><td>887517139158093824</td><td>https://pbs.twimg.com/ext_tw_video_thumb/88751...</td><td>1</td><td>limousine</td><td>0.130432</td><td>False</td><td>tow_truck</td><td>0.029175</td><td>False</td><td>shopping_cart</td><td>0.026321</td><td>False</td></tr>\n",
"<tr><th>2053</th><td>887705289381826560</td><td>https://pbs.twimg.com/media/DFHDQBbXgAEqY7t.jpg</td><td>1</td><td>basset</td><td>0.821664</td><td>True</td><td>redbone</td><td>0.087582</td><td>True</td><td>Weimaraner</td><td>0.026236</td><td>True</td></tr>\n",
"<tr><th>2054</th><td>888078434458587136</td><td>https://pbs.twimg.com/media/DFMWn56WsAAkA7B.jpg</td><td>1</td><td>French_bulldog</td><td>0.995026</td><td>True</td><td>pug</td><td>0.000932</td><td>True</td><td>bull_mastiff</td><td>0.000903</td><td>True</td></tr>\n",
"<tr><th>2055</th><td>888202515573088257</td><td>https://pbs.twimg.com/media/DFDw2tyUQAAAFke.jpg</td><td>2</td><td>Pembroke</td><td>0.809197</td><td>True</td><td>Rhodesian_ridgeback</td><td>0.054950</td><td>True</td><td>beagle</td><td>0.038915</td><td>True</td></tr>\n",
"<tr><th>2056</th><td>888554962724278272</td><td>https://pbs.twimg.com/media/DFTH_O-UQAACu20.jpg</td><td>3</td><td>Siberian_husky</td><td>0.700377</td><td>True</td><td>Eskimo_dog</td><td>0.166511</td><td>True</td><td>malamute</td><td>0.111411</td><td>True</td></tr>\n",
"<tr><th>2057</th><td>888804989199671297</td><td>https://pbs.twimg.com/media/DFWra-3VYAA2piG.jpg</td><td>1</td><td>golden_retriever</td><td>0.469760</td><td>True</td><td>Labrador_retriever</td><td>0.184172</td><td>True</td><td>English_setter</td><td>0.073482</td><td>True</td></tr>\n",
"<tr><th>2058</th><td>888917238123831296</td><td>https://pbs.twimg.com/media/DFYRgsOUQAARGhO.jpg</td><td>1</td><td>golden_retriever</td><td>0.714719</td><td>True</td><td>Tibetan_mastiff</td><td>0.120184</td><td>True</td><td>Labrador_retriever</td><td>0.105506</td><td>True</td></tr>\n",
"<tr><th>2059</th><td>889278841981685760</td><td>https://pbs.twimg.com/ext_tw_video_thumb/88927...</td><td>1</td><td>whippet</td><td>0.626152</td><td>True</td><td>borzoi</td><td>0.194742</td><td>True</td><td>Saluki</td><td>0.027351</td><td>True</td></tr>\n",
"<tr><th>2060</th><td>889531135344209921</td><td>https://pbs.twimg.com/media/DFg_2PVW0AEHN3p.jpg</td><td>1</td><td>golden_retriever</td><td>0.953442</td><td>True</td><td>Labrador_retriever</td><td>0.013834</td><td>True</td><td>redbone</td><td>0.007958</td><td>True</td></tr>\n",
"<tr><th>2061</th><td>889638837579907072</td><td>https://pbs.twimg.com/media/DFihzFfXsAYGDPR.jpg</td><td>1</td><td>French_bulldog</td><td>0.991650</td><td>True</td><td>boxer</td><td>0.002129</td><td>True</td><td>Staffordshire_bullterrier</td><td>0.001498</td><td>True</td></tr>\n",
"<tr><th>2062</th><td>889665388333682689</td><td>https://pbs.twimg.com/media/DFi579UWsAAatzw.jpg</td><td>1</td><td>Pembroke</td><td>0.966327</td><td>True</td><td>Cardigan</td><td>0.027356</td><td>True</td><td>basenji</td><td>0.004633</td><td>True</td></tr>\n",
"<tr><th>2063</th><td>889880896479866881</td><td>https://pbs.twimg.com/media/DFl99B1WsAITKsg.jpg</td><td>1</td><td>French_bulldog</td><td>0.377417</td><td>True</td><td>Labrador_retriever</td><td>0.151317</td><td>True</td><td>muzzle</td><td>0.082981</td><td>False</td></tr>\n",
"<tr><th>2064</th><td>890006608113172480</td><td>https://pbs.twimg.com/media/DFnwSY4WAAAMliS.jpg</td><td>1</td><td>Samoyed</td><td>0.957979</td><td>True</td><td>Pomeranian</td><td>0.013884</td><td>True</td><td>chow</td><td>0.008167</td><td>True</td></tr>\n",
"<tr><th>2065</th><td>890240255349198849</td><td>https://pbs.twimg.com/media/DFrEyVuW0AAO3t9.jpg</td><td>1</td><td>Pembroke</td><td>0.511319</td><td>True</td><td>Cardigan</td><td>0.451038</td><td>True</td><td>Chihuahua</td><td>0.029248</td><td>True</td></tr>\n",
"<tr><th>2066</th><td>890609185150312448</td><td>https://pbs.twimg.com/media/DFwUU__XcAEpyXI.jpg</td><td>1</td><td>Irish_terrier</td><td>0.487574</td><td>True</td><td>Irish_setter</td><td>0.193054</td><td>True</td><td>Chesapeake_Bay_retriever</td><td>0.118184</td><td>True</td></tr>\n",
"<tr><th>2067</th><td>890729181411237888</td><td>https://pbs.twimg.com/media/DFyBahAVwAAhUTd.jpg</td><td>2</td><td>Pomeranian</td><td>0.566142</td><td>True</td><td>Eskimo_dog</td><td>0.178406</td><td>True</td><td>Pembroke</td><td>0.076507</td><td>True</td></tr>\n",
"<tr><th>2068</th><td>890971913173991426</td><td>https://pbs.twimg.com/media/DF1eOmZXUAALUcq.jpg</td><td>1</td><td>Appenzeller</td><td>0.341703</td><td>True</td><td>Border_collie</td><td>0.199287</td><td>True</td><td>ice_lolly</td><td>0.193548</td><td>False</td></tr>\n",
"<tr><th>2069</th><td>891087950875897856</td><td>https://pbs.twimg.com/media/DF3HwyEWsAABqE6.jpg</td><td>1</td><td>Chesapeake_Bay_retriever</td><td>0.425595</td><td>True</td><td>Irish_terrier</td><td>0.116317</td><td>True</td><td>Indian_elephant</td><td>0.076902</td><td>False</td></tr>\n",
"<tr><th>2070</th><td>891327558926688256</td><td>https://pbs.twimg.com/media/DF6hr6BUMAAzZgT.jpg</td><td>2</td><td>basset</td><td>0.555712</td><td>True</td><td>English_springer</td><td>0.225770</td><td>True</td><td>German_short-haired_pointer</td><td>0.175219</td><td>True</td></tr>\n",
"<tr><th>2071</th><td>891689557279858688</td><td>https://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg</td><td>1</td><td>paper_towel</td><td>0.170278</td><td>False</td><td>Labrador_retriever</td><td>0.168086</td><td>True</td><td>spatula</td><td>0.040836</td><td>False</td></tr>\n",
"<tr><th>2072</th><td>891815181378084864</td><td>https://pbs.twimg.com/media/DGBdLU1WsAANxJ9.jpg</td><td>1</td><td>Chihuahua</td><td>0.716012</td><td>True</td><td>malamute</td><td>0.078253</td><td>True</td><td>kelpie</td><td>0.031379</td><td>True</td></tr>\n",
"<tr><th>2073</th><td>892177421306343426</td><td>https://pbs.twimg.com/media/DGGmoV4XsAAUL6n.jpg</td><td>1</td><td>Chihuahua</td><td>0.323581</td><td>True</td><td>Pekinese</td><td>0.090647</td><td>True</td><td>papillon</td><td>0.068957</td><td>True</td></tr>\n",
"<tr><th>2074</th><td>892420643555336193</td><td>https://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg</td><td>1</td><td>orange</td><td>0.097049</td><td>False</td><td>bagel</td><td>0.085851</td><td>False</td><td>banana</td><td>0.076110</td><td>False</td></tr>\n",
"</tbody>\n",
"</table>\n",
"<p>2075 rows × 12 columns</p>\n",
"</div>
" ], "text/plain": [ " tweet_id jpg_url \\\n", "0 666020888022790149 https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg \n", "1 666029285002620928 https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg \n", "2 666033412701032449 https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg \n", "3 666044226329800704 https://pbs.twimg.com/media/CT5Dr8HUEAA-lEu.jpg \n", "4 666049248165822465 https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg \n", "5 666050758794694657 https://pbs.twimg.com/media/CT5Jof1WUAEuVxN.jpg \n", "6 666051853826850816 https://pbs.twimg.com/media/CT5KoJ1WoAAJash.jpg \n", "7 666055525042405380 https://pbs.twimg.com/media/CT5N9tpXIAAifs1.jpg \n", "8 666057090499244032 https://pbs.twimg.com/media/CT5PY90WoAAQGLo.jpg \n", "9 666058600524156928 https://pbs.twimg.com/media/CT5Qw94XAAA_2dP.jpg \n", "10 666063827256086533 https://pbs.twimg.com/media/CT5Vg_wXIAAXfnj.jpg \n", "11 666071193221509120 https://pbs.twimg.com/media/CT5cN_3WEAAlOoZ.jpg \n", "12 666073100786774016 https://pbs.twimg.com/media/CT5d9DZXAAALcwe.jpg \n", "13 666082916733198337 https://pbs.twimg.com/media/CT5m4VGWEAAtKc8.jpg \n", "14 666094000022159362 https://pbs.twimg.com/media/CT5w9gUW4AAsBNN.jpg \n", "15 666099513787052032 https://pbs.twimg.com/media/CT51-JJUEAA6hV8.jpg \n", "16 666102155909144576 https://pbs.twimg.com/media/CT54YGiWUAEZnoK.jpg \n", "17 666104133288665088 https://pbs.twimg.com/media/CT56LSZWoAAlJj2.jpg \n", "18 666268910803644416 https://pbs.twimg.com/media/CT8QCd1WEAADXws.jpg \n", "19 666273097616637952 https://pbs.twimg.com/media/CT8T1mtUwAA3aqm.jpg \n", "20 666287406224695296 https://pbs.twimg.com/media/CT8g3BpUEAAuFjg.jpg \n", "21 666293911632134144 https://pbs.twimg.com/media/CT8mx7KW4AEQu8N.jpg \n", "22 666337882303524864 https://pbs.twimg.com/media/CT9OwFIWEAMuRje.jpg \n", "23 666345417576210432 https://pbs.twimg.com/media/CT9Vn7PWoAA_ZCM.jpg \n", "24 666353288456101888 https://pbs.twimg.com/media/CT9cx0tUEAAhNN_.jpg \n", "25 666362758909284353 
https://pbs.twimg.com/media/CT9lXGsUcAAyUFt.jpg \n", "26 666373753744588802 https://pbs.twimg.com/media/CT9vZEYWUAAlZ05.jpg \n", "27 666396247373291520 https://pbs.twimg.com/media/CT-D2ZHWIAA3gK1.jpg \n", "28 666407126856765440 https://pbs.twimg.com/media/CT-NvwmW4AAugGZ.jpg \n", "29 666411507551481857 https://pbs.twimg.com/media/CT-RugiWIAELEaq.jpg \n", "... ... ... \n", "2045 886366144734445568 https://pbs.twimg.com/media/DE0BTnQUwAApKEH.jpg \n", "2046 886680336477933568 https://pbs.twimg.com/media/DE4fEDzWAAAyHMM.jpg \n", "2047 886736880519319552 https://pbs.twimg.com/media/DE5Se8FXcAAJFx4.jpg \n", "2048 886983233522544640 https://pbs.twimg.com/media/DE8yicJW0AAAvBJ.jpg \n", "2049 887101392804085760 https://pbs.twimg.com/media/DE-eAq6UwAA-jaE.jpg \n", "2050 887343217045368832 https://pbs.twimg.com/ext_tw_video_thumb/88734... \n", "2051 887473957103951883 https://pbs.twimg.com/media/DFDw2tyUQAAAFke.jpg \n", "2052 887517139158093824 https://pbs.twimg.com/ext_tw_video_thumb/88751... \n", "2053 887705289381826560 https://pbs.twimg.com/media/DFHDQBbXgAEqY7t.jpg \n", "2054 888078434458587136 https://pbs.twimg.com/media/DFMWn56WsAAkA7B.jpg \n", "2055 888202515573088257 https://pbs.twimg.com/media/DFDw2tyUQAAAFke.jpg \n", "2056 888554962724278272 https://pbs.twimg.com/media/DFTH_O-UQAACu20.jpg \n", "2057 888804989199671297 https://pbs.twimg.com/media/DFWra-3VYAA2piG.jpg \n", "2058 888917238123831296 https://pbs.twimg.com/media/DFYRgsOUQAARGhO.jpg \n", "2059 889278841981685760 https://pbs.twimg.com/ext_tw_video_thumb/88927... 
\n", "2060 889531135344209921 https://pbs.twimg.com/media/DFg_2PVW0AEHN3p.jpg \n", "2061 889638837579907072 https://pbs.twimg.com/media/DFihzFfXsAYGDPR.jpg \n", "2062 889665388333682689 https://pbs.twimg.com/media/DFi579UWsAAatzw.jpg \n", "2063 889880896479866881 https://pbs.twimg.com/media/DFl99B1WsAITKsg.jpg \n", "2064 890006608113172480 https://pbs.twimg.com/media/DFnwSY4WAAAMliS.jpg \n", "2065 890240255349198849 https://pbs.twimg.com/media/DFrEyVuW0AAO3t9.jpg \n", "2066 890609185150312448 https://pbs.twimg.com/media/DFwUU__XcAEpyXI.jpg \n", "2067 890729181411237888 https://pbs.twimg.com/media/DFyBahAVwAAhUTd.jpg \n", "2068 890971913173991426 https://pbs.twimg.com/media/DF1eOmZXUAALUcq.jpg \n", "2069 891087950875897856 https://pbs.twimg.com/media/DF3HwyEWsAABqE6.jpg \n", "2070 891327558926688256 https://pbs.twimg.com/media/DF6hr6BUMAAzZgT.jpg \n", "2071 891689557279858688 https://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg \n", "2072 891815181378084864 https://pbs.twimg.com/media/DGBdLU1WsAANxJ9.jpg \n", "2073 892177421306343426 https://pbs.twimg.com/media/DGGmoV4XsAAUL6n.jpg \n", "2074 892420643555336193 https://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg \n", "\n", " img_num p1 p1_conf p1_dog \\\n", "0 1 Welsh_springer_spaniel 0.465074 True \n", "1 1 redbone 0.506826 True \n", "2 1 German_shepherd 0.596461 True \n", "3 1 Rhodesian_ridgeback 0.408143 True \n", "4 1 miniature_pinscher 0.560311 True \n", "5 1 Bernese_mountain_dog 0.651137 True \n", "6 1 box_turtle 0.933012 False \n", "7 1 chow 0.692517 True \n", "8 1 shopping_cart 0.962465 False \n", "9 1 miniature_poodle 0.201493 True \n", "10 1 golden_retriever 0.775930 True \n", "11 1 Gordon_setter 0.503672 True \n", "12 1 Walker_hound 0.260857 True \n", "13 1 pug 0.489814 True \n", "14 1 bloodhound 0.195217 True \n", "15 1 Lhasa 0.582330 True \n", "16 1 English_setter 0.298617 True \n", "17 1 hen 0.965932 False \n", "18 1 desktop_computer 0.086502 False \n", "19 1 Italian_greyhound 0.176053 True \n", "20 1 
Maltese_dog 0.857531 True \n", "21 1 three-toed_sloth 0.914671 False \n", "22 1 ox 0.416669 False \n", "23 1 golden_retriever 0.858744 True \n", "24 1 malamute 0.336874 True \n", "25 1 guinea_pig 0.996496 False \n", "26 1 soft-coated_wheaten_terrier 0.326467 True \n", "27 1 Chihuahua 0.978108 True \n", "28 1 black-and-tan_coonhound 0.529139 True \n", "29 1 coho 0.404640 False \n", "... ... ... ... ... \n", "2045 1 French_bulldog 0.999201 True \n", "2046 1 convertible 0.738995 False \n", "2047 1 kuvasz 0.309706 True \n", "2048 2 Chihuahua 0.793469 True \n", "2049 1 Samoyed 0.733942 True \n", "2050 1 Mexican_hairless 0.330741 True \n", "2051 2 Pembroke 0.809197 True \n", "2052 1 limousine 0.130432 False \n", "2053 1 basset 0.821664 True \n", "2054 1 French_bulldog 0.995026 True \n", "2055 2 Pembroke 0.809197 True \n", "2056 3 Siberian_husky 0.700377 True \n", "2057 1 golden_retriever 0.469760 True \n", "2058 1 golden_retriever 0.714719 True \n", "2059 1 whippet 0.626152 True \n", "2060 1 golden_retriever 0.953442 True \n", "2061 1 French_bulldog 0.991650 True \n", "2062 1 Pembroke 0.966327 True \n", "2063 1 French_bulldog 0.377417 True \n", "2064 1 Samoyed 0.957979 True \n", "2065 1 Pembroke 0.511319 True \n", "2066 1 Irish_terrier 0.487574 True \n", "2067 2 Pomeranian 0.566142 True \n", "2068 1 Appenzeller 0.341703 True \n", "2069 1 Chesapeake_Bay_retriever 0.425595 True \n", "2070 2 basset 0.555712 True \n", "2071 1 paper_towel 0.170278 False \n", "2072 1 Chihuahua 0.716012 True \n", "2073 1 Chihuahua 0.323581 True \n", "2074 1 orange 0.097049 False \n", "\n", " p2 p2_conf p2_dog p3 \\\n", "0 collie 0.156665 True Shetland_sheepdog \n", "1 miniature_pinscher 0.074192 True Rhodesian_ridgeback \n", "2 malinois 0.138584 True bloodhound \n", "3 redbone 0.360687 True miniature_pinscher \n", "4 Rottweiler 0.243682 True Doberman \n", "5 English_springer 0.263788 True Greater_Swiss_Mountain_dog \n", "6 mud_turtle 0.045885 False terrapin \n", "7 Tibetan_mastiff 0.058279 True 
fur_coat \n", "8 shopping_basket 0.014594 False golden_retriever \n", "9 komondor 0.192305 True soft-coated_wheaten_terrier \n", "10 Tibetan_mastiff 0.093718 True Labrador_retriever \n", "11 Yorkshire_terrier 0.174201 True Pekinese \n", "12 English_foxhound 0.175382 True Ibizan_hound \n", "13 bull_mastiff 0.404722 True French_bulldog \n", "14 German_shepherd 0.078260 True malinois \n", "15 Shih-Tzu 0.166192 True Dandie_Dinmont \n", "16 Newfoundland 0.149842 True borzoi \n", "17 cock 0.033919 False partridge \n", "18 desk 0.085547 False bookcase \n", "19 toy_terrier 0.111884 True basenji \n", "20 toy_poodle 0.063064 True miniature_poodle \n", "21 otter 0.015250 False great_grey_owl \n", "22 Newfoundland 0.278407 True groenendael \n", "23 Chesapeake_Bay_retriever 0.054787 True Labrador_retriever \n", "24 Siberian_husky 0.147655 True Eskimo_dog \n", "25 skunk 0.002402 False hamster \n", "26 Afghan_hound 0.259551 True briard \n", "27 toy_terrier 0.009397 True papillon \n", "28 bloodhound 0.244220 True flat-coated_retriever \n", "29 barracouta 0.271485 False gar \n", "... ... ... ... ... 
\n", "2045 Chihuahua 0.000361 True Boston_bull \n", "2046 sports_car 0.139952 False car_wheel \n", "2047 Great_Pyrenees 0.186136 True Dandie_Dinmont \n", "2048 toy_terrier 0.143528 True can_opener \n", "2049 Eskimo_dog 0.035029 True Staffordshire_bullterrier \n", "2050 sea_lion 0.275645 False Weimaraner \n", "2051 Rhodesian_ridgeback 0.054950 True beagle \n", "2052 tow_truck 0.029175 False shopping_cart \n", "2053 redbone 0.087582 True Weimaraner \n", "2054 pug 0.000932 True bull_mastiff \n", "2055 Rhodesian_ridgeback 0.054950 True beagle \n", "2056 Eskimo_dog 0.166511 True malamute \n", "2057 Labrador_retriever 0.184172 True English_setter \n", "2058 Tibetan_mastiff 0.120184 True Labrador_retriever \n", "2059 borzoi 0.194742 True Saluki \n", "2060 Labrador_retriever 0.013834 True redbone \n", "2061 boxer 0.002129 True Staffordshire_bullterrier \n", "2062 Cardigan 0.027356 True basenji \n", "2063 Labrador_retriever 0.151317 True muzzle \n", "2064 Pomeranian 0.013884 True chow \n", "2065 Cardigan 0.451038 True Chihuahua \n", "2066 Irish_setter 0.193054 True Chesapeake_Bay_retriever \n", "2067 Eskimo_dog 0.178406 True Pembroke \n", "2068 Border_collie 0.199287 True ice_lolly \n", "2069 Irish_terrier 0.116317 True Indian_elephant \n", "2070 English_springer 0.225770 True German_short-haired_pointer \n", "2071 Labrador_retriever 0.168086 True spatula \n", "2072 malamute 0.078253 True kelpie \n", "2073 Pekinese 0.090647 True papillon \n", "2074 bagel 0.085851 False banana \n", "\n", " p3_conf p3_dog \n", "0 0.061428 True \n", "1 0.072010 True \n", "2 0.116197 True \n", "3 0.222752 True \n", "4 0.154629 True \n", "5 0.016199 True \n", "6 0.017885 False \n", "7 0.054449 False \n", "8 0.007959 True \n", "9 0.082086 True \n", "10 0.072427 True \n", "11 0.109454 True \n", "12 0.097471 True \n", "13 0.048960 True \n", "14 0.075628 True \n", "15 0.089688 True \n", "16 0.133649 True \n", "17 0.000052 False \n", "18 0.079480 False \n", "19 0.111152 True \n", "20 0.025581 True 
\n", "21 0.013207 False \n", "22 0.102643 True \n", "23 0.014241 True \n", "24 0.093412 True \n", "25 0.000461 False \n", "26 0.206803 True \n", "27 0.004577 True \n", "28 0.173810 True \n", "29 0.189945 False \n", "... ... ... \n", "2045 0.000076 True \n", "2046 0.044173 False \n", "2047 0.086346 True \n", "2048 0.032253 False \n", "2049 0.029705 True \n", "2050 0.134203 True \n", "2051 0.038915 True \n", "2052 0.026321 False \n", "2053 0.026236 True \n", "2054 0.000903 True \n", "2055 0.038915 True \n", "2056 0.111411 True \n", "2057 0.073482 True \n", "2058 0.105506 True \n", "2059 0.027351 True \n", "2060 0.007958 True \n", "2061 0.001498 True \n", "2062 0.004633 True \n", "2063 0.082981 False \n", "2064 0.008167 True \n", "2065 0.029248 True \n", "2066 0.118184 True \n", "2067 0.076507 True \n", "2068 0.193548 False \n", "2069 0.076902 False \n", "2070 0.175219 True \n", "2071 0.040836 False \n", "2072 0.031379 True \n", "2073 0.068957 True \n", "2074 0.076110 False \n", "\n", "[2075 rows x 12 columns]" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "image_df" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " tweet_id jpg_url \\\n", "1881 847116187444137987 https://pbs.twimg.com/media/C8GPrNDW4AAkLde.jpg \n", "1837 837366284874571778 https://pbs.twimg.com/media/C57sMJwXMAASBSx.jpg \n", "995 708149363256774660 https://pbs.twimg.com/media/CdPaEkHW8AA-Wom.jpg \n", "481 675362609739206656 https://pbs.twimg.com/media/CV9etctWUAAl5Hp.jpg \n", "1937 860276583193509888 https://pbs.twimg.com/media/C_BQ_NlVwAAgYGD.jpg \n", "\n", " img_num p1 p1_conf p1_dog \\\n", "1881 1 white_wolf 0.128935 False \n", "1837 1 American_Staffordshire_terrier 0.660085 True \n", "995 1 Cardigan 0.350993 True \n", "481 1 Labrador_retriever 0.479008 True \n", "1937 1 lakeside 0.312299 False \n", "\n", " p2 p2_conf p2_dog p3 p3_conf \\\n", "1881 American_Staffordshire_terrier 0.113434 True dingo 0.081231 \n", "1837 Staffordshire_bullterrier 0.334947 True dalmatian 0.002697 \n", "995 basset 0.164555 True toy_terrier 0.080484 \n", "481 ice_bear 0.218289 False kuvasz 0.139911 \n", "1937 dock 0.159842 False canoe 0.070795 \n", "\n", " p3_dog \n", "1881 False \n", "1837 True \n", "995 True \n", "481 True \n", "1937 False " ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "image_df.sample(5)" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sum(image_df.duplicated())" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 2075 entries, 0 to 2074\n", "Data columns (total 12 columns):\n", "tweet_id 2075 non-null int64\n", "jpg_url 2075 non-null object\n", "img_num 2075 non-null int64\n", "p1 2075 non-null object\n", "p1_conf 2075 non-null float64\n", "p1_dog 2075 non-null bool\n", "p2 2075 non-null object\n", "p2_conf 2075 non-null float64\n", "p2_dog 2075 non-null bool\n", "p3 2075 non-null 
object\n", "p3_conf 2075 non-null float64\n", "p3_dog 2075 non-null bool\n", "dtypes: bool(3), float64(3), int64(2), object(4)\n", "memory usage: 152.1+ KB\n" ] } ], "source": [ "image_df.info()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Check for non-original tweets' ids in the image_df dataframe." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "img_list = list(image_df.tweet_id)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Rate limit reached. Sleeping for: 741\n", "Rate limit reached. Sleeping for: 742\n" ] } ], "source": [ "# Takes approximately 30 minutes because of API rate limiting\n", "unoriginal_tweets = []   # replies and retweets\n", "unavailable_tweets = []  # tweets the API could not return\n", "original_tweets = []\n", "# Gather each tweet's JSON data by id\n", "for tweet_id in img_list:\n", "    try:\n", "        img_tweet = api.get_status(tweet_id)._json\n", "    except Exception:\n", "        # e.g. deleted or protected tweets\n", "        unavailable_tweets.append(tweet_id)\n", "        continue\n", "    is_reply = not pd.isna(img_tweet['in_reply_to_status_id'])\n", "    is_retweet = 'retweeted_status' in img_tweet\n", "    if is_reply or is_retweet:\n", "        unoriginal_tweets.append(tweet_id)\n", "    else:\n", "        original_tweets.append(tweet_id)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[667550882905632768, 667550904950915073, 669353438988365824, 671729906628341761, 674754018082705410, 674793399141146624, 674999807681908736, 675349384339542016, 675707330206547968, 675870721063669760, 684225744407494656, 684538444857667585, 692142790915014657, 694356675654983680, 695767669421768709, 703425003149250560, 704871453724954624, 705786532653883392, 711998809858043904, 729838605770891264, 
746818907684614144, 746906459439529985, 752309394570878976, 754874841593970688, 757597904299253760, 757729163776290825, 759159934323924993, 761371037149827077, 761750502866649088, 766078092750233600, 770093767776997377, 771171053431250945, 772615324260794368, 775898661951791106, 776819012571455488, 777641927919427584, 778396591732486144, 780476555013349377, 780496263422808064, 782021823840026624, 783347506784731136, 786036967502913536, 788070120937619456, 790723298204217344, 791026214425268224, 793614319594401792, 794355576146903043, 794983741416415232, 796177847564038144, 798340744599797760, 798628517273620480, 798644042770751489, 798665375516884993, 798673117451325440, 798694562394996736, 798697898615730177, 799774291445383169, 800443802682937345, 802265048156610565, 802624713319034886, 803692223237865472, 804413760345620481, 805958939288408065, 806242860592926720, 807059379405148160, 808134635716833280, 809808892968534016, 813944609378369540, 816014286006976512, 816829038950027264, 817181837579653120, 818588835076603904, 819015331746349057, 819015337530290176, 820446719150292993, 821813639212650496, 822647212903690241, 823269594223824897, 824796380199809024, 829878982036299777, 832040443403784192, 832215726631055365, 832769181346996225, 838916489579200512, 839290600511926273, 841833993020538882, 844979544864018432, 847971574464610304, 856526610513747968, 860924035999428608, 863079547188785154, 867072653475098625, 877611172832227328, 885311592912609280]\n" ] } ], "source": [ "print(unoriginal_tweets)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[680055455951884288, 754011816964026368, 759566828574212096, 759923798737051648, 771004394259247104, 779123168116150273, 802247111496568832, 829374341691346946, 837012587749474308, 837366284874571778, 842892208864923648, 844704788403113984, 851861385021730816, 851953902622658560, 861769973181624320, 872261713294495745, 
873697596434513921, 888202515573088257]\n" ] } ], "source": [ "print(unavailable_tweets)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3. tweet_counts dataframe" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " tweet_id retweet_count favorite_count\n", "0 892420643555336193 6979 33728\n", "1 892177421306343426 5280 29255\n", "2 891815181378084864 3466 21987\n", "3 891689557279858688 7198 36823\n", "4 891327558926688256 7723 35207" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tweet_counts.head()" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 2326 entries, 0 to 2325\n", "Data columns (total 3 columns):\n", "tweet_id 2326 non-null int64\n", "retweet_count 2326 non-null int64\n", "favorite_count 2326 non-null int64\n", "dtypes: int64(3)\n", "memory usage: 54.6 KB\n" ] } ], "source": [ "tweet_counts.info()" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " tweet_id retweet_count favorite_count\n", "count 2.326000e+03 2326.000000 2326.000000\n", "mean 7.417480e+17 2457.867154 7018.926913\n", "std 6.818802e+16 4166.197661 10908.789726\n", "min 6.660209e+17 1.000000 0.000000\n", "25% 6.780814e+17 492.250000 1219.500000\n", "50% 7.178159e+17 1145.500000 3040.500000\n", "75% 7.986402e+17 2843.500000 8562.750000\n", "max 8.924206e+17 70429.000000 144312.000000" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tweet_counts.describe()" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sum(tweet_counts.duplicated())" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "17 tweet_id\n", "20 tweet_id\n", "dtype: object" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "all_columns = pd.Series(list(tweet_archive) + list(tweet_counts) + list(image_df))\n", "all_columns[all_columns.duplicated()]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Quality issues\n", "#### `tweet_archive table`\n", "\n", "1. 'name', 'doggo', 'floofer', 'pupper' and 'puppo' columns have missing values represented as the string 'None'\n", "\n", "2. 'name' column has invalid names (articles, adjectives, adverbs) as values in some entries\n", "\n", "3. 'name', 'doggo', 'floofer', 'pupper' and 'puppo' columns have missing values.\n", "\n", "4. some records have non-null values in the 'retweeted_status_id' column; these are retweets, not original tweets\n", "\n", "5. some records have non-null values in the 'in_reply_to_status_id' column; these are replies, not original tweets\n", "\n", "6. records dating beyond 2017-08-01\n", "\n", "7. 'timestamp' column has the object (string) datatype instead of datetime\n", "\n", "#### `image_df table`\n", "\n", "8. 
records with False in all three dog prediction columns (p1_dog, p2_dog, p3_dog); these are not dogs\n", "\n", "9. non-original tweet records\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "extensions": { "jupyter_dashboards": { "version": 1, "views": { "grid_default": { "col": 0, "height": 7, "hidden": false, "row": 40, "width": 12 }, "report_default": { "hidden": false } } } } }, "source": [ "### Tidiness issues\n", "1. doggo, pupper, puppo and floofer are separate column headers instead of values of a single column (tweet_archive table)\n", "\n", "2. dog breeds are spread out across separate columns (image_df table)\n", "\n", "3. data is in separate tables." ] }, { "cell_type": "markdown", "metadata": { "extensions": { "jupyter_dashboards": { "version": 1, "views": { "grid_default": { "col": 4, "height": 4, "hidden": false, "row": 32, "width": 4 }, "report_default": { "hidden": false } } } } }, "source": [ "
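The image_df issue listed in the assessment (rows where none of the three predictions is a dog) can also be confirmed programmatically. The following is only a minimal sketch: `toy_images` is a hypothetical stand-in for image_df with made-up values, and only the three boolean columns matter.

```python
import pandas as pd

# Hypothetical stand-in for image_df; the values are invented for illustration.
toy_images = pd.DataFrame({
    'tweet_id': [1, 2, 3],
    'p1_dog': [True, False, False],
    'p2_dog': [True, False, True],
    'p3_dog': [False, False, True],
})

# A row is flagged when none of the three predictions is a dog.
no_dog_mask = ~toy_images[['p1_dog', 'p2_dog', 'p3_dog']].any(axis=1)
no_dog_ids = toy_images.loc[no_dog_mask, 'tweet_id'].tolist()
print(no_dog_ids)
```

Applied to the real table, the same mask would give the number of records affected by issue 8 before any cleaning is done.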
\n", "## Cleaning Data" ] }, { "cell_type": "code", "execution_count": 109, "metadata": {}, "outputs": [], "source": [ "# Make copies of original pieces of data\n", "tweetarch_clean = tweet_archive.copy()\n", "img_clean = image_df.copy()\n", "tweetcounts_clean = tweet_counts.copy()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Quality issue #1:\n", "#### tweet_archive table:\n", "#### 'name', 'doggo', 'floofer', 'pupper' and 'puppo' columns have missing values represented as the string 'None'\n", "\n", "### Quality issue #2:\n", "#### tweet_archive table:\n", "#### 'name' column has invalid names (articles, adjectives, adverbs) as values in some entries" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Define: \n", "Use the .replace() function to replace the values identified during assessment with null values. This fixes both issues 1 and 2." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Code:" ] }, { "cell_type": "code", "execution_count": 110, "metadata": {}, "outputs": [], "source": [ "# 'None' is how the archive encodes a missing value\n", "other_words.append('None')\n", "for word in other_words:\n", "    tweetarch_clean.replace(to_replace = word, value = np.nan, inplace = True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Test:" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " tweet_id in_reply_to_status_id in_reply_to_user_id timestamp source \\\n", "623 False True True False False \n", "889 False True True False False \n", "976 False True True False False \n", "113 False False False False False \n", "639 False True True False False \n", "2036 False False False False False \n", "574 False True True False False \n", "1412 False True True False False \n", "1756 False True True False False \n", "1707 False True True False False \n", "\n", " text retweeted_status_id retweeted_status_user_id \\\n", "623 False True True \n", "889 False True True \n", "976 False True True \n", "113 False True True \n", "639 False True True \n", "2036 False True True \n", "574 False False False \n", "1412 False True True \n", "1756 False True True \n", "1707 False True True \n", "\n", " retweeted_status_timestamp expanded_urls rating_numerator \\\n", "623 True False False \n", "889 True False False \n", "976 True False False \n", "113 True True False \n", "639 True False False \n", "2036 True False False \n", "574 False False False \n", "1412 True False False \n", "1756 True False False \n", "1707 True False False \n", "\n", " rating_denominator name doggo floofer pupper puppo \n", "623 False False True True True True \n", "889 False False False True False True \n", "976 False False True True True True \n", "113 False True True True True True \n", "639 False True True True True True \n", "2036 False True True True True True \n", "574 False False False True True True \n", "1412 False False True True True True \n", "1756 False False True True True True \n", "1707 False True True True True True " ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tweetarch_clean.isnull().sample(10)" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Charlie 12\n", "Cooper 11\n", "Oliver 11\n", "Lucy 11\n", "Tucker 10\n", "Penny 10\n", "Lola 10\n", "Bo 9\n", 
"Winston 9\n", "Sadie 8\n", "Bailey 7\n", "Toby 7\n", "Buddy 7\n", "Daisy 7\n", "Stanley 6\n", "Jax 6\n", "Leo 6\n", "Koda 6\n", "Oscar 6\n", "Bella 6\n", "Dave 6\n", "Rusty 6\n", "Milo 6\n", "Scout 6\n", "Jack 6\n", "Louis 5\n", "George 5\n", "Gus 5\n", "Sammy 5\n", "Phil 5\n", " ..\n", "Blipson 1\n", "Kara 1\n", "Cuddles 1\n", "Walker 1\n", "Rorie 1\n", "Lolo 1\n", "Rontu 1\n", "Asher 1\n", "Gerbald 1\n", "Dook 1\n", "Todo 1\n", "Brudge 1\n", "Swagger 1\n", "Rinna 1\n", "Willy 1\n", "Tupawc 1\n", "Ember 1\n", "Bradley 1\n", "Eugene 1\n", "Fido 1\n", "Iggy 1\n", "Pherb 1\n", "Jeb 1\n", "Monty 1\n", "Tess 1\n", "Chloe 1\n", "Chuck 1\n", "Tayzie 1\n", "Aqua 1\n", "Dug 1\n", "Name: name, Length: 931, dtype: int64" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tweetarch_clean.name.value_counts()" ] }, { "cell_type": "code", "execution_count": 111, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "Empty DataFrame\n", "Columns: [tweet_id, in_reply_to_status_id, in_reply_to_user_id, timestamp, source, text, retweeted_status_id, retweeted_status_user_id, retweeted_status_timestamp, expanded_urls, rating_numerator, rating_denominator, name, doggo, floofer, pupper, puppo]\n", "Index: []" ] }, "execution_count": 111, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tweetarch_clean.query(f'name == {other_words}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Quality issue #3:\n", "#### tweet_archive table:\n", "#### 'name', 'doggo', 'floofer', 'pupper' and 'puppo' columns have missing values." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Define: \n", "Search the text column with regular expressions to recover missing dog-stage values wherever the tweet text mentions them." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Code:" ] }, { "cell_type": "code", "execution_count": 112, "metadata": {}, "outputs": [], "source": [ "# Fill null dog-stage values by searching the tweet text\n", "\n", "def fill_dog_stage(stage):\n", "    '''For each null value in the given stage column, search the\n", "    corresponding tweet text for the stage word and, if found, write it\n", "    into that position; otherwise leave the value null.'''\n", "    for i in tweetarch_clean.index:\n", "        if pd.isna(tweetarch_clean.loc[i, stage]):\n", "            matches = re.findall(stage, tweetarch_clean.text[i])\n", "            if matches:\n", "                tweetarch_clean.loc[i, stage] = matches[0]" ] }, { "cell_type": "code", "execution_count": 113, "metadata": {}, "outputs": [], "source": [ "# List the dog stages and pass each one to the function\n", "\n", "stages = ['pupper','puppo','doggo','floofer']\n", "\n", "for stage in stages:\n", "    fill_dog_stage(stage)\n" ] }, { "cell_type": 
"markdown", "metadata": {}, "source": [ "#### Test:" ] }, { "cell_type": "code", "execution_count": 114, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 2356 entries, 0 to 2355\n", "Data columns (total 17 columns):\n", "tweet_id 2356 non-null int64\n", "in_reply_to_status_id 78 non-null float64\n", "in_reply_to_user_id 78 non-null float64\n", "timestamp 2356 non-null object\n", "source 2356 non-null object\n", "text 2356 non-null object\n", "retweeted_status_id 181 non-null float64\n", "retweeted_status_user_id 181 non-null float64\n", "retweeted_status_timestamp 181 non-null object\n", "expanded_urls 2297 non-null object\n", "rating_numerator 2356 non-null int64\n", "rating_denominator 2356 non-null int64\n", "name 1502 non-null object\n", "doggo 107 non-null object\n", "floofer 10 non-null object\n", "pupper 281 non-null object\n", "puppo 38 non-null object\n", "dtypes: float64(4), int64(3), object(10)\n", "memory usage: 313.0+ KB\n" ] } ], "source": [ "tweetarch_clean.info()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Tidiness issue #1: \n", "#### tweet_archive table:\n", "#### doggo, pupper, puppo and floofer are separate column headers instead of values of a single column" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true, "extensions": { "jupyter_dashboards": { "version": 1, "views": { "grid_default": { "hidden": true }, "report_default": { "hidden": true } } } } }, "source": [ "#### Define:\n", "Create a new dog_stage column and loop through the four stage columns to populate it, which unpivots them into a single column. Drop the four stage columns when done." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Code:" ] }, { "cell_type": "code", "execution_count": 115, "metadata": {}, "outputs": [], "source": [ "dog_stages = []  # one of doggo, floofer, pupper, puppo per record, or NaN\n", "indices = list(range(len(tweetarch_clean)))\n", "for i in indices:\n", "    # 'doggo' is checked last so that the other stages take priority\n", "    if not pd.isna(tweetarch_clean.floofer[i]):\n", "        dog_stages.append('floofer')\n", "    elif not pd.isna(tweetarch_clean.puppo[i]):\n", "        dog_stages.append('puppo')\n", "    elif not pd.isna(tweetarch_clean.pupper[i]):\n", "        dog_stages.append('pupper')\n", "    elif not pd.isna(tweetarch_clean.doggo[i]):\n", "        dog_stages.append('doggo')\n", "    else:\n", "        dog_stages.append(np.nan)" ] }, { "cell_type": "code", "execution_count": 116, "metadata": {}, "outputs": [], "source": [ "tweetarch_clean['dog_stage'] = dog_stages" ] }, { "cell_type": "code", "execution_count": 117, "metadata": {}, "outputs": [], "source": [ "tweetarch_clean = tweetarch_clean.drop(['pupper','doggo','puppo','floofer'], axis = 1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Test:" ] }, { "cell_type": "code", "execution_count": 118, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pupper 281\n", "doggo 92\n", "puppo 38\n", "floofer 10\n", "Name: dog_stage, dtype: int64" ] }, "execution_count": 118, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tweetarch_clean.dog_stage.value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that some records had values for two stages at once: doggo & pupper, doggo & puppo or doggo & floofer. To keep a single stage per record, one had to be chosen over the other, hence the lower 'doggo' count (92, down from 107 non-null values). I gave the other stages priority over 'doggo' by checking it last in the if/elif chain, because in my judgement this term is used more loosely than the rest." 
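, "\n", "
The priority rule described in this note can also be expressed without an explicit loop. This is only a sketch under stated assumptions: `toy` is a hypothetical frame with made-up values standing in for the four stage columns, and the row-wise back-fill is a swapped-in technique, not the approach used in the cell above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the four stage columns of tweetarch_clean.
toy = pd.DataFrame({
    'doggo':   ['doggo', 'doggo', np.nan, np.nan, 'doggo'],
    'floofer': [np.nan, np.nan, np.nan, 'floofer', np.nan],
    'pupper':  ['pupper', np.nan, np.nan, np.nan, np.nan],
    'puppo':   [np.nan, np.nan, np.nan, np.nan, 'puppo'],
})

# Columns listed from highest to lowest priority; doggo comes last.
priority = ['floofer', 'puppo', 'pupper', 'doggo']

# Number of stages recorded per row; values above 1 mark the ambiguous records.
stage_count = toy[priority].notna().sum(axis=1)

# Back-filling along each row pulls the first non-null stage into the first column.
toy['dog_stage'] = toy[priority].bfill(axis=1).iloc[:, 0]

print(stage_count.tolist())
print(toy['dog_stage'].tolist())
```

Counting the rows where `stage_count` exceeds 1 shows how many records the priority rule actually decides, which is the source of the difference in the 'doggo' count.
"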
] }, { "cell_type": "code", "execution_count": 119, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['tweet_id', 'in_reply_to_status_id', 'in_reply_to_user_id', 'timestamp',\n", " 'source', 'text', 'retweeted_status_id', 'retweeted_status_user_id',\n", " 'retweeted_status_timestamp', 'expanded_urls', 'rating_numerator',\n", " 'rating_denominator', 'name', 'dog_stage'],\n", " dtype='object')" ] }, "execution_count": 119, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tweetarch_clean.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Quality issue #9:\n", "#### image_df table:\n", "#### non-original tweet records" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Define:\n", "Keep only the original tweets by filtering on the original_tweets id list with a vectorized membership test." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Code:" ] }, { "cell_type": "code", "execution_count": 120, "metadata": {}, "outputs": [], "source": [ "# Keep only the rows whose tweet_id is a known original tweet\n", "img_clean = img_clean[img_clean.tweet_id.isin(original_tweets)].copy()" ] }, { "cell_type": "code", "execution_count": 124, "metadata": {}, "outputs": [], "source": [ "#reset the index\n", "img_clean.reset_index(inplace = True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Test:" ] }, { "cell_type": "code", "execution_count": 125, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 125, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#should evaluate to True\n", "list(img_clean.tweet_id) == original_tweets" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Tidiness issue #2:\n", "#### image_df table:\n", "#### dog breeds are spread out in separate columns\n", "\n", "### Quality issue #8:\n", "#### image_df table:\n", "#### records with 'false' values in all 3 of the dog prediction columns (p1_dog, p2_dog, 
p3_dog); these are not dogs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Define:\n", "Create a new dog_breed column and populate it with the first prediction (p1, then p2, then p3) whose p_dog flag is True, then drop all rows with a null value in the dog_breed column. This also takes care of issue `#8`.\n", "Then drop the columns that will no longer be needed." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Code:" ] }, { "cell_type": "code", "execution_count": 126, "metadata": {}, "outputs": [], "source": [ "breeds = []\n", "indices = list(range(len(img_clean.tweet_id)))\n", "for i in indices:\n", "    # take the highest-ranked prediction that is actually a dog\n", "    if img_clean.p1_dog[i]:\n", "        breeds.append(img_clean.p1[i])\n", "    elif img_clean.p2_dog[i]:\n", "        breeds.append(img_clean.p2[i])\n", "    elif img_clean.p3_dog[i]:\n", "        breeds.append(img_clean.p3[i])\n", "    else:\n", "        breeds.append(np.nan)" ] }, { "cell_type": "code", "execution_count": 127, "metadata": {}, "outputs": [], "source": [ "img_clean['dog_breed'] = breeds" ] }, { "cell_type": "code", "execution_count": 128, "metadata": {}, "outputs": [], "source": [ "img_clean.dropna(subset = ['dog_breed'], inplace = True)" ] }, { "cell_type": "code", "execution_count": 129, "metadata": {}, "outputs": [], "source": [ "img_clean = img_clean.drop([\"img_num\",'p1','p1_conf','p1_dog','p2','p2_conf','p2_dog','p3','p3_conf','p3_dog'],axis = 1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Test:" ] }, { "cell_type": "code", "execution_count": 130, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['index', 'tweet_id', 'jpg_url', 'dog_breed'], dtype='object')" ] }, "execution_count": 130, "metadata": {}, "output_type": "execute_result" } ], "source": [ "img_clean.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Quality issue #4:\n", "#### tweet_archive table:\n", "#### some records have non-null values in the 'retweeted_status_id' column; these are retweets, not original tweets\n", "\n", "\n", "### Quality issue #5:\n", "#### 
tweet_archive table:\n", "#### presence of some records with non-null values in the 'in_reply_to_status_id' column; these are not original tweets" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Define:\n", "Filter out these records using boolean indexing, keeping only the rows where both columns are null. Afterwards, drop the associated columns and any other columns that will not be used further." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Code:" ] }, { "cell_type": "code", "execution_count": 131, "metadata": {}, "outputs": [], "source": [ "# keep only tweets that are not replies\n", "tweetarch_clean = tweetarch_clean.loc[pd.isna(tweetarch_clean['in_reply_to_status_id'])]" ] }, { "cell_type": "code", "execution_count": 132, "metadata": {}, "outputs": [], "source": [ "# keep only tweets that are not retweets\n", "tweetarch_clean = tweetarch_clean.loc[pd.isna(tweetarch_clean['retweeted_status_id'])]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Test:" ] }, { "cell_type": "code", "execution_count": 133, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Int64Index: 2097 entries, 0 to 2355\n", "Data columns (total 14 columns):\n", "tweet_id 2097 non-null int64\n", "in_reply_to_status_id 0 non-null float64\n", "in_reply_to_user_id 0 non-null float64\n", "timestamp 2097 non-null object\n", "source 2097 non-null object\n", "text 2097 non-null object\n", "retweeted_status_id 0 non-null float64\n", "retweeted_status_user_id 0 non-null float64\n", "retweeted_status_timestamp 0 non-null object\n", "expanded_urls 2094 non-null object\n", "rating_numerator 2097 non-null int64\n", "rating_denominator 2097 non-null int64\n", "name 1390 non-null object\n", "dog_stage 372 non-null object\n", "dtypes: float64(4), int64(3), object(7)\n", "memory usage: 245.7+ KB\n" ] } ], "source": [ "tweetarch_clean.info()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Code:" ] }, { "cell_type": "code", "execution_count": 134, "metadata": {}, "outputs": [], "source": [ "tweetarch_clean = 
tweetarch_clean.drop(['in_reply_to_status_id', 'in_reply_to_user_id','retweeted_status_id', 'retweeted_status_user_id','retweeted_status_timestamp','source','expanded_urls',], axis = 1 )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Test:" ] }, { "cell_type": "code", "execution_count": 135, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Int64Index: 2097 entries, 0 to 2355\n", "Data columns (total 7 columns):\n", "tweet_id 2097 non-null int64\n", "timestamp 2097 non-null object\n", "text 2097 non-null object\n", "rating_numerator 2097 non-null int64\n", "rating_denominator 2097 non-null int64\n", "name 1390 non-null object\n", "dog_stage 372 non-null object\n", "dtypes: int64(3), object(4)\n", "memory usage: 131.1+ KB\n" ] } ], "source": [ "tweetarch_clean.info()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Tidiness issue #3: \n", "#### Data is in separate tables" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Define:\n", "Merge the three dataframes on the tweet_id column to make one master dataframe." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Code:" ] }, { "cell_type": "code", "execution_count": 136, "metadata": {}, "outputs": [], "source": [ "data_frames = [img_clean,tweetarch_clean,tweetcounts_clean]\n", "master_df = functools.reduce(lambda left,right: pd.merge(left,right,on=['tweet_id'],\n", " how='left'), data_frames)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Test:" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " tweet_id jpg_url \\\n", "0 666020888022790149 https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg \n", "1 666029285002620928 https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg \n", "2 666033412701032449 https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg \n", "3 666044226329800704 https://pbs.twimg.com/media/CT5Dr8HUEAA-lEu.jpg \n", "4 666049248165822465 https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg \n", "5 666050758794694657 https://pbs.twimg.com/media/CT5Jof1WUAEuVxN.jpg \n", "6 666055525042405380 https://pbs.twimg.com/media/CT5N9tpXIAAifs1.jpg \n", "7 666057090499244032 https://pbs.twimg.com/media/CT5PY90WoAAQGLo.jpg \n", "8 666058600524156928 https://pbs.twimg.com/media/CT5Qw94XAAA_2dP.jpg \n", "9 666063827256086533 https://pbs.twimg.com/media/CT5Vg_wXIAAXfnj.jpg \n", "10 666071193221509120 https://pbs.twimg.com/media/CT5cN_3WEAAlOoZ.jpg \n", "11 666073100786774016 https://pbs.twimg.com/media/CT5d9DZXAAALcwe.jpg \n", "12 666082916733198337 https://pbs.twimg.com/media/CT5m4VGWEAAtKc8.jpg \n", "13 666094000022159362 https://pbs.twimg.com/media/CT5w9gUW4AAsBNN.jpg \n", "14 666099513787052032 https://pbs.twimg.com/media/CT51-JJUEAA6hV8.jpg \n", "15 666102155909144576 https://pbs.twimg.com/media/CT54YGiWUAEZnoK.jpg \n", "16 666273097616637952 https://pbs.twimg.com/media/CT8T1mtUwAA3aqm.jpg \n", "17 666287406224695296 https://pbs.twimg.com/media/CT8g3BpUEAAuFjg.jpg \n", "18 666337882303524864 https://pbs.twimg.com/media/CT9OwFIWEAMuRje.jpg \n", "19 666345417576210432 https://pbs.twimg.com/media/CT9Vn7PWoAA_ZCM.jpg \n", "20 666353288456101888 https://pbs.twimg.com/media/CT9cx0tUEAAhNN_.jpg \n", "21 666373753744588802 https://pbs.twimg.com/media/CT9vZEYWUAAlZ05.jpg \n", "22 666396247373291520 https://pbs.twimg.com/media/CT-D2ZHWIAA3gK1.jpg \n", "23 666407126856765440 https://pbs.twimg.com/media/CT-NvwmW4AAugGZ.jpg \n", "24 666418789513326592 https://pbs.twimg.com/media/CT-YWb7U8AA7QnN.jpg \n", "25 666421158376562688 
https://pbs.twimg.com/media/CT-aggCXAAIMfT3.jpg \n", "26 666428276349472768 https://pbs.twimg.com/media/CT-g-0DUwAEQdSn.jpg \n", "27 666430724426358785 https://pbs.twimg.com/media/CT-jNYqW4AAPi2M.jpg \n", "28 666435652385423360 https://pbs.twimg.com/media/CT-nsTQWEAEkyDn.jpg \n", "29 666437273139982337 https://pbs.twimg.com/media/CT-pKmRWIAAxUWj.jpg \n", "... ... ... \n", "1721 885528943205470208 https://pbs.twimg.com/media/DEoH3yvXgAAzQtS.jpg \n", "1722 885984800019947520 https://pbs.twimg.com/media/DEumeWWV0AA-Z61.jpg \n", "1723 886258384151887873 https://pbs.twimg.com/media/DEyfTG4UMAE4aE9.jpg \n", "1724 886366144734445568 https://pbs.twimg.com/media/DE0BTnQUwAApKEH.jpg \n", "1725 886736880519319552 https://pbs.twimg.com/media/DE5Se8FXcAAJFx4.jpg \n", "1726 886983233522544640 https://pbs.twimg.com/media/DE8yicJW0AAAvBJ.jpg \n", "1727 887101392804085760 https://pbs.twimg.com/media/DE-eAq6UwAA-jaE.jpg \n", "1728 887343217045368832 https://pbs.twimg.com/ext_tw_video_thumb/88734... \n", "1729 887473957103951883 https://pbs.twimg.com/media/DFDw2tyUQAAAFke.jpg \n", "1730 887705289381826560 https://pbs.twimg.com/media/DFHDQBbXgAEqY7t.jpg \n", "1731 888078434458587136 https://pbs.twimg.com/media/DFMWn56WsAAkA7B.jpg \n", "1732 888202515573088257 https://pbs.twimg.com/media/DFDw2tyUQAAAFke.jpg \n", "1733 888554962724278272 https://pbs.twimg.com/media/DFTH_O-UQAACu20.jpg \n", "1734 888804989199671297 https://pbs.twimg.com/media/DFWra-3VYAA2piG.jpg \n", "1735 888917238123831296 https://pbs.twimg.com/media/DFYRgsOUQAARGhO.jpg \n", "1736 889278841981685760 https://pbs.twimg.com/ext_tw_video_thumb/88927... 
\n", "1737 889531135344209921 https://pbs.twimg.com/media/DFg_2PVW0AEHN3p.jpg \n", "1738 889638837579907072 https://pbs.twimg.com/media/DFihzFfXsAYGDPR.jpg \n", "1739 889665388333682689 https://pbs.twimg.com/media/DFi579UWsAAatzw.jpg \n", "1740 889880896479866881 https://pbs.twimg.com/media/DFl99B1WsAITKsg.jpg \n", "1741 890006608113172480 https://pbs.twimg.com/media/DFnwSY4WAAAMliS.jpg \n", "1742 890240255349198849 https://pbs.twimg.com/media/DFrEyVuW0AAO3t9.jpg \n", "1743 890609185150312448 https://pbs.twimg.com/media/DFwUU__XcAEpyXI.jpg \n", "1744 890729181411237888 https://pbs.twimg.com/media/DFyBahAVwAAhUTd.jpg \n", "1745 890971913173991426 https://pbs.twimg.com/media/DF1eOmZXUAALUcq.jpg \n", "1746 891087950875897856 https://pbs.twimg.com/media/DF3HwyEWsAABqE6.jpg \n", "1747 891327558926688256 https://pbs.twimg.com/media/DF6hr6BUMAAzZgT.jpg \n", "1748 891689557279858688 https://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg \n", "1749 891815181378084864 https://pbs.twimg.com/media/DGBdLU1WsAANxJ9.jpg \n", "1750 892177421306343426 https://pbs.twimg.com/media/DGGmoV4XsAAUL6n.jpg \n", "\n", " dog_breed timestamp \\\n", "0 Welsh_springer_spaniel 2015-11-15 22:32:08 +0000 \n", "1 redbone 2015-11-15 23:05:30 +0000 \n", "2 German_shepherd 2015-11-15 23:21:54 +0000 \n", "3 Rhodesian_ridgeback 2015-11-16 00:04:52 +0000 \n", "4 miniature_pinscher 2015-11-16 00:24:50 +0000 \n", "5 Bernese_mountain_dog 2015-11-16 00:30:50 +0000 \n", "6 chow 2015-11-16 00:49:46 +0000 \n", "7 golden_retriever 2015-11-16 00:55:59 +0000 \n", "8 miniature_poodle 2015-11-16 01:01:59 +0000 \n", "9 golden_retriever 2015-11-16 01:22:45 +0000 \n", "10 Gordon_setter 2015-11-16 01:52:02 +0000 \n", "11 Walker_hound 2015-11-16 01:59:36 +0000 \n", "12 pug 2015-11-16 02:38:37 +0000 \n", "13 bloodhound 2015-11-16 03:22:39 +0000 \n", "14 Lhasa 2015-11-16 03:44:34 +0000 \n", "15 English_setter 2015-11-16 03:55:04 +0000 \n", "16 Italian_greyhound 2015-11-16 15:14:19 +0000 \n", "17 Maltese_dog 2015-11-16 16:11:11 
+0000 \n", "18 Newfoundland 2015-11-16 19:31:45 +0000 \n", "19 golden_retriever 2015-11-16 20:01:42 +0000 \n", "20 malamute 2015-11-16 20:32:58 +0000 \n", "21 soft-coated_wheaten_terrier 2015-11-16 21:54:18 +0000 \n", "22 Chihuahua 2015-11-16 23:23:41 +0000 \n", "23 black-and-tan_coonhound 2015-11-17 00:06:54 +0000 \n", "24 toy_terrier 2015-11-17 00:53:15 +0000 \n", "25 Blenheim_spaniel 2015-11-17 01:02:40 +0000 \n", "26 Pembroke 2015-11-17 01:30:57 +0000 \n", "27 Irish_terrier 2015-11-17 01:40:41 +0000 \n", "28 Chesapeake_Bay_retriever 2015-11-17 02:00:15 +0000 \n", "29 Chihuahua 2015-11-17 02:06:42 +0000 \n", "... ... ... \n", "1721 pug 2017-07-13 15:58:47 +0000 \n", "1722 Blenheim_spaniel 2017-07-14 22:10:11 +0000 \n", "1723 pug 2017-07-15 16:17:19 +0000 \n", "1724 French_bulldog 2017-07-15 23:25:31 +0000 \n", "1725 kuvasz 2017-07-16 23:58:41 +0000 \n", "1726 Chihuahua 2017-07-17 16:17:36 +0000 \n", "1727 Samoyed 2017-07-18 00:07:08 +0000 \n", "1728 Mexican_hairless 2017-07-18 16:08:03 +0000 \n", "1729 Pembroke 2017-07-19 00:47:34 +0000 \n", "1730 basset 2017-07-19 16:06:48 +0000 \n", "1731 French_bulldog 2017-07-20 16:49:33 +0000 \n", "1732 Pembroke NaN \n", "1733 Siberian_husky 2017-07-22 00:23:06 +0000 \n", "1734 golden_retriever 2017-07-22 16:56:37 +0000 \n", "1735 golden_retriever 2017-07-23 00:22:39 +0000 \n", "1736 whippet 2017-07-24 00:19:32 +0000 \n", "1737 golden_retriever 2017-07-24 17:02:04 +0000 \n", "1738 French_bulldog 2017-07-25 00:10:02 +0000 \n", "1739 Pembroke 2017-07-25 01:55:32 +0000 \n", "1740 French_bulldog 2017-07-25 16:11:53 +0000 \n", "1741 Samoyed 2017-07-26 00:31:25 +0000 \n", "1742 Pembroke 2017-07-26 15:59:51 +0000 \n", "1743 Irish_terrier 2017-07-27 16:25:51 +0000 \n", "1744 Pomeranian 2017-07-28 00:22:40 +0000 \n", "1745 Appenzeller 2017-07-28 16:27:12 +0000 \n", "1746 Chesapeake_Bay_retriever 2017-07-29 00:08:17 +0000 \n", "1747 basset 2017-07-29 16:00:24 +0000 \n", "1748 Labrador_retriever 2017-07-30 15:58:51 +0000 \n", "1749 
Chihuahua 2017-07-31 00:18:03 +0000 \n", "1750 Chihuahua 2017-08-01 00:17:27 +0000 \n", "\n", " text rating_numerator \\\n", "0 Here we have a Japanese Irish Setter. Lost eye... 8.0 \n", "1 This is a western brown Mitsubishi terrier. Up... 7.0 \n", "2 Here is a very happy pup. Big fan of well-main... 9.0 \n", "3 This is a purebred Piers Morgan. Loves to Netf... 6.0 \n", "4 Here we have a 1949 1st generation vulpix. Enj... 5.0 \n", "5 This is a truly beautiful English Wilson Staff... 10.0 \n", "6 Here is a Siberian heavily armored polar bear ... 10.0 \n", "7 My oh my. This is a rare blond Canadian terrie... 9.0 \n", "8 Here is the Rand Paul of retrievers folks! He'... 8.0 \n", "9 This is the happiest dog you will ever see. Ve... 10.0 \n", "10 Here we have a northern speckled Rhododendron.... 9.0 \n", "11 Let's hope this flight isn't Malaysian (lol). ... 10.0 \n", "12 Here we have a well-established sunblockerspan... 6.0 \n", "13 This appears to be a Mongolian Presbyterian mi... 9.0 \n", "14 Can stand on stump for what seems like a while... 8.0 \n", "15 Oh my. Here you are seeing an Adobe Setter giv... 11.0 \n", "16 Can take selfies 11/10 https://t.co/ws2AMaNwPW 11.0 \n", "17 This is an Albanian 3 1/2 legged Episcopalian... 1.0 \n", "18 This is an extremely rare horned Parthenon. No... 9.0 \n", "19 Look at this jokester thinking seat belt laws ... 10.0 \n", "20 Here we have a mixed Asiago from the Galápagos... 8.0 \n", "21 Those are sunglasses and a jean jacket. 11/10 ... 11.0 \n", "22 Oh goodness. A super rare northeast Qdoba kang... 9.0 \n", "23 This is a southern Vesuvius bumblegruff. Can d... 7.0 \n", "24 This is Walter. He is an Alaskan Terrapin. Lov... 10.0 \n", "25 *internally screaming* 12/10 https://t.co/YMcr... 12.0 \n", "26 Here we have an Austrian Pulitzer. Collectors ... 7.0 \n", "27 Oh boy what a pup! Sunglasses take this one to... 6.0 \n", "28 \"Can you behave? You're ruining my wedding day... 10.0 \n", "29 Here we see a lone northeastern Cumberbatch. 
H... 7.0 \n", "... ... ... \n", "1721 This is Maisey. She fell asleep mid-excavation... 13.0 \n", "1722 Viewer discretion advised. This is Jimbo. He w... 12.0 \n", "1723 This is Waffles. His doggles are pupside down.... 13.0 \n", "1724 This is Roscoe. Another pupper fallen victim t... 12.0 \n", "1725 This is Mingus. He's a wonderful father to his... 13.0 \n", "1726 This is Maya. She's very shy. Rarely leaves he... 13.0 \n", "1727 This... is a Jubilant Antarctic House Bear. We... 12.0 \n", "1728 You may not have known you needed to see this ... 13.0 \n", "1729 This is Canela. She attempted some fancy porch... 13.0 \n", "1730 This is Jeffrey. He has a monopoly on the pool... 13.0 \n", "1731 This is Gerald. He was just told he didn't get... 12.0 \n", "1732 NaN NaN \n", "1733 This is Ralphus. He's powering up. Attempting ... 13.0 \n", "1734 This is Zeke. He has a new stick. Very proud o... 13.0 \n", "1735 This is Jim. He found a fren. Taught him how t... 12.0 \n", "1736 This is Oliver. You're witnessing one of his m... 13.0 \n", "1737 This is Stuart. He's sporting his favorite fan... 13.0 \n", "1738 This is Ted. He does his best. Sometimes that'... 12.0 \n", "1739 Here's a puppo that seems to be on the fence a... 13.0 \n", "1740 This is Bruno. He is a service shark. Only get... 13.0 \n", "1741 This is Koda. He is a South Australian decksha... 13.0 \n", "1742 This is Cassie. She is a college pup. Studying... 14.0 \n", "1743 This is Zoey. She doesn't want to be one of th... 13.0 \n", "1744 When you watch your owner call another dog a g... 13.0 \n", "1745 Meet Jax. He enjoys ice cream so much he gets ... 13.0 \n", "1746 Here we have a majestic great white breaching ... 13.0 \n", "1747 This is Franklin. He would like you to stop ca... 12.0 \n", "1748 This is Darla. She commenced a snooze mid meal... 13.0 \n", "1749 This is Archie. He is a rare Norwegian Pouncin... 12.0 \n", "1750 This is Tilly. She's just checking pup on you.... 
13.0 \n", "\n", " rating_denominator name dog_stage retweet_count favorite_count \n", "0 10.0 NaN NaN 421.0 2285.0 \n", "1 10.0 NaN NaN 39.0 112.0 \n", "2 10.0 NaN NaN 36.0 100.0 \n", "3 10.0 NaN NaN 115.0 245.0 \n", "4 10.0 NaN NaN 36.0 88.0 \n", "5 10.0 NaN NaN 50.0 115.0 \n", "6 10.0 NaN NaN 196.0 367.0 \n", "7 10.0 NaN NaN 112.0 247.0 \n", "8 10.0 NaN NaN 47.0 99.0 \n", "9 10.0 NaN NaN 180.0 395.0 \n", "10 10.0 NaN NaN 51.0 127.0 \n", "11 10.0 NaN NaN 130.0 274.0 \n", "12 10.0 NaN NaN 37.0 92.0 \n", "13 10.0 NaN NaN 63.0 142.0 \n", "14 10.0 NaN NaN 53.0 133.0 \n", "15 10.0 NaN NaN 11.0 66.0 \n", "16 10.0 NaN NaN 66.0 151.0 \n", "17 2.0 NaN NaN 56.0 123.0 \n", "18 10.0 NaN NaN 79.0 168.0 \n", "19 10.0 NaN NaN 122.0 241.0 \n", "20 10.0 NaN NaN 56.0 179.0 \n", "21 10.0 NaN NaN 73.0 162.0 \n", "22 10.0 NaN NaN 68.0 147.0 \n", "23 10.0 NaN NaN 30.0 93.0 \n", "24 10.0 Walter NaN 39.0 107.0 \n", "25 10.0 NaN NaN 91.0 272.0 \n", "26 10.0 NaN NaN 67.0 139.0 \n", "27 10.0 NaN NaN 160.0 276.0 \n", "28 10.0 NaN NaN 42.0 137.0 \n", "29 10.0 NaN NaN 40.0 106.0 \n", "... ... ... ... ... ... 
\n", "1721 10.0 Maisey NaN 5311.0 31522.0 \n", "1722 10.0 Jimbo NaN 5592.0 28526.0 \n", "1723 10.0 Waffles NaN 5262.0 24459.0 \n", "1724 10.0 Roscoe pupper 2614.0 18501.0 \n", "1725 10.0 Mingus NaN 2619.0 10466.0 \n", "1726 10.0 Maya NaN 6294.0 30275.0 \n", "1727 10.0 NaN NaN 4975.0 26916.0 \n", "1728 10.0 NaN NaN 8788.0 29515.0 \n", "1729 10.0 Canela NaN 14973.0 60028.0 \n", "1730 10.0 Jeffrey NaN 4522.0 26553.0 \n", "1731 10.0 Gerald NaN 2886.0 19115.0 \n", "1732 NaN NaN NaN NaN NaN \n", "1733 10.0 Ralphus NaN 2868.0 17266.0 \n", "1734 10.0 Zeke NaN 3518.0 22409.0 \n", "1735 10.0 Jim NaN 3747.0 25551.0 \n", "1736 10.0 Oliver NaN 4426.0 22052.0 \n", "1737 10.0 Stuart puppo 1874.0 13321.0 \n", "1738 10.0 Ted NaN 3702.0 23605.0 \n", "1739 10.0 NaN puppo 8312.0 41910.0 \n", "1740 10.0 Bruno NaN 4145.0 24506.0 \n", "1741 10.0 Koda NaN 6120.0 26973.0 \n", "1742 10.0 Cassie doggo 6081.0 27878.0 \n", "1743 10.0 Zoey NaN 3605.0 24455.0 \n", "1744 10.0 NaN NaN 15695.0 56701.0 \n", "1745 10.0 Jax NaN 1649.0 10340.0 \n", "1746 10.0 NaN NaN 2590.0 17757.0 \n", "1747 10.0 Franklin NaN 7723.0 35207.0 \n", "1748 10.0 Darla NaN 7198.0 36823.0 \n", "1749 10.0 Archie NaN 3466.0 21987.0 \n", "1750 10.0 Tilly NaN 5280.0 29255.0 \n", "\n", "[1751 rows x 11 columns]" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "master_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Issue #6:\n", "#### records dating beyond 2017-08-01\n", "\n", "### Issue #7:\n", "#### 'timestamp' column is an object datatype\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Define:\n", "Merging the data sets filtered out all records dated after 2017-08-01, which fixes issue `#6`. Issue `#7` therefore no longer needs fixing: the main reason for converting the column's data type was to filter out records dated after that date, so the 'timestamp' column is dropped instead."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Code:" ] }, { "cell_type": "code", "execution_count": 137, "metadata": {}, "outputs": [], "source": [ "master_df = master_df.drop(['timestamp'], axis = 1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Test:" ] }, { "cell_type": "code", "execution_count": 138, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['index', 'tweet_id', 'jpg_url', 'dog_breed', 'text', 'rating_numerator',\n", " 'rating_denominator', 'name', 'dog_stage', 'retweet_count',\n", " 'favorite_count'],\n", " dtype='object')" ] }, "execution_count": 138, "metadata": {}, "output_type": "execute_result" } ], "source": [ "master_df.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Storing Data" ] }, { "cell_type": "code", "execution_count": 139, "metadata": {}, "outputs": [], "source": [ "# storing the cleaned master dataframe in a csv file\n", "master_df.to_csv('twitter_archive_master.csv',index=False) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Analyzing and Visualizing Data\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Questions for analysis\n", "\n", "1. What is the lowest dog rating on WeRateDogs?\n", "2. Which tweet has the lowest dog rating on WeRateDogs?\n", "3. Which tweet has the highest retweet count?\n", "4. What are some of the most common dog names?\n", "5. What are the most popular dog breeds by favorite count?\n", "6. Are WeRateDogs tweets more likely to be favorited or retweeted?\n", "7. Describe the correlation between retweet count and favorite count.\n", "8. Describe the correlation between dog rating and retweet count.\n", "9. 
Describe the correlation between dog rating and favorite count.\n" ] }, { "cell_type": "code", "execution_count": 140, "metadata": {}, "outputs": [], "source": [ "# importing data into a dataframe\n", "tweets = pd.read_csv('twitter_archive_master.csv')\n" ] }, { "cell_type": "code", "execution_count": 141, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>index</th><th>tweet_id</th><th>rating_numerator</th><th>rating_denominator</th><th>retweet_count</th><th>favorite_count</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>count</th><td>1658.000000</td><td>1.658000e+03</td><td>1658.000000</td><td>1658.000000</td><td>1657.000000</td><td>1657.000000</td></tr>\n",
"    <tr><th>mean</th><td>1048.162847</td><td>7.392385e+17</td><td>11.385404</td><td>10.471049</td><td>2283.746530</td><td>8004.847314</td></tr>\n",
"    <tr><th>std</th><td>594.020413</td><td>6.794971e+16</td><td>7.506534</td><td>6.359152</td><td>4158.936293</td><td>11787.806358</td></tr>\n",
"    <tr><th>min</th><td>0.000000</td><td>6.660209e+17</td><td>0.000000</td><td>2.000000</td><td>11.000000</td><td>66.000000</td></tr>\n",
"    <tr><th>25%</th><td>548.250000</td><td>6.773835e+17</td><td>10.000000</td><td>10.000000</td><td>514.000000</td><td>1806.000000</td></tr>\n",
"    <tr><th>50%</th><td>1049.500000</td><td>7.138309e+17</td><td>11.000000</td><td>10.000000</td><td>1131.000000</td><td>3723.000000</td></tr>\n",
"    <tr><th>75%</th><td>1552.750000</td><td>7.931619e+17</td><td>12.000000</td><td>10.000000</td><td>2587.000000</td><td>9904.000000</td></tr>\n",
"    <tr><th>max</th><td>2073.000000</td><td>8.921774e+17</td><td>165.000000</td><td>150.000000</td><td>70429.000000</td><td>144312.000000</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " index tweet_id rating_numerator rating_denominator \\\n", "count 1658.000000 1.658000e+03 1658.000000 1658.000000 \n", "mean 1048.162847 7.392385e+17 11.385404 10.471049 \n", "std 594.020413 6.794971e+16 7.506534 6.359152 \n", "min 0.000000 6.660209e+17 0.000000 2.000000 \n", "25% 548.250000 6.773835e+17 10.000000 10.000000 \n", "50% 1049.500000 7.138309e+17 11.000000 10.000000 \n", "75% 1552.750000 7.931619e+17 12.000000 10.000000 \n", "max 2073.000000 8.921774e+17 165.000000 150.000000 \n", "\n", " retweet_count favorite_count \n", "count 1657.000000 1657.000000 \n", "mean 2283.746530 8004.847314 \n", "std 4158.936293 11787.806358 \n", "min 11.000000 66.000000 \n", "25% 514.000000 1806.000000 \n", "50% 1131.000000 3723.000000 \n", "75% 2587.000000 9904.000000 \n", "max 70429.000000 144312.000000 " ] }, "execution_count": 141, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# checking summary statistics for the data\n", "tweets.describe()" ] }, { "cell_type": "code", "execution_count": 142, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "10 1642\n", "50 3\n", "80 2\n", "11 2\n", "150 1\n", "120 1\n", "110 1\n", "90 1\n", "70 1\n", "40 1\n", "20 1\n", "7 1\n", "2 1\n", "Name: rating_denominator, dtype: int64" ] }, "execution_count": 142, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#checking the denominator values\n", "tweets.rating_denominator.value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since the denominator values vary, create an additional percentage column to standardize the ratings for analysis." ] }, { "cell_type": "code", "execution_count": 143, "metadata": {}, "outputs": [], "source": [ "#creating a percentage rating column\n", "tweets['rating'] = (tweets.rating_numerator / tweets.rating_denominator)* 100" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Insights:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 
1. What is the lowest dog rating on WeRateDogs? " ] }, { "cell_type": "code", "execution_count": 144, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.0" ] }, "execution_count": 144, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tweets.rating.min()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The lowest dog rating given on WeRateDogs is 0." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 2. Which tweet has the lowest dog rating on WeRateDogs?" ] }, { "cell_type": "code", "execution_count": 146, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>index</th><th>tweet_id</th><th>jpg_url</th><th>dog_breed</th><th>text</th><th>rating_numerator</th><th>rating_denominator</th><th>name</th><th>dog_stage</th><th>retweet_count</th><th>favorite_count</th><th>rating</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>1455</th><td>1824</td><td>835152434251116546</td><td>https://pbs.twimg.com/media/C5cOtWVWMAEjO5p.jpg</td><td>American_Staffordshire_terrier</td><td>When you're so blinded by your systematic plag...</td><td>0</td><td>10</td><td>NaN</td><td>NaN</td><td>2755.0</td><td>20923.0</td><td>0.0</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " index tweet_id \\\n", "1455 1824 835152434251116546 \n", "\n", " jpg_url \\\n", "1455 https://pbs.twimg.com/media/C5cOtWVWMAEjO5p.jpg \n", "\n", " dog_breed \\\n", "1455 American_Staffordshire_terrier \n", "\n", " text rating_numerator \\\n", "1455 When you're so blinded by your systematic plag... 0 \n", "\n", " rating_denominator name dog_stage retweet_count favorite_count rating \n", "1455 10 NaN NaN 2755.0 20923.0 0.0 " ] }, "execution_count": 146, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#tweet with the lowest dog rating\n", "tweets.loc[tweets.rating == tweets.rating.min()]" ] }, { "cell_type": "code", "execution_count": 147, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\"When you're so blinded by your systematic plagiarism that you forget what day it is. 0/10 https://t.co/YbEJPkg4Ag\"" ] }, "execution_count": 147, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# getting the tweet text\n", "tweets.loc[tweets.rating == tweets.rating.min()].text[1455]" ] }, { "cell_type": "code", "execution_count": 148, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'https://pbs.twimg.com/media/C5cOtWVWMAEjO5p.jpg'" ] }, "execution_count": 148, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#getting the tweet image url for download\n", "tweets.loc[tweets.rating == tweets.rating.min()].jpg_url[1455]" ] }, { "attachments": { "Terrier.jpg": { "image/jpeg": "" } }, "cell_type": "markdown", "metadata": {}, "source": [ "The tweet with the lowest rating on WeRateDogs, at 0/10, shows an American Staffordshire terrier and reads:\n", "\n", "\"When you're so blinded by your systematic plagiarism that you forget what day it is. 
0/10\"\n", "![Terrier.jpg](attachment:Terrier.jpg)\n", "\n", "Even so, this tweet has a favorite count of 20,923, more than double the average of about 8,005, and a retweet count of 2,755, slightly above the average of about 2,284." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 3. Which tweet has the highest retweet count?" ] }, { "cell_type": "code", "execution_count": 149, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>index</th><th>tweet_id</th><th>jpg_url</th><th>dog_breed</th><th>text</th><th>rating_numerator</th><th>rating_denominator</th><th>name</th><th>dog_stage</th><th>retweet_count</th><th>favorite_count</th><th>rating</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>978</th><td>1221</td><td>744234799360020481</td><td>https://pbs.twimg.com/ext_tw_video_thumb/74423...</td><td>Labrador_retriever</td><td>Here's a doggo realizing you can stand in a po...</td><td>13</td><td>10</td><td>NaN</td><td>doggo</td><td>70429.0</td><td>144312.0</td><td>130.0</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " index tweet_id \\\n", "978 1221 744234799360020481 \n", "\n", " jpg_url dog_breed \\\n", "978 https://pbs.twimg.com/ext_tw_video_thumb/74423... Labrador_retriever \n", "\n", " text rating_numerator \\\n", "978 Here's a doggo realizing you can stand in a po... 13 \n", "\n", " rating_denominator name dog_stage retweet_count favorite_count rating \n", "978 10 NaN doggo 70429.0 144312.0 130.0 " ] }, "execution_count": 149, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#tweet with the highest retweet count\n", "tweets.loc[tweets.retweet_count == tweets.retweet_count.max()]" ] }, { "cell_type": "code", "execution_count": 150, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\"Here's a doggo realizing you can stand in a pool. 13/10 enlightened af (vid by Tina Conrad) https://t.co/7wE9LTEXC4\"" ] }, "execution_count": 150, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#getting the tweet text\n", "tweets.loc[tweets.retweet_count == tweets.retweet_count.max()].text[978]" ] }, { "cell_type": "code", "execution_count": 151, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'https://pbs.twimg.com/ext_tw_video_thumb/744234667679821824/pu/img/1GaWmtJtdqzZV7jy.jpg'" ] }, "execution_count": 151, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#getting the tweet image url for download\n", "tweets.loc[tweets.retweet_count == tweets.retweet_count.max()].jpg_url[978]" ] }, { "attachments": { "Labrador_retriever.jpg": { "image/jpeg": "" } }, "cell_type": "markdown", "metadata": {}, "source": [ "The tweet with the highest retweet count, at 70,429 retweets, also has the highest favorite count, at 144,312 likes. It shows a Labrador retriever and reads:\n", "\n", "\"Here's a doggo realizing you can stand in a pool. 13/10 enlightened af\"\n", "![Labrador_retriever.jpg](attachment:Labrador_retriever.jpg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 4. 
What are some of the most common dog names?" ] }, { "cell_type": "code", "execution_count": 152, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Cooper 10\n", "Charlie 9\n", "Lucy 9\n", "Tucker 9\n", "Oliver 9\n", "Penny 8\n", "Sadie 7\n", "Daisy 7\n", "Winston 7\n", "Toby 6\n", "Jax 6\n", "Lola 6\n", "Koda 6\n", "Stanley 5\n", "Bella 5\n", "Oscar 5\n", "Leo 5\n", "Bo 5\n", "Rusty 5\n", "Bear 4\n", "Gus 4\n", "Larry 4\n", "Louis 4\n", "Winnie 4\n", "Alfie 4\n", "Duke 4\n", "Brody 4\n", "Maggie 4\n", "Bentley 4\n", "Cassie 4\n", " ..\n", "Tayzie 1\n", "Lipton 1\n", "Aqua 1\n", "Rocco 1\n", "Clybe 1\n", "Carll 1\n", "Humphrey 1\n", "Brownie 1\n", "Jay 1\n", "Asher 1\n", "Brat 1\n", "Lili 1\n", "Eve 1\n", "Ed 1\n", "Grizz 1\n", "Travis 1\n", "Cheesy 1\n", "Sage 1\n", "Jockson 1\n", "Hero 1\n", "Antony 1\n", "Buddah 1\n", "Jarvis 1\n", "Snickers 1\n", "Bonaparte 1\n", "Klevin 1\n", "Betty 1\n", "Cora 1\n", "Bruno 1\n", "Dug 1\n", "Name: name, Length: 830, dtype: int64" ] }, "execution_count": 152, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tweets.name.value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The most common dog name on WeRateDogs is Cooper, with 10 dogs. Charlie, Lucy, Tucker and Oliver follow with 9 dogs each, and Penny with 8." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 5. What are the most popular dog breeds by favorite count?"
] }, { "cell_type": "code", "execution_count": 153, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dog_breed\n", "golden_retriever 1639666.0\n", "Labrador_retriever 1028020.0\n", "Pembroke 902671.0\n", "Chihuahua 664894.0\n", "French_bulldog 524718.0\n", "Samoyed 480684.0\n", "chow 388436.0\n", "cocker_spaniel 351165.0\n", "pug 324429.0\n", "malamute 303845.0\n", "toy_poodle 274019.0\n", "Pomeranian 273973.0\n", "Chesapeake_Bay_retriever 265152.0\n", "Eskimo_dog 242658.0\n", "Cardigan 229040.0\n", "German_shepherd 184672.0\n", "Lakeland_terrier 182833.0\n", "basset 170964.0\n", "miniature_pinscher 168423.0\n", "Great_Pyrenees 157068.0\n", "whippet 139361.0\n", "standard_poodle 131277.0\n", "Shetland_sheepdog 130964.0\n", "Bedlington_terrier 128905.0\n", "Staffordshire_bullterrier 128875.0\n", "English_springer 121099.0\n", "Italian_greyhound 120516.0\n", "Siberian_husky 119303.0\n", "Rottweiler 118892.0\n", "flat-coated_retriever 115892.0\n", " ... \n", "Dandie_Dinmont 20572.0\n", "Australian_terrier 19072.0\n", "basenji 18967.0\n", "Gordon_setter 18523.0\n", "Welsh_springer_spaniel 17243.0\n", "bluetick 17055.0\n", "keeshond 16513.0\n", "redbone 16486.0\n", "Bouvier_des_Flandres 15318.0\n", "cairn 15302.0\n", "miniature_schnauzer 14438.0\n", "wire-haired_fox_terrier 14355.0\n", "Rhodesian_ridgeback 13811.0\n", "Appenzeller 12507.0\n", "curly-coated_retriever 11830.0\n", "Lhasa 11146.0\n", "Ibizan_hound 10865.0\n", "toy_terrier 8100.0\n", "Scottish_deerhound 7650.0\n", "Sussex_spaniel 6853.0\n", "silky_terrier 6222.0\n", "Tibetan_terrier 6218.0\n", "clumber 6184.0\n", "Scotch_terrier 3018.0\n", "EntleBucher 2246.0\n", "Brabancon_griffon 2229.0\n", "groenendael 1949.0\n", "standard_schnauzer 1686.0\n", "Irish_wolfhound 1285.0\n", "Japanese_spaniel 1111.0\n", "Name: favorite_count, Length: 113, dtype: float64" ] }, "execution_count": 153, "metadata": {}, "output_type": "execute_result" } ], "source": [ 
"tweets.groupby('dog_breed').favorite_count.sum().sort_values(ascending = False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The top 5 most popular dog breeds on WeRateDogs by total favorite count, in descending order, are:\n", "\n", "1. Golden retrievers, with 1,639,666 favorites\n", "\n", "2. Labrador retrievers, with 1,028,020 favorites\n", "\n", "3. Pembrokes, with 902,671 favorites\n", "\n", "4. Chihuahuas, with 664,894 favorites\n", "\n", "5. French bulldogs, with 524,718 favorites\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 6. Are WeRateDogs tweets more likely to be retweeted or favorited?" ] }, { "cell_type": "code", "execution_count": 154, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total number of retweets: 3784168.0\n", "Total number of likes: 13264032.0\n", "Difference: 9479864.0\n" ] } ], "source": [ "print(f'Total number of retweets: {tweets.retweet_count.sum()}') \n", "print(f'Total number of likes: {tweets.favorite_count.sum()}')\n", "print(f'Difference: {tweets.favorite_count.sum()-tweets.retweet_count.sum()}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "WeRateDogs tweets are more likely to be favorited than retweeted: across the data set they received about 13.3 million favorites against about 3.8 million retweets, a difference of roughly 9.5 million." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Visualizations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 1. Describe the correlation between retweet count and favorite count."
] }, { "cell_type": "code", "execution_count": 155, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "#plotting to show correlation\n", "tweets.plot(x = 'retweet_count',y = 'favorite_count', kind = 'scatter');\n", "plt.title('Correlation between tweet retweet count and tweet favorite count');\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The scatter plot shows a generally positive correlation between retweet count and favorite count: tweets with more favorites also tend to have more retweets." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 2. Describe the correlation between dog rating and retweet count." ] }, { "cell_type": "code", "execution_count": 156, "metadata": { "scrolled": true }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "tweets.plot(x = 'retweet_count',y = 'rating', kind = 'scatter');\n", "plt.title('Correlation between tweet retweet count and dog rating');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The points in the scatter plot follow a roughly horizontal trend, suggesting little or no correlation between dog rating and retweet count. The rating given to a dog therefore does not appear to influence how often that dog's tweet is retweeted." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 3. Describe the correlation between dog rating and favorite count."
] }, { "cell_type": "code", "execution_count": 157, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "tweets.plot(x = 'favorite_count',y = 'rating', kind = 'scatter');\n", "plt.title('Correlation between tweet favorite count and dog rating');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As with retweet count, the points in this scatter plot follow a roughly horizontal trend, suggesting little or no correlation between dog rating and favorite count. The rating given to a dog therefore does not appear to influence how often that dog's tweet is favorited." ] } ], "metadata": { "extensions": { "jupyter_dashboards": { "activeView": "report_default", "version": 1, "views": { "grid_default": { "cellMargin": 10, "defaultCellHeight": 20, "maxColumns": 12, "name": "grid", "type": "grid" }, "report_default": { "name": "report", "type": "report" } } } }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" } }, "nbformat": 4, "nbformat_minor": 2 }