{ "cells": [ { "cell_type": "markdown", "id": "desperate-cotton", "metadata": {}, "source": [ "# RickyRenuncia Project\n", "\n", "The RickyRenuncia Project team carried out multiple acquisition procedures to preserve records of the events of summer 2019 that led to the resignation of former governor Ricardo Rosselló Nevares.\n", "\n", "\n", "## Physical Media\n", "\n", "The team collected artifacts and banners used during the demonstrations. Whenever possible, the artifacts were accompanied by an audio interview and/or a photograph of the demonstrators who produced and used them.\n", "\n", "## Digital Donations\n", "\n", "Using social media and word of mouth, the team also contacted the community to request imagery and content related to the activities of that summer.\n", "\n", "## Twitter Data Collection\n", "\n", "To get a broad view of the many activities and demonstrations around the globe, one of the team members, Joel Blanco, decided to capture records of tweet activity on the web. This data was captured live during the days of the events and requires processing and analysis to provide a valid interpretation of the information acquired.\n", "\n", "A cleaned version of this dataset occupies over `7 gigabytes` but fits into `777 megabytes` when compressed using `gzip`. Plain-text data generally compresses well. Below we calculate the benefit of compressing this specific dataset." 
] }, { "cell_type": "code", "execution_count": 1, "id": "previous-priest", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The storage was reduced to 10.8%.\n", "After compression, 89.2% of the originally occupied space was freed.\n" ] } ], "source": [ "# Calculate the storage benefits of compression\n", "\n", "# Observations\n", "original_size_G = 7\n", "final_size_M = 777\n", "\n", "# Unit transformation\n", "giga_to_mega_rate = 1024.0\n", "original_size_M = original_size_G * giga_to_mega_rate\n", "\n", "# Calculate percent change\n", "new_size_to_old_size = final_size_M / original_size_M\n", "new_size_percent = new_size_to_old_size * 100.0\n", "space_freed_percent = 100 - new_size_percent\n", "\n", "print(\n", "    \"The storage was reduced to {:.1f}%.\\nAfter compression, {:.1f}% of the originally occupied space was freed.\".\n", "    format(new_size_percent, space_freed_percent)\n", ")\n", "\n" ] }, { "cell_type": "markdown", "id": "adopted-configuration", "metadata": {}, "source": [ "The benefits can be substantial, especially for long-term storage." ] }, { "cell_type": "markdown", "id": "frozen-treasurer", "metadata": {}, "source": [ "# Twitter Data: What did we collect?\n", "\n", "It is important to understand the type of data that is collected from a social media API (application programming interface). The file `Data/Joel/tweetsRickyRenuncia-final.jsonl` is in JSONL format. If you are familiar with JSON files, this format is a sequence of `json` strings, each on its own line; the 'L' stands for line (`jsonl = json-l = json-line`).\n", "\n", "This data set was collected from Twitter in 2019. The Twitter API recently went through an update; however, this data uses the previous API conventions. We will use Python's `json` library to parse a random line from the source data to help you visualize the structure of this data. 
Observe that some of the content is readily available (the text field), while other content is harder to parse (the media URL).\n", "\n", "The full list of tweet ids is available [here](https://ia601005.us.archive.org/31/items/tweetsRickyRenuncia-final/tweetsRickyRenuncia-final.txt).\n", "\n", "Below we show how try/except blocks and while loops can be used to loop through the data until a post with images is found." ] }, { "cell_type": "code", "execution_count": 2, "id": "delayed-house", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/home/rickyrenuncia/RickyRenuncia-case-module_shared\n", "Sample Size:1113758\n", "\n", "\n", "Text: RT @vicgasco: \"Somos tan buenos que cogemos a los nuestros de pendejos\" -Ricardo Rosselló #RickyRenuncia #RickyRenunciaYa https://t.co/isIw…\n", "Tweet URL using user's screen_name: https://twitter.com/frances_sola/status/1151268971557134337\n", "Tweet URL using user's ID : https://twitter.com/3299053100/status/1151268971557134337\n", "Media: https://pbs.twimg.com/media/D_oenfmXoAUuieN.jpg\n", "\n", "\n" ] } ], "source": [ "import json\n", "from random import seed, randint\n", "import os\n", "dir_path = os.getcwd()\n", "print(dir_path)\n", "JL_DATA=\"/home/rickyrenuncia/tweetsRickyRenuncia-final.jsonl\"\n", "\n", "# Get the SAMPLE_SIZE by counting the non-empty lines\n", "SAMPLE_SIZE = 0\n", "with open(JL_DATA, \"r\") as data_handler:\n", "    for line in data_handler:\n", "        if line != \"\\n\":\n", "            SAMPLE_SIZE += 1\n", "print(f\"Sample Size:{SAMPLE_SIZE}\\n\\n\")\n", "\n", "\n", "# Get a random number of lines to skip before taking a single sample\n", "# Try seeds 1 and 16 or any you want to test\n", "seed(1)\n", "skip_lines=randint(0, SAMPLE_SIZE - 1)\n", "\n", "\n", "# Reopen file using the with-open-as style and print out a single sample\n", "with open(JL_DATA, 'r') as data_handler:\n", "    # Use next to skip 
a line, the for loop allows skipping multiple lines\n", "    for _ in range(skip_lines):\n", "        next(data_handler)\n", "\n", "    while True:\n", "        # Loop until a tweet with media is found.\n", "        try:\n", "            # Capture string\n", "            raw_data = data_handler.readline()\n", "            # An empty string means the end of the file was reached.\n", "            if not raw_data:\n", "                break\n", "            # Check whether the raw line contains any 'media_url_https' key.\n", "            if 'media_url_https' not in raw_data:\n", "                continue\n", "            data = json.loads(raw_data)\n", "        except json.JSONDecodeError:\n", "            break\n", "        try:\n", "            i = 0\n", "            while True:\n", "                try:\n", "                    media_url = data['retweeted_status']['entities']['media'][i]['media_url_https']\n", "                except (KeyError, IndexError, TypeError):\n", "                    i += 1\n", "                    if i > 10:\n", "                        media_url = \"Could not quickly find a tweet with media.\"\n", "                        raise  # Pass the error to the enclosing try/except.\n", "                    continue\n", "                break\n", "        except (KeyError, IndexError, TypeError):\n", "            continue\n", "\n", "\n", "        print(\"Text:\", data['text'])\n", "        # The tweet URL follows a Twitter convention where both the tweet ID and the user's screen_name are required to access the status.\n", "        print(\"Tweet URL using user's screen_name:\", f\"https://twitter.com/{data['user']['screen_name']}/status/{data['id_str']}\")\n", "        print(\"Tweet URL using user's ID :\", f\"https://twitter.com/{data['user']['id_str']}/status/{data['id_str']}\")\n", "        print(\"Media:\", media_url)\n", "# print(f\"In reply to: {json.dumps(data['retweeted_status'], indent=1)}\")\n", "        print(\"\\n\")\n", "        # The indent and sort_keys in json.dumps \"prettify\" the output. 
Still not pretty.\n", "# print(\"Raw Data:\")\n", "# print(\"#\"*50)\n", "# print(json.dumps(data, indent=4, sort_keys=True))\n", "# print(\"#\"*50)\n", "        break\n" ] }, { "cell_type": "markdown", "id": "monetary-venezuela", "metadata": {}, "source": [ "## Study the old Twitter API\n", "Documentation on the old Twitter API version 1.1 can be found [here](https://developer.twitter.com/en/docs/twitter-api/v1) and a sample [over here](https://developer.twitter.com/en/docs/twitter-api/v1/tweets/sample-realtime/api-reference/get-statuses-sample)." ] }, { "cell_type": "markdown", "id": "collaborative-lafayette", "metadata": {}, "source": [ "# What data is available\n", "\n", "As data analysts, we need to understand the data before we can set goals." ] }, { "cell_type": "code", "execution_count": 1, "id": "underlying-breathing", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/home/rickyrenuncia/.multiple_sorts_tweetsRickyRenuncia-final.jsonl.pkl\n", "Creating Lists\n", "/home/rickyrenuncia/tweetsRickyRenuncia-final.jsonl\n", "Finished: 200000\n", "Finished: 400000\n", "Finished: 600000\n", "Finished: 800000\n", "Finished: 1000000\n", "Expecting value: line 1 column 1 (char 0)\n", "Could not create tweet\n", "Last line processed = 1113758\n", "48\n", "0.046875\n" ] } ], "source": [ "from tweet_rehydrate.analysis import TweetJLAnalyzer, TweetAnalyzer, getsizeof\n", "from random import randint\n", "JL_DATA=\"/home/rickyrenuncia/tweetsRickyRenuncia-final.jsonl\"\n", "SAMPLE_SIZE = 1113758\n", "data = TweetJLAnalyzer(JL_DATA, reset=True, local_media=False, cache_size=2000)\n", "size=getsizeof(data)\n", "print(str(size))\n", "print(str(size/1024.0))" ] }, { "cell_type": "code", "execution_count": 11, "id": "concrete-proposition", "metadata": {}, "outputs": [], "source": [ "most_retweeted_media = data.get_most_retweeted_media(40)" ] }, { "cell_type": "code", "execution_count": 12, "id": "broken-interim", "metadata": 
{}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Amount found: 42\n", "PHOTO: https://pbs.twimg.com/ext_tw_video_thumb/1151738583000264704/pu/img/M_mUM7-RiqQHIFwB.jpg\n", "********************\n", "112370 - 1151738583000264704\n", "********************\n", "\n", "\n", "PHOTO: https://pbs.twimg.com/ext_tw_video_thumb/1151475207007326208/pu/img/upRlenku-1oo84bw.jpg\n", "********************\n", "110941 - 1151475207007326208\n", "********************\n", "\n", "\n", "PHOTO: https://pbs.twimg.com/ext_tw_video_thumb/1151645615660699659/pu/img/IRu1lKdxKDEusUcg.jpg\n", "********************\n", "103496 - 1151645615660699659\n", "********************\n", "\n", "\n", "PHOTO: https://pbs.twimg.com/ext_tw_video_thumb/1153281467121831938/pu/img/E9obyW9yqFLqL4aN.jpg\n", "********************\n", "101538 - 1153281467121831938\n", "********************\n", "\n", "\n", "PHOTO: https://pbs.twimg.com/ext_tw_video_thumb/1152226566652518406/pu/img/30CitbamtRq0VWge.jpg\n", "********************\n", "90413 - 1152226566652518406\n", "********************\n", "\n", "\n", "PHOTO: https://pbs.twimg.com/ext_tw_video_thumb/1150890766107066369/pu/img/Xfaaja0nGSz22hK3.jpg\n", "********************\n", "84206 - 1150890766107066369\n", "********************\n", "\n", "\n" ] } ], "source": [ "print(\"Amount found: \", len(most_retweeted_media))\n", "for rt_count, m_id, m in most_retweeted_media[15:21]:\n", "    print(m)\n", "    print(\"*\"*20 + \"\\n\" + str(rt_count) + \" - \" + str(m_id) + \"\\n\" + \"*\"*20 + \"\\n\\n\")" ] }, { "cell_type": "code", "execution_count": 2, "id": "offensive-consolidation", "metadata": {}, "outputs": [], "source": [ "most_retweeted_posts = data.get_most_retweeted(100,has_media=True)" ] }, { "cell_type": "code", "execution_count": 3, "id": "viral-vietnam", "metadata": {}, "outputs": [], "source": [ "# Save popular posts\n", "import pickle\n", "with open(\"100_most_retweeted_posts.pickle\",'wb') as handler:\n", "    
pickle.dump(most_retweeted_posts, handler)" ] }, { "cell_type": "code", "execution_count": 4, "id": "outdoor-luther", "metadata": {}, "outputs": [], "source": [ "# Recall popular posts\n", "import pickle\n", "with open(\"100_most_retweeted_posts.pickle\",'rb') as handler:\n", "    most_retweeted_posts = pickle.load(handler)" ] }, { "cell_type": "code", "execution_count": 13, "id": "constitutional-little", "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Amount found: 100\n", "ID: 1151475227404185600\n", "Text: Your daily dose of antidepressant https://t.co/7MXYSyobOf\n", "URL: https://twitter.com/2896099018/status/1151475227404185600\n", "Retweet:False\n", "Original Tweet URL: Not applicable\n", "Quotes:False\n", "Quoted Tweet URL: Not applicable\n", "Has Media=True\n", "Has Local Media=True\n", "Media=['PHOTO: https://pbs.twimg.com/ext_tw_video_thumb/1151475207007326208/pu/img/upRlenku-1oo84bw.jpg', 'VIDEO: https://video.twimg.com/ext_tw_video/1151475207007326208/pu/vid/480x480/oNJBGtn57RnjFF9j.mp4?tag=10']\n", "**********\n", "110941 - 1151475227404185600 - 992800.0001\n", "\n", "\n", "ID: 1150890789033193481\n", "Text: “Whose driving?”\n", "\n", "Everyone with cars: https://t.co/dtwxbdZJ04\n", "URL: https://twitter.com/2544501210/status/1150890789033193481\n", "Retweet:False\n", "Original Tweet URL: Not applicable\n", "Quotes:False\n", "Quoted Tweet URL: Not applicable\n", "Has Media=True\n", "Has Local Media=True\n", "Media=['PHOTO: https://pbs.twimg.com/ext_tw_video_thumb/1150890766107066369/pu/img/Xfaaja0nGSz22hK3.jpg', 'VIDEO: https://video.twimg.com/ext_tw_video/1150890766107066369/pu/vid/480x480/hUo98Vfa6UYdETtE.mp4?tag=10']\n", "**********\n", "84206 - 1150890789033193481 - 298732.0001\n", "\n", "\n", "ID: 1152968581862346752\n", "Text: A thread for my non Spanish speaking followers explaining what’s going on in Puerto Rico 🇵🇷 https://t.co/2K5Qr3GOSu\n", "URL: 
https://twitter.com/1312036255/status/1152968581862346752\n", "Retweet:False\n", "Original Tweet URL: Not applicable\n", "Quotes:False\n", "Quoted Tweet URL: Not applicable\n", "Has Media=True\n", "Has Local Media=True\n", "Media=['PHOTO: https://pbs.twimg.com/media/EAAqz9aXYAIkkbi.jpg', 'PHOTO: https://pbs.twimg.com/media/EAAqz9aXYAIkkbi.jpg']\n", "********************\n", "83996 - 1152968581862346752 - 1084631.0101\n", "********************\n", "\n", "\n", "ID: 1152279015576784896\n", "Text: 'They killed my mom and my six brothers'\n", "\n", "Trump: 'Where are they now?'\n", "\n", "'They're dead'\n", "\n", "https://t.co/SfKSpAXyP1\n", "URL: https://twitter.com/68752979/status/1152279015576784896\n", "Retweet:False\n", "Original Tweet URL: Not applicable\n", "Quotes:False\n", "Quoted Tweet URL: Not applicable\n", "Has Media=True\n", "Has Local Media=True\n", "Media=['PHOTO: https://pbs.twimg.com/ext_tw_video_thumb/1152226566652518406/pu/img/30CitbamtRq0VWge.jpg', 'VIDEO: https://video.twimg.com/ext_tw_video/1152226566652518406/pu/vid/388x360/zS3fIb5QOntfBsRM.mp4?tag=10']\n", "**********\n", "90413 - 1152279015576784896 - 923475.0101\n", "\n", "\n", "ID: 1150516560018018305\n", "Text: Me pretending I didn’t see someone I know in public https://t.co/drw8A5Nsl1\n", "URL: https://twitter.com/534931954/status/1150516560018018305\n", "Retweet:False\n", "Original Tweet URL: Not applicable\n", "Quotes:False\n", "Quoted Tweet URL: Not applicable\n", "Has Media=True\n", "Has Local Media=True\n", "Media=['PHOTO: https://pbs.twimg.com/ext_tw_video_thumb/1150516529064075265/pu/img/JtB7Z6VJDSrCqNJM.jpg', 'VIDEO: https://video.twimg.com/ext_tw_video/1150516529064075265/pu/vid/480x480/d9vgRMGh1-S6MPE4.mp4?tag=10']\n", "**********\n", "118018 - 1150516560018018305 - 208876.0001\n", "\n", "\n", "ID: 1152313246285742081\n", "Text: me trying to lose 5 kilos in 5 minutes https://t.co/KUDRKP0ar0\n", "URL: https://twitter.com/241292536/status/1152313246285742081\n", "Retweet:False\n", 
"Original Tweet URL: Not applicable\n", "Quotes:False\n", "Quoted Tweet URL: Not applicable\n", "Has Media=True\n", "Has Local Media=True\n", "Media=['PHOTO: https://pbs.twimg.com/ext_tw_video_thumb/1151738583000264704/pu/img/M_mUM7-RiqQHIFwB.jpg', 'VIDEO: https://video.twimg.com/ext_tw_video/1151738583000264704/pu/vid/360x640/rcnaRAh7J8qNmSgo.mp4?tag=10']\n", "**********\n", "112370 - 1152313246285742081 - 865476.0001\n", "\n", "\n", "ID: 1152025567794872320\n", "Text: Buddy wanted $24.95 for a picture of me on the roller coaster https://t.co/SwbdLRGfby\n", "URL: https://twitter.com/15355278/status/1152025567794872320\n", "Retweet:False\n", "Original Tweet URL: Not applicable\n", "Quotes:False\n", "Quoted Tweet URL: Not applicable\n", "Has Media=True\n", "Has Local Media=True\n", "Media=['PHOTO: https://pbs.twimg.com/ext_tw_video_thumb/1152025523633213440/pu/img/dzz33EwahtHZIV1B.jpg', 'VIDEO: https://video.twimg.com/ext_tw_video/1152025523633213440/pu/vid/360x638/g6a5mSdO5BvJOtdU.mp4?tag=10']\n", "**********\n", "76575 - 1152025567794872320 - 899336.0101\n", "\n", "\n", "ID: 1151645825036148736\n", "Text: SOMEONE ADDED BALLS TO THIS SCENE AND NOW IT'S AMAZING😂 https://t.co/NTJDbhwFFv\n", "URL: https://twitter.com/595688846/status/1151645825036148736\n", "Retweet:False\n", "Original Tweet URL: Not applicable\n", "Quotes:False\n", "Quoted Tweet URL: Not applicable\n", "Has Media=True\n", "Has Local Media=True\n", "Media=['PHOTO: https://pbs.twimg.com/ext_tw_video_thumb/1151645615660699659/pu/img/IRu1lKdxKDEusUcg.jpg', 'VIDEO: https://video.twimg.com/ext_tw_video/1151645615660699659/pu/vid/640x360/8CtIpQFodrv4Dv-B.mp4?tag=10']\n", "**********\n", "103496 - 1151645825036148736 - 956056.0001\n", "\n", "\n", "ID: 1152737403913854977\n", "Text: Me happily listening to the same 6 songs everyday https://t.co/8goDOTSLvw\n", "URL: https://twitter.com/1027329545064603648/status/1152737403913854977\n", "Retweet:False\n", "Original Tweet URL: Not applicable\n", 
"Quotes:False\n", "Quoted Tweet URL: Not applicable\n", "Has Media=True\n", "Has Local Media=True\n", "Media=['PHOTO: https://pbs.twimg.com/ext_tw_video_thumb/1152737371533787140/pu/img/8a1Vv8-Nyrwg2cSn.jpg', 'VIDEO: https://video.twimg.com/ext_tw_video/1152737371533787140/pu/vid/360x708/xZMzjSu4y_hXyDhZ.mp4?tag=10']\n", "**********\n", "148330 - 1152737403913854977 - 1085243.0101\n", "\n", "\n", "ID: 1153281489976606720\n", "Text: Next we work in a little conditioner. https://t.co/ffveNaXU72\n", "URL: https://twitter.com/2262255041/status/1153281489976606720\n", "Retweet:False\n", "Original Tweet URL: Not applicable\n", "Quotes:False\n", "Quoted Tweet URL: Not applicable\n", "Has Media=True\n", "Has Local Media=True\n", "Media=['PHOTO: https://pbs.twimg.com/ext_tw_video_thumb/1153281467121831938/pu/img/E9obyW9yqFLqL4aN.jpg', 'VIDEO: https://video.twimg.com/ext_tw_video/1153281467121831938/pu/vid/360x520/XqCo21S7WuPPbC48.mp4?tag=10']\n", "**********\n", "101538 - 1153281489976606720 - 542924.0101\n", "\n", "\n" ] } ], "source": [ "import random\n", "print(\"Amount found: \", len(most_retweeted_posts))\n", "# Keywords that suggest the tweet is about the protests\n", "keywords = (\"renuncia\", \"puerto rico\", \"ricky\", \"rosell\")\n", "for rt_count, tweet_id, key in random.sample(most_retweeted_posts[11:21], 10):\n", "    tweet = data.fetch_by_id(tweet_id)\n", "    text = tweet.data[\"text\"].lower()\n", "    print(tweet)\n", "    if any(keyword in text for keyword in keywords):\n", "        # Tweets that mention the protests get a longer separator.\n", "        print(\"*\"*20 + \"\\n\" + str(rt_count) + \" - \" + str(tweet_id) + \" - \" + str(key) + \"\\n\" + \"*\"*20 + \"\\n\\n\")\n", "    else:\n", "        print(\"*\"*10 + \"\\n\" + str(rt_count) + \" - \" + str(tweet_id) + \" - \" + str(key) + \"\\n\\n\")" ] }, { "cell_type": "code", "execution_count": 2, "id": "worth-density", "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ID: 
1153081436636925952\n", "Text: RT @CortesBob: This will only add fuel to the protesting fire. @ricardorossello has lost all trust to govern & the people do not believe in…\n", "URL: https://twitter.com/3222769285/status/1153081436636925952\n", "Retweet:True\n", "Original Tweet URL: https://twitter.com/612777130/status/1153064207472123905\n", "Quotes:False\n", "Quoted Tweet URL: Not applicable\n", "Media=[]\n", "****************************************************************************************************\n", "\n", "ID: 1153081432895631367\n", "Text: RT @carlosdelgado21: El gobernador @ricardorossello sigue jugando a la politica. Esto no se trata del partido ni de la presidencia del mism…\n", "URL: https://twitter.com/550143070/status/1153081432895631367\n", "Retweet:True\n", "Original Tweet URL: https://twitter.com/39184279/status/1153065364827385856\n", "Quotes:False\n", "Quoted Tweet URL: Not applicable\n", "Media=[]\n", "****************************************************************************************************\n", "\n", "ID: 1153081430232195072\n", "Text: RT @Tommy_Torres: La generación de Benito no cree en el Ay Bendito. #RickyRenuncia https://t.co/ZdZL6dRiE1\n", "URL: https://twitter.com/2723666981/status/1153081430232195072\n", "Retweet:True\n", "Original Tweet URL: https://twitter.com/61359460/status/1152693083164807168\n", "Quotes:False\n", "Quoted Tweet URL: Not applicable\n", "Media=['PHOTO: https://pbs.twimg.com/media/D_8wPoCXoAEq8jy.jpg', 'PHOTO: https://pbs.twimg.com/media/D_8wPoCXoAEq8jy.jpg', 'PHOTO: https://pbs.twimg.com/media/D_8wPoAWwAARYrz.jpg', 'PHOTO: https://pbs.twimg.com/media/D_8wPn_WwAEJE-y.jpg']\n", "****************************************************************************************************\n", "\n", "ID: 1153081425329041408\n", "Text: RT @pjsinsuela: 👑 Mañana 7AM nos vemos en el #ParoNaciona a sacar a este bacalao' de su gobernación y mejorar nuestra situación. 
¡Puerto Ri…\n", "URL: https://twitter.com/858487597/status/1153081425329041408\n", "Retweet:True\n", "Original Tweet URL: https://twitter.com/222536921/status/1153076413643276289\n", "Quotes:False\n", "Quoted Tweet URL: Not applicable\n", "Media=[]\n", "****************************************************************************************************\n", "\n", "ID: 1153081422586023936\n", "Text: RT @Samynemir: 20 years ago, #PuertoRico mobilized to remove US Marine- world’s most powerful military- out of Vieques as it was bombarded…\n", "URL: https://twitter.com/1459766030/status/1153081422586023936\n", "Retweet:True\n", "Original Tweet URL: https://twitter.com/128790234/status/1152783400127930371\n", "Quotes:False\n", "Quoted Tweet URL: Not applicable\n", "Media=[]\n", "****************************************************************************************************\n", "\n", "\n" ] } ], "source": [ "# randint(0,SAMPLE_SIZE-6)\n", "# print(data.head(5, 40, sep=\"\\n\" + \"*\"*100 + \"\\n\\n\"))\n", "#RickyRenuncia\n", "#RickyVeteYa\n", "print(data.head(5, randint(0,SAMPLE_SIZE-6), sep=\"\\n\" + \"*\"*100 + \"\\n\\n\"))" ] }, { "cell_type": "code", "execution_count": 2, "id": "permanent-testament", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "(\"'TweetAnalyzer' object has no attribute 'media'\",)\n", "'TweetAnalyzer' object has no attribute 'media'\n", "\n" ] } ], "source": [ "print(data.head(2, sep=\"\\n*************\\n\"))" ] }, { "cell_type": "code", "execution_count": 5, "id": "unsigned-strand", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "dict_keys([25, 0, 2894, 6, 2, 4, 205, 106, 1291, 23, 121, 1458, 256, 3, 274, 8, 834, 30, 1, 586, 431, 13, 816, 49, 160, 2106, 1604, 210, 107, 27, 10, 1711, 994, 396, 1520, 17, 176, 92, 93, 162, 21, 63, 48, 45, 4326, 87, 1673, 257, 923, 24, 123, 139, 163, 481, 36, 1450, 54, 932, 119, 801, 275, 22, 7, 247, 32, 213, 143, 71, 18, 383, 
185, 44, 700, 301, 50, 182, 368, 242, 28, 11, 207, 219, 955, 5, 53,\n", "{25: [{'id': 1152013350382915589, 'id_str': '1152013350382915589', 'user': {'id': 1057328811585560576, 'id_str': '1057328811585560576'}, 'jsonl_key': '1152013350382915589'}, {'id': 1152013299300438016, 'id_str': '1152013299300438016', 'user': {'id': 906410164185649152, 'id_str': '906410164185649152'}, 'jsonl_key': '1152013299300438016'}, {'id': 1152012777365446656, 'id_str': '1152012777365446656',\n" ] } ], "source": [ "print(type(data.retweet_cache))\n", "print(str(data.retweet_cache.keys())[:400])\n", "print(str(data.retweet_cache)[:400])" ] }, { "cell_type": "code", "execution_count": 11, "id": "fixed-westminster", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'id': 1152013349267234817, 'id_str': '1152013349267234817', 'user': {'id': 474765264, 'id_str': '474765264'}, 'jsonl_key': '1152013349267234817'}\n", "{'https://twitter.com/residente/status/1151965929925959680': 49, 'https://twitter.com/i/web/status/1152013334717198336': 1, 'https://twitter.com/perlalessandra/status/1152001977355603968': 1112, 'https://twitter.com/petebuttigieg/status/1151993436536393729': 17, 'https://twitter.com/musiccapos/status/1152005696260464641': 56, 'not applicable': 50119, 'https://twitter.com/idislikegabo/status/115200\n", "{'https://twitter.com/1232457985/status/1151967163646926855': 39, 'https://twitter.com/1311098821/status/1151689632955928577': 2268, 'https://twitter.com/1052162276399243264/status/1151965615080521728': 6, 'https://twitter.com/741592836/status/1152012539082858496': 1, 'https://twitter.com/2523919437/status/1152012188279701504': 33, 'https://twitter.com/1915636538/status/1152008881909858305': 1169,\n", "{25: [{'id': 1152013350382915589, 'id_str': '1152013350382915589', 'user': {'id': 1057328811585560576, 'id_str': '1057328811585560576'}, 'jsonl_key': '1152013350382915589'}, {'id': 1152013299300438016, 'id_str': '1152013299300438016', 'user': {'id': 
906410164185649152, 'id_str': '906410164185649152'}, 'jsonl_key': '1152013299300438016'}, {'id': 1152012777365446656, 'id_str': '1152012777365446656',\n", "[20982, 20981, 20980, 20978, 20977, 20976, 20935, 20933, 20932, 20931, 20930, 20929, 20928, 20927, 20925, 20924, 20923, 20922, 20921, 20920, 20918, 20917, 8010, 8009, 8001, 6865, 6864, 6855, 6854, 6853, 6852, 6851, 6850, 6849, 6182, 6181, 6180, 6179, 6178, 6177, 6173, 6172, 6171, 6170, 6169, 5717, 5716, 5715, 5656, 5655, 5654, 5653, 5652, 5651, 5650, 5649, 5547, 5546, 5545, 5544, 5504, 5502, 5501,\n", "[]\n" ] } ], "source": [ "print(data.retweet_cache[0][0])\n", "print(str(data.quoteOf)[:400])\n", "print(str(data.retweetOf)[:400])\n", "print(str(data.retweet_cache)[:400])\n", "retweet_counts = list(data.retweet_cache.keys())\n", "retweet_counts.sort(reverse=True)\n", "quote_counts = list(data.quote_cache.keys())\n", "quote_counts.sort(reverse=True)\n", "print(str(retweet_counts)[:400])\n", "print(str(quote_counts)[:400])" ] }, { "cell_type": "code", "execution_count": 6, "id": "proved-amsterdam", "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", " \"contributors\": null,\n", " \"truncated\": false,\n", " \"text\": \"RT @kellydiazr: AHORA: \\n\\ud83d\\udea8\\ud83d\\udea8\\ud83d\\udea8A la corilla de La Fiebre que sal\\u00eda hoy hacia Fortaleza les hicieron un bloqueo ilegal, los hartaron a tickets y\\u2026\",\n", " \"is_quote_status\": false,\n", " \"in_reply_to_status_id\": null,\n", " \"id\": 1152013081255407618,\n", " \"favorite_count\": 0,\n", " \"source\": \"Twitter for iPhone\",\n", " \"retweeted\": false,\n", " \"coordinates\": null,\n", " \"entities\": {\n", " \"symbols\": [],\n", " \"user_mentions\": [\n", " {\n", " \"indices\": [\n", " 3,\n", " 14\n", " ],\n", " \"id_str\": \"1915636538\",\n", " \"screen_name\": \"kellydiazr\",\n", " \"name\": \"K. 
D\\u00edaz\",\n", " \"id\": 1915636538\n", " }\n", " ],\n", " \"hashtags\": [],\n", " \"urls\": []\n", " },\n", " \"in_reply_to_screen_name\": null,\n", " \"in_reply_to_user_id\": null,\n", " \"retweet_count\": 205,\n", " \"id_str\": \"1152013081255407618\",\n", " \"favorited\": false,\n", " \"retweeted_status\": {\n", " \"contributors\": null,\n", " \"truncated\": true,\n", " \"text\": \"AHORA: \\n\\ud83d\\udea8\\ud83d\\udea8\\ud83d\\udea8A la corilla de La Fiebre que sal\\u00eda hoy hacia Fortaleza les hicieron un bloqueo ilegal, los hartaron a t\\u2026 https://t.co/47bAazekHL\",\n", " \"is_quote_status\": false,\n", " \"in_reply_to_status_id\": null,\n", " \"id\": 1152008881909858305,\n", " \"favorite_count\": 99,\n", " \"source\": \"Twitter for iPhone\",\n", " \"retweeted\": false,\n", " \"coordinates\": null,\n", " \"entities\": {\n", " \"symbols\": [],\n", " \"user_mentions\": [],\n", " \"hashtags\": [],\n", " \"urls\": [\n", " {\n", " \"url\": \"https://t.co/47bAazekHL\",\n", " \"indices\": [\n", " 117,\n", " 140\n", " ],\n", " \"expanded_url\": \"https://twitter.com/i/web/status/1152008881909858305\",\n", " \"display_url\": \"twitter.com/i/web/status/1\\u2026\"\n", " }\n", " ]\n", " },\n", " \"in_reply_to_screen_name\": null,\n", " \"in_reply_to_user_id\": null,\n", " \"retweet_count\": 205,\n", " \"id_str\": \"1152008881909858305\",\n", " \"favorited\": false,\n", " \"user\": {\n", " \"follow_request_sent\": false,\n", " \"has_extended_profile\": true,\n", " \"profile_use_background_image\": false,\n", " \"contributors_enabled\": false,\n", " \"id\": 1915636538,\n", " \"verified\": false,\n", " \"translator_type\": \"regular\",\n", " \"profile_text_color\": \"000000\",\n", " \"profile_image_url_https\": \"https://pbs.twimg.com/profile_images/1143401892854321153/sjw9lvDu_normal.jpg\",\n", " \"profile_sidebar_fill_color\": \"000000\",\n", " \"entities\": {\n", " \"url\": {\n", " \"urls\": [\n", " {\n", " \"url\": \"https://t.co/ke4r46jzGu\",\n", " 
\"indices\": [\n", " 0,\n", " 23\n", " ],\n", " \"expanded_url\": \"http://kellydiazrodriguez.wordpress.com\",\n", " \"display_url\": \"kellydiazrodriguez.wordpress.com\"\n", " }\n", " ]\n", " },\n", " \"description\": {\n", " \"urls\": []\n", " }\n", " },\n", " \"followers_count\": 3372,\n", " \"profile_sidebar_border_color\": \"000000\",\n", " \"id_str\": \"1915636538\",\n", " \"default_profile_image\": false,\n", " \"listed_count\": 56,\n", " \"is_translation_enabled\": false,\n", " \"utc_offset\": null,\n", " \"statuses_count\": 69543,\n", " \"description\": \"Escribo cosas, brego con libros y como nada es perfecto: no s\\u00e9 bailar salsa. | #RickyRenuncia\",\n", " \"friends_count\": 301,\n", " \"location\": \"Sanqueerce, Puerto Rico \",\n", " \"profile_link_color\": \"000456\",\n", " \"profile_image_url\": \"http://pbs.twimg.com/profile_images/1143401892854321153/sjw9lvDu_normal.jpg\",\n", " \"notifications\": false,\n", " \"geo_enabled\": true,\n", " \"profile_background_color\": \"000000\",\n", " \"profile_banner_url\": \"https://pbs.twimg.com/profile_banners/1915636538/1462228528\",\n", " \"profile_background_image_url\": \"http://abs.twimg.com/images/themes/theme1/bg.png\",\n", " \"screen_name\": \"kellydiazr\",\n", " \"lang\": null,\n", " \"following\": false,\n", " \"profile_background_tile\": false,\n", " \"favourites_count\": 21839,\n", " \"name\": \"K. 
D\\u00edaz\",\n", " \"url\": \"https://t.co/ke4r46jzGu\",\n", " \"created_at\": \"Sat Sep 28 23:39:07 +0000 2013\",\n", " \"profile_background_image_url_https\": \"https://abs.twimg.com/images/themes/theme1/bg.png\",\n", " \"time_zone\": null,\n", " \"protected\": false,\n", " \"default_profile\": false,\n", " \"is_translator\": false\n", " },\n", " \"geo\": null,\n", " \"in_reply_to_user_id_str\": null,\n", " \"lang\": \"es\",\n", " \"created_at\": \"Fri Jul 19 00:14:55 +0000 2019\",\n", " \"in_reply_to_status_id_str\": null,\n", " \"place\": null,\n", " \"metadata\": {\n", " \"iso_language_code\": \"es\",\n", " \"result_type\": \"recent\"\n", " }\n", " },\n", " \"user\": {\n", " \"follow_request_sent\": false,\n", " \"has_extended_profile\": true,\n", " \"profile_use_background_image\": true,\n", " \"contributors_enabled\": false,\n", " \"id\": 1141124347534462976,\n", " \"verified\": false,\n", " \"translator_type\": \"none\",\n", " \"profile_text_color\": \"333333\",\n", " \"profile_image_url_https\": \"https://pbs.twimg.com/profile_images/1151986107065593856/6wM9_Pvq_normal.jpg\",\n", " \"profile_sidebar_fill_color\": \"DDEEF6\",\n", " \"entities\": {\n", " \"description\": {\n", " \"urls\": []\n", " }\n", " },\n", " \"followers_count\": 75,\n", " \"profile_sidebar_border_color\": \"C0DEED\",\n", " \"id_str\": \"1141124347534462976\",\n", " \"default_profile_image\": false,\n", " \"listed_count\": 0,\n", " \"is_translation_enabled\": false,\n", " \"utc_offset\": null,\n", " \"statuses_count\": 1503,\n", " \"description\": \"\\u00a1Viva la vida!\",\n", " \"friends_count\": 156,\n", " \"location\": \"00767\",\n", " \"profile_link_color\": \"1DA1F2\",\n", " \"profile_image_url\": \"http://pbs.twimg.com/profile_images/1151986107065593856/6wM9_Pvq_normal.jpg\",\n", " \"notifications\": false,\n", " \"geo_enabled\": false,\n", " \"profile_background_color\": \"F5F8FA\",\n", " \"profile_banner_url\": 
\"https://pbs.twimg.com/profile_banners/1141124347534462976/1563484778\",\n", " \"profile_background_image_url\": null,\n", " \"screen_name\": \"floreessz\",\n", " \"lang\": null,\n", " \"following\": false,\n", " \"profile_background_tile\": false,\n", " \"favourites_count\": 297,\n", " \"name\": \"Flo\\ud83e\\udd8b\",\n", " \"url\": null,\n", " \"created_at\": \"Tue Jun 18 23:23:40 +0000 2019\",\n", " \"profile_background_image_url_https\": null,\n", " \"time_zone\": null,\n", " \"protected\": false,\n", " \"default_profile\": true,\n", " \"is_translator\": false\n", " },\n", " \"geo\": null,\n", " \"in_reply_to_user_id_str\": null,\n", " \"lang\": \"es\",\n", " \"created_at\": \"Fri Jul 19 00:31:36 +0000 2019\",\n", " \"in_reply_to_status_id_str\": null,\n", " \"place\": null,\n", " \"metadata\": {\n", " \"iso_language_code\": \"es\",\n", " \"result_type\": \"recent\"\n", " }\n", "}\n" ] } ], "source": [ "import json\n", "sample_t = data.fetch_by_position(112)\n", "print(json.dumps(sample_t.data, indent=4))" ] }, { "cell_type": "code", "execution_count": 7, "id": "listed-patient", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1151689205996822528 video https://video.twimg.com/ext_tw_video/1151689205996822528/pu/vid/360x778/YrQ6FVlC2kMHWMnO.mp4?tag=10\n", "1152001927623761923 video https://video.twimg.com/ext_tw_video/1152001927623761923/pu/vid/640x360/scZWK1m35HHcj-of.mp4?tag=10\n", "1152005409303121921 video https://video.twimg.com/ext_tw_video/1152005409303121921/pu/vid/640x360/U0BllslXNll2b3_e.mp4?tag=10\n", "1152009003687448576 video https://video.twimg.com/ext_tw_video/1152009003687448576/pu/vid/640x360/5zbRUPzUFcmOp-7L.mp4?tag=10\n", "1151860930466263042 video https://video.twimg.com/ext_tw_video/1151860930466263042/pu/vid/360x640/lYd2puCmVM5dX_4f.mp4?tag=10\n", "1151938358534377472 video https://video.twimg.com/ext_tw_video/1151938358534377472/pu/vid/360x640/KzDJbiPaj0l-J1u6.mp4?tag=10\n", "1152008287715561472 video 
https://video.twimg.com/ext_tw_video/1152008287715561472/pu/vid/640x360/n2oHa2lcp_BKsoDV.mp4?tag=10\n", "1151678566246932480 video https://video.twimg.com/ext_tw_video/1151678566246932480/pu/vid/640x360/2LhsAHtiG0rebrsi.mp4?tag=10\n", "1151862717927608322 video https://video.twimg.com/ext_tw_video/1151862717927608322/pu/vid/640x360/UOMrOVNZx_klCaaB.mp4?tag=10\n", "1152004915192979456 video https://video.twimg.com/ext_tw_video/1152004915192979456/pu/vid/360x640/roiy9_Jn2aGSH7SQ.mp4?tag=10\n", "1151606800719974412 video https://video.twimg.com/ext_tw_video/1151606800719974412/pu/vid/360x640/dQgx5SxsxJemccS_.mp4?tag=10\n", "1151651246803226626 video https://video.twimg.com/ext_tw_video/1151651246803226626/pu/vid/360x636/W2q4IlxSBfyKXb6u.mp4?tag=10\n", "1151489207971536897 video https://video.twimg.com/ext_tw_video/1151489207971536897/pu/vid/848x432/ExtgtUXDU-IDvPvV.mp4?tag=10\n", "1151296144368160773 video https://video.twimg.com/ext_tw_video/1151296144368160773/pu/vid/480x480/OOm7rY7c8FKPR59F.mp4?tag=10\n", "1151717235632877568 video https://video.twimg.com/ext_tw_video/1151717235632877568/pu/vid/360x640/nAAV65cIK59UNA9w.mp4?tag=10\n", "1151992913573793794 video https://video.twimg.com/ext_tw_video/1151992913573793794/pu/vid/360x778/0tAZLRjMOklmaoZ1.mp4?tag=10\n", "1151997249703964672 video https://video.twimg.com/ext_tw_video/1151997249703964672/pu/vid/480x480/-clEuROvWDIhsiRI.mp4?tag=10\n", "1151992251536404480 video https://video.twimg.com/ext_tw_video/1151992251536404480/pu/vid/480x480/6_SPdX_Nz0v5KdVi.mp4?tag=10\n", "1151729254989799424 video https://video.twimg.com/ext_tw_video/1151729254989799424/pu/vid/640x360/36tZfXKlePOw7ReD.mp4?tag=10\n", "1151994618000490496 video https://video.twimg.com/ext_tw_video/1151994618000490496/pu/vid/404x720/LluZGZPn5oBCQsPJ.mp4?tag=10\n", "1134728835381882880 video https://video.twimg.com/ext_tw_video/1134728835381882880/pu/vid/480x480/9n5vZ0lxtMOwNcwl.mp4?tag=9\n", "1151602953159086083 video 
https://video.twimg.com/ext_tw_video/1151602953159086083/pu/vid/360x640/wKJukDjDGIqrcdgr.mp4?tag=10\n", "1151951734698336256 video https://video.twimg.com/ext_tw_video/1151951734698336256/pu/vid/360x640/afU1Zrf5PPl6Txpv.mp4?tag=10\n", "1151981723502227456 video https://video.twimg.com/ext_tw_video/1151981723502227456/pu/vid/360x640/d6wTdVxIE2Sc0loY.mp4?tag=10\n", "1152011962450153474 video https://video.twimg.com/ext_tw_video/1152011962450153474/pu/vid/640x360/wgFz3d9tBHK-7-8p.mp4?tag=10\n", "1151954257681121280 video https://video.twimg.com/ext_tw_video/1151954257681121280/pu/vid/720x406/R23ruNUB-t0JQG39.mp4?tag=10\n", "1151998880021544960 video https://video.twimg.com/ext_tw_video/1151998880021544960/pu/vid/352x640/Sdhe3xPKioIh__98.mp4?tag=10\n", "1151632834567782401 video https://video.twimg.com/ext_tw_video/1151632834567782401/pu/vid/360x778/IPHcKGhUxemIgmcH.mp4?tag=10\n", "1151974606741413889 video https://video.twimg.com/ext_tw_video/1151974606741413889/pu/vid/360x778/NZKLJEy8ugyzdSA0.mp4?tag=10\n", "1151711345882140672 video https://video.twimg.com/ext_tw_video/1151711345882140672/pu/vid/360x640/93M0ICMAxOP_xqLb.mp4?tag=10\n", "1151640522378874880 video https://video.twimg.com/ext_tw_video/1151640522378874880/pu/vid/360x640/I2vPfwsjaNsmRlBb.mp4?tag=10\n", "1151682719530790914 video https://video.twimg.com/ext_tw_video/1151682719530790914/pu/vid/640x360/5naQSgbrfKd4E8CD.mp4?tag=10\n", "1152012170525327361 animated_gif http://pbs.twimg.com/tweet_video_thumb/D_zE9ykWwAEGfMa.jpg\n", "1152012062240989184 video https://video.twimg.com/ext_tw_video/1152012062240989184/pu/vid/640x360/Ehe1vHrzyOv0MaOh.mp4?tag=10\n", "1151924609253552130 video https://video.twimg.com/ext_tw_video/1151924609253552130/pu/vid/360x740/DuxEzVQDjHo0gt-t.mp4?tag=10\n", "1151618224531955713 video https://video.twimg.com/ext_tw_video/1151618224531955713/pu/vid/848x464/UT-hMJGsNtUrFLOX.mp4?tag=10\n", "1151966952044290069 animated_gif 
http://pbs.twimg.com/tweet_video_thumb/D_yb1umW4BU-qLB.jpg\n", "1151943150761631746 video https://video.twimg.com/ext_tw_video/1151943150761631746/pu/vid/320x576/mp6vpNcKTUt2SI0c.mp4?tag=10\n", "1151679892746133505 video https://video.twimg.com/ext_tw_video/1151679892746133505/pu/vid/640x360/pF40wNij1xPlF-EF.mp4?tag=10\n", "1152006648854003713 video https://video.twimg.com/ext_tw_video/1152006648854003713/pu/vid/320x320/I6sXYQuUKH4cnUnw.mp4?tag=10\n", "1152007464323186688 video https://video.twimg.com/ext_tw_video/1152007464323186688/pu/vid/640x360/8KL0EN52UUuCWnmY.mp4?tag=10\n", "1152010519857025031 video https://video.twimg.com/ext_tw_video/1152010519857025031/pu/vid/360x492/J-HMsLOwWR_vWWQt.mp4?tag=10\n", "1152010870588870659 video https://video.twimg.com/ext_tw_video/1152010870588870659/pu/vid/360x640/Etpbzxg9Oj3ZAU4g.mp4?tag=10\n", "1151835204518129664 video https://video.twimg.com/ext_tw_video/1151835204518129664/pu/vid/360x640/4IxjMS2DpnKAccq2.mp4?tag=10\n", "1151639194353250306 video https://video.twimg.com/ext_tw_video/1151639194353250306/pu/vid/360x640/mHtKEXbWkzICkPbK.mp4?tag=10\n", "1152009640328216576 video https://video.twimg.com/ext_tw_video/1152009640328216576/pu/vid/360x640/sRb81ORwf_Dd-VoY.mp4?tag=10\n", "1152008328412688385 animated_gif http://pbs.twimg.com/tweet_video_thumb/D_zBeJlUcAELdMd.jpg\n", "1152005905124212736 video https://video.twimg.com/ext_tw_video/1152005905124212736/pu/vid/640x360/UqVyyYXceDkiLJo8.mp4?tag=10\n", "1152010601247465472 video https://video.twimg.com/ext_tw_video/1152010601247465472/pu/vid/408x360/H3-ZukPydHd50rSV.mp4?tag=10\n", "1151914095303634944 video https://video.twimg.com/ext_tw_video/1151914095303634944/pu/vid/400x400/pgJFBvGhswgwh-zh.mp4?tag=10\n", "1151575535300087809 video https://video.twimg.com/ext_tw_video/1151575535300087809/pu/vid/640x360/gd-WrgYNjEz8bT8m.mp4?tag=10\n", "1152010180617494528 video https://video.twimg.com/ext_tw_video/1152010180617494528/pu/vid/480x480/gL6ijOx6cwPUPm0M.mp4?tag=10\n", 
"1151588776927473665 video https://video.twimg.com/ext_tw_video/1151588776927473665/pu/vid/360x640/TMMoXw3dFLcBstbm.mp4?tag=10\n", "1152010272980291584 animated_gif http://pbs.twimg.com/tweet_video_thumb/D_zDPVqXUAArLvm.jpg\n", "1152009755675758592 video https://video.twimg.com/ext_tw_video/1152009755675758592/pu/vid/360x714/2zDaWCL4wEAhA1_6.mp4?tag=10\n", "1151681158561837058 video https://video.twimg.com/ext_tw_video/1151681158561837058/pu/vid/360x640/5wPGHP7onh158aUX.mp4?tag=10\n", "1151995822663507969 video https://video.twimg.com/amplify_video/1151995822663507969/vid/480x480/kay_lmlC8l4zdhHb.mp4?tag=13\n", "1151993416244322306 video https://video.twimg.com/ext_tw_video/1151993416244322306/pu/vid/360x640/vCkB54AyWYhVdtTV.mp4?tag=10\n", "1151999062335352832 video https://video.twimg.com/ext_tw_video/1151999062335352832/pu/vid/360x636/nRkqBotkutNaHFz5.mp4?tag=10\n", "1152008614326022144 video https://video.twimg.com/ext_tw_video/1152008614326022144/pu/vid/480x270/xU7qFNNDZ2S8iVyA.mp4?tag=10\n", "1151979442211831808 animated_gif http://pbs.twimg.com/tweet_video_thumb/D_ynMwGWwAAwE8_.jpg\n", "1152009641905348610 video https://video.twimg.com/ext_tw_video/1152009641905348610/pu/vid/480x480/BKC7z1R0cAJ89xfV.mp4?tag=10\n", "1151986865286733824 video https://video.twimg.com/ext_tw_video/1151986865286733824/pu/vid/360x640/gIq7FfQjFuiGGFDf.mp4?tag=10\n", "1152009312140767232 animated_gif http://pbs.twimg.com/tweet_video_thumb/D_zCXaQXsAARuE0.jpg\n", "1151559268669317120 video https://video.twimg.com/ext_tw_video/1151559268669317120/pu/vid/360x640/bPri0vCp6TCuBjnB.mp4?tag=10\n", "1152008485745451009 video https://video.twimg.com/ext_tw_video/1152008485745451009/pu/vid/360x640/D8r0-JySVO8jUnlw.mp4?tag=10\n", "1151709497473359872 video https://video.twimg.com/ext_tw_video/1151709497473359872/pu/vid/360x640/JpMN-E3rEuURMXy0.mp4?tag=10\n", "1151651237219241986 video https://video.twimg.com/ext_tw_video/1151651237219241986/pu/vid/360x778/lAtX7EV_UfXcd15r.mp4?tag=10\n", 
"1151953189090025475 video https://video.twimg.com/ext_tw_video/1151953189090025475/pu/vid/640x352/FXO581qi3PLVQC0G.mp4?tag=10\n", "1151716406729330689 animated_gif http://pbs.twimg.com/tweet_video_thumb/D_u3-EkXUAElJsM.jpg\n", "1151649073688195072 video https://video.twimg.com/ext_tw_video/1151649073688195072/pu/vid/360x640/BwmAmAC1rnqrlW8L.mp4?tag=10\n", "1151708652425875457 video https://video.twimg.com/ext_tw_video/1151708652425875457/pu/vid/360x640/cxqOMM1qD2NhMtVl.mp4?tag=10\n", "1150923825174986752 video https://video.twimg.com/ext_tw_video/1150923825174986752/pu/vid/640x360/rKwtbhw9iGAjBD00.mp4?tag=10\n", "1151662444688920576 video https://video.twimg.com/ext_tw_video/1151662444688920576/pu/vid/360x640/qAHJaP818TpSU8L8.mp4?tag=10\n", "1152007420304117765 video https://video.twimg.com/ext_tw_video/1152007420304117765/pu/vid/360x696/UcYXCPFGR0EJ5boT.mp4?tag=10\n", "1151784747804348416 video https://video.twimg.com/ext_tw_video/1151784747804348416/pu/vid/360x450/h7Z_i_hOXaVrtCw8.mp4?tag=10\n", "1152008415365013505 animated_gif http://pbs.twimg.com/tweet_video_thumb/D_zBjNgXoAEqaus.jpg\n", "1151865997978198019 video https://video.twimg.com/ext_tw_video/1151865997978198019/pu/vid/360x480/lqb4kGU_ynu1vXDG.mp4?tag=10\n", "1151844659465003008 video https://video.twimg.com/ext_tw_video/1151844659465003008/pu/vid/360x640/rqBwLR5ZenDI09_k.mp4?tag=10\n", "1151709982519435265 video https://video.twimg.com/ext_tw_video/1151709982519435265/pu/vid/360x640/Ng7Fn3C6CkBFSVZC.mp4?tag=10\n", "1152007994512744453 video https://video.twimg.com/ext_tw_video/1152007994512744453/pu/vid/848x448/OayCuwemEhahniMf.mp4?tag=10\n", "1151958781812727824 video https://video.twimg.com/ext_tw_video/1151958781812727824/pu/vid/360x636/qB7URnG6MEW1praR.mp4?tag=10\n", "1152005551020077056 video https://video.twimg.com/ext_tw_video/1152005551020077056/pu/vid/360x640/OcQNmuMXKrVDi7ii.mp4?tag=10\n", "1151673505244745732 video 
https://video.twimg.com/ext_tw_video/1151673505244745732/pu/vid/640x360/5Mbw7v7H-7-46Iqg.mp4?tag=10\n", "1152000979589459970 video https://video.twimg.com/ext_tw_video/1152000979589459970/pu/vid/640x360/TOlk9tanGb_cXB8N.mp4?tag=10\n", "1151617975541207043 video https://video.twimg.com/ext_tw_video/1151617975541207043/pu/vid/360x640/-qRU4bzypLcZdyGc.mp4?tag=10\n", "1151628491584094210 video https://video.twimg.com/ext_tw_video/1151628491584094210/pu/vid/640x360/OkPpLfNLs6-2mtp5.mp4?tag=10\n", "1151652331416621056 video https://video.twimg.com/ext_tw_video/1151652331416621056/pu/vid/360x636/ASpzFQXuc9Pg_ndt.mp4?tag=10\n", "1151260315969097728 video https://video.twimg.com/ext_tw_video/1151260315969097728/pu/vid/360x640/Let6KxkIKIbgd6oM.mp4?tag=10\n", "1152006165204819968 video https://video.twimg.com/ext_tw_video/1152006165204819968/pu/vid/360x640/xzl5jLeCPkYhMasK.mp4?tag=10\n", "1152006028566790144 video https://video.twimg.com/ext_tw_video/1152006028566790144/pu/vid/360x778/j-DpYkdnZYUWBDli.mp4?tag=10\n", "1152005971675209729 animated_gif http://pbs.twimg.com/tweet_video_thumb/D_y_U-DUYAEzfIB.jpg\n", "1151983973482078210 video https://video.twimg.com/ext_tw_video/1151983973482078210/pu/vid/480x480/AvtfFxDgBTU4iyVC.mp4?tag=10\n", "1152005506371686402 video https://video.twimg.com/ext_tw_video/1152005506371686402/pu/vid/360x640/SPb4nNpQjoyTUbJs.mp4?tag=10\n", "1152005085490114561 animated_gif http://pbs.twimg.com/tweet_video_thumb/D_y-hYwU8AEVKpR.jpg\n", "1152005097720885248 video https://video.twimg.com/ext_tw_video/1152005097720885248/pu/vid/640x360/m5D_70RJjCIbP7wB.mp4?tag=10\n", "1151016520618823680 video https://video.twimg.com/ext_tw_video/1151016520618823680/pu/vid/352x640/P811vW5sYvGbGoJN.mp4?tag=10\n", "1151916269265915911 video https://video.twimg.com/ext_tw_video/1151916269265915911/pu/vid/480x268/V73r-dvW48Fm9rhn.mp4?tag=10\n", "1152004936953057280 animated_gif http://pbs.twimg.com/tweet_video_thumb/D_y-YvaVUAAM6Pq.jpg\n", "1152003376437088256 video 
https://video.twimg.com/ext_tw_video/1152003376437088256/pu/vid/480x360/msY5pciq6kwulqyh.mp4?tag=10\n", "1151714024637292544 video https://video.twimg.com/ext_tw_video/1151714024637292544/pu/vid/304x288/ESwPqtbgeB1eDEeg.mp4?tag=10\n", "1152001689211174913 video https://video.twimg.com/ext_tw_video/1152001689211174913/pu/vid/640x352/CLQGbKwFfdEb-FvU.mp4?tag=10\n", "1152004152765009920 video https://video.twimg.com/ext_tw_video/1152004152765009920/pu/vid/360x778/1x5a3Ug1DRKeTrlB.mp4?tag=10\n", "1151720342135746560 video https://video.twimg.com/ext_tw_video/1151720342135746560/pu/vid/640x360/eyTzyffJc62seEPW.mp4?tag=10\n", "1151617667700264960 video https://video.twimg.com/ext_tw_video/1151617667700264960/pu/vid/432x848/MbBrQ-CReNFbLQeD.mp4?tag=10\n", "1151963060350914561 video https://video.twimg.com/ext_tw_video/1151963060350914561/pu/vid/360x778/svkxPNeP5UQma8ki.mp4?tag=10\n", "1151675190276849664 video https://video.twimg.com/ext_tw_video/1151675190276849664/pu/vid/640x360/U2b91efw1ynwJ4D0.mp4?tag=10\n", "1152000183883821056 video https://video.twimg.com/ext_tw_video/1152000183883821056/pu/vid/720x406/imKeKvIWj8B2sav4.mp4?tag=10\n", "1152002410769833985 video https://video.twimg.com/ext_tw_video/1152002410769833985/pu/vid/360x640/tP_QzKXdLeUi1ioW.mp4?tag=10\n", "1152002077255540736 video https://video.twimg.com/ext_tw_video/1152002077255540736/pu/vid/360x640/JWkNuxtulN5EdtQS.mp4?tag=10\n", "1152002988090650624 video https://video.twimg.com/ext_tw_video/1152002988090650624/pu/vid/360x640/jnkagd2-8Jzeh_KI.mp4?tag=10\n", "1152002971040796672 animated_gif http://pbs.twimg.com/tweet_video_thumb/D_y8mT0U0AA8AN2.jpg\n", "1151964697652027397 video https://video.twimg.com/ext_tw_video/1151964697652027397/pu/vid/224x400/nROjRQhy4kPK9bL3.mp4?tag=10\n", "1151923646836936704 video https://video.twimg.com/ext_tw_video/1151923646836936704/pu/vid/360x640/mAptjBc6ejYliwiu.mp4?tag=10\n", "1152001752385744896 video 
https://video.twimg.com/ext_tw_video/1152001752385744896/pu/vid/640x360/MdzU6iadzP51IWTA.mp4?tag=10\n", "1151955616820322308 video https://video.twimg.com/ext_tw_video/1151955616820322308/pu/vid/360x778/EXiaCgOKD2FVsLyX.mp4?tag=10\n", "1151696199742873601 video https://video.twimg.com/ext_tw_video/1151696199742873601/pu/vid/360x640/DzAsyvJgjT3I42vm.mp4?tag=10\n", "1151963974801481732 video https://video.twimg.com/amplify_video/1151963974801481732/vid/640x360/i1gAIOZYF_7Ddyh5.mp4?tag=13\n", "1152001648689991680 animated_gif http://pbs.twimg.com/tweet_video_thumb/D_y7ZVrU0AAllxf.jpg\n", "1151687945079939072 video https://video.twimg.com/ext_tw_video/1151687945079939072/pu/vid/360x640/PrbaRsRmRdql-KUj.mp4?tag=10\n", "1151911852953223168 video https://video.twimg.com/ext_tw_video/1151911852953223168/pu/vid/224x400/mJOHeWQq6Scw8DYY.mp4?tag=10\n", "1151318921640677376 video https://video.twimg.com/ext_tw_video/1151318921640677376/pu/vid/360x640/4GbcVjgnhY6CD5FO.mp4?tag=10\n", "1152000972693970945 video https://video.twimg.com/ext_tw_video/1152000972693970945/pu/vid/636x360/rVi-DiIJ_HJKAJ66.mp4?tag=10\n", "1151895557918613506 video https://video.twimg.com/ext_tw_video/1151895557918613506/pu/vid/360x640/IMRpSMU9PjEfm6zn.mp4?tag=10\n", "1151996735935303680 video https://video.twimg.com/ext_tw_video/1151996735935303680/pu/vid/636x360/P5AeoE5zrnZEoeo9.mp4?tag=10\n", "1151943332765077504 video https://video.twimg.com/ext_tw_video/1151943332765077504/pu/vid/640x360/C6hT106WgxZQMc4-.mp4?tag=10\n", "1151917573556047872 video https://video.twimg.com/ext_tw_video/1151917573556047872/pu/vid/360x480/5Jn3nCK97TBBBOOV.mp4?tag=10\n", "1151852312937345024 video https://video.twimg.com/amplify_video/1151852312937345024/vid/480x480/t5cuBF6GZd6m23bg.mp4?tag=13\n", "1151748079714000896 video https://video.twimg.com/ext_tw_video/1151748079714000896/pu/vid/624x360/aXrFv6R7B4477Q11.mp4?tag=10\n", "1151994656755838976 video 
https://video.twimg.com/ext_tw_video/1151994656755838976/pu/vid/360x636/J12NFLS7_wAQAQMc.mp4?tag=10\n", "1150958472365793283 video https://video.twimg.com/ext_tw_video/1150958472365793283/pu/vid/400x256/jkw-bRV0uFd4usvO.mp4?tag=10\n", "1151689643332648961 video https://video.twimg.com/ext_tw_video/1151689643332648961/pu/vid/224x400/CBhrh7Lz54SZdMIy.mp4?tag=10\n", "1151724833685811201 video https://video.twimg.com/ext_tw_video/1151724833685811201/pu/vid/640x360/lOT0sWf1arYBJvPw.mp4?tag=10\n", "1151998802485661696 animated_gif http://pbs.twimg.com/tweet_video_thumb/D_y4zqvVUAAqmYL.jpg\n", "1151993431243161601 video https://video.twimg.com/ext_tw_video/1151993431243161601/pu/vid/846x360/c68adcGTwgP-82PJ.mp4?tag=10\n", "1150823476040806400 animated_gif http://pbs.twimg.com/tweet_video_thumb/D_iL2qiXsAAiNIi.jpg\n", "1151872985860661248 video https://video.twimg.com/ext_tw_video/1151872985860661248/pu/vid/480x480/mTitNMHBB7s8Xpc3.mp4?tag=10\n", "1151715193715601408 video https://video.twimg.com/ext_tw_video/1151715193715601408/pu/vid/640x360/fmUlt5qa5YQdaZ9k.mp4?tag=10\n", "1151940191751307264 video https://video.twimg.com/ext_tw_video/1151940191751307264/pu/vid/360x778/FGCXmHI0r32dFKdO.mp4?tag=10\n", "1151263998849011712 video https://video.twimg.com/ext_tw_video/1151263998849011712/pu/vid/360x640/AgskmFTJlu7CI68U.mp4?tag=10\n", "1151660789956337664 video https://video.twimg.com/ext_tw_video/1151660789956337664/pu/vid/360x576/D-KmT16S0ioR9LVZ.mp4?tag=10\n", "1151995772746944512 video https://video.twimg.com/ext_tw_video/1151995772746944512/pu/vid/360x640/5WC5dxfewU5B7emM.mp4?tag=10\n", "1151988482492907520 video https://video.twimg.com/ext_tw_video/1151988482492907520/pu/vid/640x360/I4baL1gZEgz_A87D.mp4?tag=10\n", "1151651329170976768 video https://video.twimg.com/ext_tw_video/1151651329170976768/pu/vid/640x360/I699VCwQYeR7t99b.mp4?tag=10\n", "1151506558343372800 video 
https://video.twimg.com/ext_tw_video/1151506558343372800/pu/vid/360x456/Whqhxj0LoFcBMoOB.mp4?tag=10\n", "1151638906212892672 video https://video.twimg.com/ext_tw_video/1151638906212892672/pu/vid/480x480/sjJsT1WpYUotjpVw.mp4?tag=10\n", "1151996298712666112 video https://video.twimg.com/ext_tw_video/1151996298712666112/pu/vid/360x640/KsEq6fEivATHsTFk.mp4?tag=10\n", "1150837429189906433 video https://video.twimg.com/ext_tw_video/1150837429189906433/pu/vid/360x576/kVN3DTLeq_-Sdo1c.mp4?tag=10\n", "1151996049499627520 video https://video.twimg.com/ext_tw_video/1151996049499627520/pu/vid/360x634/leb-2_2MqfJLnQ4U.mp4?tag=10\n" ] }, { "ename": "KeyboardInterrupt", "evalue": "", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mKeyboardInterrupt\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 12\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mcount\u001b[0m\u001b[0;34m%\u001b[0m\u001b[0;36m200000\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 13\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34mf\"Done with: {count}\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 14\u001b[0;31m \u001b[0mtweet\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mTweetAnalyzer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdata_file\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mreadline\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 15\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mtweet\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mhasMedia\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 16\u001b[0m \u001b[0;31m# 
print(\"HasMedia\",tweet.hasMedia)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/home/rickyrenuncia/RickyRenuncia-case-module_shared/tweet_rehydrate/analysis.py\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, data)\u001b[0m\n\u001b[1;32m 95\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdata\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mdata\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 96\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mtype\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdata\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0mstr\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 97\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdata\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mjson\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mloads\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdata\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 98\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mextractMeta\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 99\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/usr/lib/python3.8/json/__init__.py\u001b[0m in \u001b[0;36mloads\u001b[0;34m(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)\u001b[0m\n\u001b[1;32m 355\u001b[0m \u001b[0mparse_int\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mNone\u001b[0m \u001b[0;32mand\u001b[0m \u001b[0mparse_float\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mNone\u001b[0m \u001b[0;32mand\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 356\u001b[0m parse_constant is None and object_pairs_hook is None and not kw):\n\u001b[0;32m--> 357\u001b[0;31m 
\u001b[0;32mreturn\u001b[0m \u001b[0m_default_decoder\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdecode\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0ms\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 358\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mcls\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 359\u001b[0m \u001b[0mcls\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mJSONDecoder\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/usr/lib/python3.8/json/decoder.py\u001b[0m in \u001b[0;36mdecode\u001b[0;34m(self, s, _w)\u001b[0m\n\u001b[1;32m 335\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 336\u001b[0m \"\"\"\n\u001b[0;32m--> 337\u001b[0;31m \u001b[0mobj\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mend\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mraw_decode\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0ms\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0midx\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0m_w\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0ms\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 338\u001b[0m \u001b[0mend\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_w\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0ms\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mend\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 339\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mend\u001b[0m \u001b[0;34m!=\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0ms\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 
"\u001b[0;32m/usr/lib/python3.8/json/decoder.py\u001b[0m in \u001b[0;36mraw_decode\u001b[0;34m(self, s, idx)\u001b[0m\n\u001b[1;32m 351\u001b[0m \"\"\"\n\u001b[1;32m 352\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 353\u001b[0;31m \u001b[0mobj\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mend\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mscan_once\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0ms\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0midx\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 354\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mStopIteration\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0merr\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 355\u001b[0m \u001b[0;32mraise\u001b[0m \u001b[0mJSONDecodeError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Expecting value\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0ms\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0merr\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mvalue\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mKeyboardInterrupt\u001b[0m: " ] } ], "source": [ "# Find a video tweet\n", "import json\n", "from tweet_rehydrate.analysis import TweetJLAnalyzer, TweetAnalyzer, getsizeof\n", "from random import randint\n", "JL_DATA=\"/home/rickyrenuncia/tweetsRickyRenuncia-final.jsonl\"\n", "SAMPLE_SIZE = 1113758\n", "count = 0\n", "media_ids=[]\n", "with open(JL_DATA,'r') as data_file:\n", " for _ in range(SAMPLE_SIZE):\n", " count+=1\n", " if count%200000 == 0:\n", " print(f\"Done with: {count}\")\n", " tweet = TweetAnalyzer(data_file.readline())\n", " if tweet.hasMedia:\n", "# print(\"HasMedia\",tweet.hasMedia)\n", " if len(tweet.media) > 0:\n", " for m in tweet.media:\n", " if m.mtype().lower() != \"photo\" and m.id not in 
media_ids:\n", " media_ids.append(m.id)\n", " print(m.id, m.mtype(), m.url())\n", "# print(m.data)\n", " else:\n", " print(\"tweet.hasMedia is set but tweet.media is empty\")\n", " try:\n", " print(tweet.data[\"entities\"][\"media\"])\n", " except (KeyError, TypeError):\n", " print(\"No media in tweet entities\")\n", " try:\n", " print(tweet.data[\"retweeted_status\"][\"entities\"][\"media\"])\n", " except (KeyError, TypeError):\n", " print(\"No media in retweeted_status entities\")\n", " print(json.dumps(tweet.data))\n", " break\n", "print(f\"DONE: {count}\")\n" ] }, { "cell_type": "markdown", "id": "sapphire-symposium", "metadata": {}, "source": [ "## Beautiful Imagery\n", "


\n" ] }, { "cell_type": "code", "execution_count": null, "id": "opposed-midwest", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 5 }