{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# CrowdTangle LDA Evaluation Workflow" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Import text and stop words" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "import pandas as pd\n", "from util import read_crowdtangle_files, create_corpus\n", "import time\n", "from datetime import timedelta\n", "from pprint import pprint\n", "# Note: gensim.models.wrappers was removed in gensim 4.0, so this import needs gensim 3.x\n", "from gensim.models.wrappers import LdaMallet\n", "import pickle\n", "from gensim.corpora.mmcorpus import MmCorpus\n", "\n", "# Specify paths to the input and output directories\n", "input_dir = '/Users/dankoban/Documents/EM6575/LDAInput'\n", "output_dir = '/Users/dankoban/Documents/EM6575/LDAOutput'" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|██████████| 1/1 [00:03<00:00, 3.09s/it]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "--- 0:00:03.095897 time elapsed ---\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "# Collect the CSV file names from the input directory\n", "files = [file for file in os.listdir(input_dir) if file.endswith(\".csv\")]\n", "file_paths = [os.path.join(input_dir, file) for file in files]\n", "\n", "# Select only the first n files for testing (here n = 1)\n", "file_paths = file_paths[0:1]\n", "\n", "# Read the selected files and report the elapsed time\n", "start_time = time.time()\n", "df = read_crowdtangle_files(file_paths)\n", "print(\"--- %s time elapsed ---\" % str(timedelta(seconds=time.time() - start_time)))" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "130193\n" ] }, { "data": { "text/html": [ "
| | Facebook Id | Text |
|---|---|---|
| 0 | 624614494274945 | Nika Vetsko, excerpts: ...Many researchers believe that Russia is trying to increase this traffic in Georgia, having already been active in fuelling anti-vaccination conspiracy theories. Some link this directly to the countrys measles outbreak last year. ...Russia has also revived conspiracy theories around the Lugar Laboratory, a US fi ced high-tech research centre in Tbilisi. Over the years, Russian authorities and media have worked to discredit the lab and US-Georgia relations more widely. Is Russia Exploiting Coronavirus Fears In Georgia? By Nika Vetsko* Experts warn that Russia is exploiting the recent appearance of coronavirus in Georgia to spread a new wave of disinformation and conspiracy theories. Georgia has registered only 15 |
| 1 | 26781952138 | The capitals first Covid-19 patient, a 45-year-old man from Mayur Vihar Phase II, has recovered fully from the viral infection. He was discharged from Ram Manohar Lohia Hospital on Saturday, said a source. Delhis first coronavirus patient recovers fully The capitals first Covid-19 patient, a 45-year-old man from Mayur Vihar Phase II, has recovered fully from the viral infection. He was discharged fro |
| 2 | 251907774312 | The coronavirus pandemic is yet to force widespread school shutdowns but many families are voluntarily withdrawing their children. 'I'm happy to be a small drop': Families withdrawing children from school to fight coronavirus The coronavirus pandemic is yet to force widespread school shutdowns but across Sydney, many families are voluntarily withdrawing their children. |
| 3 | 138280549589759 | The safety and well-being of our community and the Brothers Fish&chips family is always the top priority. In challenging times like this, we are faced with many uncertainties. However, one thing that is certain is that together as a community we will overcome this situation and wed like to reassure that we are following CDC recommended guidelines regarding coronavirus, COVID-19 to keep you and our family safe as much as we can! #ossining #croton #briarcliff #westchester #lohudfood We are temporarily offering prepaid delivery and curb side pick-up. Call (914) 488-5141 to place your order and before arrival. Timeline Photos |
| 4 | 32204506174 | With the coronavirus spreading across the globe @carynceolin with how the White House is trying to prevent it from spreading around the West Wing. Trump tested negative for COVID-19 - CityNews Toronto As the coronavirus inches closer to President Trump, Caryn Ceolin with how the White House is trying to prevent it from spreading around the West Wing. |