{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "***\n", "***\n", "# 数据清洗之推特数据\n", "***\n", "***\n", "\n", "王成军\n", "\n", "wangchengjun@nju.edu.cn\n", "\n", "计算传播网 http://computational-communication.com" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## 数据清洗(data cleaning)\n", "是数据分析的重要步骤,其主要目标是将混杂的数据清洗为可以被直接分析的数据,一般需要将数据转化为数据框(data frame)的样式。\n", "\n", "本章将以推特文本的清洗作为例子,介绍数据清洗的基本逻辑。\n", "\n", "- 清洗错误行\n", "- 正确分列\n", "- 提取所要分析的内容\n", "- 介绍通过按行、chunk的方式对大规模数据进行预处理\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# 1. 抽取tweets样本做实验\n", "此节学生略过" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2752\n" ] } ], "source": [ "bigfile = open('/Users/chengjun/百度云同步盘/Writing/OWS/ows-raw.txt', 'r')\n", "chunkSize = 1000000\n", "chunk = bigfile.readlines(chunkSize)\n", "print(len(chunk))\n", "with open(\"/Users/chengjun/GitHub/cjc/data/ows_tweets_sample.txt\", 'w') as f:\n", " for i in chunk:\n", " f.write(i) " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Lazy Method for Reading Big File in Python?" 
] }, { "cell_type": "code", "execution_count": 6, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T07:31:51.644484Z", "start_time": "2019-06-08T07:30:56.170308Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 262665\n", "1 525130\n", "2 787344\n", "3 1049351\n", "4 1312571\n", "5 1574666\n", "6 1835628\n", "7 2097136\n", "8 2358494\n", "9 2619723\n", "10 2880857\n", "11 3140945\n", "12 3404775\n", "13 3665565\n", "14 3927996\n", "15 4189419\n", "16 4449078\n", "17 4709001\n", "18 4969877\n", "19 5230937\n", "20 5492578\n", "21 5756613\n", "22 6022478\n", "23 6286119\n", "24 6549476\n", "25 6602141\n" ] } ], "source": [ "# https://stackoverflow.com/questions/519633/lazy-method-for-reading-big-file-in-python?lq=1\n", "import csv\n", "bigfile = open('/Users/datalab/bigdata/cjc/ows-raw.txt', 'r')\n", "\n", "chunkSize = 10**8\n", "chunk = bigfile.readlines(chunkSize)\n", "num, num_lines = 0, 0\n", "while chunk:\n", " lines = csv.reader((line.replace('\\x00','') for line in chunk), \n", " delimiter=',', quotechar='\"')\n", " #do sth.\n", " num_lines += len(list(lines))\n", " print(num, num_lines)\n", " num += 1\n", " chunk = bigfile.readlines(chunkSize) # read another chunk" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# 字节(Byte /bait/)\n", "\n", "计算机信息技术用于计量存储容量的一种计量单位,通常情况下一字节等于有八位, [1] 也表示一些计算机编程语言中的数据类型和语言字符。\n", "- 1B(byte,字节)= 8 bit;\n", "- 1KB=1000B;1MB=1000KB=1000×1000B。其中1000=10^3。\n", "- 1KB(kilobyte,千字节)=1000B= 10^3 B;\n", "- 1MB(Megabyte,兆字节,百万字节,简称“兆”)=1000KB= 10^6 B;\n", "- 1GB(Gigabyte,吉字节,十亿字节,又称“千兆”)=1000MB= 10^9 B;" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## 用Pandas的get_chunk功能来处理亿级数据\n", "\n", "> 只有在超过5TB数据量的规模下,Hadoop才是一个合理的技术选择。" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], 
"source": [ "import pandas as pd\n", "\n", "f = open('../bigdata/OWS/ows-raw.txt',encoding='utf-8')\n", "reader = pd.read_table(f, sep=',', iterator=True, error_bad_lines=False) #跳过报错行\n", "loop = True\n", "chunkSize = 100000\n", "data = []\n", "\n", "while loop:\n", " try:\n", " chunk = reader.get_chunk(chunkSize)\n", " dat = data_cleaning_funtion(chunk) # do sth.\n", " data.append(dat) \n", " except StopIteration:\n", " loop = False\n", " print(\"Iteration is stopped.\")\n", "\n", "df = pd.concat(data, ignore_index=True)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# 2. 清洗错行的情况" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T07:42:24.661108Z", "start_time": "2019-06-08T07:42:24.648304Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "with open(\"../data/ows_tweets_sample.txt\", 'r') as f:\n", " lines = f.readlines() " ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T07:42:28.452634Z", "start_time": "2019-06-08T07:42:28.441018Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "2753" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 总行数\n", "len(lines)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T07:42:32.821269Z", "start_time": "2019-06-08T07:42:32.816918Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "'121813245488140288,\"@HumanityCritic i\\'m worried that the #ows sells out to the hamsher-norquist spitefuck, and tries to unite with the teahad.\",http://a2.twimg.com/profile_images/627683576/flytits_normal.jpg,2011-10-06,5,5,\"2011-10-06 05:05:15\",N;,fucentarmal,27480502,en,HumanityCritic,230431,\"<a href="http://www.tweetdeck.com" rel="nofollow">TweetDeck</a>\"\\n'" ] }, 
"execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 查看第一行\n", "lines[15]" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on built-in function split:\n", "\n", "split(...) method of builtins.str instance\n", " S.split(sep=None, maxsplit=-1) -> list of strings\n", " \n", " Return a list of the words in S, using sep as the\n", " delimiter string. If maxsplit is given, at most maxsplit\n", " splits are done. If sep is not specified or is None, any\n", " whitespace string is a separator and empty strings are\n", " removed from the result.\n", "\n" ] } ], "source": [ "help(lines[1].split)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# 问题: 第一行是变量名\n", "> ## 1. 如何去掉换行符?\n", "> ## 2. 如何获取每一个变量名?\n" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T07:43:39.363547Z", "start_time": "2019-06-08T07:43:39.358317Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "['\"Twitter ID\"',\n", " 'Text',\n", " '\"Profile Image URL\"',\n", " 'Day',\n", " 'Hour',\n", " 'Minute',\n", " '\"Created At\"',\n", " 'Geo',\n", " '\"From User\"',\n", " '\"From User ID\"',\n", " 'Language',\n", " '\"To User\"',\n", " '\"To User ID\"',\n", " 'Source']" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "varNames = lines[0].replace('\\n', '').split(',')\n", "varNames" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T07:43:49.131388Z", "start_time": "2019-06-08T07:43:49.127319Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "14" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(varNames)" ] }, { "cell_type": "code", 
"execution_count": 12, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T07:43:53.979866Z", "start_time": "2019-06-08T07:43:53.975920Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "'121818600490283009,\"RT @chachiTHEgr8: RT @TheNewDeal: First they ignore you, then they laugh at you, then they fight you, then you win. - Gandhi #OccupyWallStreet #OWS #p2\",http://a0.twimg.com/profile_images/326662126/Photo_233_normal.jpg,2011-10-06,5,26,\"2011-10-06 05:26:32\",N;,k_l_h_j,382233343,en,,0,\"<a href="http://twitter.com/#!/download/iphone" rel="nofollow">Twitter for iPhone</a>\"\\n'" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lines[1344]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# 如何来处理错误换行情况?" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "ExecuteTime": { "end_time": "2018-04-28T10:57:03.746530Z", "start_time": "2018-04-28T10:57:03.727339Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "with open(\"../data/ows_tweets_sample_clean.txt\", 'w') as f:\n", " right_line = '' # 正确的行,它是一个空字符串\n", " blocks = [] # 确认为正确的行会被添加到blocks里面\n", " for line in lines:\n", " right_line += line.replace('\\n', ' ')\n", " line_length = len(right_line.split(','))\n", " if line_length >= 14:\n", " blocks.append(right_line)\n", " right_line = '' \n", " for i in blocks:\n", " f.write(i + '\\n')" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "ExecuteTime": { "end_time": "2018-04-28T10:57:07.915900Z", "start_time": "2018-04-28T10:57:07.911441Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "2627" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(blocks)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "ExecuteTime": { "end_time": "2018-04-28T10:57:16.586149Z", 
"start_time": "2018-04-28T10:57:16.582151Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "'121818879105310720,\"RT @Min_Reyes: RT @The99Percenters: New video to go viral. From We Are Change http://t.co/6Ff718jk Listen to the guy begging... #ows #cdnpoli\",http://a3.twimg.com/sticky/default_profile_images/default_profile_0_normal.png,2011-10-06,5,27,\"2011-10-06 05:27:38\",N;,MiyazakiMegu,260948518,en,,0,\"<a href="http://www.tweetdeck.com" rel="nofollow">TweetDeck</a>\" '" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "blocks[1344]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# 同时考虑分列符和引用符\n", "\n", "- 分列符🔥分隔符:sep, delimiter\n", "- 引用符☁️:quotechar\n" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T07:51:20.459071Z", "start_time": "2019-06-08T07:51:20.453871Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "['121813245488140288',\n", " \"@HumanityCritic i'm worried that the #ows sells out to the hamsher-norquist spitefuck, and tries to unite with the teahad.\",\n", " 'http://a2.twimg.com/profile_images/627683576/flytits_normal.jpg,2011-10-06,5,5',\n", " '2011-10-06 05:05:15',\n", " 'N;,fucentarmal,27480502,en,HumanityCritic,230431',\n", " '<a href="http://www.tweetdeck.com" rel="nofollow">TweetDeck</a>\"\\n']" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import re\n", "re.split(',\"|\",', lines[15])" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T07:52:33.453629Z", "start_time": "2019-06-08T07:52:33.441462Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "line = 35 length = 6\n", "line = 36 length = 6\n", "line = 37 length = 6\n", "line = 38 
length = 6\n", "line = 39 length = 6\n", "line = 40 length = 6\n", "line = 41 length = 2\n", "line = 42 length = 5\n", "line = 43 length = 6\n", "line = 44 length = 6\n", "line = 45 length = 6\n", "line = 46 length = 6\n", "line = 47 length = 6\n", "line = 48 length = 2\n", "line = 49 length = 5\n" ] } ], "source": [ "import re\n", "\n", "with open(\"../data/ows_tweets_sample.txt\",'r') as f:\n", " lines = f.readlines()\n", " \n", "for i in range(35,50):\n", " i_ = re.split(',\"|\",', lines[i])\n", " print('line =',i,' length =', len(i_))\n" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T07:54:54.976462Z", "start_time": "2019-06-08T07:54:54.944533Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "with open(\"../data/ows_tweets_sample_clean4.txt\", 'w') as f:\n", " right_line = '' # 正确的行,它是一个空字符串\n", " blocks = [] # 确认为正确的行会被添加到blocks里面\n", " for line in lines:\n", " right_line += line.replace('\\n', ' ').replace('\\r', ' ')\n", " #line_length = len(right_line.split(','))\n", " i_ = re.split(',\"|\",', right_line)\n", " line_length = len(i_)\n", " if line_length >= 6:\n", " blocks.append(right_line)\n", " right_line = ''\n", "# for i in blocks:\n", "# f.write(i + '\\n')" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T07:54:59.860355Z", "start_time": "2019-06-08T07:54:59.856381Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "2626" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(blocks)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# 3. 
Reading the data and splitting columns correctly" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T07:55:54.719495Z", "start_time": "2019-06-08T07:55:54.712843Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# Note: you may need to change the path below\n", "with open(\"../data/ows_tweets_sample.txt\", 'r') as f:\n", "    chunk = f.readlines()" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T07:55:57.501462Z", "start_time": "2019-06-08T07:55:57.497278Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "2753" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(chunk)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T07:56:00.549021Z", "start_time": "2019-06-08T07:56:00.544656Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "['\"Twitter ID\",Text,\"Profile Image URL\",Day,Hour,Minute,\"Created At\",Geo,\"From User\",\"From User ID\",Language,\"To User\",\"To User ID\",Source\\n',\n", " '121813144174727168,\"RT @AnonKitsu: ALERT!!!!!!!!!!COPS ARE KETTLING PROTESTERS IN PARK W HELICOPTERS AND PADDYWAGONS!!!! #OCCUPYWALLSTREET #OWS #OCCUPYNY PLEASE RT !!HELP!!!!\",http://a2.twimg.com/profile_images/1539375713/Twitter_normal.jpg,2011-10-06,5,4,\"2011-10-06 05:04:51\",N;,Anonops_Cop,401240477,en,,0,\"&lt;a href=&quot;http://twitter.com/&quot;&gt;web&lt;/a&gt;\"\\n',\n", " '121813146137657344,\"@jamiekilstein @allisonkilkenny Interesting interview (never aired, wonder why??) 
by Fox with #ows protester http://t.co/Fte55Kh7\",http://a2.twimg.com/profile_images/1574715503/Kate6_normal.jpg,2011-10-06,5,4,\"2011-10-06 05:04:51\",N;,KittyHybrid,34532053,en,jamiekilstein,2149053,\"&lt;a href=&quot;http://twitter.com/&quot;&gt;web&lt;/a&gt;\"\\n']" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "chunk[:3]" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T07:56:05.677057Z", "start_time": "2019-06-08T07:56:05.656929Z" }, "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2627\n" ] } ], "source": [ "import csv\n", "lines_csv = csv.reader(chunk, delimiter=',', quotechar='\"') \n", "print(len(list(lines_csv)))\n", "# next(lines_csv)\n", "# next(lines_csv)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "ExecuteTime": { "end_time": "2018-04-29T01:12:38.678653Z", "start_time": "2018-04-29T01:12:38.611535Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import re\n", "import csv\n", "\n", "from collections import defaultdict\n", "\n", "def extract_rt_user(tweet):\n", "    # return the first user named after RT or via, or None if there is none\n", "    rt_patterns = re.compile(r\"(RT|via)((?:\\b\\W*@\\w+)+)\", re.IGNORECASE)\n", "    rt_user_name = rt_patterns.findall(tweet)\n", "    if rt_user_name:\n", "        rt_user_name = rt_user_name[0][1].strip(' @')\n", "    else:\n", "        rt_user_name = None\n", "    return rt_user_name\n", "\n", "# count how often each (from_user, retweeted_user) pair occurs\n", "rt_network = defaultdict(int)\n", "f = open(\"../data/ows_tweets_sample.txt\", 'r')\n", "chunk = f.readlines(100000)\n", "while chunk:\n", "    #lines = csv.reader(chunk, delimiter=',', quotechar='\"')\n", "    lines = csv.reader((line.replace('\\x00','') for line in chunk), delimiter=',', quotechar='\"')\n", "    for line in lines:\n", "        tweet = line[1]\n", "        from_user = line[8]\n", "        rt_user = extract_rt_user(tweet)\n", "        rt_network[(from_user, rt_user)] += 1\n", "    chunk = f.readlines(100000)\n", "f.close()" ] }, { "cell_type": "code", "execution_count": 22, 
"metadata": { "ExecuteTime": { "end_time": "2019-06-08T07:56:22.886245Z", "start_time": "2019-06-08T07:56:22.198448Z" }, "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Twitter IDTextProfile Image URLDayHourMinuteCreated AtGeoFrom UserFrom User IDLanguageTo UserTo User IDSource
0121813144174727168RT @AnonKitsu: ALERT!!!!!!!!!!COPS ARE KETTLIN...http://a2.twimg.com/profile_images/1539375713/...2011-10-06542011-10-06 05:04:51N;Anonops_Cop401240477enNaN0&lt;a href=&quot;http://twitter.com/&quot;&gt;...
1121813146137657344@jamiekilstein @allisonkilkenny Interesting in...http://a2.twimg.com/profile_images/1574715503/...2011-10-06542011-10-06 05:04:51N;KittyHybrid34532053enjamiekilstein2149053&lt;a href=&quot;http://twitter.com/&quot;&gt;...
2121813150000619521@Seductivpancake Right! Those guys have a vict...http://a1.twimg.com/profile_images/1241412831/...2011-10-06542011-10-06 05:04:52N;nerdsherpa95067344enSeductivpancake19695580&lt;a href=&quot;http://www.echofon.com/&quot;...
\n", "
" ], "text/plain": [ " Twitter ID Text \\\n", "0 121813144174727168 RT @AnonKitsu: ALERT!!!!!!!!!!COPS ARE KETTLIN... \n", "1 121813146137657344 @jamiekilstein @allisonkilkenny Interesting in... \n", "2 121813150000619521 @Seductivpancake Right! Those guys have a vict... \n", "\n", " Profile Image URL Day Hour \\\n", "0 http://a2.twimg.com/profile_images/1539375713/... 2011-10-06 5 \n", "1 http://a2.twimg.com/profile_images/1574715503/... 2011-10-06 5 \n", "2 http://a1.twimg.com/profile_images/1241412831/... 2011-10-06 5 \n", "\n", " Minute Created At Geo From User From User ID Language \\\n", "0 4 2011-10-06 05:04:51 N; Anonops_Cop 401240477 en \n", "1 4 2011-10-06 05:04:51 N; KittyHybrid 34532053 en \n", "2 4 2011-10-06 05:04:52 N; nerdsherpa 95067344 en \n", "\n", " To User To User ID \\\n", "0 NaN 0 \n", "1 jamiekilstein 2149053 \n", "2 Seductivpancake 19695580 \n", "\n", " Source \n", "0 <a href="http://twitter.com/">... \n", "1 <a href="http://twitter.com/">... \n", "2 <a href="http://www.echofon.com/"... " ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "df = pd.read_csv(\"../data/ows_tweets_sample.txt\",\n", " sep = ',', quotechar='\"')\n", "df[:3]" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T07:57:21.705488Z", "start_time": "2019-06-08T07:57:21.701307Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "2626" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(df)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T07:57:25.595447Z", "start_time": "2019-06-08T07:57:25.588512Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'RT @AnonKitsu: ALERT!!!!!!!!!!COPS ARE KETTLING PROTESTERS IN PARK W HELICOPTERS AND PADDYWAGONS!!!! 
#OCCUPYWALLSTREET #OWS #OCCUPYNY PLEASE RT !!HELP!!!!'" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.Text[0]" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "ExecuteTime": { "end_time": "2018-04-28T11:06:31.097919Z", "start_time": "2018-04-28T11:06:31.092038Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "0 Anonops_Cop\n", "1 KittyHybrid\n", "2 nerdsherpa\n", "3 hamudistan\n", "4 kl_knox\n", "5 vickycrampton\n", "6 burgerbuilders\n", "7 neverfox\n", "8 davidgaliel\n", "9 AnonOws\n", "Name: From User, dtype: object" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['From User'][:10]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# 4. Counting\n", "### Distribution of the number of users by number of tweets posted\n", "> That is, how many users posted each number of tweets" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T07:59:11.081963Z", "start_time": "2019-06-08T07:59:11.076747Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "from collections import defaultdict\n", "# number of tweets posted by each user\n", "data_dict = defaultdict(int)\n", "for i in df['From User']:\n", "    data_dict[i] += 1" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T07:59:11.737607Z", "start_time": "2019-06-08T07:59:11.706495Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "[('MiranHosny', 1),\n", " ('BradMarston', 1),\n", " ('Sir_Richard_311', 1),\n", " ('elChepi', 1),\n", " ('jboy', 1)]" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(data_dict.items())[:5]\n", "#data_dict" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T07:59:23.202541Z", "start_time": "2019-06-08T07:59:22.945172Z" }, "slideshow": { "slide_type": 
"slide" } }, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### 安装微软雅黑字体\n", "为了在绘图时正确显示中文,需要安装/data/文件夹中的微软雅黑字体(msyh.ttf)\n", "\n", "详见[common questions](0.common_questions.ipynb)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T07:59:28.182593Z", "start_time": "2019-06-08T07:59:27.976785Z" }, "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZcAAAETCAYAAAD6R0vDAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4xLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvDW2N/gAAErRJREFUeJzt3X+sX/V93/HnazhkS1oFA1eM2G5NF1aJVtqCrghdugiVjgCJYlqlEVHUOAmSFYlsybIpcRqpVJsqwbo1S7aKyQssZkL5sTQZVus0cUmiaH9AYygh/EjGhUKxZ/BtoKQd6lLDe398PyZfLvfaF/vj7znXfj6kr77nfD6f77nve/y9evl8zvmeb6oKSZJ6+jtDFyBJOvkYLpKk7gwXSVJ3hoskqTvDRZLUneEiSerOcJEkdWe4SJK6M1wkSd2tG7qAoZx99tm1efPmocuQpDXlrrvu+ouqmjvauFM2XDZv3szevXuHLkOS1pQkj61mnNNikqTuDBdJUneGiySpO8NFktSd4SJJ6s5wkSR1Z7hIkrozXCRJ3Q0aLkluTnIwyX1Tbb+T5HtJ7k3y5SRnTPV9LMlCku8nefNU++WtbSHJ9ln/HpKkFxv6E/qfAf4zcMtU2x7gY1V1KMkNwMeAjya5ALga+DngtcAfJ/mH7TW/B/wzYB/w7SS7quqBE1n45u1/eCI3v6JHr3/LID9Xkl6OQY9cqupbwFNL2r5WVYfa6h3Axra8BfhcVf2/qvozYAG4qD0WquqRqvoR8Lk2VpI0kLGfc3kf8JW2vAF4fKpvX2tbqV2SNJDRhkuSjwOHgFs7bnNbkr1J9i4uLvbarCRpiVGGS5L3AG8F3lVV1Zr3A5umhm1sbSu1v0RV7aiq+aqan5s76h2jJUnHaHThkuRy4CPA26rq2amuXcDVSV6Z5DzgfOBPgG8D5yc5L8npTE7675p13ZKkHxv0arEknwUuAc5Osg+4jsnVYa8E9iQBuKOq3l9V9yf5AvAAk+mya6vqubadDwBfBU4Dbq6q+2f+y0iSXjBouFTVO5dpvukI438b+O1l2ncDuzuWJkk6DqObFpMkrX2GiySpO8NFktSd4SJJ6s5wkSR1Z7hIkrozXCRJ3RkukqTuDBdJUneGiySpO8NFktSd4SJJ6s5wkSR1Z7hIkrozXCRJ3RkukqTuDBdJUneGiySpO8NFktSd4SJJ6s5wkSR1Z7hIkrozXCRJ3RkukqTuDBdJUneDhkuSm5McTHLfVNuZSfYkeag9r2/tSfKpJAtJ7k1y4dRrtrbxDyXZOsTvIkn6saGPXD4DXL6kbTtwe1WdD9ze1gGuAM5vj23AjTAJI+A64A3ARcB1hwNJkjSMQcOlqr4FPLWkeQuwsy3vBK6aar+lJu4AzkhyLvBmYE9
VPVVVTwN7eGlgSZJmaOgjl+WcU1UH2vITwDlteQPw+NS4fa1tpXZJ0kDGGC4vqKoCqtf2kmxLsjfJ3sXFxV6blSQtMcZwebJNd9GeD7b2/cCmqXEbW9tK7S9RVTuqar6q5ufm5roXLkmaGGO47AIOX/G1Fbhtqv3d7aqxi4Fn2vTZV4HLkqxvJ/Iva22SpIGsG/KHJ/kscAlwdpJ9TK76uh74QpJrgMeAd7Thu4ErgQXgWeC9AFX1VJJ/C3y7jfs3VbX0IgFJ0gwNGi5V9c4Vui5dZmwB166wnZuBmzuWJkk6DmOcFpMkrXGGiySpO8NFktSd4SJJ6s5wkSR1Z7hIkrozXCRJ3RkukqTuDBdJUneGiySpO8NFktSd4SJJ6s5wkSR1Z7hIkrozXCRJ3RkukqTuDBdJUneGiySpO8NFktSd4SJJ6s5wkSR1Z7hIkrozXCRJ3RkukqTuDBdJUnejDZck/zLJ/UnuS/LZJH83yXlJ7kyykOTzSU5vY1/Z1hda/+Zhq5ekU9sowyXJBuBfAPNV9fPAacDVwA3AJ6rqdcDTwDXtJdcAT7f2T7RxkqSBjDJcmnXA30uyDngVcAD4JeCLrX8ncFVb3tLWaf2XJskMa5UkTRlluFTVfuDfA3/OJFSeAe4C/rKqDrVh+4ANbXkD8Hh77aE2/qxZ1ixJ+rFRhkuS9UyORs4DXgu8Gri8w3a3JdmbZO/i4uLxbk6StIJRhgvwy8CfVdViVf0t8CXgjcAZbZoMYCOwvy3vBzYBtP7XAD9YutGq2lFV81U1Pzc3d6J/B0k6ZY01XP4cuDjJq9q5k0uBB4BvAG9vY7YCt7XlXW2d1v/1qqoZ1itJmjLKcKmqO5mcmL8b+C6TOncAHwU+nGSByTmVm9pLbgLOau0fBrbPvGhJ0gvWHX3IMKrqOuC6Jc2PABctM/ZvgF+bRV2SpKMb5ZGLJGltM1wkSd0ZLpKk7gwXSVJ3hoskqTvDRZLUneEiSerOcJEkdWe4SJK6M1wkSd0ZLpKk7gwXSVJ3hoskqbsj3hU5yQYm361yrAI8D/xUVf2f49iOJGkNWc0t9wNsBp47hu2H4wsnSdIatJpwKWBfVT1/LD9g8kWSkqRTiedcJEndGS6SpO4MF0lSd4aLJKk7w0WS1N1qrhYD+Kkkx3S1GJOrzSRJp5DVfs7l4fYsSdJRHTFcqmo/Tp1Jkl4mg0OS1N0RwyXJhiTPHcfj+SSHkrz25RaW5IwkX0zyvSQPJvmFJGcm2ZPkofa8vo1Nkk8lWUhyb5ILj3WHSJKO35jvLfZJ4I+q6u1JTgdeBfwGcHtVXZ9kO7Ad+ChwBXB+e7wBuLE9S5IGMMp7iyV5DfAm4D0AVfUj4EdJtgCXtGE7gW8yCZctwC1VVcAd7ajn3Ko6cCw1S5KOz1jPuZwHLAL/LcmfJvl0klcD50wFxhPAOW15A/D41Ov3tTZJ0gDGGi7rgAuBG6vq9cD/ZTIF9oJ2lPKyPkOTZFuSvUn2Li4uditWkvRiYw2XfUym4u5s619kEjZPJjkXoD0fbP37gU1Tr9/Y2l6kqnZU1XxVzc/NzZ2w4iXpVDfKcKmqJ4DHk/xsa7oUeADYBWxtbVuB29ryLuDd7aqxi4FnPN8iScNZ7e1fhvDPgVvblWKPAO9lEoZfSHIN8BjwjjZ2N3AlsAA828ZKkgYy2nuLVdU9wPwyXZcuM7aAa4/l50iS+vPeYpKk7ry3mCSpO4NDktSd4SJJ6s5wkSR1Z7hIkrozXCRJ3RkukqTuDBdJUneGiySpO8NFktSd4SJJ6s5wkSR1Z7hIkrozXCRJ3RkukqTuDBdJUneGiySpO8NFktSd4SJJ6s5wkSR1Z7hIkrozXCRJ3RkukqTuDBdJUnejDpckpyX50yR/0NbPS3JnkoUkn09yemt/ZVtfaP2bh6xbkk51ow4X4IPAg1PrNwCfqKrXAU8D17T2a4CnW/sn2jhJ0kBGGy5JNgJvAT7d1gP8EvDFNmQncFV
b3tLWaf2XtvGSpAGMNlyA/wh8BHi+rZ8F/GVVHWrr+4ANbXkD8DhA63+mjZckDWCU4ZLkrcDBqrqr83a3JdmbZO/i4mLPTUuSpowyXIA3Am9L8ijwOSbTYZ8Ezkiyro3ZCOxvy/uBTQCt/zXAD5ZutKp2VNV8Vc3Pzc2d2N9Akk5howyXqvpYVW2sqs3A1cDXq+pdwDeAt7dhW4Hb2vKutk7r/3pV1QxLliRNGWW4HMFHgQ8nWWByTuWm1n4TcFZr/zCwfaD6JEnAuqMPGVZVfRP4Zlt+BLhomTF/A/zaTAuTJK1orR25SJLWAMNFktSd4SJJ6s5wkSR1Z7hIkrozXCRJ3RkukqTuDBdJUneGiySpO8NFktSd4SJJ6s5wkSR1Z7hIkrozXCRJ3RkukqTuDBdJUneGiySpO8NFktSd4SJJ6s5wkSR1Z7hIkrozXCRJ3RkukqTuDBdJUneGiySpu1GGS5JNSb6R5IEk9yf5YGs/M8meJA+15/WtPUk+lWQhyb1JLhz2N5CkU9sowwU4BPyrqroAuBi4NskFwHbg9qo6H7i9rQNcAZzfHtuAG2dfsiTpsFGGS1UdqKq72/JfAQ8CG4AtwM42bCdwVVveAtxSE3cAZyQ5d8ZlS5KaUYbLtCSbgdcDdwLnVNWB1vUEcE5b3gA8PvWyfa1NkjSAUYdLkp8Afh/4UFX9cLqvqgqol7m9bUn2Jtm7uLjYsVJJ0rTRhkuSVzAJllur6kut+cnD013t+WBr3w9smnr5xtb2IlW1o6rmq2p+bm7uxBUvSae4UYZLkgA3AQ9W1e9Ode0CtrblrcBtU+3vbleNXQw8MzV9JkmasXVDF7CCNwK/Dnw3yT2t7TeA64EvJLkGeAx4R+vbDVwJLADPAu+dbbmSpGmjDJeq+l9AVui+dJnxBVx7QouSJK3aKKfFJElrm+EiSerOcJEkdWe4SJK6M1wkSd0ZLpKk7gwXSVJ3o/yci1a2efsfDvazH73+LYP9bElri0cukqTuDBdJUneGiySpO8NFktSd4SJJ6s5wkSR1Z7hIkrozXCRJ3RkukqTuDBdJUneGiySpO8NFktSdN67Uqg1100xvmCmtPR65SJK6M1wkSd0ZLpKk7gwXSVJ3J9UJ/SSXA58ETgM+XVXXD1yS1jC/9VM6difNkUuS04DfA64ALgDemeSCYauSpFPTyXTkchGwUFWPACT5HLAFeGDQqnTchjyCkHRsTqZw2QA8PrW+D3jDQLVIx8VA1Yk0i2nXkylcjirJNmBbW/3rJN8fsp6X4WzgL4Yu4mVYa/WCNc/KWqt5rdULq6g5NxzX9n96NYNOpnDZD2yaWt/Y2l5QVTuAHbMsqocke6tqfug6Vmut1QvWPCtrrea1Vi+Mp+aT5oQ+8G3g/CTnJTkduBrYNXBNknRKOmmOXKrqUJIPAF9lcinyzVV1/8BlSdIp6aQJF4Cq2g3sHrqOE2CtTeWttXrBmmdlrdW81uqFkdScqhq6BknSSeZkOuciSRoJw2UEkmxK8o0kDyS5P8kHlxlzSZJnktzTHr85RK1Lano0yXdbPXuX6U+STyVZSHJvkguHqHOqnp+d2n/3JPlhkg8tGTP4fk5yc5KDSe6bajszyZ4kD7Xn9Su8dmsb81CSrQPW+ztJvtf+3b+c5IwVXnvE99CMa/6tJPun/u2vXOG1lyf5fntfbx+45s9P1ftokntWeO3s93NV+Rj4AZwLXNiWfxL438AFS8ZcAvzB0LUuqelR4Owj9F8JfAUIcDFw59A1T9V2GvAE8NNj28/Am4ALgfum2v4dsL0tbwduWOZ1ZwKPtOf1bXn9QPVeBqxryzcsV+9q3kMzrvm3gH+9ivfNw8DPAKcD31n6tzrLmpf0/wfgN8eynz1yGYGqOlBVd7flvwIeZHLHgbVuC3BLTdwBnJHk3KGLai4FHq6qx4YuZKmq+hbw1JLmLcDOtrwTuGqZl74Z2FNVT1XV08Ae4PITVmizXL1V9bWqOtR
W72DyubPRWGEfr8YLt5mqqh8Bh28zdcIdqeYkAd4BfHYWtayG4TIySTYDrwfuXKb7F5J8J8lXkvzcTAtbXgFfS3JXu/vBUsvdkmcsoXk1K/8hjm0/A5xTVQfa8hPAOcuMGev+fh+TI9jlHO09NGsfaFN5N68w9TjWffxPgSer6qEV+me+nw2XEUnyE8DvAx+qqh8u6b6byRTOPwL+E/A/Z13fMn6xqi5kcifqa5O8aeiCVqN9yPZtwP9YpnuM+/lFajLPsSYu80zyceAQcOsKQ8b0HroR+AfAPwYOMJlmWiveyZGPWma+nw2XkUjyCibBcmtVfWlpf1X9sKr+ui3vBl6R5OwZl7m0pv3t+SDwZSZTBtOOekuegVwB3F1VTy7tGON+bp48PKXYng8uM2ZU+zvJe4C3Au9qgfgSq3gPzUxVPVlVz1XV88B/XaGWUe1jgCTrgF8FPr/SmCH2s+EyAm2+9Cbgwar63RXG/P02jiQXMfm3+8HsqnxJPa9O8pOHl5mcwL1vybBdwLvbVWMXA89MTe0MacX/5Y1tP0/ZBRy++msrcNsyY74KXJZkfZvSuay1zVwmX9z3EeBtVfXsCmNW8x6amSXnA39lhVrGeJupXwa+V1X7luscbD/P8uoBH8s/gF9kMs1xL3BPe1wJvB94fxvzAeB+Jlen3AH8k4Fr/plWy3daXR9v7dM1h8kXuD0MfBeYH8G+fjWTsHjNVNuo9jOT4DsA/C2TOf1rgLOA24GHgD8Gzmxj55l86+rh174PWGiP9w5Y7wKTcxOH38//pY19LbD7SO+hAWv+7+19ei+TwDh3ac1t/UomV3Q+PHTNrf0zh9+/U2MH389+Ql+S1J3TYpKk7gwXSVJ3hoskqTvDRZLUneEiSerupPqyMGkMkqzqPlrVPpdwosdLQ/BSZKmzJKv6o6qqwx/WPKHjpSE4LSadGK8DXrHCY/MA46WZclpMOjGeqx/fcv5Fkjw3wHhppjxykSR1Z7hIkrozXCRJ3RkukqTuDBdJUneGiySpO8NFktSd4SJJ6s5wkSR1Z7hIkrozXCRJ3RkukqTuDBdJUnd+n4vUmd/nInnLfelE2DSy8dLMeeQiSerOcy6SpO4MF0lSd4aLJKk7w0WS1J3hIknqznCRJHX3/wFKs/AiOmGx0gAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.hist(data_dict.values())\n", "#plt.yscale('log')\n", "#plt.xscale('log')\n", "plt.xlabel(u'发帖数', fontsize = 20)\n", "plt.ylabel(u'人数', fontsize = 20)\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T07:59:53.302817Z", "start_time": "2019-06-08T07:59:52.510526Z" }, "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZEAAAEXCAYAAABsyHmSAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4xLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvDW2N/gAAEDhJREFUeJzt3V+IpFV+xvHn6TEGeiF9seNNZqa7XEYkk71ZKJSQGy8SMu4yGowkDk0uFmPhgrlXOrC56YvcLbImUsFhEiwUkRBGM8FciTdeWBMIGRHJrEyPPYF1NoYObF+I7i8Xbw3T3dt/6j1dp963Tn0/UBTvef/Uz94tH8976j3HESEAAFIsNF0AAGB2ESIAgGSECAAgGSECAEhGiAAAkhEiAIBkhAgAIBkhAgBIRogAAJIRIgCAZPc1XUBuJ0+ejE6n03QZADBTrl279ouIeOCo44oPkU6no+Fw2HQZADBTbG+Mcxy3swAAyQgRAEAyQgQAkGymQsT279h+1fbbtn/UdD0AMO8aDxHbl2x/Yfv6nvbztj+1fcP2i5IUEZ9ExPOS/lTS72crajCQOh1pYaF6HwyyfRQAzLLGQ0TSZUnndzbYPiHpFUmPSzon6aLtc6N9T0j6F0lXs1QzGEi9nrSxIUVU770eQQIA+2g8RCLiA0lf7ml+RNKNiPgsIr6S9KakJ0fHX4mIxyWtZilobU3a3t7dtr1dtQMAdmnrcyKnJH2+Y3tT0qO2H5P0lKTf1CE9Eds9ST1JWl5ervfJt27VaweAOdbWENlXRLwv6f0xjutL6ktSt9uNWh+yvFzdwtqvHQCwS+O3sw5wW9KZHdunR235ra9Li4u72xYXq3YAwC5tDZGPJD1k+0Hb90t6RtKVOhewfcF2f2trq94nr65K/b60siLZ1Xu/X7UDAHZxRL27PRMvwH5D0mOSTkr6uaQfR8Rrtr8v6SeSTki6FBFJXYFutxvMnQUA9di+FhHdo45rfEwkIi4e0H5VuX7GCwCYiLbezgIAzIBiQyR5TAQAMLZiQyQi3omI3tLSUtOlAECxig0RAEB+hAgAIFmxIcKYCADkV2yIMCYCAPkVGyIAgPwIEQBAsmJDhDERAMiv2BBhTAQA8is2RAAA+REiAIBkhAgAIFmxIcLAOgDkV2yIMLAOAPkVGyIAgPwIEQBAMkIEAJCMEAEAJCNEAADJig0RfuILAPkVGyL8xBcA8is2RAAA+REiAIBkhAgAIBkhAgBIRogAAJIRIgCAZMWGCM+JAEB+xYYIz4kAQH7FhggAID9CBACQjBABACQjRJBmMJA6HWlhoXofDJquCEAD7mu6AMygwUDq9aTt7Wp7Y6PalqTV1ebqAjB19ERQ39ravQC5a3u7agcwVwgR1HfrVr12AMUiRFDf8nK9dgDFIkRQ3/q6tLi4u21xsWoHMFeKDRGmPclodVXq96WVFcmu3vt9BtWBOeSIaLqGrLrdbgyHw6bLAICZY
vtaRHSPOq7YnggAID9CBACQjBABACQjRAAAyQgRAEAyQgQAkIwQAQAkI0QAAMkIEQBAMkIEAJCMEAEAJCNEAADJCBEAQDJCBACQ7L6mC6jL9h9L+oGk35L0WkT8W8MlAcDcakVPxPYl21/Yvr6n/bztT23fsP2iJEXEP0fEc5Kel/RnTdQLAKi0IkQkXZZ0fmeD7ROSXpH0uKRzki7aPrfjkL8a7QcANKQVIRIRH0j6ck/zI5JuRMRnEfGVpDclPenK30j614j492nXCgC4pxUhcoBTkj7fsb05avtLSX8g6Wnbz+93ou2e7aHt4Z07d/JXCgBzauYG1iPiZUkvH3FMX1JfqtZYn0ZdADCP2twTuS3pzI7t06M2oJ7BQOp0pIWF6n0waLoioBhtDpGPJD1k+0Hb90t6RtKVcU+2fcF2f2trK1uBmAGDgdTrSRsbUkT13usRJMCEtCJEbL8h6UNJD9vetP1sRHwt6QVJ70n6RNJbEfHxuNeMiHciore0tJSnaMyGtTVpe3t32/Z21Q7g2FoxJhIRFw9ovyrp6pTLQUlu3arXDqCWVvREgGyWl+u1A6il2BBhTASSpPV1aXFxd9viYtUO4NiKDRHGRCBJWl2V+n1pZUWyq/d+v2oHcGytGBMBslpdJTSATIrtiXA7CwDyKzZEuJ0FAPkVGyIAgPwIEQBAMkIEAJCs2BBhYB0A8is2RBhYB4D8ig0RAEB+hAgAIBkhAuTCYliYA8VOe2L7gqQLZ8+ebboUzKO7i2HdXcvk7mJYElOwoCiOKHsJ8m63G8PhsOkyMG86nSo49lpZkW7enHY1QG22r0VE96jjuJ0F5MBiWJgThAiQA4thYU4cGiK2T9n+5hivX9n+2vZvT+sfCGgFFsPCnBhnYN2SOpK+Sbi+JdF/x/y5O3i+tlbdwlpergKEQXUUZpwQCUmbEfGrlA+wnXLasfHrLDSOxbAwB4odE2HaEwDIr9gQAQDkR4gAAJIRIgCAZIQIACDZuHNnLdtO+nWWql93AQAKNE5PxJJ+Julm4quZ3/gCpWOWYLTAoT2RiLgtbnkB7cMswWiJYgOCNdZRtLW1ewFy1/Z21Q5MUbFzZ/GwIYrGLMFoCebOAmbR8vL+65UwSzCmrNi5s4Cira/vHhORmCUYjSh2TAQo2uqq1O9XKyXa1Xu/z6A6pq7YNdaB4jFLMFqAnggAnjlBMnoiwLzjmRMcAz0RYN7xzAmOgbmzgHnHMyc4hnGfE/mZmAMLKBPPnOAYDr2dFRG3I2IhIk6M3lNf/z2tf6C7mPYEGNP6evWMyU48c4IxFTsmwrQnwJh45gTHwK+zAPDMCZIV2xMBAORHiAAAkhEiAIBkhAgAIBkhAgBIRogAAJIRIgCAZIQIACAZIQIASEaIAACSESIAgGSECIDjS11eN+U8lvJtFSZgBHA8qcvrppzHUr6t44jZWXjQ9nckrUlaioinxzmn2+3GcDjMWxgwzzqd/Re1WlmRbt6c7Hmpn4XabF+LiO5RxzV+O8v2Jdtf2L6+p/287U9t37D9oiRFxGcR8WwzlQLYV+ryuinnsZRv6zQeIpIuSzq/s8H2CUmvSHpc0jlJF22fm35pAI500DK6Ry2vm3Je6mchm8ZDJCI+kPTlnuZHJN0Y9Ty+kvSmpCenXhyAo6Uur5tyHkv5tk7jIXKAU5I+37G9KemU7W/bflXS92y/dNDJtnu2h7aHd+7cyV0rMN9Sl9dNOY+lfFunFQPrtjuS3o2I7462n5Z0PiL+YrT955IejYgX6l6bgXUAqG9mBtYPcFvSmR3bp0dtAIAWaWuIfCTpIdsP2r5f0jOSrtS5gO0LtvtbW1tZCgQAtCBEbL8h6UNJD9vetP1sRHwt6QVJ70n6RNJbEfFxnetGxDsR0VtaWpp80QAASS14Yj0iLh7QflXS1SmXAwCoofGeCABgdhUbIoyJAEB+xYYIYyIAkF+xIQIAyK/YEOF2FgDkV2yIcDsLAPIrNkQAA
PkRIgCAZIQIACBZsSHCwDoA5FdsiDCwDgD5FRsiAKDBQOp0pIWF6n0waLqi4jQ+ASMAZDEYSL2etL1dbW9sVNsSKyFOED0RAGVaW7sXIHdtb1ftmJhiQ4SBdWDO3bpVrx1Jig0RBtaBObe8XK8dSYoNEQBzbn1dWlzc3ba4WLVjYggRAGVaXZX6fWllRbKr936fQfUJ49dZAMq1ukpoZEZPBACQrNgQ4ddZAJBfsSHCr7MAIL9iQwQAkB8hAgBIRogAAJIRIgCAZIQIACAZIQIASEaIAACSFRsiPGwIAPkVGyI8bAgA+RUbIgCA/AgRAEAyQgQAkIwQAQAkI0QAAMkIEQBAMkIEAAYDqdORFhaq98Eg7Zi2mGKtrLEOYL4NBlKvJ21vV9sbG9W2dG999nGOaYsp1+qImPhF26Tb7cZwOGy6DABt1elU/6Lda2VFunlz/GPaYkK12r4WEd2jjiv2dhbTngAYy61bR7ePc0xbTLnWYkOEaU8AjGV5+ej2cY5piynXWmyIAMBY1telxcXdbYuLVXudY9piyrUSIgDm2+qq1O9XYwZ29d7v7x6EHueYtphyrQysAwB+zdwPrAMA8iNEAADJCBEAQDJCBACQjBABACQjRAAAyQgRAEAyQgQAkIwQAQAkI0QAAMkIEQBAspla2dD2tyT9raSvJL0fES1enxIAytd4T8T2Jdtf2L6+p/287U9t37D94qj5KUlvR8Rzkp6YerEAcJCj1jU/bH+OfVPShp7IZUk/lfSPdxtsn5D0iqQ/lLQp6SPbVySdlvSfo8O+mW6ZAHCAo9Y1P2y/NPl9U5yivhVTwdvuSHo3Ir472v49SX8dEX802n5pdOimpP+NiHdtvxkRzxx1baaCB5DdUeuaH7Zfmvy+Caz7Pu5U8G3oieznlKTPd2xvSnpU0suSfmr7B5LeOehk2z1JPUlabuPylQDKctS65inrnufYl0HjYyJ1RMQvI+KHEfGjwwbVI6IfEd2I6D7wwAPTLBHAPDpqXfPD9ufYN0VtDZHbks7s2D49agOA9jlqXfPD9ufYN00R0fhLUkfS9R3b90n6TNKDku6X9B+SfrfmNS9I6p89ezYAILvXX49YWYmwq/fXXx9/f459xyRpGGP8u7bxgXXbb0h6TNJJST+X9OOIeM329yX9RNIJSZciIileGVgHgPpmZmA9Ii4e0H5V0tUplwMAqKGtYyIAgBlQbIjYvmC7v7W11XQpAFCsYkMkIt6JiN7S0lLTpQBAsYoNEQBAfo0PrOdi+4Kqn/n+n+3/2rFrSdK497hOSvrFpGsrTJ2/Z9OaqjX3507y+se9Vur5KefxXZ6svX/PlbHOGud3wCW9JPVrHDvW76Tn+VXn79n0q6lac3/uJK9/3Gulnp9yHt/lyb5S/7ebx9tZB865hSSz9PdsqtbcnzvJ6x/3Wqnnp5w3S//fmwVJf8/GHzZsM9vDGONhGwDtxnc5n3nsidTRb7oAABPBdzkTeiIAgGT0RAAAyQgRAEAyQgQAkIwQqcH2t2z/g+2/t73adD0A6rP9Hduv2X676VpKMPchYvuS7S9sX9/Tft72p7Zv2H5x1PyUpLcj4jlJT0y9WAD7qvM9jojPIuLZZiotz9yHiKTLks7vbLB9QtIrkh6XdE7SRdvnVC3T+/nosG+mWCOAw13W+N9jTNDch0hEfCDpyz3Nj0i6Mfovlq8kvSnpSUmbqoJE4m8HtEbN7zEmiH8R7u+U7vU4pCo8Tkn6J0l/YvvvxJQLQNvt+z22/W3br0r6nu2XmimtHMXO4ptDRPxS0g+brgNAuoj4H0nPN11HKeiJ7O+2pDM7tk+P2gDMDr7HU0CI7O8jSQ/ZftD2/ZKekXSl4ZoA1MP3eArmPkRsvyHpQ0kP2960/WxEfC3pBUnvSfpE0lsR8XGTdQI4GN/j5jABIwAg2dz3RAAA6QgRAEAyQgQAkIwQAQAkI0QAAMkIEQBAMqY9ARLYPn30UVJEbE7jeKApP
CcCJLA91hcnIjyN44GmcDsLSHdW0m8c8Oo0cDwwddzOAtJ9M5pa49fY3m/RstzHA1NHTwQAkIwQAQAkI0QAAMkIEQBAMkIEAJCMEAEAJCNEAADJCBEAQDJCBACQjBABACQjRAAAyQgRAEAyQgQAkIz1RIAErCcCVJgKHkhzpmXHA42gJwIASMaYCAAgGSECAEhGiAAAkhEiAIBkhAgAIBkhAgBI9v8o6Btow2NUUwAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "tweet_dict = defaultdict(int)\n", "for i in data_dict.values():\n", " tweet_dict[i] += 1\n", " \n", "plt.loglog(tweet_dict.keys(), tweet_dict.values(), 'ro')#linewidth=2) \n", "plt.xlabel(u'推特数', fontsize=20)\n", "plt.ylabel(u'人数', fontsize=20 )\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T08:00:01.767550Z", "start_time": "2019-06-08T08:00:00.760854Z" }, "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "import numpy as np\n", "import statsmodels.api as sm\n", "\n", "def powerPlot(d_value, d_freq, color, marker):\n", " d_freq = [i + 1 for i in d_freq]\n", " d_prob = [float(i)/sum(d_freq) for i in d_freq]\n", " #d_rank = ss.rankdata(d_value).astype(int)\n", " x = np.log(d_value)\n", " y = np.log(d_prob)\n", " xx = sm.add_constant(x, prepend=True)\n", " res = sm.OLS(y,xx).fit()\n", " constant,beta = res.params\n", " r2 = res.rsquared\n", " plt.plot(d_value, d_prob, linestyle = '',\\\n", " color = color, marker = marker)\n", " plt.plot(d_value, np.exp(constant+x*beta),\"red\")\n", " plt.xscale('log'); plt.yscale('log')\n", " plt.text(max(d_value)/2,max(d_prob)/10,\n", " r'$\\beta$ = ' + str(round(beta,2)) +'\\n' + r'$R^2$ = ' + str(round(r2, 2)), fontsize = 20)" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "ExecuteTime": { "end_time": "2018-04-28T11:13:27.176717Z", "start_time": "2018-04-28T11:13:26.420464Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAZcAAAEZCAYAAABb3GilAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAIABJREFUeJzt3XmcjXX7wPHPZcZkyJJMtFiSQZJ1HjxRGIosUdFTeSo/0YbwaKEFpZKKEqX0FKVSJFlSVFoUPUVaSGWdGkaWoqzTmOv3x/eMjlnPnDkz95k51/v1ul/m3Mv3vk4v5uq7i6pijDHGhFIprwMwxhhT8lhyMcYYE3KWXIwxxoScJRdjjDEhZ8nFGGNMyFlyMcYYE3KWXIwxxoScJRdjjDEhZ8nFGGNMyEV7HYBXqlSporVq1fI6DGOMKVZWr169W1Xj8rov7JOLiMQC1VX1p1zuqQ+MBmKAh1R1dV7l1qpVi1WrVoUuUGOMiQAikhTIfWHbLCYiFUTkLeBX4A6/81eIyBYR2Sgi/XynhwG3AjcAg4s+WmOMMf7CueaSDkwGFgGtAESkPDDB9/ko8LWILATiVHWX754K3oRrjDEmQ9jWXFR1v6p+AKT5ne4EfKyq21R1B7AM6AD8KSKVReQkYFdOZYrIDSKySkRW7dqV423GGGMKKJxrLtmpDvi39yUDpwKTgGnAYWBsTg+r6jTffSQkJNheA8YYU0iKW3KJwTWXZUgHjqrqV0Avb0IyxhiTWdg2i+UgBTjd7/MZwC/5KUBEuovItH379gUZQQq0bQs7dgT3vDHGRIDillyWAJ1E5BQRqQacByzNTwGqulBVb6hYsWJwEYwdC59+6v40xhiTrbBNLiJSXkQ2AuOB3r6fGwB3AyuBz4DhqnqgyIJKSYHp0yE93f1ptRdjjMlW2Pa5qOqfQJ0cLs8owlD+NnasSywAR4+6z0895UkoxhgTzsK25lJYgu5zyai1pKa6z6mpVnsxxpgcRFxyCbrPxb/WkiGj9mKMMeY4EZdcgrZy5d+1lgypqbBihTfxGGNMGAvbPpews2aN1xEYY0yxEXE1lwLPczHGGJOniEsuBZ7nYowxJk8Rl1yMMcYUPksuxhhjQi7ikov1uRhjTOGLuORifS7GGFP4Ii65GGOMKXyWXIwxxoScJRdjjDEhZ8nFGGNMyEVccrHRYsYYU/giLrnYaDFjjCl8EZdcjDHGFD5LLsYYY0LOkkt+paVl3TTMGGPMcSy55NcTT0CLFrZJmDHG5CLikkuBR4vVqgU7dkDr1tCnDyQnhzQ+Y4wpCSIuuRR4tFivXvDDD3DPPTB3LtSrBw88AIcOhTZQY4wpxiIuuYTEiSfC2LGwfj1cfDHcey80aOCSjarX0RljjOcsuRTEmWfCG2/ABx+4hNOrF3ToAN9+63VkxhjjKUsuoZCYCGvWwFNPwTffQNOmMHAg7NnjdWTGGOMJSy6hEh0Nt9wCGza4P599FuLjYcoUN3zZGGMiiCWXUKtcGSZPhq+/hmbNYPBgaNIE3n/f68iMMabIWHIpLA0bwnvvwbx5cPAgXHghXHopbN7sdWTGGFPoLLkUJhHo2RO+/x4eesglm7PPhrvugv37vY7OmGLlyJEjjBs3jnPPPZfY2FhiY2Np0qQJzz//vNeh5WrLli0MHTqUs88+m7Jly1KuXDlatmzJSy+9FFR5gwYNQkQQkePOb9269dj53I7OnTuH4mvlSTTChs6KSHege506dQZs2LChaF++bRuMHAkzZ8Kpp8L48W4iZinL8cbkJjU1lQsvvJBPPvmEJk2a0K5dOw4dOsSsWbP4448/ePfdd+nUqZPXYWbx/fff07x5c9LS0rjggguoXbs227Zt47333iMtLY2RI0fy0EMPBVzesmXL6NixI9HR0fz111/4//7+7bffGDVqVI7PfvDBB/zwww/897//5frrrw/6O4nIalVNyPNGVY3Io3nz5uqZlSt
V//EPVVBt1Ur1f//zLhZjioFHHnlEAb3xxhs1PT392PlXXnlFAb3nnns8jC5ny5cv1x49emhSUtJx57/88kuNiYnRqKgo/eWXXwIqa9++fVqjRg3t2LGj1qxZU92v78CkpqbqqaeeqpUqVdIDBw7k6ztkBqzSAH7H2v8ye6FVK/j8c5gxA7ZuhZYtoW9fSEnxODBjwtMzzzxD2bJlmTBhwnHNQdHR0QCcfPLJXoWWq2bNmjFv3jxq1Khx3PmEhAQ6d+7M0aNH+fzzzwMqa+jQoezatYunnnoq33HMnTuXlJQU+vbtS9myZfP9fDAsuXilVCm47jr46Se4806YNQvq1nVNZUeOeB2dMWEjKSmJzZs306FDB8qVK3fctdmzZwOQmJjoRWh5Klu2bJa+kQwnnXQSACeccEKe5bz99ttMnz6dBx98kLp16+Y7jilTpiAi3HTTTfl+NliWXLxWvjw8/DCsW+cmY44YAeecAwsW2FIyxgCrVq0CoGXLlsfOqSqTJk1i7ty5dOzYkUaNGnkVXtBWrVqFiNC4ceNc7/vtt98YMGAA559/PkOGDMn3e7755hs+++wzEhMTqVevXrDh5psll3BRpw7Mnw/vvgsxMdCjB3Tu7EaaGRPBVq9eDUDz5s358MMPGTBgAGeffTZDhw6lcePGvPzyywGVM2bMmIBGU40ZM6YQv42zaNEi1q1bR9euXbM0mWV2yy238McffzB9+nRKBTH4Z8qUKcfKKUrRRfo2k7dOndwSMk8/DaNHQ6NGMGiQ+9lXjTYmkmQkl4SEBAYNGsTrr79+7Fr9+vU5evRoQOW0aNGCgQMHBnRfYdq6dSv9+vWjfPnyPP7447neO2fOHF5//XUmT57MWWedle93/f7777z66qucfvrpXHLJJcGGHJxAev1L4uHpaLFA7dypetNNqqVKqZ58surUqappaV5HZUyRqlKlitasWVNVVdPS0nT37t360Ucf6eWXX66AnnvuuZ7E9fbbb+vAgQOzHD/99FOOz2zfvl3j4+M1OjpaFy1alGv5O3bs0JNPPlnbt29/3Ag5VQ14tNhjjz2mgI4ZMyawLxUAAhwtZjWXcBYXB1Onwo03wpAhcPPN8MwzMGkStG3rdXTGFLqkpCR2795NW9/f96ioKE4++WTatm1L27ZtadKkCd988w2bN2+mdu3aRRrbF198ke3IrV69ehEfH5/l/Pbt2+nQoQObN2/mxRdfpGvXrrmWf8stt7Bnzx5q1qzJfffdd9y1vXv3AhxrwsuuKU9VmTp1KtHR0QwYMCDAbxU6ETeJMkNCQoJmdBQWC6puef/bboOff4beveHRR6FmTa8jM6bQvPnmm1x++eWMGzeOESNGZLnevn17PvroI3bu3ElcXFyuZS1evJjFixfn+c4uXbrQpUuXoGPOTnJyMomJiWzZsoWZM2dy5ZVX5vlMrVq1SEpKCqj87H6Pv/3223Tr1o1evXoxZ86cfMeck0AnUVrNpbgQcQmla1d47DE3wmzhQrjjDjeUuYjGrhtTlPw78zP77bff+Oyzzzj33HPzTCyQc00jsypVqoQ0uSQlJZGYmMi2bduYO3duwH0fW7duzfFaRuLJrXLgVUf+MYG0nZXEo1j0ueTm559Vr7xSFVTPOEN11izVTO2yxhR3nTp1UkCvv/764/odjhw5or169VJAZ8yY4WGEudu0aZPWrFlTy5Ytq0uXLs313rS0NE1OTg6o3Lz6XDZs2KAiovXr189XvIEgwD6XiGsW83RtscKwfLnrj1mzBtq0cf0xzZp5HZUxIREXF8fu3bsBV3tJTEzkzz//ZOnSpWzevJm+ffsyffp0j6PMWfXq1UlOTqZly5YkJGTfkjRkyBDi4+Pp3r07ixYtYuLEiQwbNizXcvOquQwbNownnniCSZMmceuttxb4e/iztcVKes3FX1qa6nPPqcbFqYqo9u+v+uuvXkdlTIEkJSUpoBdeeKH27t1bTz75ZI2KitLKlStrx44
ddfbs2V6HmCcgz+PDDz9UVdWbb75Zy5Urp6+88kqe5eZWczlw4IBWqlRJy5Ytq3v37g3l11FVq7nkqdh16Adi714YOxaefNL1wYwe7ebIxMR4HZkx+TZv3jwuu+wyHnnkEW6//XavwzE+gdZcbIZ+SVKpEkyYAN99B+edB8OHu0mY77zjdWTG5FtGZ34za+Ytliy5lET167uEsmgRpKdDly7QrZtbJNOYYiIjuTRt2tTjSEwwLLmUZF27wtq1bujyJ5+4rZdvuw327fM6MmPytHr1amrWrEnlypW9DsUEwZJLSRcT45rHNmyAa66BiRPd0v4vvOBqNcaEqZ07d+Y618OEN0sukaJqVXj+efjiC7cC8/XXQ4sWsGKF15EZY0ogSy6RJiEBPv0UXnkFduyA1q2hTx9ITvY6MmNMCWLJJRKJwNVXw48/wj33wNy5UK8ePPAAHDrkdXTGmBLAkkskK1fOzYtZvx4uvhjuvRcaNHDJJkLnP5mSY9y4cfzjH/+gQoUKxMXF0b17d9auXet1WBHDkouBM890Ky4vW+a2Xe7VCzp0gG+/9ToyY4L20Ucfccstt7BixQqWLVtGdHQ0HTt25LfffvM6tIhgycX8rX17+OortwvmN99A06YwcCDs2eN1ZMbk25IlS/i///s/GjZsyLnnnsvMmTPZtWsXn332madxLV26lMTERCpWrEj58uVp164d7733XkDPpqWlMXHiRJo0aUJsbCwVK1akU6dOrFy5Mtv7t2zZwtChQzn77LMpW7Ys5cqVo2XLlrz00kuh/ErZC2SNmJJ4lKi1xQrDnj2qgwerRkWpnnSS6pNPqv71l9dRmQjVsWPHLGtyxcXFaZs2bQJeY2z79u0K6PLlyws52pw9//zzCmiFChW0T58+eumll2rp0qVVRPSNN97I9dkjR45ohw4dFNAaNWrov//9b+3YsaNGRUVpdHS0vvPOO8fdv27dOi1TpoxGR0drYmKi9u/fXy+++GKNjo5WQEeOHBnUdyDAtcU8/yXv1WHJJUDffafaoYP7q3LOOarvved1RCYCVa5cWUVER40apaNHj9a7775bL7/8co2KilJAJ06cmGcZvXv31iZNmmiaR1uFJycna2xsrFapUkW3bNly7PySJUtURLRatWp68ODBHJ8fO3asAtq5c2c9cODAcc9HRUXpGWecoYcPHz52fvny5dqjRw9NSko6rpwvv/xSY2JiNCoqSn/55Zd8fw9LLpZcQic9XXXePNXatd1fmR49VDdu9DoqEyE2btyogNarVy/LtalTpyqgNWvWzLWMYcOG6amnnqqbNm0qpCjzNmrUKAX0sccey3Kta9euCujcuXNzfL5u3boK6I8//pjlWr9+/bI8f+DAgeP2wPF3ySWXKKBz5szJ9/cINLmUmD4XESktIh1EpJPXsZQ4ItCzJ6xbBw89BO+/70aV3XUX7N/vdXSmhMtYvTy73Sg7d+4MuNn8ORk2bBizZs1i2bJl1K5du3CCDMA7vgVke/bsmeVaxs6XH3/8cY7Pb926laioKOLj47Nca9OmDQCffvrpsXNly5ZFRLIt66STTgLghBNOCDD6/Aur5CIisSJSN8jHzwWaAxeHMCTjr0wZGDnSLYD5r3/BuHFuKZmZM20pGVNoMpJLdpttbdy4EYCzzz4722eHDBnCq6++yrJly6hfv37hBZkHVWXdunXExsZy1llnZbneoEEDAHLbwLBSpUocPXqUzZs3Z7n2xx9/AK4DPxCrVq1CRGjcuHFA9wcjLJKLiFQQkbeAX4E7/M5fISJbRGSjiPTLrQxV/QqYXcihGoDTToOXXoKVK6F6dbj2WrfE/xdfeB2ZKYFyqrns2bOH2267DYARI0ZkeW7gwIFMnz6dWbNmcdJJJ7Fjxw527NjB/gBr22PGjEFE8jzGjBmTZ1l79+7l4MGDVKtWLdvrp5xyCkCuw6QzajfDhw/n8OHDx87//PPPPProowD8+eefecayaNEi1q1
bR9euXalRo0ae9wcrutBKzp90YDKwCGgFICLlgQm+z0eBr0VkIRAFvOb37IOqGtg4PhNarVq5BDNzJowYAS1bwnXXuRrNqad6HZ0pAVSVNWvWALBgwQKWLVvG0aNHSUpKYsGCBaSnp/P000/Tu3fvLM8+/fTTAHTo0OG486NHjw4oIbRo0YKBAwcGdF9eDhw4AECZMmWyvZ5x3j9pZPbggw/y/vvvM3/+fOrXr0+bNm04dOgQS5YsoXnz5mzbto3Y2Nhc49i6dSv9+vWjfPnyPP7443nGXSCBdMwU1QH0Bf7r+7kX8LLftVeBK/N4vhbwRCDvsg79EPvjD9U771SNiVE98UTVhx9W9Ru5Ykwwfvjhhxy3By5XrlyW4bfhKiUlRQGNj4/P9vr69esV0H/+8595ltO/f3+tVq2axsTEaJ06dXT8+PG6ePFiBfTaa6/N8dnt27drfHy8RkdH66JFi4L+LpSADv3qQJLf52Qgx/8dFpHGwINAooj8J4d7bhCRVSKyateuXSENNuKVLw8PP+w6/RMTXU3mnHNgwQJbSsYELaNJrF+/fsd+ae3Zs4eJEydy4MABrrrqKvbu3etxlHmrVKkSIsKeHCYk7969G4CqVavmWk61atV47rnnSElJ4ciRI2zYsIE77rjjWN9TRt9NZtu3bycxMZHNmzczY8YMunbtWoBvE5hwaRbLTgyuuSxDOq55LFuq+g3QJ7cCVXUaMA0gISHBfuMVhjp1YP58WLoUhg6FHj3goovg8cfdCDNj8iG7/pbKlSszbNgwVq5cyZw5c5g5cyaDBw8O+bsXL17M4sWL87yvS5cux/pDclKmTBlq1KhBUlISO3fuPNbHkmH9+vUAnHvuuUHFunTpUgAuuOCCLNeSk5NJTExky5YtvPzyy1x55ZVBvSPfAqneFNXB8c1i1wIv+F17Gbg0VO+yZrEikJqqOmmSaqVKbqb/rbeq/vab11GZYqRNmzYK6Oeff57l2pIlSwJqSgrW6NGjc2yS8z9Gjx4dUHl9+/ZVQJ9//vks13r06KGArly5Mt9xfvfddxoVFaV169bNcm3r1q1au3ZtPeGEE3T+/Pn5Ljs7FMdJlJmSS1VgG3AKUA3YDJQLwTu6A9Pq1KkTkv/QJgA7d6redJNqqVKqJ5+sOnWqqkezpE3xcfToUT3xxBM1OjpaDx06lOV6amqqVqpUSUVEt23b5kGE+bNixYpjEz537Nhx7PzSpUu1VKlSWZJkWlqaJicnH/u8f/9+3b1793H3bNy4UePj4xXQhQsXHndt06ZNWrNmTS1btqwuXbo0ZN+jWCUXoDywETcUeZ/v5/a+ZLPJd4Ss1qJWc/HGmjWqF1zg/to1bqz60UdeR2TC2Nq1axXQxo0b53jP1VdfrYBOmTKlCCML3qBBgxTQKlWq6HXXXac9e/bU0qVLa+XKlXX9+vXH3dutW7fjlrZZv369xsbGaqdOnbR///7apUsXjYmJURHRRx55JMu7zjjjDAW0ZcuWOnDgwGyPn376Kd/foVglFy8OSy4eSU9XnT1btUYN99evd2/VrVu9jsqEoRdffFEB7devX473zJkzRwFt3759EUYWvPT0dH3qqaf0nHPO0RNOOEFPOeUU/fe//61bs/k3cPPNN2u5cuX0lVdeUVXVXbt26SWXXKJVq1bV0qVLa9WqVfWyyy7TTz/9NNt3BdKk9+GHH+b7OwSaXMTdGzlEpDvQvU6dOgNymw1rCtmhQ/Doo26EmSrccQfceSeULet1ZMaYXIjIalXNulxCJuE8FLlQqOpCVb2hYsWKXocS2WJjYdQot9Vyz55w//1uq+XXXrOhy8aUABGXXEyYqV4dZs2CTz6BuDi46iq44AK3aZkxptiy5GLCw/nnw5dfwnPPudpMQgIMGAC5rHZrjAlfEZdcRKS7iEzbt2+f16GYzKKioH9/t+rysGEwYwbEx8PEiZCa6nV0xph8iLjkYn0uxUClSjBhAnz3nVttefhwaNQ
IfPthGGPCX8QlF1MIUlKgbVvYsSO05dav7xLKokVuv5guXaBbN1ezMcaENUsupuDGjoVPP3V/FoauXWHtWnjsMdfx37Ah3HYbWNOmMWHLkospmJQUmD7d1SymTw997SVDTIxrHtuwAa65xvXD1K0LL7xgu2AaE4YiLrlYh36IjR379y/3o0cLr/aSoWpVeP55t+tlnTpw/fXQogWsWFG47zXG5EvEJRfr0A+hjFpLxkiu1NTCrb34S0hwTXGvvOLe17o19OkDycmF/25jTJ4iLrmYEPKvtWQoitpLBhG4+mo3L+aee2DuXDfL/4EH3PIyxhjP5JlcROR0Ebkkj3vai0g5EakgIi+ISP3QhWjC1sqVWeefpKYWfRNVuXIuoa1fDxdfDPfe6zYmmzvXlpIxxiOB1FzaAfMAROQSEYn3vygidYGFwGNALG6Z/NNCGqUJT2vWuF/emY81a7yJ58wz4Y034IMP3LbLvXpBhw7w7bfexGNMBAu4WUxEBJgAvC4iMX6X/gv8BtwV4tgKhXXoR4DERLc22dNPwzffQNOmMHAg5LB/uTEm9AJOLr51/HsB8cDjACLSD2gNXKOqvxdKhCFmHfoRIjoabr7ZDV2+5RZ49lm3lMyUKZCW5nV0xpR4+erQV9VvgNuA/xORs4BxuG2JPy6M4IwpsMqVYfJk+PpraNYMBg+GJk3g/fe9jsyYEi3fo8VU9Vngn6q6CbgeGCEiD4rITSGPzphQadgQ3nsP5s2Dgwfhwgvh0kth0yavIzOmRMpXchGRKiLSzFeDAVgNvAXcDtgWgia8ibiNyb7/Hh56yCWbBg3grrtg/36vozOmRMlvzWUosEREGohIdeAb4Fxgn6pODHl0xhSGMmVg5Ei3AOa//gXjxrmlZGbOtKVkjAmRQJJLFb+fnwR2Akt8z44GugAVMj1jkwtM+DvtNHjpJTdf54wz4Npr3Uz/L77wOjJjir0ck4uInCIi04GJvs9nq+pO4F/A6cA7wKtAClBaRO4B1uISy5sistPv+LWwv0igbCiyyaJVK/j8c7c52dat0LIl9O3rlrcxxgQlt5pLa+AaXNOXACtFZABuTks6ro9lDnDYd/8eYJnv3v8Bb/sdiwsj+GDYUGSTrVKl4LrrXFPZnXfCrFmuqWz8eDhyxOvojCl2ckwuqjoPaIybOKm4SZKTgUtwCaSb7/pdvusrgCG+xx9R1f/zPwrvKxgTQuXLw8MPw7p1bjLmiBFwzjmwYIEtJWNMPuTa56Kq6/x+fhq4Et8ESmAXcBlwo+9zVayvxRQHgeycWacOzJ8P777r9pLp0QM6d3YjzYwxecrvJMq3gOG+j6ep6me4mosAtUIbmjGFJD87Z3bq5JaQeeIJ+N//oFEjGDoUfi8WC1IY45lgltyfCjQAvhWRj3B9MMOABSGMy5jCEczOmaVLw5AhbimZAQPcjP/4eHjmGbfFgDEmi2CSy3nA3bgmsI3AQ8DzqloEO0QZU0AF2TkzLg6mToXVq10/zM03Q/Pm8LGtfmRMZvmdoX8mMB9ohpvbcjdu1Nh9oQ/NmBAL1c6ZTZrARx/B7NmueaxdO7jiCkhKCnXExhRb+VlyPwp4E/gTuFBV96rqr8DDwKDiskGYzXOJYKHcOVMEevd2G5Tddx8sWgT168Po0W7tMmMiXKDJRVT1KPAG0EVV/WeXTcLNcbnd71zYjhqzeS4RrDB2zixbFkaNclst9+wJ99/vtlp+7TUbumwimmgu/wBEJBrXBFZeVT/I5noj4HcgDjc7vzTwMvAP4AvcBMznVHV76EMvmISEBF21apXXYZiSZvly1/m/Zg20aQOTJrml/o0pIURktaom5HVfXjWXs4CV2SUWn0m4jcK+UtVUVT0AjAdeAH4FrgU+zEfcxhRv558PX34Jzz3najMJCW6E2c6dXkdmTJEKZrRYrlT1c1Udpao3A72BOqF+hzFhLSoK+vd3S8kMG+bWLIuPh4kTszbLGVNCRQdwj4jI17j
1xFKBQ8Be3OrIZwK1ReQkYASuWczfP3DzYIyJPJUqwYQJruYybBgMHw7TpsHjj8PFF3sdnTGFKtCay+vAXNxS+6uA3UA1oBLQD5dobgMGAY2As33HNuCq0IZsTDFTvz68844bUZaeDl26QLdurmZjTAkVSM1FVXVcdhdE5EPcgpVf45rAegK1gXtU9dWQRWlMSdC1q9te+ckn3aiyhg3h1lvh3nvBRi+aEiaQmouIyEUikiIi34rIhyIyR0SewSWSA6o6R1WvwK2S/BMwU0Tm+prLjDEZYmLgttvcUjLXXOP6YerWhRdesF0wTYmSV3I5CHyMG1bcH7dx2NvAJlytZw2wNeNmVV2vqp1xi1n2AJaLyOmhD9uYYi49HTZuhMWL3QrM118PLVoUbM6NMWEkr+SyG7eVcSPczPzNuESzGHgJl2ySRaQ2gIiU8e1IeSuuD6Yu8ImIxBVO+MYUUxkrMy9c6P585RW3DE3r1tCnDyQnex2hMQWS1yTKesD3uCX1M2g2nycBS4FngTOAz4BLcTtZTgSWq2rbkEZeQDaJ0ngmJQVq14bDhyE2FjZvhmrV4MABt1HZo4+64cwjR7oRZrGxXkdszDGhmkQJcFBVS6lqKdxS+2Tz+T9AFeAdXIJpBpykqk/gaj43ZluyB2xtMeO5nFZmLlfO/bx+vRuqfO+90KABzJ1rS8mYYieQmsuXqlrB7/P3qhqV3WffuRNwo8d+B9qoalj2UlrNxXjCv9aSwb/24u/DD91SMt99B+3buw3LGjUq2niNySSUNZeyIrJMRJbh+lnE7/OL/p99594BYoCWuCYxY0yG/KzM3L49fPUVPP202w2zaVMYOBD27CmaWI0pgLzmuewG/Oe4KK5vJUNZ3Iix3ZmeWwdcAtg2fcb4y+/KzNHRblOyf/0LxoxxiWbWLLfM/803u+vGhKFcm8WOu1GkD67mUl5VD4pINdyosS9VNUufioiUVtW/QhptCFmzmCmW1q6FoUPhgw/cbphPPAEdO3odlYkgBW4WE5EYEWmQceBGgQHU931+EjcPZpr/fX73x2f6bIwpqIYN4b33YN48OHTIzfjv2RM2bcr+/pQUaNs2/7ttGlNAOdZcRKQxrsmta82eAAAV+0lEQVTLf+hxTtUcyeF8BvXv9A8HVnMxxd7hw24RzAcfhL/+csOW77oLTjzx73tuuQWefRZuugmeesq7WE2JEWjNJbfkUho4ze/UpcAEYCTQETgfN6nybWAekOuGYKoaVhuMW3IxJcb27TBiBMycCaeeCuPHu4mYv/6a/XwaYwqgwM1iqvqXqiZlHMAuXA1liqpehJvXci/QBPgEmAY09n8m0/PGmMJw2mnw0ktusED16nDttXDeeTBoUPbzaYwpAvnZLGw1cDtwBEBVD6jqm75E0wi3YGVzEaka+jCNMXlq1colmBkzYMsWePPNv0empabC9OnW92KKTMDJRVV/UNUJqppleLFvwcrBqjpaVX8NbYjGmICVKgXXXQfdu7uf/aWlWe3FFJmQb3NsjAkDq1dnnaz5119u0zJbSsYUAUsuxpREa9a4JJJxLFkCZ5/tmss6d4bvv/c6QlPCWXIxJhJcdJFbQmbSJPjiC7dG2ZAh8PvvXkdmSihLLsZEitKl3bbKP/0EAwbAlCkQHw/PPONGkxVHNkk0bFlyMSbSxMXB1KmuX6ZhQ7dGWfPm8PHHXkeWfxmbrtlAhbBjycWYSNWkiVvWf/Zs1zzWrh1ccQUkFZNpaSkpbnh1eroNsw5DYZVcRCRWROoG+WyCiEwWkbdEpFeoYzOmRBKB3r3hhx/g/vth0SKoXx9Gj4aDB72OLnc5bbpmwkJYJBcRqSAibwG/Anf4nb9CRLaIyEYR6ZdHMWtUdTDQD+heiOEaU/LExrqdL3/8ES691CWaevXgtdfCc+hyRq3FJomGrbBILkA6MBn4T8YJESmPW8usje94SETiRKSaiHzkd1wI4De583ZgUtGGb0wJUb06vPo
qLF/u+mauugouuMBtWhZO8rPpmvFEWCQXVd2vqh8AaX6nOwEfq+o2Vd0BLAM6qOoOVW3nd7wHICLRIvIosFBVw+xfgjHFTJs28OWX8NxzrjaTkOBGmO3c6XVkTn43XTNFLiySSw6qA/49i8nAqbnc/xDwT+AGEbkhuxtE5AYRWSUiq3bt2hW6SI0piaKioH9/2LABhg1za5bFx8PEiVl/sRe1zJNEM441a7yNyxwTzsklBtdcliGdXLZNVtU7VLWNqvZV1Wk53DNNVRNUNSEuLi7E4RpTQlWsCBMmuF0wW7d2+8Y0auSWkjEmB+GcXFKA0/0+nwH84lEsxph69WDxYnj7bdff0aULdOvmJmUak0k4J5clQCcROUVEqgHnAUsLWqiIdBeRafv27StwgMZEpKZNoWpVN1x5+XI3EfO22yCYf1M2w77ECovkIiLlRWQjMB7o7fu5AXA3sBL4DBiuqgcK+i5VXaiqN1SsWLGgRRkTmcaOdR3nu3a5Wsu117p+mLp14YUXso7iyqssm2FfIuW4zXFJZ9scGxOElJTst05evdqtW7ZihVtK5skn3W6YwZRlwlqBtzk2xpgscpoV37y5q4G8+qpr4mrdGvr0geTk/JdlSoSIq7mISHege506dQZs2LDB63CMKT78axoZsqtxHDgA48fDI4+44cwjR7oRZrGx+S/LhB2rueTA+lyMCVKgs+LLlXPLx/zwgxtRdu+90KABzJ3791IyNsO+xIu45GKMCVJ+Z8XXqgVz5sCyZVC+PPTqBR06wLff2gz7CGDJxRgTmGBnxbdv79Yme/pptxtm06aus3/3bpthX4JFXHKxeS7GeCA62m1KtmEDDBwIzz7rlpKZPBnS0vJ+3hQ7EZdcrM/FGA9VruyGKX/zjRthduutbtOy99/3OjITYhGXXIwxYeCcc2DpUpg3Dw4dggsvhJ49YdMmryMzIWLJxRjjDRGXUNatg3HjXO2lQQO46y7Yv9/r6EwBRVxysT4XY8JMmTIwYoRbSubKK12iqVsXZs7M31IyJqxEXHKxPhdjwtRpp8GLL7phytWruzXLzjsPvvjC68hMECIuuRhjwlyrVi7BzJgBSUnQsiX07etm9Ztiw5KLMSb8lCoF113nmsruvBNmzXJNZePHw5EjXkdnAmDJxRgTvsqXh4cfdp3+iYmub+acc2DBgr+XkjFhyZKLMSb81akD8+fDkiUQEwM9ekDnzvD9915HZnIQccnFRosZU4xddJGbgDlpkuvob9QIhgyB33/3OjKTScQlFxstZkwxV7q0m9n/00/Qvz9MmeKWknnmGbeysgkLEZdcjDElRFycSyirV0PDhm7tsmbN4OOPvY7MYMnFGFPcNWkCH34Is2fD3r3Qrh1ccYUbxmw8Y8nFGFP8iUDv3m6Dsvvug0WLoH59GD0aDh70OrqIZMnFGFNyxMbCqFHw449w6aVuR8x69eC112zochGLuORio8WMiQDVq8Orr8Ly5a5v5qqr4IIL3KZlpkhEXHKx0WLGRJA2beDLL+G551xtJiEBBgyAnTu9jqzEi7jkYoyJMFFRbsjyhg0wbJhbsyw+HiZOhNRUr6MrsSy5GGMiQ8WKMGECrF0LrVvD8OFuEuY773gdWYlkycUYE1nq1YPFi+Htt10nf5cu0K2bm5RpQsaSizEmMnXpAt99B489Bp984iZi3nYb2GCfkLDkYoyJXDExrnlswwa3OdnEiW5p/+eft10wC8iSizHGVK0K//2vG1lWp44bANCiBXz2mdeRFVuWXIwxJkPz5vDpp26OzI4dbihznz6QnOx1ZKGTkgJt27rvV4giLrnYJEpjTK5E3KTLH3+Ee++FuXPdIICxY+HQIa+jK7ixY10CHTu2UF8jGqFLIiQkJOiqVau8DsMYE+62boXbb4c33oBatdwAgMsuc0mouElJgdq14fBht1TO5s1QrVq+ihCR1aqakNd9EVdzMcaYfKlVC+bMcSsvV6gAvXpBhw7w7bd
eR5Z/Y8f+PVDh6NFCrb1YcjHGmEC0a+f2jnn6abcbZtOmcMstsGeP15EFJiUFpk//e1WC1FT3uZD6Xiy5GGNMoKKj3aZkGzbAwIEwbZpbSmbyZEhL8zq63PnXWjIUYu3FkosxxuRX5crw5JOuBtO8udt2uUkTeP99ryPL2cqVWddSS02FFSsK5XWWXIwxJljnnANLl8Jbb7mRZBdeCD17wqZNXkeW1Zo1brmbzMeaNYXyOksuxhhTECLQowesWwcPPeRqLw0awF13wZ9/eh2dZyy5GGMiTzATCfN6pkwZGDnSLYB55ZUwbpybH/PSSxG5lIwlF2NM5AlmImGgz5x2Grz4ouvjqF4drrsOzjsP/ve/gsVczFhyMcZElowhuenpgQ/FDeaZVq1cgpkxA5KS3Oe+fV1ZEcCSizEmsgQzkTDYyYelSrmay08/wZ13wqxZbtXl8ePhyJHg4i8mIm75FxHpDnSvU6fOgA0bNngdjjGmKPkvf5Ihr2VQgnkmJxs3uiX+FyyAs85yS/x3716slpKx5V9yoKoLVfWGihUreh2KMaaoBTORMJSTD+vUgfnzYckSt5dMjx7QqRN8/33+ywpzEZdcjDERLJiJhIUx+fCii9wEzEmT3B4yjRrBkCHw++/BlxlmIq5ZLIOtimyMCQu7dsGoUW4pmZNOggcegAEDICrK68iyZc1ixhhTHMTFwdSpblHMhg3d2mXNmsHHH3sdWYFYcjHGmHDQpIlb1n/2bNi7163CfMUVbhhzMWTJxRhjwoUI9O4NP/wA998PixZB/fqu2ezAAa+jyxdLLsYYE25iY90Wyz/+CJde6kam1a/v5skUk35ySy7GGFMYAl2/LLf7qleHV1+F5cuhUiW4+mpo2RK++qpg7ywCllyMMaYwBLoWWSD3tWkDrVu7ZrO1ayEhwY0o27kzuHcWARuKbIwxoeY/qz+32fzB3FemjFtS5vnnoWxZGD0aBg1y2y0HUlYB2VBkY4zxSqBrkQVzX3q6mwOzdq2rzQwf7iZh9u8f3PpnhcRqLsYYE0qBrkUWqvsWL4bBg91nf4VUe7GaizHGeCHQtchCdV+XLtCxY9YZ/WlpntZeLLkYY0woBboWWSjv++ILl3D8/fUXLFzo2S6YllyMMSaU1qxxc1EyH2vWFN59me9ZtcrtfvnLL9CiRcEW2QySJRdjjClpmjd3Q5JfecXNeWndGvr0geTkIgshrJKLiMSKSN0gn60vIpNF5F0RaRDq2IwxplgRcZMuf/wR7rkH5s6FevXcqsuHDhX668MiuYhIBRF5C/gVuMPv/BUiskVENopIv9zKUNUfVHUwMANoWqgBG2NMcVGunOvYX78eLr7YLSszb16hvza60N8QmHRgMrAIaAUgIuWBCb7PR4GvRWQhEAW85vfsg6r6nu+ZCcA/gCuKLnRjjCkGzjwT3ngDPv/c9cMUsrBILqq6H/hARPr6ne4EfKyq2wBEZBnQQVVfA9rlUM5wEakH3AkMy3xdRG4AbgCoUaNGKL+CMcYUD61aFclrwiK55KA64L+RQTJwak43i0g3oCtQBVfjyUJVpwHTwE2iDFmkxhhjjhPOySUG11yWIR3XPJYtVV2Ea1YzxhjjsbDo0M9BCnC63+czgF8KWqiIdBeRafv27StoUcYYY3IQzsllCdBJRE4RkWrAecDSghaqqgtV9YaKFSsWOEBjjDHZC4tmMd/IsDVAeaCMiLQDBgB3Ayt9tw1X1eK1z6cxxkSosEguqvonUCeHyzOKMBRjjDEhEM7NYoXC+lyMMabwRex+LiKyi+OHOhcXFYGSkhnD8bsUdUyF+b5Qlh2KsgpSRjDPVgF2B/k+k7OaqhqX100Rm1yKKxGZpqo3eB1HKITjdynqmArzfaEsOxRlFaSMYJ4VkVWBbGplCkfENYuVAAu9DiCEwvG7FHVMhfm+UJYdirIKUkY4/l0xubCaizGmRLKai7es5mKMKammeR1AJLOaizHGmJCzmosxxpi
Qs+RijDEm5Cy5GGOMCbmwWP7FGGMKm4iUBi4AolV1idfxlHRWczHGFHsiEisidfO47VygOXBxEYQU8Sy5GGOKLRGpICJvAb8Cd/idv0JEtojIRhHpB6CqXwGzPQo14lizmDGmOEsHJuN2oW0Fx7bwmOD7fBT4WkQWquouz6KMQFZzMcYUW6q6X1U/ANL8TncCPlbVbaq6A1gGdPAkwAhmycUYU9JU5/gVz5OBU0WkMfAgkCgi//EksghizWLGmJImBtdcliEdOKqq3wB9vAkp8ljNxRhT0qQAp/t9PgP4xaNYIpYlF2NMSbME6CQip4hINeA8YKnHMUUcaxYzxhRbvpFha4DyQBkRaQcMAO4GVvpuG66qB7yJMHLZqsjGGGNCzprFjDHGhJwlF2OMMSFnycUYY0zIWXIxxhgTcpZcjDHGhJwlF2OMMSFnycWYAIhIaxFJE5HW2VybKCLLROQEETnL93PtPMqLFpFaIlIul3tqiEjLfMb5HxG51u+z+P0cJSKzRKRnfso0JhiWXIwJjABRvj//PikSB9wI7FbVI0Aq0AJ4wf8XezbOALYA3XO551bgs1yDEvm3iMwRkYx/y1cDXXzXKgJrROQq37WGwJVAhdzKNCYULLkYUzB3A2Vxq+2iqr8AY4C2wEWZbxaRK0XkzhC+fydwOTA603sEeBGoBXzlO/1P35/vh/D9xmTLln8xJkgiUge4BfgROFNEzvRd2gpMAWIzNUF9CfQAqgGvZyqrGm7Bxezek3kZjQ9UtSOAqi4VkSeAjiLyoN89rXA1mEtU9UffuR7AOlXdnq8vakwQbPkXY3IhIv8Frs/m0n24BREvzEdxVwEjgMPAYl8Z03E1iz24NbL83QFci2vO8ndAVX/xrat1FlAa37LywCu4FYBH4Jrekn2fFZe8ooG/solttqpem815Y4JiycWYXIjIeUBdIB64C3gI2AA0xfWJHAUqZXpsL/CI715/pXBJZDdwBKiJ2/t9P/Cdql6a6d2PAUNVNdsWBhHpDLwTwNe4GTjJF89A4A/f+dOA8cALwHRV/TSAsowJiDWLGZMLVV0BrBCRNrjk8g4uOUz2u2e//zO+fvzUbM53w/2b6wAcxHXoD1XV14IMbxlwqu/nCrg+llbAAtwe8j/5rqUB3/p+nq2qu33xtPede9MSiwk169A3Jv+64ZqZ5uTzuauBX1X1+8wXRGSHiKj/AQwHojKf9x0zVDXVt0d8eeAtXA1qC3A+8B7Q23d9OH8noVp+r63h+9N/S2BjQsJqLsbkk6o+JiJLgcuA3tl0uAOMFpHRmc69ALyZQ7FtyP3f42lALLDJ93mfiEQD/8GNFPsR6ISrWX2B68d5QkT+CcQB84CeQCNgla+MRrj+l425vNeYoFhyMSZ/BorIFOBdXMf8UbJ2uK8FnsGNGPO3Adf57u9MEbkPOKyq43J6qYiMAVqpai2/c+Nxnf4vAoNV9c+MqTWq+oiIbAKG4ua2HAEa45rNXvAV0Qz4VlUP5/21jckfSy7G5MLXL3ElvomJvp8PAzNxzVGo6g+ZngE3qfK48z5HReR0XBMZuE72o8CTQYQ3EfhCVeeKSIyIrAYmq+oMX1xzReRN9Y3aEZFPgYt9c2BOxI12mxbEe43Jk/W5GJO7wcANuGG9AKOAqqo6Ib8FichVIrICNzT4Ad/pp4DTgT9y6FvJ6H/5F1DT79y7uKS0TkTq40aBNcNt9Vs/4wD8l6F51/c9zgO6AjHA/Px+D2MCYTUXY3K3DPgIlxDeBD5U1T/8b/D9Es+sSjbnzwMqAyOBj3F7vH+qqr/6mtpeA+oD2dV4JgBNcCPNwA1fHkSmmfnA1EyftwIZkzvnA3/imspOw416+yibdxlTYJZcjMmFqk4B8A1Fzk4UsD6b8wN9h7+aqjrYV16tTO/ZLSJVgdnAA6p6v/91EfkT+CtTU9sYYIyIPAP0Buqr6i7f/eWAdfgt9aKqB0VkBi4pCTBGVdNy+F7GFIg1ixl
TMEdVVfwPXHPVfZnPq+rPuRWkqutwtZf7ROTKQF7um+R5A24eS6zfpceAU4CxmR6ZhkssCjwXyDuMCYYlF2PCy43A17gaSSAtC58D/waqAptE5GURedJXzv/5JzQRKQ08kfERmCoiUSGN3hgfaxYzJjC5LZ8fMqp6SER6AAczNVmdTDZrgqlqOvCqiLyGW6E5ozktGdiRcZ+IxOAWy+yAG0xwFm6ts+kiMsC3XYAxIWPJxZhc+IYin4+b5AiwL9MtUfmYRPm6qubZ3KWqP4vIpSLSHdcBXxmXFBZkE5/g9o8ZhEsWybimrwHARyLyKPAoMNf3PSap6r2+ZBMHXAM0FpE+qro2r9iMCZQlF2NydyZu9WIFFuImSPrLbhJlTv7I+5ZjBLcichRuA7KVuAmTf98g8h/fuaq4fV1GAU+q6n5fUhkMfI/rjzkFN7JsLICqporIJbjJntfi5ttcko/4jMmVJRdjcvcSbgTX4ZxGVuUwWbJAVPVN8v73uQ74EHgDWKCqx5rNfLPuH/XtULkYt2DlkkzvOARc55tc+W4o4zfGltw3xhgTcjZazBhjTMhZcjHGGBNyllyMMcaEnCUXY4wxIWfJxRhjTMhZcjHGGBNyllyMMcaE3P8D17enqYkzrzEAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "histo, bin_edges = np.histogram(list(data_dict.values()), 15)\n", "bin_center = 0.5*(bin_edges[1:] + bin_edges[:-1])\n", "powerPlot(bin_center,histo, 'r', '^')\n", "#lg=plt.legend(labels = [u'Tweets', u'Fit'], loc=3, fontsize=20)\n", "plt.ylabel(u'概率', fontsize=20)\n", "plt.xlabel(u'推特数', fontsize=20) \n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "ExecuteTime": { "end_time": "2018-04-28T11:14:19.219105Z", "start_time": "2018-04-28T11:14:19.171044Z" }, "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "import statsmodels.api as sm\n", "from collections import defaultdict\n", "import numpy as np\n", "\n", "def powerPlot2(data):\n", " d = sorted(data, reverse = True )\n", " d_table = defaultdict(int)\n", " for k in d:\n", " d_table[k] += 1\n", " d_value = sorted(d_table)\n", " d_value = [i+1 for i in d_value]\n", " d_freq = [d_table[i]+1 for i in d_value]\n", " d_prob = [float(i)/sum(d_freq) for i in d_freq]\n", " x = np.log(d_value)\n", " y = np.log(d_prob)\n", " xx = sm.add_constant(x, prepend=True)\n", " res = sm.OLS(y,xx).fit()\n", " constant,beta = res.params\n", " r2 = res.rsquared\n", " plt.plot(d_value, d_prob, 'ro')\n", " plt.plot(d_value, np.exp(constant+x*beta),\"red\")\n", " plt.xscale('log'); plt.yscale('log')\n", " plt.text(max(d_value)/2,max(d_prob)/5,\n", " 'Beta = ' + str(round(beta,2)) +'\\n' + 'R squared = ' + str(round(r2, 2)))\n", " plt.title('Distribution')\n", " plt.ylabel('P(K)')\n", " plt.xlabel('K')\n", " plt.show()\n", " " ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "ExecuteTime": { "end_time": "2018-04-28T11:14:26.818914Z", "start_time": "2018-04-28T11:14:26.499569Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAY4AAAEdCAYAAAAb9oCRAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAIABJREFUeJzt3Xl8VNX9//HXJ2ExUURlkUWSsAhhB0FAREAFoQoqFfxhgxVQKVW0LthSqHUDtIpata2KIirEakUUly+iLFKVRcMmqEGx7GgFKyKbSji/P04SA0kgk8zMzUzez8djHsncuXPnk5Tm7b3nns8x5xwiIiIllRB0ASIiElsUHCIiEhIFh4iIhETBISIiIVFwiIhISBQcIiISEgWHVChmtsHMXO5ju5nNMLM2BV5/38wuKONnVDGzbDNrn/v8HTMbVdbac4/Vwcw+NbMq4TieSGkoOKQi6g8kAl2ALOAdM+sD4Jw70zn3xpHebGZTzGxIca875350zqU751aUtVAz62dmLxU49jLnXHPn3I9lPbZIaSk4pEJyzh10zn3hnLsHGApMM7NjS/j2tkClol4ws3D/fyoNqBHmY4qUiYJDKjzn3KvAd0Cf3EtZ/QDMrIeZrTCz/Wb2mZkda2YbgA7AVDNzufs9bWaPmdlCYF3uNmdmrQp8THUze8XM9pjZcjPrkLtfTzPbUbAeM9uRu/124BGgR+7xhh6+v5nVMbMXci+7bTOzh8wsKfe1oWaWZWa/N7P/mdlmM+sfmd+iVCQKDhEvGzj1sG2ZwBSgJnAFcMA5lwYsA4Y556zAvpcAf8Zf/irKNcDfgXrAAuBFM6t8pIKcc7cD1wELnXPmnHu64Ou5Zzd5oZcOdAfOBO4ssFvz3K9pwDRg8pE+U6QkFBwiXnXgwGHbcvB/cH9yzi12zv1whPe/65xb6Jz7upjXn3fOve2c+w4Yiw+QlmWsuSPQDLjeOfeNc24d8Cfg8gL7fOucu9c5twv4B1DHzGqV8XOlglNwSIVnZon4/2JffdhLA4DOwAYzu/Yoh9lwlNfX532TG0A7gZNCq7SQNGCzc27/YXXUzv2ZAL4q8NrO3K8lHcsRKVKRA3wiFcxIYDcwv+BG59xy4Cwz6wrMMbPVzrl/A0W1lD54lM/IH+A2sxpALfwf+ZMo8Ic89w9+tYJlHOGYW4FTzKxqgbOhNGCLcy7HzIp/p0gZ6IxDKiTz6prZzcBtwK+ccwcO2+cKM6sObAL2AHlzJ74F2oZ4yefq3DkYJwEPA+855/4DfAYczBuQB35b4HPyPquRmZ1kZiccdsylwBbgr2ZWw8wa4cc3/hFCXSIhU3BIRfQafvxiJXAa0M05t6SI/X6Dv9TzIfC4c25u7vb7gV8Bn4bwmS8AjwMbgROAwQC5Yw/XAI+a2Sf4M5NvDqt1C7ANOLvgAXODrh/QAPgCf8b0OnBfCHWJhMy0kJOIiIRCZxwiIhISBYeIiIREwSEiIiFRcIiISEgUHCIiEpK4nABYs2ZNl5aWFnQZIiIxZdmyZTucc0ednxSXwZGWlkZWVlbQZYiIxBQz21iS/XSpSkREQhJXwWFm/c1s8nfffRd0KSIicSuugsM595pzbkT16tWDLkVEJG7FVXCIiEjkKThERCQkCg4REQmJgqOgPXtg166gqxARKdcUHHkyM6FePaheHWrVgunTg65IRKRciqvgKPXtuJmZMGLEz2cbO3bAFVfAgw+Gv0gRkRgXV8FR6ttxx42DvXsP3XbwINx8M0ycCD/+GL4iRURiXFwFR6lt2lT0dud8qLRrB//+d3RrEhEppxQcACkpRW9PTYXXX4d9+6BHDxg2zF/GEhGpwBQcABMmQHLyoduSk/32Cy6Ajz+GMWP8gHmzZvDUU/5SlohIBaTgAMjIgMmT/RmGmf86ebLfDj5E7r4bVq6EFi3gyiuhZ08fKCIiFYyCI09GBmzY4M8kNmz4OTQKatkSFi6EKVN
8aLRrB3/8Y+GBdRGROKbgCFVCAgwfDmvXwpAhcM89PlD+7/+CrkxEJCriKjii2la9Zk2YOhXeeQeSkvxYyMCBsGVL5D9bRCRAcRUcgbRV79HDj31MmABvvAHNm8NDD8GBA9GrQUQkiuIqOAJTpQqMHevHPbp1gxtugM6d4cMPg65MRCTsFBzh1KiRH+v417/gyy99eIwaBVqRUETiiIIj3Mxg0CDIzvah8eijkJ4OL7zgZ6KLiMQ4BUekHH88PPwwLF0K9evD4MHQty+sWxd0ZSIiZaLgiLSOHX14PPIILF4MrVrB+PHwww9BVyYiUioKjmhITPSXrbKz4aKL4NZboW1bWLAg6MpEREKm4IimevX8WMfs2fDTT3DOOX7dj+3bg65MRKTEFBxB6NsX1qzxLdv/+U/fOPHJJ9U4UURiQlwFR1RnjpdVUpIf61i1Clq3hquvhrPOgtWrg65MROSI4io4Apk5XlbNm/u2JU8/7ftfnXYa/OEPsGdP0JWJiBQproIjZpn5sY61a/3Xe+/17dtfey3oykREClFwlCc1avixjnffhWrV4MIL4Ze/hM2bg65MRCSfgqM86tYNli/3LdvffNNfznrgATVOFJFyQcFRXlWp4sc6PvnErzZ4880/TyYUEQmQgqO8S0vzYx0vvQQ7dsAZZ8A118DOnUFXJiIVlIIjFpj5sY5PP/Ut2x9/3DdOfO45NU4UkahTcMSSatX8WEdWFqSk+HXRzzsPPv886MpEpAJRcMSi9u19w8S//x0++MBPILzjDjVOFJGoUHDEqsREP9aRnQ0DBsDtt0ObNjBvXtCViUicU3DEurp1fb+rOXMgJwd69YIhQ+C//w26MhGJUwqOeHHeeb7P1a23+qVr09P9ILoaJ4pImCk44klSEtx5J3z0kR8HGTkSzjzTN1IUEQkTBUc8Sk/3Yx3TpsEXX0CHDjB6NOzeHXRlIhIH4io4YqqteqSZ+bGO7Gzo3h3uv9/fzlu7NmRmBl2diMSwuAqOmGyrHmmzZx/apmT7dt+B96GHgqtJRGJaXAWHFGHcONi799BtOTlw440waZJfwlZEJAQKjni3aVPR252DW27x4x+LF0e3JhGJaQqOeJeSUvT21FR45RXfLLFrV/jNb+Dbb6Nbm4jEJAVHvJswAZKTD92WnOy3X3SRb9t+880wZQo0awbTp6txoshR/PTTT3zyySdBlxEYBUe8y8iAyZP9GYaZ/zp5st8OcNxxfqxj2TJo3BguvxzOPdcvYytSRmZGkyZNSElJ4dJLL2X//v1H3P/GG2+MSl379+9nxIgRNGvWjNTUVB588MFC+4wZM4b09HRSUlK4995787cPGTKEunXrcv3110el1nLJORd3jw4dOjgphZwc5x57zLkTTnCuShXn/vxn5/btC7oqiWGJiYnOOedycnJc79693cyZM0u0f6Tt2LHDzZgxwx08eNBt377d1a5d223atOmQfbZu3eqcc2779u3uuOOOc7t27XLOOff666+72bNnu3PPPTcqtUYTkOVK8DdWZxzys4QEP9aRnQ2DBvlZ6K1bw9tvB12ZxLjdu3eza9cuWrRoAcCKFSvo0qULp556KldddRUHDx6kU6dO5OTk0KRJE+bOncvq1avp2LEjDRs2pFevXuwO4wTWGjVqcMkll2Bm1KxZkwYNGrDzsMXR6tWrB8C2bdtITU3l2GOPBeCCCy7gmGOOCVstsUjBIYWdfLIf65g711/eOu88+NWv4Kuvgq5MYkxOTg7p6enUrVuX9PR0GjduzIEDBxg5ciQvvvgin3/+OXv27GHWrFl88MEHJCYmsm7dOnr16kXVqlV54403WL9+PSeccAIzZswodPyHHnqI9PT0Qx5Tp04NqcY1a9awf/9+WrVqdcj2d955hwYNGtC9e3cmTZpEQoL+XObRb0KKd+65vu/V7bf7pWvT0+HRR/08EJESSExMJDs7m507d5KUlMT48ePJzs5mzZo19O7dm/T0dBY
tWsSGDRsKvbdBgwbMnDmT4cOHs3z5crZu3Vpon9/97ndkZ2cf8hg2bFih/caOHUuTJk1o0qQJixYtyt++Y8cOLr/8cqZOnYqZHfKenj17snnzZhYvXszIkSP5z3/+U/ZfSJxQcMiRHXMM3Hab77zbsaNfA6RrV1ixIujKJIZUrlyZQYMG8eGHH3LgwAHS09Pz/9Bv3LixyEHxYcOGsWHDBsaOHcugQYNwRdzt9+CDD+YHQt5jypQphfabOHEi69atY926dXTt2hWAb7/9lv79+zNx4kROP/30Ymtv3rw53bp1Y/ny5WX4DcQXBYeUTNOmfqwjMxM2bPAhctNN8P33QVcmMeKNN96gffv2NGvWjK1bt7I4d+LpihUryOsvV7lyZXbu3IlzjjVr1nDxxRdTv3595hWzQNmNN96YHwh5jyuvvPKotezatYsLL7yQcePG8Ytf/KLQ6/v372fZsmUAfP311yxZsoR27dqV9kePPyUZQY+1h+6qirD//c+5kSOdM3Oufn3nXnrJuYMHg65KyiHANW7c2DVs2NANHjzYff/998455+bMmeOaNm3qGjVq5Hr16pV/x9KoUaNcnTp13Lx589yUKVNczZo1XZcuXdyQIUPcXXfdFba67rrrLpecnOwaN26c//jiiy/czJkz3X333ef27t3rOnXq5FJTU12LFi3c9OnT89/bvn17V69ePZeUlOQaN27spk2bFra6gkYJ76oyV8TpX6zr2LGjy8rKCrqM+LdkiV/zY9Uq6NcPHnkE0tKCrkpESsnMljnnOh5tP12qktLr0gWysnzL9gULoGVLuPdeNU4UiXMKDimbSpX8WMenn/rbdv/wBzjtNHj//aArE5EIUXBIeDRoAC+/DLNmwa5d0K0bXH01fPNN0JWJSJgpOCS8LrzQN0685RaYOtXP/XjmGTVOlJhx++23M378+KDLKNcUHBJ+xx7rxzpWrPC38Q4dCmef7VuZSIWS1+QwNTWVAQMGhLVtSHk2ZswYTjnlFFq3bp1/W29BEydOpGnTpjRr1oxZs2blb9+6dSt9+/alQYMGnHHGGdEsOSQxERxmVtnMzjWzPkHXIiFo3Rrefdd34/3oI2jTBm69FfbtC7oyiZK8FiIbNmzg+OOP529/+1vQJUXc/Pnzee+999iwYQMPPPBAoXklCxYsYNasWaxatYq5c+dy/fXX5wfqZZddRkZGBps3b2b+/PlBlF8iUQsOM0sys6alfHtroANQeKaOlG8JCX6sIzsbBg+G8eOhVSuYMyfoyiSKzIwePXqwefPmQq9NmTKFRo0akZKSwptvvgnAE088QZMmTWjfvj1XXnklV111FeDbgLz33nsAbNiwgSZNmgD+v9S7d+9O48aN6dSpU357kp49e3LDDTeQkpLCmjVrimyuCDBhwgQaNmxI586dy7zOxsyZMxk6dCiVKlWid+/ebN++na8K9HnLysqiV69eJCUl0aBBA9q0acPSpUtZtmwZzjkuv/xyAJKSkspURyRFPDjM7HgzewX4L/D7AtsvNbP1ZrbOzIYf6RjOueXAvyJcqkRS7drw7LMwfz5Urgx9+/og+dvf/NyPhAT/NTMz6EolAvbt28c///lPevfuXei1m266Kb9fVZcuXcjOzmbChAksWrSIpUuXsmXLlqMe38x46qmn+OKLL+jevTtPPPFE/mt79+5l06ZNNGvWrMjminPnzmXmzJmsWbOGuXPnsraYtWhmzJhRqKFiUWMhmzdvJjU1Nf95/fr1+fLLL/Oft2zZkrfeeovdu3fz5ZdfsmLFCrZv387KlSupX79+fg+vSZMmHfXnDkqlKHzGQeAR4HWgC4CZVQPuz32eA6w0s9eAROD5Au+d4JxTT+94cvbZfsLgvffCXXfBCy/8/NrGjTBihP8+b6EpiWl5bdI3b97MpEmTuPjiiwvtc9ZZZzFq1CjuuOMOWrZsSWZmJgMGDKB27doADBw4kKVLlx7xc04++WQyMzO57777WLR
oEV26dMl/7Ze//CUAa9euzW+uCD7MOnXqxLZt27jiiivy26b379+/yM8YOHAgAwcOPOrP/OOPPx7SSTchIYHExMT85+effz6LFy+mY8eOtGjRgjZt2lCjRg3Wr19PdnY2CxYsICcnh86dO9O7d2/atm171M+MtoifcTjndjvn5gEHCmzuAyx0zm11zn0FzAfOdc595ZzrWeCh0IhHVav6sY5atQq/tncvjBsX/ZokIvLGOGbMmMFjjz3GviLGt1599VUGDBhA//79efHFF9m/fz9Vq1bNf/2nAhNKK1WqRE5ud+aC28eNG8e8efO44YYbGDly5CENEY877jiAYpsr7t+/n8qVK+fv/8MPPxT5s7z44ouFGirecccdhfarW7fuIZ18t23bximnnHLIPnfddRfZ2dnMnDmTLVu2kJ6eTu3atenevTsnnngiNWvW5Mwzz+Szzz4r+hcbsKAGxxsAGws83wLULW5nM2sLTADOMbObitlnhJllmVnW9u3bw1qsREiB0/dDbNxY9HaJWf3796dXr17ceeedhV5bt24dGRkZjBo1ivfff59OnToxa9Ysdu3axb59+3juuefy901LS2PlypUAhwwer1mzhr59+9K0aVPmFDN+Vlxzxc6dO/Pcc8/xww8/sH37dmbOnFnk+wcNGlSooeJtt91WaL8LLriAZ555hpycHN5++22aNm3KSSedlP/6gQMH2LNnDwCTJ0+mYcOGNGjQgN69ezNv3jx27drFzp07WbJkCe3btz/arzYQ0bhUVZQq+EtYeQ7iL1kVyTm3CjjitQvn3GRgMvheVWGoUSItJaXokEhMhBkz4JJL/EJSEhfGjx9P69atGTx48CGXXwYNGsSePXuoU6cOzz77LI0aNeLiiy+mZcuWnHzyyZx11ll8n9uFefTo0QwePJj58+fTvHnz/GNce+21DB8+nHvuuYeuXbsecjaSJykpiWeffZahQ4dy4MABGjVqxMyZMxk8eDALFiygcePGNGrUqMhxmFAMGDCAhQsX0qhRI2rUqJEffGPHjuW8887jtNNOo0OHDuzbt4+2bdvy1FNPAZCSksLo0aM5/fTTcc4xZsyY/MH/8iZqTQ7NbCjQzTl3lZn9GujpnBue+9p04CXn3Mvh+Cw1OYwRmZl+TGPv3p+3Va0Kder4QDn/fD943rBhcDVK4J5++mnee+89nnzyyaBLiXvlvcnhHKCPmdU2szpAV+CtgGqRoGRk+Dkeqan+zCI1FaZMgXXr4MEH4d//9o0T774bfvwx6GpFJFc0bsetZmbrgL8Ag3K/bwGMAxYD7wM3O+f2hOGz+pvZ5LxFYSQGZGT4haEOHvRfMzJ848QbbvCNE88/H8aOhfbt/WRCEQmc1uOQ8u+NN+Daa/3lq2HD/K28NWsGXZVI3Cnvl6pESu6CC+Djj33L9mnTfOPEqVPVOFEkIAoOiQ3HHgv33OMbJ6anw/Dh0KOHDxQRiaq4Cg6NcVQArVr5QfMnn/Sh0a6dHwMpeGeWiERUXAWHc+4159yI6tWrB12KRFJCAlx5pW+cOGSIv+uqVSuYPTvoykQqhLgKDqlgatXyYx3vvOPnf5x/PgwaBAXaPYhI+Ck4JPb16OEbJ44fD6+/Ds2bw8MPQ06xzQhEpAwUHBIfqlTxzRHXrIGuXeF3v4NOnUC3ZYuEXVwFhwbHhcaN/VjHCy/4JoqdOsF118Hh/yYyM7UOiEgpxVVwaHBcAN++5NJL/czza6+Fv//d38L7wgt+7kdej6yNG/3zvHVAFB4iJaKZ4xL/srLgN7+B5cuhTx9YvRq2bSu8X2qqb3siUkFp5rhIno4d4YMP4KGHYNGiokMDYNOm6NYlEqMUHFIxJCbC9df7y1fJyUXvk5IS3ZpEYpSCQyqW+vV9K/cCS5MCkJQEEyYEU5NIjImr4NBdVVIiGRl+3Y8GDX7elpAA+/b59u4ickRxFRy6q0pKLCPDj2k453tenXYaXH01dO/u54KISLHiKjh
ESqVFC1i4EJ56yve/at/et3DfU+a1xUTikoJDBPzcj2HDfHD8+td+saiWLf0iUiJyCAWHSEE1a/rxj4UL/Rog/frBJZfAli2hH0uz0yVOKThEitK9u1806u67fQuT5s3hwQfhwIGSvV+z0yWOxVVw6K4qCasqVWDMGD94ftZZcNNNcPrpfjLh0YwbV3hxqb17/XaRGBdXwaG7qiQiGjb0Yx0zZsDXX0OXLnDNNbBzZ/HvKW4WumanSxyIq+AQiRgzP9bx6ad+Bvrjj/vLV88/7y9FHa64WeianS5xQMEhEorjj4e//hU+/BBOOQUuu8w3Tly37tD9Jkwo3NokOVmz0yUuKDhESuO002DJEvjb32DpUr/m+Z13wg8/+NczMnxrk9RUf7aSmuqfZ2QEW7dIGKitukhZbdvmB85feAGaNoXHHoOzzw66KpGQqa26SLTUq+fHOt58069zfs45fhLh118HXZlIRCg4RMIlb5GoW2/1QZKe7i9PqXGixJm4Cg7N45DAJSX5sY6PPoK2bf3Kg926+ecicaLEwWFmDcxssJndbGajzSzDzBpGsrhQaR6HlBvp6TB/PjzzDHz+uR9Mv+UWNU6UuHDU4DCzVmY2B5gGtAb2AbuBZsAUM5trZm0jW6ZIDDLzYx1r18Lw4TBpku/E++qrQVcmUiaVSrDPXcD1zrm1Rb1oZo2Bu4FLw1mYSNw46SQ/1nHFFTByJFx0kX88/LAmBEpMKsmlqgeOEBq3OOe+cM4pNESO5swzYfly37L97bf92cf998NPPwVdmUhIShIcU8zsyoIbzKyKmU0DNJtJJBSVK/uxjk8+8bftjh4NHTv6yYQiMaIkwXEG8P/M7CEzSzCzusC7gOW+JiKhSk2FWbPg5Zfhf/+Drl39Zaxvvw26MpGjOmpwOOe+AfriB8XnAouA6c65Ic65fRGuTyR+mcHFF/uzjxtvhCef9HdjTZ9edONEkXKiJHdVjQXGAN8DxwAfA9XMbGzuayJSFtWq+bGOrCy/UuDll0OvXvDZZ0FXJlKkklyqqpz7qATMAT4ssK1y5EoTqWDatYNFi+DRR2HZMmjdGm6/HfbvD7oykUMctcmhmaU45464+oyZ1XfObQ1rZWWgJocS8776Cm6+GZ57Dk49Ff7xD38WIhJB4WxyONnMxhc1Szx3NvntwBOlqDHs1HJE4kadOn598rff9s979/Yt2b/6Kti6RChZcPwCWAM8YWafm9liM1tkZp/jZ5OvA/pFssiSUssRiTu9evk+V7fd5peuTU/3l7JycoKuTCqwkNbjMLMEoAb+VtxvnHPl8l+vLlVJXPrsM/jtb30PrM6d/bof7doFXZXEkbBdqjKzE8xskpnNAK52zm13zn1dXkNDJG41bQrDhkGNGn7Vwfbt4Re/gO+/D/1YmZn+Dq6EBP81MzO675eYVqKZ4/i7p54CepjZHyJbkogUKTPTt2n/5puft735pv/D/fLLJZ/7kZkJI0bAxo3+PRs3+ucl/eNf1vdLzCvJXVXrnXMNc79PAhY559pHo7jS0qUqiUtpaf6P9OEqV/b9rvr182ugp6aW7jipqbBhQ+nrKOn7pdwK511V+R3YcmeKVy1LYSJSSpuKuSv+p598y/YFC3zjxHvvPXLjxOKOU9z2cL9fYl5JgqOJmf2Y9wDSc7//Kfe5iERDcS3YU1P9nI9PPoHzzoM//MEvHPX++6Edp6Qt3sv6fol5JelVleCcq1Lgkfe8snOuSjSKFBFgwgRITj50W3Ky3w7+D/fLL/vmid9955esvfpq30QxlOOUtQ6Je3G15rhIXMvI8AtCpab6Bompqf55xmGrG1x4oT/7GD0apk6FZs3g2Wd/Hjwv6XHKWofErZDmccQKDY6L5ProI9+uffFi6NnTTx5MTw+6Kimnwjk4LiKxqk0beO89ePxxWLnSP7/1VtinFRGk9BQcIvEuIcHPs1i7FgYPhvHjfefdt94KujKJUQoOkYqidm0/1jFvHiQ
mQp8+Pki+/DLoyiTGxFVwqDuuSAmcc44f+7jjDnjlFT/m8fe/q3GilFhcBYe644qUUNWq8Oc/w+rV0KkTjBoFZ5wBy5cHXZnEgLgKDhEJ0amn+rGO557zM79PPx1uuAF27Qq6MinHFBwiFZ0ZXHYZZGf7W3cffhiaN/frf8Th7fpSdgoOEfFOOMGPdSxe7AfSBw3yjRPXrw+6MilnFBwicqjOneHDD+HBB+Hf/4aWLeHuu+FHtaYTT8EhIoVVquTHOj791C8WNXasXzjq3XeDrkzKAQWHiBTvlFPgpZfgtddgzx7o3h2GD4cdO4KuTAKk4BCRo+vXDz7+2LdsnzbNz/2YOlWD5xWUgkNESubYY+Gee/xcj/R0f+bRo4fvxCsVioJDRELTurUfNH/iCX8W0ratHwPZuzfoyiRKFBwiErqEBLjqKj/3IyPD33XVqhXMnh10ZRIFCg4RKb1ateDpp/1651WqwPnn+/kfW7cGXZlEkIJDRMquZ09Ytcq3bH/9dT/z/OGH1TgxTik4RCQ8qlaFceNgzRrfMPF3v/OTCQuuxpmZCWlp/lJXWpp/LjFHwSEi4dW4Mbz5Jjz/vL9k1akTXHedH0wfMQI2bvS38W7c6J8rPGKO1hwXkcj57jv40598D6yEhKIvXaWmwoYNUS9NCtOa4yISvOrV4ZFHYOnS4sc7Nm2Kbk1SZgoOEYm800+HlJSiXytuu5RbCg4RiY6JEyE5+dBtZjBkSDD1SKkpOEQkOjIyYPJkP6Zh5tf8qFkTJkyAoUNh+/agK5QSilpwmFmSmTUt5Xs7mtkjZvaKmQ0Md20iEiUZGX4g/OBB+O9//fd//KO/syo9HaZM8a9JuRbx4DCz483sFeC/wO8LbL/UzNab2TozG36Uw6xwzl0HDAf6R7BcEYmm5GR/CWvlSr9g1FVX+caJH38cdGVyBNE44zgIPALclLfBzKoB9wPdch8TzayWmdUxs3cKPHoDOOfybse4BXgoCjWLSDS1bAkLF8JTT/nFo9q182ciapxYLkU8OJxzu51z84ADBTb3ARY657Y6574C5gPnOue+cs71LPB4G8DMKpnZfcBrzrnlka5ZRAJgBsOG+caJv/61b+HesiV9akf6AAAKGUlEQVS88UbQlclhghocbwBsLPB8C1D3CPtPBM4ARpjZiKJ2MLMRZpZlZlnbNcgmErtq1vRjHQsX+ktZ/frBJZfAli1BVya5ggqOKvhLWHkOAsV2Q3PO/d451805N9Q5N7mYfSY75zo65zrWqlUrzOWKSNR17w4rVviW7bNn+8aJf/0rHDhw9PdKRAUVHF8C9Qs8PwXYHFAtIlJeVakCY8b4wfKzzoIbb/S9rz74IOjKKrSggmMO0MfMaptZHaAr8FZAtYhIedewoR/rePFFfxtvly5w7bWwc2fQlVVI0bgdt5qZrQP+AgzK/b4FMA5YDLwP3Oyc2xOGz+pvZpO/++67sh5KRMobMxg40N91df318Nhj/vLV88/7brsSNeqOKyKxadkyGDnSr/fRuzf84x/QpEnQVcU0dccVkfjWoQMsWeK77y5Z4tc8v/NO+OGHoCuLewoOEYldiYkwapSf+3HxxXDbbdCmDcyfH3RlcS2ugkNjHCIVVL16fqxj9mx/u+6558Lll8PXXwddWVyKq+Bwzr3mnBtRvXr1oEsRkSD07evXPB83Dl54wTdOfOIJNU4Ms7gKDhERkpJg/HhYtcpfthoxArp1g48+CrqyuKHgEJH41Lw5LFgATz8Nn38Op50Gt9wCe8p853+Fp+AQkfhlBldc4QfPhw2DSZOgRQt49dWgK4tpcRUcGhwXkSLVqOHHOt59F6pVg4su8ndhbdoUdGUxKa6CQ4PjInEoMxPS0iAhwX/NzCz9vt26+caJf/kLvPWWP/u4/3746afI1R8uofweIs05F3ePDh06OBGJA9OnO5ec7JxvKuIfycl+e1n2dc659eudu+ACv1+bNs4tXhzRH6VMQv3ZSgn
IciX4G6uWIyJSfqWlwcaNhbenpvr1yku7bx7n4JVX4LrrYNs2fwfW3XfDiSeWre5wK83PVgpqOSIisa+4MYiitoeybx4zGDDAN0684QY/DpKe7i8Dlaf/qC7NzxZBCg4RKb9SUkq+PZR9D1etGjzwgG+YmJYGQ4b4xomffVbiUiOqLD9bBMRVcOiuKpE4M2GCXz62oORkv70s+xanfXtYtMh32s3Kgtat4Y47YP/+0GsPp3D8bOFUkoGQWHtocFwkjkyf7lxqqnNm/uuRBoRD2fdovvzSucsu8wPRp57q3Ny5pT9WOITzZysGGhzX4LiIhMHbb8M118C6dZCR4W/fPfnkoKuKCA2Oi4iEQ+/esHo1/PnPfuna9HR4/PEK3ThRwSEicjTHHOPHOj76yPe8GjkSzjzTN1KsgBQcIiIl1awZzJ0L06bBF1/4VQhHj4bdu4OuLKriKjh0V5WIRJyZv1137Vq46io/5tG8uZ9IWEHEVXA49aoSkWg58UR47DF/++6JJ/qJhBddVPQM7zgTV8EhIhJ1Z5wBy5bBfff5y1gtWvjvY6FxYikpOESkYippt9mS7Fe5MtSt68889u6F3/8eGjf2ZyOhHKc8dcA9kpJM9oi1hyYAisgRlbTbbFn2M/NfR4xw7rHHjn6cKHXAPRI0AVATAEWkGCXtNlvW/Y4/3i9V61zR8z4KHidKHXCPRBMARUSKU9Jus2Xd7/vv/fhHcZMFC76vnHXAPRIFh4hUPCXtNhuO/dq2LdlxylkH3CNRcIhIxVPSbrPh2m/ixMKvm8HgwaF/VnlQkoGQWHkA/YHJTZo0Ccc4kYjEs5J2mw3XfgVfr13buTp1/AD44MHObdsW2mdFCBoc1+C4iJRj+/fDvff6s5GqVf2Stb/5DSQmBlaSBsdFRMqzY47xHXdXr4ZOneDaa/1kwuXLg67sqBQcIiJBOvVUeOstP9lv40Y4/XS//vn33wddWbEUHCIiQTODX/0KsrNhxAh4+GHfOPGll/wckHJGwSEiUl6ceCI8+qhvVVKzJgwcCP36RW0CYEkpOEREypsuXSArCx54ABYu9I0T//KXctM4UcEhIlIeVaoEN94In34KffrAmDHQvj28917QlSk4RETKtQYN4OWXYdYsP2B+1ll+AalvvgmsJAWHiEgsuPBC+OQTuOUWePppSE+HZ54JZPA8roJDS8eKSFw79lg/aXDFCmjaFIYOhbPP9pezoiiugsNp6VgRqQhat4Z334UnnoCPPvKNFP/0J9i3LyofH1fBISJSYSQk+LGO7Gy47DLfDLFVq0NXHYzUR0f8E0REJHJq1/ZjHfPn+0tZJ5wQ8Y+sFPFPEBGRyDv7bFi1ys9CjzCdcYiIxIsohAYoOEREJEQKDhERCYmCQ0REQqLgEBGJB5mZkJbmb9NNS/PPI0R3VYmIxLrMTL+Ox969/vnGjf45QEZG2D9OZxwiIrFu3LifQyPP3r1+ewQoOEREYt2mTaFtLyMFh4hIrEtJCW17GSk4RERi3YQJkJx86LbkZL89AuIqONRWXUQqpIwMmDwZUlP97PHUVP88AgPjAOYCWAQk0jp27OiysrKCLkNEJKaY2TLnXMej7RdXZxwiIhJ5Cg4REQmJgkNEREKi4BARkZAoOEREJCRxeVeVmW0HNpby7dWBeL2ftzz/bEHVFo3PjcRnhOuYZT1Oad9fE9hRhs+V0JXkf6tU51ytox0oLoOjLMxssnNuRNB1REJ5/tmCqi0anxuJzwjXMct6nNK+38yySnLbp4RPOP8d6lJVYa8FXUAEleefLajaovG5kfiMcB2zrMcpz/+m5FBh+99KZxwiEnU644htOuMQkSBMDroAKT2dcYiISEh0xiEiIiFRcIiISEi05riIBMrMKgPdgUrOuTlB1yNHpzMOEYkYM0sys6ZH2a010AH4RRRKkjBQcIhI2JnZ8Wb2CvBf4PcFtl9qZuvNbJ2ZDQdwzi0H/hVQqVIKulQlIpFwEHgEeB3
oAmBm1YD7c5/nACvN7DXn3PbAqpRS0RmHiISdc263c24ecKDA5j7AQufcVufcV8B84NxACpQyUXCISLQ04NDmo1uAumbWFpgAnGNmNwVSmYREl6pEJFqq4C9h5TkI5DjnVgEZwZQkpaEzDhGJli+B+gWenwJsDqgWKQMFh4hEyxygj5nVNrM6QFfgrYBrklLQpSoRCbvcO6hWANWAY8ysJ3A1MA5YnLvbzc65PcFUKGWhJociIhISXaoSEZGQKDhERCQkCg4REQmJgkNEREKi4BARkZAoOEREJCQKDhERCYmCQ0REQqLgEIkCMztQ4PtbzGy2mSUGWZNIaanliEgUmdk5wK+Bbs65nKDrESkNtRwRiYLcM46GwFygn3Pu84BLEik1XaoSiZ4ZwHUKDYl1Cg6R6NkOnBl0ESJlpeAQiZ7LgQwzGxh0ISJloTEOkSgwswPOuUq562vPBXrlLpkqEnN0xiESRblhcRMwy8xqBV2PSGnojENEREKiMw4REQmJgkNEREKi4BARkZAoOEREJCQKDhERCYmCQ0REQqLgEBGRkCg4REQkJAoOEREJyf8H65s7sC01H0sAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "powerPlot2(data_dict.values())" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "ExecuteTime": { "end_time": "2018-04-28T11:14:50.947967Z", "start_time": "2018-04-28T11:14:50.933308Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "import powerlaw\n", "def plotPowerlaw(data,ax,col,xlab):\n", " fit = powerlaw.Fit(data,xmin=2)\n", " #fit = powerlaw.Fit(data)\n", " fit.plot_pdf(color = col, linewidth = 2)\n", " a,x = (fit.power_law.alpha,fit.power_law.xmin)\n", " fit.power_law.plot_pdf(color = col, linestyle = 'dotted', ax = ax, \\\n", " label = r\"$\\alpha = %d \\:\\:, x_{min} = %d$\" % (a,x))\n", " ax.set_xlabel(xlab, fontsize = 20)\n", " ax.set_ylabel('$Probability$', fontsize = 20)\n", " plt.legend(loc = 0, frameon = False)" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "ExecuteTime": { "end_time": "2018-04-28T11:14:53.968210Z", "start_time": "2018-04-28T11:14:53.962880Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "from collections import defaultdict\n", "data_dict = defaultdict(int)\n", "\n", "for i in df['From User']:\n", " data_dict[i] += 1" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "ExecuteTime": { "end_time": "2018-04-28T11:14:57.469192Z", "start_time": "2018-04-28T11:14:56.922983Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAZkAAAEZCAYAAABFFVgWAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAIABJREFUeJzt3XdYVGf68PHvDL0oWFCxgRFRRKWpiS0WVGwkRmOaMcmmsL93E5Ndzabopm3WmGg0xY2JJHFNMUVNJY3YSwxRaY4gYhcUFUURI0g77x+PDGKdgRlmYO7Pdc3FzJkz59ykzM3T7kenaZqGEEIIYQV6WwcghBCi8ZIkI4QQwmokyQghhLAaSTJCCCGsRpKMEEIIq5EkI4QQwmokyQghhLAaSTJCCCGsRpKMEEIIq3G2dQC21rJlSwIDA20dhhBCNCgHDhzgxIkT1z3P4ZNMYGAg27Zts3UYQgjRoPTu3duk86S7TAghhNVIkhFCCGE1kmSEEEJYjSQZIYQQViNJRgghhNVIkhFCCGE1kmSEEEJYjSSZuvjtQziwxdZRCCGE3WowSaa4uJjs7OxrnpOVlcXdd9/NxIkTSU5Otm5ApcVw2AAn9lv3PkII0YDZfZI5c+YM48ePp3Xr1syZM8d4fNmyZXTq1ImgoCAWL14MwBtvvMHbb79NfHw8CxYssG5grh4wcS5ETFSvj2VD9nrQNOveVwhhF0pKSujbty9hYWGEhobywgsv2Doks+Tk5DB06FBCQkIIDQ3lrbfessp97L6sjF6vZ+rUqYwbN46kpCQAioqKmD59OklJSTg5OREeHk5sbCz5+fn4+fkBKjlZnZNL9fMdP8MRA3S6EVzcrX9vIYRNubm5sWbNGry9vSkrK2PgwIGMHj2am266ydahmcTZ2Zl58+YRGRlJUVERUVFRjBgxgu7du1v0PnbfkvH29iY6Ohpn5+p8mJiYyODBg2nXrh1t2rRh2LBhrF69miZNmlBQUMCpU6eMyabeDH0Mbp2lEoxWCXs3q59CCLuTkZHB8OHDCQ4O5uWXX2bq1Kls3brVrGvodDq8vb0BKCsro6ysDJ1OZ9Y1hg4dysqVKwH417/+xeOPP27W5+vC39+fyMhIAJo0aUJISAiHDx+2+H3sviVzJTk5OQQEBBhft2/fnry8PJ544gni4uJwd3fnueeeu+rn4+PjiY+PByA/P98yQemdoGlr9Xz/H/DrHBg9EwL7WOb6QjRG382ErsOgWzRUlMMPL0DICAgeAmXn4ad/Q+hoCBoI5/+EX16BnuPghn5QfAZ+fQ3CboXAvnDuFHg2u+4tS0pKmDRpEsuXL+eGG26gW7duREVF0adP9f+rgwYNoqio6LLPvv766wwfPtz4uqKigqioKPbs2cOjjz7KjTfeaNav/9JLL/H8889z/PhxUlNT+f777836/NWYGn+VAwcOkJqaanb8pmiQSaa0tBS9vroRptfrcXJyIjIykhUrVlz383FxccTFxQGmVxI1S6ebYMy/oGOUen0qF3z8VSISQtjUqlWriIiIIDQ0FFDfJ9OnT69xzsaNG026lpOTE2lpaZw+fZrbbruNHTt20KNHD5Njufnmm9E0jfnz57Nu3TqcnK7/HfHcc8/x8ssvX/McU+MHOHv2LBMnTuTNN9+kadOmJn/OVA0yyfj7+7Nu3Trj69zcXKtk4FrT6SDgQvIqLVZ/rQX0hqFTbRuXEPbm1lnVz52ca752cav52s2r5muPpjVfm9CKAUhNTTV2Ex05cgRvb28GDBhQ4xxzWwK+vr4MGTKEX375xawkYzAYyMvLo2XLljRp0gSAo0ePcueddzJ27FgyMjLo378/K1eu5MUXX6Rly5aUl5eTm5vLlClTuOWWW0hKSuLLL7+sVfxlZWVMnDiRyZMnM2HCBJPjNkeDTDIxMTE8++yzHD9+nMrKSjZv3syiRYvMukZCQgIJCQkUFhZaKcoLXNxh0F/Bt61
6XV6qktDFkwaEEPXGzc2N3NxcAJ599llKS0svO8eUlkB+fj4uLi74+vpSXFzMqlWrePrpp696fnR0NB9//DHt2rUDIC8vj8mTJ/Pdd9/x+OOPk5iYSExMDKmpqUyYMIEnnniC8ePH88gjj+Dr68vBgwfJyckhPDyc9PR0xo8fzxNPPMHkyZNrFb+maTz00EOEhIQwbdq0655fW3Y/8F9UVERQUBBPP/00y5cvJygoiMzMTGbNmkW/fv0YMGAA8+bNw8vLy6zrxsbGEh8fj4+Pj5Uiv0Cng879oUWger31c1g+DcpKrHtfIcQV3XPPPWzYsIGuXbsSFhZGv379+Pvf/272dfLy8hg6dCi9evWiT58+jBgxgnHjxl3x3MrKSvbs2UPz5s0BOHfuHBMmTGDevHmEhITw3HPP8eKLLwKQlpZGTEwMZWVltGjRAr1ez44dO+jZsydpaWnGJBMTEwNg9mSDKr/99huffPIJa9asITw8nPDwcH766adaXeta7L4l06RJE/bs2XPF9x544IH6DcYS2vVU3QJV05w1TSUiIUS9aN++vUUWa/fq1YvU1FSTzs3MzGTixIl4eHgA4Onpye+//258/+abbza+3rNnD8HBwWzfvp2QkBBADcx37NiRPXv20KVLF+M5J06coE2bNrWKf+DAgWj1sK5Pp9XHXexQVXfZ2rVr2b17t22CKMyDn2dD9OPgF2SbGIQQohZ69+5t0tb1dt9dZi311l12LaXn1OCmh2kDlkII0dA4bJKxC36dYcIc8G6hXm/6QK2xEUKIRkKSjK1VjceUFkNeBhQctG08QghhQXY/8G8t9TaF2VSuHjBhbvXro7ug8Iha+SwTA4QQDZTDtmTsYkzmUk7O6gGQmQhbPoPy87aNSQgh6sBhWzJ2b+hjUJSvpjpXVsC+39V6G53D/l0ghGiA5BvLXun01QU3D2yBla/DwetPFxRCCHvisC0ZuxuTuZZON8GY56CjqrfEqRxo6l/dtSaEEHbKYVsydjkmczU6HQREqZ+lxfDdv2D9QltHJYQQ1yV/Cjc0rh6XFNw8D+jA2dWmYQkhxJU4bEumQesQDs0vbNq2RQpuCiHsl7RkGqLVb0JxIfR7ADqEqdI0xoKblTIDTQhhNxz22yghIYG4uLiGMfB/seIzcCwbju2Cb5+FjETocrN6rzAPvngcjtuo4KcQQlzCYaswVzG1kqhdKS2GtG8h/Vs1JqN3gu4xahbalqUQ8zR4Nbd1lEKIRkyqMDdmrh7Q9264510IGaH2pNnxEyS+qrZ5druwgdvGeNiXZNtYhRAOTZJMQ+bVHIY8CpPeUGtoSs+plsznj6putKNZak2NEELYiCSZxqBFAIx9HmJfgpad4OwJ2PCuauH4dVbnHM2CrNXqmBBC1BOZXdaYtA+D2+dB9nr441M4uR9+/Ldq5ehd4MQ+6DygeiaaEEJYmcMmmQZVVsYcOj10HaqKaW5PgJSv4FAKoIOgAapLzckF9v6mEo7eydYRCyEaMZld1hBnl5mjuBC2fam2DqisAGc36Ngb9v0Go2dAYF9bRyiEaIBkdplQPHxgUBzc+baa4lx+XiUYVy/4s0AlnoJDUFFu60iFEI2Qw3aXORzfdjDqGcjLhM1L4Hg2bHgPtn8P506rBDTscVtHKYRoZKQl42j8u8OE12DkP9V+NaePqHGaghzI36taOrIbpxDCQqQl44h0OjXoH9gXdvwMycsgfzesmA6+7aGiDO58E1w8bB2pEKKBk5aMI3NygbBbYPJ7EDYe9M5wOhfO5sO2ZXD+rCq4KYQQteSwLZlGO4W5Nty8of8D0GO0qhiwewOkfQOZv6rdN2OeVt1sQghhJpnC3NinMNfG8T3w+//gSIZ67e0H/f8CN/RTXW1CCIdn6nenw7ZkxDW0CoJb/gMHt8LvH6sutF/ngIcv9BwLUZNsHaEQooGQJCOuTKdTEwM6RsHOlbDlMyg+rbrTTuyDG6dUbwEthBBXIUlGXJv
eCUJHQZfBkLoC0r+Hfb/D/j+gbSgMfxI8fWwdpRDCTsnsMmEaVw/VernnPegWrWadHTbAZ/8HqV/L2hohxBVJkhHm8W4BQ6fCpPngHwplxZD0MXz8EOxaI1OehRA1SJIRtdPyBhg/C8a9qGafnT8La96GFU9C7nZbRyeEsBOSZETddAhX20CHjQfP5mpSQMLz8MNLqvCmEMKhSZIRdefkrBZz3vMuRN2hjuWkwrK/w7p3VLVnIYRDktllwnJc3KDvPeDjDweTYd9mNf05ewNEjIfw8VIPTQgH47BJRsrKWFHXoepx6jD89DKcOXph47Rfoc/danaa7MgphEMwu7tsyJAhZGRkWCOWehUbG0t8fDw+PrLGw2qatYMhj0K34dCqC5w7BesXwuePqp06K8psHaEQwsrMTjK///47ERERTJs2jaKiImvEJBqTdj1h6GMwYQ70f1C1YM4chfXvwqdxkPYtlBbbOkohhJWYnWS2b9/OkCFDePPNNwkODuaTTz6xRlyisdHpoH0vaBUMg/4KLQJVy+b3JfDpIxfK1pyxdZRCCAszO8l07dqVX3/9lS+//BJnZ2ceeOABBg0axPbtsjZCXEeLQLhtttpSYNIb0DESfNqpNTbJy1Sy2fSB2s9GCNEo1HoK86RJk9i1axdPPvkkW7ZsISoqiqlTp3L69GlLxicaq7ISKCmCbkNh/CuqEGf5eTD8AEv/H6xdoCYOCCEaNIvsJ5OVlcVjjz3GmjVr8PPz49VXX+Uvf/mLJeKzOtlPxoYqK9RPvRPk7YScFDidp6Y+a5WADm64CSIngl+QTUMVQtRk6nenRRZjduvWjVWrVrF06VKKi4t5+OGH6devHykpKZa4vGis9E7VU5l3rYHs9WqSwN3vQPeR6r19v6tSNQkvqIKcjr3HnhANTp1aMkePHiUpKYk//viDpKQktm3bxp9//ml838nJib/97W+89tpruLu7WyRgS5OWjJ3QNFUZwLuFauHs3gBte4DhR8j8RXWvgZo4EDkRAvuATgpWCGErVtsZ84033jAmlpycHAA0TUOn0xESEsLAgQMZMGAAnTp1Ys6cOSxYsIB169aRmJhImzZtzP9NhGPQ6VSCAbUj55q3YPQMVa4mciLs+EmN1xzPhl9mQ7OOEHEbBA1SZW2EEHbJ7JaMXq/+evTw8KBPnz4MGDCAAQMG0L9/f3x9fS87/7PPPuPBBx/ktttu4/PPP7dM1BYkLRk7pGlwZIdqyeh0cPIA+LZTLZydK9Xamj9PqnObtFLlarpFg7ObTcMWwpGY+t1pdpKZP38+AwcOJDIyEmdn0/6CfOSRR/j66685efKkObeqF5Jk7FxpMSz9q5ruHP13dayiTHWnpX4Npy/MQPPwgV6xEDoa3LxsF68QDsJq3WVt2rShXbt2JicYgM6dO8vUZlE7rh4w7HHVYgEoOw9oquUSPERtA536FeTvhT8+VYkndLRKOJ6Xt6yFEPXL7JHTKVOm8OGHH5r1mcmTJ/Pee++ZeyshlIDe0Lyjer5lqdpCoKxYzT7r3B8mvq42T2vbE0rPqaTzaRxsWARnjtk0dCEcndktGVN611asWEF6ejovv/wyAB06dOCRRx4xP7paKCsrY8OGDZSXlxMTE1Mv9xT1qFNfcG9SvWVAZYVKNh3C1eNYNqSsgANbIONnVYizyyCImFidqIQQ9caklswXX3zBnj17TEowABkZGbzyyiu1Dqq4uJjs7OxafdZgMJCcnMzPP/9c6/sLO9a2B0RNUs9PH1YVnY/uqn6/dbCalXbn26o7DdT6my8fh59fqXmuEMLqTEoy99xzD127dsXX1xedTsfKlSv54IMPSElJoazs8nLtf/75J25u5s/0OXPmDOPHj6d169bMmTPHeHzZsmV06tSJoKAgFi9efM1rREZGcscdd5h9b9EAVZSpac9N/C5/r3lHNVHgnnehxxhwclWtm2+ehu/+pXbulIWdQlidSd1lr7/+OqmpqSQnJ5O
VlcVvv/3G5s2bAXBxcaF79+5ERkYSERFB06ZN+fzzzwkICDA7GL1ez9SpUxk3bhxJSUkAFBUVMX36dJKSknByciI8PJzY2FgqKiq46667jJ+dOXMmI0aMMPueogFrEQi3zqp+ve4daNdLdY9VadoaBsWpbaG3J6gutCM71MOvM4SNhxv6yVobIazEpP+zpk2bZnyu1+t54IEHjGVjUlJSMBgMpKWl1fjMf//7X7OD8fb2Jjo6miVLlhiPJSYmMnjwYNq1awfAsGHDWL16NXfddRfr1q0z+x6ikSothoJDauvnK/H0hZumQMQENU6T/r2akbZqHni3hJ7jIGQ4uHnXb9xCNHJm//n21FNP0bt3b26//XbjsYqKCjIzM0lLS+PkyZNERUUxaNCga1zFdDk5OTVaRe3btycvL++q56enpzNnzhwMBgPz58+vkSCrxMfHEx8fD0B+vpSVbxRcPVQ15yp5mXDyIITG1Cw/4+alEk3PsbBrnWrdnM5V+9ps+0Lt4tlrHDSV6hRCWMJ1k8yXX35JVFQUQUGqCu6rr7562TlOTk707NmTnj17WjzA0tJSY5UBUC0pJ6er7w8fFhbG0qVLr3nNuLg44uLiALWgSDQS+ov+u8heBzlp0HUouFyhbp6zm0pA3UfAoRTVsjm8XZWu2fETBPaFsFuhTTdVdUAIUSvXTTJ33303er2ezMxMgoODeeaZZ4zjL126dLF6gP7+/jW6xXJzc7nxxhutfl/RwN38/9TOmy7uappz9jo120x/yR8oOr1ahxPQG07sh+3fw+6NsD9JPVp1Ucnmhn6Xf1YIcV3XTTKvv/46ycnJeHp6AjBnzhx0F/6y8/b2Jjw83Jh0IiMj6d69e42WR13FxMTw7LPPcvz4cSorK9m8eTOLFi2q83UTEhJISEigsLDQAlEKu6PTgVdz9fzgVrUJmnsT1UK5mpadYNgTcOMUNUEg4xc4vhtWvg7efqqLLWSElK0Rwgxm1y5bt26dccA/JSWF7OxsKisrjYnH3d2dnj17EhUVxTvvvGNWMEVFRURERFBUVERJSQl+fn68//77HDx40Liw8/XXX+e2224z67rXIrXLHICmQV4G+Ieq5HNiPzRrD04u1/5c2XnVAtr+fXWNNBf3C+M2sWrmmhAOymoFMi917tw50tPTaySezMxMysvLqaioqMul64UkGQdTVgyf/hU6RkD0P0z7jFZ5YdzmO7VxGqhutk43qq601l1l3EY4nHpLMldSWlrKjh07iIyMtPSlLaaqu2zt2rXs3r3b1uGI+nQoRU1bbt7xQsHNyuoyNddzYh+kJ8CejVBZro61CoawW2TcRjgUiyWZL774gsjISIKDgy0WnD2RloyD++1DVQngjjdNTzSgdvHccWHc5nyROubtp6Y/dxsu4zai0bNYktHr9eh0OuMgf1RUFJGRkURGRhISEmIci2moJMk4uLxMyNupdt+E6oKbpio7D9lr1RTowiPqmIuHWtjZc5yM24hGy2JJZv78+aSmppKSksKuXbtqDPJ7eHgQFhZWI/GEhoZecx2LvZDuMnGZU7nw479h+DS1PsYcWiUcTFaTBGqM29ykutLMvZ4Qds4qYzLnzp0jLS2N5ORk4yD/zp07KS8vNyYeNzc34+yyhQsX1v43qCfSkhFGBYdg4/swYhp4Nqv9dWTcRjiAehv4P3/+POnp6TUSz44dO2R2mWj41r2jthYIHly7z/9ZoKoHZCRWj9s0aaWmP4cMN28MSAg7Y7Htly8tK3MpNzc3+vbtS9++1YvcysrKMBgMZoQrhJ0pK1bdZ1cruGkKr+Zw470QeTvsWqvqpBUeUZMNtn4BoaPUAs+qRaNCNEImDfxfWlamanV/fZSVsTZpyYir0irVQk69ExzJgIKDKjHoalnRorJCVR9I+w6O7lTH9M7Q5Wa13qaF+dtjCGErFmvJ2LqsjLVIWRlxXTo9VE2e3L0BctOgazS4mL8hH6CSVaeb1OPoLrW4c38
S7FqjHh0iIHy82hOngc/aFKKKXZWVsQVpyQiTaJoquOnVXLVIdq2B4KF13+ysME91o2WthvLz6liLQJVsOg+UzdSE3ZKyMiaSJCPMtn8L/PIKjJ5x7YKb5igpUgs7DT9C8Wl1zKuFWmvTfaQs7hR2R8rKmEiSjKiVvExoE6K6tfL3qYKbzq51v25FGWSvV11pp3LUMRcPte9Nz1ho4lf3ewhhATZNMg2BLMYUFlFWDJ/GQcdI0wtumkKrhEOpkP5tzcWdnQeorjS/zpa7lxC1YNUkc+bMGRYuXIjBYKCkpISAgABGjhzJyJEjG8Sg/8WkJSPqLCdNdW017wBlJWr8xtWCa2Dy96qWzZ5NKvkAtO0J4beq5Fbb2W5C1IHVkkxGRgbR0dHk5+dz8Ud1Oh3BwcG8/fbbjBgxwvyIbUSSjLCoTR+ogpt3vmX5xZZF+Wp76MxfVQsKVDdd2K3QZbBluuuEMJGp351m/wk0bdo0jh8/zr333ktSUhIHDhxg/fr1PP744xw6dIgxY8Y0iFllQlhF0EC1lqYqwVRacPJLEz/o/xeY8gH0e0C1nk7lqsoEn8ZB8nIoOWO5+wlhAWa3ZLy9vYmKimL9+vWXvZeXl8d9993H2rVrWbt2LYMGDbJYoNYiLRlhNady4ceXIHoa+IdY/voV5bD3NzVuc2K/OubsBt2iVemaulQrEOI6rNaScXd3p1+/fld8z9/fn2+//RZ/f39mz55t7qWFaFw0DXzagk8b61zfyVnVVbt9PsS+BB0i1VqbHT/BZ3+DxNfgaJZ17i2Eicxe6TV48GD27dt31fe9vLyYMGECH330UZ0CszZZ8S+srnkH9eUPKuGsXQDtekLXoZa9j04H7cPU4+RBtd1A9nrY97t6tOkGYeMhsI9UgBb17rotmSVLlmAwGKisVLNann/+eX766Se2b99+1c+4udWy7EY9io2NJT4+Hh8fH1uHIhxBeQmcOQrnTlv3Pi0CYOhUuDceIiaCq5dqzSS+Cl88plo5ZSXWjUGIi5i8M6a7uzu9evUiKiqKAwcOsGXLFubOncuUKVNqTFs+d+4c4eHh9OrVixUrVlj9F6grGZMR9aZGwc0dahylxxjrti7KimHnatW6KTqujrl4qAkK3YZD62CpkyZqxWJTmBcuXEhqairJyclkZGRQVlZW/WGdjoCAAGJjYwkMDKSgoIDPP/8cV1dXVq9ejb+//Q88SpIRNrFhEeSkwB1v177gpjkqK1QxzvTv4diu6uPNOqiJAsFDwNPX+nGIRsMq62RKS0sxGAwkJycbNykzGAyUlpaqi134iygiIoI+ffrw7rvv1jL8+iNJRtiEpqkaZZ7NVALIWg1dh9VPQcxTOap1k70Wii+MSeqdIKC3Sjgdo2TsRlxXvZWVKS8vx2AwkJKSYkw+BoOB8+fPS4FMIUxhjYKbpqgoh0PJsHOV+llVTcCzmZqc0DUamrWrv3hEg2LT2mUVFRVkZGTQq1cvS1/a4iTJCLtwNAtad71QcHOP6sZyrscJNH8WQPY6lXAKj1QfbxOitoru3F+2ixY1SIFME0mSEXalquBmhwgYPq3+769pKuFlrYI9v6lZcQAu7mp/m5Dh1clQODSL7Yx5JXPnzuV///sfBQUFtG/fnrCwMHr37k1UVBTh4eG4utp/DSVZJyPskosHjHhSbY4GFwpuVoKrZ/3cX6dT1Qn8Q2DAw6qiQNZqtV101ir18G2nxm66DlVda0Jcg9ktmSVLlvDggw/i7e1N27Ztyc7OVhe68JeNs7Mz3bt3p0+fPsTHx1s+YguTloywa5s+gP1/wF1v27a76tRhlWB2ra3eVE2nrzlZQHbxdChW6y7r27cvBw4cwGAw0Lp1a/R6Pffffz8dO3Zk8eLFHD58GL1ej6ZpMvAvRF0d3aVaEeHj1euKctt+mVeUq6nXWavh4LbqAqAevtB1iEo4zTrYLj5Rb6xWu2zPnj3ceuuttG7d2nisU6dOvPTSSxgMBka
OHMnEiRNJTk4299JCiEu16VqdYE7lwGf/B3k7bRePk7OaATfqWZjyoaoG7dtetW7SvoUvpsLXT8POlVBabLs4hd0w+0+i8vJyWrVqZXyt1+uNCzR9fX1ZunQp3bp145ZbbiE8PNxykQoh1P4x9lJd2dNXJcCwW9UCz6zVsGejen5sl+rq6zxAtW78u8tkAQdldkvG39+f48ePG183bdqUgoIC4+sWLVowevToBrEQU4gGpVkHGPei+nLXNFjzFmStsXVUKnm06QZDHoX7l6jaaf7dVUXoXWvgu5nw+aOQ8pWaKi0citktmT59+pCZmWl83bVrV1JSUmqc4+/vz3fffVf36IQQV1ZeomqR2dsmZS7uquXSLRpOH1ZJcNcatfbmj09gy1K1ZXS34RAQBU4uto5YWJnZLZkxY8awdetWY2tmzJgxbNmyhY0bNwJqIeaqVatwd3e3bKRCiGouHnDLy2pzMoDDBlWXzJI7cdaVbzu4aYrayXPMv6DTTarVc3Cbqgr98cOweTEUHLJ1pMKKarUY8+zZs+h0Ory8vDh9+jTdunXjzJkzDBkyhH379rF7926mTJnCkiVLrBCyZcnsMtEobIyHQylwx1v1U3Czts6dht3rVWWBUznVx1sFQ0i0WvDp5mW7+ITJ6nXF//bt25kyZQoGgwGAYcOG8eWXX9KiRYu6XtrqJMmIRkHTVLFLT1/Vmtm5UnVZ2Wt3lKbB8d1q7c3ujarSAYCzK9wwQCUc/1CZLGDHrLri/1K9evUiPT2d3Nxc3N3dadmypSUuK4QwlU5XXar/UDJseE9VDajPgpvm0OnUXjatg6H/g2oHz6xVcCRDVYfOXgutuqiJDtKyadActnZZVVmZtWvXsnv3bluHI4RlHctWX9I6nWoxNOto391oVQrz1GSBnSvV2psOEWo8R7YesDtW7S47c+YMCxcuxGAwUFJSQkBAACNHjmTkyJE1dslsCKS7TDRqti64WVtnjsJXT6nZcz3HwsBHbB2RuITVussyMjKIjo4mPz+fi/PTW2+9RXBwMG+//TYjRoww97JCCGtw8YCRT1UXsiwrhspK+++CatoGRj0D3z8Phh/VTLUeY2wdlagFs5sd06ZN4/jvA9NHAAAb4UlEQVTx49x7770kJSVx4MAB1q9fz+OPP86hQ4cYM2YM77zzjjViFULURrueqlIAwB+fwrInGkbJF//uaoEnqOoBOWm2jUfUitndZd7e3kRFRbF+/frL3svLy+O+++5j7dq1rF27lkGDBlksUGuR7jLhUI5lQ17mRQU3y+x3BlqVpE8g9Su13cGE16QAp52wWoFMd3d3+vXrd8X3/P39+fbbb/H392f27NnmXloIYW2tg2sW3Fz6f2pGlz27cbJayFl6Dn76DxTbWZUDcU1mJ5nBgwezb9++q77v5eXFhAkT2Lx5c50CE0JYmx5aBKjxDnum00P036HlDXDmGCS+plpgokG4bpJZsmQJBoOByspKAJ5//nl++ukntm/fftXPuLk1gKmSQji6Zu1g7PPVBTdXv6EqKdsjF3cYPVOt/cnLUOuAHHP1RYNz3STz4IMPEh4eTpMmTejXrx/vv/8+Q4YMYfjw4Xz00UfG5FPl3LlzfPvttwwfPtxqQQshLKy8RFVILimydSRX590CRs9QVQGyVkPaN7aOSJjgugP/CxcuJDU1leTkZDIyMox7x4DacjkgIIDY2FgCAwMpKCjg888/x9XVldWrV+Pvbyf7XlyDDPwLcYGmgVapFj4eNkD+XlWA094WQu77XXWZoYNRT6vxGlHvrLIYs7S0FIPBQHJyMsnJyaSkpGAwGCgtLVUXu1BnKCIigj59+jSIPWUkyQhxBZveVwU3J71pn5UCkperbQOc3WD8bPC7wdYROZx6K5BZXl6OwWAgJSXFmHwMBgPnz5+nosKOyo5fhSQZIa5A09Rqew8fVXAzMxFCRtjPdOeqTduy14FXC5g4V43XiHpTbwUynZ2diYiIICI
igoceeghQe8pkZNj5tEghxNXpdCrBgGrRbIwH75b2U3BTp1MLNc8cg6M74ZfZcMt/7LPV5eBqXWissLCQtWvXsmrVKrKzs2u85+TkRK9eveocnBDCDgT2US2FgD7q9bFsKCuxbUygWlWjnoEmrVQR0LVvqTElYVdqlWReffVV/P39GT58ODExMYSEhODv78/MmTMpKrLj2SlCiNqpquhcVgw/vgzr7KR0lIePqtLs4gF7N8PWL2wdkbiE2Unm448/ZsaMGXh6ejJlyhT+/ve/c+edd6LX65k9ezYRERHs37+/TkEVFxdf1joy1bZt25g6dSrjx49nxYoVdYpDCHEJFw/Veuh9l3pdVgznz9o2puYdYeQ/1aLN5GWQfXnJK2FDmpkiIiK0tm3baidOnKhxvKKiQvvggw80b29vrXPnztrZs2fNvbRWWFio3XrrrVqTJk20hx56yHj8yy+/1AIDA7XOnTtrH3744TWvUV5ermmapp08eVK77777rnvPqKgos+MUQlywYZGmffSgpp0/Z+tING17gqYtvFXTFt2uaXk7bR1No2fqd6fZLZmsrCxuu+22y7ZW1uv1PPTQQ3z77bfs37+fefPmmZ3w9Ho9U6dOZf78+cZjRUVFTJ8+nU2bNrFp0yZmzJhBfn4+R48eZciQIcbHypUrATUeBDB37lyeeOIJs2MQQpih2zAIuxVcPdRrW5Z76TEWQkepGH6ZDUXHbReLMDI7yXh6euLu7n7V96Ojo4mJieGrr74yOxhvb2+io6Nxdq6e9JaYmMjgwYNp164dbdq0YdiwYaxevZo2bdqwbt0646NqD5vy8nL++c9/EhsbS2RkpNkxCCHM4BcEYbeo5wU5aoM0WxXc1OlgwMPQPgyKC1UxzdJztolFGJmdZHr06MHq1deubxQWFlbncZkqOTk5BAQEGF+3b9+evLy8q54/Y8YMfv/9d+Lj44mPj7/iOfHx8fTu3ZvevXuTn59vkTiFcHh6Pfh1tm3BTSdnNT7j2x4KDsGq+Wqdj7AZs5PMAw88QHp6Oq+99tpVzzly5EidgrpYaWlpjS2d9Xq9sUvsSubMmcOmTZtYsmQJcXFxVzwnLi6Obdu2sW3bNvz8/CwWqxAOzbedmulVVXBz1RuQubL+43DzhjEzwa0JHNwGv39U/zEIo1olmREjRjBjxgzuvvtu0tJq7la3Zs0ali1bRt++llm05e/vz+HDh42vc3Nz6dCh7psWJSQkEBcXR2FhYZ2vJYS4RPl5KD4NZTbqrvLxV3XN9M6w/XtVsUDYRK3KypSUlHD33Xfz3XffodPpaNq0KYGBgZw6dYqcnBz0ej1r1qyp9c6YS5YsYdOmTXzwwQccO3aMyMhIUlNTqayspH///hgMBry8LLNHuZSVEcJKNA3Q1NTi3O2Qv0dNEqjPgptZq2HtAnXPsS9Ae1kkbilW2xlz586duLm58c033/D9998zduxYNE0jPT2d3Nxc+vbtS2JiYq0STFFREUFBQTz99NMsX76coKAgMjMzmTVrFv369WPAgAHMmzfPYglGCGFFOp1KMAAHtsDOVVBZXr8xdItWO4FWVqjKzacPX/8zwqLMbsno9XruvfdePv744xrHz549i5ubGy4udlJAz0TSkhGinpScAfem6gs/42cIGan2hrG2qgRzYAv4tIUJr4F7E+vft5GzWkumWbNmVxwT8fb2blAJRsZkhKhn7k3Vz0MpsOkDyE279vmWoneC4f+AFoFQeAQS50BFPbeoHJjZSWbQoEFkZWVZI5Z6FRsbS3x8PD4+PrYORQjHEtgHbn+9uqLzsWxVnsaaXDzUjDPPZnDEABsXyfbN9cTsJDNz5kx+/PFHtm7dao14hBCOwC9I/SwrgZ9ehnULrX9Pbz+1fbOTK+xcqWadCaszO8msWLGCYcOGMXz4cD76qOHOP5fuMiHsgIs7jHoW+lwouFlq5YKbrbrAsMfV881L4ID8sWxttRr41+l0aJqGTqejVat
WjB07lhtvvJHevXvTs2fPGmVh7J0M/AthRzbGw/4/4K7/VtdDs4ZtX8LWz1WSu+1VNV4jzGK17ZfXrVtHSkqK8ZGdnU1lZSU6nQ4AV1dXevbsSe/evVm4sB6awHUkSUYIO5K/D/IyoFesel1RZp0tnzUNVr8BuzeobrSJc1WlAmEyqyWZS507d4709PQaiSczM5Py8nIqKuy/ZpAkGSHsVMEhSHgBRkyHtj0sf/3yUvj+OTi2C1p3hVv+Dc6yfbOpTP3urHO/lqenJ/369aNfv37GY6WlpezYsaOul7aqhIQEEhISZExGCHvl5KLGUJrVvYzUFTm7qvGgr/6pEs3a/8LwaWoRqbAYkwf+X3zxRfz9/XF1daVLly785z//oazsyntHuLq62n2ZfZnCLISd8/FXs8E8fFT31sp5kPmrZe/h6aumNru4w56NamdNYVEmJZnFixfz73//m2PHjlFeXs7evXt54YUXuP32260dnxBCqIKb58+qKc+W1iIQRjypSuBs/Rz2bLL8PRyYSUnmvffew9XVlaVLl5Kbm8uqVauIjIzkhx9+YPny5daOUQjh6FzcYezz0Gucep27HVJWWG7lfkBv6PeAer7mbbVAVFiESUlm79693H777dx99920bduWYcOGsXLlSpo1a3ZZDTMhhLCKiwtuHtwGWWtAs+Dkol6xEDICKkrh51fgrGxoaAkmJZlTp04RFBRU45ivry9jx44lJSXFKoFZmyzGFKIBG/CgKnTp7KYKYG5PULPF6kKng0F/hXY91V44P82yfrkbB2DywP/Fu1NW6dixIydPnrRoQPVFBv6FaOCqKinnpMJvH0Juet2v6eQMI59S1ZpPHlC7e8r2zXVidlmZizk7O191hpkQQtSLgN5w+3xVeBPgaJYqT1Nb7k0ubN/srbYH+OMTy8TpoExOMi+//DI9e/bk4YcfJj4+nrS0NMrLpVy2EMIO+N2gfpaVqG6u9XWsNuLbDmKeVtsEpH2rCmqKWjFpxf+IESNITU2loKBAfeiSxUr/+Mc/CA8PJyIigpCQkCt2rdkrWfEvRCNzNEvtXePbVrVoKsuq97IxV+avKmHpnSD2JetUHmigrFJWZt++fWzbts34SE1NNQ6cVyUeNzc3evToQUREBIsWLapl+PVHkowQjdiGRbA/Ce5eWPuCm5sXQ/r34NYEJs5Ri0RF/dUuy87OrpF40tLSOHv2LDqdzq5rl1WVlVm7di27d++2dThCCGs4sV8V3Ox5YX1Nean5Wz5XVsAvs9W0ad92alabm7flY21g6i3JXErTNHbu3ElycjJTpkyx5KWtQloyQjiIkwch4Xm1ur9dT/M+W1oM3zwDBQdVhYDAPmrjNb/O4NXCIeud1VuBzEvpdDq6d+9O9+7dLX1pIYSoPWdXaNMNmnc0/7OuF7Zv/vppNbX55IHq9zx8qhOOX2f13Ku5QyaeK2k4u4sJIURd+PirqsugCm6umqcG8kNHmfb5Jq3grgWqpE3+HsjfC8f3QHEhHEpWjyoevtUJx68ztLqQeByQJBkhhOMpPw+l58yvEuDmDZ37qweoZFV0rDrh5O9Vj+LTlycez2YXtXYuavE0cpJkhBCOx8UdxjwHXBiSzk1Xe8qET1Cr/k2l00HTNurReYA6pmlw5mh1wqlq9Zw7pSYPHLxoHKNG4qka42lciUeSjBDCMel0wIVxk0Mp6ss/bLxlruvjrx5BA9WxGolnDxzfCyeulXgu6mbz66yONVCSZIQQov9fIGqSmhxQUQ47flRjNZbajvmKiacSzhyr2c1mTDxb1aOKV3O1RfTIpxrchAKHTTKy/bIQooaqtS+5abD5f6pIZlU9NGvQ6asTT5dB6phWCYVHa3az5e+FPwtUQmpgCQassE6moZF1MkKIy5zYDy07qedHs9S0Z1dP28SiVUJhHpz/E1oH2yaGKzD1u7PhFBkTQoj6UpVgykrgZwsU3KwLnV5VGrCjBGMOh+0uE0KI66qahVb
VlVZ6To3ZeNSy4KYDkpaMEEJcS+tgVdEZIOkTWPZE3farcTDSkhFCCFOFxkDzDtUVncvPW24GWiMlLRkhhDBVi0DoMUY9P3kAPnlElZkRVyVJRgghasPFQ9U+axGoXjv2RN2rkiQjhBC10bQ1xDylJgFoGvw6F3b8ZOuo7I4kGSGEqKvy81BRqjY4EzXIwL8QQtSVizuMnkmNgpt5OyFyIji52DQ0W3PYlkxCQgJxcXFSVkYIYRk6nVo4CZCTCns2yjgNUlZGysoIIazj/J/g5qUWbxp+gNDR4NJ4pjtLWRkhhLAlNy/1Mzcdfl8Chx1zqrMkGSGEsKaAKLjjreqKzkcyVCvHQUiSEUIIa2sRoH6WlcAvr8KGd20bTz2S2WVCCFFfXNxh3POXFNwsAw8f28ZlRdKSEUKI+tSqi9qoDCDpY/jyCZVsGilpyQghhK30GAPNA6o3RCs736hmoIG0ZIQQwnaad4Qeo9Xzkwfg04cbXcFNSTJCCGEPXDygXS9oGaheN5IljJJkhBDCHjRtDSP/Ce4XCm4mvqYWcTZwkmSEEMLelJeCVmkshdaQycC/EELYGxc3GPVs9eucVFVwM2pSgyu4KS0ZIYSwRzqdegAcNsDezQ1ynMYuk0xxcTHZ2dm1+mxWVhZTp05l1KhRZGZmWjgyIYSwgZvug4lzwNlVFdxM/VpVD2gA7CrJnDlzhvHjx9O6dWvmzJljPL5s2TI6depEUFAQixcvvuY1unXrxoIFC3jggQdITU21dshCCFE/qtbS5KarRZxHdtg2HhPZ1ZiMXq9n6tSpjBs3jqSkJACKioqYPn06SUlJODk5ER4eTmxsLBUVFdx1113Gz86cOZMRI0YAMH36dLZu3cqyZcts8nsIIYTVBETBnQugeQf1+kiGqo1WVarGzthVS8bb25vo6GicnatzX2JiIoMHD6Zdu3a0adOGYcOGsXr1atq0acO6deuMj6oEAzBv3jzef/99XnvtNVv8GkIIYV1VCabsPCS+Cuvtt+CmXbVkriQnJ4eAgADj6/bt25OXl3fV83/44Qd+/PFHTpw4wfTp0694Tnx8PPHx8QDk5+dbNmAhhKgvLm4w7gVwvbB3zfk/VcFNT1/bxnURu08ypaWl6PXVDS69Xo+Tk9NVzx83bhzjxo275jXj4uKIi4sD1O5uQgjRYPkFVT9P+gT2/w73vFs9hmNjdtVddiX+/v4cPnzY+Do3N5cOHTrYMCIhhLBTvcZCn3suKrhp+xlodp9kYmJiSExM5Pjx4xw9epTNmzczcuTIOl83ISGBuLg4CgsLLRClEELYgWYdIDRGPT+xHz55WM1GsyG76i4rKioiIiKCoqIiSkpKWLduHe+//z6zZs2iX79+gBrU9/LyqvO9YmNjiY2Nle4yIUTj5OYFHSKgZSf1WtOqF3fWI52mNcAlpBbUu3dvtm3bZuswhBDCeqoKbrbtAb2uPWZtKlO/O+2qJVOfEhISSEhIkO4yIUTjV16qfkpLpv5JS0YI4TCquswOpcLxbOh9Z60vZep3p90P/AshhLCQqpbMkR1qK4F64LDdZUII4bBumlJvFZ0dNsnImIwQwqHV0/iMw3aXxcbGEh8fj4+Pj61DEUKIRsthk4wQQgjrkyQjhBDCamRMRsZkhBDCahy2JSNjMkIIYX0Om2SEEEJYnyQZIYQQVuPwZWVatmxJYGBgrT5bWFjYqLvb7P33s2V89XFva93DUte1xHVqe438/Hz8/PzqdG9Re4WFhZw6dYoTJ05c/2RN1Nojjzxi6xCsyt5/P1vGVx/3ttY9LHVdS1yntteIioqq871F7Znz7026y+ogNjbW1iFYlb3/fraMrz7uba17WOq6lriOvf83Jq7MnH9vDt9dJoRoeKR6esMhLRkhRIMTFxdn6xCEiaQlI4QQwmqkJSOEEMJqJMkIIYSwGkkyQgghrMZhC2QKIRqHsrIyNmzYQHl5OTExMbYOR1xCWjJCCLtVXFxMdnb2Nc8
xGAwkJyfz888/11NUwhySZIQQdufMmTOMHz+e1q1bM2fOHOPxZcuW0alTJ4KCgli8eDEAkZGR3HHHHbYKVVyHdJcJIeyOXq9n6tSpjBs3jqSkJACKioqYPn06SUlJODk5ER4eTmxsrNQws3PSkhFC2B1vb2+io6Nxdq7+OzgxMZHBgwfTrl072rRpw7Bhw1i9erUNoxSmkCQjhGgQcnJyCAgIML5u3749eXl5pKenM3PmTNasWcP8+fNtGKG4EukuE0I0CKWlpej11X8X6/V6nJycCAsLY+nSpTaMTFyLtGSEEA2Cv78/hw8fNr7Ozc2lQ4cONoxImEKSjBCiQYiJiSExMZHjx49z9OhRNm/ezMiRI20dlrgO6S4TQtidoqIiIiIiKCoqoqSkhHXr1vH+++8za9Ys+vXrB8C8efPw8vKycaTieqQKsxBCCKuR7jIhhBBWI0lGCCGE1UiSEUIIYTWSZIQQQliNJBkhhBBWI0lGCCGE1UiSEUIIYTWSZIQQQliNJBkhhBBWI0lGiEv4+fmh0+lMfrz33nu2Dtlk8+fPR6fT8dlnn9k6FOEgpHaZEBf5888/efTRR2scKy8vZ9asWbi6uvLss89e9plRo0bVV3h1lpKSAkBUVJSNIxGOQmqXCXEd6enphIeHExUVxbZt22wdTp2EhISQm5tLYWFhjb1ZhLAW+a9MiOuoSixX++t/1apV6HQ6nn/++RrH//jjD2OX2oEDB2q8N3nyZPR6Pbt27apx/KuvvmL06NG0bNkSV1dXunTpwiuvvEJFRcUV723q+U8//TQ6nY6srCzOnj2Lk5OTMbZPP/3UeN7GjRuZMGECnTt3xt3dnVatWtG3b19mzJhh0j8rIS4l3WVCXEdycjIAvXv3vuL7zZs3B1R5+ou99tprxucFBQUEBgYCcOTIEZYvX05sbCxdu3YFoKKignvvvZcvvviCoKAgJk2ahJubGz///DMzZ85k165dfPTRR8brmXt+VFQU999/Px999BH9+/dnxIgRxvcGDx4MwCuvvMLMmTPp2LEjMTExtGzZkmPHjrFt2zZ++eUXXnnlldr+IxSOTBNCXFPfvn01QEtOTr7i+/v27dMA7aGHHjIey87O1vR6vTZ+/HgN0FatWmV8b8aMGRqgbdy40Xjs0Ucf1QDtmWee0crKyozHS0tLtf79+2uAlpGRUevzNU3TFi1apAHaokWLLvsdjh49qjk5OWkDBw7Uzp8/f9n7+fn51/pHJMRVSXeZENdQXl7O9u3bcXV1pUePHlc8p1mzZkDNlszcuXNp0qQJTz31FKBaMgAlJSXEx8dz4403MnDgQEB1qy1cuJBbb72V2bNn4+xc3cHg4uLC/fffbzyvNudXqRr0j4yMvOx3yMrKoqKiguDgYFxdXS97v2XLllf9ZyTEtUh3mRDXkJGRQUlJCVFRUVf88gXw8fFBr9cbk8yxY8f4+OOPeeKJJ4x70FclmU8++YQTJ07w7rvvGj+/YMECNE3D09OTF1988bLr79ixAwDtwhwdc8+vkpKSgouLCz179rzsM6Ghofj4+LB48WLy8/OZPHkyI0eONCZQIWrNtg0pIezbBx98oAFaXFzcNc9r1qyZNnDgQE3TNO3ZZ5/VXF1dtcOHD2tnzpzRAG327NmapmlaaGio1rlzZ62iosL4WT8/Pw247mPlypW1Ol/TNK2srExzd3fXwsLCrvo7GAwG7fbbb9c8PT01QHNyctJGjRp11W5CIUwhLRkhruF6g/5VmjVrRlFREWfPnuW9997jnnvuoW3btgA4OTlRUFDAr7/+SkZGBv/973+N04dLSkrIz8/n5ptvZv369deNx9zzq2RmZlJSUnLFrrIqPXr0YPny5ZSWlrJhwwbi4+NZvnw5W7du5fDhw7i5uZl8PyGqyJiMENdQlWSut3ixKsnEx8dz+vRpnnzySeN7TZs2paCggDfffJMWLVrwl7/8xfiedqFL68SJEybFY+75VdLS0gCIiIi
47rmurq4MHz6cZcuWMXDgQE6ePMmxY8fMup8QVSTJCHEVpgz6V2nWrBmnTp3izTffZMyYMYSGhhrf8/Hx4Y8//uCXX37hb3/7G56ensb3PDw86NWrF5mZmXz99ddXvPamTZuM617MPb/KyZMnAZXwLpWamsrevXsvO75nzx527NhBx44dad++/TV/fyGuRrrLhLgKUwb9q1QlmVOnTvHJJ5/UeM/X15e0tDTc3d157LHHLvvs3LlzGTt2LBMnTmT48OH06tWLyspKDh8+THJyMmVlZRw6dKjW50N1S2zmzJns2LEDLy8vQkNDmTRpEm+//TYfffQRffv2JTQ0lFatWrF//36+//57ABYvXizVAUTt2XpQSAh79eGHH5o06K9pmhYXF6cBWp8+fS57b/DgwRqgPfLII1f9/JYtW7SJEydqrVu31pydnbUWLVpoPXr00OLi4mqssant+ZqmaQsWLNCCg4M1Nzc3DdBmzJihaZqmffPNN9q9996rBQcHa02aNNFcXFy0gIAA7aGHHtKys7Ov+7sLcS1Su0wIIYTVSBtYCCGE1UiSEUIIYTWSZIQQQliNJBkhhBBWI0lGCCGE1UiSEUIIYTWSZIQQQliNJBkhhBBWI0lGCCGE1UiSEUIIYTX/H0ud0ZRl1TTuAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import matplotlib.cm as cm\n", "cmap = cm.get_cmap('rainbow_r',6)\n", "\n", "fig = plt.figure(figsize=(6, 4),facecolor='white')\n", "ax = fig.add_subplot(1, 1, 1)\n", "plotPowerlaw(list(data_dict.values()), ax,cmap(1), \n", " '$Tweets$')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# 5. 清洗tweets文本" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2019-04-01T03:46:46.498846Z", "start_time": "2019-04-01T03:46:46.496236Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "tweet = '''RT @AnonKitsu: ALERT!!!!!!!!!!COPS ARE KETTLING PROTESTERS IN PARK W HELICOPTERS AND PADDYWAGONS!!!! \n", " #OCCUPYWALLSTREET #OWS #OCCUPYNY PLEASE @chengjun @mili http://computational-communication.com \n", " http://ccc.nju.edu.cn RT !!HELP!!!!'''" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T08:02:19.500334Z", "start_time": "2019-06-08T08:02:19.259536Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "import re\n", "\n", "import twitter_text\n", "# https://github.com/dryan/twitter-text-py/issues/21\n", "#Macintosh HD ▸ 用户 ▸ datalab ▸ 应用程序 ▸ anaconda ▸ lib ▸ python3.5 ▸ site-packages" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# 安装twitter_text\n", "\n", "[twitter-text-py](https://github.com/dryan/twitter-text-py/issues/21) could not be used for python 3\n", "\n", "\n", "> ### pip install twitter-text\n", "\n", "Glyph debug the problem, and make [a new repo of twitter-text-py3](https://github.com/glyph/twitter-text-py).\n", "\n", "> ## pip install twitter-text\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# 无法正常安装的同学\n", "## 可以在spyder中打开terminal安装\n", "\n", "pip install twitter-text" ] }, 
{ "cell_type": "code", "execution_count": 35, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T08:04:37.675542Z", "start_time": "2019-06-08T08:04:37.668241Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "'AnonKitsu: @who'" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import re\n", "\n", "tweet = '''RT @AnonKitsu: @who ALERT!!!!!!!!!!COPS ARE KETTLING PROTESTERS IN PARK W HELICOPTERS AND PADDYWAGONS!!!! \n", " #OCCUPYWALLSTREET #OWS #OCCUPYNY PLEASE @chengjun @mili http://computational-communication.com \n", " http://ccc.nju.edu.cn RT !!HELP!!!!'''\n", "\n", "# Match 'RT' or 'via' followed by one or more @mentions\n", "rt_patterns = re.compile(r\"(RT|via)((?:\\b\\W*@\\w+)+)\",\n", "                         re.IGNORECASE)\n", "# Without .split(':')[0] the result still contains the second mention\n", "rt_user_name = rt_patterns.findall(tweet)[0][1].strip(' @')#.split(':')[0]\n", "rt_user_name" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "ExecuteTime": { "end_time": "2019-04-01T03:59:45.727956Z", "start_time": "2019-04-01T03:59:45.720369Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "'AnonKitsu'" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import re\n", "\n", "tweet = '''RT @AnonKitsu: @who ALERT!!!!!!!!!!COPS ARE KETTLING PROTESTERS IN PARK W HELICOPTERS AND PADDYWAGONS!!!! 
\n", " #OCCUPYWALLSTREET #OWS #OCCUPYNY PLEASE @chengjun @mili http://computational-communication.com \n", " http://ccc.nju.edu.cn RT !!HELP!!!!'''\n", "\n", "rt_patterns = re.compile(r\"(RT|via)((?:\\b\\W*@\\w+)+)\", \\\n", " re.IGNORECASE)\n", "rt_user_name = rt_patterns.findall(tweet)[0][1].strip(' @').split(':')[0]\n", "rt_user_name" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T08:05:00.196880Z", "start_time": "2019-06-08T08:05:00.188010Z" }, "scrolled": true, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[]\n", "None\n" ] } ], "source": [ "import re\n", "\n", "tweet = '''@chengjun:@who ALERT!!!!!!!!!!COPS ARE KETTLING PROTESTERS IN PARK W HELICOPTERS AND PADDYWAGONS!!!! \n", " #OCCUPYWALLSTREET #OWS #OCCUPYNY PLEASE @chengjun @mili http://computational-communication.com \n", " http://ccc.nju.edu.cn RT !!HELP!!!!'''\n", "\n", "rt_patterns = re.compile(r\"(RT|via)((?:\\b\\W*@\\w+)+)\", re.IGNORECASE)\n", "rt_user_name = rt_patterns.findall(tweet)\n", "print(rt_user_name)\n", "\n", "if rt_user_name:\n", " print('it exits.')\n", "else:\n", " print('None')" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T08:05:27.804540Z", "start_time": "2019-06-08T08:05:27.795572Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "import re\n", "\n", "def extract_rt_user(tweet):\n", " rt_patterns = re.compile(r\"(RT|via)((?:\\b\\W*@\\w+)+)\", re.IGNORECASE)\n", " rt_user_name = rt_patterns.findall(tweet)\n", " if rt_user_name:\n", " rt_user_name = rt_user_name[0][1].strip(' @').split(':')[0]\n", " else:\n", " rt_user_name = None\n", " return rt_user_name" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T08:05:31.592897Z", "start_time": "2019-06-08T08:05:31.587624Z" }, "slideshow": { "slide_type": "subslide" } }, 
"outputs": [ { "data": { "text/plain": [ "'chengjun'" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tweet = '''RT @chengjun: ALERT!!!!!!!!!!COPS ARE KETTLING PROTESTERS IN PARK W HELICOPTERS AND PADDYWAGONS!!!! \n", " #OCCUPYWALLSTREET #OWS #OCCUPYNY PLEASE @chengjun @mili http://computational-communication.com \n", " http://ccc.nju.edu.cn RT !!HELP!!!!'''\n", "\n", "extract_rt_user(tweet) " ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T08:05:42.978825Z", "start_time": "2019-06-08T08:05:42.975151Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "None\n" ] } ], "source": [ "tweet = '''@chengjun: ALERT!!!!!!!!!!COPS ARE KETTLING PROTESTERS IN PARK W HELICOPTERS AND PADDYWAGONS!!!! \n", " #OCCUPYWALLSTREET #OWS #OCCUPYNY PLEASE @chengjun @mili http://computational-communication.com \n", " http://ccc.nju.edu.cn RT !!HELP!!!!'''\n", "\n", "print(extract_rt_user(tweet) )" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T08:06:01.060683Z", "start_time": "2019-06-08T08:06:01.032491Z" }, "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "[('RT @AnonKitsu: ALERT!!!!!!!!!!COPS ARE KETTLING PROTESTERS IN PARK W HELICOPTERS AND PADDYWAGONS!!!! #OCCUPYWALLSTREET #OWS #OCCUPYNY PLEASE RT !!HELP!!!!',\n", " 'Anonops_Cop'),\n", " ('@jamiekilstein @allisonkilkenny Interesting interview (never aired, wonder why??) by Fox with #ows protester http://t.co/Fte55Kh7',\n", " 'KittyHybrid'),\n", " (\"@Seductivpancake Right! Those guys have a victory condition: regime change. 
#ows doesn't seem to have a goal I can figure out.\",\n", " 'nerdsherpa')]" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import csv\n", "\n", "with open(\"../data/ows_tweets_sample.txt\", 'r') as f:\n", "    chunk = f.readlines()\n", "\n", "rt_network = []\n", "lines = csv.reader(chunk[1:], delimiter=',', quotechar='\"')\n", "tweet_user_data = [(i[1], i[8]) for i in lines]\n", "tweet_user_data[:3]" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T08:07:37.624179Z", "start_time": "2019-06-08T08:07:37.588574Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "[(('OccupyNCGBORO', 'angela0328'), 1),\n", " (('evlance', 'KeithOlbermann'), 1),\n", " (('Lusho0487', 'anonops'), 1)]" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from collections import defaultdict\n", "\n", "rt_network = []\n", "rt_dict = defaultdict(int)\n", "for tweet, user in tweet_user_data:\n", "    rt_user = extract_rt_user(tweet)\n", "    if rt_user:\n", "        rt_network.append((user, rt_user)) #(rt_user,' ', user, end = '\\n')\n", "        rt_dict[(user, rt_user)] += 1\n", "#rt_network[:5]\n", "list(rt_dict.items())[:3]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# Getting the cleaned tweet text\n", "\n", "The cleaned text excludes user names, URLs, and markers such as RT and @." ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T08:08:42.317807Z", "start_time": "2019-06-08T08:08:42.309193Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def extract_tweet_text(tweet, at_names, urls):\n", "    for i in at_names:\n", "        tweet = tweet.replace(i, '')\n", "    for j in urls:\n", "        tweet = tweet.replace(j, '')\n", "    marks = ['RT @', '@', '"', '#', '\\n', '\\t', ' ']\n", "    for k in marks:\n", "        tweet = tweet.replace(k, '')\n", "    return tweet" ] 
}, { "cell_type": "code", "execution_count": 43, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T08:09:07.984224Z", "start_time": "2019-06-08T08:09:07.973948Z" }, "scrolled": true, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['AnonKitsu', 'chengjun', 'mili'] ['http://computational-communication.com', 'http://ccc.nju.edu.cn'] ['OCCUPYWALLSTREET', 'OWS', 'OCCUPYNY'] AnonKitsu -------->\n" ] } ], "source": [ "import twitter_text\n", "\n", "tweet = '''RT @AnonKitsu: ALERT!!!!!!!!!!COPS ARE KETTLING PROTESTERS IN PARK W HELICOPTERS AND PADDYWAGONS!!!! \n", " #OCCUPYWALLSTREET #OWS #OCCUPYNY PLEASE @chengjun @mili http://computational-communication.com \n", " http://ccc.nju.edu.cn RT !!HELP!!!!'''\n", "\n", "ex = twitter_text.Extractor(tweet)\n", "at_names = ex.extract_mentioned_screen_names()\n", "urls = ex.extract_urls()\n", "hashtags = ex.extract_hashtags()\n", "rt_user = extract_rt_user(tweet)\n", "#tweet_text = extract_tweet_text(tweet, at_names, urls)\n", "\n", "print(at_names, urls, hashtags, rt_user,'-------->')#, tweet_text)" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T08:10:11.740636Z", "start_time": "2019-06-08T08:10:11.722855Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "import csv\n", "\n", "lines = csv.reader(chunk,delimiter=',', quotechar='\"')\n", "tweets = [i[1] for i in lines]" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "ExecuteTime": { "end_time": "2019-06-08T08:10:16.517097Z", "start_time": "2019-06-08T08:10:16.506944Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[] [] [] None\n", "['AnonKitsu'] [] ['OCCUPYWALLSTREET', 'OWS', 'OCCUPYNY'] AnonKitsu\n", "['jamiekilstein', 'allisonkilkenny'] ['http://t.co/Fte55Kh7'] ['ows'] None\n", "['Seductivpancake'] [] ['ows'] None\n", "['bembel'] 
['http://j.mp/rhHavq'] ['OccupyWallStreet', 'OWS'] bembel\n" ] } ], "source": [ "for tweet in tweets[:5]:\n", "    ex = twitter_text.Extractor(tweet)\n", "    at_names = ex.extract_mentioned_screen_names()\n", "    urls = ex.extract_urls()\n", "    hashtags = ex.extract_hashtags()\n", "    rt_user = extract_rt_user(tweet)\n", "    #tweet_text = extract_tweet_text(tweet, at_names, urls)\n", "\n", "    print(at_names, urls, hashtags, rt_user)\n", "    #print(tweet_text)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true, "slideshow": { "slide_type": "slide" } }, "source": [ "# Exercise:\n", "\n", "### Extract the retweet network between rt_user and user from the raw tweets\n", "\n", "## Format:\n", "rt_user1, user1, 3\n", "\n", "rt_user2, user3, 2\n", "\n", "rt_user2, user4, 1\n", "\n", "...\n", "\n", "Save the data in CSV format." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# Further reading" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" }, "latex_envs": { "LaTeX_envs_menu_present": true, "autoclose": false, "autocomplete": true, "bibliofile": "biblio.bib", "cite_by": "apalike", "current_citInitial": 1, "eqLabelWithNumbers": true, "eqNumInitial": 0, "hotkeys": { "equation": "Ctrl-E", "itemize": "Ctrl-I" }, "labels_anchors": false, "latex_user_defs": false, "report_style_numbering": false, "user_envs_cfg": false }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": false, "sideBar": false, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": { "height": "1260px", "left": "1835px", "top": 
"224px", "width": "512px" }, "toc_section_display": false, "toc_window_display": true } }, "nbformat": 4, "nbformat_minor": 1 }