{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "
\n", "\n", " \n", "## [mlcourse.ai](https://mlcourse.ai) – Open Machine Learning Course \n", "###
Author: Syrovatskiy Ilya, ODS Slack nickname : bokomaru\n", " \n", "##
Tutorial\n", " \n", "###
\"Epidemics on networks with NetworkX and EoN\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this tutorial, you'll tackle an established problem in graph theory: **epidemic dynamics models**.\n", "\n", "First we'll load your own data from the **VKontakte network using its API**, so we will go through some basic principles of requests and authentication. If you don't have an account in this network, I'll give you a ready-made graph of my own friends network (308 people), but with changed names and IDs, since not everyone wants to show their name and ID to the OpenDataScience community (: . I will also provide a link to this graph, built from the social network with changed info for every person. Our main instrument for graph modeling will be the **NetworkX library** in Python.\n", "\n", "Once the graph is created, we are ready to start with something interesting. \n", "We'll go over the basic building blocks of graphs (nodes, edges, etc.) and create a **pseudo-random graph** with the same depth and number of vertices. \n", "\n", "Then we are going to visualize the created graphs; there will be some obvious differences between them. \n", "\n", "The next point is the main theme of this tutorial: epidemics on networks. Here you'll learn about different models of how an epidemic spreads.\n", "\n", "After you get to know the basics, it's time to go deeper into epidemic modeling. We'll explore the **most widespread models** with code on two graphs (real and pseudo-random), and compare the results with the Python **library for epidemic modeling, EoN**, for each case. 
\n", "\n", "Once we have covered everything planned for this tutorial, we'll look at the results we got along the way and draw a conclusion.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Here is a more detailed outline of the content:\n", "\n", ">> **TABLE OF CONTENTS** :\n", "\n", "0. **First meeting with graphs and libraries**\n", "\n", " 0.1 Intro \n", " 0.2 Packages installation\n", " 0.3 Packages importing\n", " \n", "> \n", "1. **Creation of a real Graph** :\n", " \n", " 1.1 Complex long start:\n", " \n", " 1.1.1 Fast (no) start with VK API \n", " 1.1.2 Loading your social net friends\n", " 1.1.3 Forming correct graph\n", " 1.1.4 (optional) Replacing real people's names and ID with random generated\n", "\n", " 1.2 Lazy fast start:\n", " \n", " 1.2.1 Uploading data for building graph\n", " 1.2.2 Building Graph with NetworkX\n", " 1.2.3 Saving created Graph \n", " \n", "> \n", "2. **Inspection of the Graph** \n", "\n", " 2.1 Loading graph from source\n", " \n", " 2.2 Creation of a pseudo-random Graph \n", "\n", " 2.3 Graph Visualization\n", "> \n", "3. **Introduction in Epidemics on Networks**\n", " \n", " 3.1 Basics of epidemic modeling\n", " \n", " 3.2 Connected components\n", "\n", "> \n", "4. **SI Model** \n", "\n", " 4.1. Statement of the model \n", " 4.2. Implementation in Real Graph\n", " 4.3. Implementation in Pseudo-random Graph\n", " 4.4. Compare with EoN modeling\n", " \n", "> \n", "5. **SIR Model** \n", " \n", " 5.1. Statement of the model \n", " 5.2. Implementation in Real Graph\n", " 5.3. Implementation in Pseudo-random Graph\n", " 5.4. Compare with EoN modeling \n", "\n", "> \n", "6. **SIS Model**\n", " \n", " 6.1. Statement of the model \n", " 6.2. Implementation in Real Graph\n", " 6.3. Implementation in Pseudo-random Graph\n", " 6.4. Compare with EoN modeling \n", "\n", "> \n", "7. 
**Conclusion**\n", "\n", "> \n", " \n", "> \n", "\n", "\n", "\n", "\n", "\n", "P.S. The materials are based on:\n", "> Courses about networks at HSE (Higher School of Economics, National Research University)\n", "\n", "> A couple of useful ideas about EoN that I got from the official EoN documentation https://media.readthedocs.org/pdf/epidemicsonnetworks/latest/epidemicsonnetworks.pdf\n", "\n", "> One example for SIR theory taken from:\n", "https://scipython.com/book/chapter-8-scipy/additional-examples/the-sir-epidemic-model/\n", "\n", "> One example for SIS theory taken from:\n", "https://chengjunwang.com/post/en/2013-03-14-learn-basic-epidemic-models-with-python/\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">>> ## 0. First meeting with graphs and libraries" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">> ### 0.1 Intro" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", " Since we live in the 21st century, almost everyone has accounts in different social networks, where they can be closer to their friends wherever they are. \n", " As this plays a significant part in our lives, analysis in this sphere is an amazing opportunity to learn something interesting about ourselves and our friendships.\n", "\n", "\n", "\n", "The nice thing about graphs is that the concepts and terminology are generally intuitive. Nevertheless, here's some basic lingo:\n", "\n", "Graphs are structures that map relations between objects. The objects are referred to as nodes and the connections between them as edges in this tutorial. Note that edges and nodes are commonly referred to by several names that generally mean exactly the same thing:\n", "\n", "node == vertex == point\n", "edge == arc == link\n", "\n", " To work with graphs in our analysis, it's a good idea to use some libraries. \n", "\n", "**Firstly**, the NetworkX library. NetworkX is one of the most popular Python packages for manipulating and analyzing graphs. 
Several packages offer the same basic level of graph manipulation, but NetworkX is probably the most convenient choice.\n", "\n", "**Secondly**, the EoN library. EoN (Epidemics on Networks) is a Python module that provides tools to study the spread of SIS and SIR diseases in networks (I'll define SIR and SIS in chapters 5 and 6). EoN is built on top of NetworkX.\n", "\n", "**Thirdly**, since we want to get our friend list from VK, we have to use its API, which means we need some libraries for requests. If you are not a VK user, you can slightly change the code in this notebook to get your friends from, for example, Facebook. The approach should be much the same.\n", "\n", "**Finally**, we will need the usual basic libraries you (I hope) already know, such as matplotlib, the garbage collector interface (gc), pandas, etc. \n", "\n", "\n", "> Let's start by installing NetworkX and EoN and importing everything we will need: \n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">> ### 0.2 Packages installation" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "! 
pip install EoN\n", "\n", "# for python3 use: python3 -m pip \n", "# instead of pip" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">> ### 0.3 Packages importing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now import all libraries that we will use in this tutorial:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# System\n", "import os\n", "import sys\n", "import time\n", "import tqdm\n", "import gc\n", "\n", "\n", "# Basics \n", "import pandas as pd\n", "import random\n", "import numpy as np\n", "import copy\n", "\n", "\n", "# Graph analysis\n", "import networkx as nx\n", "import EoN\n", "\n", "\n", "# Useful modules/functions\n", "import scipy as sp\n", "from numpy.linalg import eig\n", "from scipy.integrate import odeint\n", "\n", "\n", "# Get friends from network \n", "import requests\n", "import json\n", "\n", "\n", "# Visualization \n", "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">>> ## 1. Creation of a real Graph \n", "\n", ">> ### 1.1 Complex start\n", "\n", "If you are NOT a VK user, you can skip this part and jump to loading the already prepared graph data (**Lazy fast start**). But you may still find some interesting information in this part for your future research. It covers not only working with the API, but also generating random people while preserving their relationships!\n", "\n", "\n", ">#### 1.1.1 Fast start with VK API\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "API stands for Application Programming Interface. In the case of web applications, the API can provide data in a format other than standard HTML, which makes it convenient to use when writing different applications. 
Third-party public APIs most often provide data in one of two formats: XML or JSON.\n", "\n", "Various mobile and desktop clients for Twitter and VKontakte are built on top of such APIs, which are high-quality and well documented.\n", "\n", "The VKontakte API is described in the https://vk.com/dev documentation and, more specifically, https://vk.com/dev/api_requests.\n", "\n", "For example: \n", "https://api.vk.com/method/getProfiles?uid=59249080. \n", "\n", "We received the answer in JSON format (I was authenticated, and yes, it's my ID):\n", "\n", "{\"response\":[{\"uid\":59249080,\"first_name\":\"Ilya\",\"last_name\":\"Syrovatskiy\",\"hidden\":1}]}\n", "\n", "Otherwise you get an error, also in JSON: \n", "\n", "{\"error\":{\"error_code\":5,\"error_msg\":\"User authorization failed: no access_token passed.\",\"request_params\":[{\"key\":\"oauth\",\"value\":\"1\"},{\"key\":\"method\",\"value\":\"getProfiles\"},{\"key\":\"uid\",\"value\":\"59249080\"}]}}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In order to use all the features of the VK API, you need to get an access token. To do this you will need to [create a standalone application](https://vk.com/editapp?act=create).\n", "\n", "After you have created the application, you can find the access token in the [Applications](https://vk.com/apps?act=manage) section." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Many VK API methods assume the presence of a private token that must be passed as a parameter when executing the request. The process of obtaining a token is described in the documentation: https://vk.com/dev/access_token\n", "\n", ">Attention! The token is called private for a reason. The person possessing it can perform a variety of actions on your behalf. Do not show it to anyone.\n", "\n", "In short, you will be given the ID of your application and the list of access rights that you want to provide to the user of the API. 
Then you need to specify this data as parameters in a URL of the following format: \n", "\n", "https://oauth.vk.com/authorize?client_id={APP_ID}&scope={APP_PERMISSIONS}&response_type=token\n", "\n", "Then confirm your intention to provide access and copy the current token from the URL in the opened window.\n", "\n", "For example: " ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'https://oauth.vk.com/authorize?client_id=8888888&scope=&response_type=token'" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# your app ID here : \n", "APP_ID = 8888888\n", "\n", "# your additional permissions: (here no additional permissions)\n", "PERMISSIONS = \"\"\n", "AUTH_URL = \"https://oauth.vk.com/authorize?client_id={}&scope={}&response_type=token\".format(APP_ID, PERMISSIONS)\n", "AUTH_URL" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Click on this link and you'll get to a page with the address: \n", "\n", "https://oauth.vk.com/blank.html#access_token=5614afdcc2bcd42cea3d9c5edc130101dd4be6639b484131870dc12337e5b74b94411de69f0996379dd6b&expires_in=86400&user_id=59249080\n", "\n", "where the string after access_token= \n", "\n", ">5614afdcc2bcd42cea3d9c5edc130101dd4be6639b484131870dc12337e5b74b94411de69f0996379dd6b \n", "\n", "is your access token. \n", "\n", "Let's keep it." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "TOKEN = \"5614afdcc2bcd42cea3d9c5edc130101dd4be6639b484131870dc12337e5b74b94411de69f0996379dd6b\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Querying the VK API**\n", "\n", "After receiving a private token, you can safely perform requests to the API using the methods you need (https://vk.com/dev/methods). 
The request format is as follows: \n", "\n", "https://api.vk.com/method/METHOD_NAME?PARAMETERS&access_token=ACCESS_TOKEN\n", "\n", "For example, to get information about a user with id 59249080, you need to run the following query:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# Paste here your user ID : \n", "uid = 59249080\n", "res = requests.get(\n", "    \"https://api.vk.com/method/users.get\",\n", "    params={\"user_ids\": uid,\n", "            \"fields\": \"nickname, screen_name, sex, bdate, city, country, timezone, counters, photo_medium\",\n", "            \"access_token\": TOKEN,\n", "            \"version\": 5.85}).json()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can experiment here; just look into the API documentation. Requests to the API are really useful: you can build your own web app (using Python and Django), set up correct authorization and a connection to the API server, and then get almost any information you want automatically. For example, you can mine posts, people's profiles, etc. 
with respect to your aims, and then research something interesting about society.\n", "\n", "OK, let's continue:\n", "\n", "If the token is incorrect or has already expired, you will get an error: " ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'error': {'error_code': 5,\n", " 'error_msg': 'User authorization failed: invalid access_token (4).',\n", " 'request_params': [{'key': 'oauth', 'value': '1'},\n", " {'key': 'method', 'value': 'users.get'},\n", " {'key': 'version', 'value': '5.85'},\n", " {'key': 'fields',\n", " 'value': 'nickname, screen_name, sex, bdate, city, country, timezone, counters, photo_medium'},\n", " {'key': 'user_ids', 'value': '59249080'}]}}" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**VK API Restrictions**\n", "\n", "The number of requests via the VK API is limited: no more than three requests per second. \n", ">There can be maximum 3 requests to API methods per second from a client. \n", "\n", ">Maximum amount of server requests depends on the app's users amount. \n", "If an app has less than 10 000 users, 5 requests per second, up to 100 000 – 8 requests, up to 1 000 000 – 20 requests, 1 000 000+ – 35 requests. \n", "\n", ">If one of this limits is exceeded, the server will return the following error: 'Too many requests per second'. \n", "\n", ">If your app's logic implies many requests in a row, check the execute method. \n", "\n", ">Except the frequency limits there are quantitative limits on calling the methods of the same type. By obvious reasons we don't provide the exact limits info. \n", "\n", ">Excess of a quantitative limit access to a particular method will require captcha (see captcha_error). 
After that it may be temporarily limited (in this case the server doesn't answer on particular method's requests but easily processes any other requests).\n", "\n", "You can pause any operation in Python using the sleep function from the time module. To do so, pass the number of seconds for which the program should be suspended:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0\n", "1\n", "2\n", "3\n", "4\n" ] } ], "source": [ "for i in range(5):\n", "    time.sleep(.5)\n", "    print(i)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We already saw that we can get error responses in JSON, so you have to check everything before and after querying to avoid working with false or incorrect information. \n", "\n", "Also, there are many subtleties in using the API. For example, to get a list of friends of a user, you need to use the friends.get method, which can return both a simple friend list and detailed information about each friend, depending on whether the fields parameter is specified (if not specified, it simply returns the ID list). And if the fields parameter is specified, then you cannot get information about more than 5000 people in one request.\n", "\n", "Now that you've created your app and got the app ID and token, you are ready to download your friends. 
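
The two precautions above (checking every reply for an error and pausing between requests) can be combined in a small helper. This is only a minimal sketch, not part of the original notebook: the names `extract_response`, `fetch_all`, `fetch` and `fake_fetch` are hypothetical, and `fake_fetch` is a stand-in that simulates replies instead of calling the VK API, so the cell runs without a token.

```python
import time


def extract_response(payload):
    # Return the 'response' part of a parsed API reply,
    # or None if the reply carries an 'error' key instead.
    if 'error' in payload:
        return None
    return payload.get('response')


def fetch_all(fetch, user_ids, pause=0.35):
    # Call fetch(user_id) for every ID, sleeping between calls
    # so that fewer than three requests are sent per second.
    results = {}
    failed = []
    for position, user_id in enumerate(user_ids):
        if position > 0:
            time.sleep(pause)
        data = extract_response(fetch(user_id))
        if data is None:
            failed.append(user_id)
        else:
            results[user_id] = data
    return results, failed


def fake_fetch(user_id):
    # Hypothetical stand-in for a real request: odd IDs simulate an API error.
    if user_id % 2:
        return {'error': {'error_code': 5, 'error_msg': 'User authorization failed'}}
    return {'response': [user_id + 1, user_id + 2]}


results, failed = fetch_all(fake_fetch, [2, 3, 4], pause=0.0)
print(results)
print(failed)
```

With a real request function in place of `fake_fetch`, the IDs collected in `failed` could be retried later instead of being silently stored as empty friend lists.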
\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">#### 1.1.2 Loading your social net friends" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Let's define a function for it:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "def get_friends_ids(user_id, fields=\"\"):\n", "    res = requests.get(\n", "        \"https://api.vk.com/method/friends.get\",\n", "        params={\"user_id\": user_id,\n", "                \"fields\": fields,\n", "                \"access_token\": TOKEN,\n", "                \"version\": 5.85}).json()\n", "    # the access token sent above can be obtained via OAuth 2.0\n", "    if res.get('error'):\n", "        # report the API error and fall back to an empty list\n", "        print(res.get('error'))\n", "        return list()\n", "    return res[u'response']" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "# asking for friends and their gender \n", "# notice that gender is in the format 1=female, 2=male\n", "\n", "# uid here should be your own user ID, so that you get YOUR friends\n", "full_friends = get_friends_ids(uid, [\"name\", \"sex\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">#### 1.1.3 Forming correct graph\n", "\n", "Now that we've downloaded your friends, it's time to download all friends of your friends. 
\n", "\n", "We will do our research only on the graph of your friends, but to get the correct links between them we have to load a graph of depth 2 (your friends and friends of your friends).\n", "\n", "Loading will take some time, about 10 minutes (depending on the total number of people, your system and internet connection), so you can make some tea/coffee during this pause :)" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "full_graph = {}\n", "for i in tqdm.tqdm_notebook(full_friends):\n", "    full_graph[i[\"user_id\"]] = get_friends_ids(i[\"user_id\"])\n", "    # pause between requests to respect the API rate limit\n", "    time.sleep(.3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I recommend saving this data to your local storage so that you don't have to repeat the loading and waiting: " ] }, { "cell_type": "code", "execution_count": 275, "metadata": {}, "outputs": [], "source": [ "with open(\"full_graph_depth2.txt\", \"w+\") as f:\n", "    f.write(json.dumps(full_graph))\n", "\n", "with open(\"full_friends.txt\",\"w+\") as f:\n", "    f.write(json.dumps(full_friends))\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can continue. The next step is optional; you can just read what is happening there without running the code.\n", "\n", "I will replace real people's names and IDs with randomly generated ones. \n", "\n", "Here are links to 2 datasets: names and surnames. I will use them to randomly generate people's names for the already existing graph(!) 
- nodes and edges are kept unchanged:\n", "\n", "names : \n", ">go to https://www.ssa.gov/oact/babynames/limits.html\n", " \n", " then download the National data \n", " in the ZIP file, take yob2017.txt\n", " \n", "surnames : \n", ">go to https://github.com/smashew/NameDatabases/blob/master/NamesDatabases/surnames/us.txt\n", " \n", " download surnames as us.txt\n", "\n", ">\n", "> Or you can load all needed data from my repo: https://github.com/Mercurialll/tutors_and_projs/tree/master/jupyter_english/tutorials\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">#### 1.1.4 (optional) Replacing real people's names and ID with random generated" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "names = pd.read_csv(\"yob2017.txt\", header=None)\n", "names.rename(columns={0: 'name', 1: 'sex', 2: 'Popularity'}, inplace=True)\n", "\n", "surnames = pd.read_table(\"us.txt\", header=None)\n", "surnames.rename(columns={0: 'surname'}, inplace=True)\n" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "def get_random_people(full_friends, names, surnames):\n", "    n_people = len(full_friends)\n", "    n_m = 0\n", "    n_f = 0\n", "    \n", "    true_id_f = []\n", "    true_id_m = []\n", "    for friend in full_friends:\n", "        if friend['sex'] == 2:\n", "            n_m += 1\n", "            true_id_m.append(friend['uid'])\n", "        else:\n", "            n_f += 1\n", "            true_id_f.append(friend['uid'])\n", "    print(\"people number: \", n_people, \", men: \", n_m, \", women: \", n_f)\n", "\n", "    # take only top popular names for both Female and Male : \n", "    names_f = names.query('sex == \"F\"')[:n_f].name.values\n", "    names_m = names.query('sex == \"M\"')[:n_m].name.values\n", "\n", "    # take random n_people surnames : \n", "    # seed NumPy's generator (np.random.choice is used below) for reproducibility\n", "    np.random.seed(17)\n", "    rand_indc = np.random.choice(a=range(len(surnames)), size=n_people, replace=False)\n", "    s_names = surnames.surname.values[rand_indc]\n", "    # separate on female/male\n", "    s_names_f = s_names[:n_f]\n", "    
s_names_m = s_names[n_f:]\n", "    \n", "    # we will take from here random IDs of users:\n", "    ids = np.random.choice(a=range(1001, 9999), size=n_people, replace=False)\n", "    # separate on female/male\n", "    id_f = ids[:n_f]\n", "    id_m = ids[n_f:]\n", "    \n", "    random_f = pd.DataFrame(data={'uid': id_f, 'first_name': names_f, 'last_name': s_names_f, \n", "                                  'true_id': true_id_f, 'user_id': id_f, 'sex': 1})\n", "    random_m = pd.DataFrame(data={'uid': id_m, 'first_name': names_m, 'last_name': s_names_m, \n", "                                  'true_id': true_id_m, 'user_id': id_m, 'sex': 2})\n", "    \n", "    # merge male and female random sets\n", "    random_people = pd.concat([random_f, random_m])\n", "    \n", "    return random_people\n" ] }, { "cell_type": "code", "execution_count": 232, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "people number: 309 , men: 207 , women: 102\n" ] }, { "data": { "text/html": [ "
<table>\n", "<thead><tr><th></th><th>first_name</th><th>last_name</th><th>sex</th><th>uid</th><th>user_id</th></tr></thead>\n", "<tbody>\n", "
<tr><th>0</th><td>Emma</td><td>Plese</td><td>1</td><td>9412</td><td>9412</td></tr>\n", "
<tr><th>1</th><td>Olivia</td><td>Mckellan</td><td>1</td><td>4503</td><td>4503</td></tr>\n", "
<tr><th>2</th><td>Ava</td><td>Abram</td><td>1</td><td>8623</td><td>8623</td></tr>\n", "
<tr><th>3</th><td>Isabella</td><td>Bloomquist</td><td>1</td><td>5658</td><td>5658</td></tr>\n", "
<tr><th>4</th><td>Sophia</td><td>Berkson</td><td>1</td><td>8033</td><td>8033</td></tr>\n", "
</tbody></table>
\n", "