{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "
\n", "\n", " \n", "## [mlcourse.ai](https://mlcourse.ai) – Open Machine Learning Course \n", "###
Author: Syrovatskiy Ilya, ODS Slack nickname: bokomaru\n", " \n", "##
Tutorial\n", " \n", "###
\"Epidemics on networks with NetworkX and EoN\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With this tutorial, you'll tackle such an established problem in graph theory as **Epidemic dynamics models**.\n", "\n", "Firstly we'll have to deal with loading your own data from the **VKontakte network using it's API**, so we will go through some basic principles of requests and authentification. If you don't have account in this network - I'll give you already created graph on my own friends network (308 people), but with changed names and IDs. Probably, someone doesn't want to show his name and ID for OpenDataScience community (: . Also I will provide you the link to the graph based on social net with changed info for every person. Our main instrument for graph modeling will be the **NetworkX library** in Python.\n", "\n", "Since we get graph created, we are ready to start with somtething interesting. \n", "We'll go over the basic building blocks of graphs (nodes, edges, etc) and create **pseudo random graph** with the same depth and quantity of verteces. \n", "\n", "Then we are going to visualize created graphs - there will be some obvious differences between our graphs. \n", "\n", "Next point is to talk about main theme of this tutorial - Epidemic on Network. Thus, you'll know some new stuff about different models of epidemic's distributions.\n", "\n", "After you get to know basics it's time to go deeper into epidemic modeling. We'll explore the **most spread models** with code in two graphs (real and pseudo-random), and compare the results with python **library for epidemic modeling EoN** for each case. \n", "\n", "Since we have observed everything I planned in this tutorial, it'll be the time to look at results we got while getting in the world of network, and then - make a conclusion.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Here you can get familiarized with the content more properly:\n", "\n", ">> **TABLE OF CONTENTS** :\n", "\n", "`0`. **First meeting with graphs and libraries**\n", "\n", " 0.1 Intro \n", " 0.2 Packages installation\n", " 0.3 Packages importing\n", " \n", "> \n", "`1.` **Creation of a real Graph** :\n", " \n", " 1.1 Complex long start:\n", " \n", " 1.1.1 Fast (no) start with VK API \n", " 1.1.2 Loading your social net friends\n", " 1.1.3 Forming correct graph\n", " 1.1.4 (optional) Replacing real people's names and ID with random generated\n", "\n", " 1.2 Lazy fast start:\n", " \n", " 1.2.1 Uploading data for building graph\n", " 1.2.2 Building Graph with NetworkX\n", " 1.2.3 Saving created Graph \n", " \n", "> \n", "`2.` **Inspection of the Graph** \n", "\n", " 2.1 Loading graph from source\n", " \n", " 2.2 Creation of a pseudo-random Graph \n", "\n", " 2.3 Graph Visualization\n", "> \n", "`3.` **Introduction in Epidemics on Networks**\n", " \n", " 3.1 Basics of epidemic modeling\n", " \n", " 3.2 Connected components\n", "\n", "> \n", "`4.` **SI Model** \n", "\n", " 4.1. Statement of the model \n", " 4.2. Implementation in Real Graph\n", " 4.3. Implementation in Pseudo-random Graph\n", " 4.4. Compare with EoN modeling\n", " \n", "> \n", "`5.` **SIR Model** \n", " \n", " 5.1. Statement of the model \n", " 5.2. Implementation in Real Graph\n", " 5.3. Implementation in Pseudo-random Graph\n", " 5.4. Compare with EoN modeling \n", "\n", "> \n", "`6.` **SIS Model**\n", " \n", " 6.1. Statement of the model \n", " 6.2. Implementation in Real Graph\n", " 6.3. Implementation in Pseudo-random Graph\n", " 6.4. 
Compare with EoN modeling \n", "\n", "> \n", "`7.` **Conclusion**\n", "\n", "> \n", " \n", "> \n", "\n", "\n", "\n", "\n", "\n", "P.S. The materials are based on:\n", "> Courses about networks at HSE (Higher School of Economics National Research University)\n", "\n", "> A couple of useful ideas about EoN I got from the official EoN page https://media.readthedocs.org/pdf/epidemicsonnetworks/latest/epidemicsonnetworks.pdf\n", "\n", "> One example for SIR theory is taken from:\n", "https://scipython.com/book/chapter-8-scipy/additional-examples/the-sir-epidemic-model/\n", "\n", "> One example for SIS theory is taken from:\n", "https://chengjunwang.com/post/en/2013-03-14-learn-basic-epidemic-models-with-python/\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">>> ## 0. First meeting with graphs and libraries" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">> ### 0.1 Intro" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", " Since we live in the 21st century, almost all people have accounts in different social networks, where they can be closer to their friends wherever they are. \n", " As these networks play a significant part in our lives, analysis in this sphere is an amazing opportunity to learn something interesting about ourselves and our friendships.\n", "\n", "\n", "\n", "The nice thing about graphs is that the concepts and terminology are generally intuitive. Nevertheless, here's some basic lingo:\n", "\n", "Graphs are structures that map relations between objects. The objects are referred to as nodes and the connections between them as edges in this tutorial. Note that edges and nodes are commonly referred to by several names that generally mean exactly the same thing:\n", "\n", "node == vertex == point\n", "edge == arc == link\n", "\n", " To work with graphs in our analysis, it's a good idea to use some libraries. \n", "\n", "**Firstly**, the NetworkX library. NetworkX is the most popular Python package for manipulating and analyzing graphs. Several packages offer the same basic level of graph manipulation but, most likely, NetworkX is the best.\n", "\n", "**Secondly**, the EoN library. EoN (Epidemics on Networks) is a Python module that provides tools to study the spread of SIS and SIR diseases in networks (I'll define SIR and SIS in section 3). EoN is built on top of NetworkX.\n", "\n", "**Thirdly**, since we want to get our friend list from VK, we have to use their API - which means we need some libraries for requests. If you are not a VK user, you can slightly change the code in this notebook to get your friends, for example, from Facebook. I am sure it is much the same.\n", "\n", "**Finally**, we will need the usual basic libraries you already know (I hope), such as matplotlib, the Garbage Collector interface, pandas, etc. \n", "\n", "\n", "> Let's start by installing NetworkX and EoN and importing everything we will need : \n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">> ### 0.2 Packages installation" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "! pip install networkx\n", "! 
pip install EoN\n", "\n", "# for python3 use: python3 -m pip\n", "# instead of pip" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">> ### 0.3 Packages importing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now import all the libraries that we will use in this tutorial:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import copy\n", "import gc\n", "import json\n", "# System\n", "import os\n", "import random\n", "import sys\n", "import time\n", "\n", "import EoN\n", "# Visualization\n", "import matplotlib.pyplot as plt\n", "# Graph analysis\n", "import networkx as nx\n", "import numpy as np\n", "# Basics\n", "import pandas as pd\n", "# Get friends from network\n", "import requests\n", "# Useful modules/functions\n", "import scipy as sp\n", "import tqdm\n", "from numpy.linalg import eig\n", "from scipy.integrate import odeint\n", "\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">>> ## 1. Creation of a real Graph \n", "\n", ">> ### 1.1 Complex start\n", "\n", "If you are NOT a VK user, you can skip this part and jump to loading the already-prepared data for the graph (**Lazy fast start**). But you may well find some really interesting information in this part for your future research. There is not only work with the API here, but also random generation of people that preserves their relationships!\n", "\n", "\n", ">#### 1.1.1 Fast start with VK API\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "API stands for Application Programming Interface, an interface for programming applications. In the case of web applications, the API can provide data in a format other than the standard HTML, which makes it convenient to use when writing different applications. Third-party public APIs most often provide data in one of two formats: XML or JSON.\n", "\n", "Various mobile and desktop clients for Twitter and VKontakte are built on top of their APIs, which are high-quality and well documented.\n", "\n", "The VKontakte API is described in the https://vk.com/dev documentation and, more specifically, https://vk.com/dev/api_requests.\n", "\n", "For example : \n", "https://api.vk.com/method/getProfiles?uid=59249080. \n", "\n", "We receive the answer in JSON format: (I was authenticated. And yes, it's my ID)\n", "\n", "`{\"response\":[{\"uid\":59249080,\"first_name\":\"Ilya\",\"last_name\":\"Syrovatskiy\",\"hidden\":1}]}`\n", "\n", "Otherwise you get an error, also in JSON: \n", "\n", "`{\"error\":{\"error_code\":5,\"error_msg\":\"User authorization failed: no access_token passed.\",\"request_params\":[{\"key\":\"oauth\",\"value\":\"1\"},{\"key\":\"method\",\"value\":\"getProfiles\"},{\"key\":\"uid\",\"value\":\"59249080\"}]}}`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In order to use all the features of the VK API, you need to get an access token for your account. To do this you will need to [create a standalone application](https://vk.com/editapp?act=create).\n", "\n", "After the application is created, you can find the access token in the [Applications](https://vk.com/apps?act=manage) section." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Many VK API methods assume the presence of a private token that must be passed as a parameter when executing the request. The process of obtaining a token is described in the documentation: https://vk.com/dev/access_token\n", "\n", ">Attention! The token is called private for a reason: the person possessing it can perform a variety of actions on your behalf. Do not show it to anyone." ] }, 
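{ "cell_type": "markdown", "metadata": {}, "source": [ "One common practice (optional, not required for this tutorial) is to keep the token out of the notebook entirely, for example in an environment variable. A minimal sketch; `VK_TOKEN` is just a name I made up:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "# hypothetical variable name; set it in your shell first, e.g.:\n", "#   export VK_TOKEN=\"your_token_here\"\n", "TOKEN = os.environ.get(\"VK_TOKEN\", \"\")" ] }, 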
{ "cell_type": "markdown", "metadata": {}, "source": [ "In short, you will be given the ID of your application and the list of access rights that you want to grant to the API user. Then you need to specify this data as parameters in a URL of the following format \n", "\n", "https://oauth.vk.com/authorize?client_id={APP_ID}&scope={APP_PERMISSIONS}&response_type=token\n", "\n", ", confirm your intention to grant access, and copy the token from the URL of the opened window.\n", "\n", "For example: " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# your app ID here:\n", "APP_ID = 8888888\n", "\n", "# your additional permissions: (here no additional permissions)\n", "PERMISSIONS = \"\"\n", "AUTH_URL = \"https://oauth.vk.com/authorize?client_id={}&scope={}&response_type=token\".format(\n", " APP_ID, PERMISSIONS\n", ")\n", "AUTH_URL" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Click on this link and you'll get to a page with an address like \n", "\n", "https://oauth.vk.com/blank.html#access_token=5614afdcc2bcd42cea3d9c5edc130101dd4be6639b484131870dc12337e5b74b94411de69f0996379dd6b&expires_in=86400&user_id=59249080\n", "\n", "where the string after access_token= \n", "\n", ">5614afdcc2bcd42cea3d9c5edc130101dd4be6639b484131870dc12337e5b74b94411de69f0996379dd6b \n", "\n", "is your access token. \n", "\n", "Let's keep it." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "TOKEN = \"5614afdcc2bcd42cea3d9c5edc130101dd4be6639b484131870dc12337e5b74b94411de69f0996379dd6b\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Querying the VK API**\n", "\n", "After receiving a private token, you can safely perform requests to the API using the methods you need (https://vk.com/dev/methods). The request format is as follows: \n", "\n", "https://api.vk.com/method/METHOD_NAME?PARAMETERS&access_token=ACCESS_TOKEN\n", "\n", "For example, to get information about the user with id 59249080, you need to run the following query:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Paste your user ID here:\n", "uid = 59249080\n", "res = requests.get(\n", " \"https://api.vk.com/method/users.get\",\n", " params={\n", " \"user_ids\": uid,\n", " \"fields\": \"nickname, screen_name, sex, bdate, city, country, timezone, counters, photo_medium\",\n", " \"access_token\": TOKEN,\n", " \"version\": 5.85,\n", " },\n", ").json()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can experiment here; just look into the API documentation. Requests to the API are really useful: you can build your own web app (using Python and Django), set up correct auth and a connection to the API server, and then get almost all the information you want automatically. For example, you can mine posts, people's profiles, etc. according to your aims, and then research something amazing about society.\n", "\n", "OK, let's continue:\n", "\n", "If the token is not correct or already expired, you will get an error : " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "res" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**VK API Restrictions**\n", "\n", "The number of requests to the VK API is limited - no more than three requests per second. \n", ">There can be a maximum of 3 requests to API methods per second from a client. 
\n", "\n", ">Maximum amount of server requests depends on the app's users amount. \n", "If an app has less than 10 000 users, 5 requests per second, up to 100 000 – 8 requests, up to 1 000 000 – 20 requests, 1 000 000+ – 35 requests. \n", "\n", ">If one of this limits is exceeded, the server will return the following error: 'Too many requests per second'. \n", "\n", ">If your app's logic implies many requests in a row, check the execute method. \n", "\n", ">Except the frequency limits there are quantitative limits on calling the methods of the same type. By obvious reasons we don't provide the exact limits info. \n", "\n", ">Excess of a quantitative limit access to a particular method will require captcha (see captcha_error). After that it may be temporarily limited (in this case the server doesn't answer on particular method's requests but easily processes any other requests).\n", "\n", "You can pause when performing any operation in Python using the sleep function from the time module. To do so you must pass the number of seconds for which the program will be suspended:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for i in range(5):\n", " time.sleep(0.5)\n", " print(i)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We already saw that we can get response errors in JSON, so you have to check everything before and after querying to avoid getting false and incorrect information. \n", "\n", "Also, there are many different subtleties of usage API. For example, to get a list of friends of a user, you need to use the friends.get method, which can return both a simple friend list and detailed information about each friend, depending on whether the fields parameter is specified (if not specified, simply returns the ID list). And if the fields parameter is specified, then for one request you cannot get information about more than 5000 people.\n", "\n", "Since you've created your APP and got APP ID and token, you are ready to download your friends. \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">#### 1.1.2 Loading your social net friends" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Let's define function for it:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def get_friends_ids(user_id, fields=\"\"):\n", " res = requests.get(\n", " \"https://api.vk.com/method/friends.get\",\n", " params={\n", " \"user_id\": user_id,\n", " \"fields\": fields,\n", " \"access_token\": TOKEN,\n", " \"version\": 5.85,\n", " },\n", " ).json()\n", " # also you can add access token in the request, receiving it via OAuth 2.0\n", " if res.get(\"error\"):\n", " print(res.get(\"error\"))\n", " return list()\n", " return res[u\"response\"]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# asking for friends and their gender\n", "# notice that gender is in the format 1=female, 2=male\n", "\n", "# uid supposed to be here your user ID to get YOUR friends\n", "full_friends = get_friends_ids(uid, [\"name\", \"sex\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">#### 1.1.3 Forming correct graph\n", "\n", "After we've downloaded friends, now it's time to download all friends of your friends. 
\n", "\n", "We will only make our research in graph of your friends only, but for getting correct links between each other we have to load graph of depth 2 (your friends and friends of your friends).\n", "\n", "Loading will take some time, something about 10 minutes (depends on total quantity of people, your system and internet connection), so you can make a tea/coffee in this pause :)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "full_graph = {}\n", "for i in tqdm.tqdm_notebook(full_friends):\n", " full_graph[i[\"user_id\"]] = get_friends_ids(i[\"user_id\"])\n", " time.sleep(0.3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I recommend you to save this data on your local storage to prevent repeating of loading and waiting : " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "with open(\"full_graph_depth2.txt\", \"w+\") as f:\n", " f.write(json.dumps(full_graph))\n", "\n", "with open(\"full_friends.txt\", \"w+\") as f:\n", " f.write(json.dumps(full_friends))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can continue. The next step is optional, you can just read what is happening there without running a code.\n", "\n", "So I will replace real people's names and ID with random generated. \n", "\n", "Here I provide for you links to 2 sets : names and surnames. These sets I will use for random generating people's names on already existing graph(!) - nodes and edges are kept unchanged:\n", "\n", "names : \n", ">go to https://www.ssa.gov/oact/babynames/limits.html\n", " \n", " then download National data \n", " in ZIP file take yob2017.txt\n", " \n", "surnames : \n", ">go to https://github.com/smashew/NameDatabases/blob/master/NamesDatabases/surnames/us.txt\n", " \n", " download surnames as us.txt\n", "\n", ">\n", "> Or you can load all needed data from my repo: https://github.com/Mercurialll/tutors_and_projs/tree/master/jupyter_english/tutorials\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">#### 1.1.4 (optional) Replacing real people's names and ID with random generated" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "names = pd.read_csv(\"yob2017.txt\", header=None)\n", "names.rename(columns={0: \"name\", 1: \"sex\", 2: \"Popularity\"}, inplace=True)\n", "\n", "surnames = pd.read_table(\"us.txt\", header=None)\n", "surnames.rename(columns={0: \"surname\"}, inplace=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def get_random_people(full_friends, names, surnames):\n", " n_people = len(full_friends)\n", " n_m = 0\n", " n_f = 0\n", "\n", " true_id_f = []\n", " true_id_m = []\n", " for friend in full_friends:\n", " if friend[\"sex\"] == 2:\n", " n_m += 1\n", " true_id_m.append(friend[\"uid\"])\n", " else:\n", " n_f += 1\n", " true_id_f.append(friend[\"uid\"])\n", " print(\"people number: \", n_people, \", men: \", n_m, \", women: \", n_f)\n", "\n", " # take only top popular names for both Female and Male :\n", " names_f = names.query('sex == \"F\"')[:n_f].name.values\n", " names_m = names.query('sex == \"M\"')[:n_m].name.values\n", "\n", " # take random n_people surnames :\n", " random.seed(17)\n", " rand_indc = np.random.choice(a=range(len(surnames)), size=n_people, replace=False)\n", " s_names = surnames.surname.values[rand_indc]\n", " # separate on female/male\n", " s_names_f = s_names[:n_f]\n", " s_names_m = s_names[n_f:]\n", "\n", " 
# we will take random IDs of users from here:\n", " ids = np.random.choice(a=range(1001, 9999), size=n_people, replace=False)\n", " # separate on female/male\n", " id_f = ids[:n_f]\n", " id_m = ids[n_f:]\n", "\n", " random_f = pd.DataFrame(\n", " data={\n", " \"uid\": id_f,\n", " \"first_name\": names_f,\n", " \"last_name\": s_names_f,\n", " \"true_id\": true_id_f,\n", " \"user_id\": id_f,\n", " \"sex\": 1,\n", " }\n", " )\n", " random_m = pd.DataFrame(\n", " data={\n", " \"uid\": id_m,\n", " \"first_name\": names_m,\n", " \"last_name\": s_names_m,\n", " \"true_id\": true_id_m,\n", " \"user_id\": id_m,\n", " \"sex\": 2,\n", " }\n", " )\n", "\n", " # merge the male and female random sets\n", " random_people = pd.concat([random_f, random_m])\n", "\n", " return random_people" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "random_people = get_random_people(full_friends, names, surnames)\n", "random_people.drop(columns=[\"true_id\"]).head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So here everything is random except for true_id, which is a column of real user IDs (my friends). (I drop it just to show the created dataset without the real IDs.)\n", "\n", "**Create a new friend list according to the true_id column:**\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%time\n", "full_friends_new = []\n", "for person in full_friends:\n", " # taking the new ID from the random_people data set according to the current user ID:\n", " person_dict = {}\n", " person_data = random_people[random_people[\"true_id\"] == person[\"uid\"]]\n", "\n", " # keep all parameters from random_people for the current person\n", " person_dict[\"first_name\"] = person_data.first_name.values[0]\n", " person_dict[\"last_name\"] = person_data.last_name.values[0]\n", " # casting here because numpy int64 is not JSON serializable\n", " person_dict[\"sex\"] = int(person_data.sex.values[0])\n", " person_dict[\"uid\"] = int(person_data.uid.values[0])\n", " person_dict[\"user_id\"] = int(person_data.user_id.values[0])\n", "\n", " full_friends_new.append(person_dict)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# print just the first 2 \"new\" friends:\n", "full_friends_new[:2]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\n", " \"quantity of friends in my graph with real people : \",\n", " len(full_friends),\n", " \"\\nquantity of friends in my graph with random people : \",\n", " len(full_friends_new),\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "OK, everything is fine. Let's continue with updating the full graph, which should contain friends and friends of friends:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Creating the new graph according to the random_people dataset:**\n", "\n", "Here I will also drop (just skip) all people that are not in my friend list, so this operation will reduce the size of the dict. 
\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%time\n", "full_graph_new = {}\n", "\n", "for person in list(full_graph.keys()):\n", " # taking new ID from random_people data set according to current user ID:\n", " new_id = random_people[random_people[\"true_id\"] == int(person)].uid.values[0]\n", "\n", " list_com_friends = []\n", "\n", " for i in full_graph[person]:\n", " # if person have friends in my friendlist, append them from random_people data set:\n", " if i[\"uid\"] in random_people.true_id.values:\n", " person_dict = {}\n", "\n", " person_data = random_people[random_people[\"true_id\"] == i[\"uid\"]]\n", "\n", " person_dict[\"first_name\"] = person_data.first_name.values[0]\n", " person_dict[\"last_name\"] = person_data.last_name.values[0]\n", " # retyping here because of problem with JSON serialization numpy int64\n", " person_dict[\"sex\"] = int(person_data.sex.values[0])\n", " person_dict[\"uid\"] = int(person_data.uid.values[0])\n", " person_dict[\"user_id\"] = int(person_data.user_id.values[0])\n", "\n", " list_com_friends.append(person_dict)\n", " if list_com_friends != []:\n", " full_graph_new[\"{}\".format(new_id)] = list_com_friends" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\n", " \"quantity of people in full graph that have real friends from my list : \",\n", " len(full_graph),\n", " \"\\nquantity of people in full graph that have random 'new' friends : \",\n", " len(full_graph_new),\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# let's see someone's connections :\n", "full_graph_new[list(full_graph_new.keys())[1]]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# also saving new data\n", "\n", "with open(\"full_graph_rand_people.txt\", \"w+\") as f:\n", " f.write(json.dumps(full_graph_new))\n", "\n", "with open(\"full_friends_rand_people.txt\", \"w+\") as f:\n", " f.write(json.dumps(full_friends_new))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Yep! We went out from super private friendlist to super public - now you can generate people infinitly and save links between them! Nice.\n", "\n", " That was some kind of 'preprocessing' of our graph.\n", "\n", "The next step will be creating Python graph with NetworkX!\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">> ### 1.2 Lazy fast start\n", "\n", "\n", "\n", ">#### 1.2.1 Uploading data for building graph\n", "\n", "As we remember, we downloaded the data for our future graph to the local storage. 
So you can use it.\n", "\n", "I will give you a real graph, but with randomly generated names.\n", "\n", "If you weren't with us in the previous part, you can load the necessary data from here:\n", "\n", "> [full_friends_rand_people](https://github.com/Mercurialll/tutors_and_projs/blob/master/jupyter_english/tutorials/full_friends_rand_people.txt)\n", "\n", "> [full_graph_rand_people](https://github.com/Mercurialll/tutors_and_projs/blob/master/jupyter_english/tutorials/full_graph_rand_people.txt)\n", "\n", "Now it's time to load it back or, as I do, to continue with the newly generated data: " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# If you have constructed your own graph without renaming, load it from your storage:\n", "\n", "with open(\"full_graph_depth2.txt\") as f:\n", " full_graph = json.loads(f.read())\n", "\n", "with open(\"full_friends.txt\") as f:\n", " full_friends = json.loads(f.read())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# If you've run every operation step by step with me, load this\n", "# (note that I will keep working with the names full_graph and full_friends,\n", "# but they now refer to the sets generated in the previous steps).\n", "\n", "# Or, if you skipped everything, this is also for you:\n", "\n", "with open(\"full_graph_rand_people.txt\") as f:\n", " full_graph = json.loads(f.read())\n", "\n", "with open(\"full_friends_rand_people.txt\") as f:\n", " full_friends = json.loads(f.read())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"all friends: \", len(full_friends), \", nodes for graph: \", len(full_graph))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that there are 29 'lost' people.\n", "\n", "Fortunately, they are OK; they are absent for a pretty obvious reason:\n", "\n", "> They don't have anyone from my friend list among their own friends, 
and I myself do not appear in my graph, so they have no connection to anybody - and they were eliminated several steps ago.\n", "\n", "So we have reason to trim our friend list as well: " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "full_friends_cutted = []\n", "\n", "connected_people = [int(i) for i in list(full_graph.keys())]\n", "\n", "for person in full_friends:\n", " if person[\"uid\"] in connected_people:\n", " full_friends_cutted.append(person)\n", "\n", "full_friends = copy.copy(full_friends_cutted)\n", "\n", "del full_friends_cutted\n", "gc.collect()\n", "\n", "len(full_friends)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> #### 1.2.2 Building Graph with NetworkX\n", " " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# call the base class for undirected graphs and create an empty graph:\n", "\n", "G = nx.Graph()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# fill the graph with nodes:\n", "\n", "for i in full_friends:\n", " G.add_node(i[\"uid\"], name=i[\"first_name\"] + \" \" + i[\"last_name\"], sex=i[\"sex\"])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# establish connections between people :\n", "\n", "my_friends = list(nx.nodes(G))\n", "for i in my_friends:\n", " for j in full_graph[\"{}\".format(int(i))]:\n", " if j[\"uid\"] in my_friends:\n", " G.add_edge(i, j[\"uid\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> #### 1.2.3 Saving created Graph\n", " " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nx.write_gpickle(G, \"my_graph.gpickle\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's move to the next part! We'll explore some simple attributes of the graph that we've created and, using that knowledge, build a pseudo-random graph. \n", " \n", "Then we are going to visualize both of them - we will see a huge difference." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">>> ## 2. Inspection of the Graph\n", "\n", ">> ### 2.1 Loading graph from source\n", "\n", "You can get the created graph from this [link](https://github.com/Mercurialll/tutors_and_projs/blob/master/jupyter_english/tutorials/my_graph.gpickle)\n", "\n", "Or, if you created it with me, read it from storage:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G = nx.read_gpickle(\"my_graph.gpickle\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">> ### 2.2 Getting deeper in Graph theory" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Edges**\n", "\n", "Your graph edges are represented by a list of tuples of length 3. The first two elements are the node names linked by the edge. The third is the dictionary of edge attributes.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Preview first 5 edges\n", "list(G.edges(data=True))[:5]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since there are no edge attributes, the 3rd element is empty.\n", "\n", "**Nodes**\n", "\n", "Similarly, your nodes are represented by a list of tuples of length 2. The first element is the node ID, followed by the dictionary of node attributes."
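, "\n", "\n", "For a single node, `G.node[some_uid]` returns just that node's attribute dictionary (here `some_uid` stands for any ID present in the graph; `G.node` is the older NetworkX access style that this notebook also uses later in `plot_graph`)."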
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Preview first 10 nodes\n", "list(G.nodes(data=True))[:10]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Summary Stats**\n", "\n", "Print out some summary statistics before visualizing the graph.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"# of edges: {}\".format(G.number_of_edges()))\n", "print(\"# of nodes: {}\".format(G.number_of_nodes()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The **degree (or valency) of a vertex** of a graph is the number of edges incident to the vertex, with loops counted twice. \n", "\n", "Look at degree of every vertex in Graph :" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Preview first 10 nodes\n", "# node : degree\n", "list(G.degree())[:10]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pay attention to hist of a **distribution of the graph's degrees**:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.hist(list(dict(G.degree()).values()), 20, facecolor=\"blue\", alpha=0.5)\n", "plt.title(\"Degrees in th Graph\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's compute **the average clustering coefficient** for the graph G.\n", "\n", "The clustering coefficient for the graph is :\n", "\n", "$$C = \\frac{1}{n}\\sum_{v \\in G} c_v$$\n", "\n", "where **n** - is the number of nodes in Graph G. \n", "\n", "And **$c_v$** - the local clustering coefficient of a vertex in a graph (quantifies how close its neighbours are to being a clique (complete graph))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"average clustering coefficient for the graph G : \", nx.average_clustering(G))\n", "plt.hist(list(nx.clustering(G).values()))\n", "plt.title(\"Clustering coefficients over the Graph\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now it's time to find out what will be changed, if we deal with random generated graphs :" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">> ### 2.2 Creation of a pseudo-random Graph\n", " \n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First thing we will do - creation of 100 random graphs with the same number of edges and vertices and look at the average clustering coefficient.\n", "\n", "nx.gnm_random_graph():\n", "\n", "Returns a random graph. In the model, a graph is chosen uniformly at random from the set of all graphs with nodes and edges.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "average_clust_coefs = []\n", "for i in range(100):\n", " GR = nx.gnm_random_graph(len(G.nodes()), len(G.edges))\n", " average_clust_coefs.append(nx.average_clustering(GR))\n", "print(\n", " \"The average over average clustering coefficients random graphs: \",\n", " np.mean(average_clust_coefs),\n", ")\n", "\n", "plt.hist(list(nx.clustering(GR).values()))\n", "plt.title(\"Clustering coefficients over the last random Graph\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, average clustering coefficient is around 10 times smaller than in our real graph, although the number of nodes and edges the same. 
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">> ### 2.3 Graphs Visualization\n", " \n", "The easiest way to draw our graph is to use nx.draw_kamada_kawai() :" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(1, 1, figsize=(10, 5))\n", "plt.title(\"My graph\", fontsize=20)\n", "nx.draw_kamada_kawai(G)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is a bit ugly, and without additional information is too simple, lazy and not interesting. \n", "\n", "So we will build our own function with good properties.\n", "\n", "You can play with different parameters. XKCD gives some nice effects, but not necessary." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def plot_graph(g, coloring=[], palette=plt.cm.Set2):\n", " with plt.xkcd():\n", " k = nx.degree(g)\n", " plt.figure(1, figsize=(60, 45))\n", " coord = nx.kamada_kawai_layout(g)\n", " labels = {nd: g.node[nd][\"name\"] for (nd) in g.nodes()}\n", " if len(coloring) > 0:\n", " nx.draw_networkx(\n", " g,\n", " pos=coord,\n", " nodelist=dict(k).keys(),\n", " node_size=[v * 50 for v in dict(k).values()],\n", " font_size=17,\n", " node_color=coloring,\n", " labels=labels,\n", " cmap=palette,\n", " )\n", " else:\n", " nx.draw_networkx(\n", " g,\n", " pos=coord,\n", " nodelist=dict(k).keys(),\n", " node_size=[v * 50 for v in dict(k).values()],\n", " font_size=17,\n", " labels=labels,\n", " )" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%time\n", "plot_graph(G)\n", "\n", "# saving picture if you need it:\n", "# plt.savefig(\"../../img/my_detailed_graph.png\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> you will have to get something like this: \n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So it's much better.\n", "\n", "Double click on it or just open in another window - you will be able to get familiar with all people and connections between them. \n", "\n", "But don't forget to look on random generated graph. Let's build it another way and visualize in small easy format.\n", "\n", "We will swap edges using built-in function of NetwrokX" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G_random = copy.deepcopy(G)\n", "np.random.seed(17)\n", "G_random = nx.algorithms.swap.double_edge_swap(G_random, nswap=1000, max_tries=100000)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(1, 1, figsize=(10, 5))\n", "plt.title(\"Random graph\", fontsize=20)\n", "nx.draw_kamada_kawai(G_random)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pretty strange picture, isn't it? It's not similar to our real graph at all. As I already said, the number of nodes and edges the same as in the real graph, but the problem here is in clustering coefficients. \n", "\n", "We will use this graph on epidemic modeling too." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">>> ## 3. Introduction in Epidemics on Networks\n", "\n", ">> ### 3.1 Basics of epidemic modeling \n", "\n", " The epidemic model is intellectual source for information diffusion research. The first known mathematical model of epidemiology was formulated by Daniel Bernoulli (1760), when he studied the mortality rates in order to eradicate the smallpox. 
However, it was not until the early twentieth century that deterministic modeling of epidemiology started. \n", " \n", " Ross (1911) developed differential equation models of epidemics. Later, Kermack and McKendrick (1927) found the epidemic threshold and argued that the density of susceptibles must exceed a critical value for an epidemic outbreak to happen.\n", "\n", "\n", "The mathematical models developed by epidemic research help clarify assumptions, variables, and parameters for diffusion research, lead to useful concepts (e.g., threshold, reproduction number), supply an experimental tool for testing theoretical conjectures, and forecast future epidemic spreading (Hethcote, 2009). Although epidemic models are simplifications of reality, they help us refine our understanding of the logic of diffusion beneath social realities (disease transmission, information diffusion through networks, and adoption of new technologies or behaviors). To understand epidemic models better, I will review **the basic epidemic models: SI, SIR, SIS** and their applications in networks.\n", "\n", " However, despite the many advantages of deterministic models, it can be difficult to include realistic population networks, to incorporate realistic probability distributions for the time spent in the infectious period, and to assess the probability of an outbreak. Thus, stochastic epidemic simulations, such as stochastic differential equations, Markov Chain Monte Carlo (MCMC), and agent-based modeling, have been used to remedy these defects." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's define the models: \n", "\n", ">**SI model :**\n", "\n", "A simple mathematical model of the spread of a disease in a population :\n", "\n", "**S(t)** are those susceptible but not yet infected with the disease\n", "\n", "**I(t)** is the number of infectious individuals\n", "\n", "In this model the infection process is permanent: the infected part of the population has no chance of being healed... \n", "\n", ">\n", ">**SIR model :**\n", "\n", "A more realistic mathematical model of the spread of a disease in a population than the first one.\n", "\n", "Here the population of N individuals divides into three \"compartments\" which may vary as a function of time, t:\n", "\n", "**S(t)** are those susceptible but not yet infected with the disease\n", "\n", "**I(t)** is the number of infectious individuals\n", "\n", "**R(t)** are those individuals who have recovered from the disease and now have immunity to it.\n", "\n", ">\n", ">**SIS model :**\n", "\n", "Another extension of the SI model is the one that allows for reinfection:\n", "\n", "**S(t)** are those susceptible but not yet infected with the disease\n", "\n", "**I(t)** is the number of infectious individuals\n", "\n", "Here infected individuals become susceptible again after recovery; there is no immune compartment. The three models are summarized in the table below.\n", "\n", "\n" ] }, 
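{ "cell_type": "markdown", "metadata": {}, "source": [ "| Model | Transitions | Recovery | Immunity after recovery |\n", "| --- | --- | --- | --- |\n", "| SI | S → I | no | - |\n", "| SIR | S → I → R | yes | yes |\n", "| SIS | S → I → S | yes | no |" ] }, 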
] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">> ### 3.2 Connected components\n", "\n", "A connected component (or just component) of an undirected graph is a subgraph in which any two vertices are connected to each other by paths, and which are connected to no additional vertices in the supergraph. For example, the graph shown in the illustration has three connected components. A vertex with no incident edges is itself a connected component. A graph that is itself connected has exactly one connected component, consisting of the whole graph.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# find the largest connected component:\n", "largest_cc = max(nx.connected_components(G), key=len)\n", "\n", "# take istead of our Graph this component:\n", "g = nx.Graph(G.subgraph(largest_cc))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"Number of nodes are in the largest component of real graph: \", len(g.nodes))\n", "fig, ax = plt.subplots(1, 1, figsize=(10, 5))\n", "plt.title(\"the largest connected component of real graph\", fontsize=20)\n", "nx.draw_kamada_kawai(g)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Almost all nodes are here! In full graph I had 280 nodes, here : 261. Pretty large connected component!\n", "\n", "Getting the same for random graph:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "largest_cc = max(nx.connected_components(G_random), key=len)\n", "g_random = nx.Graph(G_random.subgraph(largest_cc))\n", "\n", "print(\n", " \"Number of nodes are in the largest component of random graph: \",\n", " len(g_random.nodes),\n", ")\n", "fig, ax = plt.subplots(1, 1, figsize=(10, 5))\n", "plt.title(\"the largest connected component of random graph\", fontsize=20)\n", "nx.draw_kamada_kawai(g_random)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok, 279 of 280 here. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">>> ## 4. SI Model\n", "\n", ">> ### 4.1 Statement of the model \n", " \n", "As I have mentioned, in this model infected part of population has no chance to be healed. \n", "\n", "Here are the equations:\n", "\n", "\\begin{equation}\n", " \\begin{cases}\n", " \\cfrac{ds(t)}{dt} = -\\beta\\,s(t)i(t)\\\\\n", " \\cfrac{di(t)}{dt} = \\beta\\,s(t)i(t)\n", " \\end{cases}\n", " \\\\\n", " i(t) + s(t) = 1\n", "\\end{equation}\n", "\n", "To solve this differential equation, we can get the cumulative growth curve as a function of time:\n", "\n", "$$I[t]= \\frac{x_{0} e^{\\beta t }}{1-x_{0}+ x_{0} e^{\\beta t }}$$\n", "\n", "Interestingly, this is a logistic growth featured by its S-shaped curve.\n", "$x_{0}$ - is the initial value of I[t].\n", "\n", ">\n", "\n", "> **odeint()** from scipy will solve a system of ordinary differential equations for us.\n", "\n", "For correct calling from documentation :\n", "\n", "dy/dt = func(y, t, ...), where y can be a vector.\n", "\n", "Parameters:\n", "func : callable(y, t, args …) Computes the derivative of y at t.\n", "\n", "y0 : array Initial condition on y (can be a vector).\n", "\n", "t : array A sequence of time points for which to solve for y. 
{ "cell_type": "markdown", "metadata": {}, "source": [ ">\n", "\n", "> **odeint()** from scipy will solve a system of ordinary differential equations for us.\n", "\n", "From the documentation:\n", "\n", "dy/dt = func(y, t, ...), where y can be a vector.\n", "\n", "Parameters:\n", "func : callable(y, t, args …) Computes the derivative of y at t.\n", "\n", "y0 : array Initial condition on y (can be a vector).\n", "\n", "t : array A sequence of time points for which to solve for y. The initial value point should be the first element of this sequence.\n", "\n", "args : tuple, optional Extra arguments to pass to function.\n", "\n", "> Let's use it and look at the results:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# spreading coefficient\n", "beta = 0.3\n", "\n", "# initial state\n", "# we will start from 0.01 infected :\n", "i0 = 0.01\n", "z0 = [1 - i0, i0]\n", "\n", "# time domain\n", "t = np.arange(35)\n", "\n", "# system of differential equations:\n", "def si(z, t, beta):\n", " return np.array([-beta * z[1] * z[0], beta * z[1] * z[0]])\n", "\n", "\n", "# solve:\n", "z = odeint(si, z0, t, (beta,))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Let's plot our solution\n", "\n", "fig, ax = plt.subplots(1, 1, figsize=(14, 6))\n", "\n", "plt.title(\"SI epidemic theoretical\")\n", "lines = ax.plot(z)\n", "plt.setp(lines[0], color=\"blue\")\n", "plt.setp(lines[1], color=\"red\")\n", "\n", "ax.set_xlabel(\"Time\")\n", "ax.set_ylabel(\"Population\")\n", "ax.legend([\"$Susceptible$\", \"$Infected$\"])\n", "\n", "# ax[1].plot(z[:,1], z[:,0])\n", "# ax[1].set_xlabel('$I$')\n", "# ax[1].set_ylabel('$S$')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We see that after t=30 almost everyone is infected. The curve grows exponentially shortly after the system is infected, and then saturates as the number of susceptibles shrinks, which makes it harder to find the next victims. Thus, it could be used to model the classic diffusion of innovations. \n", "\n", "> It's time to implement our own SI model for our graphs. \n", "\n", "We start by creating a useful function:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def get_infection_quant(G, beta, random_state):\n", " N = len(G.nodes)\n", " Nodes = list(G.nodes.keys())\n", "\n", " # dict for all people with values : 0 = ok, 1 = ill\n", " infection_dict = dict(zip(Nodes, [0 for i in range(N)]))\n", "\n", " # initial infected random person:\n", " np.random.seed(random_state)\n", " i0 = Nodes[np.random.randint(0, N)]\n", " infection_dict[i0] = 1\n", "\n", " # array for infected people :\n", " infected_nodes = []\n", " infected_nodes.append(i0)\n", "\n", " # array for (total number of infected)/N at each step:\n", " infection_quant = []\n", " infection_quant.append(len(infected_nodes) / N)\n", "\n", " # loop until all people are infected:\n", " while infection_quant[-1] != 1:\n", " # take every infected person:\n", " for person in infected_nodes:\n", " # take every person's friend:\n", " for friend in nx.edges(G, person):\n", " # if the friend is not infected, they get infected with probability beta\n", " if infection_dict[friend[1]] == 0:\n", " infection_dict[friend[1]] = int(np.random.rand() < beta)\n", " # update the list of infected people:\n", " infected_nodes = [\n", " node for node, status in infection_dict.items() if status == 1\n", " ]\n", " # update the list of infected quantities:\n", " infection_quant.append(len(infected_nodes) / N)\n", "\n", " return infection_quant" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">> ### 4.2 Implementation in Real Graph\n", "\n", "Since we created the function, let's use it with the same beta as in the theoretical implementation. A quick single-run sanity check comes first:" ] }, 
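{ "cell_type": "markdown", "metadata": {}, "source": [ "Before averaging over many simulations, it can help to check the function on a single run (a quick sketch; the printed values depend on the random seed):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# one quick run on the real graph's largest component:\n", "quant = get_infection_quant(G=g, beta=0.3, random_state=17)\n", "print(\"steps until everyone is infected:\", len(quant))\n", "print(\"first few infected fractions:\", [round(q, 3) for q in quant[:5]])" ] }, 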
"%%time\n", "beta = 0.3\n", "seed = 17\n", "N_SIMULATIONS = 1000\n", "\n", "SI_my_graph = []\n", "for i in range(N_SIMULATIONS):\n", " SI_my_graph.append(get_infection_quant(G=g, beta=beta, random_state=seed * (i + 1)))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# take average over simulations :\n", "SI_my_graph_means = pd.DataFrame(SI_my_graph).mean(axis=0)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.subplots(1, 1, figsize=(8, 4))\n", "SI_my_graph_means.plot()\n", "plt.title(\"SI_my_graph\")\n", "plt.xlabel(\"time\")\n", "plt.ylabel(\"Average infection speed\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">> ### 4.3 Implementation in Random Graph\n", "\n", "The same for random : " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%time\n", "beta = 0.3\n", "seed = 17\n", "N_SIMULATIONS = 1000\n", "\n", "SI_random_graph = []\n", "for i in range(N_SIMULATIONS):\n", " SI_random_graph.append(\n", " get_infection_quant(G=g_random, beta=beta, random_state=seed * (i + 1))\n", " )" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "SI_random_graph_means = pd.DataFrame(SI_random_graph).mean(axis=0)\n", "\n", "plt.subplots(1, 1, figsize=(8, 4))\n", "SI_random_graph_means.plot()\n", "plt.title(\"SI_random_graph\")\n", "plt.xlabel(\"time\")\n", "plt.ylabel(\"Average infection speed\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we can see, the growth of the random graph is faster, comparing with real.\n", "\n", "And as expected the curves grow exponentially shortly after the system is infected. \n", "\n", ">> ### 4.4 Compare with EoN modeling\n", "\n", "There is NO clean SI model in the library, but we may just as well take SIR model and set recovery rate as 0. 
\n", "\n", "Go for it and compare the results : " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.subplots(1, 1, figsize=(18, 8))\n", "\n", "# take my real graph\n", "G = g\n", "\n", "# parameters\n", "tmax = 50 # time ending\n", "iterations = 5 # run N simulations\n", "tau = 0.3 # transmission rate\n", "gamma = 0.0 # recovery rate\n", "rho = 0.01 # random fraction initially infected\n", "\n", "\n", "# ODE (Ordinary differential equation) predictions:\n", "# run simulations\n", "for counter in range(iterations):\n", " t, S, I, R = EoN.fast_SIR(G, tau, gamma, rho=rho, tmax=tmax)\n", " if counter == 0:\n", " plt.plot(t, I, color=\"k\", alpha=0.3, label=\"Simulation\")\n", " plt.plot(t, I, color=\"k\", alpha=0.3)\n", "\n", "\n", "# we expect a homogeneous model to perform poorly because the degree\n", "# distribution is very heterogeneous\n", "t, S, I, R = EoN.SIR_homogeneous_pairwise_from_graph(G, tau, gamma, rho=rho, tmax=tmax)\n", "plt.plot(t, I, \"-.\", label=\"Homogeneous pairwise\", linewidth=5)\n", "\n", "\n", "# meanfield models will generally overestimate SIR growth because they\n", "# treat partnerships as constantly changing.\n", "t, S, I, R = EoN.SIR_heterogeneous_meanfield_from_graph(\n", " G, tau, gamma, rho=rho, tmax=tmax\n", ")\n", "plt.plot(t, I, \":\", label=\"Heterogeneous meanfield\", linewidth=5)\n", "\n", "\n", "# The EBCM model does not account for degree correlations or clustering\n", "t, S, I, R = EoN.EBCM_from_graph(G, tau, gamma, rho=rho, tmax=tmax)\n", "plt.plot(t, I, \"--\", label=\"EBCM approximation\", linewidth=5)\n", "\n", "\n", "# the preferential mixing model captures degree correlations.\n", "t, S, I, R = EoN.EBCM_pref_mix_from_graph(G, tau, gamma, rho=rho, tmax=tmax)\n", "plt.plot(t, I, label=\"Pref mix EBCM\", linewidth=5, dashes=[4, 2, 1, 2, 1, 2])\n", "\n", "\n", "plt.xlabel(\"$t$\")\n", "plt.ylabel(\"Number infected\")\n", "plt.legend()\n", "\n", "plt.title(\"Infected SI distibution for real graph\")\n", "\n", "# save pic if you need\n", "# plt.savefig('../../img/SI_my_graph_EoN.png')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> you will have to get something like this: \n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Nice! We got very similar results with different implementations of model. \n", "\n", ">In the naive model of SI, once one is infected, it is always infectious. However, this is not realistic for many situations of disease spreading. For many diseases, people recover after a certain time because their immune systems act to fight with the diseases. \n", "\n", ">There is usually a status of recovery denoted by R. Let γ denote the removal or recovery rate. Usually, researchers are more interested in its reciprocal (1/γ) which determines the average infectious period." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ">>> ## 5. SIR Model\n", "\n", ">> ### 5.1 Statement of the model\n", "\n", "We already know something abour SIR, let's repeat : in the first stage, susceptible individuals become infected by the infectious ones with who they contact. Similar to the SI model, β is the transmission rate between individuals; In the second stage, infected individuals recover at the average rate γ. 
Given the premise that the underlying epidemiological rates are constant, the differential equations of the simple SIR model (with no births, deaths, or migrations) are:\n", "\n", "\\begin{equation}\n", " \\begin{cases}\n", " \\cfrac{ds_i(t)}{dt} = -\\beta s_i(t)\\sum\\limits_j A_{ij} x_j(t)\\\\\n", " \\cfrac{dx_i(t)}{dt} = \\beta s_i(t)\\sum\\limits_j A_{ij} x_j(t) - \\gamma x_i(t)\\\\\n", " \\cfrac{dr_i(t)}{dt} = \\gamma x_i(t)\n", " \\end{cases}\n", " \\\\\n", " x_i(t) + s_i(t) + r_i(t) = 1\n", "\\end{equation}\n", "\n", "> However, the differential equations above cannot be solved analytically. In practice, researchers evaluate the SIR model numerically. We will do the same.\n", "\n", "A simpler form of the ODEs (for calculation without a graph): \n", "\n", "\\begin{align*}\n", "\\frac{\\mathrm{d}S}{\\mathrm{d}t} &= -\\frac{\\beta S I}{N},\\\\\n", "\\frac{\\mathrm{d}I}{\\mathrm{d}t} &= \\frac{\\beta S I}{N} - \\gamma I,\\\\\n", "\\frac{\\mathrm{d}R}{\\mathrm{d}t} &= \\gamma I.\n", "\\end{align*}\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Total population\n", "N = 1000\n", "\n", "# Initial number of infected and recovered individuals, I0 and R0\n", "I0, R0 = 1, 0\n", "# Everyone else, S0, is susceptible to infection initially\n", "S0 = N - I0 - R0\n", "\n", "# Contact rate, beta, and mean recovery rate, gamma\n", "beta, gamma = 0.2, 1.0 / 10\n", "# A grid of time points\n", "t = np.linspace(0, 160, 160)\n", "\n", "# The SIR model differential equations.\n", "def sir(y, t, N, beta, gamma):\n", " S, I, R = y\n", " dSdt = -beta * S * I / N\n", " dIdt = beta * S * I / N - gamma * I\n", " dRdt = gamma * I\n", " return dSdt, dIdt, dRdt\n", "\n", "\n", "# Initial conditions vector\n", "y0 = S0, I0, R0\n", "# Integrate the SIR equations over the time grid, t.\n", "sir_ = odeint(sir, y0, t, args=(N, beta, gamma))\n", "S, I, R = sir_.T\n", "\n", "\n", "# Plot the data on three separate curves for S(t), I(t) and R(t)\n", "fig = plt.figure(figsize=(18, 8))\n", "ax = fig.add_subplot(\"111\", axisbelow=True)\n", "ax.plot(t, S / 1000, \"b\", alpha=0.5, lw=2, label=\"Susceptible\")\n", "ax.plot(t, I / 1000, \"r\", alpha=0.5, lw=2, label=\"Infected\")\n", "ax.plot(t, R / 1000, \"g\", alpha=0.5, lw=2, label=\"Recovered with immunity\")\n", "ax.set_xlabel(\"Time\")\n", "ax.set_ylabel(\"Population \")\n", "ax.set_ylim(0, 1.2)\n", "ax.yaxis.set_tick_params(length=0)\n", "ax.xaxis.set_tick_params(length=0)\n", "ax.grid(b=True, which=\"major\", c=\"w\", lw=2, ls=\"-\")\n", "legend = ax.legend()\n", "legend.get_frame().set_alpha(0.5)\n", "for spine in (\"top\", \"right\", \"bottom\", \"left\"):\n", " ax.spines[spine].set_visible(False)\n", "\n", "plt.title(\"SIR epidemic theoretical\")\n", "plt.show()" ] }, 
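{ "cell_type": "markdown", "metadata": {}, "source": [ "A small sanity check of the numerical solution (a sketch): in the SIR model nobody enters or leaves the population, so $S + I + R$ should equal $N$ at every time step, up to the solver's numerical error." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# the compartments should always sum to the total population N\n", "print(\"max |S + I + R - N| over time:\", np.abs(S + I + R - N).max())" ] }, 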
matrix\n", "A = np.array(nx.adj_matrix(G).todense())\n", "\n", "# Spreading\\restoring coefficient\n", "beta, gamma = 0.3, 0.2\n", "\n", "# Time domain\n", "t = np.arange(0, 15, 0.05)\n", "\n", "# Initial state\n", "idx = np.random.choice(range(n), 30)\n", "i0 = np.zeros((n,))\n", "i0[idx] = 1\n", "z0 = np.concatenate((1 - i0, i0, np.zeros((n,))))\n", "\n", "# System of differential equations:\n", "def sir(z, t, A, n, beta, gamma):\n", " return np.concatenate(\n", " (\n", " -beta * z[0:n] * A.dot(z[n : 2 * n]),\n", " beta * z[0:n] * A.dot(z[n : 2 * n]) - gamma * z[n : 2 * n],\n", " gamma * z[n : 2 * n],\n", " )\n", " )\n", "\n", "\n", "# solve\n", "z = odeint(sir, z0, t, (A, n, beta, gamma))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plot average over all nodes\n", "s = z[:, 0:n].mean(axis=1)\n", "x = z[:, n : 2 * n].mean(axis=1)\n", "r = z[:, 2 * n : 3 * n].mean(axis=1)\n", "\n", "fig, ax = plt.subplots(1, 1, figsize=(18, 8))\n", "ax.plot(s, color=\"blue\", label=\"Susceptible\")\n", "ax.plot(x, color=\"red\", label=\"Infected\")\n", "ax.plot(r, color=\"green\", label=\"Recovered with immunity\")\n", "\n", "ax.set_xlabel(\"Time\")\n", "ax.set_ylabel(\"Population\")\n", "ax.set_title(\"Average results for SIR over all nodes for real graph\", fontsize=15)\n", "\n", "plt.legend()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see, that results seem to be similar to our theory implementation.\n", "\n", "Now it's time for random graph.\n", "\n", ">> ### 5.3 Implementation in Random Graph" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# take random graph\n", "n = len(g_random)\n", "G = g_random\n", "\n", "# Get adj. matrix\n", "A = np.array(nx.adj_matrix(G).todense())\n", "\n", "# Spreading\\restoring coefficient\n", "beta, gamma = 0.3, 0.2\n", "\n", "# Time domain\n", "t = np.arange(0, 15, 0.05)\n", "\n", "# Initial state\n", "idx = np.random.choice(range(n), 30)\n", "i0 = np.zeros((n,))\n", "i0[idx] = 1\n", "z0 = np.concatenate((1 - i0, i0, np.zeros((n,))))\n", "\n", "# solve\n", "z = odeint(sir, z0, t, (A, n, beta, gamma))\n", "\n", "# Plot average over all nodes\n", "s = z[:, 0:n].mean(axis=1)\n", "x = z[:, n : 2 * n].mean(axis=1)\n", "r = z[:, 2 * n : 3 * n].mean(axis=1)\n", "\n", "fig, ax = plt.subplots(1, 1, figsize=(18, 8))\n", "ax.plot(s, color=\"blue\", label=\"Susceptible\")\n", "ax.plot(x, color=\"red\", label=\"Infected\")\n", "ax.plot(r, color=\"green\", label=\"Recovered with immunity\")\n", "\n", "ax.set_xlabel(\"Time\")\n", "ax.set_ylabel(\"Population\")\n", "ax.set_title(\"Average results for SIR over all nodes for random graph\", fontsize=15)\n", "\n", "plt.legend()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we can notice, that random graph : \n", "\n", "Infected curve has more smoother peak\n", "\n", "Susceptible curve has more steeper descent.\n", "\n", ">> ### 5.4 Compare with EoN modeling\n", "\n", "Let's build only infected curve, it's pretty representative." 
{ "cell_type": "markdown", "metadata": {}, "source": [ ">> ### 5.4 Compare with EoN modeling\n", "\n", "Let's plot only the infected curve; it's representative enough." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.subplots(1, 1, figsize=(18, 8))\n", "\n", "# take my real graph\n", "G = g\n", "\n", "# parameters\n", "tmax = 20  # time ending\n", "iterations = 5  # run N simulations\n", "tau = 0.3  # transmission rate\n", "gamma = 0.2  # recovery rate\n", "# the same initial fraction as in my models above:\n", "rho = 30 / (len(g.nodes))  # random fraction initially infected\n", "\n", "\n", "# ODE (Ordinary differential equation) predictions:\n", "# run simulations\n", "for counter in range(iterations):\n", "    t, S, I, R = EoN.fast_SIR(G, tau, gamma, rho=rho, tmax=tmax)\n", "    if counter == 0:\n", "        plt.plot(t, I, color=\"k\", alpha=0.3, label=\"Simulation\")\n", "    plt.plot(t, I, color=\"k\", alpha=0.3)\n", "\n", "\n", "# we expect a homogeneous model to perform poorly because the degree\n", "# distribution is very heterogeneous\n", "t, S, I, R = EoN.SIR_homogeneous_pairwise_from_graph(G, tau, gamma, rho=rho, tmax=tmax)\n", "plt.plot(t, I, \"-.\", label=\"Homogeneous pairwise\", linewidth=5)\n", "\n", "\n", "# meanfield models will generally overestimate SIR growth because they\n", "# treat partnerships as constantly changing.\n", "t, S, I, R = EoN.SIR_heterogeneous_meanfield_from_graph(\n", "    G, tau, gamma, rho=rho, tmax=tmax\n", ")\n", "plt.plot(t, I, \":\", label=\"Heterogeneous meanfield\", linewidth=5)\n", "\n", "\n", "# The EBCM model does not account for degree correlations or clustering\n", "t, S, I, R = EoN.EBCM_from_graph(G, tau, gamma, rho=rho, tmax=tmax)\n", "plt.plot(t, I, \"--\", label=\"EBCM approximation\", linewidth=5)\n", "\n", "\n", "# the preferential mixing model captures degree correlations.\n", "t, S, I, R = EoN.EBCM_pref_mix_from_graph(G, tau, gamma, rho=rho, tmax=tmax)\n", "plt.plot(t, I, label=\"Pref mix EBCM\", linewidth=5, dashes=[4, 2, 1, 2, 1, 2])\n", "\n", "\n", "plt.xlabel(\"$t$\")\n", "plt.ylabel(\"Number infected\")\n", "plt.legend()\n", "\n", "plt.title(\"Infected SIR distribution for real graph\")\n", "\n", "# save pic if you need\n", "# plt.savefig('../../img/SIR_my_graph_EoN.png')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> You should get something like this: \n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Comparing the curves, we can see that the results are close to each other, so our model is good enough for this purpose. \n", "\n", "Now let's move on to the next model. \n", "\n", ">>> ## 6. SIS Model\n", "\n", ">> ### 6.1 Statement of the model\n", "\n", "As we remember, the SIS model allows for reinfection: if individuals are not immune to the disease after recovery, they can be infected more than once.\n", "\n", "There are only two states, susceptible and infected, and infected individuals become susceptible again after recovery. The differential equations for the simple SIS epidemic model are: \n", "\n", "\\begin{equation}\n", "    \\begin{cases}\n", "    \\cfrac{ds_i(t)}{dt} = -\\beta s_i(t)\\sum\\limits_j A_{ij}x_j(t) + \\gamma x_i(t)\\\\\n", "    \\cfrac{dx_i(t)}{dt} = \\beta s_i(t)\\sum\\limits_j A_{ij}x_j(t) - \\gamma x_i(t)\n", "    \\end{cases}\n", "    \\\\\n", "    x_i(t) + s_i(t) = 1\n", "\\end{equation}\n", "where $x_i(t)$ and $s_i(t)$ are the probabilities for a node $v_i$ to be infected or susceptible.\n", "\n" ] },
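{ "cell_type": "markdown", "metadata": {}, "source": [ "One useful consequence worth stating here (a standard result for the homogeneous SIS model, not derived in this tutorial): setting $\\mathrm{d}x/\\mathrm{d}t = 0$ gives an endemic equilibrium at $x^* = 1 - \\gamma/\\beta$ whenever $\\beta > \\gamma$. With the parameters used in the next cell ($\\beta = 1.4247$, $\\gamma = 0.14286$) this gives $x^* \\approx 0.9$, which is exactly the plateau you will see in the infected curve." ] },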
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.subplots(1, 1, figsize=(18, 8))\n", "\n", "beta = 1.4247\n", "gamma = 0.14286\n", "I0 = 1e-6\n", "INPUT = (1.0 - I0, I0)\n", "t_range = np.arange(0, 21, 1)\n", "\n", "\n", "def sis(INP, t):\n", "    Y = np.zeros((2))\n", "    V = INP\n", "    Y[0] = -beta * V[0] * V[1] + gamma * V[1]\n", "    Y[1] = beta * V[0] * V[1] - gamma * V[1]\n", "    return Y  # For odeint\n", "\n", "\n", "sis_ = odeint(sis, INPUT, t_range)\n", "\n", "# Plotting\n", "plt.plot(sis_[:, 0], \"-bs\", label=\"Susceptible\")\n", "plt.plot(sis_[:, 1], \"-ro\", label=\"Infected\")\n", "plt.legend(loc=0)\n", "plt.title(\"SIS epidemic theoretical\")\n", "plt.xlabel(\"Time\")\n", "plt.ylabel(\"Population\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "\n", ">> ### 6.2 Implementation in Real Graph\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# take my real graph\n", "n = len(g)\n", "G = g\n", "\n", "# Get adjacency matrix\n", "A = np.array(nx.adjacency_matrix(G).todense())\n", "\n", "# Spreading / recovery coefficients\n", "beta, gamma = 0.3, 0.2\n", "\n", "# Time domain\n", "t = np.arange(0, 7, 0.05)\n", "\n", "# Initial state: 10% of the nodes (distinct) start out infected\n", "idx = np.random.choice(range(n), int(n * 0.1), replace=False)\n", "i0 = np.zeros((n,))\n", "i0[idx] = 1\n", "z0 = np.concatenate((1 - i0, i0))\n", "\n", "# System of differential equations:\n", "def sis(z, t, A, n, beta, gamma):\n", "    return np.concatenate(\n", "        (\n", "            -beta * z[0:n] * A.dot(z[n : 2 * n]) + gamma * z[n : 2 * n],\n", "            beta * z[0:n] * A.dot(z[n : 2 * n]) - gamma * z[n : 2 * n],\n", "        )\n", "    )\n", "\n", "\n", "# solve\n", "z = odeint(sis, z0, t, (A, n, beta, gamma))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plot average over all nodes\n", "s = z[:, 0:n].mean(axis=1)\n", "x = z[:, n : 2 * n].mean(axis=1)\n", "\n", "fig, ax = plt.subplots(1, 1, figsize=(18, 8))\n", "ax.plot(s, color=\"blue\", label=\"Susceptible\")\n", "ax.plot(x, color=\"red\", label=\"Infected\")\n", "\n", "ax.set_xlabel(\"Time\")\n", "ax.set_ylabel(\"Population\")\n", "ax.set_title(\"Average results for SIS over all nodes for real graph\", fontsize=15)\n", "\n", "plt.legend()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now it's time for the random graph.\n", "\n", ">> ### 6.3 Implementation in Random Graph" ] },
matrix\n", "A = np.array(nx.adjacency_matrix(G).todense())\n", "\n", "# Spreading\\restoring coefficient\n", "beta, gamma = 0.3, 0.2\n", "\n", "# Time domain\n", "t = np.arange(0, 7, 0.05)\n", "\n", "# Initial state\n", "idx = np.random.choice(range(n), int(n * 0.1))\n", "i0 = np.zeros((n,))\n", "i0[idx] = 1\n", "# i0 = np.random.random_integers(0,1,[n,])\n", "z0 = np.concatenate((1 - i0, i0))\n", "\n", "# solve\n", "z = odeint(sis, z0, t, (A, n, beta, gamma))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plot average over all nodes\n", "s = z[:, 0:n].mean(axis=1)\n", "x = z[:, n : 2 * n].mean(axis=1)\n", "\n", "fig, ax = plt.subplots(1, 1, figsize=(18, 8))\n", "ax.plot(s, color=\"blue\", label=\"Susceptible\")\n", "ax.plot(x, color=\"red\", label=\"Infected\")\n", "\n", "ax.set_xlabel(\"Time\")\n", "ax.set_ylabel(\"Population\")\n", "ax.set_title(\"Average results for SIS over all nodes for random graph\", fontsize=15)\n", "\n", "plt.legend()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again similar results and again a bit sharper curves. \n", "\n", "Let's compare with EoN.\n", "\n", "\n", ">> ### 6.4 Compare with EoN modeling\n", "\n", "As in last case, let's build only infected curve." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.clf()\n", "\n", "plt.subplots(1, 1, figsize=(18, 8))\n", "\n", "# take my real graph\n", "G = g\n", "\n", "# parameters\n", "tmax = 9 # time ending\n", "iterations = 5 # run N simulations\n", "tau = 0.3 # transmission rate\n", "gamma = 0.2 # recovery rate\n", "rho = 0.01 # random fraction initially infected\n", "\n", "for counter in range(iterations):\n", " t, S, I = EoN.fast_SIS(G, tau, gamma, rho=rho, tmax=tmax)\n", " if counter == 0:\n", " plt.plot(t, I, color=\"k\", alpha=0.3, label=\"Simulation\")\n", " plt.plot(t, I, color=\"k\", alpha=0.3)\n", "\n", "\n", "# we expect a homogeneous model to perform poorly because the degree\n", "# distribution is very heterogeneous\n", "t, S, I = EoN.SIS_homogeneous_pairwise_from_graph(G, tau, gamma, rho=rho, tmax=tmax)\n", "plt.plot(t, I, \"-.\", label=\"Homogeneous pairwise\", linewidth=5)\n", "\n", "t, S, I = EoN.SIS_heterogeneous_meanfield_from_graph(G, tau, gamma, rho=rho, tmax=tmax)\n", "plt.plot(t, I, \":\", label=\"Heterogeneous meanfield\", linewidth=5)\n", "\n", "t, S, I = EoN.SIS_compact_pairwise_from_graph(G, tau, gamma, rho=rho, tmax=tmax)\n", "plt.plot(t, I, \"--\", label=\"Compact pairwise\", linewidth=5)\n", "\n", "plt.xlabel(\"Time\")\n", "plt.ylabel(\"Number infected\")\n", "plt.legend()\n", "plt.title(\"Infected SIS distibution for real graph\")\n", "\n", "# save pic if you need\n", "# plt.savefig('../../img/SIS_my_graph_EoN.png')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> you will have to get something like this: \n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It's some error in accuracy of modeling here, but if we take, for example, 100 iterations and get average over them - the result will be more precise. You can try to set even more iterations, for sure, but avoid the dirtyness of the picture : there will be a lot of simulation lines, or you can just plot only average one.\n", "\n", "So what about results? \n", "\n", "We got in this model values are close to expected theoretical whether it was our model or it was the EoN model.\n", "\n", "\n", "\n", "\n", ">>> ## 7. Conclusion\n", "\n", "\n", "\n", "That's all for today. 
{ "cell_type": "markdown", "metadata": {}, "source": [ ">>> ## 7. Conclusion\n", "\n", "That's all for today. I think this is enough information for any newcomer to networks.\n", "\n", "What have we done while going through this tutorial?\n", "\n", "> We have explored quite different parts of working with networks: \n", "\n", "`Firstly, we collected data:`\n", "- That was not so easy if you are new to APIs, and maybe you skipped that part if you are not a VK user. But I think it's worth reading anyway: you can pick up some useful ideas for your future tasks and research. \n", "\n", "`Secondly, we did some preprocessing of the data:`\n", "- We changed all the names and IDs of real people. For this mini-task we had to find and load third-party datasets, then generate a random set of people and map it to our real data. \n", "\n", "`Thirdly, we met a cool library for network analysis - NetworkX:`\n", "- We learned how to get started with a proper graph implementation in NetworkX: creating a graph from scratch, random graphs, saving, loading, etc. \n", "\n", "- Although we didn't go deep into the classes and functions of this package (that wasn't the purpose of this tutorial), we explored many methods that you can use right away. After this quick start you can improve your skills on your own: just read the documentation and work through other tutorials - it won't be challenging for you now. \n", "\n", "- We created a nice function for visualizing graphs and discovered another easy way to do it. You can always play around with parameters and other built-in functions from the package - visit the documentation and you will find many more interesting techniques. \n", "\n", "`Fourthly, that was an amazing introduction to epidemic modeling!`\n", "- We got a bit more familiar with the EoN library: as with NetworkX, we didn't dig into everything in this package, but we used it for all the ideas we needed in this tutorial! And don't be afraid of experimentation - there is plenty of room for self-improvement in every direction you can think of. \n", "\n", "- We studied the 3 main models - from the theoretical side, on the real graph and on the random one! We laid the groundwork for future analysis: we now have many useful instruments, and you can find even more if you get interested in this topic. I hope you do. \n", "\n", "`Finally, some recommendations:`\n", "- To get deeper into EoN, visit the official documentation: https://media.readthedocs.org/pdf/epidemicsonnetworks/latest/epidemicsonnetworks.pdf\n", "- As you remember, we worked with a real graph loaded from a social network. I really encourage you to load your own social network and look for something interesting there - dependencies and clusters, for a start. It will be much more interesting if you load a graph of depth not just 2, but 3, 4... (super complex). And since you'll have to load friends, you'll also have to make API requests - so I recommend getting familiar with that area if you are not yet.\n", "\n", "- And speaking of social networks, Granovetter's paper \"The Strength of Weak Ties\" is totally worth reading to start with." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.6" } }, "nbformat": 4, "nbformat_minor": 2 }