{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import networkx as nx\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "from scipy import stats\n",
    "import scipy as sp\n",
    "import datetime as dt\n",
    "\n",
    "from ei_net import * \n",
    "from ce_net import * \n",
    "\n",
    "from collections import Counter\n",
    "\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "##########################################\n",
    "############ PLOTTING SETUP ##############\n",
    "EI_cmap = \"Greys\"\n",
    "where_to_save_pngs = \"../figs/pngs/\"\n",
    "where_to_save_pdfs = \"../figs/pdfs/\"\n",
    "save = True\n",
    "plt.rc('axes', axisbelow=True)\n",
    "##########################################\n",
    "##########################################"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# The emergence of informative higher scales in complex networks"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Chapter 07: Effective Information Differences in Real Networks"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The presence and informativeness of macroscales should vary across real networks, dependent on connectivity. Here we investigate the disposition toward causal emergence of real networks across different domains. We draw from the same set of networks analyzed in Chapter 04. The network sizes span up to 40,000 nodes, thus making it unfeasible to find the the best macroscales for each of them. Therefore, we focus specifically on the two categories that previously showed the greatest divergence in terms of the $EI$: biological and technological. Since we are interested in the general question of whether biological or technological networks show a greater disposition or propensity for causal emergence, we approximate causal emergence by calculating the causal emergence of sampled subgraphs of growing sizes. Each sample is found using a \"snowball sampling\" procedure, wherein a node is chosen randomly and then a weakly connected subgraph of a specified size is found around it. This subgraph is then analyzed using the previously described greedy algorithmic approach to find macro-nodes that maximized the $EI$ in each network. Each available network is sampled 20 times for each size taken from it. \n",
    "\n",
    "Here, we show how the causal emergence of these real networks differentiates as we increase the sampled subgraph size, in a sequence of 50, 100, 150, and finally 200 nodes per sample. Networks of these sizes previously provided ample  evidence of causal emergence in simulated networks. Comparing the two categories of real networks, we observe a significantly greater propensity for causal emergence in biological networks, and that this is more articulated the larger the samples are. Note that constructing a random null model of these networks (e.g., a configuration model) would tend to create networks with minimal or negligible causal emergence, as is the case for ER networks.\n",
    "\n",
    "That subsets of biological systems show a high disposition toward causal emergence is consistent, and even explanatory, of many long-standing hypotheses surrounding the existence of noise and degeneracy in biological systems. It also explains the difficulty of understanding how the causal structure of biological systems function, since they are cryptic by containing certainty at one level and uncertainty at another."
   ]
  },
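  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a brief reminder of the quantity being estimated, and assuming the definition used in the earlier chapters of this series (it is not restated in this chapter), the causal emergence of a sampled subgraph $G_s$ can be written as the gain in $EI$ at the macroscale:\n",
    "\n",
    "$$ CE(G_s) = EI(G_M) - EI(G_s), $$\n",
    "\n",
    "where $G_M$ denotes the macroscale network returned by the greedy search on $G_s$. The symbols $G_s$ and $G_M$ are notation introduced here for illustration. Under this reading, $CE(G_s) = 0$ means the search found no grouping of nodes with a higher $EI$ than the sampled subgraph itself."
   ]
  },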
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "________________________"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7.1 Sampling subgraphs to estimate causal emergence"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "def snowball_sample(G, n_seeds=1, n_total=0.2, n_waves=5):\n",
    "    \"\"\"\n",
    "    This function defines a procedure for \"snowball sampling\" a graph. The \n",
    "    algorithm starts from a (usually) single seed node, which it then expands\n",
    "    outward from, collecting the nodes in waves around it in a \"snowball\" \n",
    "    manner.\n",
    "    \n",
    "    Params\n",
    "    ------\n",
    "    G (nx.Graph): the graph to be sampled.\n",
    "    n_seeds (int): the number of seed nodes that the snowball sampling \n",
    "                   starts from.\n",
    "    n_total (float): the fraction of the total size that will be sampled.\n",
    "    n_waves (int): usually set high, this determines how many shells outward\n",
    "                   the snowballing will continue.\n",
    "    \n",
    "    Returns\n",
    "    -------\n",
    "    (V_s, E_s) (list, list): the nodelist and edgelist of the sampled graph.\n",
    "    \n",
    "    \"\"\"\n",
    "\n",
    "    nodes = list(G.nodes())\n",
    "    np.random.shuffle(nodes)\n",
    "    n_total = int(len(nodes)*n_total)\n",
    "    if n_total < 5: \n",
    "        n_total = 5\n",
    "    \n",
    "    V = [set()]*(n_waves+1)\n",
    "    \n",
    "    V_s = set()\n",
    "    E_s = set()\n",
    "    \n",
    "    for k in range(n_waves+1):\n",
    "        if k==0:\n",
    "            V[k] = set(nodes[:n_seeds])\n",
    "            nodes = set(nodes)\n",
    "        else:\n",
    "            for node_i in V[k-1]:\n",
    "                for node_j in G.neighbors(node_i):\n",
    "                    edge = (node_i,node_j) if node_i >= node_j else (node_j,node_i)\n",
    "                    E_s.add(edge)\n",
    "                    V[k].add(node_j)\n",
    "                    if len(V_s.union(V[k].intersection(nodes-V_s))) > n_total:\n",
    "                        V[k] = V[k].intersection(nodes-V_s)\n",
    "                        V_s = V_s.union(V[k])\n",
    "                        break\n",
    "                        \n",
    "            V[k] = V[k].intersection(nodes-V_s)\n",
    "\n",
    "        V_s = V_s.union(V[k])\n",
    "        \n",
    "    E_s = list(E_s)\n",
    "    return list(V_s), E_s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 504x403.2 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "G = nx.barabasi_albert_graph(40,1)\n",
    "samp,_ = snowball_sample(G,1)\n",
    "Gs = G.subgraph(samp).copy()\n",
    "\n",
    "mult=1.4\n",
    "fig,ax=plt.subplots(1,1,figsize=(5*mult,4*mult))\n",
    "pos = nx.kamada_kawai_layout(G)\n",
    "pos = nx.spring_layout(G,pos=pos,iterations=1)\n",
    "nx.draw_networkx_nodes(G,pos,node_color='gainsboro',node_size=350,\n",
    "                       edgecolors='#999999',linewidths=2.5,ax=ax)\n",
    "nx.draw_networkx_edges(G,pos,edge_color='#999999',alpha=0.7,width=3.5,ax=ax)\n",
    "xxx = [i for i in G.nodes() if i not in samp]\n",
    "labs = dict(zip(xxx,xxx))\n",
    "nx.draw_networkx_labels(G,pos,labels=labs,ax=ax,font_size=10,zorder=0)\n",
    "\n",
    "nodes = nx.draw_networkx_nodes(Gs,pos,node_color='#f5d13f',edgecolors='#333333',\n",
    "                       node_size=500,linewidths=2.5,ax=ax)\n",
    "edges = nx.draw_networkx_edges(Gs,pos,edge_color='#333333',alpha=0.7,width=3.5,ax=ax)\n",
    "\n",
    "labs = dict(zip(Gs.nodes(),[r\"$s_{%i}$\"%i for i in Gs.nodes()]))\n",
    "nx.draw_networkx_labels(Gs,pos,labels=labs,ax=ax,font_size=12)\n",
    "\n",
    "titl = \"{\"+ \", \".join(list(labs.values()))+\"}\" + r\"$\\in \\mathbf{S}$\"\n",
    "ax.set_title(titl,loc='center',pad=-320,fontsize=14)\n",
    "\n",
    "ax.set_axis_off()\n",
    "\n",
    "if save:\n",
    "    plt.savefig(where_to_save_pngs+\"Network_snowball.png\", dpi=425, bbox_inches='tight')\n",
    "    plt.savefig(where_to_save_pdfs+\"Network_snowball.pdf\", bbox_inches='tight')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__________________"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7.2 Estimating Causal Emergence in Biological and Technological Networks"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "json_data = open('../data/sampled_causalemergence.json',\"r\").read()\n",
    "consolidata = json.loads(json_data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "type_mapping = {'brains':'Biological', \n",
    "                'citations':'Information', \n",
    "                'coauthorship':'Information', \n",
    "                'communication':'Social', \n",
    "                'computer':'Technological',\n",
    "                'humancontact':'Social', \n",
    "                'humansocial':'Social', \n",
    "                'hyperlink':'Information', \n",
    "                'infrastructure':'Technological',\n",
    "                'lexical':'Information', \n",
    "                'metabolic':'Biological', \n",
    "                'onlinesocial':'Social', \n",
    "                'power':'Technological', \n",
    "                'software':'Information',\n",
    "                'technological':'Technological', \n",
    "                'trophic':'Biological'}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "fig,ax = plt.subplots(1,1,figsize=(6,4))\n",
    "bio_noise = {50:[],100:[],150:[],200:[]}\n",
    "tec_noise = {50:[],100:[],150:[],200:[]}\n",
    "maxy = 0\n",
    "for name,net in consolidata.items():\n",
    "    col = net['Graph_Info']['Color']\n",
    "    if \"#\" not in net['Graph_Info']['Color']:\n",
    "        col = \"#\"+net['Graph_Info']['Color']\n",
    "    \n",
    "    \n",
    "    net_type = type_mapping[net['Graph_Info']['Type']]\n",
    "    yvals = net['CE']['CE']\n",
    "    xvals = np.array(net['CE']['N_sample'])[:len(yvals)]\n",
    "    if net_type=='Biological':\n",
    "        if max(yvals)>maxy:\n",
    "            maxy = max(yvals)\n",
    "        for yi,y in enumerate(yvals):\n",
    "            if y >= 0:\n",
    "                if xvals[yi] > 40:\n",
    "                    bio_noise[xvals[yi]].append(y)\n",
    "\n",
    "        xvals = xvals + np.random.uniform(-13,-3,len(yvals))            \n",
    "        ax.scatter(xvals, yvals, s=2, c=col,alpha=0.6,linewidths=0.15,edgecolors='k')\n",
    "\n",
    "    if net_type=='Technological':\n",
    "        if max(yvals)>maxy:\n",
    "            maxy = max(yvals)\n",
    "        for yi,y in enumerate(yvals):\n",
    "            if y >= 0:\n",
    "                if xvals[yi] > 40:\n",
    "                    tec_noise[xvals[yi]].append(y)\n",
    "        xvals = xvals + np.random.uniform(3,13,len(yvals))\n",
    "        ax.scatter(xvals, yvals, s=2, c=col,alpha=0.6,linewidths=0.15,edgecolors='k')\n",
    "\n",
    "plot_bio = [bio_noise[50], bio_noise[100], bio_noise[150], bio_noise[200]]\n",
    "plot_tec = [tec_noise[50], tec_noise[100], tec_noise[150], tec_noise[200]]\n",
    "\n",
    "parts = ax.violinplot(plot_bio, positions=[42, 92, 142, 192], \n",
    "                      showmeans=False, showmedians=False, \n",
    "                      showextrema=False, widths=15)\n",
    "ll = 0\n",
    "for i in range(len(parts['bodies'])):\n",
    "    pc = parts['bodies'][i]\n",
    "    pc.set_edgecolor(\"#ed4f44\")\n",
    "    pc.set_facecolor(\"#ed4f44\")\n",
    "    pc.set_alpha(0.5)\n",
    "    pc.set_linewidth(2.0)\n",
    "    ll += 1\n",
    "    if ll==2:\n",
    "        pc.set_label('Biological')\n",
    "\n",
    "parts = ax.violinplot(plot_tec, positions=[58, 108, 158, 208], \n",
    "                      showmeans=False, showmedians=False, \n",
    "                      showextrema=False, widths=15)\n",
    "ll = 0\n",
    "for i in range(len(parts['bodies'])):\n",
    "    pc = parts['bodies'][i]\n",
    "    pc.set_edgecolor('#00c6c5')\n",
    "    pc.set_facecolor('#00c6c5')\n",
    "    pc.set_alpha(0.5)\n",
    "    pc.set_linewidth(2.0)\n",
    "    ll += 1\n",
    "    if ll==2:\n",
    "        pc.set_label('Technological')\n",
    "        \n",
    "\n",
    "ax.hlines(1.6,138.8+50,162.325+50,color='k')\n",
    "ax.vlines(139.2+50,1.55,1.6,color='k')\n",
    "ax.vlines(162+50,1.55,1.6,color='k')\n",
    "ax.text(145.8+50,1.61,'***',fontsize=12)\n",
    "\n",
    "ax.hlines(1.4,138.8,162.325,color='k')\n",
    "ax.vlines(139.2,1.35,1.4,color='k')\n",
    "ax.vlines(162,1.35,1.4,color='k')\n",
    "ax.text(145.8,1.41,'***',fontsize=12)\n",
    "\n",
    "ax.hlines(1.4,138.8-50,162.325-50,color='k')\n",
    "ax.vlines(139.2-50,1.35,1.4,color='k')\n",
    "ax.vlines(162-50,1.35,1.4,color='k')\n",
    "ax.text(145.8-50,1.41,'***',fontsize=12)\n",
    "\n",
    "ax.hlines(1.44,138.8-100,162.325-100,color='k')\n",
    "ax.vlines(139.2-100,1.39,1.44,color='k')\n",
    "ax.vlines(162-100,1.39,1.44,color='k')\n",
    "ax.text(145.8-100,1.45,'***',fontsize=12)\n",
    "\n",
    "ax.scatter(-1,-1,alpha=0,label='  p < 1e-07 ***')\n",
    "\n",
    "ax.set_xticks([50,100,150,200])\n",
    "ax.grid(linewidth=2.5, color='#999999', alpha=0.2, linestyle='-')\n",
    "\n",
    "ax.set_xlim(30,220)\n",
    "ax.set_ylim(-maxy*0.02,maxy*1.3)\n",
    "ax.set_xlabel(r\"$N_s$\", fontsize=16)\n",
    "ax.set_ylabel(r\"Causal emergence\", fontsize=16)\n",
    "\n",
    "ax.legend(ncol=3,columnspacing=1.5)\n",
    "\n",
    "if save:\n",
    "    plt.savefig(where_to_save_pngs+\"SamplingCE.png\", dpi=425, bbox_inches='tight')\n",
    "    plt.savefig(where_to_save_pdfs+\"SamplingCE.pdf\", bbox_inches='tight')\n",
    "\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "______________________"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## End of Chapter 07. In [Chapter 08](https://nbviewer.jupyter.org/github/jkbren/einet/blob/master/code/Chapter%2008%20-%20Miscellaneous.ipynb) we'll wrap up final details about causal emergence.\n",
    "_______________"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}