{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "*Note: In this workbook, we try to replicate the results of the classic paper \"Talk of the Network: A Complex Systems Look at the Underlying Process of Word-of-Mouth\" by Goldenberg, Libai and Muller (2001). This is a self-teaching exercise.*" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "] add Graphs Distributions DataFrames GLM ProgressMeter" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "using Graphs\n", "\n", "using Distributions, DataFrames, GLM, ProgressMeter\n", "using Dates\n", "using Random: shuffle, seed!" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "seed!(20130810);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 1. Introduction\n", "\n", "In [Talk of the Network](https://www0.gsb.columbia.edu/mygsb/faculty/research/pubfiles/3391/TalkofNetworks.pdf), the authors explore the pattern of personal communication between an individual's core group of friends (strong ties) and a wider set of acquaintances (weak ties). This remarkable study was one of the first in marketing to explore the influence of social networks on the diffusion of marketing messages. The key questions investigated in the paper are:\n", "\n", "- What matters more: strong ties or weak ties?\n", "- What effect does the size of an average individual's network have?\n", "- How does advertising interact with diffusion through weak ties and diffusion through strong ties?\n", "\n", "In this workbook, we focus on replicating the authors' answer to the first question: do strong ties or weak ties have the greater influence on the speed of information dissemination in a network?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 2. Initializing the network" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This study uses a large number of synthetic networks as substrates for studying information diffusion. 
To quote the authors' description of how the networks are created and initialized:\n", "\n", "> *\"Each individual belongs to a single personal network. Each network consists of individuals who are connected by strong ties. In each period, individuals also conduct a finite number of weak tie interactions outside their personal networks... We divide the entire market equally into personal networks, in which each individual can belong to one network. In addition, in each period, every individual conducts random meetings with individuals external to his personal network.\"*\n", "\n", "Given this specification, we use the built-in complete graph generator from [Graphs.jl](https://juliagraphs.org/Graphs.jl/dev/core_functions/simplegraphs_generators/) to build many small complete sub-networks, one per personal network, and then allow individuals from different sub-networks to mingle. Our final data structure is therefore a vector of complete graphs whose size is set by the number of strong ties per individual. Note that each individual in the network has a fixed number of strong ties ($s$) and weak ties ($w$). Strictly speaking, `complete_graph(s)` links each of its $s$ members to the other $s - 1$, so in the code below an individual has $s - 1$ strong ties. Also, when $s$ does not divide the number of nodes, the leftover nodes are dropped (for example, $s = 17$ gives $176 \\times 17 = 2992$ of 3000 nodes)." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "initialize_network (generic function with 1 method)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "function initialize_network(n_nodes::Int, n_strong_ties::Int)\n", " # One complete graph (clique) per personal network; leftover nodes are dropped\n", " n_subnetworks = div(n_nodes, n_strong_ties)\n", " G = [complete_graph(n_strong_ties) for _ in 1:n_subnetworks]\n", " return G\n", "end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 3. Model\n", "\n", "## 3.1 Assumptions\n", "\n", "A node is activated, i.e., an uninformed individual becomes informed, in one of three ways: through a strong tie with probability $\\beta_s$, through a weak tie with probability $\\beta_w$, or through external marketing efforts (advertising) with probability $\\alpha$. 
In line with conventional wisdom, the authors assume $\\alpha < \\beta_w < \\beta_s$.\n", "\n", "At time step $t$, if $m$ of an uninformed individual's strong ties and $j$ of the weak ties it meets are already informed, the probability that the individual becomes informed in this time step is:\n", "\n", "$$\n", "p(t) = 1 - (1 - \\alpha)(1 - \\beta_w)^j(1 - \\beta_s)^m\n", "$$\n", "\n", "This is one minus the probability that advertising and every one of the $j + m$ informed contacts independently fail to inform the individual.\n", "\n", "The outcome variable of interest is the number of time steps that elapse until 95% of the network is informed." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3.2 Execution\n", "\n", "Following our earlier discussion of how the substrate networks are constructed, each node belongs to a complete sub-network. In addition, at each time step every node meets a fixed number $w$ of weak ties chosen at random from sub-networks other than its own.\n", "\n", "*Step 1:* At $t = 0$, the status of all nodes is set to `false`.\n", "\n", "*Step 2:* For each node, the probability $p(t)$ of being informed is calculated using the equation above. A random draw $U$ is made from a standard uniform distribution and compared with $p(t)$. If $U < p(t)$, the status of the node is changed to `true`.\n", "\n", "*Step 3:* Step 2 is repeated in each successive time step until 95% of the total network (of size 3000) is informed." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now look at several helper functions that implement this logic.\n", "\n", "### 3.2.1 Reset node status\n", "\n", "The node status is stored as a vector of `BitVector`s, one per sub-network. At the beginning of each simulation run, we call the following function to set the status of all the nodes to `false`. 
" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "reset_node_status (generic function with 1 method)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "function reset_node_status(G::Vector{Graphs.SimpleGraphs.SimpleGraph{Int}})\n", " node_status = [falses(nv(g)) for g in G]\n", " return node_status\n", "end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.2.2 Updating the status of the nodes\n", "\n", "At each time step, we carry out two tasks for every node. First, the node consults all of its strong ties and meets a random set of weak ties from other sub-networks, and we count how many of these contacts are active (informed). Then, we use these counts to update the node's status.\n", "\n", "The first function counts the number of active strong ties within the node's sub-network. The second function carries out the \"random meetings\" with weak ties described in the paper: for each node, we draw a random sample (without replacement) of size $w$ from sub-networks other than its own and count how many of the sampled nodes are active."
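] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before defining these helpers, it may help to see the activation rule from Section 3.1 on its own. The cell below is a small illustrative sketch and is not part of the replication. The helper name `activation_probability` is hypothetical, and the inputs are arbitrary example values: the lowest parameter levels used later, with $m = 2$ informed strong ties and $j = 1$ informed weak tie. The cell evaluates $p(t)$ and then makes the uniform draw described in Step 2, using its own random number generator so that the seeded global stream used by the simulation is not disturbed." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import Random\n",
"\n",
"# Hypothetical helper (illustration only): probability that an uninformed node becomes\n",
"# informed in one time step, given m informed strong ties and j informed weak-tie contacts.\n",
"# It equals one minus the probability that advertising and every informed contact all fail.\n",
"activation_probability(alpha, beta_w, beta_s, m, j) = 1 - (1 - alpha) * (1 - beta_w)^j * (1 - beta_s)^m\n",
"\n",
"# Arbitrary example inputs: the lowest simulated levels of alpha, beta_w and beta_s,\n",
"# with m = 2 informed strong ties and j = 1 informed weak tie.\n",
"p_example = activation_probability(0.0005, 0.005, 0.01, 2, 1)\n",
"\n",
"# Step 2: the node becomes informed if a standard uniform draw falls below p(t).\n",
"# A separate RNG is used so this cell does not shift the seeded global stream.\n",
"rng = Random.MersenneTwister(42)\n",
"informed = rand(rng) < p_example\n",
"\n",
"println(\"p(t) = \", round(p_example, digits = 4), \", informed this step: \", informed)"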
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "count_active_str_ties (generic function with 1 method)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "function count_active_str_ties(G::Vector{Graphs.SimpleGraphs.SimpleGraph{Int}},\n", " node_network_id::Int,\n", " node::Int,\n", " node_status::Vector{BitVector})\n", " n_active_str_ties = sum([node_status[node_network_id][nbr] for nbr in neighbors(G[node_network_id], node)])\n", " return n_active_str_ties\n", "end" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "random_meetings (generic function with 1 method)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "function random_meetings(G::Vector{Graphs.SimpleGraphs.SimpleGraph{Int}},\n", " node_network_id::Int,\n", " node::Int,\n", " node_status::Vector{BitVector},\n", " n_weak_ties::Int)\n", " # Choose a random sample of size `n_weak_ties` from the other sub-networks and query\n", " # their status. 
We first sample the network id, and use this to sample a random node\n", " # in the sub-network defined by this id. Duplicates are rejected, so we keep drawing\n", " # until `n_weak_ties` distinct contacts have been collected.\n", "\n", " all_network_ids = 1:length(G)\n", "\n", " other_network_ids = all_network_ids[all_network_ids .!= node_network_id]\n", " possible_weak_ties = Tuple{Int, Int}[] # (network id, node) pairs\n", "\n", " while length(possible_weak_ties) < n_weak_ties\n", " rand_network_id = sample(other_network_ids)\n", " rand_nbr = sample(vertices(G[rand_network_id]))\n", " if !((rand_network_id, rand_nbr) in possible_weak_ties)\n", " push!(possible_weak_ties, (rand_network_id, rand_nbr))\n", " end\n", " end\n", "\n", " n_active_wk_ties = count(tie -> node_status[tie[1]][tie[2]], possible_weak_ties)\n", " return n_active_wk_ties\n", "end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, the function below updates the status of all the nodes at each time step: it computes each node's activation probability $p(t)$ and compares it with a uniform random draw." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "update_status! 
(generic function with 1 method)" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "function update_status!(G::Vector{Graphs.SimpleGraphs.SimpleGraph{Int}},\n", " node_status::Vector{BitVector},\n", " n_weak_ties::Int,\n", " alpha::Float64, beta_w::Float64, beta_s::Float64)\n", " # Sub-networks are visited in random order, and so are the nodes within each one.\n", " # Updates take effect immediately, so nodes visited later in the same time step\n", " # already see activations that happened earlier in that step.\n", "\n", " for node_network_id in shuffle(1:length(G))\n", " for node in shuffle(vertices(G[node_network_id]))\n", " n_active_str_ties = count_active_str_ties(G, node_network_id, node, node_status)\n", " n_active_wk_ties = random_meetings(G, node_network_id, node, node_status, n_weak_ties)\n", "\n", " # p(t) from Section 3.1\n", " activation_prob = 1 - (1 - alpha) * (1 - beta_w)^n_active_wk_ties * (1 - beta_s)^n_active_str_ties\n", "\n", " if rand(Uniform()) < activation_prob\n", " node_status[node_network_id][node] = true\n", " end\n", " end\n", " end\n", "\n", " return nothing\n", "end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.2.3 Simulation over the parameter space\n", "\n", "The function `execute_simulation` provides the scaffolding that sweeps the parameter space $(s, w, \\alpha, \\beta_w, \\beta_s)$ and runs the diffusion process on the network at each point. As far as I can tell from the paper, one simulation was run at each point in the parameter space. The paper gives no further details about the execution, except that each parameter takes 7 levels, so that the full factorial design comprises $7^5 = 16{,}807$ simulations. In this workbook, we work with a smaller parameter space of 3 levels per parameter, i.e., $3^5 = 243$ simulations.\n", "\n", "I also assume that the network is rebuilt for each run of the simulation. Since the sub-networks are deterministic complete graphs, the run-to-run randomness comes from the weak-tie meetings, the update order and the activation draws.\n", "\n", "One more thing worth noting: the authors mention that their simulations were written in C, so it would be interesting to compare the execution times with Julia. 
This is a non-standard problem that exercises both Julia's type system and its execution speed (perhaps this will prompt someone to open a pull request!)." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of strong ties per node (s): [5, 17, 29]\n", "Number of weak ties per node (w): [5, 17, 29]\n", "Effect of advertising (α): [0.0005, 0.00525, 0.01]\n", "Effect of weak ties (β_w): [0.005, 0.01, 0.015]\n", "Effect of strong ties (β_s): [0.01, 0.04, 0.07]\n" ] } ], "source": [ "println(\"Number of strong ties per node (s): \", floor.(Int, range(5, stop=29, length=3)))\n", "println(\"Number of weak ties per node (w): \", floor.(Int, range(5, stop=29, length=3)))\n", "println(\"Effect of advertising (α): \", collect(range(0.0005, stop=0.01, length=3)))\n", "println(\"Effect of weak ties (β_w): \", collect(range(0.005, stop=0.015, length=3)))\n", "println(\"Effect of strong ties (β_s): \", collect(range(0.01, stop=0.07, length=3)))" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "((3, 3, 3, 3, 3), 243)" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "parameter_space = [(s, w, alpha, beta_w, beta_s) for s in floor.(Int, range(5, stop=29, length=3)), \n", " w in floor.(Int, range(5, stop=29, length=3)),\n", " alpha in range(0.0005, stop=0.01, length=3),\n", " beta_w in range(0.005, stop=0.015, length=3),\n", " beta_s in range(0.01, stop=0.07, length=3)]\n", "\n", "size(parameter_space), length(parameter_space)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "execute_simulation (generic function with 1 method)" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "function execute_simulation(parameter_space, n_nodes::Int)\n", " # n_nodes dictates how big the network will 
be\n", " # We cannot pre-allocate the output since we do not know for how many time steps the simulation will\n", " # run at each setting\n", "\n", " output = DataFrame(s = Int[], w = Int[], alpha = Float64[],\n", " beta_w = Float64[], beta_s = Float64[],\n", " t = Int[], num_engaged = Int[])\n", "\n", " println(\"Beginning simulation at : \", Dates.format(now(), \"HH:MM\"))\n", " println(\"You might want to grab a cup of coffee while Julia brews the simulation...\")\n", "\n", " @showprogress 1 \"Crunching numbers while you munch...\" for (s, w, alpha, beta_w, beta_s) in parameter_space[1:end]\n", " G = initialize_network(n_nodes, s)\n", " node_status = reset_node_status(G)\n", " # Total number of informed nodes across all sub-networks\n", " num_engaged = sum(count, node_status)\n", "\n", " # Keep updating at this setting until 95% of the network is informed\n", " t = 1\n", " while num_engaged < floor(Int, 0.95 * n_nodes)\n", " update_status!(G, node_status, w, alpha, beta_w, beta_s)\n", " num_engaged = sum(count, node_status)\n", " push!(output, (s, w, alpha, beta_w, beta_s, t, num_engaged))\n", " t += 1\n", " end\n", " end\n", "\n", " return output\n", "end" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Beginning simulation at : 16:05\n", "You might want to grab a cup of coffee while Julia brews the simulation...\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mCrunching numbers while you munch... 100%|███████████████| Time: 0:02:44\u001b[39m\n" ] }, { "data": { "text/html": [ "
" ], "text/latex": [ "\\begin{tabular}{r|ccccccc}\n", "\t& s & w & alpha & beta\\_w & beta\\_s & t & num\\_engaged\\\\\n", "\t\\hline\n", "\t& Int64 & Int64 & Float64 & Float64 & Float64 & Int64 & Int64\\\\\n", "\t\\hline\n", "\t1 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 1 & 1 \\\\\n", "\t2 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 2 & 6 \\\\\n", "\t3 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 3 & 9 \\\\\n", "\t4 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 4 & 11 \\\\\n", "\t5 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 5 & 11 \\\\\n", "\t6 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 6 & 11 \\\\\n", "\t7 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 7 & 14 \\\\\n", "\t8 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 8 & 16 \\\\\n", "\t9 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 9 & 19 \\\\\n", "\t10 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 10 & 21 \\\\\n", "\t11 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 11 & 23 \\\\\n", "\t12 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 12 & 29 \\\\\n", "\t13 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 13 & 31 \\\\\n", "\t14 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 14 & 33 \\\\\n", "\t15 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 15 & 35 \\\\\n", "\t16 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 16 & 40 \\\\\n", "\t17 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 17 & 45 \\\\\n", "\t18 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 18 & 47 \\\\\n", "\t19 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 19 & 52 \\\\\n", "\t20 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 20 & 52 \\\\\n", "\t21 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 21 & 57 \\\\\n", "\t22 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 22 & 61 \\\\\n", "\t23 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 23 & 68 \\\\\n", "\t24 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 24 & 74 \\\\\n", "\t25 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 25 & 79 \\\\\n", "\t26 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 26 & 85 \\\\\n", "\t27 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 27 & 91 \\\\\n", "\t28 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 28 & 97 \\\\\n", "\t29 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 29 & 102 \\\\\n", "\t30 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 30 & 111 \\\\\n", "\t$\\dots$ & $\\dots$ & $\\dots$ & $\\dots$ & $\\dots$ & $\\dots$ & $\\dots$ & 
$\\dots$ \\\\\n", "\\end{tabular}\n" ], "text/plain": [ "\u001b[1m5654×7 DataFrame\u001b[0m\n", "\u001b[1m Row \u001b[0m│\u001b[1m s \u001b[0m\u001b[1m w \u001b[0m\u001b[1m alpha \u001b[0m\u001b[1m beta_w \u001b[0m\u001b[1m beta_s \u001b[0m\u001b[1m t \u001b[0m\u001b[1m num_engaged \u001b[0m\n", " │\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\n", "──────┼─────────────────────────────────────────────────────────────\n", " 1 │ 5 5 0.0005 0.005 0.01 1 1\n", " 2 │ 5 5 0.0005 0.005 0.01 2 6\n", " 3 │ 5 5 0.0005 0.005 0.01 3 9\n", " 4 │ 5 5 0.0005 0.005 0.01 4 11\n", " 5 │ 5 5 0.0005 0.005 0.01 5 11\n", " 6 │ 5 5 0.0005 0.005 0.01 6 11\n", " 7 │ 5 5 0.0005 0.005 0.01 7 14\n", " 8 │ 5 5 0.0005 0.005 0.01 8 16\n", " 9 │ 5 5 0.0005 0.005 0.01 9 19\n", " 10 │ 5 5 0.0005 0.005 0.01 10 21\n", " 11 │ 5 5 0.0005 0.005 0.01 11 23\n", " ⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮\n", " 5645 │ 17 29 0.01 0.015 0.07 1 63\n", " 5646 │ 17 29 0.01 0.015 0.07 2 340\n", " 5647 │ 17 29 0.01 0.015 0.07 3 1002\n", " 5648 │ 17 29 0.01 0.015 0.07 4 1949\n", " 5649 │ 17 29 0.01 0.015 0.07 5 2613\n", " 5650 │ 17 29 0.01 0.015 0.07 6 2902\n", " 5651 │ 29 29 0.01 0.015 0.07 1 238\n", " 5652 │ 29 29 0.01 0.015 0.07 2 1146\n", " 5653 │ 29 29 0.01 0.015 0.07 3 2373\n", " 5654 │ 29 29 0.01 0.015 0.07 4 2921\n", "\u001b[36m 5633 rows omitted\u001b[0m" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "results = execute_simulation(parameter_space, 3000)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 4. Discussion\n", "\n", "To answer the research questions, the authors resort to simple linear regression. 
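\n", "\n", "Written out, the model fitted below regresses $T_{95}$ on the five parameters, where $T_{95}$ is the first time step at which at least 95% of the network is informed. We write the regression coefficients as $b$ rather than $\\beta$ to avoid a clash with the transmission probabilities $\\beta_w$ and $\\beta_s$. Here $b_0$ is the intercept and $\\varepsilon$ is an error term:\n", "\n", "$$\n", "T_{95} = b_0 + b_1 s + b_2 w + b_3 \\alpha + b_4 \\beta_w + b_5 \\beta_s + \\varepsilon\n", "$$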
\n", "\n", "Since our focus in this workbook is on highlighting the strengths of the JuliaGraphs ecosystem, we keep the regression modeling as simple as possible.\n", "\n", "As discussed earlier, the outcome is the time taken for 95% of the network to engage with the message. The features used to predict this outcome are $s$, $w$, $\\alpha$, $\\beta_w$ and $\\beta_s$." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/latex": [ "\\begin{tabular}{r|ccccccc}\n", "\t& s & w & alpha & beta\\_w & beta\\_s & t & num\\_engaged\\\\\n", "\t\\hline\n", "\t& Int64 & Int64 & Float64 & Float64 & Float64 & Int64 & Int64\\\\\n", "\t\\hline\n", "\t1 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 1 & 1 \\\\\n", "\t2 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 2 & 6 \\\\\n", "\t3 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 3 & 9 \\\\\n", "\t4 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 4 & 11 \\\\\n", "\t5 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 5 & 11 \\\\\n", "\t6 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 6 & 11 \\\\\n", "\t7 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 7 & 14 \\\\\n", "\t8 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 8 & 16 \\\\\n", "\t9 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 9 & 19 \\\\\n", "\t10 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 10 & 21 \\\\\n", "\\end{tabular}\n" ], "text/plain": [ "\u001b[1m10×7 DataFrame\u001b[0m\n", "\u001b[1m Row \u001b[0m│\u001b[1m s \u001b[0m\u001b[1m w \u001b[0m\u001b[1m alpha \u001b[0m\u001b[1m beta_w \u001b[0m\u001b[1m beta_s \u001b[0m\u001b[1m t \u001b[0m\u001b[1m num_engaged \u001b[0m\n", " │\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\n", "─────┼─────────────────────────────────────────────────────────────\n", " 1 │ 5 5 0.0005 0.005 0.01 1 1\n", " 2 │ 5 5 0.0005 0.005 0.01 2 6\n", " 3 │ 5 5 0.0005 0.005 0.01 3 9\n", " 4 │ 5 5 0.0005 0.005 0.01 4 11\n", " 5 │ 5 5 0.0005 0.005 0.01 5 11\n", " 6 │ 5 5 0.0005 0.005 0.01 6 11\n", " 7 │ 5 5 0.0005 0.005 0.01 7 14\n", " 8 │ 5 5 0.0005 0.005 0.01 8 16\n", " 9 │ 5 5 0.0005 0.005 0.01 9 19\n", " 10 │ 5 5 0.0005 0.005 0.01 10 21" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "first(results, 10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To build the data required for the linear modeling, we group the data by each parameter setting and calculate the time the network takes 
to reach 95% activation." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/latex": [ "\\begin{tabular}{r|cccccc}\n", "\t& s & w & alpha & beta\\_w & beta\\_s & T95\\\\\n", "\t\\hline\n", "\t& Int64 & Int64 & Float64 & Float64 & Float64 & Int64\\\\\n", "\t\\hline\n", "\t1 & 5 & 5 & 0.0005 & 0.005 & 0.01 & 154 \\\\\n", "\t2 & 17 & 5 & 0.0005 & 0.005 & 0.01 & 64 \\\\\n", "\t3 & 29 & 5 & 0.0005 & 0.005 & 0.01 & 47 \\\\\n", "\t4 & 5 & 17 & 0.0005 & 0.005 & 0.01 & 78 \\\\\n", "\t5 & 17 & 17 & 0.0005 & 0.005 & 0.01 & 43 \\\\\n", "\t6 & 29 & 17 & 0.0005 & 0.005 & 0.01 & 32 \\\\\n", "\t7 & 5 & 29 & 0.0005 & 0.005 & 0.01 & 47 \\\\\n", "\t8 & 17 & 29 & 0.0005 & 0.005 & 0.01 & 33 \\\\\n", "\t9 & 29 & 29 & 0.0005 & 0.005 & 0.01 & 25 \\\\\n", "\t10 & 5 & 5 & 0.00525 & 0.005 & 0.01 & 93 \\\\\n", "\\end{tabular}\n" ], "text/plain": [ "\u001b[1m10×6 DataFrame\u001b[0m\n", "\u001b[1m Row \u001b[0m│\u001b[1m s \u001b[0m\u001b[1m w \u001b[0m\u001b[1m alpha \u001b[0m\u001b[1m beta_w \u001b[0m\u001b[1m beta_s \u001b[0m\u001b[1m T95 \u001b[0m\n", " │\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Float64 \u001b[0m\u001b[90m Int64 \u001b[0m\n", "─────┼────────────────────────────────────────────────\n", " 1 │ 5 5 0.0005 0.005 0.01 154\n", " 2 │ 17 5 0.0005 0.005 0.01 64\n", " 3 │ 29 5 0.0005 0.005 0.01 47\n", " 4 │ 5 17 0.0005 0.005 0.01 78\n", " 5 │ 17 17 0.0005 0.005 0.01 43\n", " 6 │ 29 17 0.0005 0.005 0.01 32\n", " 7 │ 5 29 0.0005 0.005 0.01 47\n", " 8 │ 17 29 0.0005 0.005 0.01 33\n", " 9 │ 29 29 0.0005 0.005 0.01 25\n", " 10 │ 5 5 0.00525 0.005 0.01 93" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# T95 = first time step at which at least 95% of the network is informed\n", "all_engaged = combine(groupby(results, [:s, :w, :alpha, :beta_w, :beta_s]), :t => maximum => :T95);\n", "first(all_engaged, 10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We then fit a simple linear model to the data." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { 
"data": { "text/plain": [ "StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}\n", "\n", "T95 ~ 1 + s + w + alpha + beta_s + beta_w\n", "\n", "Coefficients:\n", "────────────────────────────────────────────────────────────────────────────────────\n", " Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%\n", "────────────────────────────────────────────────────────────────────────────────────\n", "(Intercept) 84.3588 3.04531 27.70 <1e-75 78.3594 90.3581\n", "s -1.01132 0.0742558 -13.62 <1e-30 -1.1576 -0.865031\n", "w -0.824588 0.0742558 -11.10 <1e-22 -0.970874 -0.678303\n", "alpha -1374.92 187.594 -7.33 <1e-11 -1744.48 -1005.35\n", "beta_s -292.798 29.7023 -9.86 <1e-18 -351.313 -234.284\n", "beta_w -1095.06 178.214 -6.14 <1e-08 -1446.15 -743.976\n", "────────────────────────────────────────────────────────────────────────────────────" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ols = lm(@formula(T95 ~ s + w + alpha + beta_s + beta_w), all_engaged)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.6773101916389891" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "r2(ols)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a rather strong finding. All five coefficients are negative and highly significant. Larger personal networks, more weak-tie meetings, stronger advertising and higher transmission probabilities all shorten the time until 95% of the network is informed. The model explains about two thirds of the variance in $T_{95}$ ($R^2 \\approx 0.68$). Because the parameters span very different ranges, the raw coefficients are not directly comparable. Scaled over the levels we simulated, moving $s$ from 5 to 29 lowers the predicted $T_{95}$ by about 24 time steps, and moving $w$ over the same range lowers it by about 20. Moving $\\beta_s$ from 0.01 to 0.07 lowers it by about 18 time steps, and moving $\\beta_w$ from 0.005 to 0.015 lowers it by about 11. Weak ties therefore affect the speed of information diffusion to a degree comparable to strong ties. As the authors note, the surprising aspect of this study is how strong the effect of weak ties is, even though the model assumes that a weak tie is less likely than a strong tie to pass the message on ($\\beta_w < \\beta_s$)."
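] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because the parameters span very different ranges, it helps to rescale each coefficient by the range of levels we simulated. The cell below is a small sketch of that calculation, and the variable names `effects` and `changes` are illustrative. The coefficients are copied by hand from the regression output above, so they will not update if the simulation is re-run; GLM's `coef(ols)` returns the fitted coefficients directly. The cell prints the predicted change in $T_{95}$ when one parameter moves from its lowest to its highest level with the others held fixed. This comparison is only meaningful under the linear model." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# (name, coefficient, lowest level, highest level); coefficients copied from the lm output above\n",
"effects = [\n",
" (\"s\", -1.01132, 5.0, 29.0),\n",
" (\"w\", -0.824588, 5.0, 29.0),\n",
" (\"alpha\", -1374.92, 0.0005, 0.01),\n",
" (\"beta_w\", -1095.06, 0.005, 0.015),\n",
" (\"beta_s\", -292.798, 0.01, 0.07),\n",
"]\n",
"\n",
"# Predicted change in T95 (in time steps) over each parameter's simulated range\n",
"changes = Dict(name => b * (hi - lo) for (name, b, lo, hi) in effects)\n",
"\n",
"for (name, _, _, _) in effects\n",
" println(rpad(name, 8), round(changes[name], digits = 1))\n",
"end"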
] } ], "metadata": { "kernelspec": { "display_name": "Julia 1.8.2", "language": "julia", "name": "julia-1.8" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.8.2" } }, "nbformat": 4, "nbformat_minor": 2 }