{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Resource usage of the StellarGraph class\n",
    "\n",
    "> This notebooks records the time and memory (both peak and long-term) required to construct a StellarGraph object for several datasets."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nbsphinx": "hidden",
    "tags": [
     "CloudRunner"
    ]
   },
   "source": [
    "<table><tr><td>Run the latest release of this notebook:</td><td><a href=\"https://mybinder.org/v2/gh/stellargraph/stellargraph/master?urlpath=lab/tree/demos/internal-developers/graph-resource-usage.ipynb\" alt=\"Open In Binder\" target=\"_parent\"><img src=\"https://mybinder.org/badge_logo.svg\"/></a></td><td><a href=\"https://colab.research.google.com/github/stellargraph/stellargraph/blob/master/demos/internal-developers/graph-resource-usage.ipynb\" alt=\"Open In Colab\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\"/></a></td></tr></table>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This notebook is aimed at helping contributors to the StellarGraph library itself understand how their changes affect the resource usage of the `StellarGraph` object.\n",
    "\n",
    "Various measures of resource usage for several \"real world\" graphs of various sizes are recorded:\n",
    "\n",
    "- time for construction\n",
    "- memory usage of the final `StellarGraph` object\n",
    "- peak memory usage during `StellarGraph` construction (both absolute, and additional compared to the raw input data)\n",
    "\n",
    "These are recorded both with explicit nodes (and node features if they exist), and implicit/inferred nodes.\n",
    "\n",
    "The memory usage is recorded end-to-end. That is, the recording starts from data on disk and continues until the `StellarGraph` object has been constructed and other data has been cleaned up. This is important for accurately recording the total memory usage, as NumPy arrays can often share data with existing arrays in memory and so retroactive or partial (starting from data in memory) analysis can miss significant amounts of data. The parsing code in `stellargraph.datasets` doesn't allow determining the memory usage of the intermediate nodes and edges input to the `StellarGraph` constructor, and so cannot be used here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "nbsphinx": "hidden",
    "tags": [
     "CloudRunner"
    ]
   },
   "outputs": [],
   "source": [
    "# install StellarGraph if running on Google Colab\n",
    "import sys\n",
    "if 'google.colab' in sys.modules:\n",
    "  %pip install -q stellargraph[demos]==1.1.0b"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "nbsphinx": "hidden",
    "tags": [
     "VersionCheck"
    ]
   },
   "outputs": [],
   "source": [
    "# verify that we're using the correct version of StellarGraph for this notebook\n",
    "import stellargraph as sg\n",
    "\n",
    "try:\n",
    "    sg.utils.validate_notebook_version(\"1.1.0b\")\n",
    "except AttributeError:\n",
    "    raise ValueError(\n",
    "        f\"This notebook requires StellarGraph version 1.1.0b, but a different version {sg.__version__} is installed.  Please see <https://github.com/stellargraph/stellargraph/issues/1172>.\"\n",
    "    ) from None"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import stellargraph as sg\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "import gc\n",
    "import json\n",
    "import os\n",
    "import timeit\n",
    "import tempfile\n",
    "import tracemalloc"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Optional reddit data\n",
    "\n",
    "The original GraphSAGE paper evaluated on a reddit dataset, available at <http://snap.stanford.edu/graphsage/#datasets>. This dataset is large (1.3GB compressed) and so there is not automatic download support for it. The following `reddit_path` variable controls whether and how the reddit dataset is included:\n",
    "\n",
    "- to ignore the dataset: set the variable to `None`\n",
    "- to include the dataset: download the dataset zip, decompress it, and set the variable to the decompressed directory"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "tags": [
     "parameters"
    ]
   },
   "outputs": [],
   "source": [
    "reddit_path = os.path.expanduser(\"~/data/reddit\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Datasets"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Cora"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "cora = sg.datasets.Cora()\n",
    "cora.download()\n",
    "\n",
    "cora_cites_path = os.path.join(cora.data_directory, \"cora.cites\")\n",
    "cora_content_path = os.path.join(cora.data_directory, \"cora.content\")\n",
    "cora_dtypes = {0: int, **{i: np.float32 for i in range(1, 1433 + 1)}}\n",
    "\n",
    "\n",
    "def cora_parts(include_nodes):\n",
    "    if include_nodes:\n",
    "        nodes = pd.read_csv(\n",
    "            cora_content_path,\n",
    "            header=None,\n",
    "            sep=\"\\t\",\n",
    "            index_col=0,\n",
    "            usecols=range(0, 1433 + 1),\n",
    "            dtype=cora_dtypes,\n",
    "            na_filter=False,\n",
    "        )\n",
    "    else:\n",
    "        nodes = None\n",
    "    edges = pd.read_csv(\n",
    "        cora_cites_path,\n",
    "        header=None,\n",
    "        sep=\"\\t\",\n",
    "        names=[\"source\", \"target\"],\n",
    "        dtype=int,\n",
    "        na_filter=False,\n",
    "    )\n",
    "    return nodes, edges, {}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### BlogCatalog3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "blogcatalog3 = sg.datasets.BlogCatalog3()\n",
    "blogcatalog3.download()\n",
    "\n",
    "blogcatalog3_edges = os.path.join(blogcatalog3.data_directory, \"edges.csv\")\n",
    "blogcatalog3_group_edges = os.path.join(blogcatalog3.data_directory, \"group-edges.csv\")\n",
    "blogcatalog3_groups = os.path.join(blogcatalog3.data_directory, \"groups.csv\")\n",
    "blogcatalog3_nodes = os.path.join(blogcatalog3.data_directory, \"nodes.csv\")\n",
    "\n",
    "\n",
    "def blogcatalog3_parts(include_nodes):\n",
    "    if include_nodes:\n",
    "        raw_nodes = pd.read_csv(blogcatalog3_nodes, header=None)[0]\n",
    "        raw_groups = pd.read_csv(blogcatalog3_groups, header=None)[0]\n",
    "        nodes = {\n",
    "            \"user\": pd.DataFrame(index=raw_nodes),\n",
    "            \"group\": pd.DataFrame(index=-raw_groups),\n",
    "        }\n",
    "    else:\n",
    "        nodes = None\n",
    "\n",
    "    edges = pd.read_csv(blogcatalog3_edges, header=None, names=[\"source\", \"target\"])\n",
    "\n",
    "    group_edges = pd.read_csv(\n",
    "        blogcatalog3_group_edges, header=None, names=[\"source\", \"target\"]\n",
    "    )\n",
    "    group_edges[\"target\"] *= -1\n",
    "    start = len(edges)\n",
    "    group_edges.index = range(start, start + len(group_edges))\n",
    "\n",
    "    edges = {\"friend\": edges, \"belongs\": group_edges}\n",
    "    return nodes, edges, {}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### FB15k"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "fb15k = sg.datasets.FB15k()\n",
    "fb15k.download()\n",
    "fb15k_files = [\n",
    "    os.path.join(fb15k.data_directory, f\"freebase_mtr100_mte100-{x}.txt\")\n",
    "    for x in [\"train\", \"test\", \"valid\"]\n",
    "]\n",
    "\n",
    "\n",
    "def fb15k_parts(include_nodes, usecols=None):\n",
    "    loaded = [\n",
    "        pd.read_csv(\n",
    "            name,\n",
    "            header=None,\n",
    "            names=[\"source\", \"label\", \"target\"],\n",
    "            sep=\"\\t\",\n",
    "            dtype=str,\n",
    "            na_filter=False,\n",
    "            usecols=usecols,\n",
    "        )\n",
    "        for name in fb15k_files\n",
    "    ]\n",
    "    edges = pd.concat(loaded, ignore_index=True)\n",
    "\n",
    "    if include_nodes:\n",
    "        # infer the set of nodes manually, in a memory-minimal way\n",
    "        raw_nodes = set(edges.source)\n",
    "        raw_nodes.update(edges.target)\n",
    "        nodes = pd.DataFrame(index=raw_nodes)\n",
    "    else:\n",
    "        nodes = None\n",
    "\n",
    "    return nodes, edges, {\"edge_type_column\": \"label\"}\n",
    "\n",
    "\n",
    "def fb15k_no_edge_types_parts(include_nodes):\n",
    "    nodes, edges, _ = fb15k_parts(include_nodes, usecols=[\"source\", \"target\"])\n",
    "    return nodes, edges, {}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Reddit\n",
    "\n",
    "As discussed above, the reddit dataset is large and optional. It is also slow to parse, as the graph structure is a huge JSON file. Thus, we prepare the dataset by converting that JSON file into a NumPy edge list array, of shape `(num_edges, 2)`. This is significantly faster to load from disk."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 16.6 s, sys: 1.75 s, total: 18.4 s\n",
      "Wall time: 18.4 s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "\n",
    "# if requested, prepare the reddit dataset by saving the slow-to-read JSON to a temporary .npy file\n",
    "if reddit_path is not None:\n",
    "    reddit_graph_path = os.path.join(reddit_path, \"reddit-G.json\")\n",
    "    reddit_feats_path = os.path.join(reddit_path, \"reddit-feats.npy\")\n",
    "\n",
    "    with open(reddit_graph_path) as f:\n",
    "        reddit_g = json.load(f)\n",
    "    reddit_numpy_edges = np.array([[x[\"source\"], x[\"target\"]] for x in reddit_g[\"links\"]])\n",
    "    \n",
    "    reddit_edges_file = tempfile.NamedTemporaryFile(suffix=\".npy\")\n",
    "    np.save(reddit_edges_file, reddit_numpy_edges)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "def reddit_parts(include_nodes):\n",
    "    if include_nodes:\n",
    "        raw_nodes = np.load(reddit_feats_path)\n",
    "        nodes = pd.DataFrame(raw_nodes)\n",
    "    else:\n",
    "        nodes = None\n",
    "\n",
    "    raw_edges = np.load(reddit_edges_file.name)\n",
    "    edges = pd.DataFrame(raw_edges, columns=[\"source\", \"target\"])\n",
    "    return nodes, edges, {}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Collected"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "datasets = {\n",
    "    \"Cora\": cora_parts,\n",
    "    \"BlogCatalog3\": blogcatalog3_parts,\n",
    "    \"FB15k (no edge types)\": fb15k_no_edge_types_parts,\n",
    "    \"FB15k\": fb15k_parts,\n",
    "}\n",
    "if reddit_path is not None:\n",
    "    datasets[\"reddit\"] = reddit_parts"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Measurement"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "def mem_snapshot_diff(after, before):\n",
    "    \"\"\"Total memory difference between two tracemalloc.snapshot objects\"\"\"\n",
    "    return sum(elem.size_diff for elem in after.compare_to(before, \"lineno\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "# names of columns computed by the measurement code\n",
    "def measurement_columns(title):\n",
    "    names = [\n",
    "        \"time\",\n",
    "        \"memory (graph)\",\n",
    "        \"memory (graph, not shared with data)\",\n",
    "        \"peak memory (graph)\",\n",
    "        \"peak memory (graph, ignoring data)\",\n",
    "        \"memory (data)\",\n",
    "        \"peak memory (data)\",\n",
    "    ]\n",
    "    return [(title, x) for x in names]\n",
    "\n",
    "\n",
    "columns = pd.MultiIndex.from_tuples(\n",
    "    [\n",
    "        (\"graph\", \"nodes\"),\n",
    "        (\"graph\", \"node feat size\"),\n",
    "        (\"graph\", \"edges\"),\n",
    "        *measurement_columns(\"explicit nodes\"),\n",
    "        *measurement_columns(\"inferred nodes (no features)\"),\n",
    "    ]\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "def measure_time(f, include_nodes):\n",
    "    nodes, edges, args = f(include_nodes)\n",
    "    start = timeit.default_timer()\n",
    "    sg.StellarGraph(nodes, edges, **args)\n",
    "    end = timeit.default_timer()\n",
    "    return end - start"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "def measure_memory(f, include_nodes):\n",
    "    \"\"\"\n",
    "    Measure exactly what it takes to load the data.\n",
    "    \n",
    "    - the size of the original edge data (as a baseline)\n",
    "    - the size of the final graph\n",
    "    - the peak memory use of both\n",
    "    \n",
    "    This uses a similar technique to the 'allocation_benchmark' fixture in tests/test_utils/alloc.py.\n",
    "    \"\"\"\n",
    "    gc.collect()\n",
    "    # ensure we're measuring the worst-case peak, when no GC happens\n",
    "    gc.disable()\n",
    "\n",
    "    tracemalloc.start()\n",
    "    snapshot_start = tracemalloc.take_snapshot()\n",
    "\n",
    "    nodes, edges, args = f(include_nodes)\n",
    "\n",
    "    gc.collect()\n",
    "    _, data_memory_peak = tracemalloc.get_traced_memory()\n",
    "    snapshot_data = tracemalloc.take_snapshot()\n",
    "\n",
    "    if include_nodes:\n",
    "        assert nodes is not None, f\n",
    "        sg_g = sg.StellarGraph(nodes, edges, **args)\n",
    "    else:\n",
    "        assert nodes is None, f\n",
    "        sg_g = sg.StellarGraph(edges=edges, **args)\n",
    "\n",
    "    gc.collect()\n",
    "    snapshot_graph = tracemalloc.take_snapshot()\n",
    "\n",
    "    # clean up the input data and anything else leftover, so that the snapshot\n",
    "    # includes only the long-lasting data: the StellarGraph.\n",
    "    del edges\n",
    "    del nodes\n",
    "    del args\n",
    "    gc.collect()\n",
    "\n",
    "    _, graph_memory_peak = tracemalloc.get_traced_memory()\n",
    "    snapshot_end = tracemalloc.take_snapshot()\n",
    "    tracemalloc.stop()\n",
    "\n",
    "    gc.enable()\n",
    "\n",
    "    data_memory = mem_snapshot_diff(snapshot_data, snapshot_start)\n",
    "    graph_memory = mem_snapshot_diff(snapshot_end, snapshot_start)\n",
    "    graph_over_data_memory = mem_snapshot_diff(snapshot_graph, snapshot_data)\n",
    "\n",
    "    return (\n",
    "        sg_g,\n",
    "        graph_memory,\n",
    "        graph_over_data_memory,\n",
    "        graph_memory_peak,\n",
    "        graph_memory_peak - data_memory,\n",
    "        data_memory,\n",
    "        data_memory_peak,\n",
    "    )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "def measure(f):\n",
    "    time_nodes = measure_time(f, include_nodes=True)\n",
    "    time_no_nodes = measure_time(f, include_nodes=False)\n",
    "\n",
    "    sg_g, *mem_nodes = measure_memory(f, include_nodes=True)\n",
    "    _, *mem_no_nodes = measure_memory(f, include_nodes=False)\n",
    "\n",
    "    feat_sizes = sg_g.node_feature_sizes()\n",
    "    try:\n",
    "        feat_sizes = feat_sizes[sg_g.unique_node_type()]\n",
    "    except ValueError:\n",
    "        pass\n",
    "\n",
    "    return [\n",
    "        sg_g.number_of_nodes(),\n",
    "        feat_sizes,\n",
    "        sg_g.number_of_edges(),\n",
    "        time_nodes,\n",
    "        *mem_nodes,\n",
    "        time_no_nodes,\n",
    "        *mem_no_nodes,\n",
    "    ]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 24.1 s, sys: 4.75 s, total: 28.8 s\n",
      "Wall time: 29 s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "recorded = [measure(f) for f in datasets.values()]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th colspan=\"3\" halign=\"left\">graph</th>\n",
       "      <th colspan=\"7\" halign=\"left\">explicit nodes</th>\n",
       "      <th colspan=\"7\" halign=\"left\">inferred nodes (no features)</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>nodes</th>\n",
       "      <th>node feat size</th>\n",
       "      <th>edges</th>\n",
       "      <th>time</th>\n",
       "      <th>memory (graph)</th>\n",
       "      <th>memory (graph, not shared with data)</th>\n",
       "      <th>peak memory (graph)</th>\n",
       "      <th>peak memory (graph, ignoring data)</th>\n",
       "      <th>memory (data)</th>\n",
       "      <th>peak memory (data)</th>\n",
       "      <th>time</th>\n",
       "      <th>memory (graph)</th>\n",
       "      <th>memory (graph, not shared with data)</th>\n",
       "      <th>peak memory (graph)</th>\n",
       "      <th>peak memory (graph, ignoring data)</th>\n",
       "      <th>memory (data)</th>\n",
       "      <th>peak memory (data)</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Cora</th>\n",
       "      <td>2708</td>\n",
       "      <td>1433</td>\n",
       "      <td>5429</td>\n",
       "      <td>0.027746</td>\n",
       "      <td>15607738</td>\n",
       "      <td>15587033</td>\n",
       "      <td>46764081</td>\n",
       "      <td>31079432</td>\n",
       "      <td>15684649</td>\n",
       "      <td>31995281</td>\n",
       "      <td>0.003061</td>\n",
       "      <td>94074</td>\n",
       "      <td>96441</td>\n",
       "      <td>374259</td>\n",
       "      <td>284126</td>\n",
       "      <td>90133</td>\n",
       "      <td>197529</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>BlogCatalog3</th>\n",
       "      <td>10351</td>\n",
       "      <td>{'user': 0, 'group': 0}</td>\n",
       "      <td>348459</td>\n",
       "      <td>0.037114</td>\n",
       "      <td>6069823</td>\n",
       "      <td>8864526</td>\n",
       "      <td>21775036</td>\n",
       "      <td>16105955</td>\n",
       "      <td>5669081</td>\n",
       "      <td>10805733</td>\n",
       "      <td>0.029837</td>\n",
       "      <td>6068735</td>\n",
       "      <td>8861638</td>\n",
       "      <td>21689836</td>\n",
       "      <td>16106851</td>\n",
       "      <td>5582985</td>\n",
       "      <td>10711633</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>FB15k (no edge types)</th>\n",
       "      <td>14951</td>\n",
       "      <td>0</td>\n",
       "      <td>592213</td>\n",
       "      <td>0.117117</td>\n",
       "      <td>6398526</td>\n",
       "      <td>5412944</td>\n",
       "      <td>33209916</td>\n",
       "      <td>17529813</td>\n",
       "      <td>15680103</td>\n",
       "      <td>25831842</td>\n",
       "      <td>0.187143</td>\n",
       "      <td>6398542</td>\n",
       "      <td>5535640</td>\n",
       "      <td>34645128</td>\n",
       "      <td>19090289</td>\n",
       "      <td>15554839</td>\n",
       "      <td>25050795</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>FB15k</th>\n",
       "      <td>14951</td>\n",
       "      <td>0</td>\n",
       "      <td>592213</td>\n",
       "      <td>0.653126</td>\n",
       "      <td>12071938</td>\n",
       "      <td>15677213</td>\n",
       "      <td>60081827</td>\n",
       "      <td>39179000</td>\n",
       "      <td>20902827</td>\n",
       "      <td>35792614</td>\n",
       "      <td>0.730613</td>\n",
       "      <td>12071938</td>\n",
       "      <td>15799909</td>\n",
       "      <td>60082291</td>\n",
       "      <td>39304744</td>\n",
       "      <td>20777547</td>\n",
       "      <td>35011567</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>reddit</th>\n",
       "      <td>232965</td>\n",
       "      <td>602</td>\n",
       "      <td>11606919</td>\n",
       "      <td>4.354689</td>\n",
       "      <td>712112941</td>\n",
       "      <td>712119784</td>\n",
       "      <td>3551635605</td>\n",
       "      <td>2243949792</td>\n",
       "      <td>1307685813</td>\n",
       "      <td>1307694920</td>\n",
       "      <td>0.688833</td>\n",
       "      <td>153908015</td>\n",
       "      <td>153909618</td>\n",
       "      <td>622398852</td>\n",
       "      <td>436684187</td>\n",
       "      <td>185714665</td>\n",
       "      <td>185723196</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                        graph                                     \\\n",
       "                        nodes           node feat size     edges   \n",
       "Cora                     2708                     1433      5429   \n",
       "BlogCatalog3            10351  {'user': 0, 'group': 0}    348459   \n",
       "FB15k (no edge types)   14951                        0    592213   \n",
       "FB15k                   14951                        0    592213   \n",
       "reddit                 232965                      602  11606919   \n",
       "\n",
       "                      explicit nodes                 \\\n",
       "                                time memory (graph)   \n",
       "Cora                        0.027746       15607738   \n",
       "BlogCatalog3                0.037114        6069823   \n",
       "FB15k (no edge types)       0.117117        6398526   \n",
       "FB15k                       0.653126       12071938   \n",
       "reddit                      4.354689      712112941   \n",
       "\n",
       "                                                            \\\n",
       "                      memory (graph, not shared with data)   \n",
       "Cora                                              15587033   \n",
       "BlogCatalog3                                       8864526   \n",
       "FB15k (no edge types)                              5412944   \n",
       "FB15k                                             15677213   \n",
       "reddit                                           712119784   \n",
       "\n",
       "                                                                              \\\n",
       "                      peak memory (graph) peak memory (graph, ignoring data)   \n",
       "Cora                             46764081                           31079432   \n",
       "BlogCatalog3                     21775036                           16105955   \n",
       "FB15k (no edge types)            33209916                           17529813   \n",
       "FB15k                            60081827                           39179000   \n",
       "reddit                         3551635605                         2243949792   \n",
       "\n",
       "                                                        \\\n",
       "                      memory (data) peak memory (data)   \n",
       "Cora                       15684649           31995281   \n",
       "BlogCatalog3                5669081           10805733   \n",
       "FB15k (no edge types)      15680103           25831842   \n",
       "FB15k                      20902827           35792614   \n",
       "reddit                   1307685813         1307694920   \n",
       "\n",
       "                      inferred nodes (no features)                 \\\n",
       "                                              time memory (graph)   \n",
       "Cora                                      0.003061          94074   \n",
       "BlogCatalog3                              0.029837        6068735   \n",
       "FB15k (no edge types)                     0.187143        6398542   \n",
       "FB15k                                     0.730613       12071938   \n",
       "reddit                                    0.688833      153908015   \n",
       "\n",
       "                                                            \\\n",
       "                      memory (graph, not shared with data)   \n",
       "Cora                                                 96441   \n",
       "BlogCatalog3                                       8861638   \n",
       "FB15k (no edge types)                              5535640   \n",
       "FB15k                                             15799909   \n",
       "reddit                                           153909618   \n",
       "\n",
       "                                                                              \\\n",
       "                      peak memory (graph) peak memory (graph, ignoring data)   \n",
       "Cora                               374259                             284126   \n",
       "BlogCatalog3                     21689836                           16106851   \n",
       "FB15k (no edge types)            34645128                           19090289   \n",
       "FB15k                            60082291                           39304744   \n",
       "reddit                          622398852                          436684187   \n",
       "\n",
       "                                                        \n",
       "                      memory (data) peak memory (data)  \n",
       "Cora                          90133             197529  \n",
       "BlogCatalog3                5582985           10711633  \n",
       "FB15k (no edge types)      15554839           25050795  \n",
       "FB15k                      20777547           35011567  \n",
       "reddit                    185714665          185723196  "
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "raw = pd.DataFrame(recorded, columns=columns, index=datasets.keys())\n",
    "raw"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Pretty results\n",
    "\n",
    "This shows the results in a prettier way, such as memory in MB instead of bytes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th colspan=\"3\" halign=\"left\">graph</th>\n",
       "      <th colspan=\"7\" halign=\"left\">explicit nodes</th>\n",
       "      <th colspan=\"7\" halign=\"left\">inferred nodes (no features)</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>nodes</th>\n",
       "      <th>node feat size</th>\n",
       "      <th>edges</th>\n",
       "      <th>time</th>\n",
       "      <th>memory (graph)</th>\n",
       "      <th>memory (graph, not shared with data)</th>\n",
       "      <th>peak memory (graph)</th>\n",
       "      <th>peak memory (graph, ignoring data)</th>\n",
       "      <th>memory (data)</th>\n",
       "      <th>peak memory (data)</th>\n",
       "      <th>time</th>\n",
       "      <th>memory (graph)</th>\n",
       "      <th>memory (graph, not shared with data)</th>\n",
       "      <th>peak memory (graph)</th>\n",
       "      <th>peak memory (graph, ignoring data)</th>\n",
       "      <th>memory (data)</th>\n",
       "      <th>peak memory (data)</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Cora</th>\n",
       "      <td>2708</td>\n",
       "      <td>1433</td>\n",
       "      <td>5429</td>\n",
       "      <td>0.027746</td>\n",
       "      <td>15.608</td>\n",
       "      <td>15.587</td>\n",
       "      <td>46.764</td>\n",
       "      <td>31.079</td>\n",
       "      <td>15.685</td>\n",
       "      <td>31.995</td>\n",
       "      <td>0.003061</td>\n",
       "      <td>0.094</td>\n",
       "      <td>0.096</td>\n",
       "      <td>0.374</td>\n",
       "      <td>0.284</td>\n",
       "      <td>0.090</td>\n",
       "      <td>0.198</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>BlogCatalog3</th>\n",
       "      <td>10351</td>\n",
       "      <td>{'user': 0, 'group': 0}</td>\n",
       "      <td>348459</td>\n",
       "      <td>0.037114</td>\n",
       "      <td>6.070</td>\n",
       "      <td>8.865</td>\n",
       "      <td>21.775</td>\n",
       "      <td>16.106</td>\n",
       "      <td>5.669</td>\n",
       "      <td>10.806</td>\n",
       "      <td>0.029837</td>\n",
       "      <td>6.069</td>\n",
       "      <td>8.862</td>\n",
       "      <td>21.690</td>\n",
       "      <td>16.107</td>\n",
       "      <td>5.583</td>\n",
       "      <td>10.712</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>FB15k (no edge types)</th>\n",
       "      <td>14951</td>\n",
       "      <td>0</td>\n",
       "      <td>592213</td>\n",
       "      <td>0.117117</td>\n",
       "      <td>6.399</td>\n",
       "      <td>5.413</td>\n",
       "      <td>33.210</td>\n",
       "      <td>17.530</td>\n",
       "      <td>15.680</td>\n",
       "      <td>25.832</td>\n",
       "      <td>0.187143</td>\n",
       "      <td>6.399</td>\n",
       "      <td>5.536</td>\n",
       "      <td>34.645</td>\n",
       "      <td>19.090</td>\n",
       "      <td>15.555</td>\n",
       "      <td>25.051</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>FB15k</th>\n",
       "      <td>14951</td>\n",
       "      <td>0</td>\n",
       "      <td>592213</td>\n",
       "      <td>0.653126</td>\n",
       "      <td>12.072</td>\n",
       "      <td>15.677</td>\n",
       "      <td>60.082</td>\n",
       "      <td>39.179</td>\n",
       "      <td>20.903</td>\n",
       "      <td>35.793</td>\n",
       "      <td>0.730613</td>\n",
       "      <td>12.072</td>\n",
       "      <td>15.800</td>\n",
       "      <td>60.082</td>\n",
       "      <td>39.305</td>\n",
       "      <td>20.778</td>\n",
       "      <td>35.012</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>reddit</th>\n",
       "      <td>232965</td>\n",
       "      <td>602</td>\n",
       "      <td>11606919</td>\n",
       "      <td>4.354689</td>\n",
       "      <td>712.113</td>\n",
       "      <td>712.120</td>\n",
       "      <td>3551.636</td>\n",
       "      <td>2243.950</td>\n",
       "      <td>1307.686</td>\n",
       "      <td>1307.695</td>\n",
       "      <td>0.688833</td>\n",
       "      <td>153.908</td>\n",
       "      <td>153.910</td>\n",
       "      <td>622.399</td>\n",
       "      <td>436.684</td>\n",
       "      <td>185.715</td>\n",
       "      <td>185.723</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                        graph                                     \\\n",
       "                        nodes           node feat size     edges   \n",
       "Cora                     2708                     1433      5429   \n",
       "BlogCatalog3            10351  {'user': 0, 'group': 0}    348459   \n",
       "FB15k (no edge types)   14951                        0    592213   \n",
       "FB15k                   14951                        0    592213   \n",
       "reddit                 232965                      602  11606919   \n",
       "\n",
       "                      explicit nodes                 \\\n",
       "                                time memory (graph)   \n",
       "Cora                        0.027746         15.608   \n",
       "BlogCatalog3                0.037114          6.070   \n",
       "FB15k (no edge types)       0.117117          6.399   \n",
       "FB15k                       0.653126         12.072   \n",
       "reddit                      4.354689        712.113   \n",
       "\n",
       "                                                            \\\n",
       "                      memory (graph, not shared with data)   \n",
       "Cora                                                15.587   \n",
       "BlogCatalog3                                         8.865   \n",
       "FB15k (no edge types)                                5.413   \n",
       "FB15k                                               15.677   \n",
       "reddit                                             712.120   \n",
       "\n",
       "                                                                              \\\n",
       "                      peak memory (graph) peak memory (graph, ignoring data)   \n",
       "Cora                               46.764                             31.079   \n",
       "BlogCatalog3                       21.775                             16.106   \n",
       "FB15k (no edge types)              33.210                             17.530   \n",
       "FB15k                              60.082                             39.179   \n",
       "reddit                           3551.636                           2243.950   \n",
       "\n",
       "                                                        \\\n",
       "                      memory (data) peak memory (data)   \n",
       "Cora                         15.685             31.995   \n",
       "BlogCatalog3                  5.669             10.806   \n",
       "FB15k (no edge types)        15.680             25.832   \n",
       "FB15k                        20.903             35.793   \n",
       "reddit                     1307.686           1307.695   \n",
       "\n",
       "                      inferred nodes (no features)                 \\\n",
       "                                              time memory (graph)   \n",
       "Cora                                      0.003061          0.094   \n",
       "BlogCatalog3                              0.029837          6.069   \n",
       "FB15k (no edge types)                     0.187143          6.399   \n",
       "FB15k                                     0.730613         12.072   \n",
       "reddit                                    0.688833        153.908   \n",
       "\n",
       "                                                            \\\n",
       "                      memory (graph, not shared with data)   \n",
       "Cora                                                 0.096   \n",
       "BlogCatalog3                                         8.862   \n",
       "FB15k (no edge types)                                5.536   \n",
       "FB15k                                               15.800   \n",
       "reddit                                             153.910   \n",
       "\n",
       "                                                                              \\\n",
       "                      peak memory (graph) peak memory (graph, ignoring data)   \n",
       "Cora                                0.374                              0.284   \n",
       "BlogCatalog3                       21.690                             16.107   \n",
       "FB15k (no edge types)              34.645                             19.090   \n",
       "FB15k                              60.082                             39.305   \n",
       "reddit                            622.399                            436.684   \n",
       "\n",
       "                                                        \n",
       "                      memory (data) peak memory (data)  \n",
       "Cora                          0.090              0.198  \n",
       "BlogCatalog3                  5.583             10.712  \n",
       "FB15k (no edge types)        15.555             25.051  \n",
       "FB15k                        20.778             35.012  \n",
       "reddit                      185.715            185.723  "
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mem_columns = raw.columns[[\"memory\" in x[1] for x in raw.columns]]\n",
    "\n",
    "memory_mb = raw.copy()\n",
    "memory_mb[mem_columns] = (memory_mb[mem_columns] / 10 ** 6).round(3)\n",
    "memory_mb"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nbsphinx": "hidden",
    "tags": [
     "CloudRunner"
    ]
   },
   "source": [
    "<table><tr><td>Run the latest release of this notebook:</td><td><a href=\"https://mybinder.org/v2/gh/stellargraph/stellargraph/master?urlpath=lab/tree/demos/internal-developers/graph-resource-usage.ipynb\" alt=\"Open In Binder\" target=\"_parent\"><img src=\"https://mybinder.org/badge_logo.svg\"/></a></td><td><a href=\"https://colab.research.google.com/github/stellargraph/stellargraph/blob/master/demos/internal-developers/graph-resource-usage.ipynb\" alt=\"Open In Colab\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\"/></a></td></tr></table>"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}