{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 用 GraphScope 像 NetworkX 一样进行图分析\n",
    "\n",
    "GraphScope 在大规模图分析基础上，提供了一套兼容 NetworkX 的图分析接口。\n",
    "本文中我们将介绍如何用 GraphScope 像 NetworkX 一样进行图分析。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### NetworkX 是如何进行图分析的\n",
    "\n",
    "NetworkX 的图分析过程一般首先进行图的构建，示例中我们首先建立一个空图，然后通过 NetworkX 的接口逐步扩充图的数据。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Install graphscope package if you are NOT in the Playground\n",
    "\n",
    "!pip3 install graphscope"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import networkx"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 初始化一个空的无向图\n",
    "G = networkx.Graph()\n",
    "# 通过 add_edges_from 接口添加边列表，此处添加了两条边(1, 2)和（1， 3）\n",
    "G.add_edges_from([(1, 2), (1, 3)])\n",
    "# 通过 add_node 添加点4\n",
    "G.add_node(4)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "接着使用 NetworkX 来查询一些图的信息"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 使用 G.number_of_nodes 查询图G目前点的数目\n",
    "G.number_of_nodes()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 类似地，G.number_of_edges 可以查询图G中边的数量\n",
    "G.number_of_edges()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 通过 G.degree 来查看图G中每个点的度数\n",
    "sorted(d for n, d in G.degree())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "最后通过调用 NetworkX 内置的算法对图G进行分析"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 调用 connected_components 算法分析图G的联通分量\n",
    "list(networkx.connected_components(G))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 调用 clustering 算法分析图G的聚类情况\n",
    "networkx.clustering(G)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 如何使用 GraphScope 的 NetworkX 接口"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "图的构建\n",
    "\n",
    "使用 GraphScope 的 NetworkX 兼容接口，我们只需要简单地将教程中的`import netwokx as nx`替换为`import graphscope.nx as nx`即可。\n",
    "GraphScope 支持与 NetworkX 完全相同的载图语法，这里我们使用`nx.Graph()`来建立一个空的无向图。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import graphscope.nx as nx\n",
    "# 我们可以建立一个空的无向图\n",
    "G = nx.Graph()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "增加节点和边\n",
    "\n",
    "GraphScope 的图操作接口也保持了与 NetworkX 的兼容，用户可以通过`add_node`和 `add_nodes_from`来添加节点，通过`add_edge`和`add_edges_from`来添加边。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 通过 add_node 一次添加一个节点\n",
    "G.add_node(1)\n",
    "\n",
    "# 或从任何 iterable 容器中添加节点，如列表\n",
    "G.add_nodes_from([2, 3])\n",
    "\n",
    "# 如果容器中是元组的形式，还可以在添加节点的同时，添加节点属性\n",
    "G.add_nodes_from([(4, {\"color\": \"red\"}), (5, {\"color\": \"green\"})])\n",
    "\n",
    "# 对于边，可以通过 add_edge 的一次添加一条边\n",
    "G.add_edge(1, 2)\n",
    "e = (2, 3)\n",
    "G.add_edge(*e)\n",
    "\n",
    "# 通过 add_edges_from 添加边列表\n",
    "G.add_edges_from([(1, 2), (1, 3)])\n",
    "\n",
    "# 或者通过边元组的方式，在添加边的同时，添加边的属性\n",
    "G.add_edges_from([(1, 2), (2, 3, {'weight': 3.1415})])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "查询图的元素\n",
    "\n",
    "GraphScope 支持兼容 NetworkX 的图查询接口。用户可以通过`number_of_nodes`和`number_of_edges`来获取图点和边的数量，通过`nodes`, `edges`,`adj`和`degree`等接口来获取图当前的点和边，以及点的邻居和度数等信息。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 查询目前图中点和边的数目\n",
    "G.number_of_nodes()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "G.number_of_edges()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 列出目前图中的点和边\n",
    "list(G.nodes)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "list(G.edges)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 查询某个点的邻居\n",
    "list(G.adj[1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 查询某个点的度\n",
    "G.degree(1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "从图中删除元素\n",
    "\n",
    "像 NetworkX 一样, GraphScope 也可以使用与添加元素相类似的方式从图中删除点和边，对图进行修改。例如可以通过`remove_node`和`remove_nodes_from`来删除图中的节点，通过`remove_edge`和`remove_edges_from`来删除图中的边。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 通过 remove_node 删除一个点\n",
    "G.remove_node(5)\n",
    "list(G.nodes)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "G.remove_nodes_from([4, 5])\n",
    "list(G.nodes)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 通过 remove_edge 删除一条边\n",
    "G.remove_edge(1, 2)\n",
    "list(G.edges)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 通过 remove_edges_from 删除多条边\n",
    "G.remove_edges_from([(1, 3), (2, 3)])\n",
    "list(G.edges)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 我们再来看一下现在的点和边的数目\n",
    "G.number_of_nodes()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "G.number_of_edges()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "图分析\n",
    "\n",
    "GraphScope 可以通过兼容 NetworkX 的接口来对图进行各种算法的分析，示例里我们构建了一个简单图，然后分别使用`connected_components`分析图的联通分量，使用`clustering`来得到图中每个点的聚类系数，以及使用`all_pairs_shortest_path`来获取节点两两之间的最短路径。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 首先构建图\n",
    "G = nx.Graph()\n",
    "G.add_edges_from([(1, 2), (1, 3)])\n",
    "G.add_node(4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 通过 connected_components 算法找出图中的联通分量\n",
    "list(nx.connected_components(G))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 通过 clustering 算法计算每个点的聚类系数\n",
    "nx.clustering(G)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sp = dict(nx.all_pairs_shortest_path(G))\n",
    "sp[3]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "图的简单绘制\n",
    "\n",
    "同 NetworkX 一样，GraphScope支持通过`draw`将图进行简单地绘制出来，底层依赖的是`matplotlib`的绘图功能。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "如果系统未安装`matplotlib`， 我们首先需要安装一下`matplotlib`包"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip3 install matplotlib"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "使用 GraphScope 来进行简单地绘制图"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 创建一个5个点的 star graph\n",
    "G = nx.star_graph(5)\n",
    "# 使用 nx.draw 绘制图\n",
    "nx.draw(G, with_labels=True, font_weight='bold')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### GraphScope 相对 NetworkX 算法性能上有着数量级的提升\n",
    "我们通过一个简单的实验来看一下 GraphScope 对比 NetworkX 在算法性能上到底提升多少。\n",
    "\n",
    "实验使用来自 snap 的 [twitter] 图数据(https://snap.stanford.edu/data/ego-Twitter.html), 测试算法是 NetworkX 内置的 [clustering](https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.cluster.clustering.html#networkx.algorithms.cluster.clustering) 算法。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们首先准备下数据，使用 wget 将数据集下载到本地"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!wget https://raw.githubusercontent.com/GraphScope/gstest/master/twitter.e -P /tmp"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "接着我们分别使用 GraphScope 和 NetworkX 载入 snap-twitter 数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import graphscope.nx as gs_nx\n",
    "import networkx as nx"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 使用 NetworkX 载入 snap-twitter 图数据\n",
    "g1 = nx.read_edgelist(\n",
    "     os.path.expandvars('/tmp/twitter.e'), nodetype=int, data=False, create_using=nx.Graph\n",
    ")\n",
    "type(g1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 使用 GraphScope 载入相同的 snap-twitter 图数据\n",
    "g2 = gs_nx.read_edgelist(\n",
    "     os.path.expandvars('/tmp/twitter.e'), nodetype=int, data=False, create_using=gs_nx.Graph\n",
    ")\n",
    "type(g2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "最后我们使用 clustering 算法来对图进行聚类分析，来看一下 GraphScope 对比 NetworkX 在算法性能上有多少提升"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "# 使用 GraphScope 计算图中每个点的聚类系数\n",
    "ret_gs = gs_nx.clustering(g2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "# 使用 NetworkX 计算图中每个点的聚类系数\n",
    "ret_nx = nx.clustering(g1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 对比下两者的结果是否一致\n",
    "ret_gs == ret_nx"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}