{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 使用兼容 NetworkX 的 API 进行图操作\n", "\n", "GraphScope 支持使用兼容 NetworkX 的 API 进行图操作。\n", "本次教程参考了 [tutorial in NetworkX](https://networkx.org/documentation/stable/tutorial.html) 的组织方式来介绍这些 API。\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Install graphscope package if you are NOT in the Playground\n", "\n", "!pip3 install graphscope" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Import the graphscope and graphscope networkx module.\n", "\n", "import graphscope\n", "import graphscope.nx as nx\n", "\n", "graphscope.set_option(show_log=False) # enable logging" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 创建图\n", "\n", "创建一个空图,只需要简单地创建一个 Graph 对象。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G = nx.Graph()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 点\n", "\n", "图 `G` 可以通过多种方式进行扩充。 在 graphscope.nx 中, 支持一些可哈希的 Python object 作为图的点, 其中包括 int,str,float,tuple,bool 对象。\n", "首先,我们从空图和简单的图操作开始,如下所示,你可以一次增加一个顶点," ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.add_node(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "也可以从任何[可迭代](https://docs.python.org/3/glossary.html#term-iterable)的容器中增加顶点,例如一个列表" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.add_nodes_from([2, 3])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "你也可以通过格式为`(node, node_attribute_dict)`的二元组的容器, 将点的属性和点一起添加,如下所示:\n", "\n", "点属性我们将在后面进行讨论。\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.add_nodes_from([\n", " (4, {\"color\": \"red\"}),\n", " (5, {\"color\": \"green\"}),\n", "])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ 
"一个图的节点也可以直接添加到另一个图中:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "H = nx.path_graph(10)\n", "G.add_nodes_from(H)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "经过上面的操作后,现在图 `G` 中包含了图 `H` 的节点。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "list(G.nodes)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "list(G.nodes.data()) # shows the node attributes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 边\n", "\n", "图 `G` 也可以一次增加一条边来进行扩充," ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.add_edge(1, 2)\n", "e = (2, 3)\n", "G.add_edge(*e) # unpack edge tuple*" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "list(G.edges)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "或者通过一次增加包含多条边的list," ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.add_edges_from([(1, 2), (1, 3)])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "list(G.edges)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "或者通过增加任意 `ebunch` 的边。 *ebunch* 表示任意一个可迭代的边元组的容器。一个边元组可以是一个只包含首尾两个顶点的二元组,例如 `(1, 3)` ,或者一个包含顶点和边属性字典的三元组,例如 `(2, 3, {'weight': 3.1415})`。\n", "\n", "边属性我们会在后面进行讨论" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.add_edges_from([(2, 3, {'weight': 3.1415})])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "list(G.edges.data()) # shows the edge arrtibutes" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.add_edges_from(H.edges)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "list(G.edges)" ] 
}, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "You can also add nodes and edges at the same time with `.update(edges, nodes)`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.update(edges=[(10, 11), (11, 12)], nodes=[10, 11, 12])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "list(G.nodes)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "list(G.edges)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "Adding a node or edge that already exists is not an error; the duplicate is silently ignored. For example, after removing all nodes and edges," ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.clear()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "we add new nodes and edges, and graphscope.nx quietly ignores any that are already present." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.add_edges_from([(1, 2), (1, 3)])\n", "G.add_node(1)\n", "G.add_edge(1, 2)\n", "G.add_node(\"spam\")  # adds node \"spam\"\n", "G.add_nodes_from(\"spam\")  # adds 4 nodes: 's', 'p', 'a', 'm'\n", "G.add_edge(3, 'm')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "At this point `G` contains 8 nodes and 3 edges, which can be checked as follows:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.number_of_nodes()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.number_of_edges()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "## Examining elements of a graph\n", "\n", "We can examine the nodes and edges of a graph. Four basic graph properties are available for this: `G.nodes`, `G.edges`, `G.adj` and `G.degree`. These are set-like views of the nodes, edges, neighbors, and degrees of the nodes in the graph. They offer a read-only view of the graph structure. The views are also dict-like: you can look up node and edge attributes through them and iterate over data attributes with the methods `.items()` and `.data('span')`.\n", "\n", "You can ask for a specific container type instead of a view. Here we use lists, though sets, dicts, tuples and other containers may be a better fit in other contexts." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "list(G.nodes)" ] }, { "cell_type": "code", 
"execution_count": null, "metadata": {}, "outputs": [], "source": [ "list(G.edges)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "list(G.adj[1]) # or list(G.neighbors(1))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.degree[1] # the number of edges incident to 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "用户可以使用一个 *nbunch* 来查看一个点子集的边和度。一个 *nbunch* 可以是 `None` (表示全部节点),一个节点或者一个可迭代的顶点容器。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.edges([2, 'm'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.degree([2, 3])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "## 删除图元素\n", "\n", "用户可以使用类似增加节点和边的方式来从图中删除顶点和边。\n", "相关方法\n", "`Graph.remove_node()`,\n", "`Graph.remove_nodes_from()`,\n", "`Graph.remove_edge()`\n", "和\n", "`Graph.remove_edges_from()`, 例如" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.remove_node(2)\n", "G.remove_nodes_from(\"spam\")\n", "list(G.nodes)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "list(G.edges)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.remove_edge(1, 3)\n", "G.remove_edges_from([(1, 2), (2, 3)])\n", "list(G.edges)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "## 使用图构造函数来构建图\n", "\n", "图对象并不一定需要以增量的方式构建 - 用户可以直接将图数据传给 Graph/DiGraph 的构造函数来构建图对象。\n", "当通过实例化一个图类来创建一个图结构时,用户可以使用多种格式来指定图数据, 如下所示。\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.add_edge(1, 2)\n", "H = nx.DiGraph(G) # create a DiGraph using the connections from G\n", "list(H.edges())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "edgelist 
= [(0, 1), (1, 2), (2, 3)]\n", "H = nx.Graph(edgelist)\n", "list(H.edges)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Accessing edges and neighbors\n", "\n", "In addition to the views `Graph.edges` and `Graph.adj`, edges and neighbors can be accessed using subscript notation." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G = nx.Graph([(1, 2, {\"color\": \"yellow\"})])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G[1]  # same as G.adj[1]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G[1][2]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.edges[1, 2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If the edge already exists, you can get or set its attributes using subscript notation:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.add_edge(1, 3)\n", "G[1][3]['color'] = \"blue\"\n", "G.edges[1, 3]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.edges[1, 2]['color'] = \"red\"\n", "G.edges[1, 2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "All `(node, adjacency)` pairs can be examined quickly with `G.adjacency()` or `G.adj.items()`, as shown below.\n", "\n", "Note that for undirected graphs, adjacency iteration sees each edge twice." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "FG = nx.Graph()\n", "FG.add_weighted_edges_from([(1, 2, 0.125), (1, 3, 0.75), (2, 4, 1.2), (3, 4, 0.375)])\n", "for n, nbrs in FG.adj.items():\n", "    for nbr, eattr in nbrs.items():\n", "        wt = eattr[\"weight\"]\n", "        if wt < 0.5:\n", "            print(f\"({n}, {nbr}, {wt:.3})\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All edges and their attributes can be accessed conveniently through the edges property, as shown below." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for (u, v, wt) in FG.edges.data('weight'):\n", "    if wt < 0.5:\n", "        print(f\"({u}, {v}, {wt:.3})\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", 
"## 添加图属性,顶点属性和边属性\n", "\n", "属性如权重、标签、颜色等可以被attach到图、点或者边上。\n", "\n", "每个图、节点和边都可以保存 key/value 属性,默认属性是空。属性可以通过 `add_edge`, `add_node` 或直接对属性字典进行操作来增加或修改属性。\n", "\n", "### 图属性\n", "\n", "在创建新图的时候定义图属性" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G = nx.Graph(day=\"Friday\")\n", "G.graph" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "或者在创建后修改图属性" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.graph['day'] = \"Monday\"\n", "G.graph" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### 节点属性\n", "\n", "可以使用 `add_node()`, `add_nodes_from()`, or `G.nodes` 等方法增加节点属性。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.add_node(1, time='5pm')\n", "G.add_nodes_from([3], time='2pm')\n", "G.nodes[1]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.nodes[1]['room'] = 714\n", "G.nodes.data()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "注意 向 `G.nodes` 增加一个节点并不会真正增加节点到图中,如果需要增加新节点,应该使用 `G.add_node()`. 
The same applies to edges.\n", "\n", "\n", "### Edge attributes\n", "\n", "\n", "Edge attributes can be added or changed with `add_edge()`, `add_edges_from()`, or subscript notation." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.add_edge(1, 2, weight=4.7)\n", "G.add_edges_from([(3, 4), (4, 5)], color='red')\n", "G.add_edges_from([(1, 2, {'color': 'blue'}), (2, 3, {'weight': 8})])\n", "G[1][2]['weight'] = 4.7\n", "G.edges[3, 4]['weight'] = 4.2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.edges.data()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The special attribute `weight` should be numeric, because algorithms that require weighted edges use it.\n", "\n", "\n", "## Induced subgraphs and edge subgraphs\n", "\n", "graphscope.nx supports inducing a `deepcopy` subgraph from a set of nodes or a set of edges.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G = nx.path_graph(10)\n", "# induce a subgraph by nodes\n", "H = G.subgraph([0, 1, 2])\n", "list(H.nodes)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "list(H.edges)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# induce an edge subgraph by edges\n", "K = G.edge_subgraph([(1, 2), (3, 4)])\n", "list(K.nodes)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "list(K.edges)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Note that this differs from the NetworkX implementation: NetworkX returns a view of the subgraph, whereas graphscope.nx returns a subgraph or edge subgraph that is independent of the original graph.\n", "\n", "## Copying graphs\n", "\n", "You can use the `to_directed` method to obtain a directed representation of a graph." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "DG = G.to_directed()  # returns a \"deepcopy\" directed representation of G\n", "list(DG.edges)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# or with\n", "DGv = G.to_directed(as_view=True)  # returns a view\n", "list(DGv.edges)" ] }, { "cell_type": 
"code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# or with\n", "DG = nx.DiGraph(G) # return a \"deepcopy\" of directed representation of G.\n", "list(DG.edges)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "或者通过 `copy` 方法得到一个图的拷贝。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "H = G.copy() # return a view of copy\n", "list(H.edges)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# or with\n", "H = G.copy(as_view=False) # return a \"deepcopy\" copy\n", "list(H.edges)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# or with\n", "H = nx.Graph(G) # return a \"deepcopy\" copy\n", "list(H.edges)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "注意,graphscope.nx 不支持浅拷贝。\n", "\n", "\n", "## 有向图\n", "\n", "`DiGraph` 类提供了额外的方法和属性来指定有向边,如:`DiGraph.out_edges`, `DiGraph.in_degree`,\n", "`DiGraph.predecessors()`, `DiGraph.successors()` etc.\n", "\n", "为了让算法方便地在两种图类型上运行,有向版本的 `neighbors` 等同于 `successors()` ,`degree` 返回 `in_degree` 和 `out_degree` 的和。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "DG = nx.DiGraph()\n", "DG.add_weighted_edges_from([(1, 2, 0.5), (3, 1, 0.75)])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "DG.out_degree(1, weight='weight')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "DG.degree(1, weight='weight')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "list(DG.successors(1))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "list(DG.neighbors(1))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "list(DG.predecessors(1))" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "\n", "\n", "在 graphscope.nx 中,存在有些算法仅能用于有向图的分析,而另一些算法仅能用于无向图的分析。如果你想将一个有向图转化为无向图,你可以使用 `Graph.to_undirected()`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "H = DG.to_undirected() # return a \"deepcopy\" of undirected represetation of DG.\n", "list(H.edges)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# or with\n", "H = nx.Graph(DG) # create an undirected graph H from a directed graph G\n", "list(H.edges)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "DiGraph 也可以通过 `DiGraph.reverse()` 来反转边。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "K = DG.reverse() # retrun a \"deepcopy\" of reversed copy.\n", "list(K.edges)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# or with\n", "K = DG.reverse(copy=False) # return a view of reversed copy.\n", "list(K.edges)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## 图分析\n", "图 `G` 的结构可以通过使用各式各样的图理论函数进行分析,例如:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G = nx.Graph()\n", "G.add_edges_from([(1, 2), (1, 3)])\n", "G.add_node(4)\n", "sorted(d for n, d in G.degree())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nx.builtin.clustering(G)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "在 graphscope.nx 中,我们支持用于图分析的内置算法,算法的详细内容可以参考 [builtin algorithm](https://graphscope.io/docs/reference/networkx/builtin.html)\n", "\n", "\n", "## 通过 GraphScope graph object来创建图\n", "\n", "除了通过networkx的方式创建图之外,我们也可以使用标准的 `GraphScope` 的方式创建图,这一部分将会在下一个教程中进行介绍,下面我们展示一个简单的示例:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# we load a GraphScope graph with load_ldbc\n", "from graphscope.dataset import load_ldbc\n", 
"graph = load_ldbc(directed=False)\n", "\n", "# create graph with the GraphScope graph object\n", "G = nx.Graph(graph)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "## 将图转化为graphscope.graph\n", "\n", "\n", "正如同 graphscope.nx Graph 可以从 GraphScope graph 转化而来,graphscope.nx Graph也可以转化为 GraphScope graph. 例如:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nodes = [(0, {\"foo\": 0}), (1, {\"foo\": 1}), (2, {\"foo\": 2})]\n", "edges = [(0, 1, {\"weight\": 0}), (0, 2, {\"weight\": 1}), (1, 2, {\"weight\": 2})]\n", "G = nx.Graph()\n", "G.update(edges, nodes)\n", "g = graphscope.g(G)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], 