{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Graph Manipulations with NetowrkX Compatible APIs\n", "\n", "GraphScope supports graph manipulations with NetworkX compatible APIs.\n", "This tutorial go through these APIs following the organization of [tutorial in NetworkX](https://networkx.org/documentation/stable/tutorial.html).\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Install graphscope package if you are NOT in the Playground\n", "\n", "!pip3 install graphscope" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Import the graphscope and graphscope networkx module.\n", "\n", "import graphscope\n", "import graphscope.nx as nx\n", "\n", "graphscope.set_option(show_log=True) # enable logging" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating a graph\n", "\n", "Create an empty graph with no nodes and no edges, simply new a `Graph` object." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G = nx.Graph()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Nodes\n", "\n", "The graph `G` can be grown in several ways. In graphscope.nx, nodes can be some hashable Python object, including int, str, float, tuple, bool object.\n", "To get started though we’ll look at simple manipulations and start from empty graph. You can add one node\n", "at a time," ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.add_node(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "or add nodes from any [iterable](https://docs.python.org/3/glossary.html#term-iterable) container, such as a list" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.add_nodes_from([2, 3])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also add nodes along with node\n", "attributes if your container yields 2-tuples of the form\n", "`(node, node_attribute_dict)`:\n", "\n", "Node attributes are discussed further below.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.add_nodes_from(\n", " [\n", " (4, {\"color\": \"red\"}),\n", " (5, {\"color\": \"green\"}),\n", " ]\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Nodes from one graph can be incorporated into another:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "H = nx.path_graph(10)\n", "G.add_nodes_from(H)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`G` now contains the nodes of `H` as nodes of `G`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "list(G.nodes)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "list(G.nodes.data()) # shows the node attributes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Edges\n", "\n", "`G` can also be grown by adding one edge at a time," ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.add_edge(1, 2)\n", "e = (2, 3)\n", "G.add_edge(*e) # unpack edge tuple*" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "list(G.edges)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "by adding a list of edges," ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.add_edges_from([(1, 2), (1, 3)])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "list(G.edges)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "or by adding any ebunch of edges. An *ebunch* is any iterable\n", "container of edge-tuples. An edge-tuple can be a 2-tuple of nodes or a 3-tuple\n", "with 2 nodes followed by an edge attribute dictionary, e.g.,\n", "`(2, 3, {'weight': 3.1415})`. Edge attributes are discussed further\n", "below." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.add_edges_from([(2, 3, {\"weight\": 3.1415})])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "list(G.edges.data()) # shows the edge arrtibutes" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.add_edges_from(H.edges)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "list(G.edges)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "or one can add new nodes and edges once with `.update(nodes, edges)`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.update(edges=[(10, 11), (11, 12)], nodes=[10, 11, 12])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "list(G.nodes)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "list(G.edges)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are no complaints when adding existing nodes or edges. For example,\n", "after removing all nodes and edges," ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.clear()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "we add new nodes/edges and graphscope.nx quietly ignores any that are\n", "already present." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.add_edges_from([(1, 2), (1, 3)])\n", "G.add_node(1)\n", "G.add_edge(1, 2)\n", "G.add_node(\"spam\") # adds node \"spam\"\n", "G.add_nodes_from(\"spam\") # adds 4 nodes: 's', 'p', 'a', 'm'\n", "G.add_edge(3, \"m\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "At this stage the graph `G` consists of 8 nodes and 3 edges, as can be seen by:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.number_of_nodes()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.number_of_edges()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Examining elements of a graph\n", "\n", "We can examine the nodes and edges. Four basic graph properties facilitate\n", "reporting: `G.nodes`, `G.edges`, `G.adj` and `G.degree`. These\n", "are set-like views of the nodes, edges, neighbors (adjacencies), and degrees\n", "of nodes in a graph. They offer a continually updated read-only view into\n", "the graph structure. They are also dict-like in that you can look up node\n", "and edge data attributes via the views and iterate with data attributes\n", "using methods `.items()`, `.data('span')`.\n", "If you want a specific container type instead of a view, you can specify one.\n", "Here we use lists, though sets, dicts, tuples and other containers may be\n", "better in other contexts." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "list(G.nodes)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "list(G.edges)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "list(G.adj[1]) # or list(G.neighbors(1))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.degree[1] # the number of edges incident to 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One can specify to report the edges and degree from a subset of all nodes\n", "using an nbunch. An *nbunch* is any of: `None` (meaning all nodes),\n", "a node, or an iterable container of nodes that is not itself a node in the\n", "graph." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.edges([2, \"m\"])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.degree([2, 3])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Removing elements from a graph\n", "\n", "One can remove nodes and edges from the graph in a similar fashion to adding.\n", "Use methods\n", "`Graph.remove_node()`,\n", "`Graph.remove_nodes_from()`,\n", "`Graph.remove_edge()`\n", "and\n", "`Graph.remove_edges_from()`, e.g." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.remove_node(2)\n", "G.remove_nodes_from(\"spam\")\n", "list(G.nodes)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "list(G.edges)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.remove_edge(1, 3)\n", "G.remove_edges_from([(1, 2), (2, 3)])\n", "list(G.edges)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using the graph constructors\n", "\n", "Graph objects do not have to be built up incrementally - data specifying\n", "graph structure can be passed directly to the constructors of the various\n", "graph classes.\n", "When creating a graph structure by instantiating one of the graph\n", "classes you can specify data in several formats." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.add_edge(1, 2)\n", "H = nx.DiGraph(G) # create a DiGraph using the connections from G\n", "list(H.edges())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "edgelist = [(0, 1), (1, 2), (2, 3)]\n", "H = nx.Graph(edgelist)\n", "list(H.edges)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Accessing edges and neighbors\n", "\n", "In addition to the views `Graph.edges`, and `Graph.adj`,\n", "access to edges and neighbors is possible using subscript notation." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G = nx.Graph([(1, 2, {\"color\": \"yellow\"})])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G[1] # same as G.adj[1]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G[1][2]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.edges[1, 2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can get/set the attributes of an edge using subscript notation\n", "if the edge already exists." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.add_edge(1, 3)\n", "G[1][3][\"color\"] = \"blue\"\n", "G.edges[1, 3]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.edges[1, 2][\"color\"] = \"red\"\n", "G.edges[1, 2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fast examination of all (node, adjacency) pairs is achieved using\n", "`G.adjacency()`, or `G.adj.items()`.\n", "Note that for undirected graphs, adjacency iteration sees each edge twice." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "FG = nx.Graph()\n", "FG.add_weighted_edges_from([(1, 2, 0.125), (1, 3, 0.75), (2, 4, 1.2), (3, 4, 0.375)])\n", "for n, nbrs in FG.adj.items():\n", " for nbr, eattr in nbrs.items():\n", " wt = eattr[\"weight\"]\n", " if wt < 0.5:\n", " print(f\"({n}, {nbr}, {wt:.3})\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Convenient access to all edges is achieved with the edges property." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for (u, v, wt) in FG.edges.data(\"weight\"):\n", " if wt < 0.5:\n", " print(f\"({u}, {v}, {wt:.3})\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Adding attributes to graphs, nodes, and edges\n", "\n", "Attributes such as weights, labels, colors,\n", "can be attached to graphs, nodes, or edges.\n", "\n", "Each graph, node, and edge can hold key/value attribute. By default these are empty,\n", "but attributes can be added or changed using `add_edge`, `add_node` or direct\n", "manipulation of the attribute dictionaries named `G.graph`, `G.nodes`, and\n", "`G.edges` for a graph `G`.\n", "\n", "### Graph attributes\n", "\n", "Assign graph attributes when creating a new graph" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G = nx.Graph(day=\"Friday\")\n", "G.graph" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or you can modify attributes later" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.graph[\"day\"] = \"Monday\"\n", "G.graph" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Node attributes\n", "\n", "Add node attributes using `add_node()`, `add_nodes_from()`, or `G.nodes`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.add_node(1, time=\"5pm\")\n", "G.add_nodes_from([3], time=\"2pm\")\n", "G.nodes[1]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.nodes[1][\"room\"] = 714\n", "G.nodes.data()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that adding a node to `G.nodes` does not add it to the graph, use\n", "`G.add_node()` to add new nodes. Similarly for edges.\n", "\n", "### Edge Attributes\n", "\n", "Add/change edge attributes using `add_edge()`, `add_edges_from()`,\n", "or subscript notation." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.add_edge(1, 2, weight=4.7)\n", "G.add_edges_from([(3, 4), (4, 5)], color=\"red\")\n", "G.add_edges_from([(1, 2, {\"color\": \"blue\"}), (2, 3, {\"weight\": 8})])\n", "G[1][2][\"weight\"] = 4.7\n", "G.edges[3, 4][\"weight\"] = 4.2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.edges.data()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The special attribute `weight` should be numeric as it is used by algorithms requiring weighted edges.\n", "\n", "## Induce deepcopy subgraph and edge_subgraph\n", "\n", "graphscope.nx support induce a `deepcopy` subgraph by given node set or edge set.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G = nx.path_graph(10)\n", "# induce a subgraph by nodes\n", "H = G.subgraph([0, 1, 2])\n", "list(H.nodes)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "list(H.edges)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# induce a edge subgraph by edges\n", "K = G.edge_subgraph([(1, 2), (3, 4)])\n", "list(K.nodes)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "list(K.edges)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: different with subgraph/edge_subgraph api in NetworkX which return a view, graphscope.nx return a deepcopy of subgraph/edge_subgraph.\n", "\n", "## Making copies\n", "\n", "One can use to_directed to return a directed representaion of the graph." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "DG = G.to_directed() # here would return a \"deepcopy\" directed representation of G.\n", "list(DG.edges)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# or with\n", "DGv = G.to_directed(as_view=True) # return a view.\n", "list(DGv.edges)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# or with\n", "DG = nx.DiGraph(G) # return a \"deepcopy\" of directed representation of G.\n", "list(DG.edges)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or get a copy of the graph." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "H = G.copy() # return a view of copy\n", "list(H.edges)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# or with\n", "H = G.copy(as_view=False) # return a \"deepcopy\" copy\n", "list(H.edges)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# or with\n", "H = nx.Graph(G) # return a \"deepcopy\" copy\n", "list(H.edges)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: graphscope.nx not support shallow copy of the graph.\n", "\n", "\n", "## Directed graphs\n", "\n", "The `DiGraph` class provides additional methods and properties specific\n", "to directed edges, e.g.,\n", "`DiGraph.out_edges`, `DiGraph.in_degree`,\n", "`DiGraph.predecessors()`, `DiGraph.successors()` etc.\n", "To allow algorithms to work with both classes easily, the directed versions of\n", "`neighbors()` is equivalent to `successors()` while `degree` reports\n", "the sum of `in_degree` and `out_degree` even though that may feel\n", "inconsistent at times." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "DG = nx.DiGraph()\n", "DG.add_weighted_edges_from([(1, 2, 0.5), (3, 1, 0.75)])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "DG.out_degree(1, weight=\"weight\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "DG.degree(1, weight=\"weight\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "list(DG.successors(1))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "list(DG.neighbors(1))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "list(DG.predecessors(1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some algorithms work only for directed graphs and others are not well\n", "defined for directed graphs. Indeed the tendency to lump directed\n", "and undirected graphs together is dangerous. If you want to treat\n", "a directed graph as undirected for some measurement you should probably\n", "convert it using `Graph.to_undirected()`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "H = DG.to_undirected() # return a \"deepcopy\" of undirected represetation of DG.\n", "list(H.edges)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# or with\n", "H = nx.Graph(DG) # create an undirected graph H from a directed graph G\n", "list(H.edges)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Directed Graph also support to reverse edge using `DiGraph.reverse()`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "K = DG.reverse() # retrun a \"deepcopy\" of reversed copy.\n", "list(K.edges)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# or with\n", "K = DG.reverse(copy=False) # return a view of reversed copy.\n", "list(K.edges)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Analyzing graphs\n", "\n", "The structure of `G` can be analyzed using various graph-theoretic\n", "functions such as:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G = nx.Graph()\n", "G.add_edges_from([(1, 2), (1, 3)])\n", "G.add_node(4)\n", "sorted(d for n, d in G.degree())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nx.builtin.clustering(G)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In graphscope.nx, we support some builtin algorithm for analyzing graph, see [builtin algorithm](https://graphscope.io/docs/reference/networkx/builtin.html) for details on graph algorithms supported.\n", "\n", "\n", "## Create graph from GraphScope graph object\n", "\n", "In addition, we can create a graph in the `GraphScope` way, which is created from GraphScope graph object and will be introduced in the next tutorial." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# we load a GraphScope graph with load_ldbc\n", "from graphscope.dataset import load_ldbc\n", "\n", "graph = load_ldbc(directed=False)\n", "\n", "# create graph with the GraphScope graph object\n", "G = nx.Graph(graph)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Transform to GraphScope graph object\n", "\n", "As graphscope.nx Graph can create from GraphScope graph, the graphscope.nx graph can transform to GraphScope graph too. e.g" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nodes = [(0, {\"foo\": 0}), (1, {\"foo\": 1}), (2, {\"foo\": 2})]\n", "edges = [(0, 1, {\"weight\": 0}), (0, 2, {\"weight\": 1}), (1, 2, {\"weight\": 2})]\n", "G = nx.Graph()\n", "G.update(edges, nodes)\n", "\n", "# transform to GraphScope graph\n", "g = graphscope.g(G)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }