{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Run GraphScope like NetworkX\n", "\n", "Graphscope provides a set of graph analysis interfaces compatible with Networkx.\n", "\n", "In this article, we will show how to use graphscope to perform graph analysis like Networkx." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### How does Networkx perform graph analysis ?\n", "\n", "Usually, the graph analysis process of NetworkX starts with the construction of a graph.\n", "\n", "In the following example, we create an empty graph first, and then expand the data through the interface of NetworkX." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Install graphscope package if you are NOT in the Playground\n", "\n", "!pip3 install graphscope" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import networkx" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialize an empty graph\n", "G = networkx.Graph()\n", "\n", "# Add edges (1, 2)and(1 3) by `add_edges_from` interface\n", "G.add_edges_from([(1, 2), (1, 3)])\n", "\n", "# Add vertex \"4\" by `add_node` interface \n", "G.add_node(4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then we can query the graph information." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Query the number of vertices by `number_of_nodes` interface.\n", "G.number_of_nodes()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Similarly, query the number of edges by `number_of_edges` interface.\n", "G.number_of_edges()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Query the degree of each vertex by `degree` interface.\n", "sorted(d for n, d in G.degree())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, calling the builtin algorithm of NetworkX to analysis the graph `G`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Run 'connected components' algorithm\n", "list(networkx.connected_components(G))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Run 'clustering' algorithm\n", "networkx.clustering(G)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### How to use NetworkX interface from GraphScope" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Graph Building**\n", "\n", "\n", "To use NetworkX interface from graphscope, we just need to replace `import networkx as nx` with `import graphscope.nx as nx `.\n", "\n", "Here we use `nx.Graph()` interace to create an empty undirected graph." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import graphscope\n", "graphscope.set_option(show_log=True)\n", "import graphscope.nx as nx\n", "\n", "# Initialize an empty graph\n", "G = nx.Graph()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Add edges and vertices**\n", "\n", "Just like operating NetworkX, you can add vertices by `add_node` `add_nodes_from` and add edges by `add_edge` `add_edges_from`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Add one vertex by `add_node` interface\n", "G.add_node(1)\n", "\n", "# Or add a batch of vertices from iterable list\n", "G.add_nodes_from([2, 3])\n", "\n", "# Also you can add attributes while adding vertices\n", "G.add_nodes_from([(4, {\"color\": \"red\"}), (5, {\"color\": \"green\"})])\n", "\n", "# Similarly, add one edge by `add_edge` interface\n", "G.add_edge(1, 2)\n", "e = (2, 3)\n", "G.add_edge(*e)\n", "\n", "# Or add a batch of edges from iterable list\n", "G.add_edges_from([(1, 2), (1, 3)])\n", "\n", "# Add attributes while adding edges\n", "G.add_edges_from([(1, 2), (2, 3, {'weight': 3.1415})])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Query Graph**\n", "\n", "\n", "Just like operating NetworkX, you can search the number of vertices/edge by `number_of_nodes`/`number_of_edges` interface, or query the neighbor of vertex by `adj` interface." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Query the number of vertices by `number_of_nodes` interface.\n", "G.number_of_nodes()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Similarly, query the number of edges by `number_of_edges` interface.\n", "G.number_of_edges()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# list the vertices in graph `G`\n", "list(G.nodes)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# list the edges in graph `G`\n", "list(G.edges)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# query the nerghbors of vertex '1'\n", "list(G.adj[1])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# search the degree of vertex '1'\n", "G.degree(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Delete**\n", "\n", "Just like operating NetworkX, you can remove vertices by `remove_node`or `remove_nodes_from` interface, and remove edges by `remove_edge` or `remove_edges_from` interface." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# remove one vertex by `remove_node` interface\n", "G.remove_node(5)\n", "list(G.nodes)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# remove a batch of vertices by `remove_nodes_from` interface\n", "G.remove_nodes_from([4, 5])\n", "list(G.nodes)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# remove one edge by `remove_edge` interface\n", "G.remove_edge(1, 2)\n", "list(G.edges)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# remove a batch of edges by `remove_edges_from` interface\n", "G.remove_edges_from([(1, 3), (2, 3)])\n", "list(G.edges)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# query the number of vertices after removal\n", "G.number_of_nodes()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# query the number of edges after removal\n", "G.number_of_edges()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Graph Analysis**\n", "\n", "The interface of graph analysis module in graphscope is also compatible with NetworkX.\n", "\n", "In following examples, we use `connected_components` to analyze the connected components of the graph, use `clustering` to get the clustering coefficient of each vertex, and `all_pairs_shortest_path` to compute the shortest path between any two vertices." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Building graph\n", "G = nx.Graph()\n", "G.add_edges_from([(1, 2), (1, 3)])\n", "G.add_node(4)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Run connected_components\n", "list(nx.connected_components(G))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Run clustering\n", "nx.clustering(G)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Run all_pairs_shortest_path\n", "sp = dict(nx.all_pairs_shortest_path(G))\n", "sp[3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Graph Display**\n", "\n", "Like NetworkX, you can draw a graph by `draw` interface, which relies on the drawing function of 'Matplotlib'." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You should install `matplotlib` first if you are not in playground environment.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip3 install matplotlib" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "使用 GraphScope 来进行简单地绘制图" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Create a star graph with 5 vertices\n", "G = nx.star_graph(5)\n", "\n", "# Sraw\n", "nx.draw(G, with_labels=True, font_weight='bold')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The performance speed-up of GraphScope over NetworkX can reach up to several orders of magnitudes.\n", "\n", "Let's see how much GraphScope improves the algorithm performance compared with NetworkX by a simple experiment.\n", "\n", "We run [clustering](https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.cluster.clustering.html#networkx.algorithms.cluster.clustering) algorithm on [twitter datasets](https://snap.stanford.edu/data/ego-Twitter.html)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Download dataset if you are not in playground environment" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!wget https://raw.githubusercontent.com/GraphScope/gstest/master/twitter.e -P /tmp" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Loading dataset both in GraphScope and NetwrokX." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "import graphscope.nx as gs_nx\n", "import networkx as nx" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# loading graph in NetworkX\n", "g1 = nx.read_edgelist(\n", " os.path.expandvars('/tmp/twitter.e'), nodetype=int, data=False, create_using=nx.Graph\n", ")\n", "type(g1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Loading graph in GraphScope\n", "g2 = gs_nx.read_edgelist(\n", " os.path.expandvars('/tmp/twitter.e'), nodetype=int, data=False, create_using=gs_nx.Graph\n", ")\n", "type(g2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Run algorithm and display time both in GraphScope and NetworkX." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%time\n", "# GraphScope\n", "ret_gs = gs_nx.clustering(g2)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%time\n", "# NetworkX\n", "ret_nx = nx.clustering(g1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Result comparison\n", "ret_gs == ret_nx" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.6" } }, "nbformat": 4, "nbformat_minor": 5 }