{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Graph Analytical Algorithms\n", "\n", "Graph analytics is widely used in real world. Many algorithms, like community detection, paths and connectivity, centrality are proven to be very useful in various businesses. \n", "GraphScope ships with a set of built-in algorithms, enables users easily analysis their graph data.\n", "\n", "This tutorials demostrate how to use built-in algorithms to process analytics tasks.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Install graphscope package if you are NOT in the Playground\n", "\n", "!pip3 install graphscope" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Preparation\n", "\n", "We start by loading a property graph into GraphScope.\n", "\n", "Here we take a peer-to-peer dataset derived from [Gnutella peer-to-peer network, August 31 2002](http://snap.stanford.edu/data/p2p-Gnutella31.html), with generated data on vertices and edges." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Import the graphscope module.\n", "\n", "import graphscope\n", "\n", "graphscope.set_option(show_log=False) # enable logging" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Load p2p network dataset\n", "\n", "from graphscope.dataset import load_p2p_network\n", "\n", "graph = load_p2p_network(directed=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's take a look at the schema of the graph." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(graph.schema)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As shown above, we loaded a property graph has vertices labeled \"host\", with 2 properties ( namely \"weight\" and \"id\"), and edges labeled \"knows\" with 3 properties ( namely, \"src_label_id\", \"dst_label_id\" and \"dist\")." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Project to Simple Graph\n", "\n", "Most graph analytical algorithms are defined on **simple graph**, which has only \n", "one kind of vertices and edges, edges and vertices have at most one property as its attribute.\n", "\n", "GraphScope provides a function `project` to convert a property graph to a simple graph, by selecting\n", " one kind of label for vertices/edges, and each with at most one of their properties." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "simple_graph = graph.project(vertices={\"host\": []}, edges={\"connect\": [\"dist\"]})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Run Algorithms\n", "\n", "In the following sections, we will run several algorithms over the graph and inspect the result.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Single Source Shortest Path\n", "\n", "Algorithm `sssp` takes two arguments, a `graph`, and the `src` for query as source.\n", "\n", "In the example, we are quering the shortest path from source node id=6, over the projected subgraph in the previous step.\n", "\n", "Behind the scenes, the algorithm will codegen a compatible version for the loaded graph, and compile to a executable binary. During the process, some validation will be conduct. e.g., in this case, the sssp algorithm requires the graph has a int or double value on edges. It may take a bit longer to building the library. However, this step only take once for the same algorithm on a typed graph. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from graphscope import sssp\n", "\n", "sssp_context = sssp(simple_graph, src=6)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After the computation, the results are distributed on the vineyard instances on the cluster.\n", "The returned object is a `Context`, which has several methods to retrieve or persist the results.\n", "\n", "Please refer to [Context](https://graphscope.io/docs/reference/context.html) to get more details.\n", "\n", "In this case, the results represent the shortest distance from the source node. \n", "We use this to fetch the results and display with its vertex id." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sssp_context.to_dataframe(\n", " selector={\"id\": \"v.id\", \"dist\": \"r\"}, vertex_range={\"begin\": 1, \"end\": 10}\n", ").sort_values(by=\"id\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alternatively, we can save the results to local file system." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sssp_context.output_to_client(\"./sssp_result.csv\", selector={\"id\": \"v.id\", \"dist\": \"r\"})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You may want to take a look at the file." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!head ./sssp_result.csv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### PageRank\n", "\n", "PageRank may be the most famous and commonly used graph algorithm. \n", "Let's see how to run PageRank in GraphScope just in 2 lines." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from graphscope import pagerank\n", "\n", "pr_context = pagerank(simple_graph, delta=0.85, max_round=10)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pr_context.to_dataframe(\n", " selector={\"id\": \"v.id\", \"rank\": \"r\"}, vertex_range={\"begin\": 1, \"end\": 10}\n", ").sort_values(by=\"id\")" ] }, { "cell_type": "markdown", "metadata": { "jupyter": { "outputs_hidden": true } }, "source": [ "Save the results to local." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pr_context.output_to_client(\n", " \"./pagerank_result.csv\", selector={\"id\": \"v.id\", \"rank\": \"r\"}\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Weakly Connected Components\n", "\n", "In graph theory, a component, sometimes called a connected component, of an undirected graph is a subgraph in which any two vertices are connected to each other by paths, and which is connected to no additional vertices in the supergraph.\n", "\n", "Algorithm weakly connected components (WCC) determines the weakly connected component each vertex belongs to.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from graphscope import wcc\n", "\n", "wcc_context = wcc(simple_graph)\n", "wcc_context.to_dataframe(\n", " selector={\"id\": \"v.id\", \"cc\": \"r\"}, vertex_range={\"begin\": 1, \"end\": 10}\n", ").sort_values(by=\"id\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For more analytical algorithms, please check [Builtin algorithms](https://graphscope.io/docs/analytics_engine.html#built-in-algorithms) and try something new.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }