{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Unsupervised Graph Learning with GraphSage\n", "\n", "\n", "GraphScope provides the capability to process learning tasks. In this tutorial, we demonstrate how GraphScope trains a model with GraphSage.\n", "\n", "The task is link prediction, which estimates the probability of links between nodes in a graph.\n", "\n", "In this task, we use our implementation of GraphSAGE algorithm to build a model that predicts protein-protein links in the [PPI](https://humgenomics.biomedcentral.com/articles/10.1186/1479-7364-3-3-291) dataset. In which every node represents a protein. The task can be treated as a unsupervised link prediction on a homogeneous link network.\n", "\n", "In this task, GraphSage algorithm would compress both structural and attribute information in the graph into low-dimensional embedding vectors on each node. These embeddings can be further used to predict links between nodes.\n", "\n", "This tutorial has following steps:\n", "- Launching the learning engine and attaching to loaded graph.\n", "- Defining train process with builtin GraphSage model and hyper-parameters\n", "- Training and evaluating\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Install graphscope package if you are NOT in the Playground\n", "\n", "!pip3 install graphscope\n", "!pip3 uninstall -y importlib_metadata # Address an module conflict issue on colab.google. Remove this line if you are not on colab." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Import the graphscope module.\n", "\n", "import graphscope\n", "\n", "graphscope.set_option(show_log=False) # enable logging" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Load ppi dataset\n", "\n", "from graphscope.dataset import load_ppi\n", "\n", "graph = load_ppi()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Launch learning engine \n", "Then, we need to define a feature list for training. The training feature list should be seleted from the vertex properties. 
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Launch the learning engine\n", "\n", "First, we need to define a feature list for training. The training features should be selected from the vertex properties. In this case, we choose all the properties prefixed with \"feat-\" as the training features.\n", "\n", "With the feature list, we then launch a learning engine with the [graphlearn](https://graphscope.io/docs/reference/session.html#graphscope.Session.graphlearn) method of graphscope.\n", "\n", "Here, we specify that the GraphSage model is trained over the \"protein\" nodes and \"link\" edges.\n", "\n", "With gen_labels, we take all \"protein\" nodes as the training set.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Define the features for learning.\n", "protein_features = []\n", "for i in range(50):\n", "    protein_features.append(\"feat-\" + str(i))\n", "\n", "# Launch a learning engine.\n", "lg = graphscope.graphlearn(\n", "    graph,\n", "    nodes=[(\"protein\", protein_features)],\n", "    edges=[(\"protein\", \"link\", \"protein\")],\n", "    gen_labels=[\n", "        (\"train\", \"protein\", 100, (0, 100)),\n", "    ],\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Define the training process\n", "\n", "We use the built-in GraphSage model to define the training process. You can find more details about all the built-in learning models in [Graph Learning Model](https://graphscope.io/docs/learning_engine.html#data-model).\n", "\n", "In this example, we use TensorFlow as the \"NN\" backend trainer.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from graphscope.learning.examples import GraphSage\n", "from graphscope.learning.graphlearn.python.model.tf.optimizer import get_tf_optimizer\n", "from graphscope.learning.graphlearn.python.model.tf.trainer import LocalTFTrainer\n", "\n", "\n", "# Unsupervised GraphSage training process.\n", "def train(config, graph):\n", "    def model_fn():\n", "        return GraphSage(\n", "            graph,\n", "            config[\"class_num\"],\n", "            config[\"features_num\"],\n", "            config[\"batch_size\"],\n", "            categorical_attrs_desc=config[\"categorical_attrs_desc\"],\n", "            hidden_dim=config[\"hidden_dim\"],\n", "            in_drop_rate=config[\"in_drop_rate\"],\n", "            neighs_num=config[\"neighs_num\"],\n", "            hops_num=config[\"hops_num\"],\n", "            node_type=config[\"node_type\"],\n", "            edge_type=config[\"edge_type\"],\n", "            full_graph_mode=config[\"full_graph_mode\"],\n", "            unsupervised=config[\"unsupervised\"],\n", "        )\n", "\n", "    trainer = LocalTFTrainer(\n", "        model_fn,\n", "        epoch=config[\"epoch\"],\n", "        optimizer=get_tf_optimizer(\n", "            config[\"learning_algo\"], config[\"learning_rate\"], config[\"weight_decay\"]\n", "        ),\n", "    )\n", "    trainer.train()\n", "    embs = trainer.get_node_embedding()\n", "    np.save(config[\"emb_save_dir\"], embs)\n", "\n", "\n", "# Define the hyperparameters.\n", "config = {\n", "    \"class_num\": 128,  # output dimension of the node embeddings\n", "    \"features_num\": 50,\n", "    \"batch_size\": 512,\n", "    \"categorical_attrs_desc\": \"\",\n", "    \"hidden_dim\": 128,\n", "    \"in_drop_rate\": 0.5,\n", "    \"hops_num\": 2,\n", "    \"neighs_num\": [5, 5],\n", "    \"full_graph_mode\": False,\n", "    \"agg_type\": \"gcn\",  # other options: \"mean\", \"sum\"\n", "    \"learning_algo\": \"adam\",\n", "    \"learning_rate\": 0.01,\n", "    \"weight_decay\": 0.0005,\n", "    \"unsupervised\": True,\n", "    \"epoch\": 1,\n", "    \"emb_save_dir\": \"./id_emb\",\n", "    \"node_type\": \"protein\",\n", "    \"edge_type\": \"link\",\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Run the training process\n", "\n", "With the training process and hyperparameters defined, we can now start training with the learning engine \"lg\" and the hyperparameter configuration; the training cell is the last cell of this notebook."
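, "\n", "Before running it, the next cell gives an optional, hypothetical sketch of how the embeddings saved by train() could be used once training has finished: it loads the array written by np.save and scores a candidate protein pair with a simple inner product. The exact shape and ordering of the saved embeddings depend on the trainer's get_node_embedding() output, so treat this as a sketch rather than a definitive recipe.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A minimal sketch of using the saved embeddings after training. Assumptions:\n", "# np.save(config[\"emb_save_dir\"], embs) wrote \"./id_emb.npy\", and embs is a\n", "# 2-D array with one row per node. Adjust the indexing if get_node_embedding()\n", "# returns something else (e.g. (ids, embeddings) pairs).\n", "import numpy as np\n", "\n", "\n", "def link_score(emb_file, src_id, dst_id):\n", "    embs = np.load(emb_file + \".npy\", allow_pickle=True)\n", "    # Inner product as a simple link score; larger values suggest a more likely link.\n", "    return float(np.dot(embs[src_id], embs[dst_id]))\n", "\n", "\n", "# Example usage -- uncomment after the training cell below has finished:\n", "# print(link_score(\"./id_emb\", 0, 1))"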
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "train(config, lg)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }