{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Supervised Learning with GCN\n", "\n", "Graph neural networks (GNNs) combine the strengths of graph analytics and machine learning. \n", "GraphScope provides the capability to process learning tasks. In this tutorial, we demonstrate \n", "how GraphScope trains a model with GCN.\n", "\n", "The learning task is node classification on a citation network. In this task, the algorithm has \n", "to determine the label of each node in the [Cora](https://linqs.soe.ucsc.edu/data) dataset. \n", "The dataset consists of academic publications as the nodes and the citations between them as the links: if publication A cites publication B, then the graph has an edge from A to B. Each node is classified into one of seven subjects, and our model will learn to predict this subject.\n", "\n", "In this task, we use a Graph Convolutional Network (GCN) as the model. The core of the GCN model is a \"graph convolution\" layer. This layer is similar to a conventional dense layer, augmented by the graph adjacency matrix to use information about a node's connections.\n", "\n", "This tutorial has the following steps:\n", "- Creating a session and loading the graph\n", "- Launching the learning engine and attaching the loaded graph\n", "- Defining the training process with the built-in GCN model and configuring hyperparameters\n", "- Training and evaluating\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, let's create a session and load the dataset as a graph." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "import graphscope\n", "\n", "k8s_volumes = {\n", "    \"data\": {\n", "        \"type\": \"hostPath\",\n", "        \"field\": {\n", "            \"path\": \"/testingdata\",\n", "            \"type\": \"Directory\"\n", "        },\n", "        \"mounts\": {\n", "            \"mountPath\": \"/home/jovyan/datasets\",\n", "            \"readOnly\": True\n", "        }\n", "    }\n", "}\n", "\n", "# create session\n", "graphscope.set_option(show_log=True)\n", "sess = graphscope.session(k8s_volumes=k8s_volumes)\n", "\n", "# load the Cora graph\n", "graph = sess.g()\n", "graph = graph.add_vertices(\"/home/jovyan/datasets/cora/node.csv\", \"paper\")\n", "graph = graph.add_edges(\"/home/jovyan/datasets/cora/edge.csv\", \"cites\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Then, we need to define a feature list for training. The training features should be selected from the vertex properties. In this case, we choose all the properties prefixed with \"feat_\" as the training features.\n", "\n", "With the feature list, we next launch a learning engine with the `learning` method of the session. \n", "(You can find the details of this method in the [Session](https://graphscope.io/docs/reference/session.html) reference.) \n", "\n", "In this case, we specify GCN training over `paper` nodes and `cites` edges.\n", "\n", "With `gen_labels`, we split the `paper` nodes into three parts: 75% are used for training, 10% for validation, and 15% for testing.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# define the features for learning\n", "paper_features = []\n", "for i in range(1433):\n", "    paper_features.append(\"feat_\" + str(i))\n", "\n", "# launch a learning engine.\n", "lg = sess.learning(graph, nodes=[(\"paper\", paper_features)],\n", "                  edges=[(\"paper\", \"cites\", \"paper\")],\n", "                  gen_labels=[\n", "                      (\"train\", \"paper\", 100, (0, 75)),\n", "                      (\"val\", \"paper\", 100, (75, 85)),\n", "                      (\"test\", \"paper\", 100, (85, 100))\n", "                  ])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We use the built-in GCN model to define the training process. You can find more details about the built-in learning models in [Graph Learning Model](https://graphscope.io/docs/learning_engine.html#data-model).\n", "\n", "In this example, we use TensorFlow as the NN backend trainer." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "from graphscope.learning.examples import GCN\n", "from graphscope.learning.graphlearn.python.model.tf.trainer import LocalTFTrainer\n", "from graphscope.learning.graphlearn.python.model.tf.optimizer import get_tf_optimizer\n", "\n", "# supervised GCN.\n", "\n", "def train(config, graph):\n", "    def model_fn():\n", "        return GCN(\n", "            graph,\n", "            config[\"class_num\"],\n", "            config[\"features_num\"],\n", "            config[\"batch_size\"],\n", "            val_batch_size=config[\"val_batch_size\"],\n", "            test_batch_size=config[\"test_batch_size\"],\n", "            categorical_attrs_desc=config[\"categorical_attrs_desc\"],\n", "            hidden_dim=config[\"hidden_dim\"],\n", "            in_drop_rate=config[\"in_drop_rate\"],\n", "            neighs_num=config[\"neighs_num\"],\n", "            hops_num=config[\"hops_num\"],\n", "            node_type=config[\"node_type\"],\n", "            edge_type=config[\"edge_type\"],\n", "            full_graph_mode=config[\"full_graph_mode\"],\n", "        )\n", "    trainer = LocalTFTrainer(\n", "        model_fn,\n", "        epoch=config[\"epoch\"],\n", "        optimizer=get_tf_optimizer(\n", "            config[\"learning_algo\"], config[\"learning_rate\"], config[\"weight_decay\"]\n", "        ),\n", "    )\n", "    trainer.train_and_evaluate()\n", "\n", "# define hyperparameters\n", "config = {\n", "    \"class_num\": 7,  # output dimension\n", "    \"features_num\": 1433,\n", "    \"batch_size\": 140,\n", "    \"val_batch_size\": 300,\n", "    \"test_batch_size\": 1000,\n", "    \"categorical_attrs_desc\": \"\",\n", "    \"hidden_dim\": 128,\n", "    \"in_drop_rate\": 0.5,\n", "    \"hops_num\": 2,\n", "    \"neighs_num\": [5, 5],\n", "    \"full_graph_mode\": False,\n", "    \"agg_type\": \"gcn\",  # mean, sum\n", "    \"learning_algo\": \"adam\",\n", "    \"learning_rate\": 0.01,\n", "    \"weight_decay\": 0.0005,\n", "    \"epoch\": 5,\n", "    \"node_type\": \"paper\",\n", "    \"edge_type\": \"cites\",\n", "}\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With the training process and hyperparameters defined, we can now start training with the learning engine `lg` and the hyperparameter configuration." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "train(config, lg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, don't forget to close the session." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sess.close()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3" } }, "nbformat": 4, "nbformat_minor": 4 }