{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Unsupervised Graph Learning with GraphSage\n", "\n", "\n", "GraphScope provides the capability to process learning tasks. In this tutorial, we demonstrate how GraphScope trains a model with GraphSage.\n", "\n", "The task is link prediction, which estimates the probability of links between nodes in a graph.\n", "\n", "In this task, we use our implementation of GraphSAGE algorithm to build a model that predicts protein-protein links in the [PPI](https://humgenomics.biomedcentral.com/articles/10.1186/1479-7364-3-3-291) dataset. In which every node represents a protein. The task can be treated as a unsupervised link prediction on a homogeneous link network.\n", "\n", "In this task, GraphSage algorithm would compress both structural and attribute information in the graph into low-dimensional embedding vectors on each node. These embeddings can be further used to predict links between nodes.\n", "\n", "This tutorial has following steps:\n", "- Launching the learning engine and attaching to loaded graph.\n", "- Defining train process with builtin GraphSage model and hyper-parameters\n", "- Training and evaluating\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Install graphscope package if you are NOT in the Playground\n", "\n", "!pip3 install graphscope\n", "!pip3 uninstall -y importlib_metadata # Address an module conflict issue on colab.google. Remove this line if you are not on colab." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Import the graphscope module.\n", "\n", "import graphscope\n", "\n", "graphscope.set_option(show_log=False) # enable logging" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Load ppi dataset\n", "\n", "from graphscope.dataset import load_ppi\n", "\n", "graph = load_ppi()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Launch learning engine \n", "Then, we need to define a feature list for training. The training feature list should be seleted from the vertex properties. 
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Launch the learning engine\n", "\n", "First, we need to define a feature list for training. The training features should be selected from the vertex properties. In this case, we choose all the properties prefixed with \"feat-\" as the training features.\n", "\n", "With the feature list, we then launch a learning engine with the [graphlearn](https://graphscope.io/docs/reference/session.html#graphscope.Session.graphlearn) method of graphscope.\n", "\n", "Here, we specify that the GraphSage model is trained over the \"protein\" nodes and \"link\" edges.\n", "\n", "With gen_labels, we take all \"protein\" nodes as the training set.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Define the features for learning.\n", "protein_features = []\n", "for i in range(50):\n", "    protein_features.append(\"feat-\" + str(i))\n", "\n", "# Launch a learning engine.\n", "lg = graphscope.graphlearn(\n", "    graph,\n", "    nodes=[(\"protein\", protein_features)],\n", "    edges=[(\"protein\", \"link\", \"protein\")],\n", "    gen_labels=[\n", "        (\"train\", \"protein\", 100, (0, 100)),\n", "    ],\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Define the training process\n", "\n", "We use the built-in GraphSage model to define the training process. You can find more details about all the built-in learning models in [Graph Learning Model](https://graphscope.io/docs/learning_engine.html#data-model).\n", "\n", "In this example, we use TensorFlow as the \"NN\" backend trainer.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from graphscope.learning.examples import GraphSage\n", "from graphscope.learning.graphlearn.python.model.tf.optimizer import get_tf_optimizer\n", "from graphscope.learning.graphlearn.python.model.tf.trainer import LocalTFTrainer\n", "\n", "\n", "# Unsupervised GraphSage training process.\n", "def train(config, graph):\n", "    def model_fn():\n", "        return GraphSage(\n", "            graph,\n", "            config[\"class_num\"],\n", "            config[\"features_num\"],\n", "            config[\"batch_size\"],\n", "            categorical_attrs_desc=config[\"categorical_attrs_desc\"],\n", "            hidden_dim=config[\"hidden_dim\"],\n", "            in_drop_rate=config[\"in_drop_rate\"],\n", "            neighs_num=config[\"neighs_num\"],\n", "            hops_num=config[\"hops_num\"],\n", "            node_type=config[\"node_type\"],\n", "            edge_type=config[\"edge_type\"],\n", "            full_graph_mode=config[\"full_graph_mode\"],\n", "            unsupervised=config[\"unsupervised\"],\n", "        )\n", "\n", "    trainer = LocalTFTrainer(\n", "        model_fn,\n", "        epoch=config[\"epoch\"],\n", "        optimizer=get_tf_optimizer(\n", "            config[\"learning_algo\"], config[\"learning_rate\"], config[\"weight_decay\"]\n", "        ),\n", "    )\n", "    trainer.train()\n", "    embs = trainer.get_node_embedding()\n", "    np.save(config[\"emb_save_dir\"], embs)\n", "\n", "\n", "# Define the hyperparameters.\n", "config = {\n", "    \"class_num\": 128,  # output dimension of the node embeddings\n", "    \"features_num\": 50,\n", "    \"batch_size\": 512,\n", "    \"categorical_attrs_desc\": \"\",\n", "    \"hidden_dim\": 128,\n", "    \"in_drop_rate\": 0.5,\n", "    \"hops_num\": 2,\n", "    \"neighs_num\": [5, 5],\n", "    \"full_graph_mode\": False,\n", "    \"agg_type\": \"gcn\",  # other options: \"mean\", \"sum\"\n", "    \"learning_algo\": \"adam\",\n", "    \"learning_rate\": 0.01,\n", "    \"weight_decay\": 0.0005,\n", "    \"unsupervised\": True,\n", "    \"epoch\": 1,\n", "    \"emb_save_dir\": \"./id_emb\",\n", "    \"node_type\": \"protein\",\n", "    \"edge_type\": \"link\",\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Run the training process\n", "\n", "With the training process and hyperparameters defined, we can now start training with the learning engine \"lg\" and the hyperparameter configuration; the training cell is the last cell of this notebook."
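, "\n", "Before running it, the next cell gives an optional, hypothetical sketch of how the embeddings saved by train() could be used once training has finished: it loads the array written by np.save and scores a candidate protein pair with a simple inner product. The exact shape and ordering of the saved embeddings depend on the trainer's get_node_embedding() output, so treat this as a sketch rather than a definitive recipe.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A minimal sketch of using the saved embeddings after training. Assumptions:\n", "# np.save(config[\"emb_save_dir\"], embs) wrote \"./id_emb.npy\", and embs is a\n", "# 2-D array with one row per node. Adjust the indexing if get_node_embedding()\n", "# returns something else (e.g. (ids, embeddings) pairs).\n", "import numpy as np\n", "\n", "\n", "def link_score(emb_file, src_id, dst_id):\n", "    embs = np.load(emb_file + \".npy\", allow_pickle=True)\n", "    # Inner product as a simple link score; larger values suggest a more likely link.\n", "    return float(np.dot(embs[src_id], embs[dst_id]))\n", "\n", "\n", "# Example usage -- uncomment after the training cell below has finished:\n", "# print(link_score(\"./id_emb\", 0, 1))"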
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "train(config, lg)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }