{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Getting started with EpiGraphDB in Python\n",
    "\n",
    "This notebook is provided as a brief introductory guide to working with the EpiGraphDB platform through Python. Here we will demonstrate a few basic operations that can be carried out using the platform, but for more advanced methods please refer to the [API endpoint documentation](http://docs.epigraphdb.org/api/api-endpoints/).\n",
    "\n",
    "A Python wrapper for EpiGraphDB's API is currently in development; for now, we will query the API directly using the `requests` library. Familiarity with this package is helpful but not essential."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, we will ping the API to check our connection:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "If this line gets printed, ping was successful.\n"
     ]
    }
   ],
   "source": [
    "# Store our API URL as a string for future use\n",
    "API_URL = \"https://api.epigraphdb.org\"\n",
    "\n",
    "# Here we use the .get() method to send a GET request to the /ping endpoint of the API\n",
    "endpoint = '/ping'\n",
    "response_object = requests.get(API_URL + endpoint)\n",
    "\n",
    "# Check that the ping was successful\n",
    "response_object.raise_for_status()\n",
    "print(\"If this line gets printed, ping was successful.\")"
   ]
  },
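  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If the API cannot be reached, or answers with an HTTP error status, the cell above stops with an exception. When a simple yes/no answer is more convenient (for example, at the top of a longer script), a small wrapper can catch the error instead. This is a minimal sketch. The function name `api_is_reachable` and the 10-second timeout are our own choices, not part of the EpiGraphDB API."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests\n",
    "\n",
    "\n",
    "def api_is_reachable(api_url='https://api.epigraphdb.org', timeout=10):\n",
    "    \"\"\"Return True if a GET request to {api_url}/ping succeeds, else False.\n",
    "\n",
    "    Hypothetical helper: connection errors, timeouts and HTTP error\n",
    "    status codes are all caught and reported as False.\n",
    "    \"\"\"\n",
    "    try:\n",
    "        response = requests.get(api_url + '/ping', timeout=timeout)\n",
    "        response.raise_for_status()\n",
    "    except requests.exceptions.RequestException:\n",
    "        return False\n",
    "    return True\n",
    "\n",
    "\n",
    "# Usage (sends a real request to the API):\n",
    "# api_is_reachable()"
   ]
  },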
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***\n",
    "## 1. Using EpiGraphDB to obtain biological mappings\n",
    "\n",
    "In this first section, we take an arbitrary list of genes and query the EpiGraphDB API for the proteins they map to. This endpoint uses the `POST` HTTP method, which expects its parameters in the request body as JSON; converting a Python dictionary to JSON is easy with the standard `json` library. To find the correct parameter names, navigate to the [EpiGraphDB API documentation](http://docs.epigraphdb.org/api/api-endpoints/) and locate the endpoint of interest. From there, read off the parameters you want to pass, using the example request as a reference if needed."
   ]
  },
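  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As an aside, `requests` can also perform the JSON conversion itself. Passing a dictionary through the `json` argument (instead of a pre-serialised string through `data`) encodes it and sets the `Content-Type: application/json` header. The sketch below only *prepares* such a request, so nothing is sent to the server. The cells in this notebook keep using `json.dumps` with `data`, which works for these endpoints."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "import requests\n",
    "\n",
    "API_URL = 'https://api.epigraphdb.org'\n",
    "params = {'gene_name_list': ['TP53', 'BRCA1', 'TNF']}\n",
    "\n",
    "# Build, but do not send, a POST request; requests serialises `params` to JSON\n",
    "prepared = requests.Request(\n",
    "    'POST', API_URL + '/mappings/gene-to-protein', json=params\n",
    ").prepare()\n",
    "\n",
    "print(prepared.headers['Content-Type'])  # application/json\n",
    "print(json.loads(prepared.body))         # the same dictionary as `params`\n",
    "\n",
    "# To send it for real:\n",
    "# requests.post(API_URL + '/mappings/gene-to-protein', json=params)"
   ]
  },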
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>gene.name</th>\n",
       "      <th>gene.ensembl_id</th>\n",
       "      <th>protein.uniprot_id</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>TP53</td>\n",
       "      <td>ENSG00000141510</td>\n",
       "      <td>P04637</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>BRCA1</td>\n",
       "      <td>ENSG00000012048</td>\n",
       "      <td>P38398</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>TNF</td>\n",
       "      <td>ENSG00000232810</td>\n",
       "      <td>P01375</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  gene.name  gene.ensembl_id protein.uniprot_id\n",
       "0      TP53  ENSG00000141510             P04637\n",
       "1     BRCA1  ENSG00000012048             P38398\n",
       "2       TNF  ENSG00000232810             P01375"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 1.1 Mapping genes to proteins\n",
    "\n",
    "# Set parameters and convert to JSON format\n",
    "import json\n",
    "params = {\n",
    "  \"gene_name_list\": [\n",
    "    \"TP53\",\n",
    "    \"BRCA1\",\n",
    "    \"TNF\"\n",
    "  ]\n",
    "}\n",
    "json_params = json.dumps(params)\n",
    "\n",
    "# Define which endpoint of the API we would like to connect with\n",
    "endpoint = '/mappings/gene-to-protein'\n",
    "\n",
    "# Send the POST request\n",
    "response_object = requests.post(API_URL + endpoint, data=json_params)\n",
    "\n",
    "# Check for successful request\n",
    "response_object.raise_for_status()\n",
    "\n",
    "# Store results in a pandas dataframe\n",
    "import pandas as pd\n",
    "results = response_object.json()['results']\n",
    "gene_protein_df = pd.json_normalize(results)\n",
    "\n",
    "gene_protein_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the above cell, we queried EpiGraphDB for the proteins mapped to the genes *TP53*, *BRCA1*, and *TNF*. The query succeeded and returned one associated protein for each gene. The columns of the output dataframe take the general form `entity.property`, a convention that holds throughout this notebook.\n",
    "\n",
    "Descriptions of each entity's properties can be found in EpiGraphDB's [data dictionary](https://docs.epigraphdb.org/graph-database/meta-nodes/). Click the relevant entity in the table of contents on the right-hand side (or scroll down to its section), then locate the property of interest."
   ]
  },
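  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The dotted column names are produced by `pd.json_normalize`, which flattens nested dictionaries by joining their keys with `.`. The sketch below assumes that each record in `'results'` groups its properties by entity, as the column names suggest, and rebuilds the first row of the 1.1 output by hand. Passing a different `sep` changes the separator, which can be handy if you prefer attribute-style access to columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# One record shaped as we assume the API returns it: properties grouped by entity\n",
    "record = {\n",
    "    'gene': {'name': 'TP53', 'ensembl_id': 'ENSG00000141510'},\n",
    "    'protein': {'uniprot_id': 'P04637'},\n",
    "}\n",
    "\n",
    "# Nested keys are joined with '.' by default\n",
    "flat = pd.json_normalize([record])\n",
    "print(list(flat.columns))\n",
    "# ['gene.name', 'gene.ensembl_id', 'protein.uniprot_id']\n",
    "\n",
    "# A different separator allows attribute access such as df.gene_name\n",
    "print(list(pd.json_normalize([record], sep='_').columns))\n",
    "# ['gene_name', 'gene_ensembl_id', 'protein_uniprot_id']"
   ]
  },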
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>uniprot_id</th>\n",
       "      <th>pathway_count</th>\n",
       "      <th>pathway_reactome_id</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>P04637</td>\n",
       "      <td>5</td>\n",
       "      <td>[R-HSA-6785807, R-HSA-390471, R-HSA-5689896, R...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>P38398</td>\n",
       "      <td>6</td>\n",
       "      <td>[R-HSA-6796648, R-HSA-1221632, R-HSA-8953750, ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>P01375</td>\n",
       "      <td>3</td>\n",
       "      <td>[R-HSA-6785807, R-HSA-6783783, R-HSA-5357905]</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  uniprot_id  pathway_count                                pathway_reactome_id\n",
       "0     P04637              5  [R-HSA-6785807, R-HSA-390471, R-HSA-5689896, R...\n",
       "1     P38398              6  [R-HSA-6796648, R-HSA-1221632, R-HSA-8953750, ...\n",
       "2     P01375              3      [R-HSA-6785807, R-HSA-6783783, R-HSA-5357905]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 1.2 Proteins to pathways\n",
    "\n",
    "# As above, this is another POST request, so we need our data in JSON format\n",
    "json_params = json.dumps({\n",
    "  \"uniprot_id_list\": list(gene_protein_df['protein.uniprot_id'].values)\n",
    "})\n",
    "\n",
    "# Send the request\n",
    "endpoint = '/protein/in-pathway'\n",
    "response_object = requests.post(API_URL + endpoint, data=json_params)\n",
    "\n",
    "# Check for successful request\n",
    "response_object.raise_for_status()\n",
    "\n",
    "# Store results\n",
    "results = response_object.json()['results']\n",
    "protein_pathway_df = pd.json_normalize(results)\n",
    "\n",
    "protein_pathway_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Above, we took the proteins mapped to our genes of interest and queried the platform for the pathways they belong to. The API found several pathways for each protein and returned their Reactome IDs as lists, together with the number of pathways in `pathway_count`.\n",
    "\n",
    "Note that so far we have only accessed the `'results'` key of the dictionary returned by the response object's `.json()` method. The other available key is `'metadata'` (see the output below), which describes the request itself, including the Cypher query that the platform ran to produce these results. If you would like to know more about how Cypher is used in these requests, a dedicated section appears at the end of this notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'empty_results': False,\n",
      " 'query': 'MATCH p=(protein:Protein)-[r:PROTEIN_IN_PATHWAY]-(pathway:Pathway) '\n",
      "          \"WHERE protein.uniprot_id IN ['P04637', 'P38398', 'P01375'] RETURN \"\n",
      "          'protein.uniprot_id AS uniprot_id, count(p) AS pathway_count, '\n",
      "          'collect(pathway.reactome_id) AS pathway_reactome_id',\n",
      " 'total_seconds': 0.005797}\n"
     ]
    }
   ],
   "source": [
    "from pprint import pprint\n",
    "metadata = response_object.json()['metadata']\n",
    "\n",
    "pprint(metadata)"
   ]
  },
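  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `empty_results` flag in the metadata (presumably `True` when a query matches nothing) offers a quick check before building a dataframe. The helper below is a hypothetical convenience function, not part of EpiGraphDB: it takes the decoded JSON of a response, warns if no results came back, and otherwise returns the flattened dataframe used throughout this notebook. It is demonstrated on a small hand-made payload copying the TNF row from the 1.2 output, so it runs without contacting the API."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import warnings\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "\n",
    "def results_to_df(payload):\n",
    "    \"\"\"Flatten the 'results' of a decoded EpiGraphDB response into a DataFrame.\n",
    "\n",
    "    Hypothetical helper: `payload` is the dictionary returned by\n",
    "    `response_object.json()`, i.e. with 'metadata' and 'results' keys.\n",
    "    \"\"\"\n",
    "    metadata = payload.get('metadata', {})\n",
    "    if metadata.get('empty_results', False):\n",
    "        warnings.warn('Query returned no results: ' + metadata.get('query', ''))\n",
    "        return pd.DataFrame()\n",
    "    return pd.json_normalize(payload['results'])\n",
    "\n",
    "\n",
    "# Hand-made payload with the same structure as the response above (TNF row only)\n",
    "example_payload = {\n",
    "    'metadata': {'empty_results': False},\n",
    "    'results': [{'uniprot_id': 'P01375', 'pathway_count': 3,\n",
    "                 'pathway_reactome_id': ['R-HSA-6785807', 'R-HSA-6783783', 'R-HSA-5357905']}],\n",
    "}\n",
    "print(results_to_df(example_payload))\n",
    "\n",
    "# With a real response: results_to_df(response_object.json())"
   ]
  },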
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***\n",
    "## 2. Epidemiological relationship analysis\n",
    "\n",
    "In the cell below, we query EpiGraphDB for metadata on genome-wide association studies (GWAS) of a target trait, body mass index (BMI). We then retrieve pre-computed Mendelian randomisation (MR) results involving the same trait.\n",
    "\n",
    "Here we use a different HTTP method from before, `GET`. It is simpler to use from Python because `requests` accepts the parameters as a plain dictionary through its `params` argument and encodes them into the URL for us. To learn more about the differences between `GET` and `POST`, please see [this guide](https://www.w3schools.com/tags/ref_httpmethods.asp)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>node.note</th>\n",
       "      <th>node.access</th>\n",
       "      <th>node.year</th>\n",
       "      <th>node.mr</th>\n",
       "      <th>node.author</th>\n",
       "      <th>node.consortium</th>\n",
       "      <th>node.sex</th>\n",
       "      <th>node.priority</th>\n",
       "      <th>node.pmid</th>\n",
       "      <th>node.population</th>\n",
       "      <th>node.unit</th>\n",
       "      <th>node.sample_size</th>\n",
       "      <th>node.nsnp</th>\n",
       "      <th>node.trait</th>\n",
       "      <th>node.id</th>\n",
       "      <th>node.subcategory</th>\n",
       "      <th>node.category</th>\n",
       "      <th>node.sd</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>NA</td>\n",
       "      <td>public</td>\n",
       "      <td>2018</td>\n",
       "      <td>1</td>\n",
       "      <td>Hoffmann TJ</td>\n",
       "      <td>NA</td>\n",
       "      <td>NA</td>\n",
       "      <td>0</td>\n",
       "      <td>30108127</td>\n",
       "      <td>European</td>\n",
       "      <td>NA</td>\n",
       "      <td>315347</td>\n",
       "      <td>27854527</td>\n",
       "      <td>Body mass index</td>\n",
       "      <td>ebi-a-GCST006368</td>\n",
       "      <td>NA</td>\n",
       "      <td>NA</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>NaN</td>\n",
       "      <td>public</td>\n",
       "      <td>2015</td>\n",
       "      <td>1</td>\n",
       "      <td>Locke AE</td>\n",
       "      <td>NA</td>\n",
       "      <td>Males and Females</td>\n",
       "      <td>1</td>\n",
       "      <td>25673413</td>\n",
       "      <td>Mixed</td>\n",
       "      <td>NA</td>\n",
       "      <td>339224</td>\n",
       "      <td>2555511</td>\n",
       "      <td>Body mass index</td>\n",
       "      <td>ieu-a-2</td>\n",
       "      <td>Anthropometric</td>\n",
       "      <td>Risk factor</td>\n",
       "      <td>4.77</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>NaN</td>\n",
       "      <td>public</td>\n",
       "      <td>2015</td>\n",
       "      <td>1</td>\n",
       "      <td>Locke AE</td>\n",
       "      <td>NA</td>\n",
       "      <td>Males</td>\n",
       "      <td>2</td>\n",
       "      <td>25673413</td>\n",
       "      <td>European</td>\n",
       "      <td>NA</td>\n",
       "      <td>152893</td>\n",
       "      <td>2477659</td>\n",
       "      <td>Body mass index</td>\n",
       "      <td>ieu-a-785</td>\n",
       "      <td>Anthropometric</td>\n",
       "      <td>Risk factor</td>\n",
       "      <td>4.77</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>NaN</td>\n",
       "      <td>public</td>\n",
       "      <td>2015</td>\n",
       "      <td>1</td>\n",
       "      <td>Locke AE</td>\n",
       "      <td>NA</td>\n",
       "      <td>Males and Females</td>\n",
       "      <td>3</td>\n",
       "      <td>25673413</td>\n",
       "      <td>European</td>\n",
       "      <td>NA</td>\n",
       "      <td>322154</td>\n",
       "      <td>2554668</td>\n",
       "      <td>Body mass index</td>\n",
       "      <td>ieu-a-835</td>\n",
       "      <td>Anthropometric</td>\n",
       "      <td>Risk factor</td>\n",
       "      <td>4.77</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>NA</td>\n",
       "      <td>public</td>\n",
       "      <td>2017</td>\n",
       "      <td>1</td>\n",
       "      <td>Akiyama M</td>\n",
       "      <td>NA</td>\n",
       "      <td>NA</td>\n",
       "      <td>0</td>\n",
       "      <td>28892062</td>\n",
       "      <td>East Asian</td>\n",
       "      <td>NA</td>\n",
       "      <td>158284</td>\n",
       "      <td>5952516</td>\n",
       "      <td>Body mass index</td>\n",
       "      <td>ebi-a-GCST004904</td>\n",
       "      <td>NA</td>\n",
       "      <td>NA</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  node.note node.access node.year node.mr  node.author node.consortium  \\\n",
       "0        NA      public      2018       1  Hoffmann TJ              NA   \n",
       "1       NaN      public      2015       1     Locke AE              NA   \n",
       "2       NaN      public      2015       1     Locke AE              NA   \n",
       "3       NaN      public      2015       1     Locke AE              NA   \n",
       "4        NA      public      2017       1    Akiyama M              NA   \n",
       "\n",
       "            node.sex node.priority node.pmid node.population node.unit  \\\n",
       "0                 NA             0  30108127        European        NA   \n",
       "1  Males and Females             1  25673413           Mixed        NA   \n",
       "2              Males             2  25673413        European        NA   \n",
       "3  Males and Females             3  25673413        European        NA   \n",
       "4                 NA             0  28892062      East Asian        NA   \n",
       "\n",
       "  node.sample_size node.nsnp       node.trait           node.id  \\\n",
       "0           315347  27854527  Body mass index  ebi-a-GCST006368   \n",
       "1           339224   2555511  Body mass index           ieu-a-2   \n",
       "2           152893   2477659  Body mass index         ieu-a-785   \n",
       "3           322154   2554668  Body mass index         ieu-a-835   \n",
       "4           158284   5952516  Body mass index  ebi-a-GCST004904   \n",
       "\n",
       "  node.subcategory node.category node.sd  \n",
       "0               NA            NA     NaN  \n",
       "1   Anthropometric   Risk factor    4.77  \n",
       "2   Anthropometric   Risk factor    4.77  \n",
       "3   Anthropometric   Risk factor    4.77  \n",
       "4               NA            NA     NaN  "
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 2.1 Getting GWAS studies from EpiGraphDB\n",
    "\n",
    "# Create a dictionary for the parameters to be passed\n",
    "params = {\n",
    "    'name': 'Body mass index'\n",
    "}\n",
    "\n",
    "# Send the request\n",
    "endpoint = '/meta/nodes/Gwas/search'\n",
    "response_object = requests.get(API_URL + endpoint, params=params)\n",
    "response_object.raise_for_status()\n",
    "\n",
    "# Store the results of the query and display\n",
    "result = response_object.json()['results']\n",
    "gwas_df = pd.json_normalize(result)\n",
    "\n",
    "gwas_df.head()"
   ]
  },
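  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Under the hood, `requests` turns the `params` dictionary from 2.1 into the URL's query string, encoding characters that are not allowed in URLs (spaces become `+`). Preparing the request without sending it reveals the exact URL, which is handy for debugging or for pasting into a browser."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests\n",
    "\n",
    "API_URL = 'https://api.epigraphdb.org'\n",
    "params = {'name': 'Body mass index'}\n",
    "\n",
    "# Build, but do not send, the GET request from 2.1 and inspect its URL\n",
    "prepared = requests.Request('GET', API_URL + '/meta/nodes/Gwas/search', params=params).prepare()\n",
    "print(prepared.url)\n",
    "# https://api.epigraphdb.org/meta/nodes/Gwas/search?name=Body+mass+index"
   ]
  },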
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>exposure.id</th>\n",
       "      <th>exposure.trait</th>\n",
       "      <th>outcome.id</th>\n",
       "      <th>outcome.trait</th>\n",
       "      <th>mr.b</th>\n",
       "      <th>mr.se</th>\n",
       "      <th>mr.pval</th>\n",
       "      <th>mr.method</th>\n",
       "      <th>mr.selection</th>\n",
       "      <th>mr.moescore</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>ieu-a-2</td>\n",
       "      <td>Body mass index</td>\n",
       "      <td>ukb-a-74</td>\n",
       "      <td>Non-cancer illness code  self-reported: diabetes</td>\n",
       "      <td>0.034559</td>\n",
       "      <td>0.002418</td>\n",
       "      <td>0.0</td>\n",
       "      <td>FE IVW</td>\n",
       "      <td>DF</td>\n",
       "      <td>0.93</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>ieu-a-2</td>\n",
       "      <td>Body mass index</td>\n",
       "      <td>ukb-a-388</td>\n",
       "      <td>Hip circumference</td>\n",
       "      <td>0.724105</td>\n",
       "      <td>0.026588</td>\n",
       "      <td>0.0</td>\n",
       "      <td>Simple median</td>\n",
       "      <td>Tophits</td>\n",
       "      <td>0.95</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>ieu-a-2</td>\n",
       "      <td>Body mass index</td>\n",
       "      <td>ukb-a-382</td>\n",
       "      <td>Waist circumference</td>\n",
       "      <td>0.656440</td>\n",
       "      <td>0.024496</td>\n",
       "      <td>0.0</td>\n",
       "      <td>Simple median</td>\n",
       "      <td>Tophits</td>\n",
       "      <td>0.94</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>ieu-a-2</td>\n",
       "      <td>Body mass index</td>\n",
       "      <td>ukb-a-35</td>\n",
       "      <td>Comparative height size at age 10</td>\n",
       "      <td>0.136684</td>\n",
       "      <td>0.007909</td>\n",
       "      <td>0.0</td>\n",
       "      <td>FE IVW</td>\n",
       "      <td>Tophits</td>\n",
       "      <td>0.94</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>ieu-a-2</td>\n",
       "      <td>Body mass index</td>\n",
       "      <td>ukb-a-34</td>\n",
       "      <td>Comparative body size at age 10</td>\n",
       "      <td>0.365580</td>\n",
       "      <td>0.023556</td>\n",
       "      <td>0.0</td>\n",
       "      <td>Simple median</td>\n",
       "      <td>HF</td>\n",
       "      <td>0.87</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  exposure.id   exposure.trait outcome.id  \\\n",
       "0     ieu-a-2  Body mass index   ukb-a-74   \n",
       "1     ieu-a-2  Body mass index  ukb-a-388   \n",
       "2     ieu-a-2  Body mass index  ukb-a-382   \n",
       "3     ieu-a-2  Body mass index   ukb-a-35   \n",
       "4     ieu-a-2  Body mass index   ukb-a-34   \n",
       "\n",
       "                                      outcome.trait      mr.b     mr.se  \\\n",
       "0  Non-cancer illness code  self-reported: diabetes  0.034559  0.002418   \n",
       "1                                 Hip circumference  0.724105  0.026588   \n",
       "2                               Waist circumference  0.656440  0.024496   \n",
       "3                 Comparative height size at age 10  0.136684  0.007909   \n",
       "4                   Comparative body size at age 10  0.365580  0.023556   \n",
       "\n",
       "   mr.pval      mr.method mr.selection  mr.moescore  \n",
       "0      0.0         FE IVW           DF         0.93  \n",
       "1      0.0  Simple median      Tophits         0.95  \n",
       "2      0.0  Simple median      Tophits         0.94  \n",
       "3      0.0         FE IVW      Tophits         0.94  \n",
       "4      0.0  Simple median           HF         0.87  "
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 2.2 Getting MR results for a trait\n",
    "\n",
    "# Set parameters\n",
    "params = {'exposure_trait': 'Body mass index',\n",
    "          'pval_threshold': 1e-10}\n",
    "\n",
    "# Send request\n",
    "endpoint = '/mr'\n",
    "response_object = requests.get(API_URL + endpoint, params=params)\n",
    "response_object.raise_for_status()\n",
    "\n",
    "# Store and display results\n",
    "result = response_object.json()['results']\n",
    "BMI_MR_df = pd.json_normalize(result)\n",
    "\n",
    "BMI_MR_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The dataframe above shows the first five rows of our query results. We requested every trait for which an MR analysis with body mass index as the exposure returned a causal estimate with a p-value below 1e-10. For each matching outcome, the table lists the exposure and outcome traits alongside the MR estimate (`mr.b`), its standard error (`mr.se`), its p-value (`mr.pval`), and the MR method used (`mr.method`).\n",
    "\n",
    "Besides `exposure_trait`, the `/mr` endpoint also accepts an `outcome_trait` parameter, which takes the same kind of value. Either or both of these parameters can be passed in an MR query, so users can narrow the returned results to suit their analysis."
   ]
  },
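  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For example, narrowing the query in 2.2 to a single exposure-outcome pair only requires adding `outcome_trait` to the parameter dictionary. The sketch below uses *Hip circumference*, one of the outcomes returned above, and prepares the request without sending it so that the resulting URL can be inspected; pass the same dictionary to `requests.get` to run the query."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests\n",
    "\n",
    "API_URL = 'https://api.epigraphdb.org'\n",
    "params = {\n",
    "    'exposure_trait': 'Body mass index',\n",
    "    'outcome_trait': 'Hip circumference',  # one of the outcomes in the table above\n",
    "    'pval_threshold': 1e-10,\n",
    "}\n",
    "\n",
    "# Build, but do not send, the request and inspect the query string\n",
    "prepared = requests.Request('GET', API_URL + '/mr', params=params).prepare()\n",
    "print(prepared.url)\n",
    "# https://api.epigraphdb.org/mr?exposure_trait=Body+mass+index&outcome_trait=Hip+circumference&pval_threshold=1e-10\n",
    "\n",
    "# To run the query: requests.get(API_URL + '/mr', params=params)"
   ]
  },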
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***\n",
    "## 3. Looking for literature evidence\n",
    "\n",
    "Searching the literature is a routine task in research, whether for generating new hypotheses or for triangulating evidence. EpiGraphDB speeds this up by providing access to a large collection of literature-mined relationships structured as semantic triples. These take the general form *(subject, predicate, object)* and were generated by applying natural language processing to a large body of published biomedical research. In this section, we query the API for the literature relationships between a given gene and an outcome trait."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>pubmed_id</th>\n",
       "      <th>gene.name</th>\n",
       "      <th>st.predicate</th>\n",
       "      <th>st.object_name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>[17484863, 21155887]</td>\n",
       "      <td>IL23R</td>\n",
       "      <td>NEG_ASSOCIATED_WITH</td>\n",
       "      <td>Inflammatory Bowel Diseases</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>[27852544]</td>\n",
       "      <td>IL23R</td>\n",
       "      <td>AFFECTS</td>\n",
       "      <td>Inflammatory Bowel Diseases</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>[17484863, 19575361, 19496308, 18383521, 18341...</td>\n",
       "      <td>IL23R</td>\n",
       "      <td>ASSOCIATED_WITH</td>\n",
       "      <td>Inflammatory Bowel Diseases</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>[23131344]</td>\n",
       "      <td>IL23R</td>\n",
       "      <td>PREDISPOSES</td>\n",
       "      <td>Inflammatory Bowel Diseases</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                           pubmed_id gene.name  \\\n",
       "0                               [17484863, 21155887]     IL23R   \n",
       "1                                         [27852544]     IL23R   \n",
       "2  [17484863, 19575361, 19496308, 18383521, 18341...     IL23R   \n",
       "3                                         [23131344]     IL23R   \n",
       "\n",
       "          st.predicate               st.object_name  \n",
       "0  NEG_ASSOCIATED_WITH  Inflammatory Bowel Diseases  \n",
       "1              AFFECTS  Inflammatory Bowel Diseases  \n",
       "2      ASSOCIATED_WITH  Inflammatory Bowel Diseases  \n",
       "3          PREDISPOSES  Inflammatory Bowel Diseases  "
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 3.1 Literature evidence linking a gene to a trait\n",
    "\n",
    "# Set parameters\n",
    "params = {\n",
    "    'gene_name': \"IL23R\",\n",
    "    'object_name': \"Inflammatory bowel disease\"\n",
    "}\n",
    "\n",
    "# Send the request\n",
    "endpoint = \"/literature/gene\"\n",
    "response_object = requests.get(API_URL + endpoint, params=params)\n",
    "response_object.raise_for_status()\n",
    "\n",
    "# Store the results of the query and display\n",
    "result = response_object.json()['results']\n",
    "lit_df = pd.json_normalize(result)\n",
    "\n",
    "lit_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The dataframe above shows the results of our query: four distinct predicates link the gene *IL23R* to the trait *Inflammatory bowel disease*, as listed in the `st.predicate` column. The leftmost column contains the PubMed IDs of the papers from which each triple was derived; a paper can be opened by navigating to `https://pubmed.ncbi.nlm.nih.gov/*insert_pubmed_id_here*`. In this case *ASSOCIATED_WITH* appears to be the most common predicate, but we can't see exactly how many papers support it because pandas truncates long lists when displaying them. Let's add a paper count to the dataframe."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>pubmed_id</th>\n",
       "      <th>gene.name</th>\n",
       "      <th>st.predicate</th>\n",
       "      <th>st.object_name</th>\n",
       "      <th>publication_count</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>[17484863, 21155887]</td>\n",
       "      <td>IL23R</td>\n",
       "      <td>NEG_ASSOCIATED_WITH</td>\n",
       "      <td>Inflammatory Bowel Diseases</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>[27852544]</td>\n",
       "      <td>IL23R</td>\n",
       "      <td>AFFECTS</td>\n",
       "      <td>Inflammatory Bowel Diseases</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>[17484863, 19575361, 19496308, 18383521, 18341...</td>\n",
       "      <td>IL23R</td>\n",
       "      <td>ASSOCIATED_WITH</td>\n",
       "      <td>Inflammatory Bowel Diseases</td>\n",
       "      <td>21</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>[23131344]</td>\n",
       "      <td>IL23R</td>\n",
       "      <td>PREDISPOSES</td>\n",
       "      <td>Inflammatory Bowel Diseases</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                           pubmed_id gene.name  \\\n",
       "0                               [17484863, 21155887]     IL23R   \n",
       "1                                         [27852544]     IL23R   \n",
       "2  [17484863, 19575361, 19496308, 18383521, 18341...     IL23R   \n",
       "3                                         [23131344]     IL23R   \n",
       "\n",
       "          st.predicate               st.object_name  publication_count  \n",
       "0  NEG_ASSOCIATED_WITH  Inflammatory Bowel Diseases                  2  \n",
       "1              AFFECTS  Inflammatory Bowel Diseases                  1  \n",
       "2      ASSOCIATED_WITH  Inflammatory Bowel Diseases                 21  \n",
       "3          PREDISPOSES  Inflammatory Bowel Diseases                  1  "
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "counts = [len(papers_list) for papers_list in lit_df['pubmed_id']]\n",
    "lit_df['publication_count'] = counts\n",
    "\n",
    "lit_df"
   ]
  },
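  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The PubMed IDs in the `pubmed_id` column can also be turned into links using the URL pattern described above. The cell below is a minimal sketch: `pubmed_urls` is a hypothetical helper written for this notebook (it is not part of EpiGraphDB or its API) that appends each ID to `https://pubmed.ncbi.nlm.nih.gov/`. The example IDs are copied from the first row of the table above; to cover every row, `lit_df['pubmed_id'].apply(pubmed_urls)` should give a column of link lists."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical helper (not an EpiGraphDB function): build PubMed links from IDs\n",
    "PUBMED_BASE_URL = 'https://pubmed.ncbi.nlm.nih.gov/'\n",
    "\n",
    "\n",
    "def pubmed_urls(pubmed_ids):\n",
    "    # str() lets this work whether the IDs arrive as integers or strings\n",
    "    return [PUBMED_BASE_URL + str(pmid) for pmid in pubmed_ids]\n",
    "\n",
    "\n",
    "# IDs taken from the first row of lit_df above\n",
    "example_urls = pubmed_urls([17484863, 21155887])\n",
    "example_urls"
   ]
  },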
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "-----\n",
    "## 4. EpiGraphDB node search\n",
    "\n",
    "EpiGraphDB stores data as nodes (entities) and edges (relationships) of a wide range of types. The `/meta` endpoints of the API provide information about the structure of the graph itself; for example, the available classes of nodes can be listed through the `/meta/nodes/list` endpoint. Let’s do that now:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['Gwas',\n",
       " 'Disease',\n",
       " 'Drug',\n",
       " 'Efo',\n",
       " 'Event',\n",
       " 'Gene',\n",
       " 'Tissue',\n",
       " 'Literature',\n",
       " 'Pathway',\n",
       " 'Protein',\n",
       " 'SemmedTerm',\n",
       " 'Variant']"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 4.1 Getting a list of available meta-nodes\n",
    "\n",
    "# Send the request\n",
    "endpoint = \"/meta/nodes/list\"\n",
    "response_object = requests.get(API_URL + endpoint)\n",
    "response_object.raise_for_status()\n",
    "\n",
    "# Store the results of the query and display\n",
    "result = response_object.json()\n",
    "\n",
    "result"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The list above corresponds to EpiGraphDB's meta nodes, whose documentation, along with their available properties, can be found [here](https://docs.epigraphdb.org/graph-database/meta-nodes/).\n",
    "\n",
    "Next, we will demonstrate how to search for a node of interest by name, using the endpoint `/meta/nodes/{meta_node}/search`, where valid values for `{meta_node}` are the terms listed above.\n"
   ]
  },
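  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see what a search request looks like before it is sent, the sketch below builds the URL by hand with the standard library's `urllib.parse.urlencode`, which should produce the same query string that `requests` generates from a `params` dictionary. `build_search_url` is a hypothetical helper written for this notebook, and `META_NODES` is copied from the output of the previous cell, so it may need updating if the graph schema changes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from urllib.parse import urlencode\n",
    "\n",
    "# Same base URL as defined at the top of the notebook\n",
    "API_URL = 'https://api.epigraphdb.org'\n",
    "\n",
    "# Meta node names copied from the /meta/nodes/list output above\n",
    "META_NODES = ['Gwas', 'Disease', 'Drug', 'Efo', 'Event', 'Gene', 'Tissue',\n",
    "              'Literature', 'Pathway', 'Protein', 'SemmedTerm', 'Variant']\n",
    "\n",
    "\n",
    "def build_search_url(meta_node, **params):\n",
    "    # Hypothetical helper: refuse meta node names not reported by /meta/nodes/list\n",
    "    if meta_node not in META_NODES:\n",
    "        raise ValueError(f'Unknown meta node: {meta_node!r}')\n",
    "    # urlencode escapes the query string, e.g. spaces become '+'\n",
    "    return f'{API_URL}/meta/nodes/{meta_node}/search?{urlencode(params)}'\n",
    "\n",
    "\n",
    "search_url = build_search_url('Gwas', name='breast cancer')\n",
    "search_url"
   ]
  },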
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>node.trait</th>\n",
       "      <th>node.id</th>\n",
       "      <th>node.sample_size</th>\n",
       "      <th>node.year</th>\n",
       "      <th>node.author</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Breast cancer</td>\n",
       "      <td>ebi-a-GCST007236</td>\n",
       "      <td>89677</td>\n",
       "      <td>2015</td>\n",
       "      <td>Michailidou K</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Breast cancer</td>\n",
       "      <td>ebi-a-GCST004988</td>\n",
       "      <td>139274</td>\n",
       "      <td>2017</td>\n",
       "      <td>Michailidou K</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Breast cancer (Combined Oncoarray; iCOGS; GWAS...</td>\n",
       "      <td>ieu-a-1126</td>\n",
       "      <td>228951</td>\n",
       "      <td>2017</td>\n",
       "      <td>Michailidou K</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Breast cancer (GWAS)</td>\n",
       "      <td>ieu-a-1131</td>\n",
       "      <td>32498</td>\n",
       "      <td>2017</td>\n",
       "      <td>Michailidou K</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Breast cancer (GWAS)</td>\n",
       "      <td>ieu-a-1168</td>\n",
       "      <td>33832</td>\n",
       "      <td>2015</td>\n",
       "      <td>Michailidou K</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Breast cancer (Oncoarray)</td>\n",
       "      <td>ieu-a-1129</td>\n",
       "      <td>106776</td>\n",
       "      <td>2017</td>\n",
       "      <td>Michailidou K</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Breast cancer (Survival)</td>\n",
       "      <td>ieu-a-1165</td>\n",
       "      <td>37954</td>\n",
       "      <td>2015</td>\n",
       "      <td>Guo Q</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Breast cancer (iCOGS)</td>\n",
       "      <td>ieu-a-1162</td>\n",
       "      <td>89677</td>\n",
       "      <td>2015</td>\n",
       "      <td>Michailidou K</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Breast cancer (iCOGS)</td>\n",
       "      <td>ieu-a-1130</td>\n",
       "      <td>89677</td>\n",
       "      <td>2017</td>\n",
       "      <td>Michailidou K</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Breast cancer anti-estrogen resistance protein 3</td>\n",
       "      <td>prot-a-234</td>\n",
       "      <td>3301</td>\n",
       "      <td>2018</td>\n",
       "      <td>Sun BB</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                          node.trait           node.id  \\\n",
       "0                                      Breast cancer  ebi-a-GCST007236   \n",
       "1                                      Breast cancer  ebi-a-GCST004988   \n",
       "2  Breast cancer (Combined Oncoarray; iCOGS; GWAS...        ieu-a-1126   \n",
       "3                               Breast cancer (GWAS)        ieu-a-1131   \n",
       "4                               Breast cancer (GWAS)        ieu-a-1168   \n",
       "5                          Breast cancer (Oncoarray)        ieu-a-1129   \n",
       "6                           Breast cancer (Survival)        ieu-a-1165   \n",
       "7                              Breast cancer (iCOGS)        ieu-a-1162   \n",
       "8                              Breast cancer (iCOGS)        ieu-a-1130   \n",
       "9   Breast cancer anti-estrogen resistance protein 3        prot-a-234   \n",
       "\n",
       "  node.sample_size node.year    node.author  \n",
       "0            89677      2015  Michailidou K  \n",
       "1           139274      2017  Michailidou K  \n",
       "2           228951      2017  Michailidou K  \n",
       "3            32498      2017  Michailidou K  \n",
       "4            33832      2015  Michailidou K  \n",
       "5           106776      2017  Michailidou K  \n",
       "6            37954      2015          Guo Q  \n",
       "7            89677      2015  Michailidou K  \n",
       "8            89677      2017  Michailidou K  \n",
       "9             3301      2018         Sun BB  "
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 4.2 Searching for specific entities by name\n",
    "\n",
    "# Set params\n",
    "params = {\n",
    "    'name': 'breast cancer'\n",
    "}\n",
    "\n",
    "# Make request\n",
    "meta_node = 'Gwas'\n",
    "endpoint = f\"/meta/nodes/{meta_node}/search\"\n",
    "response_object = requests.get(API_URL + endpoint, params=params)\n",
    "response_object.raise_for_status()\n",
    "\n",
    "# Convert to pandas\n",
    "results = pd.json_normalize(response_object.json()['results'])\n",
    "target_node_id = results['node.id'][3]  # Store one ID for use in the next cell\n",
    "\n",
    "results[['node.trait', 'node.id', 'node.sample_size', 'node.year', 'node.author']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Above we used the `name` parameter of the endpoint to search for any GWAS nodes whose names fuzzily match our specified string. Several did, and some of their basic node properties are displayed above. Fuzzy matching is useful because you don't need to know the exact name of the entity or its ID in order to look it up.\n",
    "\n",
    "On the other hand, once you have identified your entity of interest, it is often sensible to refer to it by its node ID from then on to avoid ambiguity. Fortunately, the same endpoint also lets us search for nodes by ID, as demonstrated below."
   ]
  },
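  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The name search results also show why IDs are the safer choice: two different studies, `ieu-a-1131` and `ieu-a-1168`, share the trait name `Breast cancer (GWAS)`. The sketch below collects every ID whose trait name matches exactly, instead of picking a row by position as we did with `results['node.id'][3]`. `ids_for_trait` is a hypothetical helper, and the nested record layout is inferred from the `node.trait` and `node.id` column names produced by `pd.json_normalize`, not taken from the API documentation. With live data, the records would come from `response_object.json()['results']` of the name search."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical helper: return the IDs of all nodes whose trait name matches exactly.\n",
    "# The {'node': {'trait': ..., 'id': ...}} layout is inferred from the\n",
    "# 'node.trait' and 'node.id' columns created by pd.json_normalize above.\n",
    "def ids_for_trait(records, trait_name):\n",
    "    return [record['node']['id'] for record in records\n",
    "            if record['node']['trait'] == trait_name]\n",
    "\n",
    "\n",
    "# A few records copied from the fuzzy search results above\n",
    "example_records = [\n",
    "    {'node': {'trait': 'Breast cancer (GWAS)', 'id': 'ieu-a-1131'}},\n",
    "    {'node': {'trait': 'Breast cancer (GWAS)', 'id': 'ieu-a-1168'}},\n",
    "    {'node': {'trait': 'Breast cancer (Oncoarray)', 'id': 'ieu-a-1129'}},\n",
    "]\n",
    "\n",
    "matching_ids = ids_for_trait(example_records, 'Breast cancer (GWAS)')\n",
    "matching_ids"
   ]
  },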
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>node.ncase</th>\n",
       "      <th>node.access</th>\n",
       "      <th>node.year</th>\n",
       "      <th>node.mr</th>\n",
       "      <th>node.author</th>\n",
       "      <th>node.consortium</th>\n",
       "      <th>node.sex</th>\n",
       "      <th>node.priority</th>\n",
       "      <th>node.pmid</th>\n",
       "      <th>node.population</th>\n",
       "      <th>node.unit</th>\n",
       "      <th>node.sample_size</th>\n",
       "      <th>node.nsnp</th>\n",
       "      <th>node.ncontrol</th>\n",
       "      <th>node.trait</th>\n",
       "      <th>node.id</th>\n",
       "      <th>node.subcategory</th>\n",
       "      <th>node.category</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>14910</td>\n",
       "      <td>public</td>\n",
       "      <td>2017</td>\n",
       "      <td>1</td>\n",
       "      <td>Michailidou K</td>\n",
       "      <td>NA</td>\n",
       "      <td>Females</td>\n",
       "      <td>1</td>\n",
       "      <td>29059683</td>\n",
       "      <td>European</td>\n",
       "      <td>NA</td>\n",
       "      <td>32498</td>\n",
       "      <td>10680257</td>\n",
       "      <td>17588</td>\n",
       "      <td>Breast cancer (GWAS)</td>\n",
       "      <td>ieu-a-1131</td>\n",
       "      <td>Cancer</td>\n",
       "      <td>Disease</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  node.ncase node.access node.year node.mr    node.author node.consortium  \\\n",
       "0      14910      public      2017       1  Michailidou K              NA   \n",
       "\n",
       "  node.sex node.priority node.pmid node.population node.unit node.sample_size  \\\n",
       "0  Females             1  29059683        European        NA            32498   \n",
       "\n",
       "  node.nsnp node.ncontrol            node.trait     node.id node.subcategory  \\\n",
       "0  10680257         17588  Breast cancer (GWAS)  ieu-a-1131           Cancer   \n",
       "\n",
       "  node.category  \n",
       "0       Disease  "
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 4.3 Searching for a node by ID\n",
    "\n",
    "# Set params\n",
    "params = {\n",
    "    'id': target_node_id  # From previous cell\n",
    "}\n",
    "\n",
    "# Make request\n",
    "meta_node = 'Gwas'\n",
    "endpoint = f\"/meta/nodes/{meta_node}/search\"\n",
    "response_object = requests.get(API_URL + endpoint, params=params)\n",
    "response_object.raise_for_status()\n",
    "\n",
    "# Convert to pandas\n",
    "results = pd.json_normalize(response_object.json()['results'])\n",
    "\n",
    "results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***\n",
    "## 5. Advanced examples: Cypher\n",
    "\n",
    "Until now, to get information from the platform we have simply been creating a dictionary or JSON object containing our parameters and sending it to the correct endpoint of the API using the `requests` library. This is perfectly good practice: the API was designed specifically for this kind of use, as the (non-exhaustive) examples above demonstrate. It works because the API automatically converts each HTTP request it receives into a Cypher query, which it then passes to the Neo4j database on which EpiGraphDB is built. The database returns the result of the query, which is then passed back to us in Python as a response object. Each response contains metadata that includes the exact Cypher query that was run on the database, as shown in the cell below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "MATCH (gene:Gene)-[gp:GENE_TO_PROTEIN]-(protein:Protein) WHERE gene.name IN ['TP53'] RETURN gene {.ensembl_id, .name}, protein {.uniprot_id}\n"
     ]
    }
   ],
   "source": [
    "# 5.1 Inspecting the Cypher query behind an API request\n",
    "\n",
    "params = {\n",
    "  \"gene_name_list\": [\n",
    "    \"TP53\"\n",
    "  ]\n",
    "}\n",
    "json_params = json.dumps(params)\n",
    "endpoint = '/mappings/gene-to-protein'\n",
    "response_object = requests.post(API_URL + endpoint, data=json_params)\n",
    "response_object.raise_for_status()\n",
    "\n",
    "# Extract and print the Cypher query\n",
    "cypher_query = response_object.json()['metadata']['query']\n",
    "print(cypher_query)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The text printed above is the Cypher query that the `/mappings/gene-to-protein` endpoint (also used in section 1.1) ran behind the scenes. The basic structure of these queries is as follows:\n",
    "\n",
    "```\n",
    "MATCH subgraph\n",
    "WHERE condition\n",
    "RETURN data\n",
    "```\n",
    "\n",
    "Note that the subgraph generally takes the form *(node)-[relationship]-(node)*, as in `(gene:Gene)-[gp:GENE_TO_PROTEIN]-(protein:Protein)` above. Each node is written as `variable_name:MetaNode` and each relationship as `variable_name:META_RELATIONSHIP`, so that we can access their properties through the variable name we assigned and use those properties to define our conditions and the data we want returned. Information on the available properties for each class of entity can be found in EpiGraphDB's documentation, specifically [here for nodes](https://docs.epigraphdb.org/graph-database/meta-nodes/) and [here for relationships](https://docs.epigraphdb.org/graph-database/meta-relationships/).\n",
    "\n",
    "Now let's write and send our own basic query to get traits with high genetic correlation to body mass index:"
   ]
  },
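  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because every query follows the same three-part structure, it can be assembled programmatically. The sketch below uses a hypothetical helper, `build_cypher` (not part of EpiGraphDB's API), that joins a subgraph pattern, a list of conditions and a list of return items; as a check, it reproduces the gene-to-protein query printed above. It does no validation or escaping of its inputs, so values containing quotes must still be written carefully."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical helper: assemble a query with the MATCH / WHERE / RETURN structure above\n",
    "def build_cypher(match, conditions, returns):\n",
    "    query = 'MATCH ' + match\n",
    "    # WHERE is optional; multiple conditions are combined with AND\n",
    "    if conditions:\n",
    "        query += ' WHERE ' + ' AND '.join(conditions)\n",
    "    query += ' RETURN ' + ', '.join(returns)\n",
    "    return query\n",
    "\n",
    "\n",
    "# Rebuild the query printed by the /mappings/gene-to-protein request above\n",
    "rebuilt_query = build_cypher(\n",
    "    '(gene:Gene)-[gp:GENE_TO_PROTEIN]-(protein:Protein)',\n",
    "    [\"gene.name IN ['TP53']\"],\n",
    "    ['gene {.ensembl_id, .name}', 'protein {.uniprot_id}'],\n",
    ")\n",
    "rebuilt_query"
   ]
  },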
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 5.2 Writing custom Cypher queries\n",
    "\n",
    "# Define the target subgraph\n",
    "cypher_query = 'MATCH (trait1:Gwas)-[corr:BN_GEN_COR]-(trait2:Gwas)'\n",
    "\n",
    "# Add conditions to the query\n",
    "cypher_query += ' WHERE trait1.trait = \"Body mass index (BMI)\"'\n",
    "cypher_query += ' AND corr.rg > 0.9'\n",
    "\n",
    "# Add which data we want returned\n",
    "cypher_query += ' RETURN trait1, trait2, corr {.rg, .p}'\n",
    "\n",
    "# Put our query into the correct format for a POST request\n",
    "params = json.dumps({\n",
    "    'query': cypher_query\n",
    "})\n",
    "\n",
    "# Define the target endpoint and send the request\n",
    "endpoint = '/cypher'\n",
    "response_object = requests.post(API_URL + endpoint, data=params)\n",
    "response_object.raise_for_status()\n",
    "\n",
    "# Display the returned data\n",
    "results = response_object.json()['results']\n",
    "results_df = pd.json_normalize(results)\n",
    "\n",
    "results_df.head()"
   ]
  },
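  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The payload for `/cypher` in the cell above is built with `json.dumps` rather than by hand. One practical benefit is that the double quotes inside our query (around `Body mass index (BMI)`) are escaped automatically, so the request body remains valid JSON. The minimal check below, using only the standard library, confirms that such a query survives a round trip through `json.dumps` and `json.loads` unchanged; `example_query` is a shortened illustration, not the full query sent above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "\n",
    "example_query = 'MATCH (trait1:Gwas) WHERE trait1.trait = \"Body mass index (BMI)\" RETURN trait1'\n",
    "\n",
    "# json.dumps escapes the inner double quotes, so the request body stays valid JSON\n",
    "payload = json.dumps({'query': example_query})\n",
    "\n",
    "# Decoding the payload gives back exactly the query we started with\n",
    "round_trip_ok = json.loads(payload)['query'] == example_query\n",
    "round_trip_ok"
   ]
  },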
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In our Cypher query, we matched a subgraph of the database consisting of nodes representing biomedical traits (GWAS), connected by edges representing the genetic correlation between them. The subgraph was then filtered to keep only the node-edge-node triples where the first node had the `.trait` property \"Body mass index (BMI)\" and where the edge between the nodes had an `.rg` (genetic correlation) value greater than 0.9. We then asked Neo4j to return the two trait nodes (including their names), together with the genetic correlation estimate and its p-value, for every triple that passed these conditions. Finally, we converted the returned list of records to a dataframe for ease of viewing.\n",
    "\n",
    "For more detailed information on Cypher queries, please refer to the [official documentation](https://neo4j.com/developer/cypher/)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}