{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Enrichment analysis and visualisations\n",
    "\n",
    "This is loosely adapted from one of the InterMineR vignettes\n",
    "https://bioconductor.org/packages/release/bioc/vignettes/InterMineR/inst/doc/Enrichment_Analysis_and_Visualization.html\n",
    "\n",
    "## Background\n",
    "\n",
    "The InterMine gene set enrichment tool looks for annotations to a set of genes that occur more than would be expected by chance, given a background population of annotations. The hypergeometric distribution is used to calculate a P-value for each annotation and a choice of correction methods for multiple testing (Bonferonni, Holm-Bonferonni and Benjamini Hochberg) are available (Smith et al. 2012; Kalderimis et al. 2014).\n",
    "\n",
    "InterMine provides Gene Ontology enrichment statistics as well as enrichment statistics for other annotation types including protein domains, pathways, human diseases, mammalian phenotypes and publications. The default background population is all genes in the genome with the specified annotation type. However, the background population can be changed by specifying another list. More information can be found in the [online documentation](http://intermine.readthedocs.io/en/latest/embedding/list-widgets/enrichment-widgets/).\n",
    "\n",
    "## Goal for this exercise\n",
    "\n",
    "1. Use InterMine's enrichment widgets on the HumanMine public list [PL_GenomicsEngland_GenePanel%:Genetic_Epilepsy_Syndromes](http://www.humanmine.org/humanmine/bagDetails.do?scope=all&bagName=PL_GenomicsEngland_GenePanel%3AGenetic_Epilepsy_Syndromes) to see if the list is enriched for any GO Terms.\n",
    "2. Visualise the results using the GeneAnswers class. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Setup\n",
    "Let's start by initialising InterMineR and choosing which InterMine we'll be running queries against:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Begin by activating the InterMineR library\n",
    "# Syntax: library(LIBRARY_NAME_HERE)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We want to query human data - so let's look and see what InterMines are available, using the `listMines()` function: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Okay, let's select HumanMine from this list, and store it in a variable called `humanMine`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Syntax: listMines()[\"MINE _NAME_HERE\"] \n",
    "\n",
    "# Print the value to make sure it's what we expect\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And now we need to tell InterMineR to query HumanMine specifically, using the function `initInterMine()`. Let's store this in a variable called `im`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Syntax: initInterMine(mine=MINE_URL_HERE) or, if you're accessing a personal list, you'll also need your API token: \n",
    "#         initInterMine(mine=MINE_URL_HERE, \"YOUR_TOKEN_HERE\")\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Getting started with Enrichment\n",
    "\n",
    "Performing enrichment analysis with InterMineR is preceded by two steps:\n",
    "\n",
    "1. Get the enrichment widget name which indicates the annotation types that you want to investigate for enrichment (e.g. Gene Ontology Terms, protein domains, KEGG and Reactome Pathways, human diseases, mammalian phenotypes and publications).\n",
    "2. Retrieve the list of bioentities of interest (Genes, Proteins, SNPs, etc.) for which the analysis will be performed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# First, we'll retrieve the widgets available from HumanMine and store it in a variable called `humanWidgets`. \n",
    "# This will tell us what types of enrichment are available.\n",
    "# \n",
    "# Syntax: getWidgets(im)\n",
    "\n",
    "\n",
    "# Print out the widgets so we can see what there is\n",
    "\n",
    "humanWidgets\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Take a look through the widgets returned, especially the columns `targets` and `widgetType`. Since we plan to enrich the gene list `PL_GenomicsEngland_GenePanel:Genetic_Epilepsy_Syndromes`, we're only interested in widgets that target `Gene`s. Similarly, we don't need to look at any widgets that aren't `enrichment` type widgets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# We'll put the widget list `humanWidgets` in to a data frame, so it's easy to filter.\n",
    "#\n",
    "# Syntax: as.data.frame(LIST_NAME_HERE)\n",
    "\n",
    "humanWidgetsDataFrame <- #put data frame code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's ask the data frame to give us a subset of the widgets.  \n",
    "We want the following filters: \n",
    "- `widgetType` should be `'enrichment'`\n",
    "- `targets` should be `Gene`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Syntax: subset(YOU_DATA_FRAME, COLUMN1 == 'VALUE1' & COLUMN2 == 'VALUE2' )\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Syntax Note: `=` vs `==`\n",
    "##### Assignment operator: `=` or `<-`\n",
    "In the code example above, we use the double equals sign `==`. This is because a single equals sign is used for *assignment*, e.g. `kittens = 5`, which would assign the value 5 to the variable `kittens`.  \n",
    "\n",
    "In R we also use `<-` for assignment, so `kittens = 5` and `kittens <- 5` do (basically) the same thing.\n",
    "\n",
    "##### Comparison operator: `==`\n",
    "The double equals, however, is used for comparison - when we say `widgetType == 'enrichment'` we're saying that the code should compare all `widgetTypes`, and only give us ones that are equal to `enrichment`. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Perform enrichment analysis\n",
    "\n",
    "Looking at the filtered list of widgets above, it looks like the widget we want to use to check for enriched GO Terms is named `go_enrichment_for_gene`. We'll use the `doEnrichment` method to get the enrichment results from HumanMine. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# We're using the widget `go_enrichment_for_gene` against the list name \n",
    "# `PL_GenomicsEngland_GenePanel:Genetic_Epilepsy_Syndromes`\n",
    "#\n",
    "#Syntax: \n",
    "#\n",
    "# GOEnrichmentResult <- doEnrichment(\n",
    "#   im = YOUR_CHOSEN_INTERMINE_HERE\n",
    "#   list = LIST_NAME_SAVE_ON_YOUR_CHOSEN_INTERMINE,\n",
    "#   widget = \"go_enrichment_for_gene\" #  Or whichever widget you're using\n",
    "#   organism = \"Homo sapiens\" # optional if results from more than one organism are retrieved\n",
    "# )\n",
    "\n",
    "GOEnrichmentResult <- doEnrichment(\n",
    "    # Put enrichment details here\n",
    ")\n",
    "\n",
    "# Print the results \n",
    "GOEnrichmentResult"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Visualise it with Gene Answers \n",
    "\n",
    "    \n",
    "For the visualisation, we'll need both the enrichment results and the list of genes we originally enriched - but the list is still on the server. Let's run a query to get the primaryIdentifiers for the genes in `PL_GenomicsEngland_GenePanel:Genetic_Epilepsy_Syndromes`. \n",
    "\n",
    "Views to **Select**:  \n",
    "- `\"Gene.primaryIdentifier\"`\n",
    "\n",
    "**Constraints** to set: \n",
    "- Path: `\"Gene\"`\n",
    "- Operator: `IN`,\n",
    "- Value: `\"PL_GenomicsEngland_GenePanel:Genetic_Epilepsy_Syndromes\"`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# We'll be using the setQuery method here\n",
    "# \n",
    "# Syntax: \n",
    "# setQuery( \n",
    "#  select = c(\"VIEW.NAME1\", \"VIEW.NAME2\", .... ),\n",
    "#  where = setConstraints(\n",
    "#    paths = c(\"CONSTRAINT.PATH.A\", \"CONSTRAINT.PATH.B\"),\n",
    "#    operators = c(\"CONSTRAINT.OPERATOR.A\", \"CONSTRAINT.OPERATOR.B\"),\n",
    "#    values = list(\"CONSTRAINT.VALUE.A\",\"CONSTRAINT.VALUE.B\")\n",
    "#  )\n",
    "#)\n",
    "queryEpilepsy <- setQuery(\n",
    "    # fill out your view and constraints here\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Okay, looking good! Now that we've set that query up, we need to actually run it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Run the query using the syntax `runQuery(im, QUERY_HERE)` \n",
    "queryEpilepsyResults <- # Add run query details here\n",
    "\n",
    "# print the results:\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Format data for GeneAnswers\n",
    "\n",
    "We'll be using [GeneAnswers](https://bioconductor.org/packages/release/bioc/html/GeneAnswers.html) to generate some visualisations of the data we've just fetched."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# load GeneAnswers package. Remember the syntax is `library(LIBRARYNAME)`\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We'll need to convert our InterMine data into a shape that GeneAnswers can read. Luckily, there's a nice method for that in InterMineR: `convertToGeneAnswers`. We'll need to pass it the following arguments:\n",
    "- `GOEnrichmentResult` - the enriched GO term results.\n",
    "- `queryEpilepsyResults` as the original list of IDs we enriched\n",
    "- `\"Gene.primaryIdentifier\"` as the type of IDs we're providing (as opposed to symbol or secondaryidentifiers)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# convert InterMineR Gene Ontology Enrichment analysis results to GeneAnswers object\n",
    "# Syntax: \n",
    "# convertToGeneAnswers(\n",
    "#  enrichmentResult = AN_INTERMINE_ENRICHMENT_RESULT,\n",
    "#  geneInput = data.frame(GeneID = ORIGINAL_LIST_OF_IDS_WE_ENRICHED_ON, \n",
    "#                         stringsAsFactors = FALSE),\n",
    "#  geneInputType = \"TYPE_OF_IDS_PROVIDED\" (e.g. \"Gene.primaryIdentifier\" or \"Gene.symbol\"),\n",
    "#  categoryType = \"GO\" # could also be null for a user-provided annotation list\n",
    "#)\n",
    "\n",
    "queryEpilepsyForGeneAnswers <- convertToGeneAnswers(\n",
    "    # Put the arguments here\n",
    "    )\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can use the `summary(geneAnswersObjectHere)` function to see\n",
    "interesting info about our list + enrichment. Try it now: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Run the summary on the Gene Answers object we made in the previous code cell. \n",
    "# The variable name we used was `queryEpilepsyForGeneAnswers` \n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### It's time for some graphs\n",
    "\n",
    "For more info on how the graphs are constructed and what arguments you can pass them, visit the [GeneAnswers Bioconductor page](https://bioconductor.org/packages/release/bioc/html/GeneAnswers.html) - the vignettes and especially the manual have lots of helpful information.\n",
    "\n",
    "We'll start with a concept-gene network that shows which genes are associated with our given GO terms: \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "geneAnswersConceptNet(queryEpilepsyForGeneAnswers, # This is the Gene Answers object we created\n",
    "                      colorValueColumn=NULL,       # Default colours\n",
    "                      centroidSize='correctedPvalue', # the column used to size graph nodes\n",
    "                      output='fixed',              # don't spawn a new pop-up window \n",
    "                      geneSymbol = TRUE)           # Show gene symbols as labels"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### GO structure network\n",
    "\n",
    "We can plot the GO terms that our list is enriched for to show their relationships to one another:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "geneAnswersConceptRelation(queryEpilepsyForGeneAnswers, # This is the Gene Answers object we created\n",
    "                           directed=TRUE,               # show arrows to indicate the direction of relationship\n",
    "                           output=\"fixed\",              # don't spawn a new pop-up window \n",
    "                           netMode='connection') "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Gene Interaction Network\n",
    "\n",
    "In this example, GeneAnswers uses the annotation package ‘org.Hs.eg.db’, which was used previously in convertToGeneAnswers function, to retrieve gene interactions for the specified genes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "buildNet(getGeneInput(queryEpilepsyForGeneAnswers)[,1], # queryEpilepsyForGeneAnswers\n",
    "         idType='GeneInteraction', \n",
    "         layers=3,\n",
    "         output=\"fixed\",            # don't spawn a new pop-up window \n",
    "         filterGraphIDs=getGeneInput(queryEpilepsyForGeneAnswers)[,1],\n",
    "         filterLayer=2,             # How many layers deep to look for interactors. Try changing it to 3! \n",
    "         netMode='connection')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Gene Interaction Network for **PL_GenomicsEngland_GenePanel:Genetic_Epilepsy_Syndromes**.   \n",
    "\n",
    "The large black dots with yellow frame stand for the six given genes. They also connect to other genes by dark-blue-purple edges. Small black dots represent the other genes from getGeneInput. Small white dots are genes that are not in the genes from getGeneInput, but interact with these genes. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Genes vs GO Term heat map\n",
    "\n",
    "Let's plot the genes which were enriched for GO terms in a heat map..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "geneAnswersHeatmap(queryEpilepsyForGeneAnswers, \n",
    "                   catTerm=TRUE, \n",
    "                   geneSymbol=TRUE,\n",
    "                   cex.axis = 0.75)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "R",
   "language": "R",
   "name": "ir"
  },
  "language_info": {
   "codemirror_mode": "r",
   "file_extension": ".r",
   "mimetype": "text/x-r-source",
   "name": "R",
   "pygments_lexer": "r",
   "version": "3.6.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}