{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### **Title**: Sankey Element\n",
    "\n",
    "**Dependencies** Bokeh\n",
    "\n",
    "**Backends** [Bokeh](./Sankey.ipynb), [Matplotlib](../matplotlib/Sankey.ipynb)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import holoviews as hv\n",
    "from holoviews import dim, opts\n",
    "\n",
    "hv.extension('bokeh')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "``Sankey`` elements represent flows and their quantities in proportion to one another. The data of a Sankey element defines a directed, acyclic graph, making it a specialized subclass of the ``Graph`` element. The width of the lines in a ``Sankey`` diagram represent the magnitudes of each edge. Both the edges and nodes can be defined through any valid tabular format including pandas dataframes, dictionaries/tuples of columns, NumPy arrays and lists of tuples. \n",
    "\n",
    "The easiest way to define a Sankey element is to define a list of edges and their associated quantities:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sankey = hv.Sankey([\n",
    "    ['A', 'X', 5],\n",
    "    ['A', 'Y', 7],\n",
    "    ['A', 'Z', 6],\n",
    "    ['B', 'X', 2],\n",
    "    ['B', 'Y', 9],\n",
    "    ['B', 'Z', 4]]\n",
    ")\n",
    "sankey.opts(width=600, height=400)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Above the node labels are generated automatically from the supplied edges, however, frequently the edges are expressed as integer node indexes and labels are provided separately. We can explicitly define the set of nodes as a Dataset of indexes and labels as key and value dimensions respectively. We can also use the ``edge_color`` style option to define a style mapping to a dimension and adjust the ``label_position`` from ``\"right\"`` to ``\"left\"``.\n",
    "\n",
    "Here we will plot a simple dataset of the career paths of UK PhD students source as described in a [2010 Royal Society policy report](https://royalsociety.org/uploadedFiles/Royal_Society_Content/policy/publications/2010/4294970126.pdf) entitled “The Scientific Century: securing our future prosperity”. We define the nodes enumerated by their integer index and the percentages flowing between each career stage. Finally we define a Dimension with units for the values and color by the target node which we label \"To\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "nodes = [\"PhD\", \"Career Outside Science\",  \"Early Career Researcher\", \"Research Staff\",\n",
    "         \"Permanent Research Staff\",  \"Professor\",  \"Non-Academic Research\"]\n",
    "nodes = hv.Dataset(enumerate(nodes), 'index', 'label')\n",
    "edges = [\n",
    "    (0, 1, 53), (0, 2, 47), (2, 6, 17), (2, 3, 30), (3, 1, 22.5), (3, 4, 3.5), (3, 6, 4.), (4, 5, 0.45)\n",
    "]\n",
    "\n",
    "value_dim = hv.Dimension('Percentage', unit='%')\n",
    "careers = hv.Sankey((edges, nodes), ['From', 'To'], vdims=value_dim)\n",
    "\n",
    "careers.opts(\n",
    "    opts.Sankey(labels='label', label_position='right', width=900, height=300, cmap='Set1',\n",
    "                edge_color=dim('To').str(), node_color=dim('index').str()))"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}