{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " ![FREYA Logo](https://github.com/datacite/pidgraph-notebooks-python/blob/master/images/freya_200x121.png?raw=true) | [FREYA](https://www.project-freya.eu/en) WP2 [User Story 2](https://github.com/datacite/freya/issues/63) | As a software author, I want to be able to see the citations of my software aggregated across all versions, so that I see a complete picture of reuse.\n",
    " :------------- | :------------- | :-------------\n",
    "\n",
    "Software development process involves versioned releases. Consequently, different software versions may be used for scientific discovery and thus referenced in publications. In order to quantify impact of a software, its author must be able to capture the reuse of the software across all its versions.<p />\n",
    "This notebook uses the [DataCite GraphQL API](https://api.datacite.org/graphql) to retrieve metadata about software titled: [Calculation Package: Inverting topography for landscape evolution model process representation](https://doi.org/10.5281/zenodo.2799488), including all its versions, so that its overall reuse can be quantified.\n",
    "\n",
    "**Goal**: By the end of this notebook, for a given software you should be able to display:\n",
    "- Counts of <ins>citations, views and downloads</ins> metrics, aggregated across all versions of the software\n",
    "- An interactive stacked bar plot showing how the metric counts of each version contribute to the corresponding aggregated metric counts, e.g.<br> <img src=\"example_plot.png\" width=\"258\" height=\"215\" />"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Install libraries and prepare GraphQL client"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%capture\n",
    "# Install required Python packages\n",
    "!pip install gql requests numpy plotly"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Prepare the GraphQL client\n",
    "import requests\n",
    "from IPython.display import display, Markdown\n",
    "from gql import gql, Client\n",
    "from gql.transport.requests import RequestsHTTPTransport\n",
    "\n",
    "_transport = RequestsHTTPTransport(\n",
    "    url='https://api.datacite.org/graphql',\n",
    "    use_json=True,\n",
    ")\n",
    "\n",
    "client = Client(\n",
    "    transport=_transport,\n",
    "    fetch_schema_from_transport=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define and run GraphQL query\n",
    "Define the GraphQL query to retrieve metadata for the software titled: [Calculation Package: Inverting topography for landscape evolution model process representation](https://doi.org/10.5281/zenodo.2799488) using its DOI."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Generate the GraphQL query to retrieve the required software's metadata\n",
    "query_params = {\n",
    "    \"softwareId\" : \"https://doi.org/10.5281/zenodo.2799488\"\n",
    "}\n",
    "\n",
    "query = gql(\"\"\"query getSoftware($softwareId: ID!)\n",
    "{\n",
    "  software(id: $softwareId) {\n",
    "    id\n",
    "    titles {\n",
    "      title\n",
    "    }\n",
    "    publicationYear\n",
    "    citations {\n",
    "      nodes {\n",
    "        id\n",
    "        titles {\n",
    "          title\n",
    "        }        \n",
    "      }\n",
    "    }\n",
    "    version\n",
    "    versionCount\n",
    "    versionOfCount\n",
    "    citationCount\n",
    "    downloadCount\n",
    "    viewCount\n",
    "    versions {\n",
    "      nodes {\n",
    "        id\n",
    "        version\n",
    "        publicationYear\n",
    "        titles {\n",
    "          title\n",
    "        }\n",
    "        citations {\n",
    "          nodes {\n",
    "            id\n",
    "            titles {\n",
    "              title\n",
    "            }\n",
    "          }\n",
    "        }\n",
    "        version\n",
    "        versionCount\n",
    "        versionOfCount\n",
    "        citationCount\n",
    "        downloadCount\n",
    "        viewCount\n",
    "      }\n",
    "    }\n",
    "  }\n",
    "}\n",
    "\"\"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Run the above query via the GraphQL client"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "data = client.execute(query, variable_values=json.dumps(query_params))"
   ]
  },
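  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before aggregating, it can help to confirm that the response lists as many versions as the API reports. The cell below is a minimal sketch, assuming the query above succeeded and `data` holds its result. The helper name `summarise_versions` is hypothetical and not part of the DataCite API. If the two numbers differ, the response may not include every version, and the aggregated counts would then be incomplete."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: compare the number of versions the API reports with the number returned.\n",
    "# `summarise_versions` is a hypothetical helper, for illustration only.\n",
    "def summarise_versions(software):\n",
    "    nodes = software['versions']['nodes']\n",
    "    return {\n",
    "        'reportedVersionCount': software['versionCount'],\n",
    "        'returnedVersions': len(nodes),\n",
    "    }\n",
    "\n",
    "summarise_versions(data['software'])"
   ]
  },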
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Display total software metrics\n",
    "Display total number of <ins>citations, views and downloads</ins> across all versions of software: [Calculation Package: Inverting topography for landscape evolution model process representation](https://doi.org/10.5281/zenodo.2799488)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "Aggregated metric counts for software: [Calculation Package: Inverting topography for landscape evolution model process representation](https://doi.org/10.5281/zenodo.2799488) across all its versions:"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "|Metric | Aggregated Count|\n",
       "|---|---|\n",
       "citationCount | **3**\n",
       "viewCount | **0**\n",
       "downloadCount | **0**\n"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Get the total count per metric, aggregated for across all versions of the software\n",
    "software = data['software']\n",
    "# Initialise metric counts\n",
    "metricCounts = {}\n",
    "for metric in ['citationCount', 'viewCount', 'downloadCount']:\n",
    "    metricCounts[metric] = 0\n",
    "    \n",
    "# Aggregate metric counts across all the version\n",
    "for node in software['versions']['nodes']:\n",
    "    for metric in metricCounts:\n",
    "         metricCounts[metric] += node[metric]\n",
    "            \n",
    "# Display the aggregated metric counts\n",
    "tableBody=\"\"\n",
    "for metric in metricCounts:        \n",
    "    tableBody += \"%s | **%s**\\n\" % (metric, str(metricCounts[metric]))\n",
    "if tableBody:\n",
    "   display(Markdown(\"Aggregated metric counts for software: [Calculation Package: Inverting topography for landscape evolution model process representation](https://doi.org/10.5281/zenodo.2799488) across all its versions:\"))\n",
    "   display(Markdown(\"|Metric | Aggregated Count|\\n|---|---|\\n%s\" % tableBody))"
   ]
  },
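  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The aggregation above can also be written as a small reusable helper. The cell below is a minimal sketch and not part of the original analysis: the function name `aggregate_metrics` and the sample values are hypothetical, and treating a missing or `None` count as zero is an assumption about how incomplete metrics should be handled."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: sum each metric over all version nodes of a software record.\n",
    "# `aggregate_metrics` and `sample_software` are hypothetical, for illustration only.\n",
    "METRICS = ['citationCount', 'viewCount', 'downloadCount']\n",
    "\n",
    "def aggregate_metrics(software, metrics=METRICS):\n",
    "    totals = {metric: 0 for metric in metrics}\n",
    "    for node in software['versions']['nodes']:\n",
    "        for metric in metrics:\n",
    "            # Assumption: a missing or None count contributes nothing\n",
    "            totals[metric] += node.get(metric) or 0\n",
    "    return totals\n",
    "\n",
    "# Made-up data with the same shape as data['software']\n",
    "sample_software = {'versions': {'nodes': [\n",
    "    {'citationCount': 2, 'viewCount': 10, 'downloadCount': 4},\n",
    "    {'citationCount': 1, 'viewCount': None, 'downloadCount': 6},\n",
    "]}}\n",
    "aggregate_metrics(sample_software)"
   ]
  },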
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Plot metric counts per software version\n",
    "Plot stacked bar plot showing how the individual versions of software: [Calculation Package: Inverting topography for landscape evolution model process representation](https://doi.org/10.5281/zenodo.2799488) contribute their metric counts to the corresponding aggregated total."
   ]
  },
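  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The plotting helper in the plotting cell relies on `pandas.melt` to turn one column per metric (wide format) into one row per label and metric (long format), which is the layout `px.bar` uses to stack bars by colour. The cell below is a minimal sketch of that reshaping step alone, with made-up counts and hypothetical version labels."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Made-up counts; the version labels are hypothetical\n",
    "wide_df = pd.DataFrame({\n",
    "    'Software/Versions': ['All Software Versions', 'v1.0 (2019)', 'v1.1 (2020)'],\n",
    "    'Citations': [3, 2, 1],\n",
    "    'Views': [10, 4, 6],\n",
    "}).set_index('Software/Versions')\n",
    "\n",
    "# Wide to long: one row per (label, metric) pair\n",
    "long_df = pd.melt(wide_df.reset_index(), id_vars='Software/Versions',\n",
    "                  var_name='Metric', value_name='Counts')\n",
    "long_df"
   ]
  },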
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "Citations, views and downloads counts for software: [Calculation Package: Inverting topography for landscape evolution model process representation](https://doi.org/10.5281/zenodo.2799488) across all its versions, shown as stacked bar plot:"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "\n",
       "        <iframe\n",
       "            width=\"500\"\n",
       "            height=\"500\"\n",
       "            src=\"./out.html\"\n",
       "            frameborder=\"0\"\n",
       "            allowfullscreen\n",
       "        ></iframe>\n",
       "        "
      ],
      "text/plain": [
       "<IPython.lib.display.IFrame at 0x107a9add8>"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import plotly.io as pio\n",
    "import plotly.express as px\n",
    "from IPython.display import IFrame\n",
    "import pandas as pd\n",
    "\n",
    "# Adapted from: https://stackoverflow.com/questions/58766305/is-there-any-way-to-implement-stacked-or-grouped-bar-charts-in-plotly-express\n",
    "def px_stacked_bar(df, color_name='Metric', y_name='Metrics', **pxargs):\n",
    "    idx_col = df.index.name\n",
    "    m = pd.melt(df.reset_index(), id_vars=idx_col, var_name=color_name, value_name=y_name)\n",
    "    # For Plotly colour sequences see: https://plotly.com/python/discrete-color/     \n",
    "    return px.bar(m, x=idx_col, y=y_name, color=color_name, **pxargs, \n",
    "                  color_discrete_sequence=px.colors.qualitative.Pastel1)\n",
    "\n",
    "# Collect metric counts\n",
    "software = data['software']\n",
    "version = software['version']\n",
    "\n",
    "# Initialise dicts for the stacked bar plot\n",
    "labels = {0: 'All Software Versions'}\n",
    "citationCounts = {}\n",
    "viewCounts = {}\n",
    "downloadCounts = {}\n",
    "\n",
    "# Collect software/version labels\n",
    "versionCnt = 1\n",
    "for node in software['versions']['nodes']:\n",
    "    version = software['version']\n",
    "    labels[versionCnt] = '%s (%s)' % (version, node['publicationYear'])\n",
    "    versionCnt += 1\n",
    "    \n",
    "# Initialise aggregated metric counts (key: 0)\n",
    "citationCounts[0] = 0\n",
    "viewCounts[0] = 0\n",
    "downloadCounts[0] = 0\n",
    "    \n",
    "# Populate metric counts for individual versions (key: versionCnt) and add them to the aggregated counts (key: 0)\n",
    "versionCnt = 1\n",
    "for node in software['versions']['nodes']:\n",
    "    citationCounts[0] += node['citationCount']\n",
    "    viewCounts[0] += node['viewCount']\n",
    "    downloadCounts[0] += node['downloadCount']\n",
    "    citationCounts[versionCnt] = node['citationCount']\n",
    "    viewCounts[versionCnt] = node['viewCount']\n",
    "    downloadCounts[versionCnt] = node['downloadCount']\n",
    "    versionCnt += 1\n",
    "\n",
    "# Create stacked bar plot\n",
    "df = pd.DataFrame({'Software/Versions': labels,\n",
    "                   'Citations': citationCounts,\n",
    "                   'Views': viewCounts,\n",
    "                   'Downloads': downloadCounts})\n",
    "fig = px_stacked_bar(df.set_index('Software/Versions'), y_name = \"Counts\")\n",
    "\n",
    "# Set plot background to transparent\n",
    "fig.update_layout({\n",
    "'plot_bgcolor': 'rgba(0, 0, 0, 0)',\n",
    "'paper_bgcolor': 'rgba(0, 0, 0, 0)'\n",
    "})\n",
    "\n",
    "# Write interactive plot out to html file\n",
    "pio.write_html(fig, file='out.html')\n",
    "\n",
    "# Display plot from the saved html file\n",
    "display(Markdown(\"Citations, views and downloads counts for software: [Calculation Package: Inverting topography for landscape evolution model process representation](https://doi.org/10.5281/zenodo.2799488) across all its versions, shown as stacked bar plot:\"))\n",
    "IFrame(src=\"./out.html\", width=500, height=500)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}