{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " ![FREYA Logo](https://github.com/datacite/pidgraph-notebooks-python/blob/master/images/freya_200x121.png?raw=true) | [FREYA](https://www.project-freya.eu/en) WP2 [User Story3](https://www.pidforum.org/t/pid-graph-graphql-example-research-organization/929) | As an administrator for the University of Oxford I am interested in the reuse of research outputs from our university, so that I can help identify the most interesting research outputs.\n",
    ":------------- | :------------- | :-------------\n",
    "\n",
    "It is important for research organisations to measure quality and quantity of their outputs as well as their relevance to latest global research trends and to their own strategic science direction.<p />\n",
    "This notebook uses the [DataCite GraphQL API](https://api.datacite.org/graphql) to retrieve up to 100 outputs (e.g. publications or datasets) from [University of Oxford](https://ror.org/052gg0110) in order to quantify and visualise their reuse.\n",
    "\n",
    "**Goal**: By the end of this notebook, for a given organization, you should be able to display:\n",
    "- Counts of citations, views and downloads metrics, aggregated across all of the organization's outputs;\n",
    "- An interactive stacked bar plot showing how the metric counts of each of the following <ins>output characteristics</ins> contribute to the corresponding aggregated metric counts:<br><ins>Type</ins>, <ins>Publication Year</ins>, <ins>Author Affiliation</ins> and <ins>DOI</ins>, e.g. <br><br><img src=\"example_plot.png\" width=\"353\" height=\"206\" />\n",
    "- A word cloud of words from output titles in which word size is determined by the <ins>aggregated citations, views and downloads count</ins> corresponding to all output titles in which it appears\n",
    "- An interactive matrix diagram of the affiliations of the authors of the organization's outputs, in which:\n",
    " - Affiliations are values on X and Y axes, and \n",
    " - The rectangular cells in the matrix indicate that authors from the two respective affiliations shared at least one publication together.\n",
    " - Matrix cells are coloured differently depending on the geographic regions of the corresponding affiliations, e.g. <br><br><img src=\"vega_by_index.svg\" width=\"370\" height=\"320\" />\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Install libraries and prepare GraphQL client"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%capture\n",
    "# Install required Python packages\n",
    "!pip install gql requests numpy pandas plotly pyvis wordcloud matplotlib"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Prepare the GraphQL client\n",
    "import requests\n",
    "from IPython.display import display, Markdown\n",
    "from gql import gql, Client\n",
    "from gql.transport.requests import RequestsHTTPTransport\n",
    "\n",
    "_transport = RequestsHTTPTransport(\n",
    "    url='https://api.datacite.org/graphql',\n",
    "    use_json=True,\n",
    ")\n",
    "\n",
    "client = Client(\n",
    "    transport=_transport,\n",
    "    fetch_schema_from_transport=True,\n",
    ")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define and run GraphQL query\n",
    "Define the GraphQL query to retrieve up to 100 outputs (e.g. publications or datasets) from [University of Oxford](https://ror.org/052gg0110), using its **Research Organization Registry (ROR)** identifier."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Generate the GraphQL query to retrieve up to 100 outputs of University of Oxford, with at least 100 views each.\n",
    "query_params = {\n",
    "    \"rorId\" : \"https://ror.org/052gg0110\",\n",
    "    \"maxOutputs\": 100,\n",
    "    \"minViews\" : 100\n",
    "}\n",
    "\n",
    "query = gql(\"\"\"query getOutputs($rorId: ID!, $maxOutputs: Int!, $minViews: Int!)\n",
    "{\n",
    " organization(id: $rorId) {\n",
    "    id\n",
    "    name\n",
    "    alternateName\n",
    "    citationCount\n",
    "    viewCount\n",
    "    downloadCount\n",
    "    works(hasViews: $minViews, first: $maxOutputs) {\n",
    "      totalCount\n",
    "      published {\n",
    "        title\n",
    "        count\n",
    "      }\n",
    "      resourceTypes {\n",
    "        title\n",
    "        count\n",
    "      }\n",
    "      nodes {\n",
    "        id\n",
    "        type\n",
    "        publisher\n",
    "        publicationYear\n",
    "        titles {\n",
    "          title\n",
    "        }\n",
    "        citations {\n",
    "           nodes {\n",
    "             id\n",
    "             titles {\n",
    "                title\n",
    "             }\n",
    "           }\n",
    "        }\n",
    "        creators {\n",
    "          id\n",
    "          name\n",
    "          affiliation {\n",
    "            id\n",
    "            name\n",
    "          }\n",
    "        }\n",
    "        citationCount\n",
    "        viewCount\n",
    "        downloadCount\n",
    "      }\n",
    "    }\n",
    "  }\n",
    "}\n",
    "\"\"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Run the above query via the GraphQL client"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "# Pass the query variables as a dict; the client serialises them itself\n",
    "data = client.execute(query, variable_values=query_params)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Display total metric counts \n",
    "Display total number of <ins>citations, views and downloads</ins> of [University of Oxford](https://ror.org/052gg0110)'s outputs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get the total count per metric, aggregated across all of the organization's outputs\n",
    "organization = data['organization']\n",
    "organizationName = organization['name']\n",
    "# Initialise metric counts across all outputs of the organization\n",
    "metricCounts = {}\n",
    "for metric in ['citationCount', 'viewCount', 'downloadCount']:\n",
    "    metricCounts[metric] = 0\n",
    "    \n",
    "# Aggregate metric counts across all the retrieved outputs\n",
    "for node in organization['works']['nodes']:\n",
    "    for metric in metricCounts:\n",
    "        metricCounts[metric] += node[metric]\n",
    "\n",
    "# Display the aggregated metric counts\n",
    "tableBody = \"\"\n",
    "for metric in metricCounts:\n",
    "    tableBody += \"%s | **%s**\\n\" % (metric, str(metricCounts[metric]))\n",
    "if tableBody:\n",
    "    # Report the number of outputs actually retrieved, since only those are aggregated above\n",
    "    display(Markdown(\"Aggregated metric counts across the %d retrieved outputs of [University of Oxford](https://ror.org/052gg0110):\" % len(organization['works']['nodes'])))\n",
    "    display(Markdown(\"|Metric | Aggregated Count|\\n|---|---|\\n%s\" % tableBody))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Plot metric counts per output type\n",
    "Plot a stacked bar plot showing how <ins>each type</ins> of [University of Oxford](https://ror.org/052gg0110)'s outputs contributes its metric counts to the corresponding aggregated total."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import plotly.io as pio\n",
    "import plotly.express as px\n",
    "from IPython.display import IFrame\n",
    "import pandas as pd\n",
    "\n",
    "# Adapted from: https://stackoverflow.com/questions/58766305/is-there-any-way-to-implement-stacked-or-grouped-bar-charts-in-plotly-express\n",
    "def px_stacked_bar(df, color_name='Metric', y_name='Metrics', **pxargs):\n",
    "    idx_col = df.index.name\n",
    "    m = pd.melt(df.reset_index(), id_vars=idx_col, var_name=color_name, value_name=y_name)\n",
    "    # For Plotly colour sequences see: https://plotly.com/python/discrete-color/     \n",
    "    return px.bar(m, x=idx_col, y=y_name, color=color_name, **pxargs, \n",
    "                  color_discrete_sequence=px.colors.qualitative.Pastel1)\n",
    "\n",
    "# Collect metric counts\n",
    "organization = data['organization']\n",
    "\n",
    "# Initialise dicts for the stacked bar plot\n",
    "labels = {0: 'All Output Types'}\n",
    "citationCounts = {}\n",
    "viewCounts = {}\n",
    "downloadCounts = {}\n",
    "\n",
    "# Collect output type labels\n",
    "outputTypesSet = set([])\n",
    "outputType2Pos = {}\n",
    "for node in organization['works']['nodes']:\n",
    "    outputTypesSet.add(node['type'])\n",
    "outputTypes = list(outputTypesSet);    \n",
    "for pos, outputType in enumerate(outputTypes):\n",
    "    labels[pos + 1] = outputType\n",
    "    outputType2Pos[outputType] = pos + 1\n",
    "    \n",
    "# Initialise metric counts\n",
    "for pos, _ in enumerate(labels):\n",
    "    citationCounts[pos] = 0\n",
    "    viewCounts[pos] = 0\n",
    "    downloadCounts[pos] = 0\n",
    "# Populate metric counts per output type (key = pos) and add them to the aggregated counts (key = 0)\n",
    "for node in organization['works']['nodes']:\n",
    "    pos = outputType2Pos[node['type']]\n",
    "    citationCounts[0] += node['citationCount']\n",
    "    viewCounts[0] += node['viewCount']\n",
    "    downloadCounts[0] += node['downloadCount']\n",
    "    citationCounts[pos] += node['citationCount']\n",
    "    viewCounts[pos] += node['viewCount']\n",
    "    downloadCounts[pos] += node['downloadCount']\n",
    "\n",
    "# Create stacked bar plot\n",
    "x_name = \"%s's Output Types\" % organizationName\n",
    "df = pd.DataFrame({x_name: labels,\n",
    "                   'Citations': citationCounts,\n",
    "                   'Views': viewCounts,\n",
    "                   'Downloads': downloadCounts})\n",
    "fig = px_stacked_bar(df.set_index(x_name), y_name = \"Counts\")\n",
    "\n",
    "# Set plot background to transparent\n",
    "fig.update_layout({\n",
    "'plot_bgcolor': 'rgba(0, 0, 0, 0)',\n",
    "'paper_bgcolor': 'rgba(0, 0, 0, 0)'\n",
    "})\n",
    "\n",
    "# Write interactive plot out to html file\n",
    "pio.write_html(fig, file='ot_out.html')\n",
    "\n",
    "# Display plot from the saved html file\n",
    "display(Markdown(\"Citations, views and downloads for [University of Oxford](https://ror.org/052gg0110)'s outputs, shown per output type as stacked bar plot:\"))\n",
    "IFrame(src=\"./ot_out.html\", width=500, height=500)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Plot metric counts per year\n",
    "Plot a stacked bar plot showing how the outputs of [University of Oxford](https://ror.org/052gg0110) in each year contribute their metric counts to the corresponding aggregated total."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import plotly.io as pio\n",
    "import plotly.express as px\n",
    "from IPython.display import IFrame\n",
    "import pandas as pd\n",
    "\n",
    "# Adapted from: https://stackoverflow.com/questions/58766305/is-there-any-way-to-implement-stacked-or-grouped-bar-charts-in-plotly-express\n",
    "def px_stacked_bar(df, color_name='Metric', y_name='Metrics', **pxargs):\n",
    "    idx_col = df.index.name\n",
    "    m = pd.melt(df.reset_index(), id_vars=idx_col, var_name=color_name, value_name=y_name)\n",
    "    # For Plotly colour sequences see: https://plotly.com/python/discrete-color/     \n",
    "    return px.bar(m, x=idx_col, y=y_name, color=color_name, **pxargs, \n",
    "                  color_discrete_sequence=px.colors.qualitative.Pastel1)\n",
    "\n",
    "# Collect metric counts\n",
    "organization = data['organization']\n",
    "\n",
    "# Initialise dicts for the stacked bar plot\n",
    "labels = {}\n",
    "citationCounts = {}\n",
    "viewCounts = {}\n",
    "downloadCounts = {}\n",
    "\n",
    "# Collect publication year labels, skipping outputs without a publication year\n",
    "outputPubYearsSet = set([])\n",
    "outputPubYear2Pos = {}\n",
    "for node in organization['works']['nodes']:\n",
    "    if node['publicationYear']:\n",
    "        outputPubYearsSet.add(node['publicationYear'])\n",
    "# Sort the years so that the labels are in chronological order\n",
    "outputPubYears = sorted(outputPubYearsSet)\n",
    "for pos, outputPubYear in enumerate(outputPubYears):\n",
    "    labels[pos] = outputPubYear\n",
    "    outputPubYear2Pos[outputPubYear] = pos\n",
    "\n",
    "# Initialise metric counts\n",
    "for pos, _ in enumerate(labels):\n",
    "    citationCounts[pos] = 0\n",
    "    viewCounts[pos] = 0\n",
    "    downloadCounts[pos] = 0\n",
    "# Populate metric counts per publication year (key = pos)\n",
    "for node in organization['works']['nodes']:\n",
    "    # Outputs without a publication year have no bar to contribute to\n",
    "    if node['publicationYear'] not in outputPubYear2Pos:\n",
    "        continue\n",
    "    pos = outputPubYear2Pos[node['publicationYear']]\n",
    "    citationCounts[pos] += node['citationCount']\n",
    "    viewCounts[pos] += node['viewCount']\n",
    "    downloadCounts[pos] += node['downloadCount']\n",
    "\n",
    "# Create stacked bar plot\n",
    "x_name = \"Publication Years of %s's Outputs\" % organizationName\n",
    "df = pd.DataFrame({x_name: labels,\n",
    "                   'Citations': citationCounts,\n",
    "                   'Views': viewCounts,\n",
    "                   'Downloads': downloadCounts})\n",
    "fig = px_stacked_bar(df.set_index(x_name), y_name = \"Counts\")\n",
    "\n",
    "# Set plot background to transparent\n",
    "fig.update_layout({\n",
    "'plot_bgcolor': 'rgba(0, 0, 0, 0)',\n",
    "'paper_bgcolor': 'rgba(0, 0, 0, 0)'\n",
    "})\n",
    "\n",
    "# Write interactive plot out to html file\n",
    "pio.write_html(fig, file='yr_out.html')\n",
    "\n",
    "# Display plot from the saved html file\n",
    "display(Markdown(\"Citations, views and downloads counts of [University of Oxford](https://ror.org/052gg0110)'s outputs, shown per publication year as stacked bar plot:\"))\n",
    "IFrame(src=\"./yr_out.html\", width=1000, height=500)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Plot metric counts per author affiliation\n",
    "Plot a stacked bar plot showing how the individual author affiliations of [University of Oxford](https://ror.org/052gg0110)'s outputs contribute their metric counts to the corresponding aggregated total. The plot shows the <ins>top 30</ins> author affiliations (other than University of Oxford) by the <ins>combined citations, views and downloads count</ins>."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import plotly.io as pio\n",
    "import plotly.express as px\n",
    "from IPython.display import IFrame\n",
    "import pandas as pd\n",
    "from operator import itemgetter\n",
    "\n",
    "# Adapted from: https://stackoverflow.com/questions/58766305/is-there-any-way-to-implement-stacked-or-grouped-bar-charts-in-plotly-express\n",
    "# c.f. https://plotly.com/python-api-reference/generated/plotly.express.bar.html#plotly.express.bar\n",
    "def px_stacked_bar(df, color_name='Metric', y_name='Metrics', **pxargs):\n",
    "    idx_col = df.index.name\n",
    "    m = pd.melt(df.reset_index(), id_vars=idx_col, var_name=color_name, value_name=y_name)\n",
    "    # For Plotly colour sequences see: https://plotly.com/python/discrete-color/     \n",
    "    return px.bar(m, x=idx_col, y=y_name, color=color_name, **pxargs, \n",
    "                  color_discrete_sequence=px.colors.qualitative.Pastel1)\n",
    "\n",
    "MAX_TOP_AFFILIATIONS_BY_ALL_METRIC_COUNT=30\n",
    "\n",
    "# Collect metric counts\n",
    "organization = data['organization']\n",
    "organizationName = organization['name']\n",
    "\n",
    "# Initialise dicts for the stacked bar plot\n",
    "labels = {}\n",
    "citationCounts = {}\n",
    "viewCounts = {}\n",
    "downloadCounts = {}\n",
    "\n",
    "# Collect the names of all author affiliations\n",
    "affiliationsSet = set([])\n",
    "for node in organization['works']['nodes']:\n",
    "    for creator in node['creators']:\n",
    "        for affiliation in creator['affiliation']:\n",
    "            affiliationsSet.add(affiliation['name'])\n",
    "affiliations = list(affiliationsSet);    \n",
    "    \n",
    "# Retrieve metric counts by affiliation\n",
    "affiliation2CitationCount = {}\n",
    "affiliation2ViewCount = {}\n",
    "affiliation2DownloadCount = {}\n",
    "affiliation2AllMetricCount = {}\n",
    "# Initialise metric counts\n",
    "for an in affiliations:\n",
    "    affiliation2CitationCount[an] = 0\n",
    "    affiliation2ViewCount[an] = 0\n",
    "    affiliation2DownloadCount[an] = 0\n",
    "    affiliation2AllMetricCount[an] = 0\n",
    "    \n",
    "for node in organization['works']['nodes']:\n",
    "    seenAffiliationInNode = set([])\n",
    "    for creator in node['creators']:  \n",
    "        for affiliation in creator['affiliation']:\n",
    "            an = affiliation['name']\n",
    "            if an not in seenAffiliationInNode: \n",
    "                affiliation2CitationCount[an] += node['citationCount']\n",
    "                affiliation2ViewCount[an] += node['viewCount']\n",
    "                affiliation2DownloadCount[an] += node['downloadCount']\n",
    "                affiliation2AllMetricCount[an] += node['citationCount'] + node['viewCount'] + node['downloadCount']\n",
    "                seenAffiliationInNode.add(an)         \n",
    "# Populate metric counts per affiliation (key = pos), highest combined count first.\n",
    "# Only the positions that are actually plotted get an entry, so every row has a label.\n",
    "pos = 0\n",
    "for an, _ in sorted(affiliation2AllMetricCount.items(), key = itemgetter(1), reverse = True):\n",
    "    if pos >= MAX_TOP_AFFILIATIONS_BY_ALL_METRIC_COUNT:\n",
    "        break\n",
    "    if an != organizationName:\n",
    "        labels[pos] = an\n",
    "        citationCounts[pos] = affiliation2CitationCount[an]\n",
    "        viewCounts[pos] = affiliation2ViewCount[an]\n",
    "        downloadCounts[pos] = affiliation2DownloadCount[an]\n",
    "        pos += 1\n",
    "\n",
    "# Create stacked bar plot\n",
    "x_name = \"Affiliations of %s's Output Authors\" % organizationName\n",
    "df = pd.DataFrame({x_name: labels,\n",
    "                   'Citations': citationCounts,\n",
    "                   'Views': viewCounts,\n",
    "                   'Downloads': downloadCounts})\n",
    "fig = px_stacked_bar(df.set_index(x_name), y_name = \"Counts\")\n",
    "\n",
    "# Set plot background to transparent\n",
    "fig.update_layout({\n",
    "'plot_bgcolor': 'rgba(0, 0, 0, 0)',\n",
    "'paper_bgcolor': 'rgba(0, 0, 0, 0)'\n",
    "})\n",
    "\n",
    "# Write interactive plot out to html file\n",
    "pio.write_html(fig, file='af_out.html')\n",
    "\n",
    "# Display plot from the saved html file\n",
    "display(Markdown(\"Citations, views and downloads counts across affiliations of authors of [University of Oxford](https://ror.org/052gg0110)'s outputs, shown as a stacked bar plot.<br>The plot shows the <ins>top 30</ins> author affiliations (other than University of Oxford) by the <ins>combined citations, views and downloads count</ins>.\"))\n",
    "IFrame(src=\"./af_out.html\", width=1000, height=800)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Plot metric counts per individual output\n",
    "Plot a stacked bar plot showing how the individual outputs of [University of Oxford](https://ror.org/052gg0110) contribute their metric counts to the corresponding aggregated total. The plot shows the DOIs of the <ins>top 30</ins> outputs by the <ins>combined citations, views and downloads count</ins>."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import plotly.io as pio\n",
    "import plotly.express as px\n",
    "from IPython.display import IFrame\n",
    "import pandas as pd\n",
    "from operator import itemgetter\n",
    "\n",
    "# Adapted from: https://stackoverflow.com/questions/58766305/is-there-any-way-to-implement-stacked-or-grouped-bar-charts-in-plotly-express\n",
    "# c.f. https://plotly.com/python-api-reference/generated/plotly.express.bar.html#plotly.express.bar\n",
    "def px_stacked_bar(df, color_name='Metric', y_name='Metrics', **pxargs):\n",
    "    idx_col = df.index.name\n",
    "    m = pd.melt(df.reset_index(), id_vars=idx_col, var_name=color_name, value_name=y_name)\n",
    "    # For Plotly colour sequences see: https://plotly.com/python/discrete-color/     \n",
    "    return px.bar(m, x=idx_col, y=y_name, color=color_name, **pxargs, \n",
    "                  color_discrete_sequence=px.colors.qualitative.Pastel1)\n",
    "\n",
    "MAX_TOP_DOIS_BY_ALL_METRIC_COUNT=30\n",
    "\n",
    "# Collect metric counts\n",
    "organization = data['organization']\n",
    "organizationName = organization['name']\n",
    "\n",
    "# Initialise dicts for the stacked bar plot\n",
    "labels = {}\n",
    "citationCounts = {}\n",
    "viewCounts = {}\n",
    "downloadCounts = {}\n",
    "    \n",
    "# Retrieve metric counts by doi\n",
    "doi2CitationCount = {}\n",
    "doi2ViewCount = {}\n",
    "doi2DownloadCount = {}\n",
    "doi2AllMetricCount = {}\n",
    "\n",
    "for node in organization['works']['nodes']:\n",
    "    doi = node['id']\n",
    "    doi2CitationCount[doi] = node['citationCount']\n",
    "    doi2ViewCount[doi] = node['viewCount']\n",
    "    doi2DownloadCount[doi] = node['downloadCount']\n",
    "    doi2AllMetricCount[doi] = node['citationCount'] + node['viewCount'] + node['downloadCount']  \n",
    "    \n",
    "# Populate metric counts per DOI (key = pos), highest combined count first.\n",
    "# Only the positions that are actually plotted get an entry, so every row has a label.\n",
    "pos = 0\n",
    "for doi, _ in sorted(doi2AllMetricCount.items(), key = itemgetter(1), reverse = True):\n",
    "    if pos >= MAX_TOP_DOIS_BY_ALL_METRIC_COUNT:\n",
    "        break\n",
    "    labels[pos] = \"<a href=\\\"%s\\\">%s</a>\" % (doi, \"/\".join(doi.split(\"/\")[3:]))\n",
    "    citationCounts[pos] = doi2CitationCount[doi]\n",
    "    viewCounts[pos] = doi2ViewCount[doi]\n",
    "    downloadCounts[pos] = doi2DownloadCount[doi]\n",
    "    pos += 1\n",
    "\n",
    "# Create stacked bar plot\n",
    "x_name = \"%s's Output DOIs\" % organizationName\n",
    "df = pd.DataFrame({x_name: labels,\n",
    "                   'Citations': citationCounts,\n",
    "                   'Views': viewCounts,\n",
    "                   'Downloads': downloadCounts})\n",
    "fig = px_stacked_bar(df.set_index(x_name), y_name = \"Counts\")\n",
    "\n",
    "# Set plot background to transparent\n",
    "fig.update_layout({\n",
    "'plot_bgcolor': 'rgba(0, 0, 0, 0)',\n",
    "'paper_bgcolor': 'rgba(0, 0, 0, 0)'\n",
    "})\n",
    "\n",
    "# Write interactive plot out to html file\n",
    "pio.write_html(fig, file='doi_out.html')\n",
    "\n",
    "# Display plot from the saved html file\n",
    "display(Markdown(\"Citations, views and downloads counts for individual outputs of [University of Oxford](https://ror.org/052gg0110), shown as a stacked bar plot.<br>The plot shows the DOIs of the <ins>top 30</ins> outputs by the <ins>combined citations, views and downloads count</ins>.\"))\n",
    "IFrame(src=\"./doi_out.html\", width=1000, height=800)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Display a word cloud of output titles\n",
    "Display a word cloud of words from output titles in which word size is determined by the <ins>aggregated citations, views and downloads count</ins> corresponding to all output titles in which it appears."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from wordcloud import WordCloud, STOPWORDS \n",
    "import matplotlib.pyplot as plt \n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import re\n",
    "\n",
    "stopWords = set(STOPWORDS)\n",
    "stopWords.update(['_','data','from','of','in','case','study'])\n",
    "\n",
    "organization = data['organization']\n",
    "organizationName = organization['name']\n",
    "\n",
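    "# Repeat each title's words once per citation, view and download of the output,\n",
    "# so that word frequency (and hence word size) reflects the combined metric count\n",
    "titleWords=[]\n",
    "for metricCount in ['citationCount', 'viewCount', 'downloadCount']:\n",
    "    for node in organization['works']['nodes']:\n",
    "        for title in node['titles']:\n",
    "            tokens = [t.lower() for t in re.split(' |:', str(title['title'])) if t.lower() not in stopWords] \n",
    "            for i in range(node[metricCount]):\n",
    "                titleWords += tokens\n",
    "     \n",
    "# Build a circular mask: pixels outside the circle are set to 255 (white) and are not drawn on\n",
    "x, y = np.ogrid[:800, :800]\n",
    "mask = (x - 400) ** 2 + (y - 400) ** 2 > 345 ** 2\n",
    "mask = 255 * mask.astype(int)\n",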
    "    \n",
    "wordcloud = WordCloud(width = 600, height = 600, \n",
    "                background_color ='white', \n",
    "                stopwords = stopWords, \n",
    "                min_font_size = 10, \n",
    "                prefer_horizontal = 0.95,\n",
    "                mask = mask).generate(\" \".join(titleWords))\n",
    "    \n",
    "fig, ax = plt.subplots(1, 1, figsize = (10, 10), facecolor = None)\n",
    "ax.set_title(\"Word cloud of titles of up to %d outputs of %s,\\nbased on their corresponding combined citations, views and downloads count.\" % (query_params['maxOutputs'], organizationName))\n",
    "plt.imshow(wordcloud, interpolation=\"bilinear\") \n",
    "plt.axis(\"off\") \n",
    "plt.tight_layout(pad = 0)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Plot via [Vega Editor](https://vega.github.io/editor) an interactive matrix diagram of output authors' affiliations\n",
    "Generate data in a format that you can use in [Vega Editor](https://vega.github.io/editor) to plot an interactive matrix diagram of the affiliations of the authors of [University of Oxford](https://ror.org/052gg0110)'s outputs. In this diagram:\n",
    "- Affiliations are values on X and Y axes, and \n",
    "- The rectangular cells in the matrix indicate that authors from the two respective affiliations shared at least one publication together.\n",
    "- Each region, from the _affiliation to region_ mapping below, is shown by a different colour: \n",
    " - **brown** cell colour indicates that the corresponding affiliations are **not in the same geographic region**\\*; \n",
    " - **any other** cell colour indicates that the corresponding affiliations are **in the same geographic region**. \n",
    "<br>\n",
    "\n",
    "\\*For the affiliation to geographic region mapping, see below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "from IPython.display import FileLink, FileLinks\n",
    "\n",
    "# Map affiliations of authors of University of Oxford publications to regions\n",
    "af2Loc = {\n",
    "\"University of Oxford\" : \"UK\",\n",
    "\"University of Warwick\" : \"UK\",\n",
    "\"University of Idaho\" : \"North America\",\n",
    "\"University of Zurich\" : \"Europe\",\n",
    "\"University of Aberdeen\" : \"UK\",\n",
    "\"University of Sheffield\" : \"UK\",\n",
    "\"University of Bergen\" : \"Europe\",\n",
    "\"University of Tokyo\" : \"Asia\",\n",
    "\"University of Arizona\" : \"North America\",\n",
    "\"University of Connecticut\" : \"North America\",\n",
    "\"University of Queensland\" : \"Australia/New Zealand\",\n",
    "\"University of Southern Denmark\" : \"Europe\",\n",
    "\"University College London\" : \"UK\",\n",
    "\"University of Toronto\" : \"North America\",\n",
    "\"University of Washington\" : \"North America\",\n",
    "\"University of Amsterdam\" : \"Europe\",\n",
    "\"University of Edinburgh\" : \"UK\",\n",
    "\"University of California System\" : \"North America\",\n",
    "\"University of Lincoln\" : \"UK\",\n",
    "\"University of Vermont\" : \"North America\",\n",
    "\"University of Western Australia\" : \"Australia/New Zealand\",\n",
    "\"University of Helsinki\" : \"Europe\",\n",
    "\"University of Bordeaux\" : \"Europe\",\n",
    "\"University of Freiburg\" : \"Europe\",\n",
    "\"University of Liège\" : \"Europe\",\n",
    "\"University of Maryland, College Park\" : \"North America\",\n",
    "\"University of Stirling\" : \"UK\",\n",
    "\"University of Montpellier\" : \"Europe\",\n",
    "\"University of Louisville\" : \"North America\",\n",
    "\"University College Cork\" : \"Europe\",\n",
    "\"University of Auckland\" : \"Australia/New Zealand\",\n",
    "\"University of Exeter\" : \"UK\",\n",
    "\"University of Minnesota\" : \"North America\",\n",
    "\"University of Birmingham\" : \"UK\",\n",
    "\"University of Bristol\" : \"UK\",\n",
    "\"University of Córdoba\" : \"Europe\",\n",
    "\"University of Extremadura\" : \"Europe\",\n",
    "\"University of Lausanne\" : \"Europe\",\n",
    "\"University of Otago\" : \"Australia/New Zealand\",\n",
    "\"University of Paris-Sud\" : \"Europe\",\n",
    "\"University of Cape Town\" : \"Africa\",\n",
    "\"University of Groningen\" : \"Europe\",\n",
    "\"University of Konstanz\" : \"Europe\",\n",
    "\"University of Cambridge\" : \"UK\",\n",
    "\"University of Oslo\" : \"Europe\",\n",
    "\"University of the French West Indies and Guiana\" : \"Latin America\",\n",
    "\"University of California, Davis\" : \"North America\",\n",
    "\"University of Bath\" : \"UK\",\n",
    "\"University of Montreal\" : \"North America\",\n",
    "\"University of the West of England\" : \"UK\",\n",
    "\"University of Aveiro\" : \"Europe\",\n",
    "\"University of Lisbon\" : \"Europe\",\n",
    "\"University of Leicester\" : \"UK\",\n",
    "\"University of Florida\" : \"North America\",\n",
    "\"University of South Florida\" : \"North America\",\n",
    "\"University of California, Irvine\" : \"North America\",\n",
    "\"University of Gothenburg\" : \"Europe\",\n",
    "\"University of Sussex\" : \"UK\",\n",
    "\"University of Bern\" : \"Europe\",\n",
    "\"University of Manitoba\" : \"North America\",\n",
    "\"University of Southern California\" : \"North America\",\n",
    "\"University of Technology Sydney\" : \"Australia/New Zealand\",\n",
    "\"University of Southampton\" : \"UK\",\n",
    "\"University of KwaZulu-Natal\" : \"Africa\",\n",
    "\"Columbia University\": \"North America\",\n",
    "\"Federal University of Rio Grande do Sul\": \"Latin America\",\n",
    "\"Massey University\": \"Australia/New Zealand\",\n",
    "\"Statens Serum Institut\": \"Europe\",\n",
    "\"Swansea University\": \"UK\",\n",
    "\"United States Department of Agriculture\": \"North America\",\n",
    "\"National Museum of Nature and Science\": \"Asia\",\n",
    "\"Natural History Museum and Institute\": \"Asia\",\n",
    "\"Tohoku University\": \"Asia\",\n",
    "\"Stony Brook University\": \"North America\",\n",
    "\"Harvard University\": \"North America\",\n",
    "\"Max Planck Institute for Demographic Research\": \"Europe\",\n",
    "\"Princeton University\": \"North America\",\n",
    "\"Radboud University Nijmegen\": \"Europe\",\n",
    "\"Smithsonian Environmental Research Center\": \"North America\",\n",
    "\"Stockholm University\": \"Europe\",\n",
    "\"Institute of Vertebrate Paleontology and Paleoanthropology\": \"Asia\",\n",
    "\"Royal Ontario Museum\": \"North America\",\n",
    "\"Smithsonian Institution\": \"North America\",\n",
    "\"Uppsala University\": \"Europe\",\n",
    "\"Bond University\": \"Australia/New Zealand\",\n",
    "\"Aarhus University\": \"Europe\",\n",
    "\"Boston Children's Hospital\": \"North America\",\n",
    "\"Boston University\": \"North America\",\n",
    "\"Children's Hospital\": \"North America\",\n",
    "\"American Museum of Natural History\": \"North America\",\n",
    "\"Swarthmore College\": \"UK\",\n",
    "\"Duquesne University\": \"North America\",\n",
    "\"East China Normal University\": \"Asia\",\n",
    "\"US Forest Service\": \"North America\",\n",
    "\"Centre for Research on Ecology and Forestry Applications\": \"Europe\",\n",
    "\"Swedish University of Agricultural Sciences\": \"Europe\",\n",
    "\"Technical University Munich\": \"Europe\",\n",
    "\"Institute of Cancer Research\": \"UK\",\n",
    "\"Federal University of Lavraxs\": \"Latin America\",\n",
    "\"Lancaster University\": \"UK\",\n",
    "\"State University of Campinas\": \"Latin America\",\n",
    "\"Council for Scientific and Industrial Research\": \"Africa\",\n",
    "\"Florida International University\": \"North America\",\n",
    "\"French National Institute for Agricultural Research\": \"Europe\",\n",
    "\"German Center for Integrative Biodiversity Research\": \"Europe\",\n",
    "\"Kyoto University\": \"Asia\",\n",
    "\"Royal Holloway University of London\": \"UK\",\n",
    "\"Smithsonian Tropical Research Institute\": \"North America\",\n",
    "\"Wageningen University & Research\": \"Europe\",\n",
    "\"Zoological Society of London\": \"UK\",\n",
    "\"Emory University\": \"North America\",\n",
    "\"McGill University\": \"North America\",\n",
    "\"McGill University Health Centre\": \"North America\",\n",
    "\"New York University\": \"North America\",\n",
    "\"New York University School of Medicine\": \"North America\",\n",
    "\"National Museum\": \"Unknown\",\n",
    "\"Nederlands Instituut voor Ecologie\": \"Europe\",\n",
    "\"Macquarie University\": \"Australia/New Zealand\",\n",
    "\"Australian National University\": \"Australia/New Zealand\",\n",
    "\"Bielefeld University\": \"Europe\",\n",
    "\"British Antarctic Survey\": \"UK\",\n",
    "\"Centre d'Ecologie Fonctionnelle et Evolutive\": \"Europe\",\n",
    "\"Eötvös Loránd University\": \"Europe\",\n",
    "\"Institute of Avian Research\": \"Europe\",\n",
    "\"UNSW Australia\": \"Australia/New Zealand\",\n",
    "\"Stellenbosch University\": \"Africa\",\n",
    "\"Laboratoire de Neurosciences Cognitives\": \"Europe\",\n",
    "\"Yale University\": \"North America\",\n",
    "\"Chinese Academy of Sciences\": \"Asia\",\n",
    "\"Department of Earth Sciences\": \"UK\",\n",
    "\"Imperial College London\": \"UK\",\n",
    "\"Aalto University\": \"Europe\",\n",
    "\"Institute of Theoretical Physics\": \"Unknown\",\n",
    "\"Mahidol University\": \"Asia\",\n",
    "\"Royal Institute of Technology\": \"UK\",\n",
    "\"Vanderbilt University\": \"Europe\",\n",
    "\"Wellcome Trust\": \"UK\",\n",
    "\"Max Planck Institute for Ornithology\": \"Europe\",\n",
    "\"Santa Fe Institute\": \"North America\",\n",
    "\"Lund University\": \"Europe\",\n",
    "\"Cardiff University\": \"UK\",\n",
    "\"Manchester Metropolitan University\": \"UK\",\n",
    "\"Griffith University\": \"Australia/New Zealand\",\n",
    "\"National Museums Scotland\": \"UK\",\n",
    "\"Oregon State University\": \"North America\",\n",
    "\"Rocky Mountain Biological Laboratory\": \"North America\",\n",
    "\"Federal University of Alagoas\": \"Latin America\",\n",
    "\"City, University of London\": \"UK\",\n",
    "\"Commonwealth Scientific and Industrial Research Organisation\": \"Australia/New Zealand\",\n",
    "\"National Autonomous University of Mexico\": \"Latin America\",\n",
    "\"The Open University\": \"UK\",\n",
    "\"Western Sydney University\": \"Australia/New Zealand\",\n",
    "\"Forest Research\": \"UK\",\n",
    "\"European Molecular Biology Laboratory\": \"Europe\",\n",
    "\"Johns Hopkins University\": \"North America\",\n",
    "\"National Institute of Allergy and Infectious Diseases\": \"North America\",\n",
    "\"Rakai Health Sciences Program\": \"Africa\",\n",
    "\"Federal University of Lavras\": \"Latin America\"\n",
    "}\n",
    "\n",
    "# Map regions from the above mapping to ids of groups that will be shown in different colours the matrix diagram \n",
    "loc2Group = {\n",
    "    \"Africa\": 1,\n",
    "    \"Asia\": 2,\n",
    "    \"Australia/New Zealand\": 3,\n",
    "    \"Europe\": 4,\n",
    "    \"North America\": 5,\n",
    "    \"UK\": 6,\n",
    "    \"Latin America\": 7,\n",
    "    \"Unknown\": 8\n",
    "}\n",
    "\n",
    "# Initialise intermediate data structure to store: (srcAf, trgAf) -> number of shared publications \n",
    "srcAfTrgAf2Count = {}\n",
    "# Initialise intermediate data structure to store: af --> Set of connected affs\n",
    "# Note that the number of connected affs will determine the colour of each affiliation node\n",
    "af2OtherAfs = {}\n",
    "organization = data['organization']\n",
    "organizationName = organization['name']\n",
    "\n",
    "# Populate srcAfTrgAf2Count\n",
    "allAffs = set()\n",
    "for node in organization['works']['nodes']:\n",
    "    affSet = set()\n",
    "    for creator in node['creators']:\n",
    "        for affiliation in creator['affiliation']:\n",
    "            af = affiliation['name']\n",
    "            affSet.add(af)\n",
    "    affs = sorted(list(affSet))\n",
    "    allAffs.update(affs)\n",
    "    for af in affs:\n",
    "        for af1 in affs:\n",
    "            if af1 != af:\n",
    "                if af < af1:\n",
    "                    tuple = (af, af1)\n",
    "                else: \n",
    "                    tuple = (af1, af)\n",
    "                if af not in af2OtherAfs:\n",
    "                    af2OtherAfs[af] = set()\n",
    "                af2OtherAfs[af].add(af1)\n",
    "                if af1 not in af2OtherAfs:\n",
    "                    af2OtherAfs[af1] = set()\n",
    "                af2OtherAfs[af1].add(af)   \n",
    "                \n",
    "                if tuple not in srcAfTrgAf2Count:\n",
    "                    srcAfTrgAf2Count[tuple] = 0\n",
    "                else:\n",
    "                    srcAfTrgAf2Count[tuple] += 1                   \n",
    "\n",
    "# Populate data structures needed for the matrix diagram visualisation \n",
    "idx = 0\n",
    "af2idx = {}                    \n",
    "nodes, links = [], []\n",
    "for tuple in srcAfTrgAf2Count:\n",
    "    if srcAfTrgAf2Count[tuple] > 0:\n",
    "        srcAf = tuple[0]\n",
    "        trgAf = tuple[1]\n",
    "        for af in [srcAf, trgAf]:\n",
    "            if af not in af2idx:\n",
    "                af2idx[af] = idx\n",
    "                if af in af2Loc:\n",
    "                    loc = af2Loc[af]\n",
    "                else:\n",
    "                    loc = 'Unknown'\n",
    "                grp = loc2Group[loc]\n",
    "                nodes.append({\"name\": af, \"group\": grp, \"index\": idx})\n",
    "                idx += 1\n",
    "        links.append({\"source\": af2idx[srcAf], \"target\": af2idx[trgAf], \"value\": srcAfTrgAf2Count[tuple]})\n",
    "\n",
    "for template_file in ['vega_by_group.json', 'vega_by_index.json']:\n",
    "    with open(template_file,'r') as vega_template:\n",
    "        content = eval(vega_template.read())\n",
    "    for datum in content['data']:\n",
    "        if datum[\"name\"] == \"nodes\":\n",
    "            datum[\"values\"][\"nodes\"] = nodes\n",
    "        elif datum[\"name\"] == \"edges\":\n",
    "            datum[\"values\"][\"links\"] = links\n",
    "    with open(template_file.replace('.json','.txt'), 'w') as f:\n",
    "         json.dump(content, f)\n",
    "    \n",
    "display(Markdown(\" \\\n",
    "In order to display the matrix diagram of [University of Oxford](https://ror.org/052gg0110)'s outputs' author affiliations, \\\n",
    "please do the following: \\\n",
    "<br />- Open [Vega Editor](https://vega.github.io/editor/#/custom/vega) in a separate tab or window; \\\n",
    "<br />- Click on: [vega_by_group.txt](vega_by_group.txt) or [vega_by_index.txt](vega_by_index.txt), depending on which matrix you wish to display; \\\n",
    "<br />- Copy the content of the file you selected; \\\n",
    "<br />- Paste it (overwriting the default text) into the left-hand side of the editor, as shown below:\\\n",
    "<br /><img src=\\\"vega_editor.png\\\" width=\\\"615\\\" height=\\\"1074\\\" />\\\n",
    "<br /><br />On the right-hand side you will see the matrix diagram, in which affiliations are values on X and Y axes, and the rectangle \\\n",
    "cells in the matrix indicate that authors from the two respective affiliations shared at least one publication together. \\\n",
    "<br /><br />Each region, from the _affiliation to region_ mapping above, is shown by different colour: \\\n",
    "<br />- **brown** cell colour indicates that the corresponding affiliations are **not in the same geographic region**; \\\n",
    "<br />- **any other** cell colour indicates that the corresponding affiliations are **in the same geographic region**. \\\n",
    "<br /><br />The matrix diagram files and the example images of the corresponding matrix diagrams for University of Oxford outputs are shown below: \\\n",
    "\"))\n",
    "display(Markdown(\"* [vega_by_group.txt](vega_by_group.txt) - a matrix diagram* in which publications from authors with affiliations in the same region are clustered together:<br>**Click [here](vega_by_group.svg) to see the diagram below in SVG format*<br> <img src=\\\"vega_by_group.png\\\" width=\\\"1300\\\" height=\\\"1000\\\" />\"))\n",
    "display(Markdown(\"* [vega_by_index.txt](vega_by_index.txt) - a matrix diagram* in which publications are clustered together irrespective of the author affiliations' regions:<br>**Click [here](vega_by_index.svg) to see the diagram below in SVG format*<br> <img src=\\\"vega_by_index.png\\\" width=\\\"1500\\\" height=\\\"1000\\\" /> \"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}