{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ " ![FREYA Logo](https://github.com/datacite/pidgraph-notebooks-python/blob/master/images/freya_200x121.png?raw=true) | [FREYA](https://www.project-freya.eu/en) WP2 [User Story 3](https://www.pidforum.org/t/pid-graph-graphql-example-research-organization/929) | As an administrator for the University of Oxford I am interested in the reuse of research outputs from our university, so that I can help identify the most interesting research outputs.\n", ":------------- | :------------- | :-------------\n", "\n", "It is important for research organisations to measure the quality and quantity of their outputs, as well as their relevance to the latest global research trends and to their own strategic science direction.

\n", "This notebook uses the [DataCite GraphQL API](https://api.datacite.org/graphql) to retrieve up to 100 outputs (e.g. publications or datasets) from the [University of Oxford](https://ror.org/052gg0110) in order to quantify and visualise their reuse.\n", "\n", "**Goal**: By the end of this notebook, for a given organization, you should be able to display:\n", "- Citation, view and download counts, aggregated across all of the organization's outputs;\n", "- An interactive stacked bar plot showing how each value of the following output characteristics contributes its metric counts to the corresponding aggregated metric counts: Type, Publication Year, Author Affiliation and DOI;

\n", "- A word cloud of words from output titles, in which the size of each word is determined by the aggregated citation, view and download count of all output titles in which it appears;\n", "- An interactive matrix diagram of the affiliations of the authors of the organization's outputs, in which:\n", "  - Affiliations are the values on the X and Y axes;\n", "  - A rectangular cell in the matrix indicates that authors from the two respective affiliations share at least one publication;\n", "  - Matrix cells are coloured according to the geographic regions of the corresponding affiliations.

\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Install libraries and prepare GraphQL client" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%capture\n", "# Install required Python packages (pandas, matplotlib and wordcloud are imported by later cells)\n", "!pip install gql requests numpy pandas plotly matplotlib wordcloud pyvis" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Prepare the GraphQL client\n", "import requests\n", "from IPython.display import display, Markdown\n", "from gql import gql, Client\n", "from gql.transport.requests import RequestsHTTPTransport\n", "\n", "_transport = RequestsHTTPTransport(\n", "    url='https://api.datacite.org/graphql',\n", "    use_json=True,\n", ")\n", "\n", "client = Client(\n", "    transport=_transport,\n", "    fetch_schema_from_transport=True,\n", ")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Define and run GraphQL query\n", "Define the GraphQL query to retrieve up to 100 outputs (e.g. publications or datasets) from the [University of Oxford](https://ror.org/052gg0110), each with at least 100 views, using its **Research Organization Registry (ROR)** identifier." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Generate the GraphQL query to retrieve up to 100 outputs of University of Oxford, with at least 100 views each.\n", "query_params = {\n", "    \"rorId\": \"https://ror.org/052gg0110\",\n", "    \"maxOutputs\": 100,\n", "    \"minViews\": 100\n", "}\n", "\n", "query = gql(\"\"\"query getOutputs($rorId: ID!, $maxOutputs: Int!, $minViews: Int!)\n", "{\n", "  organization(id: $rorId) {\n", "    id\n", "    name\n", "    alternateName\n", "    citationCount\n", "    viewCount\n", "    downloadCount\n", "    works(hasViews: $minViews, first: $maxOutputs) {\n", "      totalCount\n", "      published {\n", "        title\n", "        count\n", "      }\n", "      resourceTypes {\n", "        title\n", "        count\n", "      }\n", "      nodes {\n", "        id\n", "        type\n", "        publisher\n", "        publicationYear\n", "        titles {\n", "          title\n", "        }\n", "        citations {\n", "          nodes {\n", "            id\n", "            titles {\n", "              title\n", "            }\n", "          }\n", "        }\n", "        creators {\n", "          id\n", "          name\n", "          affiliation {\n", "            id\n", "            name\n", "          }\n", "        }\n", "        citationCount\n", "        viewCount\n", "        downloadCount\n", "      }\n", "    }\n", "  }\n", "}\n", "\"\"\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Run the above query via the GraphQL client." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# variable_values takes the query variables as a dict\n", "data = client.execute(query, variable_values=query_params)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Display total metric counts\n", "Display the total number of citations, views and downloads of the [University of Oxford](https://ror.org/052gg0110)'s outputs." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Get the total count per metric, aggregated across all of the organization's retrieved outputs\n", "organization = data['organization']\n", "organizationName = organization['name']\n", "# Initialise metric counts across all outputs of the organization\n", "metricCounts = {}\n", "for metric in ['citationCount', 'viewCount', 'downloadCount']:\n", "    metricCounts[metric] = 0\n", "\n", "# Aggregate metric counts across all the retrieved outputs\n", "for node in organization['works']['nodes']:\n", "    for metric in metricCounts:\n", "        metricCounts[metric] += node[metric]\n", "\n", "# Display the aggregated metric counts\n", "tableBody = \"\"\n", "for metric in metricCounts:\n", "    tableBody += \"%s | **%s**\\n\" % (metric, str(metricCounts[metric]))\n", "if tableBody:\n", "    # Report the number of outputs actually aggregated, which is capped by maxOutputs\n", "    display(Markdown(\"Aggregated metric counts across %d outputs of [University of Oxford](https://ror.org/052gg0110):\" % len(organization['works']['nodes'])))\n", "    display(Markdown(\"|Metric | Aggregated Count|\\n|---|---|\\n%s\" % tableBody))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plot metric counts per output type\n", "Plot a stacked bar plot showing how each type of the [University of Oxford](https://ror.org/052gg0110)'s outputs contributes its metric counts to the corresponding aggregated total." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import plotly.io as pio\n", "import plotly.express as px\n", "from IPython.display import IFrame\n", "import pandas as pd\n", "\n", "# Adapted from: https://stackoverflow.com/questions/58766305/is-there-any-way-to-implement-stacked-or-grouped-bar-charts-in-plotly-express\n", "def px_stacked_bar(df, color_name='Metric', y_name='Metrics', **pxargs):\n", " idx_col = df.index.name\n", " m = pd.melt(df.reset_index(), id_vars=idx_col, var_name=color_name, value_name=y_name)\n", " # For Plotly colour sequences see: https://plotly.com/python/discrete-color/ \n", " return px.bar(m, x=idx_col, y=y_name, color=color_name, **pxargs, \n", " color_discrete_sequence=px.colors.qualitative.Pastel1)\n", "\n", "# Collect metric counts\n", "organization = data['organization']\n", "\n", "# Initialise dicts for the stacked bar plot\n", "labels = {0: 'All Output Types'}\n", "citationCounts = {}\n", "viewCounts = {}\n", "downloadCounts = {}\n", "\n", "# Collect output type labels\n", "outputTypesSet = set([])\n", "outputType2Pos = {}\n", "for node in organization['works']['nodes']:\n", " outputTypesSet.add(node['type'])\n", "outputTypes = list(outputTypesSet); \n", "for pos, outputType in enumerate(outputTypes):\n", " labels[pos + 1] = outputType\n", " outputType2Pos[outputType] = pos + 1\n", " \n", "# Initialise metric counts\n", "for pos, _ in enumerate(labels):\n", " citationCounts[pos] = 0\n", " viewCounts[pos] = 0\n", " downloadCounts[pos] = 0\n", "# Populate metric counts per output type (key = i) and add them to the aggregated counts (key: 0)\n", "for node in organization['works']['nodes']:\n", " pos = outputType2Pos[node['type']]\n", " citationCounts[0] += node['citationCount']\n", " viewCounts[0] += node['viewCount']\n", " downloadCounts[0] += node['downloadCount']\n", " citationCounts[pos] += node['citationCount']\n", " viewCounts[pos] += node['viewCount']\n", " downloadCounts[pos] 
+= node['downloadCount']\n", "\n", "# Create stacked bar plot\n", "x_name = \"%s's Output Types\" % organizationName\n", "df = pd.DataFrame({x_name: labels,\n", " 'Citations': citationCounts,\n", " 'Views': viewCounts,\n", " 'Downloads': downloadCounts})\n", "fig = px_stacked_bar(df.set_index(x_name), y_name = \"Counts\")\n", "\n", "# Set plot background to transparent\n", "fig.update_layout({\n", "'plot_bgcolor': 'rgba(0, 0, 0, 0)',\n", "'paper_bgcolor': 'rgba(0, 0, 0, 0)'\n", "})\n", "\n", "# Write interactive plot out to html file\n", "pio.write_html(fig, file='ot_out.html')\n", "\n", "# Display plot from the saved html file\n", "display(Markdown(\"Citations, views and downloads for [University of Oxford](https://ror.org/052gg0110)'s outputs, shown per output type as stacked bar plot:\"))\n", "IFrame(src=\"./ot_out.html\", width=500, height=500)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plot metric counts per year\n", "Plot stacked bar plot showing how outputs of [University of Oxford](https://ror.org/052gg0110) in each year contribute their metric counts to the corresponding aggregated total." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import plotly.io as pio\n", "import plotly.express as px\n", "from IPython.display import IFrame\n", "import pandas as pd\n", "\n", "# Adapted from: https://stackoverflow.com/questions/58766305/is-there-any-way-to-implement-stacked-or-grouped-bar-charts-in-plotly-express\n", "def px_stacked_bar(df, color_name='Metric', y_name='Metrics', **pxargs):\n", "    idx_col = df.index.name\n", "    m = pd.melt(df.reset_index(), id_vars=idx_col, var_name=color_name, value_name=y_name)\n", "    # For Plotly colour sequences see: https://plotly.com/python/discrete-color/\n", "    return px.bar(m, x=idx_col, y=y_name, color=color_name, **pxargs,\n", "                  color_discrete_sequence=px.colors.qualitative.Pastel1)\n", "\n", "# Collect metric counts\n", "organization = data['organization']\n", "\n", "# Initialise dicts for the stacked bar plot\n", "labels = {}\n", "citationCounts = {}\n", "viewCounts = {}\n", "downloadCounts = {}\n", "\n", "# Collect publication year labels, skipping outputs without a publication year\n", "outputPubYearsSet = set()\n", "outputPubYear2Pos = {}\n", "for node in organization['works']['nodes']:\n", "    if node['publicationYear']:\n", "        outputPubYearsSet.add(node['publicationYear'])\n", "outputPubYears = sorted(outputPubYearsSet)\n", "for pos, outputPubYear in enumerate(outputPubYears):\n", "    labels[pos] = outputPubYear\n", "    outputPubYear2Pos[outputPubYear] = pos\n", "\n", "# Initialise metric counts\n", "for pos, _ in enumerate(labels):\n", "    citationCounts[pos] = 0\n", "    viewCounts[pos] = 0\n", "    downloadCounts[pos] = 0\n", "# Populate metric counts per publication year (key = pos)\n", "for node in organization['works']['nodes']:\n", "    if not node['publicationYear']:\n", "        continue\n", "    pos = outputPubYear2Pos[node['publicationYear']]\n", "    citationCounts[pos] += node['citationCount']\n", "    viewCounts[pos] += node['viewCount']\n", "    downloadCounts[pos] += node['downloadCount']\n", "\n", "# Create stacked bar plot\n", "x_name
= \"Publication Years of %s's Outputs\" % organizationName\n", "df = pd.DataFrame({x_name: labels,\n", " 'Citations': citationCounts,\n", " 'Views': viewCounts,\n", " 'Downloads': downloadCounts})\n", "fig = px_stacked_bar(df.set_index(x_name), y_name = \"Counts\")\n", "\n", "# Set plot background to transparent\n", "fig.update_layout({\n", "'plot_bgcolor': 'rgba(0, 0, 0, 0)',\n", "'paper_bgcolor': 'rgba(0, 0, 0, 0)'\n", "})\n", "\n", "# Write interactive plot out to html file\n", "pio.write_html(fig, file='yr_out.html')\n", "\n", "# Display plot from the saved html file\n", "display(Markdown(\"Citations, views and downloads counts of [University of Oxford](https://ror.org/052gg0110)'s outputs, shown per publication year as stacked bar plot:\"))\n", "IFrame(src=\"./yr_out.html\", width=1000, height=500)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plot metric counts per author affiliation\n", "Plot stacked bar plot showing how individual author's affiliations of [University of Oxford](https://ror.org/052gg0110)'s outputs contribute their metric counts to the corresponding aggregated total. The plot shows top 30 author affiliations (other than University of Oxford) by the combined citations, views and downloads count." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import plotly.io as pio\n", "import plotly.express as px\n", "from IPython.display import IFrame\n", "import pandas as pd\n", "from operator import itemgetter\n", "\n", "# Adapted from: https://stackoverflow.com/questions/58766305/is-there-any-way-to-implement-stacked-or-grouped-bar-charts-in-plotly-express\n", "# c.f. 
https://plotly.com/python-api-reference/generated/plotly.express.bar.html#plotly.express.bar\n", "def px_stacked_bar(df, color_name='Metric', y_name='Metrics', **pxargs):\n", "    idx_col = df.index.name\n", "    m = pd.melt(df.reset_index(), id_vars=idx_col, var_name=color_name, value_name=y_name)\n", "    # For Plotly colour sequences see: https://plotly.com/python/discrete-color/\n", "    return px.bar(m, x=idx_col, y=y_name, color=color_name, **pxargs,\n", "                  color_discrete_sequence=px.colors.qualitative.Pastel1)\n", "\n", "MAX_TOP_AFFILIATIONS_BY_ALL_METRIC_COUNT = 30\n", "\n", "# Collect metric counts\n", "organization = data['organization']\n", "organizationName = organization['name']\n", "\n", "# Initialise dicts for the stacked bar plot\n", "labels = {}\n", "citationCounts = {}\n", "viewCounts = {}\n", "downloadCounts = {}\n", "\n", "# Collect affiliation labels\n", "affiliationsSet = set()\n", "for node in organization['works']['nodes']:\n", "    for creator in node['creators']:\n", "        for affiliation in creator['affiliation']:\n", "            affiliationsSet.add(affiliation['name'])\n", "affiliations = list(affiliationsSet)\n", "\n", "# Retrieve metric counts by affiliation\n", "affiliation2CitationCount = {}\n", "affiliation2ViewCount = {}\n", "affiliation2DownloadCount = {}\n", "affiliation2AllMetricCount = {}\n", "# Initialise metric counts\n", "for an in affiliations:\n", "    affiliation2CitationCount[an] = 0\n", "    affiliation2ViewCount[an] = 0\n", "    affiliation2DownloadCount[an] = 0\n", "    affiliation2AllMetricCount[an] = 0\n", "\n", "# Count each output once per affiliation, even if several of its authors share that affiliation\n", "for node in organization['works']['nodes']:\n", "    seenAffiliationInNode = set()\n", "    for creator in node['creators']:\n", "        for affiliation in creator['affiliation']:\n", "            an = affiliation['name']\n", "            if an not in seenAffiliationInNode:\n", "                affiliation2CitationCount[an] += node['citationCount']\n", "                affiliation2ViewCount[an] += node['viewCount']\n", "                affiliation2DownloadCount[an] += node['downloadCount']\n", "                affiliation2AllMetricCount[an] += node['citationCount'] + node['viewCount'] + node['downloadCount']\n", "                seenAffiliationInNode.add(an)\n", "# Initialise metric counts\n", "for pos in range(len(affiliations)):\n", "    citationCounts[pos] = 0\n", "    viewCounts[pos] = 0\n", "    downloadCounts[pos] = 0\n", "\n", "# Populate metric counts per affiliation (key = pos), in descending order of combined count\n", "pos = 0\n", "for an, _ in sorted(affiliation2AllMetricCount.items(), key=itemgetter(1), reverse=True):\n", "    if an != organizationName:\n", "        labels[pos] = an\n", "        citationCounts[pos] += affiliation2CitationCount[an]\n", "        viewCounts[pos] += affiliation2ViewCount[an]\n", "        downloadCounts[pos] += affiliation2DownloadCount[an]\n", "        pos += 1\n", "        if pos >= MAX_TOP_AFFILIATIONS_BY_ALL_METRIC_COUNT:\n", "            break\n", "\n", "# Create stacked bar plot\n", "x_name = \"Affiliations of %s's Output Authors\" % organizationName\n", "df = pd.DataFrame({x_name: labels,\n", "                   'Citations': citationCounts,\n", "                   'Views': viewCounts,\n", "                   'Downloads': downloadCounts})\n", "fig = px_stacked_bar(df.set_index(x_name), y_name=\"Counts\")\n", "\n", "# Set plot background to transparent\n", "fig.update_layout({\n", "    'plot_bgcolor': 'rgba(0, 0, 0, 0)',\n", "    'paper_bgcolor': 'rgba(0, 0, 0, 0)'\n", "})\n", "\n", "# Write interactive plot out to html file\n", "pio.write_html(fig, file='af_out.html')\n", "\n", "# Display plot from the saved html file\n", "display(Markdown(\"Citations, views and downloads counts across affiliations of authors of [University of Oxford](https://ror.org/052gg0110)'s outputs, shown as stacked bar plot.
The plot shows the top 30 author affiliations (other than University of Oxford) by the combined citations, views and downloads count.\"))\n", "IFrame(src=\"./af_out.html\", width=1000, height=800)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plot metric counts per individual output\n", "Plot a stacked bar plot showing how individual outputs of the [University of Oxford](https://ror.org/052gg0110) contribute their metric counts to the corresponding aggregated total. The plot shows the DOIs of the top 30 outputs by the combined citations, views and downloads count." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import plotly.io as pio\n", "import plotly.express as px\n", "from IPython.display import IFrame\n", "import pandas as pd\n", "from operator import itemgetter\n", "\n", "# Adapted from: https://stackoverflow.com/questions/58766305/is-there-any-way-to-implement-stacked-or-grouped-bar-charts-in-plotly-express\n", "# c.f. https://plotly.com/python-api-reference/generated/plotly.express.bar.html#plotly.express.bar\n", "def px_stacked_bar(df, color_name='Metric', y_name='Metrics', **pxargs):\n", "    idx_col = df.index.name\n", "    m = pd.melt(df.reset_index(), id_vars=idx_col, var_name=color_name, value_name=y_name)\n", "    # For Plotly colour sequences see: https://plotly.com/python/discrete-color/\n", "    return px.bar(m, x=idx_col, y=y_name, color=color_name, **pxargs,\n", "                  color_discrete_sequence=px.colors.qualitative.Pastel1)\n", "\n", "MAX_TOP_DOIS_BY_ALL_METRIC_COUNT = 30\n", "\n", "# Collect metric counts\n", "organization = data['organization']\n", "organizationName = organization['name']\n", "\n", "# Initialise dicts for the stacked bar plot\n", "labels = {}\n", "citationCounts = {}\n", "viewCounts = {}\n", "downloadCounts = {}\n", "\n", "# Retrieve metric counts by doi\n", "doi2CitationCount = {}\n", "doi2ViewCount = {}\n", "doi2DownloadCount = {}\n", "doi2AllMetricCount = {}\n", "\n", "for node in
organization['works']['nodes']:\n", "    doi = node['id']\n", "    doi2CitationCount[doi] = node['citationCount']\n", "    doi2ViewCount[doi] = node['viewCount']\n", "    doi2DownloadCount[doi] = node['downloadCount']\n", "    doi2AllMetricCount[doi] = node['citationCount'] + node['viewCount'] + node['downloadCount']\n", "\n", "# Initialise metric counts\n", "for pos in range(len(organization['works']['nodes'])):\n", "    citationCounts[pos] = 0\n", "    viewCounts[pos] = 0\n", "    downloadCounts[pos] = 0\n", "\n", "# Populate metric counts per output (key = pos), in descending order of combined count\n", "pos = 0\n", "for doi, _ in sorted(doi2AllMetricCount.items(), key=itemgetter(1), reverse=True):\n", "    # Label each bar with the DOI suffix, linked to the full DOI URL\n", "    labels[pos] = '<a href=\"%s\">%s</a>' % (doi, \"/\".join(doi.split(\"/\")[3:]))\n", "    citationCounts[pos] += doi2CitationCount[doi]\n", "    viewCounts[pos] += doi2ViewCount[doi]\n", "    downloadCounts[pos] += doi2DownloadCount[doi]\n", "    pos += 1\n", "    if pos >= MAX_TOP_DOIS_BY_ALL_METRIC_COUNT:\n", "        break\n", "\n", "# Create stacked bar plot\n", "x_name = \"%s's Output DOIs\" % organizationName\n", "df = pd.DataFrame({x_name: labels,\n", "                   'Citations': citationCounts,\n", "                   'Views': viewCounts,\n", "                   'Downloads': downloadCounts})\n", "fig = px_stacked_bar(df.set_index(x_name), y_name=\"Counts\")\n", "\n", "# Set plot background to transparent\n", "fig.update_layout({\n", "    'plot_bgcolor': 'rgba(0, 0, 0, 0)',\n", "    'paper_bgcolor': 'rgba(0, 0, 0, 0)'\n", "})\n", "\n", "# Write interactive plot out to html file\n", "pio.write_html(fig, file='doi_out.html')\n", "\n", "# Display plot from the saved html file\n", "display(Markdown(\"Citations, views and downloads counts for individual outputs of [University of Oxford](https://ror.org/052gg0110), shown as stacked bar plot.
The plot shows the DOIs of the top 30 outputs by the combined citations, views and downloads count.\"))\n", "IFrame(src=\"./doi_out.html\", width=1000, height=800)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Display a word cloud of output titles\n", "Display a word cloud of words from output titles, in which the size of each word is determined by the aggregated citations, views and downloads count of all output titles in which it appears." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from wordcloud import WordCloud, STOPWORDS\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import numpy as np\n", "import re\n", "\n", "stopWords = set(STOPWORDS)\n", "stopWords.update(['_','data','from','of','in','case','study'])\n", "\n", "organization = data['organization']\n", "organizationName = organization['name']\n", "\n", "# Repeat the words of each title once per citation, view and download, so that word frequency reflects the combined count\n", "titleWords = []\n", "for metricCount in ['citationCount', 'viewCount', 'downloadCount']:\n", "    for node in organization['works']['nodes']:\n", "        for title in node['titles']:\n", "            tokens = [t.lower() for t in re.split(' |:', str(title['title'])) if t.lower() not in stopWords]\n", "            for i in range(node[metricCount]):\n", "                titleWords += tokens\n", "\n", "# Circular mask: pixels outside the circle are set to 255 (white) and are not drawn on\n", "x, y = np.ogrid[:800, :800]\n", "mask = (x - 400) ** 2 + (y - 400) ** 2 > 345 ** 2\n", "mask = 255 * mask.astype(int)\n", "\n", "wordcloud = WordCloud(width = 600, height = 600,\n", "                      background_color ='white',\n", "                      stopwords = stopWords,\n", "                      min_font_size = 10,\n", "                      prefer_horizontal = 0.95,\n", "                      mask = mask).generate(\" \".join(titleWords))\n", "\n", "fig, ax = plt.subplots(1, 1, figsize = (10, 10), facecolor = None)\n", "ax.set_title(\"Word cloud of titles of up to %d outputs of %s,\\nbased on their corresponding combined citations, views and downloads count.\" % (query_params['maxOutputs'], organizationName))\n", "plt.imshow(wordcloud, interpolation=\"bilinear\")\n", "plt.axis(\"off\")\n", "plt.tight_layout(pad = 0)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plot via [Vega Editor](https://vega.github.io/editor) an interactive matrix diagram of output authors' affiliations\n", "Generate data in a format that you can use in [Vega Editor](https://vega.github.io/editor) to plot an interactive matrix diagram of the affiliations of the authors of the [University of Oxford](https://ror.org/052gg0110)'s outputs. In this diagram:\n", "- Affiliations are the values on the X and Y axes;\n", "- A rectangular cell in the matrix indicates that authors from the two respective affiliations share at least one publication;\n", "- Each region, from the _affiliation to region_ mapping below, is shown in a different colour:\n", "  - **brown** cell colour indicates that the corresponding affiliations are **not in the same geographic region**\\*;\n", "  - **any other** cell colour indicates that the corresponding affiliations are **in the same geographic region**.\n", "
\n", "\n", "*For affiliation to geographic region mapping see below." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import json\n", "from IPython.display import FileLink, FileLinks\n", "\n", "# Map affiliations of authors of University of Oxford publications to regions\n", "af2Loc = {\n", "\"University of Oxford\" : \"UK\",\n", "\"University of Warwick\" : \"UK\",\n", "\"University of Idaho\" : \"North America\",\n", "\"University of Zurich\" : \"Europe\",\n", "\"University of Aberdeen\" : \"UK\",\n", "\"University of Sheffield\" : \"UK\",\n", "\"University of Bergen\" : \"Europe\",\n", "\"University of Tokyo\" : \"Asia\",\n", "\"University of Arizona\" : \"North America\",\n", "\"University of Connecticut\" : \"North America\",\n", "\"University of Queensland\" : \"Australia/New Zealand\",\n", "\"University of Southern Denmark\" : \"Europe\",\n", "\"University College London\" : \"UK\",\n", "\"University of Toronto\" : \"North America\",\n", "\"University of Washington\" : \"North America\",\n", "\"University of Amsterdam\" : \"Europe\",\n", "\"University of Edinburgh\" : \"UK\",\n", "\"University of California System\" : \"North America\",\n", "\"University of Lincoln\" : \"UK\",\n", "\"University of Vermont\" : \"North America\",\n", "\"University of Western Australia\" : \"Australia/New Zealand\",\n", "\"University of Helsinki\" : \"Europe\",\n", "\"University of Bordeaux\" : \"Europe\",\n", "\"University of Freiburg\" : \"Europe\",\n", "\"University of Liège\" : \"Europe\",\n", "\"University of Maryland, College Park\" : \"North America\",\n", "\"University of Stirling\" : \"UK\",\n", "\"University of Montpellier\" : \"Europe\",\n", "\"University of Louisville\" : \"North America\",\n", "\"University College Cork\" : \"Europe\",\n", "\"University of Auckland\" : \"Australia/New Zealand\",\n", "\"University of Exeter\" : \"UK\",\n", "\"University of Minnesota\" : \"North America\",\n", "\"University of 
Birmingham\" : \"UK\",\n", "\"University of Bristol\" : \"UK\",\n", "\"University of Córdoba\" : \"Europe\",\n", "\"University of Extremadura\" : \"Europe\",\n", "\"University of Lausanne\" : \"Europe\",\n", "\"University of Otago\" : \"Australia/New Zealand\",\n", "\"University of Paris-Sud\" : \"Europe\",\n", "\"University of Cape Town\" : \"Africa\",\n", "\"University of Groningen\" : \"Europe\",\n", "\"University of Konstanz\" : \"Europe\",\n", "\"University of Cambridge\" : \"UK\",\n", "\"University of Oslo\" : \"Europe\",\n", "\"University of the French West Indies and Guiana\" : \"Latin America\",\n", "\"University of California, Davis\" : \"North America\",\n", "\"University of Bath\" : \"UK\",\n", "\"University of Montreal\" : \"North America\",\n", "\"University of the West of England\" : \"UK\",\n", "\"University of Aveiro\" : \"Europe\",\n", "\"University of Lisbon\" : \"Europe\",\n", "\"University of Leicester\" : \"UK\",\n", "\"University of Florida\" : \"North America\",\n", "\"University of South Florida\" : \"North America\",\n", "\"University of California, Irvine\" : \"North America\",\n", "\"University of Gothenburg\" : \"Europe\",\n", "\"University of Sussex\" : \"UK\",\n", "\"University of Bern\" : \"Europe\",\n", "\"University of Manitoba\" : \"North America\",\n", "\"University of Southern California\" : \"North America\",\n", "\"University of Technology Sydney\" : \"Australia/New Zealand\",\n", "\"University of Southampton\" : \"UK\",\n", "\"University of KwaZulu-Natal\" : \"Africa\",\n", "\"Columbia University\": \"North America\",\n", "\"Federal University of Rio Grande do Sul\": \"Latin America\",\n", "\"Massey University\": \"Australia/New Zealand\",\n", "\"Statens Serum Institut\": \"Europe\",\n", "\"Swansea University\": \"UK\",\n", "\"United States Department of Agriculture\": \"North America\",\n", "\"National Museum of Nature and Science\": \"Asia\",\n", "\"Natural History Museum and Institute\": \"Asia\",\n", "\"Tohoku
University\": \"Asia\",\n", "\"Stony Brook University\": \"North America\",\n", "\"Harvard University\": \"North America\",\n", "\"Max Planck Institute for Demographic Research\": \"Europe\",\n", "\"Princeton University\": \"North America\",\n", "\"Radboud University Nijmegen\": \"Europe\",\n", "\"Smithsonian Environmental Research Center\": \"North America\",\n", "\"Stockholm University\": \"Europe\",\n", "\"Institute of Vertebrate Paleontology and Paleoanthropology\": \"Asia\",\n", "\"Royal Ontario Museum\": \"North America\",\n", "\"Smithsonian Institution\": \"North America\",\n", "\"Uppsala University\": \"Europe\",\n", "\"Bond University\": \"Australia/New Zealand\",\n", "\"Aarhus University\": \"Europe\",\n", "\"Boston Children's Hospital\": \"North America\",\n", "\"Boston University\": \"North America\",\n", "\"Children's Hospital\": \"North America\",\n", "\"American Museum of Natural History\": \"North America\",\n", "\"Swarthmore College\": \"North America\",\n", "\"Duquesne University\": \"North America\",\n", "\"East China Normal University\": \"Asia\",\n", "\"US Forest Service\": \"North America\",\n", "\"Centre for Research on Ecology and Forestry Applications\": \"Europe\",\n", "\"Swedish University of Agricultural Sciences\": \"Europe\",\n", "\"Technical University Munich\": \"Europe\",\n", "\"Institute of Cancer Research\": \"UK\",\n", "\"Federal University of Lavras\": \"Latin America\",\n", "\"Lancaster University\": \"UK\",\n", "\"State University of Campinas\": \"Latin America\",\n", "\"Council for Scientific and Industrial Research\": \"Africa\",\n", "\"Florida International University\": \"North America\",\n", "\"French National Institute for Agricultural Research\": \"Europe\",\n", "\"German Center for Integrative Biodiversity Research\": \"Europe\",\n", "\"Kyoto University\": \"Asia\",\n", "\"Royal Holloway University of London\": \"UK\",\n", "\"Smithsonian Tropical Research Institute\": \"North America\",\n", "\"Wageningen University & Research\": \"Europe\",\n", "\"Zoological Society of London\": \"UK\",\n", "\"Emory University\": \"North America\",\n", "\"McGill University\": \"North America\",\n", "\"McGill University Health Centre\": \"North America\",\n", "\"New York University\": \"North America\",\n", "\"New York University School of Medicine\": \"North America\",\n", "\"National Museum\": \"Unknown\",\n", "\"Nederlands Instituut voor Ecologie\": \"Europe\",\n", "\"Macquarie University\": \"Australia/New Zealand\",\n", "\"Australian National University\": \"Australia/New Zealand\",\n", "\"Bielefeld University\": \"Europe\",\n", "\"British Antarctic Survey\": \"UK\",\n", "\"Centre d'Ecologie Fonctionnelle et Evolutive\": \"Europe\",\n", "\"Eötvös Loránd University\": \"Europe\",\n", "\"Institute of Avian Research\": \"Europe\",\n", "\"UNSW Australia\": \"Australia/New Zealand\",\n", "\"Stellenbosch University\": \"Africa\",\n", "\"Laboratoire de Neurosciences Cognitives\": \"Europe\",\n", "\"Yale University\": \"North America\",\n", "\"Chinese Academy of Sciences\": \"Asia\",\n", "\"Department of Earth Sciences\": \"UK\",\n", "\"Imperial College London\": \"UK\",\n", "\"Aalto University\": \"Europe\",\n", "\"Institute of Theoretical Physics\": \"Unknown\",\n", "\"Mahidol University\": \"Asia\",\n", "\"Royal Institute of Technology\": \"Europe\",\n", "\"Vanderbilt University\": \"North America\",\n", "\"Wellcome Trust\": \"UK\",\n", "\"Max Planck Institute for Ornithology\": \"Europe\",\n", "\"Santa Fe Institute\": \"North America\",\n", "\"Lund University\": \"Europe\",\n", "\"Cardiff University\": \"UK\",\n", "\"Manchester Metropolitan University\": \"UK\",\n", "\"Griffith University\": \"Australia/New Zealand\",\n", "\"National Museums Scotland\": \"UK\",\n", "\"Oregon State University\": \"North America\",\n", "\"Rocky Mountain Biological Laboratory\": \"North America\",\n", "\"Federal University of Alagoas\": \"Latin America\",\n", "\"City, University of London\": \"UK\",\n", "\"Commonwealth Scientific and
Industrial Research Organisation\": \"Australia/New Zealand\",\n", "\"National Autonomous University of Mexico\": \"Latin America\",\n", "\"The Open University\": \"UK\",\n", "\"Western Sydney University\": \"Australia/New Zealand\",\n", "\"Forest Research\": \"UK\",\n", "\"European Molecular Biology Laboratory\": \"Europe\",\n", "\"Johns Hopkins University\": \"North America\",\n", "\"National Institute of Allergy and Infectious Diseases\": \"North America\",\n", "\"Rakai Health Sciences Program\": \"Africa\",\n", "\"Federal University of Lavras\": \"Latin America\"\n", "}\n", "\n", "# Map regions from the above mapping to ids of groups that will be shown in different colours in the matrix diagram\n", "loc2Group = {\n", "    \"Africa\": 1,\n", "    \"Asia\": 2,\n", "    \"Australia/New Zealand\": 3,\n", "    \"Europe\": 4,\n", "    \"North America\": 5,\n", "    \"UK\": 6,\n", "    \"Latin America\": 7,\n", "    \"Unknown\": 8\n", "}\n", "\n", "# Initialise intermediate data structure to store: (srcAf, trgAf) -> number of shared publications\n", "srcAfTrgAf2Count = {}\n", "# Initialise intermediate data structure to store: af --> Set of connected affs\n", "# Note that the number of connected affs will determine the colour of each affiliation node\n", "af2OtherAfs = {}\n", "organization = data['organization']\n", "organizationName = organization['name']\n", "\n", "# Populate srcAfTrgAf2Count\n", "allAffs = set()\n", "for node in organization['works']['nodes']:\n", "    affSet = set()\n", "    for creator in node['creators']:\n", "        for affiliation in creator['affiliation']:\n", "            affSet.add(affiliation['name'])\n", "    affs = sorted(affSet)\n", "    allAffs.update(affs)\n", "    # affs is sorted, so each unordered pair (af, af1) with af < af1 is visited exactly once per output\n", "    for i, af in enumerate(affs):\n", "        for af1 in affs[i + 1:]:\n", "            af2OtherAfs.setdefault(af, set()).add(af1)\n", "            af2OtherAfs.setdefault(af1, set()).add(af)\n", "            srcAfTrgAf2Count[(af, af1)] = srcAfTrgAf2Count.get((af, af1), 0) + 1\n", "\n", "# Populate data structures needed for the matrix diagram visualisation\n", "idx = 0\n", "af2idx = {}\n", "nodes, links = [], []\n", "for (srcAf, trgAf), count in srcAfTrgAf2Count.items():\n", "    for af in [srcAf, trgAf]:\n", "        if af not in af2idx:\n", "            af2idx[af] = idx\n", "            # Affiliations missing from af2Loc fall back to the 'Unknown' region\n", "            loc = af2Loc.get(af, 'Unknown')\n", "            grp = loc2Group[loc]\n", "            nodes.append({\"name\": af, \"group\": grp, \"index\": idx})\n", "            idx += 1\n", "    links.append({\"source\": af2idx[srcAf], \"target\": af2idx[trgAf], \"value\": count})\n", "\n", "import ast\n", "\n", "for template_file in ['vega_by_group.json', 'vega_by_index.json']:\n", "    with open(template_file, 'r') as vega_template:\n", "        # literal_eval parses the template as a literal without executing arbitrary code\n", "        content = ast.literal_eval(vega_template.read())\n", "    for datum in content['data']:\n", "        if datum[\"name\"] == \"nodes\":\n", "            datum[\"values\"][\"nodes\"] = nodes\n", "        elif datum[\"name\"] == \"edges\":\n", "            datum[\"values\"][\"links\"] = links\n", "    with open(template_file.replace('.json', '.txt'), 'w') as f:\n", "        json.dump(content, f)\n", "\n", "display(Markdown(\" \\\n", "In order to display the matrix diagram of [University of Oxford](https://ror.org/052gg0110)'s outputs' author affiliations, \\\n", "please do the following: \\\n", "
- Open [Vega Editor](https://vega.github.io/editor/#/custom/vega) in a separate tab or window; \\\n", "
- Click on: [vega_by_group.txt](vega_by_group.txt) or [vega_by_index.txt](vega_by_index.txt), depending on which matrix you wish to display; \\\n", "
- Copy the content of the file you selected; \\\n", "
- Paste it (overwriting the default text) into the left-hand side of the editor, as shown below:\\\n", "
\\\n", "

On the right-hand side you will see the matrix diagram, in which affiliations are the values on the X and Y axes, and a filled rectangular \\\n", "cell in the matrix indicates that authors from the two respective affiliations share at least one publication. \\\n", "

Each region, from the _affiliation to region_ mapping above, is shown in a different colour: \\\n", "
- **brown** cell colour indicates that the corresponding affiliations are **not in the same geographic region**; \\\n", "
- **any other** cell colour indicates that the corresponding affiliations are **in the same geographic region**. \\\n", "

The matrix diagram files and the example images of the corresponding matrix diagrams for University of Oxford outputs are shown below: \\\n", "\"))\n", "display(Markdown(\"* [vega_by_group.txt](vega_by_group.txt) - a matrix diagram* in which publications from authors with affiliations in the same region are clustered together:
**Click [here](vega_by_group.svg) to see the diagram below in SVG format*
\"))\n", "display(Markdown(\"* [vega_by_index.txt](vega_by_index.txt) - a matrix diagram* in which publications are clustered together irrespective of the author affiliations' regions:
**Click [here](vega_by_index.svg) to see the diagram below in SVG format*
\"))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.1" } }, "nbformat": 4, "nbformat_minor": 4 }