{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ " ![FREYA Logo](https://github.com/datacite/pidgraph-notebooks-python/blob/master/images/freya_200x121.png?raw=true) | [FREYA](https://www.project-freya.eu/en) WP2 [User Story 2](https://github.com/datacite/freya/issues/63) | As a software author, I want to be able to see the citations of my software aggregated across all versions, so that I see a complete picture of reuse.\n", " :------------- | :------------- | :-------------\n", "\n", "Software development process involves versioned releases. Consequently, different software versions may be used for scientific discovery and thus referenced in publications. In order to quantify impact of a software, its author must be able to capture the reuse of the software across all its versions.

\n", "This notebook uses the [DataCite GraphQL API](https://api.datacite.org/graphql) to retrieve metadata about software titled: [Calculation Package: Inverting topography for landscape evolution model process representation](https://doi.org/10.5281/zenodo.2799488), including all its versions, so that its overall reuse can be quantified.\n", "\n", "**Goal**: By the end of this notebook, for a given software you should be able to display:\n", "- Counts of citations, views and downloads metrics, aggregated across all versions of the software\n", "- An interactive stacked bar plot showing how the metric counts of each version contribute to the corresponding aggregated metric counts, e.g.
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Install libraries and prepare GraphQL client" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%%capture\n", "# Install required Python packages\n", "!pip install gql requests numpy plotly" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Prepare the GraphQL client\n", "import requests\n", "from IPython.display import display, Markdown\n", "from gql import gql, Client\n", "from gql.transport.requests import RequestsHTTPTransport\n", "\n", "_transport = RequestsHTTPTransport(\n", " url='https://api.datacite.org/graphql',\n", " use_json=True,\n", ")\n", "\n", "client = Client(\n", " transport=_transport,\n", " fetch_schema_from_transport=True,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Define and run GraphQL query\n", "Define the GraphQL query to retrieve metadata for the software titled: [Calculation Package: Inverting topography for landscape evolution model process representation](https://doi.org/10.5281/zenodo.2799488) using its DOI." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Generate the GraphQL query to retrieve the required software's metadata\n", "query_params = {\n", " \"softwareId\" : \"https://doi.org/10.5281/zenodo.2799488\"\n", "}\n", "\n", "query = gql(\"\"\"query getSoftware($softwareId: ID!)\n", "{\n", " software(id: $softwareId) {\n", " id\n", " titles {\n", " title\n", " }\n", " publicationYear\n", " citations {\n", " nodes {\n", " id\n", " titles {\n", " title\n", " } \n", " }\n", " }\n", " version\n", " versionCount\n", " versionOfCount\n", " citationCount\n", " downloadCount\n", " viewCount\n", " versions {\n", " nodes {\n", " id\n", " version\n", " publicationYear\n", " titles {\n", " title\n", " }\n", " citations {\n", " nodes {\n", " id\n", " titles {\n", " title\n", " }\n", " }\n", " }\n", " version\n", " versionCount\n", " versionOfCount\n", " citationCount\n", " downloadCount\n", " viewCount\n", " }\n", " }\n", " }\n", "}\n", "\"\"\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Run the above query via the GraphQL client" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "import json\n", "data = client.execute(query, variable_values=json.dumps(query_params))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Display total software metrics\n", "Display total number of citations, views and downloads across all versions of software: [Calculation Package: Inverting topography for landscape evolution model process representation](https://doi.org/10.5281/zenodo.2799488)." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "Aggregated metric counts for software: [Calculation Package: Inverting topography for landscape evolution model process representation](https://doi.org/10.5281/zenodo.2799488) across all its versions:" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "|Metric | Aggregated Count|\n", "|---|---|\n", "citationCount | **3**\n", "viewCount | **0**\n", "downloadCount | **0**\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Get the total count per metric, aggregated for across all versions of the software\n", "software = data['software']\n", "# Initialise metric counts\n", "metricCounts = {}\n", "for metric in ['citationCount', 'viewCount', 'downloadCount']:\n", " metricCounts[metric] = 0\n", " \n", "# Aggregate metric counts across all the version\n", "for node in software['versions']['nodes']:\n", " for metric in metricCounts:\n", " metricCounts[metric] += node[metric]\n", " \n", "# Display the aggregated metric counts\n", "tableBody=\"\"\n", "for metric in metricCounts: \n", " tableBody += \"%s | **%s**\\n\" % (metric, str(metricCounts[metric]))\n", "if tableBody:\n", " display(Markdown(\"Aggregated metric counts for software: [Calculation Package: Inverting topography for landscape evolution model process representation](https://doi.org/10.5281/zenodo.2799488) across all its versions:\"))\n", " display(Markdown(\"|Metric | Aggregated Count|\\n|---|---|\\n%s\" % tableBody))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plot metric counts per software version\n", "Plot stacked bar plot showing how the individual versions of software: [Calculation Package: Inverting topography for landscape evolution model process representation](https://doi.org/10.5281/zenodo.2799488) contribute their metric counts to the corresponding aggregated total." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "Citations, views and downloads counts for software: [Calculation Package: Inverting topography for landscape evolution model process representation](https://doi.org/10.5281/zenodo.2799488) across all its versions, shown as stacked bar plot:" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import plotly.io as pio\n", "import plotly.express as px\n", "from IPython.display import IFrame\n", "import pandas as pd\n", "\n", "# Adapted from: https://stackoverflow.com/questions/58766305/is-there-any-way-to-implement-stacked-or-grouped-bar-charts-in-plotly-express\n", "def px_stacked_bar(df, color_name='Metric', y_name='Metrics', **pxargs):\n", " idx_col = df.index.name\n", " m = pd.melt(df.reset_index(), id_vars=idx_col, var_name=color_name, value_name=y_name)\n", " # For Plotly colour sequences see: https://plotly.com/python/discrete-color/ \n", " return px.bar(m, x=idx_col, y=y_name, color=color_name, **pxargs, \n", " color_discrete_sequence=px.colors.qualitative.Pastel1)\n", "\n", "# Collect metric counts\n", "software = data['software']\n", "version = software['version']\n", "\n", "# Initialise dicts for the stacked bar plot\n", "labels = {0: 'All Software Versions'}\n", "citationCounts = {}\n", "viewCounts = {}\n", "downloadCounts = {}\n", "\n", "# Collect software/version labels\n", "versionCnt = 1\n", "for node in software['versions']['nodes']:\n", " version = software['version']\n", " labels[versionCnt] = '%s (%s)' % (version, node['publicationYear'])\n", " versionCnt += 1\n", " \n", "# Initialise aggregated metric counts (key: 0)\n", "citationCounts[0] = 0\n", "viewCounts[0] = 0\n", "downloadCounts[0] = 0\n", " \n", "# Populate metric counts for individual versions (key: versionCnt) and add them to the aggregated counts (key: 0)\n", "versionCnt = 1\n", "for node in software['versions']['nodes']:\n", " citationCounts[0] += node['citationCount']\n", " viewCounts[0] += node['viewCount']\n", " downloadCounts[0] += node['downloadCount']\n", " citationCounts[versionCnt] = node['citationCount']\n", " viewCounts[versionCnt] = node['viewCount']\n", " downloadCounts[versionCnt] = node['downloadCount']\n", " versionCnt += 1\n", "\n", "# Create stacked bar plot\n", "df = pd.DataFrame({'Software/Versions': labels,\n", " 'Citations': citationCounts,\n", " 'Views': viewCounts,\n", " 'Downloads': downloadCounts})\n", "fig = px_stacked_bar(df.set_index('Software/Versions'), y_name = \"Counts\")\n", "\n", "# Set plot background to transparent\n", "fig.update_layout({\n", "'plot_bgcolor': 'rgba(0, 0, 0, 0)',\n", "'paper_bgcolor': 'rgba(0, 0, 0, 0)'\n", "})\n", "\n", "# Write interactive plot out to html file\n", "pio.write_html(fig, file='out.html')\n", "\n", "# Display plot from the saved html file\n", "display(Markdown(\"Citations, views and downloads counts for software: [Calculation Package: Inverting topography for landscape evolution model process representation](https://doi.org/10.5281/zenodo.2799488) across all its versions, shown as stacked bar plot:\"))\n", "IFrame(src=\"./out.html\", width=500, height=500)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.1" } }, "nbformat": 4, "nbformat_minor": 4 }