{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ " ![FREYA Logo](https://github.com/datacite/pidgraph-notebooks-python/blob/master/images/freya_200x121.png?raw=true) | [FREYA](https://www.project-freya.eu/en) WP2 [User Story 9](https://github.com/datacite/freya/issues/26) | As a bibliometrician, I want to know all the co-authors of a particular researcher, so that I can do a network analysis of the researcher's collaborations.\n", " :------------- | :------------- | :-------------\n", "\n", "A number of useful analyses are made possible by identifying co-authorship groups of a given researcher, for example identifying other active scientists in the researcher's field of study, or groups of closely collaborating (and often co-funded) author affiliations.\n", "\n", "This notebook uses the [DataCite GraphQL API](https://api.datacite.org/graphql) to retrieve all publications of [Dr Sarah Teichmann](https://orcid.org/0000-0002-6294-6366).\n", "\n", "**Goal**: By the end of this notebook, for a researcher of interest, you should be able to:\n", "- Display an interactive sankey plot of the researcher's publication co-authors;\n", "- Download a file containing their publication DOIs;\n", "- Load the above file into [VOSviewer](https://www.vosviewer.com/), then construct and visualise the researcher's co-authorship network by following the steps listed in this notebook." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Install libraries and prepare GraphQL client" ] },
{ "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "%%capture\n", "# Install required Python packages (the gql [requests] extra pulls in the dependencies of RequestsHTTPTransport)\n", "!pip install \"gql[requests]\" numpy plotly pandas" ] },
{ "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# Prepare the GraphQL client\n", "import requests\n", "from IPython.display import display, Markdown\n", "from gql import gql, Client\n", "from gql.transport.requests import RequestsHTTPTransport\n", "\n", "_transport = RequestsHTTPTransport(\n", "    url='https://api.datacite.org/graphql',\n", "    use_json=True,\n", ")\n", "\n", "client = Client(\n", "    transport=_transport,\n", "    fetch_schema_from_transport=True,\n", ")" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Define and run GraphQL query\n", "Define the GraphQL query to find all publications, including co-authors, for [Dr Sarah Teichmann](https://orcid.org/0000-0002-6294-6366):" ] },
{ "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "# Generate the GraphQL query: find all publications, including co-authors, for researcher id: \"https://orcid.org/0000-0002-6294-6366\"\n", "query_params = {\n", "    \"researcherId\" : \"https://orcid.org/0000-0002-6294-6366\",\n", "    \"maxWorks\" : 300\n", "}\n", "\n", "query = gql(\"\"\"query getResearcherPublication($researcherId: ID!, $maxWorks: Int!)\n", "{\n", "  person(id: $researcherId) {\n", "    id\n", "    name\n", "    publications(first: $maxWorks) {\n", "      totalCount\n", "      published {\n", "        title\n", "        count\n", "      }\n", "      nodes {\n", "        id\n", "        type\n", "        versionOfCount\n", "        titles {\n", "          title\n", "        }\n", "        creators {\n", "          id\n", "          name\n", "        }\n", "      }\n", "    }\n", "  }\n", "}\n", "\"\"\")" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Run the above query via the GraphQL client." ] }, { "cell_type": "code", "execution_count":
13, "metadata": {}, "outputs": [], "source": [ "# Pass the query variables as a dict; gql serialises the request payload itself\n", "data = client.execute(query, variable_values=query_params)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Display total number of publications by the researcher\n", "Display the total number of the researcher's outputs to date." ] },
{ "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "89" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Get the total number of publications to date\n", "publications = data['person']['publications']\n", "display(Markdown(str(publications['totalCount'])))" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Plot the researcher's publication co-authors\n", "Display a sankey plot of the co-authors sharing **at least two** publications with the researcher, highlighting them by frequency of co-authorship." ] },
{ "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "### [Teichmann, Sarah](https://orcid.org/0000-0002-6294-6366)'s first degree co-authors:" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import plotly.graph_objects as go\n", "import plotly.io as pio\n", "import plotly.express as px\n", "from IPython.display import IFrame\n", "\n",
"# Retrieve creator names and ORCID ids from all publications\n", "all_creator_ids = []\n", "all_creator_ids_set = set()\n", "creator_id2name = {}\n", "publications = data['person']['publications']\n", "for r in publications['nodes']:\n", "    if r['versionOfCount'] > 0:\n", "        # If the current output is a version of another one, exclude it\n", "        continue\n", "    # Keep only creators that have an id (e.g. an ORCID iD)\n", "    creator_ids = list(filter(None, [s['id'] for s in r['creators']]))\n", "    all_creator_ids_set.update(creator_ids)\n", "    all_creator_ids.append(creator_ids)\n", "    for creator in r['creators']:\n", "        if creator['id'] is not None and creator['id'] not in creator_id2name:\n", "            creator_id2name[creator['id']] = creator['name']\n", "\n",
"# Collect creator names into all_unique_creator_names - these will be the labels in the sankey plot\n", "# Initialise coauthorship_matrix, which will be used to populate the lists needed for the sankey plot\n", "all_unique_creator_ids = list(all_creator_ids_set)\n", "length = len(all_unique_creator_ids)\n", "coauthorship_matrix = []\n", "all_unique_creator_names = []\n", "for cid in all_unique_creator_ids:\n", "    all_unique_creator_names.append(creator_id2name[cid])\n", "    coauthorship_matrix.append([0] * length)\n", "\n",
"# Populate coauthorship_matrix: for each publication, increment the count of every ordered pair of distinct co-authors\n", "for cids in all_creator_ids:\n", "    for cid in cids:\n", "        c_pos = all_unique_creator_ids.index(cid)\n", "        for co_id in cids:\n", "            co_pos = all_unique_creator_ids.index(co_id)\n", "            if c_pos != co_pos:\n", "                coauthorship_matrix[c_pos][co_pos] += 1\n", "\n",
"# Use coauthorship_matrix to populate the lists needed for the sankey diagram: sourceIndexes, targetIndexes and linkWeights\n", "# For Plotly colour swatches, see: https://plotly.com/python/builtin-colorscales/\n", "colRange = px.colors.sequential.matter\n", "# Highest valid index into colRange; heavier links are capped at the last colour in the scale\n", "maxColIndex = len(colRange) - 1\n", "sourceIndexes = []\n", "targetIndexes = []\n", "linkWeights = []\n", "linkColours = []\n", "for c_pos, r in enumerate(coauthorship_matrix):\n", "    # On the left-hand side of the sankey plot, retain only the researcher in question\n", "    if all_unique_creator_ids[c_pos] != query_params['researcherId']:\n", "        continue\n", "    for co_pos, weight in enumerate(r):\n", "        # Include links only to co-authors of at least 2 publications\n", "        if weight > 1:\n", "            sourceIndexes.append(c_pos)\n", "            targetIndexes.append(co_pos)\n", "            linkWeights.append(weight)\n", "            linkColours.append(colRange[min(maxColIndex, weight)])\n", "\n", 
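"# Optional check (a minimal sketch, not part of the original analysis): print the co-authors\n", "# that will be linked in the sankey plot, most frequent first. It assumes only the variables\n", "# built above: targetIndexes, linkWeights and all_unique_creator_names.\n", "for co_pos, weight in sorted(zip(targetIndexes, linkWeights), key=lambda pair: pair[1], reverse=True):\n", "    print('%3d  %s' % (weight, all_unique_creator_names[co_pos]))\n", "\n", 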
"# Create a sankey plot\n", "fig = go.Figure(data=[go.Sankey(\n", "    node = dict(\n", "        pad = 15,\n", "        thickness = 20,\n", "        line = dict(color = \"black\", width = 0.5),\n", "        label = all_unique_creator_names,\n", "        color = \"rgba(136,65,157, 0.6)\"\n", "    ),\n", "    link = dict(\n", "        source = sourceIndexes, # indices correspond to labels in all_unique_creator_names\n", "        target = targetIndexes, # ditto\n", "        value = linkWeights,\n", "        color = linkColours\n", "    ))])\n", "\n", "fig.update_layout(title_text=\"\", font_size=10)\n", "# Write the interactive plot out to an HTML file\n", "pio.write_html(fig, file='out.html')\n", "\n", "# Display the plot from the saved HTML file\n", "display(Markdown(\"### [%s](%s)'s first degree co-authors:\" % (creator_id2name[query_params['researcherId']], query_params['researcherId'])))\n", "IFrame(src=\"./out.html\", width=1000, height=800)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Download a file containing publication DOIs\n", "This file can be loaded into the [VOSviewer](https://www.vosviewer.com/) tool to construct and visualise the researcher's co-authorship network, using the following steps (see the image below):\n", "1. Select the *File* tab on the right, then click the **Create** button\n", "2. In the *Choose type of data* window, select **Create a map based on bibliographic data**\n", "3. In the *Choose data source* window, select **Download data through API**\n", "4. In the *Specify search query or select file* window, select the **DOI** tab, then *API*: **Crossref**; in the *DOI files* text box, type in or select the path to the DOI file you downloaded.\n", "5. Click the **Finish** button to construct and display the network.
\n", "![VOSviewer Steps](VOSviewer_steps.png)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "application/javascript": [ "\n", "var csv = 'https://doi.org/10.17863/cam.6611\\nhttps://doi.org/10.17863/cam.8651\\nhttps://doi.org/10.17863/cam.9293\\nhttps://doi.org/10.17863/cam.9696\\nhttps://doi.org/10.17863/cam.20316\\nhttps://doi.org/10.17863/cam.26822\\nhttps://doi.org/10.17863/cam.27576\\nhttps://doi.org/10.17863/cam.32384\\nhttps://doi.org/10.17863/cam.33936\\nhttps://doi.org/10.17863/cam.37473\\nhttps://doi.org/10.17863/cam.39438\\nhttps://doi.org/10.17863/cam.39791\\nhttps://doi.org/10.17863/cam.39961\\nhttps://doi.org/10.17863/cam.39963\\nhttps://doi.org/10.17863/cam.39964\\nhttps://doi.org/10.17863/cam.39965\\nhttps://doi.org/10.17863/cam.39966\\nhttps://doi.org/10.17863/cam.39967\\nhttps://doi.org/10.17863/cam.39969\\nhttps://doi.org/10.17863/cam.39971\\nhttps://doi.org/10.17863/cam.39972\\nhttps://doi.org/10.17863/cam.39984\\nhttps://doi.org/10.17863/cam.39985\\nhttps://doi.org/10.17863/cam.39986\\nhttps://doi.org/10.17863/cam.39987\\nhttps://doi.org/10.17863/cam.40022\\nhttps://doi.org/10.17863/cam.40032\\nhttps://doi.org/10.17863/cam.40023\\nhttps://doi.org/10.17863/cam.40025\\nhttps://doi.org/10.17863/cam.40026\\nhttps://doi.org/10.17863/cam.40027\\nhttps://doi.org/10.17863/cam.40029\\nhttps://doi.org/10.17863/cam.40039\\nhttps://doi.org/10.17863/cam.40030\\nhttps://doi.org/10.17863/cam.40034\\nhttps://doi.org/10.17863/cam.40036\\nhttps://doi.org/10.17863/cam.40037\\nhttps://doi.org/10.17863/cam.40038\\nhttps://doi.org/10.17863/cam.40040\\nhttps://doi.org/10.17863/cam.40041\\nhttps://doi.org/10.17863/cam.40043\\nhttps://doi.org/10.17863/cam.40045\\nhttps://doi.org/10.17863/cam.40046\\nhttps://doi.org/10.17863/cam.40047\\nhttps://doi.org/10.17863/cam.40048\\nhttps://doi.org/10.17863/cam.40044\\nhttps://doi.org/10.17863/cam.40049\\nhttps://doi.org/10.17863/cam.40050\\nhttps://doi.org/10.17863/cam.40051\\nhttps:
//doi.org/10.17863/cam.44557\\nhttps://doi.org/10.17863/cam.44717\\nhttps://doi.org/10.1038/s41591-019-0468-5\\nhttps://doi.org/10.17863/cam.46893\\nhttps://doi.org/10.17863/cam.47089\\nhttps://doi.org/10.17863/cam.47221\\nhttps://doi.org/10.17863/cam.47223\\nhttps://doi.org/10.17863/cam.47246\\nhttps://doi.org/10.17863/cam.47621\\nhttps://doi.org/10.17863/cam.48303\\nhttps://doi.org/10.1038/s41592-019-0692-4\\nhttps://doi.org/10.1038/s41467-019-14171-5\\nhttps://doi.org/10.1101/2020.01.28.911115\\nhttps://doi.org/10.1101/2019.12.12.871657\\nhttps://doi.org/10.1126/science.aat1699\\nhttps://doi.org/10.1038/s41467-018-07307-6\\nhttps://doi.org/10.1126/science.aan6828\\nhttps://doi.org/10.1101/709998\\nhttps://doi.org/10.1038/s41556-019-0333-2\\nhttps://doi.org/10.1101/413047\\nhttps://doi.org/10.1126/science.aat5031\\nhttps://doi.org/10.1038/s41467-019-11266-x\\nhttps://doi.org/10.1101/338178\\nhttps://doi.org/10.1038/s41467-018-07771-0\\nhttps://doi.org/10.1101/309831\\nhttps://doi.org/10.1101/397042\\nhttps://doi.org/10.26508/lsa.201800124\\nhttps://doi.org/10.1038/s41592-018-0254-1\\nhttps://doi.org/10.1038/s41592-018-0082-3\\nhttps://doi.org/10.1126/sciimmunol.aal2192\\nhttps://doi.org/10.1242/dev.152561\\nhttps://doi.org/10.1126/science.aah4115\\nhttps://doi.org/10.17863/cam.50163\\nhttps://doi.org/10.17863/cam.50702\\nhttps://doi.org/10.1101/gr.207704.116\\nhttps://doi.org/10.1038/s41590-020-0602-z\\nhttps://doi.org/10.1126/science.aay3224\\nhttps://doi.org/10.17863/cam.52009\\nhttps://doi.org/10.17863/cam.52896\\nhttps://doi.org/10.17863/cam.53893\\n';\n", "\n", "var filename = '0000-0002-6294-6366_dois.csv';\n", "var blob = new Blob([csv], { type: 'application/x-bibtex;charset=utf-8;' });\n", "if (navigator.msSaveBlob) { // IE 10+\n", " navigator.msSaveBlob(blob, filename);\n", "} else {\n", " var link = document.createElement(\"a\");\n", " if (link.download !== undefined) { // feature detection\n", " // Browsers that support HTML5 download attribute\n", " 
var url = URL.createObjectURL(blob);\n", " link.setAttribute(\"href\", url);\n", " link.setAttribute(\"download\", filename);\n", " link.style.visibility = 'hidden';\n", " document.body.appendChild(link);\n", " link.click();\n", " document.body.removeChild(link);\n", " }\n", "}\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import pandas as pd\n", "from IPython.display import Javascript\n", "\n", "# Collect publication DOIs so that they can be downloaded\n", "dois = []\n", "publications = data['person']['publications']\n", "for n in publications['nodes']:\n", "    if n['versionOfCount'] > 0:\n", "        # If the current output is a version of another one, exclude it\n", "        continue\n", "    dois.append(n['id'])\n", "df = pd.DataFrame(dois, columns = None)\n", "file_name = \"%s_dois.csv\" % query_params['researcherId'].split(\"/\")[-1]\n", "\n",
"# JavaScript that makes the browser download the DOI list (one DOI per line) as file_name\n", "js_download = \"\"\"\n", "var csv = '%s';\n", "\n", "var filename = '%s';\n", "var blob = new Blob([csv], { type: 'application/x-bibtex;charset=utf-8;' });\n", "if (navigator.msSaveBlob) { // IE 10+\n", "    navigator.msSaveBlob(blob, filename);\n", "} else {\n", "    var link = document.createElement(\"a\");\n", "    if (link.download !== undefined) { // feature detection\n", "        // Browsers that support HTML5 download attribute\n", "        var url = URL.createObjectURL(blob);\n", "        link.setAttribute(\"href\", url);\n", "        link.setAttribute(\"download\", filename);\n", "        link.style.visibility = 'hidden';\n", "        document.body.appendChild(link);\n", "        link.click();\n", "        document.body.removeChild(link);\n", "    }\n", "}\n", "\"\"\" % (df.to_csv(index=False, header=False).replace('\\n','\\\\n').replace(\"\\'\",\"\\\\'\").replace(\"\\\"\",\"\").replace(\"\\r\",\"\"), file_name)\n", "\n", "display(Javascript(js_download))\n" ] },
{ "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "\n", "## [Dr Sarah
Teichmann](https://orcid.org/0000-0002-6294-6366)'s co-authorship network as shown in VOSviewer\n", "Interestingly, the network (excluding publications with author lists longer than 25) shows clusters with at least three versions of the researcher's author name:\n", "- Teichmann Sarah A.\n", "- Teichmann Sarah A\n", "- Teichmann Sarah\n", "![VOSviewer Network](VOSviewer_network.png)\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# This cell shows an example of the co-authorship network for Dr Sarah Teichmann's publications - hence the conditional logic below\n", "if query_params['researcherId'] == \"https://orcid.org/0000-0002-6294-6366\":\n", "    display(Markdown(\"\"\"\n", "## [Dr Sarah Teichmann](https://orcid.org/0000-0002-6294-6366)'s co-authorship network as shown in VOSviewer\n", "Interestingly, the network (excluding publications with author lists longer than 25) shows clusters with at least three versions of the researcher's author name:\n", "- Teichmann Sarah A.\n", "- Teichmann Sarah A\n", "- Teichmann Sarah\n", "![VOSviewer Network](VOSviewer_network.png)\n", "\"\"\"))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.1" } }, "nbformat": 4, "nbformat_minor": 4 }