{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "![FREYA Logo](https://github.com/datacite/pidgraph-notebooks-python/blob/master/images/freya_200x121.png?raw=true) | [FREYA](https://www.project-freya.eu/en) | WP2 [User Story 7](https://www.pidforum.org/t/pid-graph-graphql-example-second-degree-citations/939): As a data center, I want to see the citations of publications that use my repository for the underlying data, so that I can demonstrate the impact of our repository.\n", ":------------- | :------------- | :-------------\n", "\n", "It is important for repositories of scientific data to monitor and report on the impact of the data they store. One useful proxy for that impact is secondary citations, i.e. citations of publications that use the deposited data. This notebook focuses on visualising these citations by means of a force-directed graph.
\n", "This notebook uses the [DataCite GraphQL API](https://api.datacite.org/graphql) to retrieve the citations of the following datasets:\n", "- [Effects of varying food-availability on ecology and distribution of smallest benthic organisms in sediments of the arctic Fram Strait during POLARSTERN cruise ARK-XV/2, supplement to: Schewe, Ingo; Soltwedel, Thomas (2003): Benthic response to ice-edge-induced particle flux in the Arctic Ocean. Polar Biology, 26(9), 610-620](https://doi.org/10.1594/pangaea.314690);\n", "- [Data from: Towards a worldwide wood economics spectrum](https://doi.org/10.5061/dryad.234); and\n", "- [rmca-albertine-rift-cichlids](https://doi.org/10.15468/n6ftyd).\n", "\n", "**Goal**: By the end of this notebook, for a given list of datasets, you should be able to display:\n", "- The total citation count for each retrieved dataset;\n", "- An interactive force-directed graph of the datasets and their citations, in which:\n", "    - The pink node at the centre of each radial shape corresponds to a dataset;\n", "    - Blue nodes correspond to citations (note that some citations may be shared by more than one dataset);\n", "    - A larger node indicates more citations of the dataset or citation that the node represents. 
Note that, to increase node visibility, the sizes of dataset nodes and citation nodes are not comparable to each other.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Install libraries and prepare GraphQL client" ] }, { "cell_type": "code", "execution_count": 123, "metadata": {}, "outputs": [], "source": [ "%%capture\n", "# Install required Python packages\n", "!pip install gql requests pyvis jsonpickle" ] }, { "cell_type": "code", "execution_count": 124, "metadata": {}, "outputs": [], "source": [ "# Prepare the GraphQL client\n", "import requests\n", "from IPython.display import display, Markdown\n", "from gql import gql, Client\n", "from gql.transport.requests import RequestsHTTPTransport\n", "\n", "_transport = RequestsHTTPTransport(\n", "    url='https://api.datacite.org/graphql',\n", "    use_json=True,\n", ")\n", "\n", "client = Client(\n", "    transport=_transport,\n", "    fetch_schema_from_transport=True,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Define and run GraphQL query\n", "Define the GraphQL query to retrieve the citations of the three datasets listed above:" ] }, { "cell_type": "code", "execution_count": 119, "metadata": {}, "outputs": [], "source": [ "# Generate the GraphQL query to retrieve each dataset's citations and their citation counts\n", "query_params = {\n", "    \"ids\": [\"10.5061/dryad.234\", \"10.15468/n6ftyd\", \"10.1594/pangaea.314690\"]\n", "}\n", "\n", "query = gql(\"\"\"query getDatasetCitations($ids: [String!]) {\n", "  datasets(ids: $ids) {\n", "    nodes {\n", "      id\n", "      titles {\n", "        title\n", "      }\n", "      citationCount\n", "      citations {\n", "        nodes {\n", "          id\n", "          publisher\n", "          titles {\n", "            title\n", "          }\n", "          citationCount\n", "        }\n", "      }\n", "    }\n", "  }\n", "}\n", "\n", "\n", "\"\"\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Run the above query via the GraphQL client:" ] }, { "cell_type": "code", "execution_count": 120, 
"metadata": {}, "outputs": [], "source": [ "import json\n", "data = client.execute(query, variable_values=json.dumps(query_params))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Display total number of citations per dataset" ] }, { "cell_type": "code", "execution_count": 131, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "| Dataset | Citation Count|\n", "|---|---|\n", "[Effects of varying food-availability on ecology and distribution of smallest benthic organisms in sediments of the arctic Fram Strait during POLARSTERN cruise ARK-XV/2, supplement to: Schewe, Ingo; Soltwedel, Thomas (2003): Benthic response to ice-edge-induced particle flux in the Arctic Ocean. Polar Biology, 26(9), 610-620](https://doi.org/10.1594/pangaea.314690) | [**164**](https://search.datacite.org/works/10.1594/pangaea.314690)\n", "[Data from: Towards a worldwide wood economics spectrum](https://doi.org/10.5061/dryad.234) | [**4**](https://search.datacite.org/works/10.5061/dryad.234)\n", "[rmca-albertine-rift-cichlids](https://doi.org/10.15468/n6ftyd) | [**150**](https://search.datacite.org/works/10.15468/n6ftyd)\n" ], "text/plain": [ "