{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "![FREYA Logo](https://github.com/datacite/pidgraph-notebooks-python/blob/master/images/freya_200x121.png?raw=true) | [FREYA](https://www.project-freya.eu/en) | WP2 [User Story 6](https://www.pidforum.org/t/pid-graph-graphql-example-disambiguate-researchers/931): As a researcher, I am looking for more information about another researcher with a common name, but don’t know his/her ORCID ID.\n", ":------------- | :------------- | :-------------\n", "\n", "It is important to be able to locate a researcher of interest even when their ORCID ID is unknown. For example, a reader of a scientific publication may wish to find out more about one of the authors, but the publisher has not cross-referenced that author's name to ORCID.
\n", "\n", "This notebook uses the [DataCite GraphQL API](https://api.datacite.org/graphql) to disambiguate a researcher name via a *funnel* approach:\n", " * First, all researcher records matching the query \"John AND Smith\" are retrieved, and an alphabetically sorted list of affiliations and the corresponding researcher names is displayed;\n", " * Then the notebook simulates the user selecting one of the affiliations (in our case \"University of Arizona\") and performs a more detailed query: \"John AND Smith AND University of Arizona\". The second query retrieves and displays a much smaller set of results, now also containing the researcher's publications, thus helping the user pinpoint the researcher of interest more easily.\n", "\n", "**Goal**: By the end of this notebook, you should be able to successfully disambiguate a researcher name of interest." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Install libraries and prepare GraphQL client" ] }, { "cell_type": "code", "execution_count": 228, "metadata": {}, "outputs": [], "source": [ "%%capture\n", "# Install required Python packages\n", "!pip install gql requests" ] }, { "cell_type": "code", "execution_count": 229, "metadata": {}, "outputs": [], "source": [ "# Prepare the GraphQL client\n", "import requests\n", "from IPython.display import display, Markdown\n", "from gql import gql, Client\n", "from gql.transport.requests import RequestsHTTPTransport\n", "\n", "_transport = RequestsHTTPTransport(\n", "    url='https://api.datacite.org/graphql',\n", "    use_json=True,\n", ")\n", "\n", "client = Client(\n", "    transport=_transport,\n", "    fetch_schema_from_transport=True,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Define and run GraphQL query\n", "Define the GraphQL query to retrieve the researcher records matching the query \"John AND Smith\", together with their affiliations:" ] }, { "cell_type": "code", "execution_count": 231, "metadata": {}, "outputs": [], "source": 
[ "# Generate the GraphQL query to retrieve researchers matching query \"John AND Smith\", up to 100 per page\n", "query_params = {\n", "    \"query\" : \"John AND Smith\",\n", "    \"max_researchers\" : 100,\n", "    \"query_end_cursor\" : \"\"\n", "}\n", "\n", "query_str = \"\"\"query getResearchersByName(\n", "    $query: String!,\n", "    $max_researchers: Int!,\n", "    $query_end_cursor : String!\n", "    )\n", "{\n", "  people(query: $query, first: $max_researchers, after: $query_end_cursor) {\n", "    totalCount\n", "    pageInfo {\n", "      hasNextPage\n", "      endCursor\n", "    }\n", "    nodes {\n", "      id\n", "      givenName\n", "      familyName\n", "      name\n", "      affiliation {\n", "        name\n", "      }\n", "    }\n", "  }\n", "}\n", "\"\"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Run the above query via the GraphQL client, paging through the results until none are left:" ] }, { "cell_type": "code", "execution_count": 232, "metadata": {}, "outputs": [], "source": [ "import json\n", "\n", "# Initialise overall data dict that will store results\n", "data = {}\n", "\n", "# Parse the query once; only the cursor variable changes between pages\n", "query = gql(query_str)\n", "\n", "# Keep retrieving results until there are no more results left\n", "while True:\n", "    # Pass the variables as a dict, as gql expects\n", "    res = client.execute(query, variable_values=query_params)\n", "    people = res[\"people\"]\n", "    if \"people\" not in data:\n", "        # First page: keep the whole response\n", "        data = res\n", "    else:\n", "        # Later pages: append the new researcher records\n", "        data[\"people\"][\"nodes\"].extend(people[\"nodes\"])\n", "    pageInfo = people[\"pageInfo\"]\n", "    if pageInfo[\"hasNextPage\"] and pageInfo[\"endCursor\"] is not None:\n", "        # Continue from where the previous page ended\n", "        query_params[\"query_end_cursor\"] = pageInfo[\"endCursor\"]\n", "    else:\n", "        break" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## List researcher details\n", "List affiliations and the corresponding researcher names in tabular format. This allows the user to select one of the affiliations to use in a more detailed query (see below) that also retrieves publications." 
] }, { "cell_type": "code", "execution_count": 234, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "Total number of researchers found: **210**