### Query the FREYA PID Graph for works authored by a person

This notebook queries the [FREYA PID Graph](https://blog.datacite.org/powering-the-pid-graph/) via [DataCite's GraphQL API](https://api.datacite.org/graphql) to retrieve works created by a person. It takes an ORCID URL as input and uses it to find all works registered with DataCite, and some registered with Crossref, whose '`creator.nameIdentifier`' matches the given ORCID URL. From the resulting list of works we output the DOI and title of each work.

In [1]:
# Prerequisites:
import requests # dependency to make HTTP calls
from benedict import benedict # dependency for dealing with json (PyPI package: python-benedict)

The input for this notebook is an ORCID URL, e.g. '`https://orcid.org/0000-0003-2499-7741`'.

In [2]:
# input parameter
example_orcid = "https://orcid.org/0000-0003-2499-7741"
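
A mistyped ORCID URL would likely yield no works or an error, so it can help to check the input before querying. The following is a minimal sketch and not part of the original notebook: `is_valid_orcid_url` and `ORCID_URL_PATTERN` are hypothetical names, and the check assumes the `https://orcid.org/` URL form shown above together with the ISO 7064 MOD 11-2 check digit that ORCID uses for the last character of an iD.

```python
import re

# Assumption: the input uses the canonical https://orcid.org/ URL form.
ORCID_URL_PATTERN = re.compile(r"^https://orcid\.org/(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$")

def is_valid_orcid_url(orcid_url):
    """Return True if the URL has the expected shape and a correct check digit."""
    match = ORCID_URL_PATTERN.match(orcid_url)
    if not match:
        return False
    characters = match.group(1).replace("-", "")
    # ISO 7064 MOD 11-2 computed over the first 15 digits
    total = 0
    for digit in characters[:15]:
        total = (total + int(digit)) * 2
    check = (12 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return characters[15] == expected

print(is_valid_orcid_url("https://orcid.org/0000-0003-2499-7741"))
```

This only verifies the format and checksum; it does not tell us whether the ORCID record exists or has any works attached.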

We use it to query DataCite's GraphQL API for the person's metadata and all works connected to them. Since the API paginates its results with cursors (here we request up to 1000 works per page), we need to loop through all pages to get the complete result set.

In [3]:
# DataCite's GraphQL endpoint for the FREYA PID Graph
DATACITE_GRAPHQL_API = "https://api.datacite.org/graphql"

# GraphQL query to retrieve a person and all their created works
QUERY_PERSON2WORKS = """query person($orcid: ID!, $after: String) {
  person(id: $orcid) {
    works(first: 1000, after: $after) {
      pageInfo {
        endCursor
        hasNextPage
      }

      nodes {
        doi
        titles {
          title
        }
        versions {
          nodes {
            doi
          }
        }
      }
    }
  }
}"""

# query for all works connected to the given ORCID, yielding one result page at a time
def query_freya_for_person2works(orcid):
    continue_paginating = True
    cursor = ""

    while continue_paginating:
        variables = {'orcid': orcid, 'after': cursor}
        response = requests.post(url=DATACITE_GRAPHQL_API,
                                 json={'query': QUERY_PERSON2WORKS, 'variables': variables},
                                 headers={'Accept': 'application/json'})
        response.raise_for_status()
        result = response.json()
        if 'errors' in result:
            raise requests.exceptions.HTTPError(result)

        # check if next page exists and set cursor to next page
        cursor = next_cursor(result)
        continue_paginating = has_next_page(result)
        yield result

# check if there is another page with results to query
def has_next_page(response_data):
    # the response is already a parsed dict, so we only wrap it for keypath access
    resp_dict = benedict(response_data)
    return resp_dict.get("data.person.works.pageInfo.hasNextPage")

# get the cursor that points to the next page
def next_cursor(response_data):
    resp_dict = benedict(response_data)
    return resp_dict.get("data.person.works.pageInfo.endCursor")


#--- example execution
list_of_pages = query_freya_for_person2works(example_orcid)
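
The cursor-based loop above can be tried out offline. The following sketch is an illustration under stated assumptions rather than part of the notebook: `paginate`, `fake_fetch` and `FAKE_PAGES` are hypothetical names, the DOIs and cursors are made-up placeholders, and each fake page only mimics the `pageInfo` and `nodes` fields of the real response.

```python
def paginate(fetch_page):
    """Yield pages until the API reports that no further page exists.

    fetch_page is a hypothetical callable that takes a cursor and returns a
    dict shaped like the 'works' part of the GraphQL response.
    """
    cursor = ""
    while True:
        page = fetch_page(cursor)
        yield page
        if not page["pageInfo"]["hasNextPage"]:
            break
        # continue from where the previous page ended
        cursor = page["pageInfo"]["endCursor"]

# Made-up pages standing in for API responses (placeholder DOIs and cursors).
FAKE_PAGES = {
    "": {"pageInfo": {"endCursor": "c1", "hasNextPage": True},
         "nodes": [{"doi": "10.0000/example.1"}, {"doi": "10.0000/example.2"}]},
    "c1": {"pageInfo": {"endCursor": "c2", "hasNextPage": False},
           "nodes": [{"doi": "10.0000/example.3"}]},
}

def fake_fetch(cursor):
    return FAKE_PAGES[cursor]

all_dois = [node["doi"] for page in paginate(fake_fetch) for node in page["nodes"]]
print(all_dois)
```

Because `paginate` is a generator, just like `query_freya_for_person2works`, no request is made until the pages are actually iterated over.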

From the returned pages we
* extract the list of works,
* remove the ones that are older versions of another work, which is the case if the metadata field '`versions.nodes.doi`' contains the DOI of the succeeding work,
* extract and print the title and DOI of each work.

*Note: While we are able to filter out some versions of a work if they are linked via the metadata field '`versions.nodes.doi`', others would require more advanced filters (for example based on name similarity), which are out of scope for our project.*
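
The version filter described above can be illustrated on hand-written records. This is a minimal sketch using plain dictionaries instead of `benedict`: `has_newer_version` and `sample_works` are hypothetical, the DOIs and titles are made-up placeholders, and the helper checks every entry under `versions.nodes` rather than only the first one.

```python
def has_newer_version(work):
    """True if the work lists at least one DOI under versions.nodes."""
    nodes = (work.get("versions") or {}).get("nodes") or []
    return any(node.get("doi") for node in nodes)

# Made-up records shaped like entries of data.person.works.nodes
sample_works = [
    {"doi": "10.0000/example.v1", "titles": [{"title": "Example dataset"}],
     "versions": {"nodes": [{"doi": "10.0000/example.v2"}]}},
    {"doi": "10.0000/example.v2", "titles": [{"title": "Example dataset"}],
     "versions": {"nodes": []}},
]

# keep only the works that do not point to a succeeding version
latest_only = [work for work in sample_works if not has_newer_version(work)]
for work in latest_only:
    print(work["doi"], work["titles"][0]["title"])
```

Works whose versions are not linked through this metadata field pass the filter unchanged, which is the limitation the note above refers to.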

In [4]:
# from the result pages we get from the GraphQL API, extract the data about the works
def extract_works_from_page(page):
    page_dict = benedict(page)
    return page_dict.get('data.person.works.nodes') or []

# remove old versions from the list of works
def filter_older_versions(works):
    return [work for work in works if not benedict(work).get('versions.nodes[0].doi')]

# extract DOI and title from work
def extract_doi(work):
    work_dict = benedict(work)
    doi = work_dict.get('doi')
    title = work_dict.get('titles[0].title')
    return doi, title


#--- example execution
for page in list_of_pages:
    works = extract_works_from_page(page)
    print(f"Complete number of works: {len(works)}")
    works_filtered = filter_older_versions(works)
    print(f"Filtered number of works: {len(works_filtered)}")
    for work in works_filtered:
        doi, title = extract_doi(work)
        print(f"{doi}, {title}")

Complete number of works: 105
Filtered number of works: 91
10.6084/m9.figshare.647329, Members of Deutscher Bibliotheksverband e. V. (dbv)
10.2314/coscv1.1, Literatur recherchieren und verwalten
10.2314/coscv1.2, Organisieren
10.2314/coscv1.3, Daten sammeln und verarbeiten
10.2314/coscv1, CoScience - Gemeinsam forschen und publizieren mit dem Netz
10.2314/coscv2, CoScience - Gemeinsam forschen und publizieren mit dem Netz
10.2314/coscv2.2, Oganisieren
10.2314/coscv2.3, Daten sammeln und verarbeiten
10.6084/m9.figshare.647329.v1, Members of Deutscher Bibliotheksverband e. V. (dbv)
10.6084/m9.figshare.5271943, TIB-FIS-Discovery - VIVO at the German National Library of Science and Technology (TIB)
10.6084/m9.figshare.5271943.v1, TIB-FIS-DISCOVERY VIVO AT THE GERMAN NATIONAL LIBRARY OF SCIENCE AND TECHNOLOGY (TIB)
10.6084/m9.figshare.5271943.v2, TIB-FIS-Discovery - VIVO at the German National Library of Science and Technology (TIB)
10.6084/m9.figshare.5285743, Lost in translation – challen