{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "YDW3V56iL0gO" }, "source": [ "### Query the FREYA PID Graph for works authored by a person\n", "\n", "This notebook queries the [FREYA PID Graph](https://blog.datacite.org/powering-the-pid-graph/) via [Datacite's GraphQL API](https://api.datacite.org/graphql) to retrieve works created by a person. It takes an ORCID URL as input which is used to filter for all works registered at Datacite and some registered at Crossref where '`creator.nameIdentifier`' matches the given ORCID URL. From the resulting list of works we output all DOIs." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "id": "CCRX2-tC_ijb", "scrolled": true }, "outputs": [], "source": [ "# Prerequisites:\n", "import requests # dependency to make HTTP calls\n", "from benedict import benedict # dependency for dealing with json" ] }, { "cell_type": "markdown", "metadata": { "id": "9J5pNonyPXzZ" }, "source": [ "The input for this notebook is an ORCID URL, e.g. '`https://orcid.org/0000-0003-2499-7741`'." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "executionInfo": { "elapsed": 10, "status": "ok", "timestamp": 1643038887890, "user": { "displayName": "Sandra M", "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GjjehryRcYlqHNFf_9Q6slGN_VZPE0y5QkvOxzG=s64", "userId": "04602594913862593282" }, "user_tz": -60 }, "id": "HKtYXm9XQiGB" }, "outputs": [], "source": [ "# input parameter\n", "example_orcid=\"https://orcid.org/0000-0003-2499-7741\"" ] }, { "cell_type": "markdown", "metadata": { "id": "s00nilOOQs5m" }, "source": [ "We use it to query Datacite's GraphQL API for the person's metadata and all works connected to them. Since the API uses pagination, we need to loop through all pages to get the complete result set." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "executionInfo": { "elapsed": 7, "status": "ok", "timestamp": 1643038887890, "user": { "displayName": "Sandra M", "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GjjehryRcYlqHNFf_9Q6slGN_VZPE0y5QkvOxzG=s64", "userId": "04602594913862593282" }, "user_tz": -60 }, "id": "V0PWy15LPfZ_" }, "outputs": [], "source": [ "# Datacite's GraphQL endpoint for the FREYA PID Graph\n", "DATACITE_GRAPHQL_API = \"https://api.datacite.org/graphql\"\n", "\n", "# GraphQL query to retrieve a person and all their created works\n", "QUERY_PERSON2WORKS = \"\"\"query person($orcid :ID!, $after:String){\n", " person(id: $orcid) {\n", " works(first:1000, after: $after) {\n", " pageInfo {\n", " endCursor\n", " hasNextPage\n", " }\n", "\n", " nodes {\n", " doi\n", " titles {\n", " title\n", " }\n", " versions {\n", " nodes {\n", " doi\n", " }\n", " }\n", " }\n", " }\n", " }\n", "}\"\"\"\n", "\n", "# query for all works connected to given ORCID\n", "def query_freya_for_person2works(orcid):\n", " continue_paginating = True\n", " cursor=\"\"\n", " \n", " while continue_paginating:\n", " vars = {'orcid': orcid, 'after': cursor}\n", " response = requests.post(url=DATACITE_GRAPHQL_API,\n", " json={'query': QUERY_PERSON2WORKS, 'variables': vars},\n", " headers={'Accept': 'application/json'})\n", " response.raise_for_status()\n", " result=response.json()\n", " if 'errors' in result:\n", " raise requests.exceptions.HTTPError(result)\n", "\n", " # check if next page exists and set cursor to next page\n", " cursor = next_cursor(result)\n", " continue_paginating = has_next_page(result)\n", " yield result\n", "\n", "# check if there is another page with results to query\n", "def has_next_page(response_data):\n", " resp_dict = benedict.from_json(response_data)\n", " has_next_page = resp_dict.get(\"data.person.works.pageInfo.hasNextPage\")\n", " return has_next_page\n", "\n", "# set cursor to next value\n", "def next_cursor(response_data):\n", " resp_dict = benedict.from_json(response_data)\n", " cursor = resp_dict.get(\"data.person.works.pageInfo.endCursor\")\n", " return cursor\n", "\n", "\n", "#--- example execution\n", "list_of_pages=query_freya_for_person2works(example_orcid)" ] }, { "cell_type": "markdown", "metadata": { "id": "wmJZvW41Te2e" }, "source": [ "From the returned pages we \n", "* extract the list of works,\n", "* remove the ones that are older versions of another work, which is the case if the metadata field for '`versions.nodes.doi`' contains a DOI for the successing work,\n", "* extract and print out the title and DOI of each work.\n", "\n", "*Note: \n", "While we are able to filter some versions of a work if they are linked via the metadata field '`versions.nodes.doi`', others would need advanced filters (for example based on name similarity) which is out of scope for our project.*" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "executionInfo": { "elapsed": 8, "status": "ok", "timestamp": 1643038887891, "user": { "displayName": "Sandra M", "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GjjehryRcYlqHNFf_9Q6slGN_VZPE0y5QkvOxzG=s64", "userId": "04602594913862593282" }, "user_tz": -60 }, "id": "oNoGVZpbt5cP" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Complete number of works: 105\n", "Filtered number of works: 91\n", "10.6084/m9.figshare.647329, Members of Deutscher Bibliotheksverband e. V. (dbv)\n", "10.2314/coscv1.1, Literatur recherchieren und verwalten\n", "10.2314/coscv1.2, Organisieren\n", "10.2314/coscv1.3, Daten sammeln und verarbeiten\n", "10.2314/coscv1, CoScience - Gemeinsam forschen und publizieren mit dem Netz\n", "10.2314/coscv2, CoScience - Gemeinsam forschen und publizieren mit dem Netz\n", "10.2314/coscv2.2, Oganisieren\n", "10.2314/coscv2.3, Daten sammeln und verarbeiten\n", "10.6084/m9.figshare.647329.v1, Members of Deutscher Bibliotheksverband e. V. (dbv)\n", "10.6084/m9.figshare.5271943, TIB-FIS-Discovery - VIVO at the German National Library of Science and Technology (TIB)\n", "10.6084/m9.figshare.5271943.v1, TIB-FIS-DISCOVERY VIVO AT THE GERMAN NATIONAL LIBRARY OF SCIENCE AND TECHNOLOGY (TIB)\n", "10.6084/m9.figshare.5271943.v2, TIB-FIS-Discovery - VIVO at the German National Library of Science and Technology (TIB)\n", "10.6084/m9.figshare.5285743, Lost in translation – challenges of tailoring VIVO to the needs of the German scholarly landscape\n", "10.6084/m9.figshare.5285743.v1, Lost in translation – challenges of tailoring VIVO to the needs of the German scholarly landscape\n", "10.6084/m9.figshare.6465149, User perceptions, feedback, and stories.pdf\n", "10.6084/m9.figshare.6465149.v1, User perceptions, feedback, and stories.pdf\n", "10.5281/zenodo.1287885, Supporting A Vivo Regional Community\n", "10.6084/m9.figshare.6819971, Information Security Challenges in VIVO - Adapting the BSI IT Security Catalog Standards\n", "10.6084/m9.figshare.6819971.v1, VIVO_Conference_2018_BSI-IT-Security.pdf\n", "10.6084/m9.figshare.6820217, Challenges and opportunities of using VIVO as a reporting tool\n", "10.6084/m9.figshare.6820217.v1, Challenges and opportunities of using VIVO as a reporting tool\n", "10.6084/m9.figshare.6819971.v2, Information Security Challenges in VIVO - Adapting the BSI IT Security Catalog Standards\n", "10.6084/m9.figshare.6819722.v2, VIVO-DE: Collaborative ontology editing & management with VoCol\n", "10.5281/zenodo.1445521, Vivo 2018 - Ein Rückblick\n", "10.5281/zenodo.1464108, Vivo - Eine Einführung\n", "10.5281/zenodo.1464897, Reporting / Kdsf Und Vivo\n", "10.5281/zenodo.1478571, Forschungsevaluation Und Visualisierung Von Zitationsnetzwerken\n", "10.6084/m9.figshare.6819722, VIVO-DE: Collaborative ontology editing & management with VoCol\n", "10.6084/m9.figshare.6819722.v1, VIVO-DE: Collaborative ontology editing & management with VoCol\n", "10.5281/zenodo.1689951, Referenzimplementierung für offene szientometrische Indikatoren (ROSI)\n", "10.5281/zenodo.2613539, ROSI – Reference Implementation for Open Scientometric Indicators\n", "10.5281/zenodo.2615675, The ROSI Project\n", "10.5281/zenodo.2639714, VIVO software 1.10.0 release\n", "10.5281/zenodo.2639713, VIVO software 1.10.0 release\n", "10.5281/zenodo.3242680, ROSI – Open Metrics for Open Repositories\n", "10.5281/zenodo.3243485, Registry of [Open] Scientometric Data Sources – Technische Evaluierung von Offenen Datenquellen\n", "10.6084/m9.figshare.9756770, Identifying Ontological Domains Related to VIVO\n", "10.6084/m9.figshare.9756770.v1, Identifying Ontological Domains Related to VIVO\n", "10.6084/m9.figshare.9771701, VIVO Ontology Version 2\n", "10.6084/m9.figshare.9771701.v1, VIVO Ontology Version 2\n", "10.5281/zenodo.3407681, Visualising open scientometric data in VIVO\n", "10.5281/zenodo.3407680, Visualising open scientometric data in VIVO\n", "10.6084/m9.figshare.9897008, A Library of Queries and Reports: Introducing the Vitro Query Tool\n", "10.6084/m9.figshare.9897008.v1, A Library of Queries and Reports: Introducing the Vitro Query Tool\n", "10.5281/zenodo.3518713, confIDent - for FAIR conference metadata\n", "10.5281/zenodo.3518714, confIDent - for FAIR conference metadata\n", "10.5281/zenodo.3564456, Trends und Entwicklungen rund um VIVO\n", "10.5281/zenodo.3564457, Trends und Entwicklungen rund um VIVO\n", "10.21105/joss.01182, VIVO: a system for research discovery\n", "10.31263/voebm.v72i2.2808, Open Science und die Bibliothek – Aktionsfelder und Berufsbild\n", "10.3897/rio.4.e31656, Reference implementation for open scientometric indicators (ROSI)\n", "10.6084/m9.figshare.11961699, Ontological Domains for Representing Scholarship\n", "10.6084/m9.figshare.11961699.v1, Ontological Domains for Representing Scholarship\n", "10.5281/zenodo.3862804, Bibliometrische Visualisierungen auf dem Prüfstein – Versuch einer bibliothekarischen Perspektive\n", "10.5281/zenodo.3896517, AEON - Towards an Academic Events Ontology\n", "10.5281/zenodo.3900193, Research Profile Ownership through User Studies: A Case Study in the German National Research System\n", "10.25815/cex2-cc69, Virtual Conferences Require Dedicated Time, Too\n", "10.6084/m9.figshare.13645577.v1, VIVO Ontology Development: Why, What, and How\n", "10.6084/m9.figshare.13645577, VIVO Ontology Development: Why, What, and How\n", "10.25968/opus-837, Roving Librarians in der Zentralbibliothek der Hochschule Hannover: ein Experiment\n", "10.25968/opus-6, Libworld. Biblioblogs global\n", "10.25968/opus-1, Teaching Information Literacy with the Lerninformationssystem\n", "10.25968/opus-2, Personalisiertes Lernen in der Bibliothek: das Düsseldorfer Online-Tutorial (DOT) Informationskompetenz\n", "10.25968/opus-198, LibWorld - library blogging worldwide\n", "10.25968/opus-521, CoScience : gemeinsam forschen und publizieren mit dem Netz\n", "10.25968/opus-102, 13 Dinge\n", "10.25968/opus-330, Die Online-Auskunft der Universitäts- und Landesbibliothek Düsseldorf : erste Evaluation eines Erfolgsmodells\n", "10.25968/opus-272, Working Paper: Open Government Data\n", "10.25968/opus-303, Lernen 2.0 : Bericht aus der Praxis\n", "10.25968/opus-453, Empfehlungen zur Öffnung bibliothekarischer Daten\n", "10.25798/qsdh-en13, 12th Annual / International VIVO Conference 2021\n", "10.6084/m9.figshare.14842221, Geospatial information in VIVO -- Thoughts, Ideas, Suggestions\n", "10.6084/m9.figshare.14842221.v1, Geospatial information in VIVO -- Thoughts, Ideas, Suggestions\n", "10.5281/zenodo.5031664, Der Forschungsatlas - a community-oriented research profile system in the making\n", "10.5281/zenodo.5031663, Der Forschungsatlas - a community-oriented research profile system in the making\n", "10.5281/zenodo.5040057, Towards FAIR research information - insights from expert workshops\n", "10.5281/zenodo.5040056, Towards FAIR research information - insights from expert workshops\n", "10.5281/zenodo.5060204, Forschungsevaluierung FAIR(ER) gestalten\n", "10.5281/zenodo.5060203, Forschungsevaluierung FAIR(ER) gestalten\n", "10.5281/zenodo.5075098, Sharing Queries and Reports with the Reporting Marketplace\n", "10.5281/zenodo.5075108, Creating a Semantic Catalogue of Architectural Drawings\n", "10.5281/zenodo.5075107, Creating a Semantic Catalogue of Architectural Drawings\n", "10.5281/zenodo.5082237, Priorities of a VIVO Community - results of a survey at the 5. VIVO-Workshop 2021\n", "10.5281/zenodo.5082236, Priorities of a VIVO Community - results of a survey at the 5. VIVO-Workshop 2021\n", "10.5281/zenodo.5082370, Geospatial information in VIVO - thoughts, ideas, suggestions\n", "10.5281/zenodo.5082369, Geospatial information in VIVO - thoughts, ideas, suggestions\n", "10.5281/zenodo.5526786, Das Projekt OPTIMETA – Stärkung des Open-Access-Publikationssystems durch offene Zitationen und raumzeitliche Metadaten\n", "10.5281/zenodo.5526785, Das Projekt OPTIMETA – Stärkung des Open-Access-Publikationssystems durch offene Zitationen und raumzeitliche Metadaten\n", "10.5281/zenodo.5767060, How even small and independent journals can contribute to the citation commons\n", "10.5281/zenodo.5767059, How even small and independent journals can contribute to the citation commons\n", "10.25968/opus-2135, Are Conference Posters Being Cited?\n" ] } ], "source": [ "# from the result pages we get from the GraphQL API, extract the data about the works\n", "def extract_works_from_page(page):\n", " page_dict=benedict.from_json(page)\n", " return [work for work in page_dict.get('data.person.works.nodes') or []]\n", "\n", "# remove old versions from the list of works\n", "def filter_older_versions(works):\n", " return [work for work in works if not benedict.from_json(work).get('versions.nodes[0].doi')]\n", "\n", "# extract DOI from work\n", "def extract_doi(work):\n", " work_dict = benedict.from_json(work)\n", " doi = work_dict.get('doi')\n", " title = work_dict.get('titles[0].title')\n", " return doi, title\n", "\n", "\n", "#--- example execution\n", "for page in list_of_pages or []:\n", " works=extract_works_from_page(page)\n", " print(f\"Complete number of works: {len(works)}\")\n", " works_filtered=filter_older_versions(works)\n", " print(f\"Filtered number of works: {len(works_filtered)}\")\n", " for work in works_filtered or []:\n", " doi, title = extract_doi(work)\n", " print(f\"{doi}, {title}\")" ] } ], "metadata": { "colab": { "authorship_tag": "ABX9TyPYZxjYgTxRPJcOH8TS6HMv", "name": "freya_get_works_by_person.ipynb", "provenance": [ { "file_id": "1msooXyRI-0s9kVYKOQqYXoUbPYoW3f4U", "timestamp": 1643040022714 } ] }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.6" } }, "nbformat": 4, "nbformat_minor": 1 }