{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": true, "pycharm": { "is_executing": true } }, "source": [ "### Query OpenAIRE for the project(s) a publication was produced in\n", "This notebook queries the [OpenAIRE HTTP API](https://graph.openaire.eu/develop/api.html) for the project(s) a publication was produced in. It takes a DOI as input which is used to retrieve the publication's metadata via the API's `/publications` endpoint and checks if there is a `'isProducedBy'` relation to a project. If that is the case, the project's ID is used to query the API via its `/projects` endpoint and the title, call identifier and funded amount of the project are printed." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Prerequisites:\n", "import requests # dependency for making HTTP calls\n", "from benedict import benedict # dependency for dealing with json" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The input for this notebook is a DOI, e.g. '`10.1007/978-3-030-74296-6_19`'." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# input parameter\n", "example_doi=\"10.1007/978-3-030-74296-6_19\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We use it to query the OpenAIRE HTTP API for the specified publication and its metadata. " ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# OpenAIRE endpoint to query for publications\n", "OPENAIRE_API_PUBLICATIONS = \"https://api.openaire.eu/search/publications\"\n", "\n", "# query OpenAIRE for a specific publication\n", "def query_openaire_for_publication(doi):\n", " params = {'doi': doi, 'format': \"json\"}\n", " response = requests.get(url=OPENAIRE_API_PUBLICATIONS,\n", " params=params)\n", " response.raise_for_status()\n", " result=response.json()\n", " return result\n", "\n", "\n", "# ---- example execution\n", "pub_response=query_openaire_for_publication(example_doi)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From the complete response we get from the API, we extract the metadata for the specified publication.\n", "If the metadata contains a reference to a project within the list of relations (`'rels'`), then extract the project's ID." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['corda__h2020::c6af905285a4bcd97a2fdf7cadc3cf3a']\n" ] } ], "source": [ "# extract the metadata about the publication from the response\n", "path_to_result='response.results.result[0].metadata.oaf:entity.oaf:result'\n", "oaf_result=benedict.from_json(pub_response).get(path_to_result, {})\n", "\n", "# extract the metadata about relations\n", "# and check for each rel, if it is pointing to a project\n", "rels=oaf_result.get('rels.rel') or []\n", "is_rel_to_project = lambda rel: rel['to']['@class']==\"isProducedBy\" and rel['to']['@type']==\"project\"\n", "\n", "# unfortunately the json data is inconsistently modeled:\n", "# if there is one rel for a publication, it is a json object\n", "# if there are multiple rels for a publication, they form a json list\n", "if isinstance(rels, list):\n", " project_ids=[rel['to']['$'] for rel in rels if is_rel_to_project(rel)]\n", "else:\n", " project_ids= [rels['to']['$']] if is_rel_to_project(rels) else []\n", "\n", "print(project_ids)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For each project ID, we query the OpenAIRE HTTP API via its `/projects` endpoint for the project's metadata." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# OpenAIRE endpoint to query for projects\n", "OPENAIRE_API_PROJECTS = \"https://api.openaire.eu/search/projects\"\n", "\n", "# query OpenAIRE for a specific project\n", "def query_openaire_for_project(openaire_project_id):\n", " params = {'openaireProjectID': openaire_project_id, 'format': \"json\"}\n", " response = requests.get(url=OPENAIRE_API_PROJECTS,\n", " params=params)\n", " response.raise_for_status()\n", " result=response.json()\n", " return result\n", "\n", "\n", "# ---- example execution\n", "project_responses=[query_openaire_for_project(project_id) for project_id in project_ids]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's extract and print each project's title, code, call identifier and funded amount." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Project data:\n", " code: 819536\n", " title: Knowledge Graph based Representation, Augmentation and Exploration of Scholarly Communication\n", " callidentifier: ERC-2018-COG\n", " fundedamount:1996250.0 EUR\n", "\n" ] } ], "source": [ "def extract_data_from_project(project_response):\n", " path_to_project='response.results.result[0].metadata.oaf:entity.oaf:project'\n", " oaf_project=benedict.from_json(project_response).get(path_to_project, {})\n", " \n", " title=oaf_project.get('title.$')\n", " code=oaf_project.get('code.$')\n", " callidentifier=oaf_project.get('callidentifier.$')\n", " fundedamount=oaf_project.get('fundedamount.$')\n", " currency=oaf_project.get('currency.$')\n", " return title, code, callidentifier, f\"{fundedamount} {currency}\"\n", "\n", "\n", "# ---- example execution\n", "if (not project_responses):\n", " print(\"No projects associated with publication\")\n", "for project in project_responses:\n", " title, code, callidentifier, fundedamount = extract_data_from_project(project)\n", " print(\"Project data:\")\n", " print(f\" code: {code}\\n title: {title}\\n callidentifier: {callidentifier}\\n fundedamount:{fundedamount}\\n\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.6" } }, "nbformat": 4, "nbformat_minor": 1 }