{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Query OpenAIRE for the project(s) a publication was produced in\n",
    "This notebook queries the [OpenAIRE HTTP API](https://graph.openaire.eu/develop/api.html) for the project(s) a publication was produced in. It takes a DOI as input, retrieves the publication's metadata via the API's `/publications` endpoint, and checks whether the metadata contains an `isProducedBy` relation to a project. If it does, the project's ID is used to query the API's `/projects` endpoint, and the project's title, code, call identifier and funded amount are printed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Prerequisites (install with `pip install requests python-benedict`):\n",
    "import requests                    # HTTP client for calling the OpenAIRE API\n",
    "from benedict import benedict      # dict subclass with keypath access to nested JSON"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The input for this notebook is a DOI, e.g. `10.1007/978-3-030-74296-6_19`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# input parameter\n",
    "example_doi = \"10.1007/978-3-030-74296-6_19\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We pass the DOI to the `/publications` endpoint of the OpenAIRE HTTP API to retrieve the publication's metadata."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# OpenAIRE endpoint to query for publications\n",
    "OPENAIRE_API_PUBLICATIONS = \"https://api.openaire.eu/search/publications\"\n",
    "\n",
    "\n",
    "def query_openaire_for_publication(doi):\n",
    "    \"\"\"Query OpenAIRE for the publication with the given DOI and return the parsed JSON response.\"\"\"\n",
    "    params = {'doi': doi, 'format': 'json'}\n",
    "    # the timeout keeps the notebook from hanging if the API does not respond\n",
    "    response = requests.get(url=OPENAIRE_API_PUBLICATIONS,\n",
    "                            params=params,\n",
    "                            timeout=30)\n",
    "    response.raise_for_status()  # fail early on HTTP error codes\n",
    "    return response.json()\n",
    "\n",
    "\n",
    "# ---- example execution\n",
    "pub_response = query_openaire_for_publication(example_doi)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "From the complete API response, we extract the metadata for the specified publication.\n",
    "If the metadata's list of relations (`rels`) contains a reference to a project, we extract the project's ID."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['corda__h2020::c6af905285a4bcd97a2fdf7cadc3cf3a']\n"
     ]
    }
   ],
   "source": [
    "# extract the metadata about the publication from the response\n",
    "path_to_result = 'response.results.result[0].metadata.oaf:entity.oaf:result'\n",
    "oaf_result = benedict.from_json(pub_response).get(path_to_result, {})\n",
    "\n",
    "# extract the metadata about relations\n",
    "rels = oaf_result.get('rels.rel') or []\n",
    "\n",
    "# unfortunately the json data is inconsistently modeled:\n",
    "# a single rel is a json object, multiple rels form a json list,\n",
    "# so we wrap a single rel in a list to handle both cases uniformly\n",
    "if not isinstance(rels, list):\n",
    "    rels = [rels]\n",
    "\n",
    "\n",
    "def is_rel_to_project(rel):\n",
    "    \"\"\"Check whether a rel is an 'isProducedBy' relation pointing to a project.\"\"\"\n",
    "    return rel['to']['@class'] == 'isProducedBy' and rel['to']['@type'] == 'project'\n",
    "\n",
    "\n",
    "# collect the IDs of all projects the publication was produced in\n",
    "project_ids = [rel['to']['$'] for rel in rels if is_rel_to_project(rel)]\n",
    "\n",
    "print(project_ids)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For each project ID, we query the OpenAIRE HTTP API via its `/projects` endpoint for the project's metadata."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# OpenAIRE endpoint to query for projects\n",
    "OPENAIRE_API_PROJECTS = \"https://api.openaire.eu/search/projects\"\n",
    "\n",
    "\n",
    "def query_openaire_for_project(openaire_project_id):\n",
    "    \"\"\"Query OpenAIRE for the project with the given OpenAIRE ID and return the parsed JSON response.\"\"\"\n",
    "    params = {'openaireProjectID': openaire_project_id, 'format': 'json'}\n",
    "    # the timeout keeps the notebook from hanging if the API does not respond\n",
    "    response = requests.get(url=OPENAIRE_API_PROJECTS,\n",
    "                            params=params,\n",
    "                            timeout=30)\n",
    "    response.raise_for_status()  # fail early on HTTP error codes\n",
    "    return response.json()\n",
    "\n",
    "\n",
    "# ---- example execution\n",
    "project_responses = [query_openaire_for_project(project_id) for project_id in project_ids]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's extract and print each project's title, code, call identifier and funded amount."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Project data:\n",
      " code: 819536\n",
      " title: Knowledge Graph based Representation, Augmentation and Exploration of Scholarly Communication\n",
      " callidentifier: ERC-2018-COG\n",
      " fundedamount: 1996250.0 EUR\n",
      "\n"
     ]
    }
   ],
   "source": [
    "def extract_data_from_project(project_response):\n",
    "    \"\"\"Return title, code, call identifier and funded amount (with currency) of a project.\"\"\"\n",
    "    path_to_project = 'response.results.result[0].metadata.oaf:entity.oaf:project'\n",
    "    oaf_project = benedict.from_json(project_response).get(path_to_project, {})\n",
    "\n",
    "    title = oaf_project.get('title.$')\n",
    "    code = oaf_project.get('code.$')\n",
    "    callidentifier = oaf_project.get('callidentifier.$')\n",
    "    fundedamount = oaf_project.get('fundedamount.$')\n",
    "    currency = oaf_project.get('currency.$')\n",
    "    return title, code, callidentifier, f\"{fundedamount} {currency}\"\n",
    "\n",
    "\n",
    "# ---- example execution\n",
    "if not project_responses:\n",
    "    print(\"No projects associated with publication\")\n",
    "for project in project_responses:\n",
    "    title, code, callidentifier, fundedamount = extract_data_from_project(project)\n",
    "    print(\"Project data:\")\n",
    "    print(f\" code: {code}\\n title: {title}\\n callidentifier: {callidentifier}\\n fundedamount: {fundedamount}\\n\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}