{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "etxiXTW668ZD"
   },
   "source": [
    "### Query the FREYA PID Graph for people affiliated with an organization\n",
    "\n",
    "This notebook queries the [FREYA PID Graph](https://blog.datacite.org/powering-the-pid-graph/) via [Datacite's GraphQL API](https://api.datacite.org/graphql) to retrieve people affiliated with an organization. It takes a ROR URL as input which is used internally to retrieve the Grid ID from the ROR API and Ringgold ID from Wikidata and use these identifiers to [find ORCID record holders at the institution](https://info.orcid.org/faq/how-do-i-find-orcid-record-holders-at-my-institution/). From the resulting list of people we output the ORCID iDs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "id": "8Mk7-aYc7x3A",
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Prerequisites:\n",
    "import requests                        # dependency to make HTTP calls\n",
    "from benedict import benedict          # dependency for dealing with json"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "J31_ejB6bWqd"
   },
   "source": [
    "The input for this notebook is a ROR URL, e.g. '`https://ror.org/021k10z87`'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "executionInfo": {
     "elapsed": 15,
     "status": "ok",
     "timestamp": 1643208788232,
     "user": {
      "displayName": "",
      "photoUrl": "",
      "userId": ""
     },
     "user_tz": -60
    },
    "id": "UwYUsbnMbZnI"
   },
   "outputs": [],
   "source": [
    "# input parameter\n",
    "example_ror=\"https://ror.org/021k10z87\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ba_A3Anpbl4P"
   },
   "source": [
    "We use it to query Datacite's GraphQL API for the organization's metadata and all people connected to it.\n",
    "Since the API uses pagination, we need to loop through all pages to get the complete result set.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "executionInfo": {
     "elapsed": 226,
     "status": "ok",
     "timestamp": 1643208819281,
     "user": {
      "displayName": "",
      "photoUrl": "",
      "userId": ""
     },
     "user_tz": -60
    },
    "id": "7FAu2l388OeD"
   },
   "outputs": [],
   "source": [
    "# Datacite's GraphQL endpoint for the FREYA PID Graph\n",
    "DATACITE_GRAPHQL_API = \"https://api.datacite.org/graphql\"\n",
    "\n",
    "# Query to retrieve an organization and all its affiliated people\n",
    "QUERY_ORGA2PEOPLE = \"\"\"query organization($ror :ID!, $after:String){\n",
    "organization(id: $ror) {\n",
    "    people(first: 1000, after: $after) {\n",
    "      totalCount\n",
    "      pageInfo {\n",
    "        endCursor\n",
    "        hasNextPage\n",
    "        }\n",
    "\n",
    "      nodes {\n",
    "        id\n",
    "        name\n",
    "      }\n",
    "    }\n",
    "  }\n",
    "}\"\"\"\n",
    "\n",
    "# query the freya pid graph for all people connected to given ROR\n",
    "def query_freya_for_orga2people(ror):\n",
    "    continue_paginating = True\n",
    "    cursor=\"\"\n",
    "    \n",
    "    while continue_paginating:\n",
    "        vars = {'ror': ror, 'after': cursor}\n",
    "        response = requests.post(url=DATACITE_GRAPHQL_API,\n",
    "                                 json={'query': QUERY_ORGA2PEOPLE, 'variables': vars},\n",
    "                                 headers={'Accept': 'application/json'})\n",
    "        response.raise_for_status()\n",
    "        result=response.json()\n",
    "        if 'errors' in result:\n",
    "            raise requests.exceptions.HTTPError(result)\n",
    "\n",
    "        # check if next page exists and set cursor to next page\n",
    "        continue_paginating = has_next_page(result)\n",
    "        cursor = next_cursor(result)\n",
    "        yield result\n",
    "\n",
    "# check if there is another page with results to query\n",
    "def has_next_page(response_data):\n",
    "    resp_dict = benedict.from_json(response_data)\n",
    "    has_next_page = resp_dict.get(\"data.organization.people.pageInfo.hasNextPage\")\n",
    "    return has_next_page\n",
    "\n",
    "# set cursor to next value\n",
    "def next_cursor(response_data):\n",
    "    resp_dict = benedict.from_json(response_data)\n",
    "    cursor = resp_dict.get(\"data.organization.people.pageInfo.endCursor\")\n",
    "    return cursor\n",
    "\n",
    "\n",
    "#--- example execution\n",
    "list_of_pages=query_freya_for_orga2people(example_ror)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "2lR-J8vUcI5-"
   },
   "source": [
    "From the returned pages we extract the list of people and for each person we extract and print out their name and ORCID iD."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "executionInfo": {
     "elapsed": 261,
     "status": "ok",
     "timestamp": 1643208827139,
     "user": {
      "displayName": "",
      "photoUrl": "",
      "userId": ""
     },
     "user_tz": -60
    },
    "id": "lQqnqydz2hUh"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0000-0002-3783-6130, Irene Weipert-Fenner\n",
      "0000-0002-5452-0488, Hans-Joachim Spanger\n",
      "0000-0002-4621-9687, Simone Schnabel\n",
      "0000-0001-6731-5304, Julia Eckert\n",
      "0000-0001-6746-1248, Anton Peez\n",
      "0000-0003-1575-9688, Hendrik Simon\n",
      "0000-0002-1712-2624, Julian Junk\n",
      "0000-0003-0035-5840, Raphael Oidtmann\n",
      "0000-0002-5925-043X, Ariadne Natal\n",
      "0000-0002-7012-6739, Peter Kreuzer\n",
      "0000-0001-7843-4480, Dirk Peters\n",
      "0000-0001-6823-6819, Janna Lisa Chalmovsky\n",
      "0000-0003-1940-8877, Mikhail Polianskii\n",
      "0000-0001-7302-444X, Katja Freistein\n",
      "0000-0002-8739-2486, Elvira Rosert\n",
      "0000-0001-7286-3575, Paul Chambers\n",
      "0000-0003-0039-9827, Eldad Ben Aharon\n",
      "0000-0002-4259-6071, Felix S. Bethke\n"
     ]
    }
   ],
   "source": [
    "# from the result pages we get from the GraphQL API, extract the data about the people\n",
    "def extract_people_from_page(page):\n",
    "    page_dict=benedict.from_json(page)\n",
    "    return [person for person in page_dict.get('data.organization.people.nodes') or []]\n",
    "\n",
    "# extract ORCID from person\n",
    "def extract_orcid(person):\n",
    "    person_dict = benedict.from_json(person)\n",
    "    orcid = person_dict.get('id').replace(\"https://orcid.org/\", \"\")\n",
    "    name = person_dict.get('name')\n",
    "    return orcid, name\n",
    "\n",
    "\n",
    "#--- example execution\n",
    "for page in list_of_pages or []:\n",
    "    people=extract_people_from_page(page)\n",
    "    for person in people or []:\n",
    "        orcid, name = extract_orcid(person)\n",
    "        print(f\"{orcid}, {name}\")"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "authorship_tag": "ABX9TyOPyixqZithrfY0TncA4o1K",
   "name": "freya_get_people_by_organization.ipynb",
   "provenance": [
    {
     "file_id": "https://github.com/Project-TAPIR/pidgraph-notebooks/blob/organization-people/organization-people/freya_get_people_by_organization.ipynb",
     "timestamp": 1643208926409
    }
   ]
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}