{ "metadata": { "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5-final" }, "orig_nbformat": 2, "kernelspec": { "name": "python_defaultSpec_1599468226162", "display_name": "Python 3.8.5 64-bit" } }, "nbformat": 4, "nbformat_minor": 2, "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Machine Actionable Data Management Plan connections\n", "\n", "Data management plans (DMPs) are documents accompanying research proposals and project outputs. DMPs are created as free-form text and describe the data and tools employed in scientific investigations. They are often seen as an administrative exercise rather than an integral part of research practice. Machine Actionable DMPs (maDMPs) take this concept further by expressing the plan in a structured form that software can read and act on.\n", "\n", "This notebook displays, in a human-friendly way, the connections embedded in a maDMP. By the end of this notebook, you will be able to succinctly display the essential components of the maDMP vision using persistent identifiers (PIDs): Open Researcher and Contributor IDs (ORCIDs), funder IDs, organization IDs and dataset IDs (DOIs)." 
] }, { "cell_type": "code", "execution_count": 83, "metadata": {}, "outputs": [], "source": [ "import json\n", "import pandas as pd\n", "import numpy as np\n", "from dfply import *\n", "import altair.vega.v5 as alt\n" ] }, { "cell_type": "code", "execution_count": 84, "metadata": {}, "outputs": [], "source": [ "# Prepare the GraphQL client\n", "import requests\n", "from IPython.display import display, Markdown\n", "from gql import gql, Client\n", "from gql.transport.requests import RequestsHTTPTransport\n", "\n", "_transport = RequestsHTTPTransport(\n", "    url='https://api.datacite.org/graphql',\n", "    use_json=True,\n", ")\n", "\n", "client = Client(\n", "    transport=_transport,\n", "    fetch_schema_from_transport=True,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fetching Data\n", "\n", "We obtain all the data from the DataCite GraphQL API.\n" ] }, { "cell_type": "code", "execution_count": 85, "metadata": {}, "outputs": [], "source": [ "# Build the GraphQL query that retrieves a single data management plan, identified by its DOI,\n", "# together with its referenced datasets, hosting organisations, funders and creators.\n", "query_params = {\n", "    \"id\": \"https://doi.org/10.5281/zenodo.3372460\"\n", "}\n", "\n", "query = gql(\"\"\"query getOutputs($id: ID!)\n", "{\n", "  dataManagementPlan(id: $id) {\n", "    id\n", "    titles {\n", "      title\n", "    }\n", "    datasets: references(resourceTypeId:\"dataset\") {\n", "      totalCount\n", "      nodes {\n", "        id: doi\n", "        name: titles {\n", "          title\n", "        }\n", "      }\n", "    }\n", "    organisations: contributors(contributorType: \"HostingInstitution\") {\n", "      id\n", "      name\n", "      affiliation{\n", "        id\n", "      }\n", "    }\n", "    fundingReferences {\n", "      id: funderIdentifier\n", "      funderIdentifierType\n", "      name: funderName\n", "    }\n", "    people: creators {\n", "      id\n", "      name\n", "      affiliation{\n", "        id\n", "      }\n", "    }\n", "  }\n", "}\n", "\"\"\")" ] }, { "cell_type": "code", "execution_count": 86, "metadata": {}, "outputs": [], "source": [ "def get_data():\n", "    # gql expects the query variables as a dict, not as a JSON-encoded string\n", "    return client.execute(query, variable_values=query_params)[\"dataManagementPlan\"]\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data Transformation\n", "\n", "Simple transformations convert the GraphQL response into an array that Vega can take." ] }, { "cell_type": "code", "execution_count": 87, "metadata": {}, "outputs": [], "source": [ "def add_node_attributes(dataframe, parent=2):\n", "    \"\"\"Adds to each item the attributes needed for the node visualisation\n", "\n", "    Parameters:\n", "    dataframe (dataframe): A dataframe with all the items\n", "    parent (int): The id of the parent node\n", "\n", "    Returns:\n", "    dataframe: The same dataframe with the new attributes\n", "\n", "    \"\"\"\n", "    if dataframe is None:\n", "        return pd.DataFrame()\n", "    else:\n", "        print(dataframe)\n", "        return (dataframe >>\n", "            mutate(\n", "                id = X.id,\n", "                parent = parent,\n", "            ))\n" ] }, { "cell_type": "code", "execution_count": 88, "metadata": {}, "outputs": [], "source": [ "def create_node(array=None, parent=2):\n", "    \"\"\"Creates the nodes for the chart and formats them\n", "\n", "    Parameters:\n", "    array (array): An array with all the items\n", "    parent (int): The id of the parent node\n", "\n", "    Returns:\n", "    list: List of dicts, one per node\n", "\n", "    \"\"\"\n", "    print(array)\n", "    # A missing or empty response yields no nodes\n", "    if not array:\n", "        return []\n", "    else:\n", "        print(array[0].keys())\n", "        print(len(array))\n", "        df = add_node_attributes(pd.DataFrame(array,columns=array[0].keys()), parent)\n", "        return df.to_dict(orient='records')\n" ] }, { "cell_type": "code", "execution_count": 89, "metadata": {}, "outputs": [], "source": [ "def merge_nodes(dataset=[],funders=[],orgs=[],people=[]):\n", "    \"\"\"Merges all the node lists\n", "\n", "    Parameters:\n", "    dataset (array): dataset nodes\n", "    funders (array): funders nodes\n", "    orgs (array): orgs nodes\n", "    people (array): people nodes\n", "\n", "    Returns:\n", "    array: Array with all the nodes\n", "\n", "    \"\"\"\n", "    # Normalise any empty input to an empty list so the lists can be concatenated\n", "    dataset = [] if len(dataset) == 0 else dataset\n", "    funders = [] if len(funders) == 0 else funders\n", "    orgs = [] if len(orgs) == 0 else orgs\n", "    people = [] if len(people) == 0 else people\n", "\n", "    # Fixed scaffold: the DMP root (id 1) and one category node per kind of connection\n", "    dmp = {\"id\":1, \"name\": \"dmp\"}\n", "    datasets_node = {\"id\":2, \"name\": \"Datasets\", \"parent\":1}\n", "    funders_node = {\"id\":3, \"name\": \"Funders\", \"parent\":1}\n", "    organisations_node = {\"id\":4, \"name\": \"Organisations\", \"parent\":1}\n", "    people_node = {\"id\":5, \"name\": \"People\", \"parent\":1}\n", "    nodes_list = [dmp, datasets_node, funders_node,organisations_node,people_node] + dataset + funders + orgs + people\n", "    return nodes_list" ] }, { "cell_type": "code", "execution_count": 90, "metadata": { "tags": [] }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": "[]\n[]\n[{'id': 'https://orcid.org/0000-0003-0578-6033', 'name': 'Aksetøy, Laila Økdal', 'affiliation': [{'id': None}]}]\ndict_keys(['id', 'name', 'affiliation'])\n1\n id name affiliation\n0 https://orcid.org/0000-0003-0578-6033 Aksetøy, Laila Økdal [{'id': None}]\n[{'id': 'https://doi.org/10.13039/501100000780', 'funderIdentifierType': 'Crossref Funder ID', 'name': 'European Commission'}]\ndict_keys(['id', 'funderIdentifierType', 'name'])\n1\n id funderIdentifierType \\\n0 https://doi.org/10.13039/501100000780 Crossref Funder ID \n\n name \n0 European Commission \n" } ], "source": [ "data = get_data()\n", "datasets = create_node(data[\"datasets\"][\"nodes\"],2)\n", "orgs = create_node(data[\"organisations\"],4)\n", "people = create_node(data[\"people\"],5)\n", "funders = create_node(data[\"fundingReferences\"],3)\n", "\n", "nodes = merge_nodes(datasets, funders, orgs, people)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Visualisation\n", "\n", "\n" ] }, { "cell_type": 
"code", "execution_count": 91, "metadata": {}, "outputs": [], "source": [ "def vega_template(data):\n", " \"\"\"Injects data into the vega specification\n", "\n", " Parameters:\n", " data (array): Array of nodes\n", "\n", " Returns:\n", " VegaSpec:Specification with data\n", "\n", " \"\"\"\n", " return \"\"\"\n", "{\n", " \"$schema\": \"https://vega.github.io/schema/vega/v3.json\",\n", " \"description\": \"An example of a radial layout for a node-link diagram of hierarchical data.\",\n", " \"width\": 720,\n", " \"height\": 720,\n", " \"padding\": 5,\n", " \"autosize\": \"none\",\n", " \"signals\": [\n", " {\"name\": \"labels\", \"value\": true, \"bind\": {\"input\": \"checkbox\"}},\n", " {\n", " \"name\": \"radius\",\n", " \"value\": 280,\n", " \"bind\": {\"input\": \"range\", \"min\": 20, \"max\": 600}\n", " },\n", " {\n", " \"name\": \"extent\",\n", " \"value\": 360,\n", " \"bind\": {\"input\": \"range\", \"min\": 0, \"max\": 360, \"step\": 1}\n", " },\n", " {\n", " \"name\": \"rotate\",\n", " \"value\": 0,\n", " \"bind\": {\"input\": \"range\", \"min\": 0, \"max\": 360, \"step\": 1}\n", " },\n", " {\n", " \"name\": \"layout\",\n", " \"value\": \"tidy\",\n", " \"bind\": {\"input\": \"radio\", \"options\": [\"tidy\", \"cluster\"]}\n", " },\n", " {\n", " \"name\": \"links\",\n", " \"value\": \"diagonal\",\n", " \"bind\": {\n", " \"input\": \"select\",\n", " \"options\": [\"line\", \"curve\", \"diagonal\", \"orthogonal\"]\n", " }\n", " },\n", " {\"name\": \"originX\", \"update\": \"width / 2\"},\n", " {\"name\": \"originY\", \"update\": \"height / 2\"}\n", " ],\n", " \"data\": [\n", " {\n", " \"name\": \"tree\",\n", " \"values\": \"\"\" + data + \"\"\",\n", " \"transform\": [\n", " {\"type\": \"stratify\", \"key\": \"id\", \"parentKey\": \"parent\"},\n", " {\n", " \"type\": \"tree\",\n", " \"method\": {\"signal\": \"layout\"},\n", " \"size\": [1, {\"signal\": \"radius\"}],\n", " \"as\": [\"alpha\", \"radius\", \"depth\", \"children\"]\n", " },\n", " {\n", " 
\"type\": \"formula\",\n", " \"expr\": \"(rotate + extent * datum.alpha + 270) % 360\",\n", " \"as\": \"angle\"\n", " },\n", " {\"type\": \"formula\", \"expr\": \"PI * datum.angle / 180\", \"as\": \"radians\"},\n", " {\n", " \"type\": \"formula\",\n", " \"expr\": \"inrange(datum.angle, [90, 270])\",\n", " \"as\": \"leftside\"\n", " },\n", " {\n", " \"type\": \"formula\",\n", " \"expr\": \"originX + datum.radius * cos(datum.radians)\",\n", " \"as\": \"x\"\n", " },\n", " {\n", " \"type\": \"formula\",\n", " \"expr\": \"originY + datum.radius * sin(datum.radians)\",\n", " \"as\": \"y\"\n", " }\n", " ]\n", " },\n", " {\n", " \"name\": \"links\",\n", " \"source\": \"tree\",\n", " \"transform\": [\n", " {\"type\": \"treelinks\"},\n", " {\n", " \"type\": \"linkpath\",\n", " \"shape\": {\"signal\": \"links\"},\n", " \"orient\": \"radial\",\n", " \"sourceX\": \"source.radians\",\n", " \"sourceY\": \"source.radius\",\n", " \"targetX\": \"target.radians\",\n", " \"targetY\": \"target.radius\"\n", " }\n", " ]\n", " }\n", " ],\n", " \"scales\": [\n", " {\n", " \"name\": \"color\",\n", " \"type\": \"linear\",\n", " \"range\": {\"scheme\": \"viridis\"},\n", " \"domain\": {\"data\": \"tree\", \"field\": \"depth\"},\n", " \"zero\": true\n", " }\n", " ],\n", " \"marks\": [\n", " {\n", " \"type\": \"path\",\n", " \"from\": {\"data\": \"links\"},\n", " \"encode\": {\n", " \"update\": {\n", " \"x\": {\"signal\": \"originX\"},\n", " \"y\": {\"signal\": \"originY\"},\n", " \"path\": {\"field\": \"path\"},\n", " \"stroke\": {\"value\": \"#ccc\"}\n", " }\n", " }\n", " },\n", " {\n", " \"type\": \"symbol\",\n", " \"from\": {\"data\": \"tree\"},\n", " \"encode\": {\n", " \"enter\": {\"size\": {\"value\": 300}, \"stroke\": {\"value\": \"#fff\"}},\n", " \"update\": {\n", " \"x\": {\"field\": \"x\"},\n", " \"y\": {\"field\": \"y\"},\n", " \"fill\": {\"scale\": \"color\", \"field\": \"depth\"}\n", " }\n", " }\n", " },\n", " {\n", " \"type\": \"text\",\n", " \"from\": {\"data\": \"tree\"},\n", " 
\"encode\": {\n", " \"enter\": {\n", " \"text\": {\"field\": \"name\"},\n", " \"fontSize\": {\"value\": 12},\n", " \"baseline\": {\"value\": \"middle\"}\n", " },\n", " \"update\": {\n", " \"x\": {\"field\": \"x\"},\n", " \"y\": {\"field\": \"y\"},\n", " \"dx\": {\"signal\": \"(datum.leftside ? -1 : 1) * 12\"},\n", " \"align\": {\"signal\": \"datum.leftside ? 'right' : 'left'\"},\n", " \"opacity\": {\"signal\": \"labels ? 1 : 0\"}\n", " }\n", " }\n", " }\n", " ]\n", "}\n", "\n", "\"\"\"" ] }, { "cell_type": "code", "execution_count": 92, "metadata": {}, "outputs": [ { "output_type": "display_data", "data": { "application/vnd.vega.v5+json": { "$schema": "https://vega.github.io/schema/vega/v3.json", "description": "An example of a radial layout for a node-link diagram of hierarchical data.", "width": 720, "height": 720, "padding": 5, "autosize": "none", "signals": [ { "name": "labels", "value": true, "bind": { "input": "checkbox" } }, { "name": "radius", "value": 280, "bind": { "input": "range", "min": 20, "max": 600 } }, { "name": "extent", "value": 360, "bind": { "input": "range", "min": 0, "max": 360, "step": 1 } }, { "name": "rotate", "value": 0, "bind": { "input": "range", "min": 0, "max": 360, "step": 1 } }, { "name": "layout", "value": "tidy", "bind": { "input": "radio", "options": [ "tidy", "cluster" ] } }, { "name": "links", "value": "diagonal", "bind": { "input": "select", "options": [ "line", "curve", "diagonal", "orthogonal" ] } }, { "name": "originX", "update": "width / 2" }, { "name": "originY", "update": "height / 2" } ], "data": [ { "name": "tree", "values": [ { "id": 1, "name": "dmp" }, { "id": 2, "name": "Datasets", "parent": 1 }, { "id": 3, "name": "Funders", "parent": 1 }, { "id": 4, "name": "Organisations", "parent": 1 }, { "id": 5, "name": "People", "parent": 1 }, { "id": "https://doi.org/10.13039/501100000780", "funderIdentifierType": "Crossref Funder ID", "name": "European Commission", "parent": 3 }, { "id": 
"https://orcid.org/0000-0003-0578-6033", "name": "Aksetøy, Laila Økdal", "affiliation": [ { "id": null } ], "parent": 5 } ], "transform": [ { "type": "stratify", "key": "id", "parentKey": "parent" }, { "type": "tree", "method": { "signal": "layout" }, "size": [ 1, { "signal": "radius" } ], "as": [ "alpha", "radius", "depth", "children" ] }, { "type": "formula", "expr": "(rotate + extent * datum.alpha + 270) % 360", "as": "angle" }, { "type": "formula", "expr": "PI * datum.angle / 180", "as": "radians" }, { "type": "formula", "expr": "inrange(datum.angle, [90, 270])", "as": "leftside" }, { "type": "formula", "expr": "originX + datum.radius * cos(datum.radians)", "as": "x" }, { "type": "formula", "expr": "originY + datum.radius * sin(datum.radians)", "as": "y" } ] }, { "name": "links", "source": "tree", "transform": [ { "type": "treelinks" }, { "type": "linkpath", "shape": { "signal": "links" }, "orient": "radial", "sourceX": "source.radians", "sourceY": "source.radius", "targetX": "target.radians", "targetY": "target.radius" } ] } ], "scales": [ { "name": "color", "type": "linear", "range": { "scheme": "viridis" }, "domain": { "data": "tree", "field": "depth" }, "zero": true } ], "marks": [ { "type": "path", "from": { "data": "links" }, "encode": { "update": { "x": { "signal": "originX" }, "y": { "signal": "originY" }, "path": { "field": "path" }, "stroke": { "value": "#ccc" } } } }, { "type": "symbol", "from": { "data": "tree" }, "encode": { "enter": { "size": { "value": 300 }, "stroke": { "value": "#fff" } }, "update": { "x": { "field": "x" }, "y": { "field": "y" }, "fill": { "scale": "color", "field": "depth" } } } }, { "type": "text", "from": { "data": "tree" }, "encode": { "enter": { "text": { "field": "name" }, "fontSize": { "value": 12 }, "baseline": { "value": "middle" } }, "update": { "x": { "field": "x" }, "y": { "field": "y" }, "dx": { "signal": "(datum.leftside ? -1 : 1) * 12" }, "align": { "signal": "datum.leftside ? 
'right' : 'left'" }, "opacity": { "signal": "labels ? 1 : 0" } } } } ] }, "text/plain": "\n\nIf you see this message, it means the renderer has not been properly enabled\nfor the frontend that you are using. For more information, see\nhttps://altair-viz.github.io/user_guide/troubleshooting.html\n" }, "metadata": {} } ], "source": [ "alt.vega(json.loads(vega_template(json.dumps(nodes))))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ] }