{ "metadata": { "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5-final" }, "orig_nbformat": 2, "kernelspec": { "name": "python3", "display_name": "Python 3.8.5 64-bit", "metadata": { "interpreter": { "hash": "1ee38ef4a5a9feb55287fd749643f13d043cb0a7addaab2a9c224cbe137c0062" } } } }, "nbformat": 4, "nbformat_minor": 2, "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Organisation/Funder/Repository Data Management Plans statistics\n", "\n", "Data management plans (DMPs) are documents accompanying research proposals and project outputs. DMPs are created as textual narratives and describe the data and tools employed in scientific investigations. They are sometimes seen as an administrative exercise and not as an integral part of research practice. Machine-actionable DMPs (maDMPs) take the DMP concept further by using persistent identifiers (PIDs) and PID services to connect all resources associated with a DMP.\n", "\n", "\n", "This notebook displays DMP statistics for an organisation, a funder and a data repository. By the end of it, you will be able to produce a compact summary table of DMP statistics for each of these entities. To demonstrate this we use the **California Digital Library** as organisation (https://ror.org/03yrm5c26) and the **European Commission** as funder (https://doi.org/10.13039/501100000780). In the summary statistics you will find one row per DMP. Each row includes the title of the DMP, its PID, the number of datasets and related publications, the number of people involved, the producing organisation and the funder.\n", "\n", "\n", "Displaying the DMP statistics takes three steps. First, after an initial setup, we fetch the data we need from the DataCite GraphQL API. Then, we transform this data into a data structure that can be used for computation. 
Finally, we display the transformed data as a table.\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%%capture\n", "# Install required Python packages\n", "!pip install dfply\n", "\n", "import json\n", "import pandas as pd\n", "import numpy as np\n", "from dfply import *\n", "\n", "# Prepare the GraphQL client\n", "import requests\n", "from IPython.display import display, Markdown\n", "from gql import gql, Client\n", "from gql.transport.requests import RequestsHTTPTransport\n", "\n", "_transport = RequestsHTTPTransport(\n", "    url='https://api.datacite.org/graphql',\n", "    use_json=True,\n", ")\n", "\n", "client = Client(\n", "    transport=_transport,\n", "    fetch_schema_from_transport=True,\n", ")\n", "\n", "import ipywidgets as widgets\n", "f = widgets.Dropdown(\n", "    options=[('European Commission - ror.org/00k4n6c32', 'https://ror.org/00k4n6c32'), ('California Digital Library - ror.org/03yrm5c26','https://ror.org/03yrm5c26')],\n", "    value='https://ror.org/03yrm5c26',\n", "    description='Choose Organisation:',\n", "    disabled=False,\n", ")\n", "\n", "\n", "organizationQuery = gql(\"\"\"query getOutputs($rorId: ID!)\n", "{\n", "  organization(id: $rorId) {\n", "    name\n", "    dataManagementPlans(first: 10) {\n", "      totalCount\n", "      nodes {\n", "        id\n", "        title: titles(first: 1) {\n", "          title\n", "        }\n", "        datasets: citations(query:\"types.resourceTypeGeneral:Dataset\") {\n", "          totalCount\n", "        }\n", "        publications: citations(query:\"types.resourceTypeGeneral:Text\") {\n", "          totalCount\n", "        }\n", "        producer: contributors(contributorType: \"Producer\") {\n", "          id\n", "          title: name\n", "        }\n", "        funders: fundingReferences {\n", "          id: funderIdentifier\n", "          funderIdentifierType\n", "          title: funderName\n", "        }\n", "        people: creators {\n", "          id\n", "          name\n", "        }\n", "        contributors {\n", "          id\n", "          name\n", "        }\n", "      }\n", "    }\n", "  }\n", "}\n", "\"\"\")\n", "\n", "funderQuery = gql(\"\"\"query 
getOutputs($funderId: ID!)\n", "{\n", "  funder(id: $funderId) {\n", "    name\n", "    dataManagementPlans(first: 10) {\n", "      totalCount\n", "      nodes {\n", "        id\n", "        title: titles(first: 1) {\n", "          title\n", "        }\n", "        datasets: citations(query:\"types.resourceTypeGeneral:Dataset\") {\n", "          totalCount\n", "        }\n", "        publications: citations(query:\"types.resourceTypeGeneral:Text\") {\n", "          totalCount\n", "        }\n", "        producer: contributors(contributorType: \"Producer\") {\n", "          id\n", "          title: name\n", "        }\n", "        funders: fundingReferences {\n", "          id: funderIdentifier\n", "          funderIdentifierType\n", "          title: funderName\n", "        }\n", "        people: creators {\n", "          id\n", "          name\n", "        }\n", "        contributors {\n", "          id\n", "          name\n", "        }\n", "      }\n", "    }\n", "  }\n", "}\n", "\"\"\")\n", "\n", "repositoryQuery = gql(\"\"\"query getOutputs($repositoryId: ID!)\n", "{\n", "  repository(id: $repositoryId) {\n", "    name\n", "    dataManagementPlans(first: 10) {\n", "      totalCount\n", "      nodes {\n", "        id\n", "        title: titles(first: 1) {\n", "          title\n", "        }\n", "        datasets: citations(query:\"types.resourceTypeGeneral:Dataset\") {\n", "          totalCount\n", "        }\n", "        publications: citations(query:\"types.resourceTypeGeneral:Text\") {\n", "          totalCount\n", "        }\n", "        producer: contributors(contributorType: \"Producer\") {\n", "          id\n", "          title: name\n", "        }\n", "        funders: fundingReferences {\n", "          id: funderIdentifier\n", "          funderIdentifierType\n", "          title: funderName\n", "        }\n", "        people: creators {\n", "          id\n", "          name\n", "        }\n", "        contributors {\n", "          id\n", "          name\n", "        }\n", "      }\n", "    }\n", "  }\n", "}\n", "\"\"\")\n", "\n", "def get_data(type, pid):\n", "\n", "    repo_id = \"cdl.cdl\" if pid == \"https://ror.org/03yrm5c26\" else \"cern.zenodo\"\n", "    funder_id = \"https://doi.org/10.13039/100000141\" if pid == \"https://ror.org/03yrm5c26\" else \"https://doi.org/10.13039/501100000780\"\n", "    query_params = {\n", "        \"rorId\" : pid,\n", "        \"funderId\" : funder_id,\n", "        \"repositoryId\" : repo_id\n", "    }\n", "\n", "    if 
type == \"organization\":\n", "        return client.execute(organizationQuery, variable_values=json.dumps(query_params))[\"organization\"]\n", "    elif type == \"funder\":\n", "        return client.execute(funderQuery, variable_values=json.dumps(query_params))[\"funder\"]\n", "    else:\n", "        return client.execute(repositoryQuery, variable_values=json.dumps(query_params))[\"repository\"]\n", "\n", "def get_series_size(series_element):\n", "    return len(series_element)\n", "\n", "\n", "def get_total(series_element):\n", "    if len(series_element) == 0:\n", "        return 0\n", "    return series_element['totalCount']\n", "\n", "\n", "def dmp_header(row):\n", "    s = 'DMP: '+ row.dmp + '\\r Funder: '+row.funders+'\\r Producer: '+row.producer\n", "    return s\n", "\n", "\n", "def get_dataset_nodes(series_element):\n", "    return series_element['nodes']\n", "\n", "def get_title(series_element):\n", "    if len(series_element) == 0:\n", "        return \"None\"\n", "    return series_element[0]['title']\n", "\n", "def transform_dmps(dataframe):\n", "    \"\"\"Adds the derived columns needed for the summary table to each DMP row\n", "\n", "    Parameters:\n", "    dataframe (dataframe): A dataframe with one row per DMP node\n", "\n", "    Returns:\n", "    dataframe: The same dataframe with the new columns added\n", "\n", "    \"\"\"\n", "    if (dataframe) is None:\n", "        return pd.DataFrame()\n", "    else:\n", "        return (dataframe >>\n", "            mutate(\n", "                DMP = X.title.apply(get_title),\n", "                doi = X.id,\n", "                NumDatasets = X.datasets.apply(get_total),\n", "                NumPublications = X.publications.apply(get_total),\n", "                Producer = X.producer.apply(get_title),\n", "                Funder = X.funders.apply(get_title),\n", "                NumPeople = (X.people + X.contributors).apply(get_series_size)\n", "            )\n", "            # >>\n", "            # mutate(\n", "            #     header = dmp_header(X),\n", "            # )\n", "            # >>\n", "            # filter_by(\n", "            #     X.hostingInstitution > 0\n", "            # )\n", "        )\n", "\n", "def processTable(type, pid):\n", "    data = get_data(type, 
pid)\n", "    if len(data[\"dataManagementPlans\"]['nodes']) == 0:\n", "        return None\n", "    else:\n", "        table = pd.DataFrame(data[\"dataManagementPlans\"]['nodes'],columns=data[\"dataManagementPlans\"]['nodes'][0].keys())\n", "        return transform_dmps(table)[list(('DMP', 'Funder', 'Producer', 'NumDatasets','NumPublications','NumPeople', 'doi'))].style.set_caption(data['name'])\n", "\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "output_type": "display_data", "data": { "text/plain": "Dropdown(description='Choose Organisation:', index=1, options=(('European Commission - ror.org/00k4n6c32', 'ht…", "application/vnd.jupyter.widget-view+json": { "version_major": 2, "version_minor": 0, "model_id": "14fc6dc95a5c45a8bea99e39a7fdbd3b" } }, "metadata": {} } ], "source": [ "display(f)" ] }, { "source": [ "## DMP Statistics Visualisation\n", "\n", "\n", "The following three tables show DMP statistics for three different entities: an organisation, a funder and a repository. Each table lists the DMP title, its funder, its producer, the number of datasets, publications and people linked to the DMP, and the DOI of the DMP. Each query requests the first 10 DMPs, so a table has at most ten rows. Which entities are shown depends on the dropdown selection above. With the California Digital Library selected, as in the outputs shown here, the first table covers the California Digital Library as organisation, the second covers the funder https://doi.org/10.13039/100000141 (Division of Ocean Sciences), and the third covers the `cdl.cdl` repository. With the European Commission selected, the funder table covers the European Commission (https://doi.org/10.13039/501100000780) and the repository table covers the Zenodo repository (`cern.zenodo`)." ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": 3, "metadata": { "tags": [] }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "" ], "text/html": "\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
California Digital Library
DMP Funder Producer NumDatasets NumPublications NumPeople doi
0DMPRoadmap: Making Data Management Plans ActionableNational Science Foundation (NSF)University Of California System004https://doi.org/10.48321/d1mw28
1LTREB: Drivers of temperate forest carbon storage from canopy closure through successional timeNational Science Foundation (NSF)University Of Michigan135https://doi.org/10.48321/d1h59r
2Late Season Productivity, Carbon, and Nutrient Dynamics in a Changing ArcticNational Science Foundation (NSF)Oregon State University005https://doi.org/10.48321/d17p4j
3REU Site: A Multidisciplinary Research Experience in Engineered Bioactive Interfaces and DevicesNational Science Foundation (NSF)University Of Kentucky004https://doi.org/10.48321/d1cc7t
4Brown carbon characterizationNational Science Foundation (NSF)College, Harvey Mudd023https://doi.org/10.48321/d13w2m
5A Political Ecology of Value: A Cohort-Based Ethnography of the Environmental Turn in Nicaraguan Urban Social PolicyNational Science Foundation (NSF)Western Washington University023https://doi.org/10.48321/d10593
6Finding Levers for Privacy and Security by Design in Mobile DevelopmentNational Science Foundation (NSF)University Of Maryland, College Park064https://doi.org/10.48321/d1vc75
7Use of telemetry and the Acoustic Wave Glider to study southern flounder migrationsNational Science Foundation (NSF)East Carolina University006https://doi.org/10.48321/d1kw2z
8The Virgin Islands Partnership to Increase Participation and Engagement through Linked, Informal, Nurturing Experiences in STEM (V.I. PIPELINES)National Science Foundation (NSF)University Of The Virgin Islands007https://doi.org/10.48321/d1qp4w
9DMP for The Role of Temperature in Regulating Herbivory and Algal Biomass in Upwelling SystemsNational Science Foundation (NSF)University Of North Carolina, Chapel Hill0133https://doi.org/10.48321/d1g59f
" }, "metadata": {}, "execution_count": 3 } ], "source": [ "processTable(\"organization\", f.value)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "tags": [] }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "" ], "text/html": "\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
Division of Ocean Sciences
DMP Funder Producer NumDatasets NumPublications NumPeople doi
0Impacts of size-selective mortality on sex-changing fishesDivision of Ocean Sciences (nsf.gov)Oregon State University044https://doi.org/10.48321/d1101n
1Turbulence-spurred settlement: Deciphering a newly recognized class of larval responseDivision of Ocean Sciences (nsf.gov)San Francisco State University (Sfsu.Edu)046https://doi.org/10.48321/d14s38
2Collaborative Research: New Approaches to New ProductionDivision of Ocean Sciences (nsf.gov)University Of Southern California (Usc.Edu)074https://doi.org/10.48321/d1w88t
3Adaptations of fish and fishing communities to rapid climate changeDivision of Ocean Sciences (nsf.gov)University Of California, Santa Barbara (Ucsb.Edu)1109https://doi.org/10.48321/d1h010
4Gene content, gene expression, and physiology in mesopelagic ammonia-oxidizing archaeaDivision of Ocean Sciences (nsf.gov)J. Craig Venter Institute (Jcvi.Org)014https://doi.org/10.48321/d1ms3m
5Collaborative Research: Ocean Acidification and Coral Reefs: Scale Dependence and Adaptive CapacityDivision of Ocean Sciences (nsf.gov)California State University, Northridge (Csun.Edu)1118https://doi.org/10.48321/d1rg6w
6Collaborative research: Quantifying the biological, chemical, and physical linkages between chemosynthetic communities and the surrounding deep seaDivision of Ocean Sciences (nsf.gov)University Of California, San Diego (Ucsd.Edu)738https://doi.org/10.48321/d17g67
7Collaborative Research: Field test of larval behavior on transport and connectivity in an upwelling regimeDivision of Ocean Sciences (nsf.gov)University Of California, Davis (Ucdavis.Edu)006https://doi.org/10.48321/d1c885
8Collaborative Research: Dissolved organic matter feedbacks in coral reef resilience: The genomic & geochemical basis for microbial modulation of algal phase shiftsDivision of Ocean Sciences (nsf.gov)University Of Hawaii At Manoa (Manoa.Hawaii.Edu)0106https://doi.org/10.48321/d1001b
9Quantifying the potential for biogeochemical feedbacks to create 'refugia' from ocean acidification on tropical coral reefsDivision of Ocean Sciences (nsf.gov)Carnegie Institution For Science (Carnegiescience.Edu)017https://doi.org/10.48321/d13s3z
" }, "metadata": {}, "execution_count": 4 } ], "source": [ "processTable(\"funder\", f.value)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "tags": [] }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "" ], "text/html": "\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
California Digital Library
DMP Funder Producer NumDatasets NumPublications NumPeople doi
0DMPRoadmap: Making Data Management Plans ActionableNational Science Foundation (NSF)University Of California System004https://doi.org/10.48321/d1mw28
1LTREB: Drivers of temperate forest carbon storage from canopy closure through successional timeNational Science Foundation (NSF)University Of Michigan135https://doi.org/10.48321/d1h59r
2Late Season Productivity, Carbon, and Nutrient Dynamics in a Changing ArcticNational Science Foundation (NSF)Oregon State University005https://doi.org/10.48321/d17p4j
3REU Site: A Multidisciplinary Research Experience in Engineered Bioactive Interfaces and DevicesNational Science Foundation (NSF)University Of Kentucky004https://doi.org/10.48321/d1cc7t
4Brown carbon characterizationNational Science Foundation (NSF)College, Harvey Mudd023https://doi.org/10.48321/d13w2m
5A Political Ecology of Value: A Cohort-Based Ethnography of the Environmental Turn in Nicaraguan Urban Social PolicyNational Science Foundation (NSF)Western Washington University023https://doi.org/10.48321/d10593
6Finding Levers for Privacy and Security by Design in Mobile DevelopmentNational Science Foundation (NSF)University Of Maryland, College Park064https://doi.org/10.48321/d1vc75
7Use of telemetry and the Acoustic Wave Glider to study southern flounder migrationsNational Science Foundation (NSF)East Carolina University006https://doi.org/10.48321/d1kw2z
8The Virgin Islands Partnership to Increase Participation and Engagement through Linked, Informal, Nurturing Experiences in STEM (V.I. PIPELINES)National Science Foundation (NSF)University Of The Virgin Islands007https://doi.org/10.48321/d1qp4w
9DMP for The Role of Temperature in Regulating Herbivory and Algal Biomass in Upwelling SystemsNational Science Foundation (NSF)University Of North Carolina, Chapel Hill0133https://doi.org/10.48321/d1g59f
" }, "metadata": {}, "execution_count": 5 } ], "source": [ "processTable(\"repository\", f.value)" ] } ] }