# Organisation/Funder/Repository Data Management Plans statistics

Data management plans (DMPs) are documents accompanying research proposals and project outputs. DMPs are created as textual narratives and describe the data and tools employed in scientific investigations. They are sometimes seen as an administrative exercise and not as an integral part of research practice. Machine-actionable DMPs (maDMPs) take the DMP concept further by using persistent identifiers (PIDs) and PID services to connect all resources associated with a DMP.


This notebook displays DMP statistics for an organisation, a funder, and/or a data repository. By the end of it, you will be able to display these statistics succinctly for each of the three. To demonstrate this we use the **California Digital Library** as the organisation (https://ror.org/03yrm5c26) and the **European Commission** as the funder (https://doi.org/10.13039/501100000780). The summary statistics contain one row per DMP. Each row includes the title of the DMP, its PID, the number of datasets and related publications, the number of people involved, and the associated organisation and funder.


Displaying the DMP statistics takes three steps. First, after an initial setup, we fetch the data we need from the DataCite GraphQL API. Then we transform this data into a data structure that can be used for computation. Finally, we supply the transformed data to a table.
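
The three steps above can be sketched without any network access. The snippet below is a minimal illustration, not the notebook's implementation: `sample_response`, `first_title`, and `to_row` are hypothetical names, and every value in `sample_response` is a placeholder shaped like the response the queries in this notebook request.

```python
# Step 1 (stubbed): a hand-written stand-in for the API response.
# All values are placeholders for illustration, not real records.
sample_response = {
    "name": "Example Organisation",
    "dataManagementPlans": {
        "totalCount": 2,
        "nodes": [
            {
                "id": "https://doi.org/10.0000/example-1",
                "title": [{"title": "Example DMP one"}],
                "datasets": {"totalCount": 2},
                "publications": {"totalCount": 1},
                "producer": [{"id": None, "title": "Example University"}],
                "funders": [{"id": None, "title": "Example Funder"}],
                "people": [{"id": None, "name": "A. Author"}],
                "contributors": [{"id": None, "name": "B. Contributor"}],
            },
            {
                "id": "https://doi.org/10.0000/example-2",
                "title": [],
                "datasets": {"totalCount": 0},
                "publications": {"totalCount": 0},
                "producer": [],
                "funders": [],
                "people": [],
                "contributors": [],
            },
        ],
    },
}


def first_title(items):
    """Return the 'title' of the first item, or the string "None" if the list is empty."""
    return items[0]["title"] if items else "None"


def to_row(node):
    """Step 2: flatten one DMP node into the columns used by the summary table."""
    return {
        "DMP": first_title(node["title"]),
        "Funder": first_title(node["funders"]),
        "Producer": first_title(node["producer"]),
        "NumDatasets": node["datasets"]["totalCount"],
        "NumPublications": node["publications"]["totalCount"],
        "NumPeople": len(node["people"]) + len(node["contributors"]),
        "doi": node["id"],
    }


# Step 3: one row per DMP, ready to be handed to a table.
rows = [to_row(node) for node in sample_response["dataManagementPlans"]["nodes"]]
for row in rows:
    print(row)
```

Passing `rows` to `pd.DataFrame(rows)` would give a table with the same column names as the ones produced later in this notebook.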




In [1]:
%%capture
# Install required Python packages
!pip install dfply

import json
import pandas as pd
import numpy as np
from dfply import *

# Prepare the GraphQL client
import requests
from IPython.display import display, Markdown
from gql import gql, Client
from gql.transport.requests import RequestsHTTPTransport

_transport = RequestsHTTPTransport(
    url='https://api.datacite.org/graphql',
    use_json=True,
)

client = Client(
    transport=_transport,
    fetch_schema_from_transport=True,
)

import ipywidgets as widgets
f = widgets.Dropdown(
    options=[('European Commission - ror.org/00k4n6c32', 'https://ror.org/00k4n6c32'), ('California Digital Library - ror.org/03yrm5c26','https://ror.org/03yrm5c26')],
    value='https://ror.org/03yrm5c26',
    description='Choose Organisation:',
    disabled=False,
)


organizationQuery = gql("""query getOutputs($rorId: ID!)
{
  organization(id: $rorId) {
    name
    dataManagementPlans(first: 10) {
      totalCount
      nodes {
        id
        title: titles(first: 1) {
          title
        }
        datasets: citations(query:"types.resourceTypeGeneral:Dataset") {
          totalCount
        }
        publications: citations(query:"types.resourceTypeGeneral:Text") {
          totalCount
        }
        producer: contributors(contributorType: "Producer") {
          id
          title: name
        }
        funders: fundingReferences {
          id: funderIdentifier
          funderIdentifierType
          title: funderName
        }
        people: creators {
          id
          name
        }
        contributors {
          id
          name
        }
      }
    }
  }
}
""")

funderQuery = gql("""query getOutputs($funderId: ID!)
{
  funder(id: $funderId) {
    name
    dataManagementPlans(first: 10) {
      totalCount
      nodes {
        id
        title: titles(first: 1) {
          title
        }
        datasets: citations(query:"types.resourceTypeGeneral:Dataset") {
          totalCount
        }
        publications: citations(query:"types.resourceTypeGeneral:Text") {
          totalCount
        }
        producer: contributors(contributorType: "Producer") {
          id
          title: name
        }
        funders: fundingReferences {
          id: funderIdentifier
          funderIdentifierType
          title: funderName
        }
        people: creators {
          id
          name
        }
        contributors {
          id
          name
        }
      }
    }
  }
}
""")

repositoryQuery = gql("""query getOutputs($repositoryId: ID!)
{
  repository(id: $repositoryId) {
    name
    dataManagementPlans(first: 10) {
      totalCount
      nodes {
        id
        title: titles(first: 1) {
          title
        }
        datasets: citations(query:"types.resourceTypeGeneral:Dataset") {
          totalCount
        }
        publications: citations(query:"types.resourceTypeGeneral:Text") {
          totalCount
        }
        producer: contributors(contributorType: "Producer") {
          id
          title: name
        }
        funders: fundingReferences {
          id: funderIdentifier
          funderIdentifierType
          title: funderName
        }
        people: creators {
          id
          name
        }
        contributors {
          id
          name
        }
      }
    }
  }
}
""")

def get_data(type, pid):

    # Pair the selected organisation with a funder and a repository:
    # the California Digital Library ROR gets one pair, any other selection the other.
    repo_id = "cdl.cdl" if pid == "https://ror.org/03yrm5c26" else "cern.zenodo"
    funder_id = "https://doi.org/10.13039/100000141" if pid == "https://ror.org/03yrm5c26" else "https://doi.org/10.13039/501100000780"
    query_params = {
        "rorId" : pid,
        "funderId" : funder_id,
        "repositoryId" : repo_id
    }

    # Pass the variables as a dict (not a JSON-encoded string), as gql expects.
    if type == "organization":
        return client.execute(organizationQuery, variable_values=query_params)["organization"]
    elif type == "funder":
        return client.execute(funderQuery, variable_values=query_params)["funder"]
    else:
        return client.execute(repositoryQuery, variable_values=query_params)["repository"]

def get_series_size(series_element):
    return len(series_element)


def get_total(series_element):
    if len(series_element) == 0:
        return 0
    return series_element['totalCount']


def dmp_header(row):
    s = 'DMP: '+ row.dmp + '\r Funder: '+row.funders+'\r Producer: '+row.producer
    return s


def get_dataset_nodes(series_element):
    return series_element['nodes']

def get_title(series_element):
    if len(series_element) == 0:
        return "None"
    return series_element[0]['title']

def transform_dmps(dataframe):
    """Adds the summary columns shown in the statistics table to each DMP row

    Parameters:
    dataframe (dataframe): A dataframe with one row per DMP node

    Returns:
    dataframe: The same dataframe with the new summary columns

    """
    if (dataframe) is None:
        return pd.DataFrame() 
    else: 
        return (dataframe >>
        mutate(
            DMP = X.title.apply(get_title),
            doi = X.id,
            NumDatasets = X.datasets.apply(get_total),
            NumPublications = X.publications.apply(get_total),
            Producer = X.producer.apply(get_title),
            Funder = X.funders.apply(get_title),
            NumPeople = (X.people + X.contributors).apply(get_series_size)
        ) 
        # >> 
        # mutate(
        #     header = dmp_header(X),
        # ) 
        # >>
        # filter_by(
        #     X.hostingInstitution > 0
        # )
        )

def processTable(type, pid):
    data = get_data(type, pid)
    if len(data["dataManagementPlans"]['nodes']) == 0:
        return None
    else:
        table = pd.DataFrame(data["dataManagementPlans"]['nodes'],columns=data["dataManagementPlans"]['nodes'][0].keys())
    return transform_dmps(table)[list(('DMP', 'Funder', 'Producer', 'NumDatasets','NumPublications','NumPeople', 'doi'))].style.set_caption(data['name'])    
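
The `NumPeople` column above is computed as `(X.people + X.contributors).apply(get_series_size)`, which concatenates the two lists and counts the result, so a person listed both as creator and as contributor is counted twice. The sketch below shows that behaviour in plain Python and one possible way to count distinct people instead. `count_people` is a hypothetical helper, the ORCID iD is a placeholder, and treating an identical `(id, name)` pair as the same person is an assumption rather than something this notebook does.

```python
def count_people(people, contributors):
    """Return (naive_count, unique_count) for two lists of {'id', 'name'} dicts."""
    combined = people + contributors  # plain list concatenation, duplicates kept
    # Assumption: two entries are the same person when both id and name match.
    unique = {(person.get("id"), person.get("name")) for person in combined}
    return len(combined), len(unique)


# Placeholder records for illustration only.
creators = [
    {"id": "https://orcid.org/0000-0000-0000-0000", "name": "Example, Person"},
]
contributors = [
    {"id": "https://orcid.org/0000-0000-0000-0000", "name": "Example, Person"},
    {"id": None, "name": "Example, Other"},
]

naive, unique = count_people(creators, contributors)
print(naive, unique)  # prints: 3 2
```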



In [2]:
display(f)

Dropdown(description='Choose Organisation:', index=1, options=(('European Commission - ror.org/00k4n6c32', 'ht…

## DMP Statistics Visualisation


The following three tables show the DMP statistics for three different entities. Each table lists the DMP title, its funder, its producer, summary counts of the datasets, publications, and people linked to the DMP, and the DMP's DOI. The first table covers the organisation chosen in the dropdown, by default the California Digital Library. The second covers the funder that `get_data` pairs with that choice: https://doi.org/10.13039/100000141 for the California Digital Library (shown as "Division of Ocean Sciences (nsf.gov)" in the output below), and the European Commission (https://doi.org/10.13039/501100000780) otherwise. The third covers the paired repository: `cdl.cdl` for the California Digital Library and `cern.zenodo` (Zenodo) otherwise.
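
How a dropdown selection determines the funder and repository behind the second and third tables can be made explicit with a small lookup. This is a sketch that mirrors the conditional in `get_data`. `CDL_ROR` and `related_ids` are hypothetical names introduced here, while the identifiers themselves are copied from the code above.

```python
# ROR ID of the California Digital Library, as used in the dropdown above.
CDL_ROR = "https://ror.org/03yrm5c26"


def related_ids(pid):
    """Return the query variables that get_data builds for a selected ROR ID."""
    if pid == CDL_ROR:
        return {
            "rorId": pid,
            "funderId": "https://doi.org/10.13039/100000141",
            "repositoryId": "cdl.cdl",
        }
    # Any other selection falls back to the European Commission and Zenodo.
    return {
        "rorId": pid,
        "funderId": "https://doi.org/10.13039/501100000780",
        "repositoryId": "cern.zenodo",
    }


for selection in (CDL_ROR, "https://ror.org/00k4n6c32"):
    print(selection, "->", related_ids(selection))
```

Because the fallback branch applies to any value other than the California Digital Library ROR ID, adding a third organisation to the dropdown would reuse the European Commission and Zenodo identifiers unless the mapping is extended.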

In [3]:
processTable("organization", f.value)

| | DMP | Funder | Producer | NumDatasets | NumPublications | NumPeople | doi |
|---|---|---|---|---|---|---|---|
| 0 | DMPRoadmap: Making Data Management Plans Actionable | National Science Foundation (NSF) | University Of California System | 0 | 0 | 4 | https://doi.org/10.48321/d1mw28 |
| 1 | LTREB: Drivers of temperate forest carbon storage from canopy closure through successional time | National Science Foundation (NSF) | University Of Michigan | 1 | 3 | 5 | https://doi.org/10.48321/d1h59r |
| 2 | Late Season Productivity, Carbon, and Nutrient Dynamics in a Changing Arctic | National Science Foundation (NSF) | Oregon State University | 0 | 0 | 5 | https://doi.org/10.48321/d17p4j |
| 3 | REU Site: A Multidisciplinary Research Experience in Engineered Bioactive Interfaces and Devices | National Science Foundation (NSF) | University Of Kentucky | 0 | 0 | 4 | https://doi.org/10.48321/d1cc7t |
| 4 | Brown carbon characterization | National Science Foundation (NSF) | College, Harvey Mudd | 0 | 2 | 3 | https://doi.org/10.48321/d13w2m |
| 5 | A Political Ecology of Value: A Cohort-Based Ethnography of the Environmental Turn in Nicaraguan Urban Social Policy | National Science Foundation (NSF) | Western Washington University | 0 | 2 | 3 | https://doi.org/10.48321/d10593 |
| 6 | Finding Levers for Privacy and Security by Design in Mobile Development | National Science Foundation (NSF) | University Of Maryland, College Park | 0 | 6 | 4 | https://doi.org/10.48321/d1vc75 |
| 7 | Use of telemetry and the Acoustic Wave Glider to study southern flounder migrations | National Science Foundation (NSF) | East Carolina University | 0 | 0 | 6 | https://doi.org/10.48321/d1kw2z |
| 8 | The Virgin Islands Partnership to Increase Participation and Engagement through Linked, Informal, Nurturing Experiences in STEM (V.I. PIPELINES) | National Science Foundation (NSF) | University Of The Virgin Islands | 0 | 0 | 7 | https://doi.org/10.48321/d1qp4w |
| 9 | DMP for The Role of Temperature in Regulating Herbivory and Algal Biomass in Upwelling Systems | National Science Foundation (NSF) | University Of North Carolina, Chapel Hill | 0 | 13 | 3 | https://doi.org/10.48321/d1g59f |


In [4]:
processTable("funder", f.value)

| | DMP | Funder | Producer | NumDatasets | NumPublications | NumPeople | doi |
|---|---|---|---|---|---|---|---|
| 0 | Impacts of size-selective mortality on sex-changing fishes | Division of Ocean Sciences (nsf.gov) | Oregon State University | 0 | 4 | 4 | https://doi.org/10.48321/d1101n |
| 1 | Turbulence-spurred settlement: Deciphering a newly recognized class of larval response | Division of Ocean Sciences (nsf.gov) | San Francisco State University (Sfsu.Edu) | 0 | 4 | 6 | https://doi.org/10.48321/d14s38 |
| 2 | Collaborative Research: New Approaches to New Production | Division of Ocean Sciences (nsf.gov) | University Of Southern California (Usc.Edu) | 0 | 7 | 4 | https://doi.org/10.48321/d1w88t |
| 3 | Adaptations of fish and fishing communities to rapid climate change | Division of Ocean Sciences (nsf.gov) | University Of California, Santa Barbara (Ucsb.Edu) | 1 | 10 | 9 | https://doi.org/10.48321/d1h010 |
| 4 | Gene content, gene expression, and physiology in mesopelagic ammonia-oxidizing archaea | Division of Ocean Sciences (nsf.gov) | J. Craig Venter Institute (Jcvi.Org) | 0 | 1 | 4 | https://doi.org/10.48321/d1ms3m |
| 5 | Collaborative Research: Ocean Acidification and Coral Reefs: Scale Dependence and Adaptive Capacity | Division of Ocean Sciences (nsf.gov) | California State University, Northridge (Csun.Edu) | 1 | 11 | 8 | https://doi.org/10.48321/d1rg6w |
| 6 | Collaborative research: Quantifying the biological, chemical, and physical linkages between chemosynthetic communities and the surrounding deep sea | Division of Ocean Sciences (nsf.gov) | University Of California, San Diego (Ucsd.Edu) | 7 | 3 | 8 | https://doi.org/10.48321/d17g67 |
| 7 | Collaborative Research: Field test of larval behavior on transport and connectivity in an upwelling regime | Division of Ocean Sciences (nsf.gov) | University Of California, Davis (Ucdavis.Edu) | 0 | 0 | 6 | https://doi.org/10.48321/d1c885 |
| 8 | Collaborative Research: Dissolved organic matter feedbacks in coral reef resilience: The genomic & geochemical basis for microbial modulation of algal phase shifts | Division of Ocean Sciences (nsf.gov) | University Of Hawaii At Manoa (Manoa.Hawaii.Edu) | 0 | 10 | 6 | https://doi.org/10.48321/d1001b |
| 9 | Quantifying the potential for biogeochemical feedbacks to create 'refugia' from ocean acidification on tropical coral reefs | Division of Ocean Sciences (nsf.gov) | Carnegie Institution For Science (Carnegiescience.Edu) | 0 | 1 | 7 | https://doi.org/10.48321/d13s3z |


In [5]:
processTable("repository", f.value)

| | DMP | Funder | Producer | NumDatasets | NumPublications | NumPeople | doi |
|---|---|---|---|---|---|---|---|
| 0 | DMPRoadmap: Making Data Management Plans Actionable | National Science Foundation (NSF) | University Of California System | 0 | 0 | 4 | https://doi.org/10.48321/d1mw28 |
| 1 | LTREB: Drivers of temperate forest carbon storage from canopy closure through successional time | National Science Foundation (NSF) | University Of Michigan | 1 | 3 | 5 | https://doi.org/10.48321/d1h59r |
| 2 | Late Season Productivity, Carbon, and Nutrient Dynamics in a Changing Arctic | National Science Foundation (NSF) | Oregon State University | 0 | 0 | 5 | https://doi.org/10.48321/d17p4j |
| 3 | REU Site: A Multidisciplinary Research Experience in Engineered Bioactive Interfaces and Devices | National Science Foundation (NSF) | University Of Kentucky | 0 | 0 | 4 | https://doi.org/10.48321/d1cc7t |
| 4 | Brown carbon characterization | National Science Foundation (NSF) | College, Harvey Mudd | 0 | 2 | 3 | https://doi.org/10.48321/d13w2m |
| 5 | A Political Ecology of Value: A Cohort-Based Ethnography of the Environmental Turn in Nicaraguan Urban Social Policy | National Science Foundation (NSF) | Western Washington University | 0 | 2 | 3 | https://doi.org/10.48321/d10593 |
| 6 | Finding Levers for Privacy and Security by Design in Mobile Development | National Science Foundation (NSF) | University Of Maryland, College Park | 0 | 6 | 4 | https://doi.org/10.48321/d1vc75 |
| 7 | Use of telemetry and the Acoustic Wave Glider to study southern flounder migrations | National Science Foundation (NSF) | East Carolina University | 0 | 0 | 6 | https://doi.org/10.48321/d1kw2z |
| 8 | The Virgin Islands Partnership to Increase Participation and Engagement through Linked, Informal, Nurturing Experiences in STEM (V.I. PIPELINES) | National Science Foundation (NSF) | University Of The Virgin Islands | 0 | 0 | 7 | https://doi.org/10.48321/d1qp4w |
| 9 | DMP for The Role of Temperature in Regulating Herbivory and Algal Biomass in Upwelling Systems | National Science Foundation (NSF) | University Of North Carolina, Chapel Hill | 0 | 13 | 3 | https://doi.org/10.48321/d1g59f |
