{ "cells": [ { "cell_type": "markdown", "id": "7e8c941f-784b-46a4-b596-bd0ee3c140a4", "metadata": {}, "source": [ "# Create an RO-Crate from the RiOMar dataset\n", "\n", "\n", "## Context\n", "\n", "### Purpose\n", "\n", "This notebook shows how to create an RO-Crate for a dataset using the `rocrate` Python library. It is a simple example that uses no specific RO-Crate profile and follows the RO-Crate 1.1 specification. Packaging a dataset as an RO-Crate has several benefits:\n", "\n", "- **Standardized Metadata Packaging**: RO-Crates provide a standardized way to bundle datasets with rich metadata, making it easier to understand, share, and reuse the data.\n", "- **Enhanced FAIRness**: By including machine-readable metadata, RO-Crates improve the Findability, Accessibility, Interoperability, and Reusability (FAIR) of the dataset.\n", "- **Improved Discoverability**: Metadata in an RO-Crate allows datasets to be easily indexed and discovered through search engines and data repositories.\n", "- **Documentation and Provenance**: RO-Crates document essential information about the dataset, such as its source, authorship, and creation process, ensuring transparency and traceability.\n", "- **Facilitates Integration**: The structured metadata makes it easier to integrate the dataset with other tools, workflows, or datasets, enhancing its usability.\n", "- **Compliance with Standards**: Many funding agencies and journals now require datasets to be published with detailed metadata. RO-Crates align with these expectations and promote best practices in data management.\n", "\n", "\n", "### Description\n", "\n", "In this notebook, we will learn how to create a simple RO-Crate from the RiOMar data. 
We will then identify any missing metadata that needs to be added to the original dataset's metadata.\n", "\n", "## Contributions\n", "\n", "### Notebook\n", "\n", "- Anne Fouilloux (author), Simula Research Laboratory (Norway), @annefou\n", "- XX (reviewer)\n", "\n", "## Bibliography and other interesting resources\n", "\n", "- [rocrate](https://pypi.org/project/rocrate/) Python package\n", "- [Research Object documentation](https://www.researchobject.org)" ] }, { "cell_type": "markdown", "id": "2b875a8b-74b1-4e5e-9448-7045d0494358", "metadata": {}, "source": [ "## Install and import libraries" ] }, { "cell_type": "code", "execution_count": 42, "id": "0b1dff75-a254-4eae-9d03-fc4b01838052", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: rocrate in /srv/conda/envs/notebook/lib/python3.12/site-packages (0.13.0)\n", "Collecting rocrateValidator\n", " Downloading rocrateValidator-0.2.15-py3-none-any.whl.metadata (228 bytes)\n", "Requirement already satisfied: requests in /srv/conda/envs/notebook/lib/python3.12/site-packages (from rocrate) (2.32.3)\n", "Requirement already satisfied: arcp==0.2.1 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from rocrate) (0.2.1)\n", "Requirement already satisfied: jinja2 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from rocrate) (3.1.4)\n", "Requirement already satisfied: python-dateutil in /srv/conda/envs/notebook/lib/python3.12/site-packages (from rocrate) (2.8.2)\n", "Requirement already satisfied: click in /srv/conda/envs/notebook/lib/python3.12/site-packages (from rocrate) (8.1.7)\n", "Requirement already satisfied: MarkupSafe>=2.0 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from jinja2->rocrate) (2.1.5)\n", "Requirement already satisfied: six>=1.5 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from python-dateutil->rocrate) (1.16.0)\n", "Requirement already satisfied: charset-normalizer<4,>=2 in 
/srv/conda/envs/notebook/lib/python3.12/site-packages (from requests->rocrate) (3.3.2)\n", "Requirement already satisfied: idna<4,>=2.5 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from requests->rocrate) (3.7)\n", "Requirement already satisfied: urllib3<3,>=1.21.1 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from requests->rocrate) (1.26.19)\n", "Requirement already satisfied: certifi>=2017.4.17 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from requests->rocrate) (2024.7.4)\n", "Downloading rocrateValidator-0.2.15-py3-none-any.whl (11 kB)\n", "Installing collected packages: rocrateValidator\n", "Successfully installed rocrateValidator-0.2.15\n", "Note: you may need to restart the kernel to use updated packages.\n" ] } ], "source": [ "pip install rocrate rocrateValidator" ] }, { "cell_type": "code", "execution_count": 2, "id": "982e2483-b0b5-4fe7-b9e4-47be7fcb83f0", "metadata": {}, "outputs": [], "source": [ "import requests\n", "import json\n", "from rocrate.rocrate import ROCrate\n", "from rocrate.model.person import Person\n", "import pandas as pd\n", "from datetime import datetime\n", "import geopandas\n", "import shapely\n", "import xarray as xr\n", "import numpy as np\n", "import s3fs" ] }, { "cell_type": "markdown", "id": "0631c4e0-0e5c-4eb2-a850-44b49fa0c084", "metadata": {}, "source": [ "## Open RiOMar data to get metadata" ] }, { "cell_type": "code", "execution_count": 3, "id": "26d6ed4e-49bb-40c6-ae00-0b7009c7be31", "metadata": {}, "outputs": [], "source": [ "url_data = \"https://data-fair2adapt.ifremer.fr/riomar/small.zarr\"" ] }, { "cell_type": "code", "execution_count": 4, "id": "5b48eb64-383b-4eae-b3dd-812eb5c469c2", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<xarray.Dataset> Size: 498MB\n",
"Dimensions: (y_rho: 838, x_rho: 727, s_rho: 40, time_counter: 5)\n",
"Coordinates:\n",
" nav_lat_rho (y_rho, x_rho) float64 5MB dask.array<chunksize=(838, 727), meta=np.ndarray>\n",
" nav_lon_rho (y_rho, x_rho) float64 5MB dask.array<chunksize=(838, 727), meta=np.ndarray>\n",
" * s_rho (s_rho) float32 160B -0.9875 -0.9625 ... -0.0375 -0.0125\n",
" * time_counter (time_counter) datetime64[ns] 40B 2004-01-01T00:58:30 ... 2...\n",
" time_instant (time_counter) datetime64[ns] 40B dask.array<chunksize=(1,), meta=np.ndarray>\n",
"Dimensions without coordinates: y_rho, x_rho\n",
"Data variables:\n",
" ocean_mask (y_rho, x_rho) bool 609kB dask.array<chunksize=(838, 727), meta=np.ndarray>\n",
" temp (time_counter, s_rho, y_rho, x_rho) float32 487MB dask.array<chunksize=(1, 40, 838, 727), meta=np.ndarray>\n",
"Attributes: (12/45)\n",
" CPP-options: REGIONAL GAMAR MPI TIDES OBC_WEST OBC_NORTH XIOS USE_CALE...\n",
" Conventions: CF-1.6\n",
" Cs_r: have a look at variable Cs_r in this file\n",
" Cs_w: have a look at variable Cs_w in this file\n",
" SRCS: main.F step.F read_inp.F timers_roms.F init_scalars.F ini...\n",
" Tcline: 15.0\n",
" ... ...\n",
" title: GAMAR_GLORYS\n",
" tnu4_expl: biharmonic mixing coefficient for tracers\n",
" units: meter4 second-1\n",
" uuid: 06f6b784-fcc0-4422-aceb-17da2a5aa9fa\n",
" v_sponge: 0.0\n",
" x_sponge: 0.0