{ "cells": [ { "cell_type": "markdown", "id": "defensive-penguin", "metadata": {}, "source": [ "# NO2 over Spain with CAMS European air quality analysis using RELIANCE services\n", "\n", "## Analysis over a particular country and a town in the country of interest" ] }, { "cell_type": "markdown", "id": "duplicate-coaching", "metadata": {}, "source": [ "
\n", "How to discover RELIANCE datacube resources (spatial & temporal search and subsetting), share resources using EGI datahub, and use RoHub to create FAIR digital objects" ] }, { "cell_type": "markdown", "id": "streaming-utilization", "metadata": {}, "source": [ "This notebook shows how to discover and access the [Copernicus Atmosphere Monitoring](https://ads.atmosphere.copernicus.eu/#!/home) products available in the **RELIANCE** datacube resources, using the functionalities provided by the **Adam API**. The process is structured in 7 steps, including examples of data analysis and visualization with the Python libraries installed in the Jupyter environment, as well as the creation of a FAIR digital object on [RoHUB](https://reliance.rohub.org/) where all the resources used and generated in this notebook are aggregated.\n", "\n", "You can customize this Jupyter notebook, for instance by updating the content of the [Data Management](#1_Data_Management) section.\n", "\n", "- [1. Data Management](#1_Data_Management)\n", "- [2. Authentication](#2_Authentication)\n", "- [3. Datasets Discovery](#3_Datasets_Discovery)\n", "- [4. Products Discovery](#4_Products_Discovery)\n", "- [5. Data Access](#5_Data_Access)\n", "- [6. Data Analysis and Visualization](#6_Data_Analysis_Visualization)\n", "- [7. Create Research Object and Share my work](#7_rohub)" ] }, { "cell_type": "markdown", "id": "46e47939-5420-4275-b820-09b0eca5d5d0", "metadata": {}, "source": [ "\n", "## **Step 1: Data Management** " ] }, { "cell_type": "markdown", "id": "c2775b64-10f3-4726-b648-f26cc5947e5c", "metadata": {}, "source": [ "### Authors \n", "- Make sure you first register to RoHub at [https://reliance.rohub.org/](https://reliance.rohub.org/). \n", "- We recommend using your [ORCID](https://orcid.org/) identifier to log in and register to EOSC services.\n", "- In the list of authors, add any co-authors using the email address they used when they registered in RoHub."
] }, { "cell_type": "code", "execution_count": null, "id": "27ae98cd-b52c-4299-a559-22cc56f8e838", "metadata": {}, "outputs": [], "source": [ "author_emails = ['annefou@geo.uio.no']\n", "contributor_emails = ['jeani@uio.no', 'mantovani@meeo.it']" ] }, { "cell_type": "markdown", "id": "1fbddffc-cc3f-425a-ba76-61bdad445d33", "metadata": {}, "source": [ "### Add the University of Oslo and the Nordic e-Infrastructure Collaboration as publishers " ] }, { "cell_type": "code", "execution_count": null, "id": "e8e7a032-a231-45b5-a05d-493e12eb0260", "metadata": {}, "outputs": [], "source": [ "UiO_organization = {\"org_id\":\"http://www.uio.no/english/\", \n", " \"display_name\": \"University of Oslo\", \n", " \"agent_type\": \"organization\",\n", " \"ror_identifier\":\"01xtthb56\",\n", " \"organization_url\": \"http://www.uio.no/english/\"}" ] }, { "cell_type": "code", "execution_count": null, "id": "62bf9523-3bf9-4348-806f-df641f29032f", "metadata": {}, "outputs": [], "source": [ "NeIC_organization = {\"org_id\":\"https://neic.no/\",\n", " \"display_name\": \"Nordic e-Infrastructure Collaboration\", \n", " \"agent_type\": \"organization\",\n", " \"ror_identifier\":\"04jcwf484\",\n", " \"organization_url\": \"https://neic.no/\"}" ] }, { "cell_type": "code", "execution_count": null, "id": "ce719f3a-eadf-43f1-b5fb-ea6e6116a89f", "metadata": {}, "outputs": [], "source": [ "list_publishers = [UiO_organization, NeIC_organization]" ] }, { "cell_type": "code", "execution_count": null, "id": "2b1f1057-e64f-4263-af61-05fc55b96d90", "metadata": {}, "outputs": [], "source": [ "list_copyright_holders = [UiO_organization]" ] }, { "cell_type": "markdown", "id": "522436d4-bd1c-44d1-a9a4-d76e4620d6bf", "metadata": {}, "source": [ "### Add the funding\n", "- If your work is not funded, set \n", "```\n", "funded_by = {}\n", "```" ] }, { "cell_type": "code", "execution_count": null, "id": "2c1ce214-c7ef-4ca7-a6d8-ed7d13e94e81", "metadata": {}, "outputs": [], "source": [ "funded_by = {\n", 
"\"grant_id\": \"101017502\",\n", "\"grant_Name\": \"RELIANCE\",\n", "\"grant_title\": \"Research Lifecycle Management for Earth Science Communities and Copernicus Users\",\n", "\"funder_name\": \"European Commission\",\n", "\"funder_doi\": \"10.13039/501100000781\",\n", "}" ] }, { "cell_type": "markdown", "id": "d2c308f8-4677-4083-a7d0-e4d859668e62", "metadata": {}, "source": [ "### Choose a license for your FAIR digital object" ] }, { "cell_type": "code", "execution_count": null, "id": "d52d2be0-321d-4979-802a-8c81921ccf06", "metadata": {}, "outputs": [], "source": [ "pip install rohub" ] }, { "cell_type": "code", "execution_count": null, "id": "345038aa-f1e2-4897-bf89-1d1ab3f38462", "metadata": {}, "outputs": [], "source": [ "import rohub" ] }, { "cell_type": "code", "execution_count": null, "id": "a3a20bb8-4d0f-4d64-b006-7fbfe2d1ff78", "metadata": { "tags": [] }, "outputs": [], "source": [ "licenses = rohub.list_available_licenses()\n", "# Update line below to print more licenses\n", "licenses[0:5]" ] }, { "cell_type": "code", "execution_count": null, "id": "b8d1dcf9-22ac-4b9c-875e-857afd01df5a", "metadata": {}, "outputs": [], "source": [ "license = 'MIT'" ] }, { "cell_type": "markdown", "id": "521673e0-e4b9-45e8-9802-197dcb6d9f66", "metadata": {}, "source": [ "### Organize my data using EGI datahub\n", "- Define a prefix for my project (you may need to adjust it for your own usage on your infrastructure). 
\n", " - `input` folder where all the data used as input to my Jupyter Notebook is stored (and possibly shared)\n", " - `output` folder where all the results to keep are stored\n", " - `tool` folder where all the tools, including this Jupyter Notebook, will be copied for sharing\n", "- Create all corresponding folders" ] }, { "cell_type": "markdown", "id": "a5457c11-6238-43d2-a98a-ef0b8bfecd9a", "metadata": {}, "source": [ "### Import Python packages" ] }, { "cell_type": "code", "execution_count": null, "id": "solar-timing", "metadata": {}, "outputs": [], "source": [ "import os\n", "import warnings\n", "import pathlib" ] }, { "cell_type": "code", "execution_count": null, "id": "heard-terminal", "metadata": {}, "outputs": [], "source": [ "warnings.filterwarnings('ignore')" ] }, { "cell_type": "markdown", "id": "47e3232f-8bb1-4f3e-a23d-9fd0e54f5fdb", "metadata": {}, "source": [ "## Initialization\n", "- Choose a country and add its name and country code\n", "- Choose the variable to analyze (PM10, PM25, NO2, O3, etc.)\n", "- Choose the area for your analysis" ] }, { "cell_type": "markdown", "id": "c6303ab9-7790-4411-9483-efc6bda387ef", "metadata": {}, "source": [ "### Choose the country of interest" ] }, { "cell_type": "code", "execution_count": null, "id": "65ebd85f-819a-4208-8136-1054df5e6b9a", "metadata": {}, "outputs": [], "source": [ "country_code = 'ES' \n", "country_fullname = \"Spain\"\n", "town_fullname = 'Madrid' \n", "# Madrid is west of the Greenwich meridian, so the longitude is negative\n", "town_coordinates = {'latitude': 40.4168, 'longitude': -3.7038}\n", "variable_name = 'NO2'\n", "variable_unit = 'µg m-3'\n", "variable_long_name = 'Nitrogen Dioxide'\n", "month_name = 'April'\n", "month_number = '04'\n", "month_nb_days = '30'" ] }, { "cell_type": "markdown", "id": "58dbc448-7309-4d42-be5e-c1e9896aa24c", "metadata": {}, "source": [ "### Geojson for selecting data from ADAM\n", "- The geometry field is extracted from a GeoJSON file, retrieving the value of the \"features\" element.\n", "- To create a geojson file for the area of 
interest, you can use https://geojson.io/\n", "- Then paste the result below in the geojson variable" ] }, { "cell_type": "code", "execution_count": null, "id": "7f127d8f-2822-42df-9969-51ba87ec4c11", "metadata": {}, "outputs": [], "source": [ "geojson = \"\"\"{\"type\": \"FeatureCollection\",\"features\": [{\"type\": \"Feature\",\"properties\": {},\"geometry\": {\"type\": \"Polygon\",\"coordinates\": [[[3.05419921875,42.601619944327965],[-1.69189453125,43.46886761482925],[-8.10791015625,43.866218006556394],[-9.60205078125,43.03677585761058],[-9.11865234375,42.24478535602799],[-9.03076171875,40.245991504199026],[-9.580078125,39.07890809706475],[-9.73388671875,38.70265930723801],[-9.25048828125,38.30718056188316],[-8.942871093749998,38.25543637637947],[-9.052734375,37.142803443716836],[-9.29443359375,36.79169061907076],[-8.10791015625,36.89719446989036],[-7.778320312499999,36.79169061907076],[-7.27294921875,37.07271048132943],[-6.78955078125,36.86204269508728],[-6.17431640625,36.06686213257888],[-5.69091796875,35.90684930677121],[-5.09765625,36.08462129606931],[-4.74609375,36.33282808737917],[-4.10888671875,36.59788913307022],[-3.09814453125,36.54494944148322],[-2.43896484375,36.56260003738545],[-2.04345703125,36.63316209558658],[-1.69189453125,37.16031654673677],[-1.34033203125,37.43997405227057],[-0.439453125,37.49229399862877],[-0.59326171875,37.75334401310656],[-0.37353515625,38.272688535980976],[0.263671875,38.59970036588819],[0.3955078125,38.839707613545144],[0.06591796875,38.94232097947902],[-0.17578125,39.2832938689385],[-0.19775390625,39.58875727696545],[0.24169921874999997,39.977120098439634],[0.68115234375,40.463666324587685],[1.07666015625,40.83043687764923],[1.58203125,41.062786068733026],[2.2412109375,41.178653972331674],[2.83447265625,41.541477666790286],[3.33984375,41.73852846935917],[3.3618164062499996,42.13082130188811],[3.05419921875,42.601619944327965]]]}}]}\"\"\"" ] }, { "cell_type": "markdown", "id": "27155406-3f0b-4302-8122-79889725a110", 
"metadata": {}, "source": [ "### Create folders" ] }, { "cell_type": "code", "execution_count": null, "id": "7e282cc7-748c-4785-af96-1aeed4d334de", "metadata": { "tags": [] }, "outputs": [], "source": [ "WORKDIR_FOLDER = os.path.join(os.environ['HOME'], \"datahub/Reliance/Climate\" + '_' + country_code + '_' + variable_name + '_' + month_name)\n", "print(\"WORKDIR FOLDER: \", WORKDIR_FOLDER)" ] }, { "cell_type": "code", "execution_count": null, "id": "f1a03583-8a6a-4c36-939b-f0ab479dd673", "metadata": {}, "outputs": [], "source": [ "INPUT_DATA_DIR = os.path.join(WORKDIR_FOLDER, 'input')\n", "OUTPUT_DATA_DIR = os.path.join(WORKDIR_FOLDER, 'output')\n", "TOOL_DATA_DIR = os.path.join(WORKDIR_FOLDER, 'tool')\n", "\n", "list_folders = [INPUT_DATA_DIR, OUTPUT_DATA_DIR, TOOL_DATA_DIR]\n", "\n", "for folder in list_folders:\n", " pathlib.Path(folder).mkdir(parents=True, exist_ok=True)" ] }, { "cell_type": "markdown", "id": "18f4f0e0-27f4-4b94-af4d-31c9cd579c12", "metadata": {}, "source": [ "### Geojson file for selecting data from ADAM\n", "- We dissolve geojson in case we have more than one polygon and then save the results into a geojson file" ] }, { "cell_type": "code", "execution_count": null, "id": "c3fe35e9-8160-434d-8a51-2fc09f251756", "metadata": {}, "outputs": [], "source": [ "import cartopy\n", "import geopandas as gpd" ] }, { "cell_type": "code", "execution_count": null, "id": "a6b74f45-b70b-4a4c-b5d6-88ea7cfc7f24", "metadata": {}, "outputs": [], "source": [ "local_path_geom = os.path.join(INPUT_DATA_DIR, country_code.lower() + '.geo.json')\n", "local_path_geom" ] }, { "cell_type": "code", "execution_count": null, "id": "f81c3ab7-cc6f-4b06-9594-b50b28371887", "metadata": {}, "outputs": [], "source": [ "if (pathlib.Path(local_path_geom).exists()):\n", " os.remove(local_path_geom)\n", "f = open(local_path_geom, \"w\")\n", "f.write(geojson)\n", "f.close()" ] }, { "cell_type": "code", "execution_count": null, "id": "245ceacb-ab07-46a4-90f6-03e4c9a5a958", "metadata": 
{ "tags": [] }, "outputs": [], "source": [ "data = gpd.read_file(local_path_geom)" ] }, { "cell_type": "code", "execution_count": null, "id": "91d2b98d-6873-4b3a-8cd4-7a5ad7540813", "metadata": {}, "outputs": [], "source": [ "single_shape = data.dissolve()" ] }, { "cell_type": "markdown", "id": "6e13dc84-8fe6-4b36-a45f-eabe28dd2d90", "metadata": {}, "source": [ "### Show area of interest" ] }, { "cell_type": "code", "execution_count": null, "id": "7f1ca9e4-99cf-4fa4-9f99-af28b7e68a91", "metadata": {}, "outputs": [], "source": [ "single_shape.plot()" ] }, { "cell_type": "code", "execution_count": null, "id": "4957dfeb-eac2-47ab-b821-bab70df6b54d", "metadata": {}, "outputs": [], "source": [ "if (pathlib.Path(local_path_geom).exists()):\n", " os.remove(local_path_geom)" ] }, { "cell_type": "code", "execution_count": null, "id": "98fa511a-9253-450b-b608-784b7bdb341e", "metadata": {}, "outputs": [], "source": [ "single_shape.to_file(local_path_geom, driver='GeoJSON')" ] }, { "cell_type": "markdown", "id": "functional-trade", "metadata": {}, "source": [ "## **Step 2: Authentication** " ] }, { "cell_type": "markdown", "id": "interesting-modification", "metadata": {}, "source": [ "The following lines of code will show the personal **Adam API-Key** of the user and the endpoint currently in use, which provides access to the products in the related catalogue. At the end of the execution, if the authentication process is successful, the personal token and the expiration time should be returned as outputs."
] }, { "cell_type": "code", "execution_count": null, "id": "split-nutrition", "metadata": {}, "outputs": [], "source": [ "pip install adamapi" ] }, { "cell_type": "code", "execution_count": null, "id": "740cf688-4717-4bc5-ab4a-7287496d33ea", "metadata": {}, "outputs": [], "source": [ "adam_key = open(os.path.join(os.environ['HOME'],\"adam-key\")).read().rstrip()" ] }, { "cell_type": "code", "execution_count": null, "id": "norwegian-evans", "metadata": {}, "outputs": [], "source": [ "import adamapi as adam\n", "a = adam.Auth()\n", "\n", "a.setKey(adam_key)\n", "a.setAdamCore('https://reliance.adamplatform.eu')\n", "a.authorize() " ] }, { "cell_type": "markdown", "id": "mathematical-writing", "metadata": { "jupyter": { "outputs_hidden": true }, "tags": [] }, "source": [ "## **Step 3: Datasets Discovery**" ] }, { "cell_type": "markdown", "id": "adult-warning", "metadata": {}, "source": [ "After authorization, the user can browse the whole catalogue, which is returned as a paginated JSON object listing all the available datasets. This operation is executed with the **getDatasets()** function without any argument. A few lines of code are needed to parse the JSON object and extract the names of the datasets. The JSON object can be handled as a Python dictionary." ] }, { "cell_type": "markdown", "id": "premier-thomas", "metadata": {}, "source": [ "### Pre-filter datasets\n", "\n", "We will discover all the available datasets in the ADAM platform but will only print the elements of interest (**EU_CAMS**), i.e. 
[European air quality datasets](https://ads.atmosphere.copernicus.eu/cdsapp#!/dataset/cams-europe-air-quality-forecasts?tab=overview) from the Copernicus Atmosphere Monitoring Service" ] }, { "cell_type": "code", "execution_count": null, "id": "controlled-soccer", "metadata": {}, "outputs": [], "source": [ "def list_datasets(a, search=\"\", dataset_name=\"\"):\n", " datasets = adam.Datasets(a)\n", " catalogue = datasets.getDatasets()\n", " datasetID = None\n", "\n", "# Extracting the size of the catalogue\n", "\n", " total = catalogue['properties']['totalResults']\n", " items = catalogue['properties']['itemsPerPage']\n", " # Round up so the last, possibly partial, page is included\n", " pages = (total + items - 1) // items\n", " \n", " print('\\033[1;34m')\n", " print('----------------------------------------------------------------------')\n", " print( 'List of available datasets:')\n", " print ('\\033[0;0m')\n", "\n", "# Extracting the list of datasets across the whole catalogue\n", "\n", " for i in range(0, pages):\n", " page = datasets.getDatasets(page=i)\n", " for element in page['content']:\n", " if search == \"\" or search in element['title']:\n", " print(element['title'] + \" --> datasetId = \" + element['datasetId'])\n", " if element['datasetId'].split(':')[1] == dataset_name:\n", " datasetID = element['datasetId']\n", " return datasets, datasetID" ] }, { "cell_type": "code", "execution_count": null, "id": "3ddaa981-bb77-4a8a-a610-fedd54d1e288", "metadata": { "tags": [] }, "outputs": [], "source": [ "datasets, datasetID = list_datasets(a, search=\"CAMS\", dataset_name = 'EU_CAMS_SURFACE_' + variable_name + '_G')" ] }, { "cell_type": "markdown", "id": "complimentary-daily", "metadata": {}, "source": [ "We are interested in one variable only, so we will discover the corresponding dataset and print its metadata, showing the data provenance."
] }, { "cell_type": "code", "execution_count": null, "id": "7239ace3-3409-4f76-a147-523aab9bd361", "metadata": {}, "outputs": [], "source": [ "def get_metadata(datasetID, datasets, verbose=False):\n", " print('\\033[1;34m' + 'Metadata of ' + datasetID + ':')\n", " print ('\\033[0;0m')\n", " \n", " paged = datasets.getDatasets(datasetID)\n", " for i in paged.items():\n", " print(\"\\033[1m\" + str(i[0]) + \"\\033[0m\" + ': ' + str(i[1]))\n", " return paged" ] }, { "cell_type": "code", "execution_count": null, "id": "d2eca1f9-4f01-4056-9d59-5568e4023223", "metadata": { "tags": [] }, "outputs": [], "source": [ "metadata_variable = get_metadata(datasetID, datasets, verbose=True)" ] }, { "cell_type": "markdown", "id": "electronic-special", "metadata": {}, "source": [ "## **Step 4: Products Discovery**" ] }, { "cell_type": "markdown", "id": "ahead-terminal", "metadata": {}, "source": [ "The products discovery operation related to a specific dataset is implemented in the Adam API with the **getProducts()** function. A combined **spatial and temporal search** can be requested by specifying the **datasetId** of the selected dataset, the **geometry** argument that specifies the Area Of Interest, and a temporal range defined by `startDate` and `endDate`. The geometry must **always** be defined by a **GeoJson object** that describes the polygon in **counterclockwise winding order**. The optional arguments `startIndex` and `maxRecords` can be used to page through the list of returned results. The results of the search are displayed with their metadata, sorted starting from the most recent product."
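To make the counterclockwise winding requirement concrete: the orientation of a polygon ring can be checked with the shoelace (signed area) formula. A minimal sketch; the helper name `ring_is_counterclockwise` is ours, not part of the Adam API, and the notebook itself relies on `geojson_rewind` for the actual correction:

```python
def ring_is_counterclockwise(ring):
    """Return True if the ring's signed (shoelace) area is positive, i.e. CCW."""
    area = 0.0
    # Pair each vertex with the next one, wrapping around to close the ring
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        area += x1 * y2 - x2 * y1
    return area > 0

# A unit square traversed clockwise, then the same ring reversed
square_cw = [(0, 0), (0, 1), (1, 1), (1, 0)]
print(ring_is_counterclockwise(square_cw))        # False
print(ring_is_counterclockwise(square_cw[::-1]))  # True
```

This is the same convention as the GeoJSON specification (RFC 7946): exterior rings counterclockwise, holes clockwise.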
] }, { "cell_type": "markdown", "id": "twelve-president", "metadata": {}, "source": [ "### Search data" ] }, { "cell_type": "code", "execution_count": null, "id": "hydraulic-opening", "metadata": {}, "outputs": [], "source": [ "pip install geojson_rewind" ] }, { "cell_type": "code", "execution_count": null, "id": "capable-ministry", "metadata": {}, "outputs": [], "source": [ "from geojson_rewind import rewind\n", "import json" ] }, { "cell_type": "markdown", "id": "higher-burns", "metadata": {}, "source": [ "The GeoJson object needs to be rearranged according to the counterclockwise winding order. This operation is executed in the next few lines to obtain \n", "a geometry that meets the requirements of the method. **geom_1** is the final result to be used in the discovery operation." ] }, { "cell_type": "code", "execution_count": null, "id": "offensive-rochester", "metadata": {}, "outputs": [], "source": [ "with open(local_path_geom) as f:\n", " geom_dict = json.load(f)\n", "# Use the rewound geometry, not the original dictionary\n", "output = rewind(geom_dict)\n", "geom_1 = str(output['features'][0]['geometry'])" ] }, { "cell_type": "markdown", "id": "pretty-shark", "metadata": {}, "source": [ "Copernicus air quality analyses are hourly products, but when we select a given date, we will only get the first 10 products. \n", "Below, we make a list of the first 10 available products for the 1st day of the studied month in 2019, i.e. we restrict our search to this date."
] }, { "cell_type": "code", "execution_count": null, "id": "expensive-subsection", "metadata": {}, "outputs": [], "source": [ "start_date = '2019-' + month_number + '-01'\n", "end_date = start_date" ] }, { "cell_type": "code", "execution_count": null, "id": "functioning-gibraltar", "metadata": { "tags": [] }, "outputs": [], "source": [ "search = adam.Search( a )\n", "results = search.getProducts(\n", " datasetID, \n", " geometry=geom_1,\n", " startDate=start_date,\n", " endDate=end_date\n", " )\n", "\n", "# Printing the results\n", "\n", "print('\\033[1;34m' + 'List of available products (maximum 10 products printed):')\n", "print ('\\033[0;0m')\n", "\n", "count = 1\n", "for i in results['content']:\n", " print(\"\\033[1;31;1m\" + \"#\" + str(count))\n", " print ('\\033[0m')\n", " for k in i.items():\n", " print(str(k[0]) + ': ' + str(k[1]))\n", " count = count+1\n", " print('------------------------------------')" ] }, { "cell_type": "markdown", "id": "alone-ordinance", "metadata": {}, "source": [ "## **Step 5: Data Access**" ] }, { "cell_type": "markdown", "id": "brave-occupation", "metadata": {}, "source": [ "After the data discovery operation that retrieves the availability of products in the catalogue, it is possible to access the data with the **getData** function. Each product in the output list intersects the selected geometry, and the following example shows how to access a specific product from the list of results obtained in the previous step. While the **datasetId** is always a mandatory parameter, for each data access request the **getData** function needs only one of the following arguments: **geometry** or **productId**, which is the value of the **_id** field in each product metadata. In the case of a **spatial and temporal search**, the geometry must be provided to the function, together with the time range of interest. 
\n", "The output of the **getData** function is always a **.zip** file containing the data retrieved with the data access request, providing the spatial **subset** of the product. The zip file will contain a geotiff file for each of the spatial subsets extracted in the selected time range." ] }, { "cell_type": "markdown", "id": "collective-estimate", "metadata": {}, "source": [ "#### Define a function to select a time range and get data" ] }, { "cell_type": "code", "execution_count": null, "id": "indie-coaching", "metadata": {}, "outputs": [], "source": [ "def getZipData(auth, dataset_info):\n", " if not (pathlib.Path(pathlib.Path(dataset_info['outputFname']).stem).exists() or pathlib.Path(dataset_info['outputFname']).exists()):\n", " data=adam.GetData(auth)\n", " image = data.getData(\n", " datasetId = dataset_info['datasetID'],\n", " startDate = dataset_info['startDate'],\n", " endDate = dataset_info['endDate'],\n", " geometry = dataset_info['geometry'],\n", " outputFname = dataset_info['outputFname'])\n", " print(image)" ] }, { "cell_type": "markdown", "id": "younger-invalid", "metadata": {}, "source": [ "#### Get variable of interest for each day of the month we study for 2019, 2020 and 2021 (time 00:00:00)\n", "\n", "This process can take a bit of time so be patient!" 
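The end date in the download loop below is built from the hardcoded `month_nb_days` defined in the initialization cell. If you switch to another month, the number of days can instead be derived from the standard library; a sketch (not part of the original notebook), reusing the notebook's `month_number` convention:

```python
import calendar

month_number = '04'
for year in ['2019', '2020', '2021']:
    # monthrange returns (weekday of the 1st, number of days in the month)
    nb_days = calendar.monthrange(int(year), int(month_number))[1]
    end_date = year + '-' + month_number + '-' + str(nb_days)
    print(end_date)  # 2019-04-30, 2020-04-30, 2021-04-30
```

This also handles February's leap years correctly, which a hardcoded value would not.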
] }, { "cell_type": "code", "execution_count": null, "id": "charitable-nightmare", "metadata": {}, "outputs": [], "source": [ "import time\n", "from IPython.display import clear_output" ] }, { "cell_type": "code", "execution_count": null, "id": "sudden-insert", "metadata": { "tags": [] }, "outputs": [], "source": [ "start = time.time()\n", "\n", "for year in ['2019', '2020', '2021']:\n", " datasetInfo = {\n", " 'datasetID' : datasetID,\n", " 'startDate' : year + '-' + month_number + '-01',\n", " 'endDate' : year + '-' + month_number + '-' + month_nb_days,\n", " 'geometry' : geom_1,\n", " 'outputFname' : INPUT_DATA_DIR + '/' + variable_name + '_' + country_code + '_ADAMAPI_' + year + '.zip'\n", " }\n", " getZipData(a, datasetInfo)\n", " \n", "end = time.time()\n", "clear_output(wait=True)\n", "delta1 = end - start\n", "print('\\033[1m'+'Processing time: ' + str(round(delta1,2)) + ' seconds')" ] }, { "cell_type": "markdown", "id": "current-commercial", "metadata": {}, "source": [ "## **Step 6: Data Analysis and Visualization**" ] }, { "cell_type": "markdown", "id": "collectible-tragedy", "metadata": {}, "source": [ "The data retrieved via the Adam API is now available as a zip file that must be unzipped to directly handle the data in a geotiff format. Then, with the Python packages provided in the Jupyter environment, it is possible to process and visualize the requested product."
] }, { "cell_type": "markdown", "id": "separated-temple", "metadata": {}, "source": [ "#### Unzip data" ] }, { "cell_type": "code", "execution_count": null, "id": "eight-willow", "metadata": {}, "outputs": [], "source": [ "import zipfile" ] }, { "cell_type": "code", "execution_count": null, "id": "extensive-checkout", "metadata": {}, "outputs": [], "source": [ "def unzipData(filename, out_prefix):\n", " with zipfile.ZipFile(filename, 'r') as zip_ref:\n", " zip_ref.extractall(path = os.path.join(out_prefix, pathlib.Path(filename).stem))" ] }, { "cell_type": "code", "execution_count": null, "id": "354824b9-cdec-4b4c-a662-0027e4001106", "metadata": {}, "outputs": [], "source": [ "for year in ['2019', '2020', '2021']:\n", " filename = INPUT_DATA_DIR + '/' + variable_name + '_' + country_code + '_ADAMAPI_' + year + '.zip'\n", " target_file = pathlib.Path(os.path.join(INPUT_DATA_DIR, pathlib.Path(pathlib.Path(filename).stem)))\n", " if not target_file.exists():\n", " unzipData(filename, INPUT_DATA_DIR)" ] }, { "cell_type": "markdown", "id": "initial-gregory", "metadata": {}, "source": [ "#### Read data and make a monthly average" ] }, { "cell_type": "code", "execution_count": null, "id": "looking-moderator", "metadata": {}, "outputs": [], "source": [ "import xarray as xr\n", "import xesmf as xe\n", "import glob" ] }, { "cell_type": "markdown", "id": "8f8040ce-ee94-4fbd-ab2c-72c7d72c4e48", "metadata": {}, "source": [ "We need to regrid data." 
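The daily geotiff subsets do not necessarily share exactly the same grid, so the cells below put every field onto a common grid with `xesmf` conservative regridding. As a toy illustration of what regridding does, here is plain NumPy 1-D interpolation along the latitude axis (a sketch with made-up values, not the conservative scheme actually used below):

```python
import numpy as np

# A coarse 3x3 field on a 1-degree latitude grid
coarse_lat = np.array([0.0, 1.0, 2.0])
coarse = np.arange(9.0).reshape(3, 3)

# Target latitudes at 0.5-degree resolution
target_lat = np.linspace(0.0, 2.0, 5)

# Interpolate each longitude column onto the target latitudes
regridded = np.stack(
    [np.interp(target_lat, coarse_lat, coarse[:, j]) for j in range(coarse.shape[1])],
    axis=1,
)
print(regridded.shape)  # (5, 3)
```

Unlike this simple interpolation, the conservative method used below preserves the area-integrated quantity, which matters for concentration fields.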
] }, { "cell_type": "code", "execution_count": null, "id": "fb4571b6-23a5-4850-956c-fff6ec69b8e7", "metadata": {}, "outputs": [], "source": [ "def read_file(filename, variable, metadata, factor=1):\n", " tmp = xr.open_rasterio(filename, parse_coordinates=True)\n", " # Convert our xarray.DataArray into a xarray.Dataset\n", " tmp = tmp.to_dataset('band')*factor\n", " # Rename the dimensions to make it CF-convention compliant\n", " tmp = tmp.rename_dims({'y': 'latitude', 'x':'longitude'})\n", " # Rename the variable to a more useful name\n", " tmp = tmp.rename_vars({1: variable, 'y':'latitude', 'x':'longitude'})\n", " tmp[variable].attrs = {'units' : metadata['units'], 'long_name' : metadata['description']}\n", " return tmp" ] }, { "cell_type": "code", "execution_count": null, "id": "275b77d2-ae4f-41ac-bd23-ef2443a132cd", "metadata": { "tags": [] }, "outputs": [], "source": [ "output_grid = read_file(INPUT_DATA_DIR + '/' + variable_name + '_' + country_code + '_ADAMAPI_2021/eu_cams_surface_' + variable_name.lower() + '_g_2021-' + month_number + '-' + month_nb_days + 't000000.tif', variable_name, metadata_variable)\n", "output_grid" ] }, { "cell_type": "markdown", "id": "twenty-venice", "metadata": {}, "source": [ "We now read these files using `xarray`. First, we make a list of all the geotiff files in a given folder. To ensure each raster is labelled correctly with its time, we can use a helper function `paths_to_datetimeindex()` to extract time information from the file paths we obtained above. 
We then load and concatenate each dataset along the time dimension using `xarray.open_rasterio()`, convert the resulting `xarray.DataArray` to a `xarray.Dataset`, and give the variable a more useful name (here **NO2**, the value of `variable_name`)" ] }, { "cell_type": "code", "execution_count": null, "id": "backed-memory", "metadata": {}, "outputs": [], "source": [ "from datetime import datetime" ] }, { "cell_type": "code", "execution_count": null, "id": "upset-finance", "metadata": {}, "outputs": [], "source": [ "def paths_to_datetimeindex(paths):\n", " return [datetime.strptime(date.split('_')[-1].split('.')[0], '%Y-%m-%dt%f') for date in paths]" ] }, { "cell_type": "code", "execution_count": null, "id": "00ba1ed9-3de0-4a93-bdcd-36a78cbe781a", "metadata": {}, "outputs": [], "source": [ "def getData(dirtif, variable, metadata, factor=1, grid_out=None):\n", " geotiff_list = glob.glob(dirtif)\n", " # Create variable used for time axis\n", " time_var = xr.Variable('time', paths_to_datetimeindex(geotiff_list))\n", " # Load in and concatenate all individual GeoTIFFs\n", " xarray_list = []\n", " if grid_out is not None:\n", " nlats = len(grid_out.latitude.values)\n", " nlons = len(grid_out.longitude.values)\n", " for i in geotiff_list:\n", " tmp = read_file(i, variable, metadata, factor=factor)\n", " if grid_out is not None:\n", " print(\"regridding \", i)\n", " regridder = xe.Regridder(tmp, grid_out, 'conservative')\n", " tmp_regrid = regridder(tmp, keep_attrs=True)\n", " xarray_list.append(tmp_regrid)\n", " else:\n", " xarray_list.append(tmp)\n", " geotiffs_da = xr.concat(xarray_list, dim=time_var)\n", " return geotiffs_da" ] }, { "cell_type": "code", "execution_count": null, "id": "owned-replication", "metadata": { "tags": [] }, "outputs": [], "source": [ "geotiff_ds = getData( INPUT_DATA_DIR + '/' + variable_name + '_'+ country_code + '_ADAMAPI_20*/*.tif', variable_name, metadata_variable, factor=1.e9, grid_out=output_grid)\n", "geotiff_ds[variable_name].attrs = 
{'units' : variable_unit, 'long_name' : variable_long_name }\n", "geotiff_ds" ] }, { "cell_type": "markdown", "id": "compliant-bahrain", "metadata": {}, "source": [ "#### Analyze data" ] }, { "cell_type": "markdown", "id": "indirect-english", "metadata": {}, "source": [ "Make a yearly average for the month we study" ] }, { "cell_type": "code", "execution_count": null, "id": "indie-reconstruction", "metadata": {}, "outputs": [], "source": [ "geotiff_dm = geotiff_ds.groupby('time.year').mean('time', keep_attrs=True)" ] }, { "cell_type": "code", "execution_count": null, "id": "dental-freight", "metadata": { "tags": [] }, "outputs": [], "source": [ "geotiff_dm" ] }, { "cell_type": "markdown", "id": "delayed-clearing", "metadata": {}, "source": [ "#### Visualize data" ] }, { "cell_type": "code", "execution_count": null, "id": "388ee206-daf4-42a2-ae2c-dcdf8acdf71e", "metadata": {}, "outputs": [], "source": [ "pip install cmaps \"holoviews<1.14.8\" GeoViews cartopy" ] }, { "cell_type": "code", "execution_count": null, "id": "known-thong", "metadata": {}, "outputs": [], "source": [ "import cartopy.crs as ccrs\n", "import matplotlib.pyplot as plt\n", "import cmaps" ] }, { "cell_type": "code", "execution_count": null, "id": "c5505520-0c2a-498f-ad12-01c3b2f095ca", "metadata": {}, "outputs": [], "source": [ "# Center the map on the town of interest. 
You may want to change it when plotting over different geographical areas\n", "central_longitude = town_coordinates['longitude']" ] }, { "cell_type": "code", "execution_count": null, "id": "mechanical-lottery", "metadata": {}, "outputs": [], "source": [ "# generate figure\n", "proj_plot = ccrs.Mercator(central_longitude=central_longitude)\n", "\n", "lcmap = cmaps.BlueYellowRed\n", "# Only plot values greater than 0\n", "p = geotiff_dm[variable_name].where(geotiff_dm[variable_name] > 0).plot(x='longitude', y='latitude',\n", " transform=ccrs.PlateCarree(),\n", " subplot_kws={\"projection\": proj_plot},\n", " size=8,\n", " col='year', col_wrap=3, robust=True,\n", " cmap=lcmap, add_colorbar=True)\n", "\n", "# We have to set the map's options on all axes\n", "for ax,i in zip(p.axes.flat, geotiff_dm.year.values):\n", " ax.coastlines()\n", " ax.set_title('Surface ' + variable_name + '\\n' + month_name + ' ' + str(i), fontsize=10)\n", "\n", "plot_file = os.path.join(OUTPUT_DATA_DIR, variable_name + '_' + month_name + '_' + country_code + '_2019-2021.png')\n", "if os.path.exists(plot_file + '.bak'):\n", " os.remove(plot_file + '.bak')\n", "if os.path.exists(plot_file):\n", " os.rename(plot_file, plot_file + '.bak') \n", "plt.savefig(plot_file)" ] }, { "cell_type": "markdown", "id": "toxic-discretion", "metadata": {}, "source": [ "#### Plot one single date" ] }, { "cell_type": "code", "execution_count": null, "id": "legislative-adoption", "metadata": {}, "outputs": [], "source": [ "fig=plt.figure(figsize=(10,10))\n", "# Define the projection\n", "crs=ccrs.PlateCarree()\n", "\n", "# We're using cartopy and are plotting in the Mercator projection \n", "# (see documentation on cartopy)\n", "ax = plt.subplot(1, 1, 1, projection=ccrs.Mercator(central_longitude=central_longitude))\n", "ax.coastlines(resolution='10m')\n", "\n", "# custom colormap\n", "\n", "lcmap = cmaps.BlueYellowRed\n", "\n", "# We need to project our data to the new Mercator projection and for this we use 
`transform`.\n", "# we set the original data projection in transform (here PlateCarree)\n", "# we only plot values greater than 0\n", "img = geotiff_ds[variable_name].where(geotiff_ds[variable_name] > 0).sel(time='2021-' + month_number + '-15').plot(ax=ax,\n", " transform=ccrs.PlateCarree(),\n", " cmap=lcmap) \n", "\n", "# Title for plot\n", "plt.title('Surface ' + variable_name + '\\n 15th ' + month_name + ' 2021 over ' + country_fullname,\n", " fontsize = 16, fontweight = 'bold', pad=10)\n", "\n", "plot_file = os.path.join(OUTPUT_DATA_DIR, variable_name + '_' + month_name +' _' + country_code + '_2021-' + month_number + '-15.png')\n", "if os.path.exists(plot_file + '.bak'):\n", " os.remove(plot_file + '.bak')\n", "if os.path.exists(plot_file):\n", " os.rename(plot_file, plot_file + '.bak') \n", "plt.savefig(plot_file)" ] }, { "cell_type": "code", "execution_count": null, "id": "3b55c774-cc88-4778-a843-d8746dd0e25b", "metadata": {}, "outputs": [], "source": [ "geotiff_ds = geotiff_ds.sortby('time')" ] }, { "cell_type": "markdown", "id": "00b430fb-cd20-4cae-a1c8-f7bab0dc05fd", "metadata": {}, "source": [ "#### Save Data Cube selection into netCDF" ] }, { "cell_type": "code", "execution_count": null, "id": "017e6edb-5478-4471-8036-4afcca4dc3aa", "metadata": {}, "outputs": [], "source": [ "output_file = os.path.join(OUTPUT_DATA_DIR, variable_name + \"_\" + month_name + \"_\" + country_code + \"_2019-2021.nc\")\n", "if os.path.exists(output_file + '.bak'):\n", " os.remove(output_file + '.bak')\n", "if os.path.exists(output_file):\n", " os.rename(output_file, output_file + '.bak') \n", "geotiff_ds.to_netcdf(output_file)" ] }, { "cell_type": "markdown", "id": "04c204a4-3a31-4805-aaf5-1d88cf3f3748", "metadata": {}, "source": [ "## **Step 7: Create Research Object and Share my work**" ] }, { "cell_type": "markdown", "id": "ded3d204-e278-4d46-9d93-a04e5df8956d", "metadata": {}, "source": [ "## Create Research Object in ROHUB" ] }, { "cell_type": "code", "execution_count": 
null, "id": "cfb8d48c-c378-4837-a7cd-99a759ef1c77", "metadata": {}, "outputs": [], "source": [ "pip install rohub" ] }, { "cell_type": "code", "execution_count": null, "id": "24d4ac14-50e1-4b7f-b233-09f66ec892a1", "metadata": {}, "outputs": [], "source": [ "import os\n", "import pathlib\n", "from rohub import rohub, settings" ] }, { "cell_type": "markdown", "id": "95e3211d-783a-40c1-bedf-9b51794e44ca", "metadata": {}, "source": [ "#### Authenticating\n", "\n", "- If the code cell below fails, make sure you have created the two files:\n", " - `rohub-user`: contains your rohub username\n", " - `rohub-pwd`: contains your rohub password" ] }, { "cell_type": "code", "execution_count": null, "id": "98639345-9ea8-457d-9532-3c4aa6031801", "metadata": {}, "outputs": [], "source": [ "rohub_user = open(os.path.join(os.environ['HOME'],\"rohub-user\")).read().rstrip()\n", "rohub_pwd = open(os.path.join(os.environ['HOME'],\"rohub-pwd\")).read().rstrip()" ] }, { "cell_type": "code", "execution_count": null, "id": "5fcd2c68-0a16-4c3d-9087-64d3cb89fd7f", "metadata": { "tags": [] }, "outputs": [], "source": [ "rohub.login(username=rohub_user, password=rohub_pwd)" ] }, { "cell_type": "markdown", "id": "17fc23ec-5e0c-4879-b7de-410b74b90df0", "metadata": {}, "source": [ "## Create a new Executable RO" ] }, { "cell_type": "code", "execution_count": null, "id": "b6160f17-24ed-43dd-9920-e1f72152472d", "metadata": { "tags": [] }, "outputs": [], "source": [ "ro_title = variable_name + ' (' + month_name + ' 2019, 2020, 2021) in ' + country_fullname + \" - Jupyter notebook demonstrating the usage of CAMS European air quality analysis from Copernicus Atmosphere Monitoring with RELIANCE services\"\n", "ro_research_areas = [\"Earth sciences\"]\n", "ro_description = \"This Research Object demonstrates how to use CAMS European air quality analysis from Copernicus Atmosphere Monitoring with RELIANCE services and compute a monthly map of \" + \\\n", " variable_name + \" over a given geographical area, 
here \" + country_fullname\n", "ro = rohub.ros_create(title=ro_title, research_areas=ro_research_areas, \n", " description=ro_description, \n", " use_template=True,\n", " ros_type=\"Executable Research Object\")" ] }, { "cell_type": "markdown", "id": "1b994bfa-3a31-4d73-a58f-6d7d8aa34d51", "metadata": {}, "source": [ "## Show metadata" ] }, { "cell_type": "code", "execution_count": null, "id": "b29df6ed-e245-41d3-bdd9-5a2189d3679f", "metadata": { "tags": [] }, "outputs": [], "source": [ "ro.show_metadata()" ] }, { "cell_type": "markdown", "id": "93ae4992-c839-4cc8-9fe1-0418a0a23363", "metadata": {}, "source": [ "## Add additional authors and/or contributors to our Research Object" ] }, { "cell_type": "code", "execution_count": null, "id": "b6ac1e0d-0cb9-4084-852b-8b8f663dbde2", "metadata": {}, "outputs": [], "source": [ "ro.set_authors(agents=author_emails)" ] }, { "cell_type": "code", "execution_count": null, "id": "baae2136-734d-472c-a56c-a962f44dc420", "metadata": {}, "outputs": [], "source": [ "ro.set_contributors(agents=contributor_emails)" ] }, { "cell_type": "markdown", "id": "d610d675-ce0a-4c28-8f70-cc898cb181ca", "metadata": {}, "source": [ "## Add publisher/copyright holder\n", "\n", "- Use [Research Organization Registry (ROR)](https://ror.org/) to find the identifier of your organization" ] }, { "cell_type": "markdown", "id": "90749caa-1db3-427a-a69e-07e5527aea88", "metadata": {}, "source": [ "### Add publishers " ] }, { "cell_type": "code", "execution_count": null, "id": "b9775742-1c33-45f1-bd3f-b224ad07f616", "metadata": { "tags": [] }, "outputs": [], "source": [ "ro.set_publishers(agents=list_publishers)" ] }, { "cell_type": "code", "execution_count": null, "id": "0f8ecf9d-8390-4d7e-b01a-a07272c7ad9b", "metadata": { "tags": [] }, "outputs": [], "source": [ "ro.set_copyright_holders(agents=list_copyright_holders)" ] }, { "cell_type": "code", "execution_count": null, "id": "a089ac8b-71c2-45ce-9c4f-9b2883bd2e24", "metadata": {}, "outputs": [], "source": 
[ "organizations = rohub.organizations_find()\n", "organizations" ] }, { "cell_type": "markdown", "id": "329a6446-3a61-4cb2-8596-3a647ea5d630", "metadata": {}, "source": [ "## Add RO Funding information" ] }, { "cell_type": "code", "execution_count": null, "id": "c5749469-af20-4bb1-8105-d203daa4225d", "metadata": { "tags": [] }, "outputs": [], "source": [ "if funded_by:\n", " ro.add_funding(grant_identifier=funded_by[\"grant_id\"], grant_name=funded_by[\"grant_Name\"],\n", " funder_name=funded_by[\"funder_name\"], grant_title=funded_by[\"grant_title\"],\n", " funder_doi=funded_by[\"funder_doi\"])" ] }, { "cell_type": "markdown", "id": "9df5260b-45f3-443d-a267-de81347dd029", "metadata": {}, "source": [ "## Add RO license" ] }, { "cell_type": "code", "execution_count": null, "id": "1720bd51-ee45-4092-a88a-f290dd2c7f1b", "metadata": {}, "outputs": [], "source": [ "ro.set_license(license_id=license) " ] }, { "cell_type": "markdown", "id": "d5ff9bb6-e970-4a56-95bf-3eed86b737c4", "metadata": {}, "source": [ "## Aggregate Resources\n", "\n", "- We will be adding all the resources generated by our notebook (data and plots)\n", "- Our data and plots can also be shared in B2DROP so we will get the shared link from B2DROP and add it to our research object" ] }, { "cell_type": "markdown", "id": "e9321b46-de48-4f9e-a729-3cc757f99473", "metadata": {}, "source": [ "### List RO folders for this type of RO" ] }, { "cell_type": "code", "execution_count": null, "id": "b8f82ad3-348f-424b-a239-3818ba3e4280", "metadata": { "tags": [] }, "outputs": [], "source": [ "myfolders = ro.list_folders()\n", "myfolders" ] }, { "cell_type": "markdown", "id": "0f4c1732-7431-4c44-b61e-e57b535367d3", "metadata": {}, "source": [ "## Aggregate internal resources" ] }, { "cell_type": "markdown", "id": "6200c842-c217-44a5-8ecf-7240c2ebb20b", "metadata": {}, "source": [ "### Add sketch to my RO" ] }, { "cell_type": "code", "execution_count": null, "id": "0908d97c-7db0-4527-9cac-96a135779e1a", "metadata": 
{}, "outputs": [], "source": [ "res_file_path = os.path.join(OUTPUT_DATA_DIR, variable_name + '_' + month_name + '_' + country_code + '_2019-2021.png')\n", "res_res_type = \"Sketch\"\n", "res_title = variable_long_name + \" [\" + variable_unit + \"] over \" + country_fullname + \" for \" + month_name + \" 2019, 2020 and 2021\"\n", "res_description = \"Monthly average maps of CAMS \" + variable_long_name + \" [\" + variable_unit + \"] over \" + country_fullname + \" in 2019, 2020 and 2021\"\n", "res_folder = 'output'\n", "\n", "ro.add_internal_resource(res_type=res_res_type, file_path=res_file_path, title=res_title, description=res_description, folder=res_folder)" ] }, { "cell_type": "markdown", "id": "f3794372-e0df-48c0-863d-3c20d8bdbcd8", "metadata": {}, "source": [ "### Add conda environment to my RO" ] }, { "cell_type": "code", "execution_count": null, "id": "0fd00c09-8090-4072-ba6d-a29b2517851f", "metadata": {}, "outputs": [], "source": [ "def copy_conda_env(local_conda_path, shared_conda_path):\n", " bkfile = shared_conda_path + '.bak'\n", " if os.path.exists(bkfile):\n", " os.remove(bkfile)\n", " if os.path.exists(shared_conda_path):\n", " os.rename(shared_conda_path, bkfile)\n", " shutil.copy2(local_conda_path, shared_conda_path)" ] }, { "cell_type": "code", "execution_count": null, "id": "0fbe78ff-394f-4a92-93d9-fe75611a4ef0", "metadata": {}, "outputs": [], "source": [ "import datetime\n", "import shutil" ] }, { "cell_type": "code", "execution_count": null, "id": "5d13ae20-855f-469e-a046-403b3327429e", "metadata": {}, "outputs": [], "source": [ "for os_name, conda_filename in zip(['', 'linux-64', 'osx-64'], ['cams-conda.yml', 'conda-lock-linux-64.yml', 'conda-lock-osx-64.yml']):\n", " local_conda_path = os.path.join('./', conda_filename)\n", " shared_conda_path = os.path.join(TOOL_DATA_DIR, conda_filename)\n", " copy_conda_env(local_conda_path, shared_conda_path)\n", " \n", " res_file_path = os.path.join(INPUT_DATA_DIR, shared_conda_path)\n", " res_res_type 
= \"Script\"\n", " res_title = 'Conda environment ' + os_name\n", " if os_name == \"\":\n", " res_description = \"Conda environment used on EGI notebook on \" + datetime.date.today().strftime(\"%d/%m/%Y\")\n", " else:\n", " res_description = \"Conda environment generated with conda-lock for \" + os_name\n", " res_folder = 'input'\n", "\n", " ro.add_internal_resource(res_type=res_res_type, file_path=res_file_path, title=res_title, description=res_description, folder=res_folder)" ] }, { "cell_type": "markdown", "id": "e8b80472-19c3-4e21-a3ae-2b606d918ff2", "metadata": {}, "source": [ "## Aggregate external resources" ] }, { "cell_type": "markdown", "id": "e855b117-90d5-468a-92f7-2a24c6573f90", "metadata": {}, "source": [ "### Get shared link from datahub\n", "1. Retrieve your client identifier, client secret and refresh token from https://aai.egi.eu/fedcloud/\n", "2. Create a new file in your HOME area for instance using nano (keep the `$` character in front of HOME; this is meant to be an environment variable):\n", "```\n", "nano $HOME/egi_fedcloud.cfg\n", "```\n", "3. Paste your client identifier, client secret and refresh token using the following syntax (do not forget `{` and `}` as well as the commas `,` and colons `:`):\n", "```\n", "{\n", "\"id\": \"XXXXXXXXXXXXXXXXX\",\n", "\"secret\": \"YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY\",\n", "\"token\": \"ffffffffffffffffffffffff\"\n", "}\n", "```\n", "4. You are ready to go. 
If you have issues, check first that the syntax is correct in your egi_fedcloud.cfg, that this file is located in the correct folder and then finally that your refresh token is still valid (check on https://aai.egi.eu/fedcloud/)" ] }, { "cell_type": "code", "execution_count": null, "id": "34371d8a-5192-4356-a3c3-040b0de3e469", "metadata": {}, "outputs": [], "source": [ "import os\n", "import json\n", "import requests" ] }, { "cell_type": "markdown", "id": "3c6ab5eb-e1cb-41b1-b899-4891504bb7fa", "metadata": {}, "source": [ "#### EGI Datahub functions to initialize EGI datahub and get shared link" ] }, { "cell_type": "code", "execution_count": null, "id": "dfadd3d8-2d04-4ca5-aa43-dec5b5ebf418", "metadata": {}, "outputs": [], "source": [ "def egi_datahub_init():\n", " egi_fedcloud_filename = os.path.join(os.environ['HOME'], 'egi_fedcloud.cfg')\n", " with open(egi_fedcloud_filename, 'r') as convert_file:\n", " tmp = convert_file.read()\n", " egi_fedcloud_auth = json.loads(tmp)\n", " try:\n", " # Retrieving an OIDC token from Check-in\n", " data = {\n", " 'client_id': egi_fedcloud_auth['id'],\n", " 'client_secret': egi_fedcloud_auth['secret'],\n", " 'refresh_token': egi_fedcloud_auth['token'],\n", " 'scope': 'openid email profile',\n", " 'grant_type': 'refresh_token'\n", " }\n", " response = requests.post('https://aai.egi.eu/oidc/token', data=data, auth=(egi_fedcloud_auth['id'], \n", " egi_fedcloud_auth['secret']))\n", " #print(json.dumps(response.json(), indent=2))\n", " EGItoken = response.json()['access_token']\n", " headers = {\n", " 'X-Auth-Token': f\"egi:\" + EGItoken,\n", " 'Content-type': 'application/json',\n", " }\n", " # get current timestamp\n", " ts = datetime.datetime.now().timestamp()\n", " data = json.dumps({ \n", " 'name': 'REST and CDMI access token ' + str(ts), \n", " 'type': { \n", " 'accessToken': {} \n", " }, \n", " 'caveats': [ { \n", " 'type': 'interface', \n", " 'interface': 'rest' \n", " }] \n", " })\n", "\n", " response = 
requests.post('https://datahub.egi.eu/api/v3/onezone/user/tokens/named', headers=headers, data=data)\n", " DATAHUB_TOKEN = response.json()['token']\n", " return DATAHUB_TOKEN\n", " except:\n", " print(\"EGI Datahub Authentication problem: check your credentials\")" ] }, { "cell_type": "code", "execution_count": null, "id": "e348c819-cb8d-4457-ae3c-6e5f5c40f85d", "metadata": {}, "outputs": [], "source": [ "def egi_datahub_getlink(datahub_token, filename):\n", " bname = os.path.basename(filename)\n", " datahub_remote_prefix = 'https://cesnet-oneprovider-01.datahub.egi.eu/api/v3/oneprovider/files/'\n", " hname = filename.split('datahub/')[1]\n", " datahub_location = os.path.join(datahub_remote_prefix, hname)\n", " print(datahub_location)\n", " headers = { 'X-Auth-Token': datahub_token }\n", " response = requests.get(datahub_location, headers=headers)\n", " dh_fileid = response.json()[0]['id']\n", " \n", " headers = { 'X-Auth-Token': datahub_token, 'Content-Type': 'application/json',}\n", " data = json.dumps({ 'name': bname,\n", " 'fileId': dh_fileid\n", " })\n", " response = requests.post('https://cesnet-oneprovider-01.datahub.egi.eu/api/v3/oneprovider/shares', headers=headers, data=data)\n", " # print(json.dumps(response.json(), indent=2))\n", " shareIdGenerated=response.json()['shareId']\n", " \n", " headers = {'X-Auth-Token': datahub_token}\n", " response = requests.get('https://cesnet-oneprovider-01.datahub.egi.eu/api/v3/oneprovider/shares/'+shareIdGenerated, headers=headers) \n", " # print(json.dumps(response.json(), indent=2))\n", " publicURL = response.json()['publicUrl']\n", " return publicURL" ] }, { "cell_type": "markdown", "id": "e751890a-e99b-4afd-94c0-5d390bfce105", "metadata": {}, "source": [ "### EGI DataHub initialization " ] }, { "cell_type": "code", "execution_count": null, "id": "837cbef8-1f87-435f-89d4-598df91c22d9", "metadata": {}, "outputs": [], "source": [ "DATAHUB_TOKEN = egi_datahub_init()" ] }, { "cell_type": "markdown", "id": 
"2d6e6e90-4f5a-4bb0-a4b6-caf96debaf88", "metadata": {}, "source": [ "## Add inputs to my RO\n", "- I used ADAM to retrieve relevant data but I will be sharing what I retrieved from the data cube so that my collaborators do not have to re-download the same input data again." ] }, { "cell_type": "markdown", "id": "80b71bd0-b1ca-40e1-b389-5158a2505454", "metadata": {}, "source": [ "#### Geojson file used for retrieving data from ADAM data-cube" ] }, { "cell_type": "code", "execution_count": null, "id": "7009d65a-a7e9-4165-91b5-22fa3e22854c", "metadata": {}, "outputs": [], "source": [ "shared_input_path = os.path.join(INPUT_DATA_DIR, country_code.lower() + '.geo.json')\n", "print(shared_input_path)\n", "res_file_url = egi_datahub_getlink(DATAHUB_TOKEN, shared_input_path)\n", "res_type = \"Dataset\"\n", "res_title = \"Geojson for \" + country_fullname\n", "res_description = \"Geojson file used for retrieving data from the ADAM platform over \" + country_fullname\n", "res_folder = 'input'\n", "ro.add_external_resource(res_type=res_type, input_url=res_file_url, title=res_title, description=res_description, folder=res_folder)" ] }, { "cell_type": "markdown", "id": "3321bbac-3d1a-48b4-91a0-5ba0cd3e3435", "metadata": {}, "source": [ "#### Input data retrieved from ADAM data-cube" ] }, { "cell_type": "code", "execution_count": null, "id": "e48c303a-f07e-47c3-aa9c-9f6b89e4a2f1", "metadata": {}, "outputs": [], "source": [ "for year in ['2019', '2020', '2021']:\n", " shared_input_path = os.path.join(INPUT_DATA_DIR, variable_name + '_' + country_code + '_ADAMAPI_' + year + '.zip')\n", " print(shared_input_path)\n", " res_file_url = egi_datahub_getlink(DATAHUB_TOKEN, shared_input_path)\n", " res_type = \"Data Cube Product\"\n", " res_title = \"Data-Cube from ADAM platform over \" + country_fullname + \" in \" + month_name + \" \" + year\n", " res_description = \"This dataset is a data cube retrieved from the ADAM platform over \" + country_fullname + \" in \" + month_name + \" \" 
+ year\n", " res_folder ='input'\n", " ro.add_external_resource(res_type=res_type, input_url=res_file_url, title=res_title, description=res_description, folder=res_folder)" ] }, { "cell_type": "markdown", "id": "30a76383-0dd3-4be5-832e-23ddfc39618e", "metadata": {}, "source": [ "### Add our Jupyter Notebook to our RO\n", "- Make a copy of the current notebook to the tool folder for sharing as an external resource" ] }, { "cell_type": "code", "execution_count": null, "id": "2ea55433-5a43-4146-8f2f-ad6a3b6203fa", "metadata": {}, "outputs": [], "source": [ "notebook_filename = 'RELIANCE_' + country_fullname + '_' + variable_name + '_month.ipynb'\n", "local_notebook_path = os.path.join('./', notebook_filename)\n", "shared_notebook_path = os.path.join(TOOL_DATA_DIR, notebook_filename)" ] }, { "cell_type": "code", "execution_count": null, "id": "fe99f43a-af32-48f3-907c-951e6e08af07", "metadata": {}, "outputs": [], "source": [ "shared_notebook_path" ] }, { "cell_type": "markdown", "id": "f2de8314-561a-4100-b0d4-b6e6e6b34351", "metadata": {}, "source": [ "#### Copy current notebook to shared datahub " ] }, { "cell_type": "code", "execution_count": null, "id": "08ba4a2c-76d7-470c-b85e-3965a7ff3de9", "metadata": {}, "outputs": [], "source": [ "bkfile = shared_notebook_path + '.bak'\n", "if os.path.exists(bkfile):\n", " os.remove(bkfile)\n", "if os.path.exists(shared_notebook_path):\n", " os.rename(shared_notebook_path, bkfile)\n", "shutil.copy2(local_notebook_path, shared_notebook_path)" ] }, { "cell_type": "markdown", "id": "d0f4098b-e8b1-49e4-851c-8d164380f3d4", "metadata": {}, "source": [ "### Create a shared link for my Jupyter Notebook" ] }, { "cell_type": "markdown", "id": "792f515b-988b-44d2-a555-57f961c166cc", "metadata": {}, "source": [ "#### Get public URL from EGI Datahub to share link in RO " ] }, { "cell_type": "code", "execution_count": null, "id": "ae96b55c-f027-410a-bd31-17590b2baf6c", "metadata": { "tags": [] }, "outputs": [], "source": [ "res_file_url = 
egi_datahub_getlink(DATAHUB_TOKEN, shared_notebook_path)\n", "res_type = \"Jupyter Notebook\"\n", "res_title = \"Jupyter Notebook of CAMS European air quality analysis from Copernicus Atmosphere Monitoring with RELIANCE services - Applied over \" + country_fullname + \" and variable \" + variable_long_name\n", "res_description = \"Jupyter Notebook for discovering, accessing and processing RELIANCE data cube, creating a Research Object with results, and finally publishing it on Zenodo\"\n", "res_folder = 'tool'\n", "ro.add_external_resource(res_type=res_type, input_url=res_file_url, title=res_title, description=res_description, folder=res_folder)" ] }, { "cell_type": "markdown", "id": "aa8b2477-3695-453d-94e3-98a76f3ce3c9", "metadata": {}, "source": [ "#### Add link to the Jupyter notebook on GitHub" ] }, { "cell_type": "code", "execution_count": null, "id": "65796b25-3c73-430e-a57a-9e91166ab157", "metadata": {}, "outputs": [], "source": [ "res_file_url = 'https://raw.githubusercontent.com/NordicESMhub/RELIANCE/main/content/demo-eosc-future/RELIANCE_Spain_NO2_month.ipynb'\n", "res_type = \"Jupyter Notebook\"\n", "res_title = \"Jupyter Notebook (GitHub) of CAMS European air quality analysis from Copernicus Atmosphere Monitoring with RELIANCE services - Applied over \" + country_fullname + \" and variable \" + variable_long_name\n", "res_description = \"Jupyter Notebook (stored on GitHub) for discovering, accessing and processing RELIANCE data cube, creating a Research Object with results, and finally publishing it on Zenodo\"\n", "res_folder = 'tool'\n", "ro.add_external_resource(res_type=res_type, input_url=res_file_url, title=res_title, description=res_description, folder=res_folder)" ] }, { "cell_type": "markdown", "id": "640d037c-c57b-4846-af37-813a2b8fe2cb", "metadata": {}, "source": [ "#### Add link to online rendered Jupyter notebook" ] }, { "cell_type": "code", "execution_count": null, "id": "3286bbe6-9594-4a02-93de-141b37875140", "metadata": {}, "outputs": [], 
"source": [ "res_file_url = 'https://nordicesmhub.github.io/RELIANCE/demo-eosc-future/RELIANCE_Spain_NO2_month.html'\n", "res_type = \"Result\"\n", "res_title = \"Online rendered Jupyter Notebook of CAMS European air quality analysis from Copernicus Atmosphere Monitoring with RELIANCE services - Applied over \" + country_fullname + \" and variable \" + variable_long_name\n", "res_description = \"Online rendered Jupyter Notebook for discovering, accessing and processing RELIANCE data cube, creating a Research Object with results, and finally publishing it on Zenodo\"\n", "res_folder = 'tool'\n", "ro.add_external_resource(res_type=res_type, input_url=res_file_url, title=res_title, description=res_description, folder=res_folder)" ] }, { "cell_type": "markdown", "id": "174a59a9-528d-477d-95c2-d6a61a00bca9", "metadata": {}, "source": [ "#### Add first plot as Image to my Research Object (external resource from EGI Datahub)" ] }, { "cell_type": "code", "execution_count": null, "id": "bb7f9de9-cb74-4ddf-a525-113145d1a2c7", "metadata": { "tags": [] }, "outputs": [], "source": [ "shared_plot_path = os.path.join(OUTPUT_DATA_DIR, variable_name + '_' + month_name + '_' + country_code + '_2019-2021.png')\n", "print(shared_plot_path)\n", "res_file_url = egi_datahub_getlink(DATAHUB_TOKEN, shared_plot_path)\n", "res_type = \"Image\"\n", "res_title = variable_long_name + \" [\" + variable_unit + \"] over \" + country_fullname + \" for \" + month_name + \" 2019, 2020 and 2021\"\n", "res_description = \"Monthly average maps of CAMS \" + variable_long_name + \" [\" + variable_unit + \"] over \" + country_fullname + \" in 2019, 2020 and 2021\"\n", "res_folder = 'output'\n", "ro.add_external_resource(res_type=res_type, input_url=res_file_url, title=res_title, description=res_description, folder=res_folder)" ] }, { "cell_type": "markdown", "id": "4233eaa8-a6bd-4d86-b6fd-e5ed675ac944", "metadata": {}, "source": [ "#### Add second plot as Image to my Research Object (external resource from 
EGI Datahub)" ] }, { "cell_type": "code", "execution_count": null, "id": "c44512c5-a989-45a8-9b47-a66c8e0bc3c1", "metadata": {}, "outputs": [], "source": [ "shared_plot_path = os.path.join(OUTPUT_DATA_DIR, variable_name + '_' + month_name + ' _' + country_code + '_2021-' + month_number + '-15.png')\n", "res_file_url = egi_datahub_getlink(DATAHUB_TOKEN, shared_plot_path)\n", "res_type = \"Image\"\n", "res_title = variable_long_name + \" [\" + variable_unit + \"] over \" + country_fullname + \" on \" + month_name + \" 15, 2021\"\n", "res_description=\"Daily average maps of CAMS \" + variable_long_name + \" [\" + variable_unit + \"] over \" + country_fullname + \" on \" + month_name + \" 15, 2021\"\n", "res_folder = 'output'\n", "ro.add_external_resource(res_type=res_type, input_url=res_file_url, title=res_title, description=res_description, folder=res_folder)" ] }, { "cell_type": "markdown", "id": "92b3e6eb-14a9-4f65-936f-2eb061688b4f", "metadata": {}, "source": [ "#### Add netCDF file corresponding to Data cube selection (external resource from EGI Datahub)" ] }, { "cell_type": "code", "execution_count": null, "id": "dd008fe4-c185-4567-a717-1ba03dee4bc9", "metadata": {}, "outputs": [], "source": [ "shared_plot_path = os.path.join(OUTPUT_DATA_DIR, variable_name + '_' + month_name + '_' + country_code + '_2019-2021.nc')\n", "res_file_url = egi_datahub_getlink(DATAHUB_TOKEN, shared_plot_path)\n", "res_type=\"Result\"\n", "res_title=\"netCDF data for daily \" + variable_name + \" over \" + country_fullname + \" in \" + month_name + \" 2019, 2020 and 2021\"\n", "res_description=\"netCDF data corresponding to daily average of CAMS \" + variable_long_name + \" [\" + variable_unit + \"] over \" + country_fullname + \" for \" + month_name + \" 2019, \" + month_name + \" 2020 and \" + month_name + \" 2021\"\n", "res_folder = 'output'\n", "ro.add_external_resource(res_type=res_type, input_url=res_file_url, title=res_title, description=res_description, folder=res_folder)" ] }, { 
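"cell_type": "markdown", "id": "bak-helper-leadin", "metadata": {}, "source": [ "The save and copy cells in this notebook all repeat the same remove-then-rename `.bak` rotation before writing a file. That pattern can be factored into a single helper; the cell below is a minimal sketch (`backup_and_prepare` is a hypothetical convenience function, not part of the rohub or ADAM APIs):" ] }, { "cell_type": "code", "execution_count": null, "id": "bak-helper-sketch", "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "def backup_and_prepare(path):\n", "    # Rotate any existing file to path + '.bak' so path can be rewritten safely\n", "    bkfile = path + '.bak'\n", "    if os.path.exists(bkfile):\n", "        os.remove(bkfile)\n", "    if os.path.exists(path):\n", "        os.rename(path, bkfile)\n", "    return path\n", "\n", "# With such a helper, each save reduces to e.g. plt.savefig(backup_and_prepare(plot_file))" ] }, { 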
"cell_type": "markdown", "id": "9fd455a2-6217-45c7-a69f-1249e82780be", "metadata": {}, "source": [ "## Additional metadata for the RO" ] }, { "cell_type": "markdown", "id": "8503028a-155b-4f37-a357-81cd08a23b79", "metadata": {}, "source": [ "### Add geolocation to my Research Object\n", "- We need to transform our geojson file into geojson-ld" ] }, { "cell_type": "code", "execution_count": null, "id": "bf9eb7fe-c60c-46e7-9683-208dfc656dc4", "metadata": {}, "outputs": [], "source": [ "from geojson_rewind import rewind\n", "import json" ] }, { "cell_type": "code", "execution_count": null, "id": "fa9ea80e-df4c-4e01-97a3-c1d665e3a540", "metadata": {}, "outputs": [], "source": [ "geojson_ld_file = os.path.join(INPUT_DATA_DIR, country_code.lower() + '.geo-ld.json')\n", "bkfile = geojson_ld_file + '.bak'\n", "if os.path.exists(bkfile):\n", " os.remove(bkfile)\n", "if os.path.exists(geojson_ld_file):\n", " os.rename(geojson_ld_file, bkfile)\n", "shutil.copy2(os.path.join(INPUT_DATA_DIR, country_code.lower() + '.geo.json'), geojson_ld_file)\n", "\n", "with open(geojson_ld_file , 'r+') as f:\n", " data = json.load(f)\n", " output = rewind(data)\n", " output['@context'] = { \"geojson\": \"https://purl.org/geojson/vocab#\" } \n", " f.seek(0) \n", " json.dump(output, f, indent=None)\n", " f.truncate()" ] }, { "cell_type": "code", "execution_count": null, "id": "96583bed-485e-44b7-9f28-a427af5f73cf", "metadata": {}, "outputs": [], "source": [ "geolocation_file_path = os.path.join(INPUT_DATA_DIR, country_code.lower() + '.geo-ld.json')\n", "ro.add_geolocation(body_specification_json=geolocation_file_path)" ] }, { "cell_type": "markdown", "id": "cead124a-adc1-43f0-b7fa-d846df3c9d8b", "metadata": {}, "source": [ "### Add tags" ] }, { "cell_type": "code", "execution_count": null, "id": "9553b056-37c4-4366-a6dd-ba30360b39df", "metadata": {}, "outputs": [], "source": [ "ro.add_keywords(keywords=[country_fullname, \"CAMS\", \"air quality\", \"copernicus\", variable_name, 
\"jupyter-notebook\"])" ] }, { "cell_type": "markdown", "id": "5a3a8e18-c457-4577-b30f-4dfad3448997", "metadata": {}, "source": [ "### Export to RO-crate" ] }, { "cell_type": "code", "execution_count": null, "id": "7e35fe3b-c7fd-48f4-9549-e1d438eabf01", "metadata": {}, "outputs": [], "source": [ "ro.export_to_rocrate(filename=\"climate_EU-CAMS_\" + country_code + \"_\" + variable_name + \"_ro-crate\", use_format=\"zip\")" ] }, { "cell_type": "markdown", "id": "5899f543-fc33-4bfc-aba8-fe2351efbf8a", "metadata": { "tags": [ "EOSC", "Jupyter", "Notebook", "Pangeo" ] }, "source": [ "### Take a snapshot of my RO" ] }, { "cell_type": "code", "execution_count": null, "id": "57ac1fb9-fdbf-4325-8095-a8fcc1bedb9d", "metadata": {}, "outputs": [], "source": [ "#snapshot_id=ro.snapshot()" ] }, { "cell_type": "markdown", "id": "23626464-5bc4-458f-9826-f106b22c159a", "metadata": {}, "source": [ "### Archive and publish to Zenodo, optionally assign DOI " ] }, { "cell_type": "code", "execution_count": null, "id": "f55b77a8-dcef-4544-8681-cb894951b156", "metadata": {}, "outputs": [], "source": [ "snapshot_title=\"Jupyter Notebook Analysing the Air quality during Covid-19 pandemic using Copernicus Atmosphere Monitoring Service - Applied over \" + country_fullname + ' (' + month_name + \" 2019, 2020, 2021) with \" + variable_long_name\n", "#snapshot_id_pub=ro.snapshot(title=snapshot_title, create_doi=True, publication_services=[\"Zenodo\"])\n", "#snapshot_id_pub" ] }, { "cell_type": "markdown", "id": "33b6ef6e-d811-40e2-9852-e92925324939", "metadata": {}, "source": [ "### Load the published Research Object" ] }, { "cell_type": "code", "execution_count": null, "id": "0c96bf47-7bd6-49db-8107-0267210beaec", "metadata": { "tags": [] }, "outputs": [], "source": [ "#published_ro = rohub.ros_load(identifier=snapshot_id)" ] }, { "cell_type": "markdown", "id": "5f36bdf4-06a0-4511-a376-3a5d5fe99c37", "metadata": {}, "source": [ "### Show the DOI and get the link" ] }, { "cell_type": "code", 
"execution_count": null, "id": "0cb0e238-72cb-4817-a531-7a035e503046", "metadata": {}, "outputs": [], "source": [ "#published_ro.show_publication()" ] }, { "cell_type": "markdown", "id": "a6ccc5b0-23bb-4159-833a-57fb1b45f4f4", "metadata": {}, "source": [ "### Fork and reuse existing RO to create derivative work" ] }, { "cell_type": "code", "execution_count": null, "id": "9d4482db-b29b-417a-9654-3cb9197566e6", "metadata": {}, "outputs": [], "source": [ "#fork_id=ro.fork(title=\"Forked Jupyter Notebook to analyze the Air quality during Covid-19 pandemic using Copernicus Atmosphere Monitoring Service\")\n", "#forked_ro = rohub.ros_load(identifier=fork_id)\n", "#forked_ro.show_metadata()" ] }, { "cell_type": "code", "execution_count": null, "id": "29a825b8-63df-4d91-9ffc-aa3cc295d977", "metadata": {}, "outputs": [], "source": [ "# Uncomment to delete this Research Object when cleaning up\n", "#ro.delete()" ] }, { "cell_type": "code", "execution_count": null, "id": "428dea99-ba3b-46c2-b756-2d53745ceacb", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" } }, "nbformat": 4, "nbformat_minor": 5 }