{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Cosmos-UK Soil Moisture (UKCEH)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Context\n", "### Purpose\n", "To load and visualise daily hydrometeorological and soil data from the 2013-2019 public COSMOS-UK dataset {cite:p}`cosmosuk_data21`.\n", "\n", "### Sensor description\n", "Since 2013 the UK Centre for Ecology & Hydrology (UKCEH) has established the world’s most spatially dense national network of cosmic-ray neutron sensors (CRNSs) {cite:p}`Zredaetal2012_cosmosreview` to monitor soil moisture across the UK. The Cosmic-ray Soil Moisture Observing System for the UK (COSMOS-UK) delivers field-scale soil water volumetric content (VWC) measurements for around 50 sites in near-real time. In addition to measuring field-scale (or local) soil moisture, the network collects a large number of hydrometeorological and soil data variables, including VWC measured by point-scale (or site) soil moisture sensors {cite:p}`Evans2016_cosmosuk`.\n", "\n", "This notebook explores a subset of 4 out of 51 stations available in the public COSMOS-UK dataset {cite:p}`cosmosuk_data21`. These stations represent the first sites to prototype COSMOS sensors in the UK, see further details in Evans *et al.* (2016) and they are situated in human-intervened areas (grassland and cropland), except for one in a woodland land cover site.\n", "\n", "The media below, available in the UKCEH YouTube channel, summarises the concept of cosmic-ray neutron sensors and how they provide non-invasive soil moisture measurments at field scale. \n", "\n", ":::{iframe} https://www.youtube.com/embed/3roY_cHsn9c\n", ":width: 100%\n", "COSMOS-UK using cosmic-ray neutron sensors to monitor soil moisture. Source: UKCEH.\n", ":::\n", "\n", "### Highlights\n", "* Fetch COSMOS-UK dataset files through `intake`.\n", "* Inspect the available metadata with information about the sites, their locations and other site-specific attributes.\n", "* Explore relationships between daily mean soil moisture and potential evapotranspiration derived from the meteorological measurements at the site.\n", "* Analyse yearly change of daily mean soil moisture observations.\n", "* Compare local and site soil moisture measurements at daily resolution.\n", "\n", "### Contributions\n", "\n", "#### Dataset originator/creator\n", "* UK Centre for Ecology & Hydrology (creator)\n", "* Natural Environment Research Council (support)\n", "\n", ":::{important}\n", "Data from COSMOS-UK up to the end of 2019 are available for download from the UKCEH Environmental Information Data Centre (EIDC). The data are accompanied by documentation that describes the site-specific instrumentation, data and processing including quality control. The dataset is [available for download](https://doi.org/10.5285/b5c190e4-e35d-40ea-8fbe-598da03a1185) under the terms of the Open Government Licence.\n", "\n", "COSMOS-UK work was supported by the Natural Environment Research Council award number NE/R016429/1 as part of the UK-SCAPE programme delivering National Capability.\n", ":::" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load libraries" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "hide-input" ] }, "outputs": [], "source": [ "import os\n", "import pandas as pd\n", "import intake\n", "import holoviews as hv\n", "import panel as pn\n", "import matplotlib.pyplot as plt\n", "from bokeh.models.formatters import DatetimeTickFormatter\n", "from datetime import datetime\n", "\n", "import hvplot.pandas\n", "import hvplot.xarray # noqa\n", "\n", "import pooch\n", "\n", "import warnings\n", "warnings.filterwarnings(action='ignore')\n", "\n", "pd.options.display.max_columns = 10\n", "hv.extension('bokeh')\n", "pn.extension()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Set project structure" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "notebook_folder = './notebook'\n", "if not os.path.exists(notebook_folder):\n", " os.makedirs(notebook_folder)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fetch and load data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's download the sample data. We use pooch to fetch and unzip them directly from a Zenodo repository." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pooch.retrieve(\n", " url=\"doi:10.5281/zenodo.6567018/subset_COSMOS-UK_HydroSoil_Daily_2013-2019.zip\",\n", " known_hash=\"md5:3755cb069bc48c5efc081905110e169b\",\n", " processor=pooch.Unzip(extract_dir=os.path.join(notebook_folder,'data')),\n", " path=f\".\",\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Load an intake catalog for the downloaded data" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# set catalogue location\n", "catalog_file = os.path.join(notebook_folder, 'catalog.yaml')\n", "\n", "with open(catalog_file, 'w') as f:\n", " f.write('''\n", "sources:\n", " data_siteid:\n", " driver: intake.source.csv.CSVSource\n", " parameters:\n", " stationid:\n", " description: five letter code for the COSMOS-UK site\n", " type: str\n", " default: CHIMN\n", " resolution:\n", " description: temporal resolution\n", " type: str\n", " default: Daily\n", " allowed:\n", " - Daily\n", " - Hourly\n", " - SH\n", " args:\n", " urlpath: \"{{ CATALOG_DIR }}/data/COSMOS-UK_HydroSoil_{{resolution}}_2013-2019/COSMOS-UK_{{stationid}}_HydroSoil_{{resolution}}_2013-2019.csv\"\n", " csv_kwargs:\n", " na_values: [-9999]\n", " parse_dates: ['DATE_TIME']\n", "\n", " data_all:\n", " driver: intake.source.csv.CSVSource\n", " parameters:\n", " resolution:\n", " description: temporal resolution\n", " type: str\n", " default: Daily\n", " allowed:\n", " - Daily\n", " - Hourly\n", " - SH\n", " args:\n", " urlpath: \"{{ CATALOG_DIR }}/data/COSMOS-UK_HydroSoil_{{resolution}}_2013-2019/COSMOS-UK_*.csv\"\n", " csv_kwargs:\n", " na_values: [-9999]\n", " parse_dates: ['DATE_TIME']\n", "\n", " metadata_sites:\n", " driver: intake.source.csv.CSVSource\n", " args:\n", " urlpath: \"{{ CATALOG_DIR }}/data/COSMOS-UK_SiteMetadata_2013-2019.csv\"\n", " csv_kwargs:\n", " header: 0\n", " parse_dates: [ 'START_DATE','END_DATE']\n", "\n", " metadata_measurements:\n", " driver: intake.source.csv.CSVSource\n", " parameters:\n", " resolution:\n", " description: temporal resolution\n", " type: str\n", " default: Daily\n", " allowed:\n", " - Daily\n", " - Hourly\n", " - SH\n", " args:\n", " urlpath: \"{{ CATALOG_DIR }}/data/COSMOS-UK_HydroSoil_{{resolution}}_2013-2019_Metadata.csv\"\n", " csv_kwargs:\n", " header: 0\n", "\n", " location:\n", " driver: intake_xarray.image.ImageSource\n", " parameters:\n", " stationid:\n", " description: five letter code for the COSMOS-UK site\n", " type: str\n", " default: CHIMN\n", " args:\n", " urlpath: \"https://eip.ceh.ac.uk/hydrodata/cosmos-uk/maps/airphoto/1000px/{{stationid}}.jpg\"\n", " storage_options: {'anon': True}\n", "''')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cat = intake.open_catalog(catalog_file)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load metadata\n", "\n", "Here we load COSMOS-UK metadata into memory. The metadata contains multiple columns about the sites, their locations and other site-specific attributes." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "metadata = cat.metadata_sites().read()\n", "print(metadata.columns.tolist())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "metadata" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For this example, we will explore a subset of four stations, all of them with start date in 2013. Only the Wytham Woods station ceased on 10th January 2016. This station is situated in a Broadleaf woodland land cover which also hosts Environmental Change Network (ECN) and FLUXNET monitoring sites (see further details [here](https://cosmos.ceh.ac.uk/sites/WYTH1)). The dataframe contains each site name, id and corresponding land cover. CHIMN and WADDN are located situated in improved grassland, and SHEEP is in arable and horticulture. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "metadata[['SITE_NAME','SITE_ID','LAND_COVER']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A key feature of COSMOS-UK stations is their capability of monitoring field-scale soil moisture. The CRNSs VWC value is an average soil moisture measurement (%) across an estimated, variable footprint of radius up to 200 m and estimated variable measurement depth of between approximately 0.1 and 0.8 m. It is worth mentioning the measurement depth depends on the soil moisture content as well as lattice water and soil organic matter water equivalent (see [Cooper et al. 2021](https://doi.org/10.5194/essd-13-1737-2021)). The greater the actual soil water content, the shallower the penetrative depth. Let's explore the notional footprint of the analysed stations from the CEH COSMOS-UK website." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# set sliders\n", "station_list = list(metadata.SITE_ID.tolist())\n", "\n", "target_station = pn.widgets.Select(name = 'Station', options = station_list)\n", "\n", "@pn.depends(target_station.param.value)\n", "def plot_footprint(station):\n", " location_da = cat.location(stationid=station).to_dask()\n", " p = location_da.hvplot.rgb(x='x', y='y', bands='channel', data_aspect=1, flip_yaxis=True, xaxis=False, yaxis=None, hover=False)\n", " return p\n", "\n", "plot_stations = pn.Row(\n", " plot_footprint,\n", " pn.Column(pn.Spacer(height=5), target_station, background='#f0f0f0', sizing_mode=\"fixed\"),\n", " width_policy='max', height_policy='max',\n", ")\n", "\n", "plot_stations.embed()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load daily data\n", "\n", "Here we load COSMOS-UK daily data into memory. The daily data is the level with the highest processing and derived from subhourly data. Note only certain variables are provided at this level." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "site_daily_all = cat.data_all(resolution='Daily').read()\n", "print(site_daily_all.columns.tolist())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To further understand the meaning of above columns, the COSMOS-UK dataset include a separate metadata file by time resolution. Let's explore the metadata for daily measurements. The dataframe below includes further details of each variable, including the unit, aggregation and data type. For instance, soil moisture measurements at daily resolution refer to the daily mean derived from CRNSs." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "metadata_daily = cat.metadata_measurements(resolution='Daily').read()\n", "metadata_daily" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Timeseries" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The plot below shows two timeseries, soil moisture and potential evapotranspiration (PE), provided by the daily COSMOS-UK dataset. PE refers to the potential evaporation from soils plus transpiration by plants (so called evapotranspiration). PE assumes there is always adequate moisture to match the evapotranspiration demand. We evidence this relationship in the daily aggregated data of both variables as it is shown in the plot below. We also note each station has a different time span with the SITE_ID equal to WYTH1 containing the shortest records." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# set sliders\n", "station_list = list(metadata.SITE_ID.tolist())\n", "\n", "target_station = pn.widgets.Select(name = 'Station', options = station_list)\n", "\n", "# set formater for dates\n", "formatter = DatetimeTickFormatter(months='%b %Y')\n", "\n", "@pn.depends(target_station.param.value)\n", "def plot_pe_vwc(station):\n", " daily_dataset = cat.data_siteid(resolution='Daily', stationid=station).read()\n", " daily_dataset.dropna(subset = ['COSMOS_VWC','PE'], inplace=True) #remove empty rows\n", " \n", " p1=daily_dataset.hvplot(x='DATE_TIME', y=['COSMOS_VWC'], xformatter=formatter, xlabel = 'Date', ylabel = 'Volumetric Water Content (%)', color='blue', title='Soil Moisture (CRNS VWC)', line_width=0.8, fontscale=1.2, padding=0.2)\n", " p2=daily_dataset.hvplot(x='DATE_TIME', y=['PE'], xformatter=formatter, xlabel = 'Date', ylabel = 'Potential Evapotranspiration (mm)', color='red', title='Potential Evapotranspiration (1 day)', line_width=0.8, fontscale=1.2, padding=0.2)\n", "\n", " return (p1 + p2).cols(1)\n", "\n", "plot_scatterplot = pn.Row(\n", " plot_pe_vwc,\n", " pn.Column(pn.Spacer(height=5), target_station, background='#f0f0f0', sizing_mode=\"fixed\"),\n", " width_policy='max', height_policy='max',\n", ")\n", "\n", "plot_scatterplot.embed()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Correlation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To explore further the seasonal dynamics of the above variables, let's generate correlation charts grouped by season. The highest values of PE are in the summer followed by spring, fall and winter. The forest site, WYTH1, has higher soil moisture values than the human-intervened places, CHIMN and WADDN (improved grassland), and SHEEP (arable and horticulture site)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "hide-input" ] }, "outputs": [], "source": [ "def season(df):\n", " \"\"\"Add season column based on lat and month\n", " \"\"\"\n", " seasons = {3: 'spring', 4: 'spring', 5: 'spring',\n", " 6: 'summer', 7: 'summer', 8: 'summer',\n", " 9: 'fall', 10: 'fall', 11: 'fall',\n", " 12: 'winter', 1: 'winter', 2: 'winter'}\n", " return df.assign(season=df.DATE_TIME.dt.month.map(seasons))\n", "\n", "site_daily_all = season(site_daily_all)\n", "\n", "custom_dict = {'winter': 0, 'spring': 1, 'summer': 3, 'fall':4}\n", "plot_season = site_daily_all.sort_values('season', key=lambda x: x.map(custom_dict)).hvplot.scatter(x='COSMOS_VWC', y='PE',\n", " row='season', col='SITE_ID', alpha=0.2, ylabel='PE (mm)', xlabel='VWC (%)',\n", " fontsize = {'title': 15, 'xticks': 9, 'yticks': 9, 'labels':11}, shared_axes=True \n", ")\n", "plot_season" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Heatmap" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The heatmap below allow us to discover temporal patterns from daily means of soil moisture. We observe 2018 contains the lowest consecutive values of VWC." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "hide-input" ] }, "outputs": [], "source": [ "plot_heatmap = site_daily_all.hvplot.heatmap(\n", " x='DATE_TIME',\n", " y='SITE_ID',\n", " C='COSMOS_VWC',\n", " xformatter=formatter,\n", " title='Time series of CRNS soil moisture',\n", " cmap='RdYlBu',\n", " width=600,\n", " height=300,\n", " xlabel='',\n", " ylabel='Site ID',\n", " fontsize = {'title': 15, 'xticks': 12, 'yticks': 15}\n", ")\n", "plot_heatmap" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load sub-hourly" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The subhourly data contains all preprocessed weather and soil variables, except CRNSs. Let's explore the columns of the subhourly datasets of one of the stations, SHEEP." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "subhourly_dataset = cat.data_siteid(resolution='SH', stationid='SHEEP').read()\n", "print(subhourly_dataset.columns.tolist())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similar to the daily observation, the metadata file for subhourly resolution informs variable long names, their resolution, units, aggregation details and data types. In this case, most of the variables are measured. For soil moisture, the measurements provided are by time domain transmissometry (TDT) sensors. These sensors provide point measurements of soil moisture at different depths as it commonly conducted in soil moisture in-situ sensing." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "output_scroll" ] }, "outputs": [], "source": [ "metadata_subhourly = cat.metadata_measurements(resolution='SH').read()\n", "metadata_subhourly" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Comparison of soil moisture probes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To compare CNRSs (local) and TDT (site) soil moisture measurements at daily resolution, it is necessary to resample the TDT measurements from subhourly to daily. The cell below defines a function to resample and join daily CNRS and resampled TDT. The function yields an interactive `hvplot` by station ID from the merged observations." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "hide-input" ] }, "outputs": [], "source": [ "@pn.depends(target_station.param.value)\n", "def site_daily(target_station):\n", " \"\"\"Timeseries plot showing the daily mean soil moisture by sensor type\"\"\"\n", "\n", " # subhourly\n", " daily_dataset = cat.data_siteid(resolution='Daily', stationid=target_station).read()\n", " daily_dataset.index = daily_dataset.DATE_TIME.astype('datetime64[ns]')\n", "\n", " subhourly_dataset = cat.data_siteid(resolution='SH', stationid=target_station).read()\n", "\n", " target_columns = subhourly_dataset.columns.str.endswith('_VWC')\n", "\n", " daily_aggregate = subhourly_dataset.groupby(subhourly_dataset['DATE_TIME'].dt.date, as_index=True)[subhourly_dataset.columns[subhourly_dataset.columns.str.endswith('_VWC')]].mean()\n", " daily_aggregate.index = daily_aggregate.index.astype('datetime64[ns]')\n", "\n", " daily_joined = daily_dataset.join(daily_aggregate)\n", " target_columns = subhourly_dataset.columns[subhourly_dataset.columns.str.endswith('_VWC')].tolist() + ['COSMOS_VWC']\n", " daily_joined = daily_joined[target_columns]\n", " daily_joined = daily_joined.reset_index()\n", " daily_joined.index = daily_joined.DATE_TIME.astype('datetime64[ns]')\n", " daily_joined.dropna(axis=1, how='all', inplace=True)\n", "\n", " daily_joined_long = pd.melt(daily_joined, id_vars='DATE_TIME',\n", " var_name=\"Sensor\", value_name=\"VWC\")\n", "\n", " plot_daily = daily_joined_long.hvplot(x='DATE_TIME', y='VWC', by='Sensor',\n", " xformatter=formatter,\n", " label='Variation in VWC by sensor type',\n", " ylabel='Volumetric Water Content (%)',\n", " xlabel='Time', xlim=(datetime(2014,1,1), datetime(2019,12,31)))\n", "\n", " return plot_daily.opts(legend_position='top', **settings_lineplots)\n", "\n", "settings_lineplots = dict(padding=0.1, height=400, width=700, fontsize={'title': '120%','labels': '120%', 'ticks': '100%'})\n", "\n", "plot_timeseries = pn.Row(\n", " site_daily,\n", " pn.Column(pn.Spacer(height=5), target_station, background='#f0f0f0', sizing_mode=\"fixed\"),\n", " width_policy='max', height_policy='max',\n", ")\n", "\n", "plot_timeseries.embed()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We conclude all sites contain at least two TDT probes, and their temporal sequence follow a similar pattern as the CRNS. It is worth mentioning the pattern might differ when we explore other stations in the full COSMOS-UK dataset which can contain more than two TDT probes.\n", "\n", "Soils contain a complex porous structure which means moisture can be non-uniformly distributed horizontally and vertically. For site measurements such as TDTS even distanced a few metres apart they measure \"extremely local\" moisture (and can sometimes be trapped in a water pocket leading to artificially high VWC or be pressed against a rock and produce artificially low VWC). In contrast, local measurements such as CRNS average over all of this heterogeneity but introduces its own sources of noise (biomass water, surface water, variable depth and horizontal footprint)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "This notebook has demonstrated the use of certain open-source python packages to explore the 2013-2019 COSMOS-UK dataset:\n", "\n", "* `intake` to easily fetch and manipulate daily and subhourly data, their metadata and other data types (remote images).\n", "* `hvplot` to propose some interactive visualisations of hydrometeorological and soil data.\n", "* `pandas` to resample subhourly data and merge them into a daily dataset of soil moisture." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Citing this Notebook" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Please see [CITATION.cff](https://github.com/eds-book/435f534c-e49b-43c3-9bd6-3393100bef3f/blob/main/CITATION.cff) for the full citation information. The citation file can be exported to APA or BibTex formats (learn more [here](https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-citation-files))." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Additional information\n", "\n", "**Review**: This notebook has been reviewed by one or more members of the Environmental Data Science book community. The open review is available [here](https://github.com/alan-turing-institute/environmental-ds-book/pull/50).\n", "\n", "**Dataset**: 2013-2019 COSMOS-UK dataset (further details of the version in @cosmosuk_data21).\n", "\n", "**License**: The code in this notebook is licensed under the MIT License. The Environmental Data Science book is licensed under the Creative Commons by Attribution 4.0 license. See further details [here](https://github.com/alan-turing-institute/environmental-ds-book/blob/main/LICENSE).\n", "\n", "**Contact**: If you have any suggestion or report an issue with this notebook, feel free to [create an issue](https://github.com/alan-turing-institute/environmental-ds-book/issues/new/choose) or send a direct message to [environmental.ds.book@gmail.com](mailto:environmental.ds.book@gmail.com)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "remove-input" ] }, "outputs": [], "source": [ "from datetime import date\n", "\n", "print('Notebook repository version: v2025.6.0')\n", "print(f'Last tested: {date.today()}')" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 4 }