{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "Rmd_chunk_options": "setup, include=FALSE", "autoscroll": false, "collapsed": true }, "outputs": [], "source": [ "knitr::opts_chunk$set(echo = TRUE)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Introduction\n", "\n", "This document showcases a completely reproducible [particulate](https://en.wikipedia.org/wiki/Particulates) matter analysis.\n", "Starting from the used measurement devices, via software for data hosting, to the analysis and visualisation environment.\n", "It stands on the shoulders of communities who provide free and open resources in the spirit of [Open Science](https://en.wikipedia.org/wiki/Open_science):\n", "\n", "- **open hardware**: the [senseBox](https://sensebox.de/en/) project, [Arduino](https://en.wikipedia.org/wiki/Arduino)\n", "- **open data**: the [openSenseMap](https://opensensemap.org/) and its [API](https://api.opensensemap.org/) provide environmental data licensed under [PDDL 1.0](http://opendatacommons.org/licenses/pddl/summary/)\n", "- **free and open source software**: [software by the senseBox team](https://github.com/sensebox/) to host their services and [download the data into R](https://github.com/noerw/opensensmapR), [R](https://jupyter.org/) with a large number of packages (e.g. `rmarkdown`, `sf`, `dplyr`, ...), [Project Jupyter](https://jupyter.org/), [binder](https://mybinder.org/), [Rocker](https://www.rocker-project.org/), [Docker](https://docker.com/), ... and many more\n", "\n", "The actual analysis is based on the [opensensmapR vignette `osem-intro`](https://noerw.github.io/opensensmapR/inst/doc/osem-intro.html).\n", "The code and it's environment are published, documented, and packaged to support [reproducible research](https://doi.org/10.1045/january2017-nuest).\n", "\n", "The code repository is [https://github.com/nuest/sensebox-binder](https://github.com/nuest/sensebox-binder) and the [git version hash](https://git-scm.com/docs/git-rev-parse) is `r system(\"git rev-parse HEAD\", intern = TRUE)`.\n", "The repository can be opened interactively at [http://mybinder.org/v2/gh/nuest/sensebox-binder/master](http://mybinder.org/v2/gh/nuest/sensebox-binder/master).\n", "The code (this document as either [R Markdown](http://rmarkdown.rstudio.com/) or [Jupyter Notebook](https://nbformat.readthedocs.io/en/latest/)) and environment (a [Docker image](https://docs.docker.com/glossary/?term=image)) are archived with a [DOI](https://en.wikipedia.org/wiki/Digital_object_identifier): [`10.5281/zenodo.1135140`](https://doi.org/10.5281/zenodo.1135140).\n", "\n", "## Analysis\n", "\n", "In the remainder of this file, code \"chunks\" and text are [interspersed](https://en.wikipedia.org/wiki/Literate_programming) to provide a transparent and understandable workflow.\n", "\n", "The analysis of takes a look at _fine particulate matter measured in Germany at New Year's Eve 2018_.\n", "\n", "**Note**: The data is included in the archive as a backup in [JSON](https://en.wikipedia.org/wiki/JSON) format.\n", "This document by default can only be compiled as long as the openSenseMap API is available.\n", "To use local backup data, set the variable `online` to `FALSE` in the second code chunk.\n", "\n", "### Load required libraries\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "Rmd_chunk_options": "packages, warning=FALSE, message=FALSE", "autoscroll": false, "collapsed": true }, "outputs": [], "source": [ "library(\"opensensmapr\")\n", "library(\"dplyr\")\n", "library(\"lubridate\")\n", "library(\"units\")\n", "library(\"sf\")\n", "library(\"leaflet\")\n", "library(\"readr\")\n", "library(\"jsonlite\")\n", "library(\"here\")\n", "library(\"maps\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "[output hidden]\n", "\n", "### Load data on senseBoxes\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "Rmd_chunk_options": "load_box_data", "autoscroll": false, "collapsed": true }, "outputs": [], "source": [ "online <- TRUE # access online API or use local data backup?\n", "\n", "analysis_date <- lubridate::as_datetime(\"2018-01-01 00:00:00\")\n", "\n", "if(online) {\n", " # retrieve data from openSenseMap API\n", " all_boxes <- osem_boxes()\n", " pm25_boxes <- osem_boxes(\n", " exposure = 'outdoor',\n", " date = analysis_date, # ±4 hours\n", " phenomenon = 'PM2.5'\n", " )\n", "\n", " # update local data\n", " all_json <- toJSON(all_boxes, digits = NA, pretty = TRUE)\n", " write(all_json, file = here(\"data/all_boxes.json\"))\n", " pm25_json <- toJSON(pm25_boxes, digits = NA, pretty = TRUE)\n", " write(pm25_json, file = here(\"data/pm25_boxes.json\"))\n", "} else {\n", " # load data from file and fix column types\n", " all_boxes_file <- fromJSON(here(\"data/all_boxes.json\"))\n", " all_boxes <- type_convert(all_boxes_file,\n", " col_types = cols(\n", " exposure = col_factor(levels = NULL),\n", " model = col_factor(levels = NULL),\n", " grouptag = col_factor(levels = NULL)))\n", " class(all_boxes) <- c(\"sensebox\", class(all_boxes))\n", " pm25_boxes_file <- fromJSON(here(\"data/pm25_boxes.json\"))\n", " pm25_boxes <- type_convert(pm25_boxes_file,\n", " col_types = cols(\n", " exposure = col_factor(levels = NULL),\n", " model = col_factor(levels = NULL),\n", " grouptag = col_factor(levels = NULL)))\n", " class(pm25_boxes) <- c(\"sensebox\", class(pm25_boxes))\n", "}\n", "\n", "knitr::kable(data.frame(nrow(all_boxes), nrow(pm25_boxes)),\n", " col.names = c(\n", " \"# senseBoxes\",\n", " paste(\"# senseBoxes with PM2.5 measurements around\", format(analysis_date, \"%Y-%m-%d %T %Z\"))))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### Exploring openSenseMap\n", "\n", "The openSenseMap currently provides access to `r nrow(all_boxes)` senseBoxes of which `r nrow(pm25_boxes)` provide measurements of [PM2.5](https://www.umweltbundesamt.de/sites/default/files/medien/377/dokumente/infoblatt_feinstaub_pm2_5_en.pdf) around `r format(analysis_date, \"%Y-%m-%d %T %Z\")`.\n", "\n", "The following map shows the PM2.5 sensor locations, which are mostly deployed in central Europe.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "Rmd_chunk_options": "plot_outdoor_feinstaub", "autoscroll": false, "collapsed": true }, "outputs": [], "source": [ "plot(pm25_boxes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "### Particulates at New Year's Eve in Münster\n", "\n", "_How many senseBoxes in Münster measure PM2.5?_\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "Rmd_chunk_options": "muenster_boxes", "autoscroll": false, "collapsed": true }, "outputs": [], "source": [ "ms <- st_sfc(st_point(c(7.62571, 51.96236)))\n", "st_crs(ms) <- 4326\n", "\n", "pm25_boxes_sf <- st_as_sf(pm25_boxes, remove = FALSE, agr = \"identity\")\n", "names(pm25_boxes_sf) <- c(names(pm25_boxes), \"geometry\")\n", "\n", "pm25_boxes_sf <- cbind(pm25_boxes_sf, dist_to_ms = st_distance(ms, pm25_boxes_sf))\n", "max_dist <- set_units(7, km) # km from city center\n", "\n", "ms_boxes <- pm25_boxes_sf[pm25_boxes_sf$dist_to_ms < max_dist,c(\"X_id\", \"name\")]\n", "ms_boxes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "_Where are the sensors in Münster?_ [Does not work in Jupyter Notebook]\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "Rmd_chunk_options": "muenster_plot_map", "autoscroll": false, "collapsed": true }, "outputs": [], "source": [ "sense_icon <- awesomeIcons(\n", " icon = 'cube',\n", " iconColor = '#ffffff',\n", " library = 'fa',\n", " markerColor = 'green'\n", ")\n", "\n", "leaflet() %>%\n", " addTiles() %>%\n", " addAwesomeMarkers(data = ms_boxes,\n", " popup = ~paste0(\"Name: \", name, \"
Id: \",\n", " \"\", X_id, \"\"),\n", " label = ~name,\n", " icon = sense_icon)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Now we retrieve data for `r nrow(ms_boxes)` senseBoxes with values in the area of interest.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "Rmd_chunk_options": "muenster_data", "autoscroll": false, "collapsed": true }, "outputs": [], "source": [ "if(online) {\n", " class(ms_boxes) <- c(\"sensebox\", class(ms_boxes))\n", " ms_data <- osem_measurements(ms_boxes, phenomenon = \"PM2.5\",\n", " from = lubridate::as_datetime(\"2017-12-31 20:00:00\"),\n", " to = lubridate::as_datetime(\"2018-01-01 04:00:00\"),\n", " columns = c(\"value\", \"createdAt\", \"lat\", \"lon\", \"boxId\",\n", " \"boxName\", \"exposure\", \"sensorId\",\n", " \"phenomenon\", \"unit\", \"sensorType\"))\n", " # update local data\n", " data_json <- toJSON(ms_data, digits = NA, pretty = TRUE)\n", " write(data_json, file = here(\"data/ms_data.json\"))\n", "} else {\n", " # load data from file and fix column types\n", " ms_data_file <- fromJSON(here(\"data/ms_data.json\"))\n", " ms_data <- type_convert(ms_data_file,\n", " col_types = cols(\n", " sensorId = col_factor(levels = NULL),\n", " unit = col_factor(levels = NULL)))\n", " class(ms_data) <- c(\"sensebox\", class(ms_data))\n", "}\n", "\n", "summary(ms_data %>%\n", " select(value,sensorId,unit))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "We can now plot `r nrow(ms_data)` measurements.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "Rmd_chunk_options": "muenster_plot_timeseries", "autoscroll": false, "collapsed": true }, "outputs": [], "source": [ "plot(value~createdAt, ms_data,\n", " type = \"p\", pch = '*', cex = 2, # new year's style\n", " col = factor(ms_data$sensorId),\n", " xlab = NA,\n", " ylab = unique(ms_data$unit),\n", " main = \"Particulates measurements (PM2.5) on New Year 2017/2018\",\n", " sub = paste(nrow(ms_boxes), \"stations in Münster, Germany\\n\",\n", " \"Data by openSenseMap.org licensed under\",\n", " \"Public Domain Dedication and License 1.0\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "You can see, it was a [very \"particular\" celebration](http://www.dw.com/en/new-years-eve-are-fireworks-harming-the-environment/a-41957523).\n", "\n", "_Who are the record holders?_\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "Rmd_chunk_options": "top_three_boxes", "autoscroll": false, "collapsed": true }, "outputs": [], "source": [ "top_measurements <- ms_data %>%\n", " arrange(desc(value))\n", "top_boxes <- top_measurements %>%\n", " distinct(sensorId, .keep_all = TRUE)\n", "knitr::kable(x = top_boxes %>%\n", " select(value, createdAt, boxName) %>%\n", " head(n = 3),\n", " caption = \"Top 3 boxes\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "**Note:** The timestamp is UTC and the local time is [CET](https://en.wikipedia.org/wiki/CET) (`UTC+1:00`).\n", "The value `999.9` is the maximum value measured by the used sensor [SDS011](https://nettigo.pl/attachments/398).\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "Rmd_chunk_options": "top_boxes", "autoscroll": false, "collapsed": true }, "outputs": [], "source": [ "knitr::kable(top_boxes %>% filter(value == max(top_boxes$value)) %>%\n", " select(sensorId, boxName),\n", " col.names = c(\"Top sensor identifier\", \"Top box name\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Congratulations (?) to boxes for holding the record values just after the new year started.\n", "\n", "_Where are the record holding boxes?_\n", "\n", "**Static plot**\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "Rmd_chunk_options": "top_box_plot", "autoscroll": false, "collapsed": true }, "outputs": [], "source": [ "top_boxes_sf <- top_boxes %>%\n", " filter(value == max(top_boxes$value)) %>%\n", " st_as_sf(coords = c('lon', 'lat'), crs = 4326)\n", "bbox <- sf::st_bbox(top_boxes_sf)\n", "\n", "world <- map(\"world\", plot = FALSE, fill = TRUE) %>%\n", " sf::st_as_sf() %>%\n", " sf::st_geometry()\n", "\n", "plot(world,\n", " xlim = round(bbox[c(1,3)], digits = 1),\n", " ylim = round(bbox[c(2,4)], digits = 1),\n", " axes = TRUE, las = 1)\n", "plot(top_boxes_sf, add = TRUE, col = \"red\", cex = 2)\n", "title(\"senseBox stations in Münster with highest PM2.5 measurements\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "**Interactive map**\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "Rmd_chunk_options": "top_box_map", "autoscroll": false, "collapsed": true }, "outputs": [], "source": [ "fireworks_icon <- makeIcon(\n", " # icon source: https://commons.wikimedia.org/wiki/File:Fireworks_2.png\n", " iconUrl = \"320px-Fireworks_2.png\", iconWidth = 160)\n", "\n", "leaflet(data = top_boxes_sf) %>%\n", " addTiles() %>%\n", " addMarkers(popup = ~as.character(boxName),\n", " label = ~as.character(boxName),\n", " icon = fireworks_icon)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Jupyter Notebook conversion\n", "\n", "A converted version of this file can in Jupyter Notebook format is automatically created with each rendering using [`ipyrmd`](https://pypi.python.org/pypi/ipyrmd/0.4.3).\n", "`ipyrmd` is installed with other dependencies in the file `install.R`.\n", "The Jupyter Notebook is intended to increase accessability for users unfamiliar with R Markdown.\n", "The automatic conversion does not handle code statements within sentences.\n", "\n", "```{bash convert_to_ipynb}\n", "ipyrmd --to ipynb --from Rmd -y -o sensebox-analysis.ipynb sensebox-analysis.Rmd\n", "```\n", "\n", "## Conclusion\n", "\n", "This document creates a reproducible workflow of open data from a public API.\n", "It leverages software to create a transparent analysis, which can be easily opened, investigated, and even developed further with a web browser by opening the public code repository on a free cloud platform.\n", "To increase reproducibility, the data is cached manually as CSV files (i.e. text-based data format) and stored next to the analysis file.\n", "A use may adjust this workflow to her own needs, like different location or time period, by adjust the R code and deleting the data files.\n", "In case the exploration platform ceases to exist, users may still recreate the environment themselves based on the files in the code repository.\n", "A snapshot of the files from the code repository, i.e. data, code, and runtime environment (as a Docker image) are stored in a reliable data repository.\n", "While the manual workflow of building the image and running it is very likely to work in the future, the archived image captures the exact version of the software the original author used.\n", "\n", "The presented solution might seem complex.\n", "But it caters to many different levels of expertise (one-click open in browser vs. self-building of images and local inspection) and has several fail-safes (binder may disappear, GitHub repository may be lost, Docker may stop working).\n", "The additional work is much outweighed by the advantages in transparency and openness.\n", "\n", "## License\n", "\n", "\"Creative\n", "\n", "This document is licensed under a [Creative Commons Attribution 4.0 International ](https://creativecommons.org/licenses/by/4.0/) (CC BY 4.0).\n", "\n", "## Metadata\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "Rmd_chunk_options": "metadata", "autoscroll": false, "collapsed": true }, "outputs": [], "source": [ "devtools::session_info()" ] } ], "metadata": { "Rmd_header": { "author": "Daniel Nüst", "date": "`r format(Sys.time(), '%Y-%m-%d %T %Z')`", "output": "html_document", "title": "Reproducible Environmental Observations and Analysis" }, "kernelspec": { "display_name": "R", "language": "R", "name": "ir" }, "language_info": { "codemirror_mode": "r", "file_extension": ".r", "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r" } }, "nbformat": 4, "nbformat_minor": 0 }