--- layout: post title: Handling oceanographic data-quality flags tags: [oce, R] category: R year: 2016 month: 4 day: 11 summary: Flags will be a key feature of the next release of the oce R package. This posting illustrates the basics of how flags can be used for hydrographic section data. --- **Preface.** The code of this blog posting will only work with the latest `development`-branch of the `oce` source. # Introduction The `section` dataset from `oce` provides a good example of a dataset containing flags. # Methods ```{r} library(oce) data(section) Sflag <- section[['salinityFlag']] ``` A good first step is to see what flags are being used ```{r} table(Sflag) ``` This dataset uses the WHP convention for flags (see `?section`), in which a flag value of 2 is used to indicate data considered to be acceptable. Thus, the table indicates that only 3/4 of the salinity measurements are considered to be acceptable. This makes this a good dataset to illustrate the handling of flags. First, extract some relevant data. ```{r} S <- section[['salinity']] T <- section[['temperature']] theta <- section[['theta']] p <- section[['pressure']] Sflag <- section[['salinityFlag']] ``` Next, plot salinity flag vs salinity ```{r} plot(S, Sflag, pch=Sflag-1) ``` This suggests that, apart from one distinct outlier at a salinity of 26, the salinities of bad data are generally in the range of the salinities of good data. Next, examine temperature and salinity together. ```{r} plotTS(as.ctd(S, T, p), pch=Sflag-1) ``` The last two plots suggest that one of the points marked as being bad (flag=4) is distinctly anomalous compared with all the other data. A detailed analysis could be made of that point (e.g. first isolate the station, then plot it in detail) but time may be better spent simply focussing on data that have been assessed as being reasonable during the data-archiving process. ```{r} ok <- Sflag == 2 plotTS(as.ctd(S[ok], T[ok], p[ok])) ``` Another approach is to use `handleFlags` to select the good data ```{r} section2 <- handleFlags(section) plotTS(section2) ``` where we have used the fact that `plotTS` can recognize section objects. The use of `handleFlags` is also recommended because it carries over to other types of plots, e.g. a salinity section. For example, a salinity section of all the data is produced with ```{r} plot(section, which="salinity") ``` while one of just the acceptable data is produced with ```{r} plot(section2, which="salinity") ``` # Exercises 1. Find which station has the very low salinity, and examine that station in more detail. 2. Try as above, but only discarding data with `salinityFlag==4`, which are known to be bad (i.e. retain both acceptable and questionable data). 3. Continue step 2, with other types of analysis (e.g. examine spatial dependence). 4. Look online for the source of the `section` dataset, to find out more about how the data-quality flags were assigned. # Resources * R source code used here: [2016-04-10-flags.Rmd]({{ site.url }}/assets/2016-04-10-flags.R). * Jekyll source code for this blog entry: [2016-04-10-flags.Rmd](https://raw.github.com/dankelley/dankelley.github.io/master/assets/2016-04-10-flags.Rmd)