---
title: "Relational operators for intervals with the intrval R package"
layout: default
published: true
category: Code
tags: [R, functions, special, intrval]
disqus: petersolymos
promote: false
---
I recently posted a piece about [how to write and document special functions in R](http://peter.solymos.org/code/2016/11/26/how-to-write-and-document-special-functions-in-r.html). I meant that as a prelude for the topic I am writing about in this post. Let me start at the beginning. The other day Dirk Eddelbuettel tweeted about the new release of the [**data.table**](https://cran.r-project.org/package=data.table) package (v1.9.8).
There were [new features announced](https://cran.r-project.org/web/packages/data.table/news.html) for joins based on `%inrange%` and `%between%`. That got me thinking: it would be really cool to generalize this idea for different intervals, for example as `x %[]% c(a, b)`.
## Motivation
We want to evaluate whether values of `x` satisfy the condition `x >= a & x <= b`, given that `a <= b`. Typing `x %[]% c(a, b)` instead of the previous expression is not much shorter (14 vs. 15 characters, counting spaces). But once we also enforce the `a <= b` condition, it becomes a real saving (`x >= min(a, b) & x <= max(a, b)` is 31 characters long). And sorting is really important, because by flipping `a` and `b`, we get quite different answers:
```
x <- 5
x >= 1 & x <= 10
# [1] TRUE
x >= 10 & x <= 1
# [1] FALSE
```
Also, `min` and `max` will not be very useful when we want to vectorize the expression, because they collapse all of their arguments into a single value. We need the element-wise `pmin` and `pmax` instead:
```
x >= min(1:10, 10:1) & x <= max(10:1, 1:10)
# [1] TRUE
x >= pmin(1:10, 10:1) & x <= pmax(10:1, 1:10)
# [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
```
If interval endpoints can also be open or closed, allowing them to flip around makes the semantics of left/right closed/open interval definitions hard to follow. We can thus all agree that there is a need for an expression, like `x %[]% c(a, b)`, that is _compact_, _flexible_, and _invariant_ to endpoint sorting. This is exactly what the [**intrval**](https://github.com/psolymos/intrval) package is for!
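
To make the order-invariance idea concrete, here is a minimal base R sketch. The operator name `%closed%` is made up for this illustration and is not part of **intrval**; it only shows how sorting the endpoints removes the dependence on argument order.

```r
# Hypothetical operator (not from intrval): closed-interval test that
# sorts the two endpoints first, so c(a, b) and c(b, a) are equivalent.
`%closed%` <- function(x, interval) {
  lower <- min(interval)
  upper <- max(interval)
  x >= lower & x <= upper
}

x <- 5
sorted_result  <- x %closed% c(1, 10)
flipped_result <- x %closed% c(10, 1)
identical(sorted_result, flipped_result)
# [1] TRUE
```
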
## What's in the package
Functions for evaluating if values of vectors are within
different open/closed intervals
(`x %[]% c(a, b)`), or if two closed
intervals overlap (`c(a1, b1) %[o]% c(a2, b2)`).
Operators for negation and directional relations are also implemented.
### Value-to-interval relations
Values of `x` are compared to interval endpoints `a` and `b` (`a <= b`).
Endpoints can be defined as a vector with two values (`c(a, b)`): these values will be compared as a single interval with each value in `x`.
If endpoints are stored in a matrix-like object or a list,
comparisons are made element-wise.
```
x <- rep(4, 5)
a <- 1:5
b <- 3:7
cbind(x=x, a=a, b=b)
x %[]% cbind(a, b) # matrix
x %[]% data.frame(a=a, b=b) # data.frame
x %[]% list(a, b) # list
```
If lengths do not match, shorter objects are recycled. Return values are logicals.
Note: interval endpoints are sorted internally, so ensuring the condition
`a <= b` is not necessary.
These value-to-interval operators work for numeric (integer, real) and ordered vectors, and object types which are measured at least on ordinal scale (e.g. dates).
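
For comparison, the element-wise case can be written out in base R with `pmin` and `pmax`. This is only a sketch of the condition described above, not the package's internal code, and it reuses the same `x`, `a`, and `b` as the example.

```r
x <- rep(4, 5)
a <- 1:5
b <- 3:7
# element i of x is compared with the i-th interval [a[i], b[i]];
# pmin/pmax sort each pair of endpoints, so flipped pairs are fine too
inside <- x >= pmin(a, b) & x <= pmax(a, b)
inside
# [1] FALSE  TRUE  TRUE  TRUE FALSE
```
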
#### Closed and open intervals
The following special operators are used to indicate closed (`[`, `]`) or open (`(`, `)`) interval endpoints:
Operator | Expression | Condition
---------|------------------|-------------------
`%[]%` | `x %[]% c(a, b)` | `x >= a & x <= b`
`%[)%` | `x %[)% c(a, b)` | `x >= a & x < b`
`%(]%` | `x %(]% c(a, b)` | `x > a & x <= b`
`%()%` | `x %()% c(a, b)` | `x > a & x < b`
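
The four conditions in the table only differ at the endpoints. A small base R sketch (plain comparisons, no package functions; the variable names are chosen for this illustration) evaluated at the left endpoint, an interior point, and the right endpoint makes the difference visible:

```r
x <- c(1, 2, 3)  # left endpoint, interior point, right endpoint
a <- 1
b <- 3
closed      <- x >= a & x <= b  # condition for [a, b]
closed_open <- x >= a & x <  b  # condition for [a, b)
open_closed <- x >  a & x <= b  # condition for (a, b]
open        <- x >  a & x <  b  # condition for (a, b)
rbind(closed, closed_open, open_closed, open)
```

Only the interior point satisfies all four conditions; each endpoint is included exactly when its side of the interval is closed.
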
#### Negation and directional relations
Equal | Not equal | Less than | Greater than
---------|-----------|-----------|----------------
`%[]%` | `%)(%` | `%[<]%` | `%[>]%`
`%[)%` | `%)[%` | `%[<)%` | `%[>)%`
`%(]%` | `%](%` | `%(<]%` | `%(>]%`
`%()%` | `%][%` | `%(<)%` | `%(>)%`
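
As a rough illustration of the first row of this table, the conditions for a closed interval can be spelled out in base R. This is a sketch of the logic as I read it from the table, not the package's code:

```r
x <- 0:4
a <- 1
b <- 3
inside  <- x >= a & x <= b   # "Equal" column for [a, b]
outside <- !inside           # "Not equal": x < a | x > b
below   <- x < a             # "Less than"
above   <- x > b             # "Greater than"
# exactly one of below / inside / above holds for every value
all(below + inside + above == 1)
# [1] TRUE
```
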
The helper function `intrval_types` can be used to
print or plot a summary of these operators.
### Interval-to-interval relations
The overlap of two closed intervals, [`a1`, `b1`] and [`a2`, `b2`],
is evaluated by the `%[o]%` operator (`a1 <= b1`, `a2 <= b2`).
Endpoints can be defined as a vector with two values
(`c(a1, b1)`) or can be stored in matrix-like objects or lists,
in which case comparisons are made element-wise.
Note: interval endpoints
are sorted internally, so ensuring the conditions
`a1 <= b1` and `a2 <= b2` is not necessary.
```
c(2:3) %[o]% c(0:1)
list(0:4, 1:5) %[o]% c(2:3)
cbind(0:4, 1:5) %[o]% c(2:3)
data.frame(a=0:4, b=1:5) %[o]% c(2:3)
```
If lengths do not match, shorter objects are recycled.
These interval-to-interval operators work for numeric (integer, real)
and ordered vectors, and object types which are measured at
least on ordinal scale (e.g. dates).
`%)o(%` is used for the negation;
directional evaluation is done via the operators `%[<o]%` and `%[o>]%`.
Equal | Not equal | Less than | Greater than
----------|------------|------------|----------------
`%[o]%` | `%)o(%` | `%[<o]%` | `%[o>]%`
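
The overlap condition for two closed intervals boils down to two comparisons once the endpoints are sorted. Here is a minimal base R sketch; `overlaps_closed` is a hypothetical helper written for this post, not a function from **intrval**:

```r
# Hypothetical helper (not from intrval): do two closed intervals overlap?
# Each interval is a length-2 vector; endpoints are sorted first.
overlaps_closed <- function(i1, i2) {
  a1 <- min(i1); b1 <- max(i1)
  a2 <- min(i2); b2 <- max(i2)
  a1 <= b2 & a2 <= b1
}

overlaps_closed(c(2, 3), c(0, 1))  # disjoint
# [1] FALSE
overlaps_closed(c(1, 2), c(2, 3))  # share the endpoint 2
# [1] TRUE
overlaps_closed(c(3, 2), c(0, 5))  # flipped endpoints, nested interval
# [1] TRUE
```

Because both intervals are closed, intervals that merely touch at an endpoint count as overlapping in this sketch.
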
### Operators for discrete variables
The previous operators will return `NA` for unordered factors.
Set overlap can be evaluated by the base `%in%` operator and its negation
`%nin%`. (This feature is really [redundant](http://peter.solymos.org/code/2016/11/26/how-to-write-and-document-special-functions-in-r.html), I know, but I decided to include it regardless...)
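
For readers who have not defined such an operator before, a negated `%in%` takes one line of base R. The name `%not_in%` below is invented for this sketch so that it does not mask anything from the package:

```r
# Hypothetical operator name; a plain negation of base %in%
`%not_in%` <- function(x, table) !(x %in% table)

f <- factor(c("a", "b", "c"))  # unordered factor
f %in% c("a", "c")
# [1]  TRUE FALSE  TRUE
f %not_in% c("a", "c")
# [1] FALSE  TRUE FALSE
```
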
## Install
Install development version from GitHub (not yet on CRAN):
```R
library(devtools)
install_github("psolymos/intrval")
```
The package is licensed under [GPL-2](https://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html).
## Examples
```
library(intrval)
## bounding box
set.seed(1)
n <- 10^4
x <- runif(n, -2, 2)
y <- runif(n, -2, 2)
d <- sqrt(x^2 + y^2)
iv1 <- x %[]% c(-0.25, 0.25) & y %[]% c(-1.5, 1.5)
iv2 <- x %[]% c(-1.5, 1.5) & y %[]% c(-0.25, 0.25)
iv3 <- d %()% c(1, 1.5)
plot(x, y, pch = 19, cex = 0.25, col = iv1 + iv2 + 1,
main = "Intersecting bounding boxes")
plot(x, y, pch = 19, cex = 0.25, col = iv3 + 1,
main = "Deck the halls:\ndistance range from center")
## time series filtering
x <- seq(0, 4*24*60*60, 60*60)
dt <- as.POSIXct(x, origin="2000-01-01 00:00:00")
f <- as.POSIXlt(dt)$hour %[]% c(0, 11)
plot(sin(x) ~ dt, type="l", col="grey",
main = "Filtering date/time objects")
points(sin(x) ~ dt, pch = 19, col = f + 1)
## QCC
library(qcc)
data(pistonrings)
mu <- mean(pistonrings$diameter[pistonrings$trial])
SD <- sd(pistonrings$diameter[pistonrings$trial])
x <- pistonrings$diameter[!pistonrings$trial]
iv <- mu + 3 * c(-SD, SD)
plot(x, pch = 19, col = x %)(% iv +1, type = "b", ylim = mu + 5 * c(-SD, SD),
main = "Shewhart quality control chart\ndiameter of piston rings")
abline(h = mu)
abline(h = iv, lty = 2)
## Annette Dobson (1990) "An Introduction to Generalized Linear Models".
## Page 9: Plant Weight Data.
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2, 10, 20, labels = c("Ctl","Trt"))
weight <- c(ctl, trt)
lm.D9 <- lm(weight ~ group)
## compare 95% confidence intervals with 0
(CI.D9 <- confint(lm.D9))
# 2.5 % 97.5 %
# (Intercept) 4.56934 5.4946602
# groupTrt -1.02530 0.2833003
0 %[]% CI.D9
# (Intercept) groupTrt
# FALSE TRUE
lm.D90 <- lm(weight ~ group - 1) # omitting intercept
## compare 95% confidence of the 2 groups to each other
(CI.D90 <- confint(lm.D90))
# 2.5 % 97.5 %
# groupCtl 4.56934 5.49466
# groupTrt 4.19834 5.12366
CI.D90[1,] %[o]% CI.D90[2,]
# 2.5 %
# TRUE
DATE <- as.Date(c("2000-01-01","2000-02-01", "2000-03-31"))
DATE %[<]% as.Date(c("2000-01-15", "2000-03-15"))
# [1] TRUE FALSE FALSE
DATE %[]% as.Date(c("2000-01-15", "2000-03-15"))
# [1] FALSE TRUE FALSE
DATE %[>]% as.Date(c("2000-01-15", "2000-03-15"))
# [1] FALSE FALSE TRUE
```
For more examples, see the [unit-testing script](https://github.com/psolymos/intrval/blob/master/tests/tests.R).
## Feedback
Please check out the package and use the [issue tracker](https://github.com/psolymos/intrval/issues)
to suggest a new feature or report a problem.
#### Update (2016-12-04)
Sergey Kashin [pointed out](https://twitter.com/sergeykashin/status/805501566123966464/photo/1) that some operators are redundant. It is now explained in the manual:
Note that some operators return identical results but
are syntactically different:
`%[<]%` and `%[<)%` both evaluate `x < a`;
`%[>]%` and `%(>]%` both evaluate `x > b`;
`%(<]%` and `%(<)%` both evaluate `x <= a`;
`%[>)%` and `%(>)%` both evaluate `x >= b`.
This is so because we evaluate only one end of the interval
while still conceptually referring to the relationship
defined by the right-hand-side interval object.
This implies 2 conditional logical evaluations
instead of treating it as a single 3-level ordered factor.
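
To illustrate the alternative mentioned in the last sentence, here is a base R sketch of the "single 3-level ordered factor" view for a closed interval. It is my own illustration of the idea, not something the package provides:

```r
x <- 0:4
a <- 1
b <- 3
# one ordered label per value instead of two separate logical tests
position <- ifelse(x < a, "below", ifelse(x > b, "above", "inside"))
position <- factor(position, levels = c("below", "inside", "above"),
                   ordered = TRUE)
position < "inside"   # same as x < a
position > "inside"   # same as x > b
```

With this encoding the two directional questions become comparisons against the middle level, which is the trade-off the note describes.
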
#### Update (2016-12-06)
**intrval** R package v0.1 is on CRAN: [https://CRAN.R-project.org/package=intrval](https://CRAN.R-project.org/package=intrval)