--- title: "Functional pearl: Nested Datacubes" published: 2017-04-12T18:30:00Z tags: haskell, maps, query, aggregation, grouping, datacubes, cubes, OLAP description: "Multi-dimensional aggregation and grouping using nested datacubes in Haskell." --- Introduction ------------ Multi-dimensional aggregation and grouping are common forms of analysis traditionally done with SQL databases, OLAP tools (Online Analytical Processing) and/or Excel pivot tables. These traditional tools use a *flat* table representation, whereby *tuples* are mapped to *measures* without any nesting. Increasingly, users are using general purpose programming languages for further offline analysis: languages like R, Python, Scala and Haskell. These often give us the opportunity to work with *nested* (inductive) structures, which functional languages like Haskell are particularly good at dealing with. Hopefully this post will prove that point. Flat versus nested ------------------ Let's start with an example flat dataset: \
10001London2017-04-019.99
10003London2017-04-014.99
10002Shanghai2017-04-0286.38
10004Shanghai2017-04-0343.15
Location
London →
10001 →2017-04-019.99
10003 →2017-04-014.99
Shanghai →
10002 →2017-04-0286.38
10004 →2017-05-0343.15
Location
London →
Year
2017 →
Month
04 →
DayValue
2017-04-01 →14.98
Shanghai →
Year
2017 →
Month
04 →
DayValue
2017-04-0286.38
05 →
DayValue
2017-05-0343.15
\ Note that aggregations can have types that look similar to unnesting/ungrouping, but of course they do not form isomorphisms with our grouping operators. In general, we cannot undo an aggregation. > -- the type looks similar to unnest, but this function aggregates! > rollup1 :: (Monoid m, Key k, Key k') => (v -> m) -> MMap k (MMap k' v) -> MMap k m > rollup1 = fmap . foldMap Parallel Aggregation -------------------- Since MMap has the general monoid instance we defined earlier, our cube aggregations will also all have monoid instances (by induction). This means we can compute partial cubes for input chunks in parallel and then mconcat/reduce them to obtain the final cube. > mapReduce :: (Monoid m) => (v -> m) -> [v] -> m > mapReduce f = mconcat . map f -- parallel map ~~~{.haskell} λ> :t mapReduce myCubeSum mapReduce myCubeSum :: Key k => [k :. Trade] -> Location :. Year :. Month :. Day :. Sum Double ~~~ Subcubes -------- I'm using the word "subcube" here to describe cubes built from aggregating larger/deeper grouping definitions. Instead of creating a series of operators to aggregate over each successive level of nesting, we will use a positional style, as we did with the grouping definitions. First, let's create a new name for fmap: > keep :: (Monoid m) => (v -> m) -> MMap a v -> MMap a m > keep = fmap We can now just specify in the term, what we want to happen to each dimension as it appears in the type. For example to rollup just the Year and Month, we can write the following: > -- a "subcube" definition, parameterized by an aggregation function > locationByMonth :: (Monoid m) => (v -> m) > -> Location :. Year :. Month :. Day :. v > -> Location :. Month :. m > locationByMonth = keep . rollup . keep . rollup I think the above is rather nice! \ Note: the literal haskell for this entire post can be found [here](https://raw.githubusercontent.com/willtim/timphilipwilliams.com/master/posts/2017-04-12-nested-datacubes.lhs). * * * * * * * * References ---------- \[1\] [Map Comprehensions][MapComp] \[2\] [TransformListComp][TransformListComp] [MapComp]: 2014-06-05-map-comprehensions.html (Map Comprehensions) [TransformListComp]: https://downloads.haskell.org/~ghc/8.0.1/docs/html/users_guide/glasgow_exts.html#generalised-sql-like-list-comprehensions (Generalised (SQL-like) List Comprehensions) \ Appendix --------

#### Auxiliary functions

> year :: Day -> Year > year d = let (y, _, _) = toGregorian d in Year y > month :: Day -> Month > month d = let (_, m, _) = toGregorian d in Month m