--- layout: docu railroad: query_syntax/groupby.js redirect_from: - /docs/sql/query_syntax/grouping_sets title: GROUPING SETS --- `GROUPING SETS`, `ROLLUP` and `CUBE` can be used in the `GROUP BY` clause to perform a grouping over multiple dimensions within the same query. Note that this syntax is not compatible with [`GROUP BY ALL`]({% link docs/stable/sql/query_syntax/groupby.md %}#group-by-all). ## Examples Compute the average income along the provided four different dimensions: ```sql -- the syntax () denotes the empty set (i.e., computing an ungrouped aggregate) SELECT city, street_name, avg(income) FROM addresses GROUP BY GROUPING SETS ((city, street_name), (city), (street_name), ()); ``` Compute the average income along the same dimensions: ```sql SELECT city, street_name, avg(income) FROM addresses GROUP BY CUBE (city, street_name); ``` Compute the average income along the dimensions `(city, street_name)`, `(city)` and `()`: ```sql SELECT city, street_name, avg(income) FROM addresses GROUP BY ROLLUP (city, street_name); ``` ## Description `GROUPING SETS` perform the same aggregate across different `GROUP BY clauses` in a single query. ```sql CREATE TABLE students (course VARCHAR, type VARCHAR); INSERT INTO students (course, type) VALUES ('CS', 'Bachelor'), ('CS', 'Bachelor'), ('CS', 'PhD'), ('Math', 'Masters'), ('CS', NULL), ('CS', NULL), ('Math', NULL); ``` ```sql SELECT course, type, count(*) FROM students GROUP BY GROUPING SETS ((course, type), course, type, ()); ``` | course | type | count_star() | |--------|----------|-------------:| | Math | NULL | 1 | | NULL | NULL | 7 | | CS | PhD | 1 | | CS | Bachelor | 2 | | Math | Masters | 1 | | CS | NULL | 2 | | Math | NULL | 2 | | CS | NULL | 5 | | NULL | NULL | 3 | | NULL | Masters | 1 | | NULL | Bachelor | 2 | | NULL | PhD | 1 | In the above query, we group across four different sets: `course, type`, `course`, `type` and `()` (the empty group). The result contains `NULL` for a group which is not in the grouping set for the result, i.e., the above query is equivalent to the following statement of `UNION ALL` clauses: ```sql -- Group by course, type: SELECT course, type, count(*) FROM students GROUP BY course, type UNION ALL -- Group by type: SELECT NULL AS course, type, count(*) FROM students GROUP BY type UNION ALL -- Group by course: SELECT course, NULL AS type, count(*) FROM students GROUP BY course UNION ALL -- Group by nothing: SELECT NULL AS course, NULL AS type, count(*) FROM students; ``` `CUBE` and `ROLLUP` are syntactic sugar to easily produce commonly used grouping sets. The `ROLLUP` clause will produce all “sub-groups” of a grouping set, e.g., `ROLLUP (country, city, zip)` produces the grouping sets `(country, city, zip), (country, city), (country), ()`. This can be useful for producing different levels of detail of a group by clause. This produces `n+1` grouping sets where n is the amount of terms in the `ROLLUP` clause. `CUBE` produces grouping sets for all combinations of the inputs, e.g., `CUBE (country, city, zip)` will produce `(country, city, zip), (country, city), (country, zip), (city, zip), (country), (city), (zip), ()`. This produces `2^n` grouping sets. ## Identifying Grouping Sets with `GROUPING_ID()` The super-aggregate rows generated by `GROUPING SETS`, `ROLLUP` and `CUBE` can often be identified by `NULL`-values returned for the respective column in the grouping. But if the columns used in the grouping can themselves contain actual `NULL`-values, then it can be challenging to distinguish whether the value in the resultset is a “real” `NULL`-value coming out of the data itself, or a `NULL`-value generated by the grouping construct. The `GROUPING_ID()` or `GROUPING()` function is designed to identify which groups generated the super-aggregate rows in the result. `GROUPING_ID()` is an aggregate function that takes the column expressions that make up the grouping(s). It returns a `BIGINT` value. The return value is `0` for the rows that are not super-aggregate rows. But for the super-aggregate rows, it returns an integer value that identifies the combination of expressions that make up the group for which the super-aggregate is generated. At this point, an example might help. Consider the following query: ```sql WITH days AS ( SELECT year("generate_series") AS y, quarter("generate_series") AS q, month("generate_series") AS m FROM generate_series(DATE '2023-01-01', DATE '2023-12-31', INTERVAL 1 DAY) ) SELECT y, q, m, GROUPING_ID(y, q, m) AS "grouping_id()" FROM days GROUP BY GROUPING SETS ( (y, q, m), (y, q), (y), () ) ORDER BY y, q, m; ``` These are the results: | y | q | m | grouping_id() | |-----:|-----:|-----:|--------------:| | 2023 | 1 | 1 | 0 | | 2023 | 1 | 2 | 0 | | 2023 | 1 | 3 | 0 | | 2023 | 1 | NULL | 1 | | 2023 | 2 | 4 | 0 | | 2023 | 2 | 5 | 0 | | 2023 | 2 | 6 | 0 | | 2023 | 2 | NULL | 1 | | 2023 | 3 | 7 | 0 | | 2023 | 3 | 8 | 0 | | 2023 | 3 | 9 | 0 | | 2023 | 3 | NULL | 1 | | 2023 | 4 | 10 | 0 | | 2023 | 4 | 11 | 0 | | 2023 | 4 | 12 | 0 | | 2023 | 4 | NULL | 1 | | 2023 | NULL | NULL | 3 | | NULL | NULL | NULL | 7 | In this example, the lowest level of grouping is at the month level, defined by the grouping set `(y, q, m)`. Result rows corresponding to that level are simply aggregate rows and the `GROUPING_ID(y, q, m)` function returns `0` for those. The grouping set `(y, q)` results in super-aggregate rows over the month level, leaving a `NULL`-value for the `m` column, and for which `GROUPING_ID(y, q, m)` returns `1`. The grouping set `(y)` results in super-aggregate rows over the quarter level, leaving `NULL`-values for the `m` and `q` column, for which `GROUPING_ID(y, q, m)` returns `3`. Finally, the `()` grouping set results in one super-aggregate row for the entire resultset, leaving `NULL`-values for `y`, `q` and `m` and for which `GROUPING_ID(y, q, m)` returns `7`. To understand the relationship between the return value and the grouping set, you can think of `GROUPING_ID(y, q, m)` writing to a bitfield, where the first bit corresponds to the last expression passed to `GROUPING_ID()`, the second bit to the one-but-last expression passed to `GROUPING_ID()`, and so on. This may become clearer by casting `GROUPING_ID()` to `BIT`: ```sql WITH days AS ( SELECT year("generate_series") AS y, quarter("generate_series") AS q, month("generate_series") AS m FROM generate_series(DATE '2023-01-01', DATE '2023-12-31', INTERVAL 1 DAY) ) SELECT y, q, m, GROUPING_ID(y, q, m) AS "grouping_id(y, q, m)", right(GROUPING_ID(y, q, m)::BIT::VARCHAR, 3) AS "y_q_m_bits" FROM days GROUP BY GROUPING SETS ( (y, q, m), (y, q), (y), () ) ORDER BY y, q, m; ``` Which returns these results: | y | q | m | grouping_id(y, q, m) | y_q_m_bits | |-----:|-----:|-----:|---------------------:|------------| | 2023 | 1 | 1 | 0 | 000 | | 2023 | 1 | 2 | 0 | 000 | | 2023 | 1 | 3 | 0 | 000 | | 2023 | 1 | NULL | 1 | 001 | | 2023 | 2 | 4 | 0 | 000 | | 2023 | 2 | 5 | 0 | 000 | | 2023 | 2 | 6 | 0 | 000 | | 2023 | 2 | NULL | 1 | 001 | | 2023 | 3 | 7 | 0 | 000 | | 2023 | 3 | 8 | 0 | 000 | | 2023 | 3 | 9 | 0 | 000 | | 2023 | 3 | NULL | 1 | 001 | | 2023 | 4 | 10 | 0 | 000 | | 2023 | 4 | 11 | 0 | 000 | | 2023 | 4 | 12 | 0 | 000 | | 2023 | 4 | NULL | 1 | 001 | | 2023 | NULL | NULL | 3 | 011 | | NULL | NULL | NULL | 7 | 111 | Note that the number of expressions passed to `GROUPING_ID()`, or the order in which they are passed is independent from the actual group definitions appearing in the `GROUPING SETS`-clause (or the groups implied by `ROLLUP` and `CUBE`). As long as the expressions passed to `GROUPING_ID()` are expressions that appear some where in the `GROUPING SETS`-clause, `GROUPING_ID()` will set a bit corresponding to the position of the expression whenever that expression is rolled up to a super-aggregate. ## Syntax <div id="rrdiagram"></div>