---
layout: docu
railroad: query_syntax/groupby.js
redirect_from:
- /docs/sql/query_syntax/grouping_sets
- /cal/7
- /cal/07
title: GROUPING SETS
---

`GROUPING SETS`, `ROLLUP` and `CUBE` can be used in the `GROUP BY` clause to perform a grouping over multiple dimensions within the same query.
Note that this syntax is not compatible with [`GROUP BY ALL`]({% link docs/stable/sql/query_syntax/groupby.md %}#group-by-all).

## Examples

Compute the average income along the provided four different dimensions:

```sql
-- the syntax () denotes the empty set (i.e., computing an ungrouped aggregate)
SELECT city, street_name, avg(income)
FROM addresses
GROUP BY GROUPING SETS ((city, street_name), (city), (street_name), ());
```

Compute the average income along the same dimensions:

```sql
SELECT city, street_name, avg(income)
FROM addresses
GROUP BY CUBE (city, street_name);
```

Compute the average income along the dimensions `(city, street_name)`, `(city)` and `()`:

```sql
SELECT city, street_name, avg(income)
FROM addresses
GROUP BY ROLLUP (city, street_name);
```

## Description

`GROUPING SETS` perform the same aggregate across different `GROUP BY clauses` in a single query.

```sql
CREATE TABLE students (course VARCHAR, type VARCHAR);
INSERT INTO students (course, type)
VALUES
    ('CS', 'Bachelor'), ('CS', 'Bachelor'), ('CS', 'PhD'), ('Math', 'Masters'),
    ('CS', NULL), ('CS', NULL), ('Math', NULL);
```

```sql
SELECT course, type, count(*)
FROM students
GROUP BY GROUPING SETS ((course, type), course, type, ());
```

| course |   type   | count_star() |
|--------|----------|-------------:|
| Math   | NULL     | 1            |
| NULL   | NULL     | 7            |
| CS     | PhD      | 1            |
| CS     | Bachelor | 2            |
| Math   | Masters  | 1            |
| CS     | NULL     | 2            |
| Math   | NULL     | 2            |
| CS     | NULL     | 5            |
| NULL   | NULL     | 3            |
| NULL   | Masters  | 1            |
| NULL   | Bachelor | 2            |
| NULL   | PhD      | 1            |

In the above query, we group across four different sets: `course, type`, `course`, `type` and `()` (the empty group). The result contains `NULL` for a group which is not in the grouping set for the result, i.e., the above query is equivalent to the following statement of `UNION ALL` clauses:

```sql
-- Group by course, type:
SELECT course, type, count(*)
FROM students
GROUP BY course, type
UNION ALL
-- Group by type:
SELECT NULL AS course, type, count(*)
FROM students
GROUP BY type
UNION ALL
-- Group by course:
SELECT course, NULL AS type, count(*)
FROM students
GROUP BY course
UNION ALL
-- Group by nothing:
SELECT NULL AS course, NULL AS type, count(*)
FROM students;
```

`CUBE` and `ROLLUP` are syntactic sugar to easily produce commonly used grouping sets.

The `ROLLUP` clause will produce all “sub-groups” of a grouping set, e.g., `ROLLUP (country, city, zip)` produces the grouping sets `(country, city, zip), (country, city), (country), ()`. This can be useful for producing different levels of detail of a group by clause. This produces `n+1` grouping sets where n is the amount of terms in the `ROLLUP` clause.

`CUBE` produces grouping sets for all combinations of the inputs, e.g., `CUBE (country, city, zip)` will produce `(country, city, zip), (country, city), (country, zip), (city, zip), (country), (city), (zip), ()`. This produces `2^n` grouping sets.

## Identifying Grouping Sets with `GROUPING_ID()`

The super-aggregate rows generated by `GROUPING SETS`, `ROLLUP` and `CUBE` can often be identified by `NULL`-values returned for the respective column in the grouping. But if the columns used in the grouping can themselves contain actual `NULL`-values, then it can be challenging to distinguish whether the value in the resultset is a “real” `NULL`-value coming out of the data itself, or a `NULL`-value generated by the grouping construct. The `GROUPING_ID()` or `GROUPING()` function is designed to identify which groups generated the super-aggregate rows in the result.

`GROUPING_ID()` is an aggregate function that takes the column expressions that make up the grouping(s). It returns a `BIGINT` value. The return value is `0` for the rows that are not super-aggregate rows. But for the super-aggregate rows, it returns an integer value that identifies the combination of expressions that make up the group for which the super-aggregate is generated. At this point, an example might help. Consider the following query:

```sql
WITH days AS (
    SELECT
        year("generate_series")    AS y,
        quarter("generate_series") AS q,
        month("generate_series")   AS m
    FROM generate_series(DATE '2023-01-01', DATE '2023-12-31', INTERVAL 1 DAY)
)
SELECT y, q, m, GROUPING_ID(y, q, m) AS "grouping_id()"
FROM days
GROUP BY GROUPING SETS (
    (y, q, m),
    (y, q),
    (y),
    ()
)
ORDER BY y, q, m;
```

These are the results:

|  y   |  q   |  m   | grouping_id() |
|-----:|-----:|-----:|--------------:|
| 2023 | 1    | 1    | 0             |
| 2023 | 1    | 2    | 0             |
| 2023 | 1    | 3    | 0             |
| 2023 | 1    | NULL | 1             |
| 2023 | 2    | 4    | 0             |
| 2023 | 2    | 5    | 0             |
| 2023 | 2    | 6    | 0             |
| 2023 | 2    | NULL | 1             |
| 2023 | 3    | 7    | 0             |
| 2023 | 3    | 8    | 0             |
| 2023 | 3    | 9    | 0             |
| 2023 | 3    | NULL | 1             |
| 2023 | 4    | 10   | 0             |
| 2023 | 4    | 11   | 0             |
| 2023 | 4    | 12   | 0             |
| 2023 | 4    | NULL | 1             |
| 2023 | NULL | NULL | 3             |
| NULL | NULL | NULL | 7             |

In this example, the lowest level of grouping is at the month level, defined by the grouping set `(y, q, m)`. Result rows corresponding to that level are simply aggregate rows and the `GROUPING_ID(y, q, m)` function returns `0` for those. The grouping set `(y, q)` results in super-aggregate rows over the month level, leaving a `NULL`-value for the `m` column, and for which `GROUPING_ID(y, q, m)` returns `1`. The grouping set `(y)` results in super-aggregate rows over the quarter level, leaving `NULL`-values for the `m` and `q` column, for which `GROUPING_ID(y, q, m)` returns `3`. Finally, the `()` grouping set results in one super-aggregate row for the entire resultset, leaving `NULL`-values for `y`, `q` and `m` and for which `GROUPING_ID(y, q, m)` returns `7`.

To understand the relationship between the return value and the grouping set, you can think of `GROUPING_ID(y, q, m)` writing to a bitfield, where the first bit corresponds to the last expression passed to `GROUPING_ID()`, the second bit to the one-but-last expression passed to `GROUPING_ID()`, and so on. This may become clearer by casting `GROUPING_ID()` to `BIT`:

```sql
WITH days AS (
    SELECT
        year("generate_series")    AS y,
        quarter("generate_series") AS q,
        month("generate_series")   AS m
    FROM generate_series(DATE '2023-01-01', DATE '2023-12-31', INTERVAL 1 DAY)
)
SELECT
    y, q, m,
    GROUPING_ID(y, q, m) AS "grouping_id(y, q, m)",
    right(GROUPING_ID(y, q, m)::BIT::VARCHAR, 3) AS "y_q_m_bits"
FROM days
GROUP BY GROUPING SETS (
    (y, q, m),
    (y, q),
    (y),
    ()
)
ORDER BY y, q, m;
```

Which returns these results:

|  y   |  q   |  m   | grouping_id(y, q, m) | y_q_m_bits |
|-----:|-----:|-----:|---------------------:|------------|
| 2023 | 1    | 1    | 0                    | 000        |
| 2023 | 1    | 2    | 0                    | 000        |
| 2023 | 1    | 3    | 0                    | 000        |
| 2023 | 1    | NULL | 1                    | 001        |
| 2023 | 2    | 4    | 0                    | 000        |
| 2023 | 2    | 5    | 0                    | 000        |
| 2023 | 2    | 6    | 0                    | 000        |
| 2023 | 2    | NULL | 1                    | 001        |
| 2023 | 3    | 7    | 0                    | 000        |
| 2023 | 3    | 8    | 0                    | 000        |
| 2023 | 3    | 9    | 0                    | 000        |
| 2023 | 3    | NULL | 1                    | 001        |
| 2023 | 4    | 10   | 0                    | 000        |
| 2023 | 4    | 11   | 0                    | 000        |
| 2023 | 4    | 12   | 0                    | 000        |
| 2023 | 4    | NULL | 1                    | 001        |
| 2023 | NULL | NULL | 3                    | 011        |
| NULL | NULL | NULL | 7                    | 111        |

Note that the number of expressions passed to `GROUPING_ID()`, or the order in which they are passed is independent from the actual group definitions appearing in the `GROUPING SETS`-clause (or the groups implied by `ROLLUP` and `CUBE`). As long as the expressions passed to `GROUPING_ID()` are expressions that appear some where in the `GROUPING SETS`-clause, `GROUPING_ID()` will set a bit corresponding to the position of the expression whenever that expression is rolled up to a super-aggregate.

## Syntax

<div id="rrdiagram"></div>