--- layout: docu railroad: expressions/window.js title: Window Functions --- DuckDB supports [window functions](https://en.wikipedia.org/wiki/Window_function_(SQL)), which can use multiple rows to calculate a value for each row. Window functions are [blocking operators]({% link docs/1.3/guides/performance/how_to_tune_workloads.md %}#blocking-operators), i.e., they require their entire input to be buffered, making them one of the most memory-intensive operators in SQL. Window function are available in SQL since [SQL:2003](https://en.wikipedia.org/wiki/SQL:2003) and are supported by major SQL database systems. ## Examples Generate a `row_number` column to enumerate rows: ```sql SELECT row_number() OVER () FROM sales; ``` > Tip If you only need a number for each row in a table, you can use the [`rowid` pseudocolumn]({% link docs/1.3/sql/statements/select.md %}#row-ids). Generate a `row_number` column to enumerate rows, ordered by `time`: ```sql SELECT row_number() OVER (ORDER BY time) FROM sales; ``` Generate a `row_number` column to enumerate rows, ordered by `time` and partitioned by `region`: ```sql SELECT row_number() OVER (PARTITION BY region ORDER BY time) FROM sales; ``` Compute the difference between the current and the previous-by-`time` `amount`: ```sql SELECT amount - lag(amount) OVER (ORDER BY time) FROM sales; ``` Compute the percentage of the total `amount` of sales per `region` for each row: ```sql SELECT amount / sum(amount) OVER (PARTITION BY region) FROM sales; ``` ## Syntax
Window functions can only be used in the `SELECT` clause. To share `OVER` specifications between functions, use the statement's [`WINDOW` clause]({% link docs/1.3/sql/query_syntax/window.md %}) and use the `OVER ⟨window_name⟩`{:.language-sql .highlight} syntax. ## General-Purpose Window Functions The table below shows the available general window functions. | Name | Description | |:--|:-------| | [`cume_dist([ORDER BY ordering])`](#cume_distorder-by-ordering) | The cumulative distribution: (number of partition rows preceding or peer with current row) / total partition rows. | | [`dense_rank()`](#dense_rank) | The rank of the current row *without gaps;* this function counts peer groups. | | [`first_value(expr[ ORDER BY ordering][ IGNORE NULLS])`](#first_valueexpr-order-by-ordering-ignore-nulls) | Returns `expr` evaluated at the row that is the first row (with a non-null value of `expr` if `IGNORE NULLS` is set) of the window frame. | | [`lag(expr[, offset[, default]][ ORDER BY ordering][ IGNORE NULLS])`](#lagexpr-offset-default-order-by-ordering-ignore-nulls) | Returns `expr` evaluated at the row that is `offset` rows (among rows with a non-null value of `expr` if `IGNORE NULLS` is set) before the current row within the window frame; if there is no such row, instead return `default` (which must be of the Same type as `expr`). Both `offset` and `default` are evaluated with respect to the current row. If omitted, `offset` defaults to `1` and default to `NULL`. | | [`last_value(expr[ ORDER BY ordering][ IGNORE NULLS])`](#last_valueexpr-order-by-ordering-ignore-nulls) | Returns `expr` evaluated at the row that is the last row (among rows with a non-null value of `expr` if `IGNORE NULLS` is set) of the window frame. | | [`lead(expr[, offset[, default]][ ORDER BY ordering][ IGNORE NULLS])`](#leadexpr-offset-default-order-by-ordering-ignore-nulls) | Returns `expr` evaluated at the row that is `offset` rows after the current row (among rows with a non-null value of `expr` if `IGNORE NULLS` is set) within the window frame; if there is no such row, instead return `default` (which must be of the Same type as `expr`). Both `offset` and `default` are evaluated with respect to the current row. If omitted, `offset` defaults to `1` and default to `NULL`. | | [`nth_value(expr, nth[ ORDER BY ordering][ IGNORE NULLS])`](#nth_valueexpr-nth-order-by-ordering-ignore-nulls) | Returns `expr` evaluated at the nth row (among rows with a non-null value of `expr` if `IGNORE NULLS` is set) of the window frame (counting from 1); `NULL` if no such row. | | [`ntile(num_buckets[ ORDER BY ordering])`](#ntilenum_buckets-order-by-ordering) | An integer ranging from 1 to `num_buckets`, dividing the partition as equally as possible. | | [`percent_rank([ORDER BY ordering])`](#percent_rankorder-by-ordering) | The relative rank of the current row: `(rank() - 1) / (total partition rows - 1)`. | | [`rank_dense()`](#rank_dense) | The rank of the current row *without gaps*. | | [`rank([ORDER BY ordering])`](#rankorder-by-ordering) | The rank of the current row *with gaps;* same as `row_number` of its first peer. | | [`row_number([ORDER BY ordering])`](#row_numberorder-by-ordering) | The number of the current row within the partition, counting from 1. | #### `cume_dist([ORDER BY ordering])` | **Description** | The cumulative distribution: (number of partition rows preceding or peer with current row) / total partition rows. If an `ORDER BY` clause is specified, the distribution is computed within the frame using the provided ordering instead of the frame ordering. | | **Return type** | `DOUBLE` | | **Example** | `cume_dist()` | #### `dense_rank()` | **Description** | The rank of the current row *without gaps;* this function counts peer groups. | | **Return type** | `BIGINT` | | **Example** | `dense_rank()` | | **Aliases** | `rank_dense()` | #### `first_value(expr[ ORDER BY ordering][ IGNORE NULLS])` | **Description** | Returns `expr` evaluated at the row that is the first row (with a non-null value of `expr` if `IGNORE NULLS` is set) of the window frame. If an `ORDER BY` clause is specified, the first row number is computed within the frame using the provided ordering instead of the frame ordering. | | **Return type** | Same type as `expr` | | **Example** | `first_value(column)` | #### `lag(expr[, offset[, default]][ ORDER BY ordering][ IGNORE NULLS])` | **Description** | Returns `expr` evaluated at the row that is `offset` rows (among rows with a non-null value of `expr` if `IGNORE NULLS` is set) before the current row within the window frame; if there is no such row, instead return `default` (which must be of the Same type as `expr`). Both `offset` and `default` are evaluated with respect to the current row. If omitted, `offset` defaults to `1` and default to `NULL`. If an `ORDER BY` clause is specified, the lagged row number is computed within the frame using the provided ordering instead of the frame ordering. | | **Return type** | Same type as `expr` | | **Aliases** | `lag(column, 3, 0)` | #### `last_value(expr[ ORDER BY ordering][ IGNORE NULLS])` | **Description** | Returns `expr` evaluated at the row that is the last row (among rows with a non-null value of `expr` if `IGNORE NULLS` is set) of the window frame. If omitted, `offset` defaults to `1` and default to `NULL`. If an `ORDER BY` clause is specified, the last row is determined within the frame using the provided ordering instead of the frame ordering. | | **Return type** | Same type as `expr` | | **Example** | `last_value(column)` | #### `lead(expr[, offset[, default]][ ORDER BY ordering][ IGNORE NULLS])` | **Description** | Returns `expr` evaluated at the row that is `offset` rows after the current row (among rows with a non-null value of `expr` if `IGNORE NULLS` is set) within the window frame; if there is no such row, instead return `default` (which must be of the Same type as `expr`). Both `offset` and `default` are evaluated with respect to the current row. If omitted, `offset` defaults to `1` and default to `NULL`. If an `ORDER BY` clause is specified, the leading row number is computed within the frame using the provided ordering instead of the frame ordering. | | **Return type** | Same type as `expr` | | **Aliases** | `lead(column, 3, 0)` | #### `nth_value(expr, nth[ ORDER BY ordering][ IGNORE NULLS])` | **Description** | Returns `expr` evaluated at the nth row (among rows with a non-null value of `expr` if `IGNORE NULLS` is set) of the window frame (counting from 1); `NULL` if no such row. If an `ORDER BY` clause is specified, the nth row number is computed within the frame using the provided ordering instead of the frame ordering. | | **Return type** | Same type as `expr` | | **Aliases** | `nth_value(column, 2)` | #### `ntile(num_buckets[ ORDER BY ordering])` | **Description** | An integer ranging from 1 to `num_buckets`, dividing the partition as equally as possible. If an `ORDER BY` clause is specified, the ntile is computed within the frame using the provided ordering instead of the frame ordering. | | **Return type** | `BIGINT` | | **Example** | `ntile(4)` | #### `percent_rank([ORDER BY ordering])` | **Description** | The relative rank of the current row: `(rank() - 1) / (total partition rows - 1)`. If an `ORDER BY` clause is specified, the relative rank is computed within the frame using the provided ordering instead of the frame ordering. | | **Return type** | `DOUBLE` | | **Example** | `percent_rank()` | #### `rank_dense()` | **Description** | The rank of the current row *without gaps*. | | **Return type** | `BIGINT` | | **Example** | `rank_dense()` | | **Aliases** | `dense_rank()` | #### `rank([ORDER BY ordering])` | **Description** | The rank of the current row *with gaps*; same as `row_number` of its first peer. If an `ORDER BY` clause is specified, the rank is computed within the frame using the provided ordering instead of the frame ordering. | | **Return type** | `BIGINT` | | **Example** | `rank()` | #### `row_number([ORDER BY ordering])` | **Description** | The number of the current row within the partition, counting from 1. If an `ORDER BY` clause is specified, the row number is computed within the frame using the provided ordering instead of the frame ordering. | | **Return type** | `BIGINT` | | **Example** | `row_number()` | ## Aggregate Window Functions All [aggregate functions]({% link docs/1.3/sql/functions/aggregates.md %}) can be used in a windowing context, including the optional [`FILTER` clause]({% link docs/1.3/sql/query_syntax/filter.md %}). The `first` and `last` aggregate functions are shadowed by the respective general-purpose window functions, with the minor consequence that the `FILTER` clause is not available for these but `IGNORE NULLS` is. ## DISTINCT Arguments All aggregate window functions support using a `DISTINCT` clause for the arguments. When the `DISTINCT` clause is provided, only distinct values are considered in the computation of the aggregate. This is typically used in combination with the `COUNT` aggregate to get the number of distinct elements; but it can be used together with any aggregate function in the system. There are some aggregates that are insensitive to duplicate values (e.g., `min`, `max`) and for them this clause is parsed and ignored. ```sql -- Count the number of distinct users at a given point in time SELECT count(DISTINCT name) OVER (ORDER BY time) FROM sales; -- Concatenate those distinct users into a list SELECT list(DISTINCT name) OVER (ORDER BY time) FROM sales; ``` ## ORDER BY Arguments All aggregate window functions support using an `ORDER BY` argument clause that is *different* from the window ordering. When the `ORDER BY` argument clause is provided, the values being aggregated are sorted before applying the function. Usually this is not important, but there are some order-sensitive aggregates that can have indeterminate results (e.g., `mode`, `list` and `string_agg`). These can be made deterministic by ordering the arguments. For order-insensitive aggregates, this clause is parsed and ignored. ```sql -- Compute the modal value up to each time, breaking ties in favour of the most recent value. SELECT mode(value ORDER BY time DESC) OVER (ORDER BY time) FROM sales; ``` The SQL standard does not provide for using `ORDER BY` with general-purpose window functions, but we have extended all of these functions (except `dense_rank`) to accept this syntax and use framing to restrict the range that the secondary ordering applies to. ```sql -- Compare each athlete's time in an event with the best time to date SELECT event, date, athlete, time first_value(time ORDER BY time DESC) OVER w AS record_time, first_value(athlete ORDER BY time DESC) OVER w AS record_athlete, FROM meet_results WINDOW w AS (PARTITION BY event ORDER BY datetime) ORDER BY ALL ``` Note that there is no comma separating the arguments from the `ORDER BY` clause. ## Nulls All [general-purpose window functions](#general-purpose-window-functions) that accept `IGNORE NULLS` respect nulls by default. This default behavior can optionally be made explicit via `RESPECT NULLS`. In contrast, all [aggregate window functions](#aggregate-window-functions) (except for `list` and its aliases, which can be made to ignore nulls via a `FILTER`) ignore nulls and do not accept `RESPECT NULLS`. For example, `sum(column) OVER (ORDER BY time) AS cumulativeColumn` computes a cumulative sum where rows with a `NULL` value of `column` have the same value of `cumulativeColumn` as the row that precedes them. ## Evaluation Windowing works by breaking a relation up into independent *partitions*, *ordering* those partitions, and then computing a new column for each row as a function of the nearby values. Some window functions depend only on the partition boundary and the ordering, but a few (including all the aggregates) also use a *frame*. Frames are specified as a number of rows on either side (*preceding* or *following*) of the *current row*. The distance can be specified as a number of *rows*, as a *range* of values using the partition's ordering value and a distance, or as a number of *groups* (sets of rows with the same sort value). The full syntax is shown in the diagram at the top of the page, and this diagram visually illustrates computation environment: