---
github_directory: https://github.com/duckdb/duckdb/tree/main/extension/tpch
layout: docu
redirect_from:
- /docs/extensions/tpch
title: TPC-H Extension
---

The `tpch` extension implements the data generator and queries for the [TPC-H benchmark](https://www.tpc.org/tpch/).

## Installing and Loading

The `tpch` extension is shipped by default in some DuckDB builds, otherwise it will be transparently [autoloaded]({% link docs/stable/extensions/overview.md %}#autoloading-extensions) on first use.
If you would like to install and load it manually, run:

```sql
INSTALL tpch;
LOAD tpch;
```

## Usage

### Generating Data

To generate data for scale factor 1, use:

```sql
CALL dbgen(sf = 1);
```

Calling `dbgen` does not clean up existing TPC-H tables.
To clean up existing tables, use `DROP TABLE` before running `dbgen`:

```sql
DROP TABLE IF EXISTS customer;
DROP TABLE IF EXISTS lineitem;
DROP TABLE IF EXISTS nation;
DROP TABLE IF EXISTS orders;
DROP TABLE IF EXISTS part;
DROP TABLE IF EXISTS partsupp;
DROP TABLE IF EXISTS region;
DROP TABLE IF EXISTS supplier;
```

### Running a Query

To run a query, e.g., query 4, use:

```sql
PRAGMA tpch(4);
```

| o_orderpriority | order_count |
|-----------------|------------:|
| 1-URGENT        | 10594       |
| 2-HIGH          | 10476       |
| 3-MEDIUM        | 10410       |
| 4-NOT SPECIFIED | 10556       |
| 5-LOW           | 10487       |

### Listing Queries

To list all 22 queries, run:

```sql
FROM tpch_queries();
```

This function returns a table with columns `query_nr` and `query`.

### Listing Expected Answers

To produced the expected results for all queries on scale factors 0.01, 0.1, and 1, run:

```sql
FROM tpch_answers();
```

This function returns a table with columns `query_nr`, `scale_factor`, and `answer`.

## Generating the Schema

It's possible to generate the schema of TPC-H without any data by setting the scale factor to 0:

```sql
CALL dbgen(sf = 0);
```

## Data Generator Parameters

The data generator function `dbgen` has the following parameters:

| Name | Type | Description |
|--|--|------------|
| `catalog`   | `VARCHAR`  | Target catalog                                                                                                                    |
| `children`  | `UINTEGER` | Number of partitions                                                                                                              |
| `overwrite` | `BOOLEAN`  | (Not used)                                                                                                                        |
| `sf`        | `DOUBLE`   | Scale factor                                                                                                                      |
| `step`      | `UINTEGER` | Defines the partition to be generated, indexed from 0 to `children` - 1. Must be defined when the `children` arguments is defined |
| `suffix`    | `VARCHAR`  | Append the `suffix` to table names                                                                                                |

## Pre-Generated Data Sets

Pre-generated DuckDB databases for TPC-H are available for download:

* [`tpch-sf1.db`](https://blobs.duckdb.org/data/tpch-sf1.db) (250 MB)
* [`tpch-sf3.db`](https://blobs.duckdb.org/data/tpch-sf3.db) (754 MB)
* [`tpch-sf10.db`](https://blobs.duckdb.org/data/tpch-sf10.db) (2.5 GB)
* [`tpch-sf30.db`](https://blobs.duckdb.org/data/tpch-sf30.db) (7.6 GB)
* [`tpch-sf100.db`](https://blobs.duckdb.org/data/tpch-sf100.db) (26 GB)
* [`tpch-sf300.db`](https://blobs.duckdb.org/data/tpch-sf300.db) (78 GB)
* [`tpch-sf1000.db`](https://blobs.duckdb.org/data/tpch-sf1000.db) (265 GB)
* [`tpch-sf3000.db`](https://blobs.duckdb.org/data/tpch-sf3000.db) (796 GB)

## Resource Usage of the Data Generator

Generating TPC-H data sets for large scale factors takes a significant amount of time.
Additionally, when the generation is done in a single step, it requires a large amount of memory.
The following table gives an estimate on the resources required to produce DuckDB database files containing the generated TPC-H data set using 128 threads.

| Scale factor | Database size | Data generation time | Generator's memory usage |
|-------------:|--------------:|---------------------:|-------------------------:|
|          100 |         26 GB | 17 minutes           |                    71 GB |
|          300 |         78 GB | 51 minutes           |                   211 GB |
|        1,000 |        265 GB | 2 h 53 minutes       |                   647 GB |
|        3,000 |        796 GB | 8 h 30 minutes       |                  1799 GB |

The numbers shown above were achieved by running the `dbgen` function in a single step, for example:

```sql
CALL dbgen(sf = 300);
```

If you have a limited amount of memory available, you can run the `dbgen` function in steps.
For example, you may generate SF300 in 10 steps:

```sql
CALL dbgen(sf = 300, children = 10, step = 0);
CALL dbgen(sf = 300, children = 10, step = 1);
...
CALL dbgen(sf = 300, children = 10, step = 9);
```

## Limitation

The `tpch(⟨query_id⟩)`{:.language-sql .highlight} function runs a fixed TPC-H query with pre-defined bind parameters (a.k.a. substitution parameters). It is not possible to change the query parameters using the `tpch` extension. To run the queries with the parameters prescribed by the TPC-H benchmark, use a TPC-H framework implementation.