---
layout: docu
title: Relational API
---

The Relational API is an alternative API that can be used to incrementally construct queries. The API is centered around `DuckDBPyRelation` nodes. The relations can be seen as symbolic representations of SQL queries.

## Lazy Evaluation

The relations do not hold any data, and nothing is executed, until [a method that triggers execution](#output) is called.

For example, the following creates a relation over 1 billion rows:

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("from range(1_000_000_000)")
```

At this point, `rel` does not hold any data and no data has been retrieved from the database.

Calling `rel.show()` or simply printing `rel` in the terminal fetches the first 10,000 rows. If the relation contains more than 10,000 rows, the output window shows `>9999 rows` (as the number of rows in the relation is not yet known).

Calling an [output](#output) method retrieves the data and stores it in the specified format:

```python
rel.to_table("example_rel")
# 100% ▕████████████████████████████████████████████████████████████▏
```

## Relation Creation

This section describes how a relation can be created. The methods are [lazy evaluated](#lazy-evaluation).

| Name | Description |
|:--|:-------|
| [`from_arrow`](#from_arrow) | Create a relation object from an Arrow object |
| [`from_csv_auto`](#from_csv_auto) | Create a relation object from the CSV file in 'name' |
| [`from_df`](#from_df) | Create a relation object from the DataFrame in df |
| [`from_parquet`](#from_parquet) | Create a relation object from the Parquet files |
| [`from_query`](#from_query) | Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is. |
| [`query`](#query) | Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is. |
| [`read_csv`](#read_csv) | Create a relation object from the CSV file in 'name' |
| [`read_json`](#read_json) | Create a relation object from the JSON file in 'name' |
| [`read_parquet`](#read_parquet) | Create a relation object from the Parquet files |
| [`sql`](#sql) | Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is.
| | [`table`](#table) | Create a relation object for the named table | | [`table_function`](#table_function) | Create a relation object from the named table function with given parameters | | [`values`](#values) | Create a relation object from the passed values | | [`view`](#view) | Create a relation object for the named view | #### `from_arrow` ##### Signature ```python from_arrow(self: duckdb.duckdb.DuckDBPyConnection, arrow_object: object) -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Create a relation object from an Arrow object ##### Parameters - **arrow_object** : pyarrow.Table, pyarrow.RecordBatch Arrow object to create a relation from ##### Example ```python import duckdb import pyarrow as pa ids = pa.array([1], type=pa.int8()) texts = pa.array(['a'], type=pa.string()) example_table = pa.table([ids, texts], names=["id", "text"]) duckdb_conn = duckdb.connect() rel = duckdb_conn.from_arrow(example_table) rel.show() ``` ##### Result ```text ┌──────┬─────────┐ │ id │ text │ │ int8 │ varchar │ ├──────┼─────────┤ │ 1 │ a │ └──────┴─────────┘ ``` ---- #### `from_csv_auto` ##### Signature ```python from_csv_auto(self: duckdb.duckdb.DuckDBPyConnection, path_or_buffer: object, **kwargs) -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Create a relation object from the CSV file in 'name' **Aliases**: [`read_csv`](#read_csv) ##### Parameters - **path_or_buffer** : Union[str, StringIO, TextIOBase] Path to the CSV file or buffer to read from. - **header** : Optional[bool], Optional[int] Row number(s) to use as the column names, or None if no header. - **compression** : Optional[str] Compression type (e.g., 'gzip', 'bz2'). - **sep** : Optional[str] Delimiter to use; defaults to comma. - **delimiter** : Optional[str] Alternative delimiter to use. - **dtype** : Optional[Dict[str, str]], Optional[List[str]] Data types for columns. - **na_values** : Optional[str], Optional[List[str]] Additional strings to recognize as NA/NaN. - **skiprows** : Optional[int] Number of rows to skip at the start. - **quotechar** : Optional[str] Character used to quote fields. - **escapechar** : Optional[str] Character used to escape delimiter or quote characters. - **encoding** : Optional[str] Encoding to use for UTF when reading/writing. - **parallel** : Optional[bool] Enable parallel reading. - **date_format** : Optional[str] Format to parse dates. - **timestamp_format** : Optional[str] Format to parse timestamps. - **sample_size** : Optional[int] Number of rows to sample for schema inference. - **all_varchar** : Optional[bool] Treat all columns as VARCHAR. - **normalize_names** : Optional[bool] Normalize column names to lowercase. - **null_padding** : Optional[bool] Enable null padding for rows with missing columns. - **names** : Optional[List[str]] List of column names to use. - **lineterminator** : Optional[str] Character to break lines on. - **columns** : Optional[Dict[str, str]] Column mapping for schema. - **auto_type_candidates** : Optional[List[str]] List of columns for automatic type inference. - **max_line_size** : Optional[int] Maximum line size in bytes. - **ignore_errors** : Optional[bool] Ignore parsing errors. - **store_rejects** : Optional[bool] Store rejected rows. - **rejects_table** : Optional[str] Table name to store rejected rows. - **rejects_scan** : Optional[str] Scan to use for rejects. - **rejects_limit** : Optional[int] Limit number of rejects stored. - **force_not_null** : Optional[List[str]] List of columns to force as NOT NULL. 
- **buffer_size** : Optional[int] Buffer size in bytes. - **decimal** : Optional[str] Character to recognize as decimal point. - **allow_quoted_nulls** : Optional[bool] Allow quoted NULL values. - **filename** : Optional[bool], Optional[str] Add filename column or specify filename. - **hive_partitioning** : Optional[bool] Enable Hive-style partitioning. - **union_by_name** : Optional[bool] Union files by column name instead of position. - **hive_types** : Optional[Dict[str, str]] Hive types for columns. - **hive_types_autocast** : Optional[bool] Automatically cast Hive types. - **connection** : DuckDBPyConnection DuckDB connection to use. ##### Example ```python import csv import duckdb duckdb_conn = duckdb.connect() with open('code_example.csv', 'w', newline='') as csvfile: fieldnames = ['id', 'text'] writer = csv.DictWriter(csvfile, fieldnames=fieldnames) writer.writeheader() writer.writerow({'id': '1', 'text': 'a'}) rel = duckdb_conn.from_csv_auto("code_example.csv") rel.show() ``` ##### Result ```text ┌───────┬─────────┐ │ id │ text │ │ int64 │ varchar │ ├───────┼─────────┤ │ 1 │ a │ └───────┴─────────┘ ``` ---- #### `from_df` ##### Signature ```python from_df(self: duckdb.duckdb.DuckDBPyConnection, df: pandas.DataFrame) -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Create a relation object from the DataFrame in df ##### Parameters - **df** : pandas.DataFrame A pandas DataFrame to be converted into a DuckDB relation. ##### Example ```python import duckdb import pandas as pd df = pd.DataFrame(data = {'id': [1], "text":["a"]}) duckdb_conn = duckdb.connect() rel = duckdb_conn.from_df(df) rel.show() ``` ##### Result ```text ┌───────┬─────────┐ │ id │ text │ │ int64 │ varchar │ ├───────┼─────────┤ │ 1 │ a │ └───────┴─────────┘ ``` ---- #### `from_parquet` ##### Signature ```python from_parquet(*args, **kwargs) Overloaded function. 1. from_parquet(self: duckdb.duckdb.DuckDBPyConnection, file_glob: str, binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> duckdb.duckdb.DuckDBPyRelation Create a relation object from the Parquet files in file_glob 2. from_parquet(self: duckdb.duckdb.DuckDBPyConnection, file_globs: list[str], binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> duckdb.duckdb.DuckDBPyRelation Create a relation object from the Parquet files in file_globs ``` ##### Description Create a relation object from the Parquet files **Aliases**: [`read_parquet`](#read_parquet) ##### Parameters - **file_glob** : str File path or glob pattern pointing to Parquet files to be read. - **binary_as_string** : bool, default: False Interpret binary columns as strings instead of blobs. - **file_row_number** : bool, default: False Add a column containing the row number within each file. - **filename** : bool, default: False Add a column containing the name of the file each row came from. - **hive_partitioning** : bool, default: False Enable automatic detection of Hive-style partitions in file paths. - **union_by_name** : bool, default: False Union Parquet files by matching column names instead of positions. - **compression** : object Optional compression codec to use when reading the Parquet files. 
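The second overload accepts a list of file globs, which is convenient for reading several Parquet files (or directories) into a single relation. A minimal sketch, with hypothetical file paths:

```python
import duckdb

duckdb_conn = duckdb.connect()

# Hypothetical paths; any list of files or globs works the same way.
rel = duckdb_conn.from_parquet(
    ["data/2024/*.parquet", "data/2025/*.parquet"],
    union_by_name=True,  # match columns by name across files
)
```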
##### Example ```python import duckdb import pyarrow as pa import pyarrow.parquet as pq ids = pa.array([1], type=pa.int8()) texts = pa.array(['a'], type=pa.string()) example_table = pa.table([ids, texts], names=["id", "text"]) pq.write_table(example_table, "code_example.parquet") duckdb_conn = duckdb.connect() rel = duckdb_conn.from_parquet("code_example.parquet") rel.show() ``` ##### Result ```text ┌──────┬─────────┐ │ id │ text │ │ int8 │ varchar │ ├──────┼─────────┤ │ 1 │ a │ └──────┴─────────┘ ``` ---- #### `from_query` ##### Signature ```python from_query(self: duckdb.duckdb.DuckDBPyConnection, query: object, *, alias: str = '', params: object = None) -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is. **Aliases**: [`query`](#query), [`sql`](#sql) ##### Parameters - **query** : object The SQL query or subquery to be executed and converted into a relation. - **alias** : str, default: '' Optional alias name to assign to the resulting relation. - **params** : object Optional query parameters to be used in the SQL query. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.from_query("from range(1,2) tbl(id)") rel.show() ``` ##### Result ```text ┌───────┐ │ id │ │ int64 │ ├───────┤ │ 1 │ └───────┘ ``` ---- #### `query` ##### Signature ```python query(self: duckdb.duckdb.DuckDBPyConnection, query: object, *, alias: str = '', params: object = None) -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is. **Aliases**: [`from_query`](#from_query), [`sql`](#sql) ##### Parameters - **query** : object The SQL query or subquery to be executed and converted into a relation. - **alias** : str, default: '' Optional alias name to assign to the resulting relation. - **params** : object Optional query parameters to be used in the SQL query. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.query("from range(1,2) tbl(id)") rel.show() ``` ##### Result ```text ┌───────┐ │ id │ │ int64 │ ├───────┤ │ 1 │ └───────┘ ``` ---- #### `read_csv` ##### Signature ```python read_csv(self: duckdb.duckdb.DuckDBPyConnection, path_or_buffer: object, **kwargs) -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Create a relation object from the CSV file in 'name' **Aliases**: [`from_csv_auto`](#from_csv_auto) ##### Parameters - **path_or_buffer** : Union[str, StringIO, TextIOBase] Path to the CSV file or buffer to read from. - **header** : Optional[bool], Optional[int] Row number(s) to use as the column names, or None if no header. - **compression** : Optional[str] Compression type (e.g., 'gzip', 'bz2'). - **sep** : Optional[str] Delimiter to use; defaults to comma. - **delimiter** : Optional[str] Alternative delimiter to use. - **dtype** : Optional[Dict[str, str]], Optional[List[str]] Data types for columns. - **na_values** : Optional[str], Optional[List[str]] Additional strings to recognize as NA/NaN. - **skiprows** : Optional[int] Number of rows to skip at the start. - **quotechar** : Optional[str] Character used to quote fields. - **escapechar** : Optional[str] Character used to escape delimiter or quote characters. - **encoding** : Optional[str] Encoding to use for UTF when reading/writing. - **parallel** : Optional[bool] Enable parallel reading. 
- **date_format** : Optional[str] Format to parse dates. - **timestamp_format** : Optional[str] Format to parse timestamps. - **sample_size** : Optional[int] Number of rows to sample for schema inference. - **all_varchar** : Optional[bool] Treat all columns as VARCHAR. - **normalize_names** : Optional[bool] Normalize column names to lowercase. - **null_padding** : Optional[bool] Enable null padding for rows with missing columns. - **names** : Optional[List[str]] List of column names to use. - **lineterminator** : Optional[str] Character to break lines on. - **columns** : Optional[Dict[str, str]] Column mapping for schema. - **auto_type_candidates** : Optional[List[str]] List of columns for automatic type inference. - **max_line_size** : Optional[int] Maximum line size in bytes. - **ignore_errors** : Optional[bool] Ignore parsing errors. - **store_rejects** : Optional[bool] Store rejected rows. - **rejects_table** : Optional[str] Table name to store rejected rows. - **rejects_scan** : Optional[str] Scan to use for rejects. - **rejects_limit** : Optional[int] Limit number of rejects stored. - **force_not_null** : Optional[List[str]] List of columns to force as NOT NULL. - **buffer_size** : Optional[int] Buffer size in bytes. - **decimal** : Optional[str] Character to recognize as decimal point. - **allow_quoted_nulls** : Optional[bool] Allow quoted NULL values. - **filename** : Optional[bool], Optional[str] Add filename column or specify filename. - **hive_partitioning** : Optional[bool] Enable Hive-style partitioning. - **union_by_name** : Optional[bool] Union files by column name instead of position. - **hive_types** : Optional[Dict[str, str]] Hive types for columns. - **hive_types_autocast** : Optional[bool] Automatically cast Hive types. - **connection** : DuckDBPyConnection DuckDB connection to use. 
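Most of these options can be combined in a single call. For instance, a headerless, semicolon-separated file could be read as follows (a minimal sketch; the file name and column layout are hypothetical):

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.read_csv(
    "code_example_no_header.csv",  # hypothetical file
    header=False,
    sep=";",
    names=["id", "text"],
    dtype={"id": "BIGINT", "text": "VARCHAR"},
)
```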
##### Example ```python import csv import duckdb duckdb_conn = duckdb.connect() with open('code_example.csv', 'w', newline='') as csvfile: fieldnames = ['id', 'text'] writer = csv.DictWriter(csvfile, fieldnames=fieldnames) writer.writeheader() writer.writerow({'id': '1', 'text': 'a'}) rel = duckdb_conn.read_csv("code_example.csv") rel.show() ``` ##### Result ```text ┌───────┬─────────┐ │ id │ text │ │ int64 │ varchar │ ├───────┼─────────┤ │ 1 │ a │ └───────┴─────────┘ ``` ---- #### `read_json` ##### Signature ```python read_json(self: duckdb.duckdb.DuckDBPyConnection, path_or_buffer: object, *, columns: typing.Optional[object] = None, sample_size: typing.Optional[object] = None, maximum_depth: typing.Optional[object] = None, records: typing.Optional[str] = None, format: typing.Optional[str] = None, date_format: typing.Optional[object] = None, timestamp_format: typing.Optional[object] = None, compression: typing.Optional[object] = None, maximum_object_size: typing.Optional[object] = None, ignore_errors: typing.Optional[object] = None, convert_strings_to_integers: typing.Optional[object] = None, field_appearance_threshold: typing.Optional[object] = None, map_inference_threshold: typing.Optional[object] = None, maximum_sample_files: typing.Optional[object] = None, filename: typing.Optional[object] = None, hive_partitioning: typing.Optional[object] = None, union_by_name: typing.Optional[object] = None, hive_types: typing.Optional[object] = None, hive_types_autocast: typing.Optional[object] = None) -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Create a relation object from the JSON file in 'name' ##### Parameters - **path_or_buffer** : object File path or file-like object containing JSON data to be read. - **columns** : object Optional list of column names to project from the JSON data. - **sample_size** : object Number of rows to sample for inferring JSON schema. - **maximum_depth** : object Maximum depth to which JSON objects should be parsed. - **records** : str Format string specifying whether JSON is in records mode. - **format** : str Format of the JSON data (e.g., 'auto', 'newline_delimited'). - **date_format** : object Format string for parsing date fields. - **timestamp_format** : object Format string for parsing timestamp fields. - **compression** : object Compression codec used on the JSON data (e.g., 'gzip'). - **maximum_object_size** : object Maximum size in bytes for individual JSON objects. - **ignore_errors** : object If True, skip over JSON records with parsing errors. - **convert_strings_to_integers** : object If True, attempt to convert strings to integers where appropriate. - **field_appearance_threshold** : object Threshold for inferring optional fields in nested JSON. - **map_inference_threshold** : object Threshold for inferring maps from JSON object patterns. - **maximum_sample_files** : object Maximum number of files to sample for schema inference. - **filename** : object If True, include a column with the source filename for each row. - **hive_partitioning** : object If True, enable Hive partitioning based on directory structure. - **union_by_name** : object If True, align JSON columns by name instead of position. - **hive_types** : object If True, use Hive types from directory structure for schema. - **hive_types_autocast** : object If True, automatically cast data types to match Hive types. 
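Most keyword arguments correspond to options of the JSON reader. For instance, a newline-delimited JSON file could be read as follows (a minimal sketch; the file name is hypothetical):

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.read_json(
    "code_example.ndjson",       # hypothetical file, one JSON object per line
    format="newline_delimited",
    ignore_errors=True,          # skip records that fail to parse
)
```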
##### Example ```python import duckdb import json with open("code_example.json", mode="w") as f: json.dump([{'id': 1, "text":"a"}], f) duckdb_conn = duckdb.connect() rel = duckdb_conn.read_json("code_example.json") rel.show() ``` ##### Result ```text ┌───────┬─────────┐ │ id │ text │ │ int64 │ varchar │ ├───────┼─────────┤ │ 1 │ a │ └───────┴─────────┘ ``` ---- #### `read_parquet` ##### Signature ```python read_parquet(*args, **kwargs) Overloaded function. 1. read_parquet(self: duckdb.duckdb.DuckDBPyConnection, file_glob: str, binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> duckdb.duckdb.DuckDBPyRelation Create a relation object from the Parquet files in file_glob 2. read_parquet(self: duckdb.duckdb.DuckDBPyConnection, file_globs: list[str], binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> duckdb.duckdb.DuckDBPyRelation Create a relation object from the Parquet files in file_globs ``` ##### Description Create a relation object from the Parquet files **Aliases**: [`from_parquet`](#from_parquet) ##### Parameters - **file_glob** : str File path or glob pattern pointing to Parquet files to be read. - **binary_as_string** : bool, default: False Interpret binary columns as strings instead of blobs. - **file_row_number** : bool, default: False Add a column containing the row number within each file. - **filename** : bool, default: False Add a column containing the name of the file each row came from. - **hive_partitioning** : bool, default: False Enable automatic detection of Hive-style partitions in file paths. - **union_by_name** : bool, default: False Union Parquet files by matching column names instead of positions. - **compression** : object Optional compression codec to use when reading the Parquet files. ##### Example ```python import duckdb import pyarrow as pa import pyarrow.parquet as pq ids = pa.array([1], type=pa.int8()) texts = pa.array(['a'], type=pa.string()) example_table = pa.table([ids, texts], names=["id", "text"]) pq.write_table(example_table, "code_example.parquet") duckdb_conn = duckdb.connect() rel = duckdb_conn.read_parquet("code_example.parquet") rel.show() ``` ##### Result ```text ┌──────┬─────────┐ │ id │ text │ │ int8 │ varchar │ ├──────┼─────────┤ │ 1 │ a │ └──────┴─────────┘ ``` ---- #### `sql` ##### Signature ```python sql(self: duckdb.duckdb.DuckDBPyConnection, query: object, *, alias: str = '', params: object = None) -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is. **Aliases**: [`from_query`](#from_query), [`query`](#query) ##### Parameters - **query** : object The SQL query or subquery to be executed and converted into a relation. - **alias** : str, default: '' Optional alias name to assign to the resulting relation. - **params** : object Optional query parameters to be used in the SQL query. 
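The `alias` parameter can be used to give the relation a stable name instead of the generated `unnamed_relation_...` one, which is convenient when composing joins. A minimal sketch:

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("from range(1, 4) tbl(id)", alias="numbers")

rel.alias  # 'numbers'
```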
##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql("from range(1,2) tbl(id)") rel.show() ``` ##### Result ```text ┌───────┐ │ id │ │ int64 │ ├───────┤ │ 1 │ └───────┘ ``` ---- #### `table` ##### Signature ```python table(self: duckdb.duckdb.DuckDBPyConnection, table_name: str) -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Create a relation object for the named table ##### Parameters - **table_name** : str Name of the table to create a relation from. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() duckdb_conn.sql("create table code_example as select * from range(1,2) tbl(id)") rel = duckdb_conn.table("code_example") rel.show() ``` ##### Result ```text ┌───────┐ │ id │ │ int64 │ ├───────┤ │ 1 │ └───────┘ ``` ---- #### `table_function` ##### Signature ```python table_function(self: duckdb.duckdb.DuckDBPyConnection, name: str, parameters: object = None) -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Create a relation object from the named table function with given parameters ##### Parameters - **name** : str Name of the table function to call. - **parameters** : object Optional parameters to pass to the table function. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() duckdb_conn.sql(""" create macro get_record_for(x) as table select x*range from range(1,2) """) rel = duckdb_conn.table_function(name="get_record_for", parameters=[1]) rel.show() ``` ##### Result ```text ┌───────────────┐ │ (1 * "range") │ │ int64 │ ├───────────────┤ │ 1 │ └───────────────┘ ``` ---- #### `values` ##### Signature ```python values(self: duckdb.duckdb.DuckDBPyConnection, *args) -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Create a relation object from the passed values ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.values([1, 'a']) rel.show() ``` ##### Result ```text ┌───────┬─────────┐ │ col0 │ col1 │ │ int32 │ varchar │ ├───────┼─────────┤ │ 1 │ a │ └───────┴─────────┘ ``` ---- #### `view` ##### Signature ```python view(self: duckdb.duckdb.DuckDBPyConnection, view_name: str) -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Create a relation object for the named view ##### Parameters - **view_name** : str Name of the view to create a relation from. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() duckdb_conn.sql("create table code_example as select * from range(1,2) tbl(id)") rel = duckdb_conn.view("code_example") rel.show() ``` ##### Result ```text ┌───────┐ │ id │ │ int64 │ ├───────┤ │ 1 │ └───────┘ ``` ## Relation Definition Details This section contains the details on how to inspect a relation. | Name | Description | |:--|:-------| | [`alias`](#alias) | Get the name of the current alias | | [`columns`](#columns) | Return a list containing the names of the columns of the relation. | | [`describe`](#describe) | Gives basic statistics (e.g., min, max) and if NULL exists for each column of the relation. | | [`description`](#description) | Return the description of the result | | [`dtypes`](#dtypes) | Return a list containing the types of the columns of the relation. 
| [`explain`](#explain) | Return a string containing the query plan of the relation |
| [`query`](#query-1) | Run the given SQL query in sql_query on the view named virtual_table_name that refers to the relation object |
| [`set_alias`](#set_alias) | Rename the relation object to new alias |
| [`shape`](#shape) | Tuple of # of rows, # of columns in relation. |
| [`show`](#show) | Display a summary of the data |
| [`sql_query`](#sql_query) | Get the SQL query that is equivalent to the relation |
| [`type`](#type) | Get the type of the relation. |
| [`types`](#types) | Return a list containing the types of the columns of the relation. |

#### `alias`

##### Description

Get the name of the current alias

##### Example

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
    select
        gen_random_uuid() as id,
        concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
        range as value,
        now() + concat(range,' ', 'minutes')::interval as created_timestamp
    from range(1, 10)
    """
)

rel.alias
```

##### Result

```text
unnamed_relation_43c808c247431be5
```

----

#### `columns`

##### Description

Return a list containing the names of the columns of the relation.

##### Example

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
    select
        gen_random_uuid() as id,
        concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
        range as value,
        now() + concat(range,' ', 'minutes')::interval as created_timestamp
    from range(1, 10)
    """
)

rel.columns
```

##### Result

```text
['id', 'description', 'value', 'created_timestamp']
```

----

#### `describe`

##### Signature

```python
describe(self: duckdb.duckdb.DuckDBPyRelation) -> duckdb.duckdb.DuckDBPyRelation
```

##### Description

Gives basic statistics (e.g., min, max) and if NULL exists for each column of the relation.
##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.describe() ``` ##### Result ```text ┌─────────┬──────────────────────────────────────┬─────────────────┬────────────────────┬────────────────────────────┐ │ aggr │ id │ description │ value │ created_timestamp │ │ varchar │ varchar │ varchar │ double │ varchar │ ├─────────┼──────────────────────────────────────┼─────────────────┼────────────────────┼────────────────────────────┤ │ count │ 9 │ 9 │ 9.0 │ 9 │ │ mean │ NULL │ NULL │ 5.0 │ NULL │ │ stddev │ NULL │ NULL │ 2.7386127875258306 │ NULL │ │ min │ 08fdcbf8-4e53-4290-9e81-423af263b518 │ value is even │ 1.0 │ 2025-04-09 15:41:20.642+02 │ │ max │ fb10390e-fad5-4694-91cb-e82728cb6f9f │ value is uneven │ 9.0 │ 2025-04-09 15:49:20.642+02 │ │ median │ NULL │ NULL │ 5.0 │ NULL │ └─────────┴──────────────────────────────────────┴─────────────────┴────────────────────┴────────────────────────────┘ ``` ---- #### `description` ##### Description Return the description of the result ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.description ``` ##### Result ```text [('id', 'UUID', None, None, None, None, None), ('description', 'STRING', None, None, None, None, None), ('value', 'NUMBER', None, None, None, None, None), ('created_timestamp', 'DATETIME', None, None, None, None, None)] ``` ---- #### `dtypes` ##### Description Return a list containing the types of the columns of the relation. 
**Aliases**: [`types`](#types) ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.dtypes ``` ##### Result ```text [UUID, VARCHAR, BIGINT, TIMESTAMP WITH TIME ZONE] ``` ---- #### `explain` ##### Description explain(self: duckdb.duckdb.DuckDBPyRelation, type: duckdb.duckdb.ExplainType = 'standard') -> str ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.explain() ``` ##### Result ```text ┌───────────────────────────┐ │ PROJECTION │ │ ──────────────────── │ │ id │ │ description │ │ value │ │ created_timestamp │ │ │ │ ~9 Rows │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ RANGE │ │ ──────────────────── │ │ Function: RANGE │ │ │ │ ~9 Rows │ └───────────────────────────┘ ``` ---- #### `query` ##### Signature ```python query(self: duckdb.duckdb.DuckDBPyRelation, virtual_table_name: str, sql_query: str) -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Run the given SQL query in sql_query on the view named virtual_table_name that refers to the relation object ##### Parameters - **virtual_table_name** : str The name to assign to the current relation when referenced in the SQL query. - **sql_query** : str The SQL query string that uses the virtual table name to query the relation. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.query(virtual_table_name="rel_view", sql_query="from rel") duckdb_conn.sql("show rel_view") ``` ##### Result ```text ┌───────────────────┬──────────────────────────┬─────────┬─────────┬─────────┬─────────┐ │ column_name │ column_type │ null │ key │ default │ extra │ │ varchar │ varchar │ varchar │ varchar │ varchar │ varchar │ ├───────────────────┼──────────────────────────┼─────────┼─────────┼─────────┼─────────┤ │ id │ UUID │ YES │ NULL │ NULL │ NULL │ │ description │ VARCHAR │ YES │ NULL │ NULL │ NULL │ │ value │ BIGINT │ YES │ NULL │ NULL │ NULL │ │ created_timestamp │ TIMESTAMP WITH TIME ZONE │ YES │ NULL │ NULL │ NULL │ └───────────────────┴──────────────────────────┴─────────┴─────────┴─────────┴─────────┘ ``` ---- #### `set_alias` ##### Signature ```python set_alias(self: duckdb.duckdb.DuckDBPyRelation, alias: str) -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Rename the relation object to new alias ##### Parameters - **alias** : str The alias name to assign to the relation. 
##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.set_alias('abc').select('abc.id') ``` ##### Result ```text In the SQL query, the alias will be `abc` ``` ---- #### `shape` ##### Description Tuple of # of rows, # of columns in relation. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.shape ``` ##### Result ```text (9, 4) ``` ---- #### `show` ##### Signature ```python show(self: duckdb.duckdb.DuckDBPyRelation, *, max_width: typing.Optional[int] = None, max_rows: typing.Optional[int] = None, max_col_width: typing.Optional[int] = None, null_value: typing.Optional[str] = None, render_mode: object = None) -> None ``` ##### Description Display a summary of the data ##### Parameters - **max_width** : int Maximum display width for the entire output in characters. - **max_rows** : int Maximum number of rows to display. - **max_col_width** : int Maximum number of characters to display per column. - **null_value** : str String to display in place of NULL values. - **render_mode** : object Render mode for displaying the output. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.show() ``` ##### Result ```text ┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐ │ id │ description │ value │ created_timestamp │ │ uuid │ varchar │ int64 │ timestamp with time zone │ ├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤ │ 642ea3d7-793d-4867-a759-91c1226c25a0 │ value is uneven │ 1 │ 2025-04-09 15:41:20.642+02 │ │ 6817dd31-297c-40a8-8e40-8521f00b2d08 │ value is even │ 2 │ 2025-04-09 15:42:20.642+02 │ │ 45143f9a-e16e-4e59-91b2-3a0800eed6d6 │ value is uneven │ 3 │ 2025-04-09 15:43:20.642+02 │ │ fb10390e-fad5-4694-91cb-e82728cb6f9f │ value is even │ 4 │ 2025-04-09 15:44:20.642+02 │ │ 111ced5c-9155-418e-b087-c331b814db90 │ value is uneven │ 5 │ 2025-04-09 15:45:20.642+02 │ │ 66a870a6-aef0-4085-87d5-5d1b35d21c66 │ value is even │ 6 │ 2025-04-09 15:46:20.642+02 │ │ a7e8e796-bca0-44cd-a269-1d71090fb5cc │ value is uneven │ 7 │ 2025-04-09 15:47:20.642+02 │ │ 74908d48-7f2d-4bdd-9c92-1e7920b115b5 │ value is even │ 8 │ 2025-04-09 15:48:20.642+02 │ │ 08fdcbf8-4e53-4290-9e81-423af263b518 │ value is uneven │ 9 │ 2025-04-09 15:49:20.642+02 │ └──────────────────────────────────────┴─────────────────┴───────┴────────────────────────────┘ ``` ---- #### `sql_query` ##### Signature ```python sql_query(self: duckdb.duckdb.DuckDBPyRelation) -> str ``` ##### Description Get the SQL query that is equivalent to the relation ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) 
as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.sql_query() ``` ##### Result ```sql SELECT gen_random_uuid() AS id, concat('value is ', CASE WHEN ((mod("range", 2) = 0)) THEN ('even') ELSE 'uneven' END) AS description, "range" AS "value", (now() + CAST(concat("range", ' ', 'minutes') AS INTERVAL)) AS created_timestamp FROM "range"(1, 10) ``` ---- #### `type` ##### Description Get the type of the relation. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.type ``` ##### Result ```text QUERY_RELATION ``` ---- #### `types` ##### Description Return a list containing the types of the columns of the relation. **Aliases**: [`dtypes`](#dtypes) ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.types ``` ##### Result ```text [UUID, VARCHAR, BIGINT, TIMESTAMP WITH TIME ZONE] ``` ## Transformation This section contains the methods which can be used to chain queries. The methods are [lazy evaluated](#lazy-evaluation). | Name | Description | |:--|:-------| | [`aggregate`](#aggregate) | Compute the aggregate aggr_expr by the optional groups group_expr on the relation | | [`apply`](#apply) | Compute the function of a single column or a list of columns by the optional groups on the relation | | [`cross`](#cross) | Create cross/cartesian product of two relational objects | | [`except_`](#except_) | Create the set except of this relation object with another relation object in other_rel | | [`filter`](#filter) | Filter the relation object by the filter in filter_expr | | [`insert`](#insert) | Inserts the given values into the relation | | [`insert_into`](#insert_into) | Inserts the relation object into an existing table named table_name | | [`intersect`](#intersect) | Create the set intersection of this relation object with another relation object in other_rel | | [`join`](#join) | Join the relation object with another relation object in other_rel using the join condition expression in join_condition. 
Types supported are 'inner', 'left', 'right', 'outer', 'semi' and 'anti' | | [`limit`](#limit) | Only retrieve the first n rows from this relation object, starting at offset | | [`map`](#map) | Calls the passed function on the relation | | [`order`](#order) | Reorder the relation object by order_expr | | [`project`](#project) | Project the relation object by the projection in project_expr | | [`select`](#select) | Project the relation object by the projection in project_expr | | [`sort`](#sort) | Reorder the relation object by the provided expressions | | [`union`](#union) | Create the set union of this relation object with another relation object in other_rel | | [`update`](#update) | Update the given relation with the provided expressions | #### `aggregate` ##### Signature ```python aggregate(self: duckdb.duckdb.DuckDBPyRelation, aggr_expr: object, group_expr: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Compute the aggregate aggr_expr by the optional groups group_expr on the relation ##### Parameters - **aggr_expr** : str, list[Expression] The list of columns and aggregation functions. - **group_expr** : str, default: '' The list of columns to be included in `group_by`. If `None`, `group by all` is applied. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel = rel.aggregate('max(value)') ``` ##### Result ```text ┌──────────────┐ │ max("value") │ │ int64 │ ├──────────────┤ │ 9 │ └──────────────┘ ``` ---- #### `apply` ##### Signature ```python apply(self: duckdb.duckdb.DuckDBPyRelation, function_name: str, function_aggr: str, group_expr: str = '', function_parameter: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Compute the function of a single column or a list of columns by the optional groups on the relation ##### Parameters - **function_name** : str Name of the function to apply over the relation. - **function_aggr** : str The list of columns to apply the function over. - **group_expr** : str, default: '' Optional SQL expression for grouping. - **function_parameter** : str, default: '' Optional parameters to pass into the function. - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.apply( function_name="count", function_aggr="id", group_expr="description", projected_columns="description" ) ``` ##### Result ```text ┌─────────────────┬───────────┐ │ description │ count(id) │ │ varchar │ int64 │ ├─────────────────┼───────────┤ │ value is uneven │ 5 │ │ value is even │ 4 │ └─────────────────┴───────────┘ ``` ---- #### `cross` ##### Signature ```python cross(self: duckdb.duckdb.DuckDBPyRelation, other_rel: duckdb.duckdb.DuckDBPyRelation) -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Create cross/cartesian product of two relational objects ##### Parameters - **other_rel** : duckdb.duckdb.DuckDBPyRelation Another relation to perform a cross product with. 
##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.cross(other_rel=rel.set_alias("other_rel")) ``` ##### Result ```text ┌─────────────────────────────┬─────────────────┬───────┬───────────────────────────┬──────────────────────────────────────┬─────────────────┬───────┬───────────────────────────┐ │ id │ description │ value │ created_timestamp │ id │ description │ value │ created_timestamp │ │ uuid │ varchar │ int64 │ timestamp with time zone │ uuid │ varchar │ int64 │ timestamp with time zone │ ├─────────────────────────────┼─────────────────┼───────┼───────────────────────────┼──────────────────────────────────────┼─────────────────┼───────┼───────────────────────────┤ │ cb2b453f-1a06-4f5e-abe1-b… │ value is uneven │ 1 │ 2025-04-10 09:53:29.78+02 │ cb2b453f-1a06-4f5e-abe1-bfd413581bcf │ value is uneven │ 1 │ 2025-04-10 09:53:29.78+02 │ ... ``` ---- #### `except_` ##### Signature ```python except_(self: duckdb.duckdb.DuckDBPyRelation, other_rel: duckdb.duckdb.DuckDBPyRelation) -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Create the set except of this relation object with another relation object in other_rel ##### Parameters - **other_rel** : duckdb.duckdb.DuckDBPyRelation The relation to subtract from the current relation (set difference). ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.except_(other_rel=rel.set_alias("other_rel")) ``` ##### Result ```text The relation query is executed twice, therefore generating different ids and timestamps: ┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐ │ id │ description │ value │ created_timestamp │ │ uuid │ varchar │ int64 │ timestamp with time zone │ ├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤ │ f69ed6dd-a7fe-4de2-b6af-1c2418096d69 │ value is uneven │ 3 │ 2025-04-10 11:43:05.711+02 │ │ 08ad11dc-a9c2-4aaa-9272-760b27ad1f5d │ value is uneven │ 7 │ 2025-04-10 11:47:05.711+02 │ ... ``` ---- #### `filter` ##### Signature ```python filter(self: duckdb.duckdb.DuckDBPyRelation, filter_expr: object) -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Filter the relation object by the filter in filter_expr ##### Parameters - **filter_expr** : str, Expression The filter expression to apply over the relation. 
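Besides a SQL string, `filter_expr` can also be an `Expression`, such as the `ColumnExpression` used in the [`update`](#update) example. A minimal sketch:

```python
import duckdb
from duckdb import ColumnExpression

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("from range(1, 10) tbl(value)")

# Equivalent to rel.filter("value = 2"), expressed as an Expression object.
rel.filter(ColumnExpression("value") == 2)
```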
##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.filter("value = 2") ``` ##### Result ```text ┌──────────────────────────────────────┬───────────────┬───────┬───────────────────────────┐ │ id │ description │ value │ created_timestamp │ │ uuid │ varchar │ int64 │ timestamp with time zone │ ├──────────────────────────────────────┼───────────────┼───────┼───────────────────────────┤ │ b0684ab7-fcbf-41c5-8e4a-a51bdde86926 │ value is even │ 2 │ 2025-04-10 09:54:29.78+02 │ └──────────────────────────────────────┴───────────────┴───────┴───────────────────────────┘ ``` ---- #### `insert` ##### Signature ```python insert(self: duckdb.duckdb.DuckDBPyRelation, values: object) -> None ``` ##### Description Inserts the given values into the relation ##### Parameters - **values** : object A tuple of values matching the relation column list, to be inserted. ##### Example ```python import duckdb from datetime import datetime from uuid import uuid4 duckdb_conn = duckdb.connect() duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ).to_table("code_example") rel = duckdb_conn.table("code_example") rel.insert( ( uuid4(), 'value is even', 10, datetime.now() ) ) rel.filter("value = 10") ``` ##### Result ```text ┌──────────────────────────────────────┬───────────────┬───────┬───────────────────────────────┐ │ id │ description │ value │ created_timestamp │ │ uuid │ varchar │ int64 │ timestamp with time zone │ ├──────────────────────────────────────┼───────────────┼───────┼───────────────────────────────┤ │ c6dfab87-fae6-4213-8f76-1b96a8d179f6 │ value is even │ 10 │ 2025-04-10 10:02:24.652218+02 │ └──────────────────────────────────────┴───────────────┴───────┴───────────────────────────────┘ ``` ---- #### `insert_into` ##### Signature ```python insert_into(self: duckdb.duckdb.DuckDBPyRelation, table_name: str) -> None ``` ##### Description Inserts the relation object into an existing table named table_name ##### Parameters - **table_name** : str The table name to insert the data into. The relation must respect the column order of the table. 
##### Example ```python import duckdb from datetime import datetime from uuid import uuid4 duckdb_conn = duckdb.connect() duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ).to_table("code_example") rel = duckdb_conn.values( [ uuid4(), 'value is even', 10, datetime.now() ] ) rel.insert_into("code_example") duckdb_conn.table("code_example").filter("value = 10") ``` ##### Result ```text ┌──────────────────────────────────────┬───────────────┬───────┬───────────────────────────────┐ │ id │ description │ value │ created_timestamp │ │ uuid │ varchar │ int64 │ timestamp with time zone │ ├──────────────────────────────────────┼───────────────┼───────┼───────────────────────────────┤ │ 271c5ddd-c1d5-4638-b5a0-d8c7dc9e8220 │ value is even │ 10 │ 2025-04-10 14:29:18.616379+02 │ └──────────────────────────────────────┴───────────────┴───────┴───────────────────────────────┘ ``` ---- #### `intersect` ##### Signature ```python intersect(self: duckdb.duckdb.DuckDBPyRelation, other_rel: duckdb.duckdb.DuckDBPyRelation) -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Create the set intersection of this relation object with another relation object in other_rel ##### Parameters - **other_rel** : duckdb.duckdb.DuckDBPyRelation The relation to intersect with the current relation (set intersection). ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.intersect(other_rel=rel.set_alias("other_rel")) ``` ##### Result ```text The relation query is executed once with `rel` and once with `other_rel`, therefore generating different ids and timestamps: ┌──────┬─────────────┬───────┬──────────────────────────┐ │ id │ description │ value │ created_timestamp │ │ uuid │ varchar │ int64 │ timestamp with time zone │ ├──────┴─────────────┴───────┴──────────────────────────┤ │ 0 rows │ └───────────────────────────────────────────────────────┘ ``` ---- #### `join` ##### Signature ```python join(self: duckdb.duckdb.DuckDBPyRelation, other_rel: duckdb.duckdb.DuckDBPyRelation, condition: object, how: str = 'inner') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Join the relation object with another relation object in other_rel using the join condition expression in join_condition. 
Types supported are 'inner', 'left', 'right', 'outer', 'semi' and 'anti'.

Depending on how the `condition` parameter is provided, the generated JOIN clause is:

- `USING`

  ```python
  import duckdb

  duckdb_conn = duckdb.connect()

  rel1 = duckdb_conn.sql("select range as id, concat('dummy 1', range) as text from range(1,10)")
  rel2 = duckdb_conn.sql("select range as id, concat('dummy 2', range) as text from range(5,7)")

  rel1.join(rel2, condition="id", how="inner").sql_query()
  ```

  with the following SQL:

  ```sql
  SELECT *
  FROM (
      SELECT "range" AS id, concat('dummy 1', "range") AS "text"
      FROM "range"(1, 10)
  ) AS unnamed_relation_41bc15e744037078
  INNER JOIN (
      SELECT "range" AS id, concat('dummy 2', "range") AS "text"
      FROM "range"(5, 7)
  ) AS unnamed_relation_307e245965aa2c2b
  USING (id)
  ```

- `ON`

  ```python
  import duckdb

  duckdb_conn = duckdb.connect()

  rel1 = duckdb_conn.sql("select range as id, concat('dummy 1', range) as text from range(1,10)")
  rel2 = duckdb_conn.sql("select range as id, concat('dummy 2', range) as text from range(5,7)")

  rel1.join(rel2, condition=f"{rel1.alias}.id = {rel2.alias}.id", how="inner").sql_query()
  ```

  with the following SQL:

  ```sql
  SELECT *
  FROM (
      SELECT "range" AS id, concat('dummy 1', "range") AS "text"
      FROM "range"(1, 10)
  ) AS unnamed_relation_41bc15e744037078
  INNER JOIN (
      SELECT "range" AS id, concat('dummy 2', "range") AS "text"
      FROM "range"(5, 7)
  ) AS unnamed_relation_307e245965aa2c2b
  ON ((unnamed_relation_41bc15e744037078.id = unnamed_relation_307e245965aa2c2b.id))
  ```

> `NATURAL`, `POSITIONAL` and `ASOF` joins are not provided by the relational API.
> `CROSS` joins are provided through the [cross method](#cross).

##### Parameters

- **other_rel** : duckdb.duckdb.DuckDBPyRelation
    The relation to join with the current relation.
- **condition** : object
    The join condition, typically a SQL expression or the name of a column present in both relations (for a `USING` join).
- **how** : str, default: 'inner'
    The type of join to perform: 'inner', 'left', 'right', 'outer', 'semi' and 'anti'.

##### Example

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
    select
        gen_random_uuid() as id,
        concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
        range as value,
        now() + concat(range,' ', 'minutes')::interval as created_timestamp
    from range(1, 10)
    """
)

rel = rel.set_alias("rel").join(
    other_rel=rel.set_alias("other_rel"),
    condition="rel.id = other_rel.id",
    how="left"
)

rel.count("*")
```

##### Result

```text
┌──────────────┐
│ count_star() │
│    int64     │
├──────────────┤
│            9 │
└──────────────┘
```

----

#### `limit`

##### Signature

```python
limit(self: duckdb.duckdb.DuckDBPyRelation, n: int, offset: int = 0) -> duckdb.duckdb.DuckDBPyRelation
```

##### Description

Only retrieve the first n rows from this relation object, starting at offset

##### Parameters

- **n** : int
    The maximum number of rows to return.
- **offset** : int, default: 0
    The number of rows to skip before starting to return rows.
##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.limit(1) ``` ##### Result ```text ┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐ │ id │ description │ value │ created_timestamp │ │ uuid │ varchar │ int64 │ timestamp with time zone │ ├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤ │ 4135597b-29e7-4cb9-a443-41f3d54f25df │ value is uneven │ 1 │ 2025-04-10 10:52:03.678+02 │ └──────────────────────────────────────┴─────────────────┴───────┴────────────────────────────┘ ``` ---- #### `map` ##### Signature ```python map(self: duckdb.duckdb.DuckDBPyRelation, map_function: Callable, *, schema: typing.Optional[object] = None) -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Calls the passed function on the relation ##### Parameters - **map_function** : Callable A Python function that takes a DataFrame and returns a transformed DataFrame. - **schema** : object, default: None Optional schema describing the structure of the output relation. ##### Example ```python import duckdb from pandas import DataFrame def multiply_by_2(df: DataFrame): df["id"] = df["id"] * 2 return df duckdb_conn = duckdb.connect() rel = duckdb_conn.sql("select range as id, 'dummy' as text from range(1,3)") rel.map(multiply_by_2, schema={"id": int, "text": str}) ``` ##### Result ```text ┌───────┬─────────┐ │ id │ text │ │ int64 │ varchar │ ├───────┼─────────┤ │ 2 │ dummy │ │ 4 │ dummy │ └───────┴─────────┘ ``` ---- #### `order` ##### Signature ```python order(self: duckdb.duckdb.DuckDBPyRelation, order_expr: str) -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Reorder the relation object by order_expr ##### Parameters - **order_expr** : str SQL expression defining the ordering of the result rows. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.order("value desc").limit(1, offset=4) ``` ##### Result ```text ┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐ │ id │ description │ value │ created_timestamp │ │ uuid │ varchar │ int64 │ timestamp with time zone │ ├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤ │ 55899131-e3d3-463c-a215-f65cb8aef3bf │ value is uneven │ 5 │ 2025-04-10 10:56:03.678+02 │ └──────────────────────────────────────┴─────────────────┴───────┴────────────────────────────┘ ``` ---- #### `project` ##### Signature ```python project(self: duckdb.duckdb.DuckDBPyRelation, *args, groups: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Project the relation object by the projection in project_expr **Aliases**: [`select`](#select) ##### Parameters - **groups** : str, default: '' Comma-separated list of columns to include in the `group by`. 
##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.project("description").limit(1) ``` ##### Result ```text ┌─────────────────┐ │ description │ │ varchar │ ├─────────────────┤ │ value is uneven │ └─────────────────┘ ``` ---- #### `select` ##### Signature ```python select(self: duckdb.duckdb.DuckDBPyRelation, *args, groups: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Project the relation object by the projection in project_expr **Aliases**: [`project`](#project) ##### Parameters - **groups** : str, default: '' Comma-separated list of columns to include in the `group by`. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.select("description").limit(1) ``` ##### Result ```text ┌─────────────────┐ │ description │ │ varchar │ ├─────────────────┤ │ value is uneven │ └─────────────────┘ ``` ---- #### `sort` ##### Signature ```python sort(self: duckdb.duckdb.DuckDBPyRelation, *args) -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Reorder the relation object by the provided expressions ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.sort("description") ``` ##### Result ```text ┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐ │ id │ description │ value │ created_timestamp │ │ uuid │ varchar │ int64 │ timestamp with time zone │ ├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤ │ 5e0dfa8c-de4d-4ccd-8cff-450dabb86bde │ value is even │ 6 │ 2025-04-10 16:52:15.605+02 │ │ 95f1ad48-facf-4a84-a971-0a4fecce68c7 │ value is even │ 2 │ 2025-04-10 16:48:15.605+02 │ ... ``` ---- #### `union` ##### Signature ```python union(self: duckdb.duckdb.DuckDBPyRelation, union_rel: duckdb.duckdb.DuckDBPyRelation) -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Create the set union of this relation object with another relation object in other_rel >The union is `union all`. In order to retrieve distinct values, apply [distinct](#distinct). ##### Parameters - **union_rel** : duckdb.duckdb.DuckDBPyRelation The relation to union with the current relation (set union). 
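Because `union` keeps duplicates, chaining [`distinct`](#distinct) afterwards yields set semantics. A minimal sketch on a deterministic relation (note that the example below uses `gen_random_uuid()`, so `distinct` would not remove its rows, as the two sides generate different ids):

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("from range(1, 4) tbl(id)")

# union all: the row count doubles from 3 to 6 ...
rel.union(union_rel=rel).count("*")

# ... while distinct removes the duplicated rows again, leaving 3.
rel.union(union_rel=rel).distinct().count("*")
```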
##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel = rel.union(union_rel=rel) rel.count("*") ``` ##### Result ```text ┌──────────────┐ │ count_star() │ │ int64 │ ├──────────────┤ │ 18 │ └──────────────┘ ``` ---- #### `update` ##### Signature ```python update(self: duckdb.duckdb.DuckDBPyRelation, set: object, *, condition: object = None) -> None ``` ##### Description Update the given relation with the provided expressions ##### Parameters - **set** : object Mapping of columns to new values for the update operation. - **condition** : object, default: None Optional condition to filter which rows to update. ##### Example ```python import duckdb from duckdb import ColumnExpression duckdb_conn = duckdb.connect() duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ).to_table("code_example") rel = duckdb_conn.table("code_example") rel.update(set={"description":None}, condition=ColumnExpression("value") == 1) # the update is executed on the table, but not reflected on the relation # the relation has to be recreated to retrieve the modified data rel = duckdb_conn.table("code_example") rel.show() ``` ##### Result ```text ┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐ │ id │ description │ value │ created_timestamp │ │ uuid │ varchar │ int64 │ timestamp with time zone │ ├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤ │ 66dcaa14-f4a6-4a55-af3b-7f6aa23ab4ad │ NULL │ 1 │ 2025-04-10 16:54:49.317+02 │ │ c6a18a42-67fb-4c95-827b-c966f2f95b88 │ value is even │ 2 │ 2025-04-10 16:55:49.317+02 │ ... ``` ## Functions This section contains the functions that can be applied to a relation in order to get a (scalar) result. The functions are [lazy evaluated](#lazy-evaluation).
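Most of the functions below share the same keyword parameters: `groups` adds a `group by` over the listed columns, `window_spec` evaluates the function as a window function, and `projected_columns` adds further columns to the result. A minimal sketch of the two calling styles, shown here with `sum`:

```python
import duckdb

duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
    select
        range as value,
        case when mod(range, 2) = 0 then 'even' else 'uneven' end as description
    from range(1, 10)
""")

# grouped aggregation: one row per description
rel.sum("value", groups="description", projected_columns="description").show()

# window function: one row per input row
rel.sum("value", window_spec="over (partition by description)", projected_columns="description, value").show()
```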
| Name | Description | |:--|:-------| | [`any_value`](#any_value) | Returns the first non-null value from a given column | | [`arg_max`](#arg_max) | Finds the row with the maximum value for a value column and returns the value of that row for an argument column | | [`arg_min`](#arg_min) | Finds the row with the minimum value for a value column and returns the value of that row for an argument column | | [`avg`](#avg) | Computes the average on a given column | | [`bit_and`](#bit_and) | Computes the bitwise AND of all bits present in a given column | | [`bit_or`](#bit_or) | Computes the bitwise OR of all bits present in a given column | | [`bit_xor`](#bit_xor) | Computes the bitwise XOR of all bits present in a given column | | [`bitstring_agg`](#bitstring_agg) | Computes a bitstring with bits set for each distinct value in a given column | | [`bool_and`](#bool_and) | Computes the logical AND of all values present in a given column | | [`bool_or`](#bool_or) | Computes the logical OR of all values present in a given column | | [`count`](#count) | Computes the number of elements present in a given column | | [`cume_dist`](#cume_dist) | Computes the cumulative distribution within the partition | | [`dense_rank`](#dense_rank) | Computes the dense rank within the partition | | [`distinct`](#distinct) | Retrieve distinct rows from this relation object | | [`favg`](#favg) | Computes the average of all values present in a given column using a more accurate floating point summation (Kahan Sum) | | [`first`](#first) | Returns the first value of a given column | | [`first_value`](#first_value) | Computes the first value within the group or partition | | [`fsum`](#fsum) | Computes the sum of all values present in a given column using a more accurate floating point summation (Kahan Sum) | | [`geomean`](#geomean) | Computes the geometric mean over all values present in a given column | | [`histogram`](#histogram) | Computes the histogram over all values present in a given column | | [`lag`](#lag) | Computes the lag within the partition | | [`last`](#last) | Returns the last value of a given column | | [`last_value`](#last_value) | Computes the last value within the group or partition | | [`lead`](#lead) | Computes the lead within the partition | | [`list`](#list) | Returns a list containing all values present in a given column | | [`max`](#max) | Returns the maximum value present in a given column | | [`mean`](#mean) | Computes the average on a given column | | [`median`](#median) | Computes the median over all values present in a given column | | [`min`](#min) | Returns the minimum value present in a given column | | [`mode`](#mode) | Computes the mode over all values present in a given column | | [`n_tile`](#n_tile) | Divides the partition as equally as possible into num_buckets | | [`nth_value`](#nth_value) | Computes the nth value within the partition | | [`percent_rank`](#percent_rank) | Computes the relative rank within the partition | | [`product`](#product) | Returns the product of all values present in a given column | | [`quantile`](#quantile) | Computes the exact quantile value for a given column | | [`quantile_cont`](#quantile_cont) | Computes the interpolated quantile value for a given column | | [`quantile_disc`](#quantile_disc) | Computes the exact quantile value for a given column | | [`rank`](#rank) | Computes the rank within the partition | | [`rank_dense`](#rank_dense) | Computes the dense rank within the partition | | [`row_number`](#row_number) | Computes the row number within the 
partition | | [`select_dtypes`](#select_dtypes) | Select columns from the relation, by filtering based on type(s) | | [`select_types`](#select_types) | Select columns from the relation, by filtering based on type(s) | | [`std`](#std) | Computes the sample standard deviation for a given column | | [`stddev`](#stddev) | Computes the sample standard deviation for a given column | | [`stddev_pop`](#stddev_pop) | Computes the population standard deviation for a given column | | [`stddev_samp`](#stddev_samp) | Computes the sample standard deviation for a given column | | [`string_agg`](#string_agg) | Concatenates the values present in a given column with a separator | | [`sum`](#sum) | Computes the sum of all values present in a given column | | [`unique`](#unique) | Returns the distinct values in a column. | | [`value_counts`](#value_counts) | Computes the number of elements present in a given column, also projecting the original column | | [`var`](#var) | Computes the sample variance for a given column | | [`var_pop`](#var_pop) | Computes the population variance for a given column | | [`var_samp`](#var_samp) | Computes the sample variance for a given column | | [`variance`](#variance) | Computes the sample variance for a given column | #### `any_value` ##### Signature ```python any_value(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Returns the first non-null value from a given column ##### Parameters - **column** : str The column name from which to retrieve any value. - **groups** : str, default: '' Comma-separated list of columns to include in the `group by`. - **window_spec** : str, default: '' Optional window specification for window functions, provided as `over (partition by ... order by ...)`. - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.any_value('id') ``` ##### Result ```text ┌──────────────────────────────────────┐ │ any_value(id) │ │ uuid │ ├──────────────────────────────────────┤ │ 642ea3d7-793d-4867-a759-91c1226c25a0 │ └──────────────────────────────────────┘ ``` ---- #### `arg_max` ##### Signature ```python arg_max(self: duckdb.duckdb.DuckDBPyRelation, arg_column: str, value_column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Finds the row with the maximum value for a value column and returns the value of that row for an argument column ##### Parameters - **arg_column** : str The column name for which to find the argument maximizing the value. - **value_column** : str The column name containing values used to determine the maximum. - **groups** : str, default: '' Comma-separated list of columns to include in the `group by`. - **window_spec** : str, default: '' Optional window specification for window functions, provided as `over (partition by ... order by ...)`. - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. 
##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.arg_max(arg_column="value", value_column="value", groups="description", projected_columns="description") ``` ##### Result ```text ┌─────────────────┬───────────────────────────┐ │ description │ arg_max("value", "value") │ │ varchar │ int64 │ ├─────────────────┼───────────────────────────┤ │ value is uneven │ 9 │ │ value is even │ 8 │ └─────────────────┴───────────────────────────┘ ``` ---- #### `arg_min` ##### Signature ```python arg_min(self: duckdb.duckdb.DuckDBPyRelation, arg_column: str, value_column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Finds the row with the minimum value for a value column and returns the value of that row for an argument column ##### Parameters - **arg_column** : str The column name for which to find the argument minimizing the value. - **value_column** : str The column name containing values used to determine the minimum. - **groups** : str, default: '' Comma-separated list of columns to include in the `group by`. - **window_spec** : str, default: '' Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.arg_min(arg_column="value", value_column="value", groups="description", projected_columns="description") ``` ##### Result ```text ┌─────────────────┬───────────────────────────┐ │ description │ arg_min("value", "value") │ │ varchar │ int64 │ ├─────────────────┼───────────────────────────┤ │ value is even │ 2 │ │ value is uneven │ 1 │ └─────────────────┴───────────────────────────┘ ``` ---- #### `avg` ##### Signature ```python avg(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Computes the average on a given column ##### Parameters - **column** : str The column name to calculate the average on. - **groups** : str, default: '' Comma-separated list of columns to include in the `group by`. - **window_spec** : str, default: '' Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. 
##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.avg('value') ``` ##### Result ```text ┌──────────────┐ │ avg("value") │ │ double │ ├──────────────┤ │ 5.0 │ └──────────────┘ ``` ---- #### `bit_and` ##### Signature ```python bit_and(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Computes the bitwise AND of all bits present in a given column ##### Parameters - **column** : str The column name to perform the bitwise AND aggregation on. - **groups** : str, default: '' Comma-separated list of columns to include in the `group by`. - **window_spec** : str, default: '' Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel = rel.select("description, value::bit as value_bit") rel.bit_and(column="value_bit", groups="description", projected_columns="description") ``` ##### Result ```text ┌─────────────────┬──────────────────────────────────────────────────────────────────┐ │ description │ bit_and(value_bit) │ │ varchar │ bit │ ├─────────────────┼──────────────────────────────────────────────────────────────────┤ │ value is uneven │ 0000000000000000000000000000000000000000000000000000000000000001 │ │ value is even │ 0000000000000000000000000000000000000000000000000000000000000000 │ └─────────────────┴──────────────────────────────────────────────────────────────────┘ ``` ---- #### `bit_or` ##### Signature ```python bit_or(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Computes the bitwise OR of all bits present in a given column ##### Parameters - **column** : str The column name to perform the bitwise OR aggregation on. - **groups** : str, default: '' Comma-separated list of columns to include in the `group by`. - **window_spec** : str, default: '' Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. 
##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel = rel.select("description, value::bit as value_bit") rel.bit_or(column="value_bit", groups="description", projected_columns="description") ``` ##### Result ```text ┌─────────────────┬──────────────────────────────────────────────────────────────────┐ │ description │ bit_or(value_bit) │ │ varchar │ bit │ ├─────────────────┼──────────────────────────────────────────────────────────────────┤ │ value is uneven │ 0000000000000000000000000000000000000000000000000000000000001111 │ │ value is even │ 0000000000000000000000000000000000000000000000000000000000001110 │ └─────────────────┴──────────────────────────────────────────────────────────────────┘ ``` ---- #### `bit_xor` ##### Signature ```python bit_xor(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Computes the bitwise XOR of all bits present in a given column ##### Parameters - **column** : str The column name to perform the bitwise XOR aggregation on. - **groups** : str, default: '' Comma-separated list of columns to include in the `group by`. - **window_spec** : str, default: '' Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel = rel.select("description, value::bit as value_bit") rel.bit_xor(column="value_bit", groups="description", projected_columns="description") ``` ##### Result ```text ┌─────────────────┬──────────────────────────────────────────────────────────────────┐ │ description │ bit_xor(value_bit) │ │ varchar │ bit │ ├─────────────────┼──────────────────────────────────────────────────────────────────┤ │ value is even │ 0000000000000000000000000000000000000000000000000000000000001000 │ │ value is uneven │ 0000000000000000000000000000000000000000000000000000000000001001 │ └─────────────────┴──────────────────────────────────────────────────────────────────┘ ``` ---- #### `bitstring_agg` ##### Signature ```python bitstring_agg(self: duckdb.duckdb.DuckDBPyRelation, column: str, min: typing.Optional[object] = None, max: typing.Optional[object] = None, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Computes a bitstring with bits set for each distinct value in a given column ##### Parameters - **column** : str The column name to aggregate as a bitstring. - **min** : object, default: None Optional minimum bitstring value for aggregation. - **max** : object, default: None Optional maximum bitstring value for aggregation. - **groups** : str, default: '' Comma-separated list of columns to include in the `group by`. 
- **window_spec** : str, default: '' Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.bitstring_agg(column="value", groups="description", projected_columns="description", min=1, max=9) ``` ##### Result ```text ┌─────────────────┬────────────────────────┐ │ description │ bitstring_agg("value") │ │ varchar │ bit │ ├─────────────────┼────────────────────────┤ │ value is uneven │ 101010101 │ │ value is even │ 010101010 │ └─────────────────┴────────────────────────┘ ``` ---- #### `bool_and` ##### Signature ```python bool_and(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Computes the logical AND of all values present in a given column ##### Parameters - **column** : str The column name to perform the boolean AND aggregation on. - **groups** : str, default: '' Comma-separated list of columns to include in the `group by`. - **window_spec** : str, default: '' Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel = rel.select("description, mod(value,2)::boolean as uneven") rel.bool_and(column="uneven", groups="description", projected_columns="description") ``` ##### Result ```text ┌─────────────────┬──────────────────┐ │ description │ bool_and(uneven) │ │ varchar │ boolean │ ├─────────────────┼──────────────────┤ │ value is even │ false │ │ value is uneven │ true │ └─────────────────┴──────────────────┘ ``` ---- #### `bool_or` ##### Signature ```python bool_or(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Computes the logical OR of all values present in a given column ##### Parameters - **column** : str The column name to perform the boolean OR aggregation on. - **groups** : str, default: '' Comma-separated list of columns to include in the `group by`. - **window_spec** : str, default: '' Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. 
##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel = rel.select("description, mod(value,2)::boolean as uneven") rel.bool_or(column="uneven", groups="description", projected_columns="description") ``` ##### Result ```text ┌─────────────────┬─────────────────┐ │ description │ bool_or(uneven) │ │ varchar │ boolean │ ├─────────────────┼─────────────────┤ │ value is even │ false │ │ value is uneven │ true │ └─────────────────┴─────────────────┘ ``` ---- #### `count` ##### Signature ```python count(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Computes the number of elements present in a given column ##### Parameters - **column** : str The column name to perform count on. - **groups** : str, default: '' Comma-separated list of columns to include in the `group by`. - **window_spec** : str, default: '' Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.count("id") ``` ##### Result ```text ┌───────────┐ │ count(id) │ │ int64 │ ├───────────┤ │ 9 │ └───────────┘ ``` ---- #### `cume_dist` ##### Signature ```python cume_dist(self: duckdb.duckdb.DuckDBPyRelation, window_spec: str, projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Computes the cumulative distribution within the partition ##### Parameters - **window_spec** : str Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. 
##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.cume_dist(window_spec="over (partition by description order by value)", projected_columns="description, value") ``` ##### Result ```text ┌─────────────────┬───────┬──────────────────────────────────────────────────────────────┐ │ description │ value │ cume_dist() OVER (PARTITION BY description ORDER BY "value") │ │ varchar │ int64 │ double │ ├─────────────────┼───────┼──────────────────────────────────────────────────────────────┤ │ value is uneven │ 1 │ 0.2 │ │ value is uneven │ 3 │ 0.4 │ │ value is uneven │ 5 │ 0.6 │ │ value is uneven │ 7 │ 0.8 │ │ value is uneven │ 9 │ 1.0 │ │ value is even │ 2 │ 0.25 │ │ value is even │ 4 │ 0.5 │ │ value is even │ 6 │ 0.75 │ │ value is even │ 8 │ 1.0 │ └─────────────────┴───────┴──────────────────────────────────────────────────────────────┘ ``` ---- #### `dense_rank` ##### Signature ```python dense_rank(self: duckdb.duckdb.DuckDBPyRelation, window_spec: str, projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Computes the dense rank within the partition **Aliases**: [`rank_dense`](#rank_dense) ##### Parameters - **window_spec** : str Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.dense_rank(window_spec="over (partition by description order by value)", projected_columns="description, value") ``` ##### Result ```text ┌─────────────────┬───────┬───────────────────────────────────────────────────────────────┐ │ description │ value │ dense_rank() OVER (PARTITION BY description ORDER BY "value") │ │ varchar │ int64 │ int64 │ ├─────────────────┼───────┼───────────────────────────────────────────────────────────────┤ │ value is even │ 2 │ 1 │ │ value is even │ 4 │ 2 │ │ value is even │ 6 │ 3 │ │ value is even │ 8 │ 4 │ │ value is uneven │ 1 │ 1 │ │ value is uneven │ 3 │ 2 │ │ value is uneven │ 5 │ 3 │ │ value is uneven │ 7 │ 4 │ │ value is uneven │ 9 │ 5 │ └─────────────────┴───────┴───────────────────────────────────────────────────────────────┘ ``` ---- #### `distinct` ##### Signature ```python distinct(self: duckdb.duckdb.DuckDBPyRelation) -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Retrieve distinct rows from this relation object ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql("select range from range(1,4)") rel = rel.union(union_rel=rel) rel.distinct().order("range") ``` ##### Result ```text ┌───────┐ │ range │ │ int64 │ ├───────┤ │ 1 │ │ 2 │ │ 3 │ └───────┘ ``` ---- #### `favg` ##### Signature ```python favg(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Computes the average of all values present in a given column using a more accurate 
floating point summation (Kahan Sum) ##### Parameters - **column** : str The column name to calculate the average on. - **groups** : str, default: '' Comma-separated list of columns to include in the `group by`. - **window_spec** : str, default: '' Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.favg(column="value", groups="description", projected_columns="description") ``` ##### Result ```text ┌─────────────────┬───────────────┐ │ description │ favg("value") │ │ varchar │ double │ ├─────────────────┼───────────────┤ │ value is uneven │ 5.0 │ │ value is even │ 5.0 │ └─────────────────┴───────────────┘ ``` ---- #### `first` ##### Signature ```python first(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Returns the first value of a given column ##### Parameters - **column** : str The column name from which to retrieve the first value. - **groups** : str, default: '' Comma-separated list of columns to include in the `group by`. - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.first(column="value", groups="description", projected_columns="description") ``` ##### Result ```text ┌─────────────────┬──────────────────┐ │ description │ "first"("value") │ │ varchar │ int64 │ ├─────────────────┼──────────────────┤ │ value is even │ 2 │ │ value is uneven │ 1 │ └─────────────────┴──────────────────┘ ``` ---- #### `first_value` ##### Signature ```python first_value(self: duckdb.duckdb.DuckDBPyRelation, column: str, window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Computes the first value within the group or partition ##### Parameters - **column** : str The column name from which to retrieve the first value. - **window_spec** : str, default: '' Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result.
##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.first_value(column="value", window_spec="over (partition by description order by value)", projected_columns="description").distinct() ``` ##### Result ```text ┌─────────────────┬───────────────────────────────────────────────────────────────────────┐ │ description │ first_value("value") OVER (PARTITION BY description ORDER BY "value") │ │ varchar │ int64 │ ├─────────────────┼───────────────────────────────────────────────────────────────────────┤ │ value is even │ 2 │ │ value is uneven │ 1 │ └─────────────────┴───────────────────────────────────────────────────────────────────────┘ ``` ---- #### `fsum` ##### Signature ```python fsum(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Computes the sum of all values present in a given column using a more accurate floating point summation (Kahan Sum) ##### Parameters - **column** : str The column name to calculate the sum on. - **groups** : str, default: '' Comma-separated list of columns to include in the `group by`. - **window_spec** : str, default: '' Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.fsum(column="value", groups="description", projected_columns="description") ``` ##### Result ```text ┌─────────────────┬───────────────┐ │ description │ fsum("value") │ │ varchar │ double │ ├─────────────────┼───────────────┤ │ value is even │ 20.0 │ │ value is uneven │ 25.0 │ └─────────────────┴───────────────┘ ``` ---- #### `geomean` ##### Signature ```python geomean(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Computes the geometric mean over all values present in a given column ##### Parameters - **column** : str The column name to calculate the geometric mean on. - **groups** : str, default: '' Comma-separated list of columns to include in the `group by`. - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. 
##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.geomean(column="value", groups="description", projected_columns="description") ``` ##### Result ```text ┌─────────────────┬───────────────────┐ │ description │ geomean("value") │ │ varchar │ double │ ├─────────────────┼───────────────────┤ │ value is uneven │ 3.936283427035351 │ │ value is even │ 4.426727678801287 │ └─────────────────┴───────────────────┘ ``` ---- #### `histogram` ##### Signature ```python histogram(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Computes the histogram over all values present in a given column ##### Parameters - **column** : str The column name to calculate the histogram on. - **window_spec** : str, default: '' Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.histogram(column="value", groups="description", projected_columns="description") ``` ##### Result ```text ┌─────────────────┬───────────────────────────┐ │ description │ histogram("value") │ │ varchar │ map(bigint, ubigint) │ ├─────────────────┼───────────────────────────┤ │ value is uneven │ {1=1, 3=1, 5=1, 7=1, 9=1} │ │ value is even │ {2=1, 4=1, 6=1, 8=1} │ └─────────────────┴───────────────────────────┘ ``` ---- #### `lag` ##### Signature ```python lag(self: duckdb.duckdb.DuckDBPyRelation, column: str, window_spec: str, offset: int = 1, default_value: str = 'NULL', ignore_nulls: bool = False, projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Computes the lag within the partition ##### Parameters - **column** : str The column name to apply the lag function on. - **window_spec** : str Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **offset** : int, default: 1 The number of rows to lag behind. - **default_value** : str, default: 'NULL' The default value to return when the lag offset goes out of bounds. - **ignore_nulls** : bool, default: False Whether to ignore NULL values when computing the lag. - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. 
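The example below uses the default `offset` of 1 and the default `NULL` fill value. Both can be overridden, as in this minimal sketch (assuming `default_value` is interpreted as a SQL expression, in line with its `'NULL'` default):

```python
import duckdb

duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("select range as value from range(1, 10)")

# look two rows back; rows without a predecessor two positions earlier get 0 instead of NULL
rel.lag(
    column="value",
    window_spec="over (order by value)",
    offset=2,
    default_value="0",
    projected_columns="value",
)
```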
##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.lag(column="description", window_spec="over (order by value)", projected_columns="description, value") ``` ##### Result ```text ┌─────────────────┬───────┬───────────────────────────────────────────────────┐ │ description │ value │ lag(description, 1, NULL) OVER (ORDER BY "value") │ │ varchar │ int64 │ varchar │ ├─────────────────┼───────┼───────────────────────────────────────────────────┤ │ value is uneven │ 1 │ NULL │ │ value is even │ 2 │ value is uneven │ │ value is uneven │ 3 │ value is even │ │ value is even │ 4 │ value is uneven │ │ value is uneven │ 5 │ value is even │ │ value is even │ 6 │ value is uneven │ │ value is uneven │ 7 │ value is even │ │ value is even │ 8 │ value is uneven │ │ value is uneven │ 9 │ value is even │ └─────────────────┴───────┴───────────────────────────────────────────────────┘ ``` ---- #### `last` ##### Signature ```python last(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Returns the last value of a given column ##### Parameters - **column** : str The column name from which to retrieve the last value. - **groups** : str, default: '' Comma-separated list of columns to include in the `group by`. - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.last(column="value", groups="description", projected_columns="description") ``` ##### Result ```text ┌─────────────────┬─────────────────┐ │ description │ "last"("value") │ │ varchar │ int64 │ ├─────────────────┼─────────────────┤ │ value is even │ 8 │ │ value is uneven │ 9 │ └─────────────────┴─────────────────┘ ``` ---- #### `last_value` ##### Signature ```python last_value(self: duckdb.duckdb.DuckDBPyRelation, column: str, window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Computes the last value within the group or partition ##### Parameters - **column** : str The column name from which to retrieve the last value within the window. - **window_spec** : str, default: '' Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. 
##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.last_value(column="value", window_spec="over (order by description)", projected_columns="description").distinct() ``` ##### Result ```text ┌─────────────────┬─────────────────────────────────────────────────┐ │ description │ last_value("value") OVER (ORDER BY description) │ │ varchar │ int64 │ ├─────────────────┼─────────────────────────────────────────────────┤ │ value is uneven │ 9 │ │ value is even │ 8 │ └─────────────────┴─────────────────────────────────────────────────┘ ``` ---- #### `lead` ##### Signature ```python lead(self: duckdb.duckdb.DuckDBPyRelation, column: str, window_spec: str, offset: int = 1, default_value: str = 'NULL', ignore_nulls: bool = False, projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Computes the lead within the partition ##### Parameters - **column** : str The column name to apply the lead function on. - **window_spec** : str Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **offset** : int, default: 1 The number of rows to lead ahead. - **default_value** : str, default: 'NULL' The default value to return when the lead offset goes out of bounds. - **ignore_nulls** : bool, default: False Whether to ignore NULL values when computing the lead. - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.lead(column="description", window_spec="over (order by value)", projected_columns="description, value") ``` ##### Result ```text ┌─────────────────┬───────┬────────────────────────────────────────────────────┐ │ description │ value │ lead(description, 1, NULL) OVER (ORDER BY "value") │ │ varchar │ int64 │ varchar │ ├─────────────────┼───────┼────────────────────────────────────────────────────┤ │ value is uneven │ 1 │ value is even │ │ value is even │ 2 │ value is uneven │ │ value is uneven │ 3 │ value is even │ │ value is even │ 4 │ value is uneven │ │ value is uneven │ 5 │ value is even │ │ value is even │ 6 │ value is uneven │ │ value is uneven │ 7 │ value is even │ │ value is even │ 8 │ value is uneven │ │ value is uneven │ 9 │ NULL │ └─────────────────┴───────┴────────────────────────────────────────────────────┘ ``` ---- #### `list` ##### Signature ```python list(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Returns a list containing all values present in a given column ##### Parameters - **column** : str The column name to aggregate values into a list. - **groups** : str, default: '' Comma-separated list of columns to include in the `group by`. - **window_spec** : str, default: '' Optional window specification for window functions, provided as `over (partition by ... 
order by ...)` - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.list(column="value", groups="description", projected_columns="description") ``` ##### Result ```text ┌─────────────────┬─────────────────┐ │ description │ list("value") │ │ varchar │ int64[] │ ├─────────────────┼─────────────────┤ │ value is even │ [2, 4, 6, 8] │ │ value is uneven │ [1, 3, 5, 7, 9] │ └─────────────────┴─────────────────┘ ``` ---- #### `max` ##### Signature ```python max(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Returns the maximum value present in a given column ##### Parameters - **column** : str The column name to calculate the maximum value of. - **groups** : str, default: '' Comma-separated list of columns to include in the `group by`. - **window_spec** : str, default: '' Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.max(column="value", groups="description", projected_columns="description") ``` ##### Result ```text ┌─────────────────┬──────────────┐ │ description │ max("value") │ │ varchar │ int64 │ ├─────────────────┼──────────────┤ │ value is even │ 8 │ │ value is uneven │ 9 │ └─────────────────┴──────────────┘ ``` ---- #### `mean` ##### Signature ```python mean(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Computes the average on a given column ##### Parameters - **column** : str The column name to calculate the mean value of. - **groups** : str, default: '' Comma-separated list of columns to include in the `group by`. - **window_spec** : str, default: '' Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. 
##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.mean(column="value", groups="description", projected_columns="description") ``` ##### Result ```text ┌─────────────────┬──────────────┐ │ description │ avg("value") │ │ varchar │ double │ ├─────────────────┼──────────────┤ │ value is even │ 5.0 │ │ value is uneven │ 5.0 │ └─────────────────┴──────────────┘ ``` ---- #### `median` ##### Signature ```python median(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Computes the median over all values present in a given column ##### Parameters - **column** : str The column name to calculate the median value of. - **groups** : str, default: '' Comma-separated list of columns to include in the `group by`. - **window_spec** : str, default: '' Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.median(column="value", groups="description", projected_columns="description") ``` ##### Result ```text ┌─────────────────┬─────────────────┐ │ description │ median("value") │ │ varchar │ double │ ├─────────────────┼─────────────────┤ │ value is even │ 5.0 │ │ value is uneven │ 5.0 │ └─────────────────┴─────────────────┘ ``` ---- #### `min` ##### Signature ```python min(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Returns the minimum value present in a given column ##### Parameters - **column** : str The column name to calculate the min value of. - **groups** : str, default: '' Comma-separated list of columns to include in the `group by`. - **window_spec** : str, default: '' Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. 
##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.min(column="value", groups="description", projected_columns="description") ``` ##### Result ```text ┌─────────────────┬──────────────┐ │ description │ min("value") │ │ varchar │ int64 │ ├─────────────────┼──────────────┤ │ value is uneven │ 1 │ │ value is even │ 2 │ └─────────────────┴──────────────┘ ``` ---- #### `mode` ##### Signature ```python mode(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Computes the mode over all values present in a given column ##### Parameters - **column** : str The column name to calculate the mode (most frequent value) of. - **groups** : str, default: '' Comma-separated list of columns to include in the `group by`. - **window_spec** : str, default: '' Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.mode(column="value", groups="description", projected_columns="description") ``` ##### Result ```text ┌─────────────────┬─────────────────┐ │ description │ "mode"("value") │ │ varchar │ int64 │ ├─────────────────┼─────────────────┤ │ value is uneven │ 1 │ │ value is even │ 2 │ └─────────────────┴─────────────────┘ ``` ---- #### `n_tile` ##### Signature ```python n_tile(self: duckdb.duckdb.DuckDBPyRelation, window_spec: str, num_buckets: int, projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Divides the partition as equally as possible into num_buckets ##### Parameters - **window_spec** : str Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **num_buckets** : int The number of buckets to divide the rows into. - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. 
##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.n_tile(window_spec="over (partition by description)", num_buckets=2, projected_columns="description, value") ``` ##### Result ```text ┌─────────────────┬───────┬──────────────────────────────────────────┐ │ description │ value │ ntile(2) OVER (PARTITION BY description) │ │ varchar │ int64 │ int64 │ ├─────────────────┼───────┼──────────────────────────────────────────┤ │ value is uneven │ 1 │ 1 │ │ value is uneven │ 3 │ 1 │ │ value is uneven │ 5 │ 1 │ │ value is uneven │ 7 │ 2 │ │ value is uneven │ 9 │ 2 │ │ value is even │ 2 │ 1 │ │ value is even │ 4 │ 1 │ │ value is even │ 6 │ 2 │ │ value is even │ 8 │ 2 │ └─────────────────┴───────┴──────────────────────────────────────────┘ ``` ---- #### `nth_value` ##### Signature ```python nth_value(self: duckdb.duckdb.DuckDBPyRelation, column: str, window_spec: str, offset: int, ignore_nulls: bool = False, projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Computes the nth value within the partition ##### Parameters - **column** : str The column name from which to retrieve the nth value within the window. - **window_spec** : str Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **offset** : int The position of the value to retrieve within the window (1-based index). - **ignore_nulls** : bool, default: False Whether to ignore NULL values when computing the nth value. - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.nth_value(column="value", window_spec="over (partition by description)", projected_columns="description", offset=1) ``` ##### Result ```text ┌─────────────────┬───────────────────────────────────────────────────────┐ │ description │ nth_value("value", 1) OVER (PARTITION BY description) │ │ varchar │ int64 │ ├─────────────────┼───────────────────────────────────────────────────────┤ │ value is even │ 2 │ │ value is even │ 2 │ │ value is even │ 2 │ │ value is even │ 2 │ │ value is uneven │ 1 │ │ value is uneven │ 1 │ │ value is uneven │ 1 │ │ value is uneven │ 1 │ │ value is uneven │ 1 │ └─────────────────┴───────────────────────────────────────────────────────┘ ``` ---- #### `percent_rank` ##### Signature ```python percent_rank(self: duckdb.duckdb.DuckDBPyRelation, window_spec: str, projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Computes the relative rank within the partition ##### Parameters - **window_spec** : str Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. 
##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.percent_rank(window_spec="over (partition by description order by value)", projected_columns="description, value") ``` ##### Result ```text ┌─────────────────┬───────┬─────────────────────────────────────────────────────────────────┐ │ description │ value │ percent_rank() OVER (PARTITION BY description ORDER BY "value") │ │ varchar │ int64 │ double │ ├─────────────────┼───────┼─────────────────────────────────────────────────────────────────┤ │ value is even │ 2 │ 0.0 │ │ value is even │ 4 │ 0.3333333333333333 │ │ value is even │ 6 │ 0.6666666666666666 │ │ value is even │ 8 │ 1.0 │ │ value is uneven │ 1 │ 0.0 │ │ value is uneven │ 3 │ 0.25 │ │ value is uneven │ 5 │ 0.5 │ │ value is uneven │ 7 │ 0.75 │ │ value is uneven │ 9 │ 1.0 │ └─────────────────┴───────┴─────────────────────────────────────────────────────────────────┘ ``` ---- #### `product` ##### Signature ```python product(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Returns the product of all values present in a given column ##### Parameters - **column** : str The column name to calculate the product of. - **groups** : str, default: '' Comma-separated list of columns to include in the `group by`. - **window_spec** : str, default: '' Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.product(column="value", groups="description", projected_columns="description") ``` ##### Result ```text ┌─────────────────┬──────────────────┐ │ description │ product("value") │ │ varchar │ double │ ├─────────────────┼──────────────────┤ │ value is uneven │ 945.0 │ │ value is even │ 384.0 │ └─────────────────┴──────────────────┘ ``` ---- #### `quantile` ##### Signature ```python quantile(self: duckdb.duckdb.DuckDBPyRelation, column: str, q: object = 0.5, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Computes the exact quantile value for a given column ##### Parameters - **column** : str The column name to compute the quantile for. - **q** : object, default: 0.5 The quantile value to compute (e.g., 0.5 for median). - **groups** : str, default: '' Comma-separated list of columns to include in the `group by`. - **window_spec** : str, default: '' Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. 
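The example below relies on the default `q = 0.5`; other quantiles can be requested explicitly, as in this minimal sketch:

```python
import duckdb

duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("select range as value from range(1, 10)")

# exact 25th percentile of the value column
rel.quantile("value", q=0.25)
```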
##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.quantile(column="value", groups="description", projected_columns="description") ``` ##### Result ```text ┌─────────────────┬──────────────────────────────────┐ │ description │ quantile_disc("value", 0.500000) │ │ varchar │ int64 │ ├─────────────────┼──────────────────────────────────┤ │ value is uneven │ 5 │ │ value is even │ 4 │ └─────────────────┴──────────────────────────────────┘ ``` ---- #### `quantile_cont` ##### Signature ```python quantile_cont(self: duckdb.duckdb.DuckDBPyRelation, column: str, q: object = 0.5, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Computes the interpolated quantile value for a given column ##### Parameters - **column** : str The column name to compute the continuous quantile for. - **q** : object, default: 0.5 The quantile value to compute (e.g., 0.5 for median). - **groups** : str, default: '' Comma-separated list of columns to include in the `group by`. - **window_spec** : str, default: '' Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.quantile_cont(column="value", groups="description", projected_columns="description") ``` ##### Result ```text ┌─────────────────┬──────────────────────────────────┐ │ description │ quantile_cont("value", 0.500000) │ │ varchar │ double │ ├─────────────────┼──────────────────────────────────┤ │ value is even │ 5.0 │ │ value is uneven │ 5.0 │ └─────────────────┴──────────────────────────────────┘ ``` ---- #### `quantile_disc` ##### Signature ```python quantile_disc(self: duckdb.duckdb.DuckDBPyRelation, column: str, q: object = 0.5, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Computes the exact quantile value for a given column ##### Parameters - **column** : str The column name to compute the discrete quantile for. - **q** : object, default: 0.5 The quantile value to compute (e.g., 0.5 for median). - **groups** : str, default: '' Comma-separated list of columns to include in the `group by`. - **window_spec** : str, default: '' Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. 
##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.quantile_disc(column="value", groups="description", projected_columns="description") ``` ##### Result ```text ┌─────────────────┬──────────────────────────────────┐ │ description │ quantile_disc("value", 0.500000) │ │ varchar │ int64 │ ├─────────────────┼──────────────────────────────────┤ │ value is even │ 4 │ │ value is uneven │ 5 │ └─────────────────┴──────────────────────────────────┘ ``` ---- #### `rank` ##### Signature ```python rank(self: duckdb.duckdb.DuckDBPyRelation, window_spec: str, projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Computes the rank within the partition ##### Parameters - **window_spec** : str Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.rank(window_spec="over (partition by description order by value)", projected_columns="description, value") ``` ##### Result ```text ┌─────────────────┬───────┬─────────────────────────────────────────────────────────┐ │ description │ value │ rank() OVER (PARTITION BY description ORDER BY "value") │ │ varchar │ int64 │ int64 │ ├─────────────────┼───────┼─────────────────────────────────────────────────────────┤ │ value is uneven │ 1 │ 1 │ │ value is uneven │ 3 │ 2 │ │ value is uneven │ 5 │ 3 │ │ value is uneven │ 7 │ 4 │ │ value is uneven │ 9 │ 5 │ │ value is even │ 2 │ 1 │ │ value is even │ 4 │ 2 │ │ value is even │ 6 │ 3 │ │ value is even │ 8 │ 4 │ └─────────────────┴───────┴─────────────────────────────────────────────────────────┘ ``` ---- #### `rank_dense` ##### Signature ```python rank_dense(self: duckdb.duckdb.DuckDBPyRelation, window_spec: str, projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Computes the dense rank within the partition **Aliases**: [`dense_rank`](#dense_rank) ##### Parameters - **window_spec** : str Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. 
##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.rank_dense(window_spec="over (partition by description order by value)", projected_columns="description, value") ``` ##### Result ```text ┌─────────────────┬───────┬───────────────────────────────────────────────────────────────┐ │ description │ value │ dense_rank() OVER (PARTITION BY description ORDER BY "value") │ │ varchar │ int64 │ int64 │ ├─────────────────┼───────┼───────────────────────────────────────────────────────────────┤ │ value is uneven │ 1 │ 1 │ │ value is uneven │ 3 │ 2 │ │ value is uneven │ 5 │ 3 │ │ value is uneven │ 7 │ 4 │ │ value is uneven │ 9 │ 5 │ │ value is even │ 2 │ 1 │ │ value is even │ 4 │ 2 │ │ value is even │ 6 │ 3 │ │ value is even │ 8 │ 4 │ └─────────────────┴───────┴───────────────────────────────────────────────────────────────┘ ``` ---- #### `row_number` ##### Signature ```python row_number(self: duckdb.duckdb.DuckDBPyRelation, window_spec: str, projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Computes the row number within the partition ##### Parameters - **window_spec** : str Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.row_number(window_spec="over (partition by description order by value)", projected_columns="description, value") ``` ##### Result ```text ┌─────────────────┬───────┬───────────────────────────────────────────────────────────────┐ │ description │ value │ row_number() OVER (PARTITION BY description ORDER BY "value") │ │ varchar │ int64 │ int64 │ ├─────────────────┼───────┼───────────────────────────────────────────────────────────────┤ │ value is uneven │ 1 │ 1 │ │ value is uneven │ 3 │ 2 │ │ value is uneven │ 5 │ 3 │ │ value is uneven │ 7 │ 4 │ │ value is uneven │ 9 │ 5 │ │ value is even │ 2 │ 1 │ │ value is even │ 4 │ 2 │ │ value is even │ 6 │ 3 │ │ value is even │ 8 │ 4 │ └─────────────────┴───────┴───────────────────────────────────────────────────────────────┘ ``` ---- #### `select_dtypes` ##### Signature ```python select_dtypes(self: duckdb.duckdb.DuckDBPyRelation, types: object) -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Select columns from the relation, by filtering based on type(s) **Aliases**: [`select_types`](#select_types) ##### Parameters - **types** : object Data type(s) to select columns by. Can be a single type or a collection of types. 
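Since `types` also accepts a single type rather than a collection, a minimal sketch (using a simplified version of the example relation below) that keeps only the BIGINT column looks as follows:

```python
import duckdb

duckdb_conn = duckdb.connect()

# simplified stand-in for the example relation used throughout this page
rel = duckdb_conn.sql("""
    select
        concat('value is ', case when mod(range, 2) = 0 then 'even' else 'uneven' end) as description,
        range as value
    from range(1, 10)
""")

# a single type is accepted as well; this keeps only the BIGINT column `value`
rel.select_dtypes(types=duckdb.typing.BIGINT)
```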
##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.select_dtypes(types=[duckdb.typing.VARCHAR]).distinct() ``` ##### Result ```text ┌─────────────────┐ │ description │ │ varchar │ ├─────────────────┤ │ value is even │ │ value is uneven │ └─────────────────┘ ``` ---- #### `select_types` ##### Signature ```python select_types(self: duckdb.duckdb.DuckDBPyRelation, types: object) -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Select columns from the relation, by filtering based on type(s) **Aliases**: [`select_dtypes`](#select_dtypes) ##### Parameters - **types** : object Data type(s) to select columns by. Can be a single type or a collection of types. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.select_types(types=[duckdb.typing.VARCHAR]).distinct() ``` ##### Result ```text ┌─────────────────┐ │ description │ │ varchar │ ├─────────────────┤ │ value is even │ │ value is uneven │ └─────────────────┘ ``` ---- #### `std` ##### Signature ```python std(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Computes the sample standard deviation for a given column **Aliases**: [`stddev`](#stddev), [`stddev_samp`](#stddev_samp) ##### Parameters - **column** : str The column name to calculate the standard deviation for. - **groups** : str, default: '' Comma-separated list of columns to include in the `group by`. - **window_spec** : str, default: '' Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.std(column="value", groups="description", projected_columns="description") ``` ##### Result ```text ┌─────────────────┬──────────────────────┐ │ description │ stddev_samp("value") │ │ varchar │ double │ ├─────────────────┼──────────────────────┤ │ value is uneven │ 3.1622776601683795 │ │ value is even │ 2.581988897471611 │ └─────────────────┴──────────────────────┘ ``` ---- #### `stddev` ##### Signature ```python stddev(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Computes the sample standard deviation for a given column **Aliases**: [`std`](#std), [`stddev_samp`](#stddev_samp) ##### Parameters - **column** : str The column name to calculate the standard deviation for. - **groups** : str, default: '' Comma-separated list of columns to include in the `group by`. 
- **window_spec** : str, default: '' Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.stddev(column="value", groups="description", projected_columns="description") ``` ##### Result ```text ┌─────────────────┬──────────────────────┐ │ description │ stddev_samp("value") │ │ varchar │ double │ ├─────────────────┼──────────────────────┤ │ value is even │ 2.581988897471611 │ │ value is uneven │ 3.1622776601683795 │ └─────────────────┴──────────────────────┘ ``` ---- #### `stddev_pop` ##### Signature ```python stddev_pop(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Computes the population standard deviation for a given column ##### Parameters - **column** : str The column name to calculate the standard deviation for. - **groups** : str, default: '' Comma-separated list of columns to include in the `group by`. - **window_spec** : str, default: '' Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.stddev_pop(column="value", groups="description", projected_columns="description") ``` ##### Result ```text ┌─────────────────┬─────────────────────┐ │ description │ stddev_pop("value") │ │ varchar │ double │ ├─────────────────┼─────────────────────┤ │ value is even │ 2.23606797749979 │ │ value is uneven │ 2.8284271247461903 │ └─────────────────┴─────────────────────┘ ``` ---- #### `stddev_samp` ##### Signature ```python stddev_samp(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Computes the sample standard deviation for a given column **Aliases**: [`stddev`](#stddev), [`std`](#std) ##### Parameters - **column** : str The column name to calculate the standard deviation for. - **groups** : str, default: '' Comma-separated list of columns to include in the `group by`. - **window_spec** : str, default: '' Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. 
##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.stddev_samp(column="value", groups="description", projected_columns="description") ``` ##### Result ```text ┌─────────────────┬──────────────────────┐ │ description │ stddev_samp("value") │ │ varchar │ double │ ├─────────────────┼──────────────────────┤ │ value is even │ 2.581988897471611 │ │ value is uneven │ 3.1622776601683795 │ └─────────────────┴──────────────────────┘ ``` ---- #### `string_agg` ##### Signature ```python string_agg(self: duckdb.duckdb.DuckDBPyRelation, column: str, sep: str = ',', groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Concatenates the values present in a given column with a separator ##### Parameters - **column** : str The column name to concatenate values from. - **sep** : str, default: ',' Separator string to use between concatenated values. - **groups** : str, default: '' Comma-separated list of columns to include in the `group by`. - **window_spec** : str, default: '' Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.string_agg(column="value", sep=",", groups="description", projected_columns="description") ``` ##### Result ```text ┌─────────────────┬──────────────────────────┐ │ description │ string_agg("value", ',') │ │ varchar │ varchar │ ├─────────────────┼──────────────────────────┤ │ value is even │ 2,4,6,8 │ │ value is uneven │ 1,3,5,7,9 │ └─────────────────┴──────────────────────────┘ ``` ---- #### `sum` ##### Signature ```python sum(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Computes the sum of all values present in a given column ##### Parameters - **column** : str The column name to calculate the sum for. - **groups** : str, default: '' Comma-separated list of columns to include in the `group by`. - **window_spec** : str, default: '' Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. 
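When `window_spec` is supplied instead of `groups`, the aggregate is expected to be evaluated as a window function over that specification. A minimal sketch (simplified version of the example relation below) computing a running sum of `value` within each `description` partition:

```python
import duckdb

duckdb_conn = duckdb.connect()

# simplified stand-in for the example relation used throughout this page
rel = duckdb_conn.sql("""
    select
        concat('value is ', case when mod(range, 2) = 0 then 'even' else 'uneven' end) as description,
        range as value
    from range(1, 10)
""")

# running sum per partition: sum("value") over (partition by description order by value)
rel.sum(
    column="value",
    window_spec="over (partition by description order by value)",
    projected_columns="description, value",
)
```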
##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.sum(column="value", groups="description", projected_columns="description") ``` ##### Result ```text ┌─────────────────┬──────────────┐ │ description │ sum("value") │ │ varchar │ int128 │ ├─────────────────┼──────────────┤ │ value is even │ 20 │ │ value is uneven │ 25 │ └─────────────────┴──────────────┘ ``` ---- #### `unique` ##### Signature ```python unique(self: duckdb.duckdb.DuckDBPyRelation, unique_aggr: str) -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Returns the distinct values in a column. ##### Parameters - **unique_aggr** : str The column to get the distinct values for. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.unique(unique_aggr="description") ``` ##### Result ```text ┌─────────────────┐ │ description │ │ varchar │ ├─────────────────┤ │ value is even │ │ value is uneven │ └─────────────────┘ ``` ---- #### `value_counts` ##### Signature ```python value_counts(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Computes the number of elements present in a given column, also projecting the original column ##### Parameters - **column** : str The column name to count values from. - **groups** : str, default: '' Comma-separated list of columns to include in the `group by`. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.value_counts(column="description", groups="description") ``` ##### Result ```text ┌─────────────────┬────────────────────┐ │ description │ count(description) │ │ varchar │ int64 │ ├─────────────────┼────────────────────┤ │ value is uneven │ 5 │ │ value is even │ 4 │ └─────────────────┴────────────────────┘ ``` ---- #### `var` ##### Signature ```python var(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Computes the sample variance for a given column **Aliases**: [`variance`](#variance), [`var_samp`](#var_samp) ##### Parameters - **column** : str The column name to calculate the sample variance for. - **groups** : str, default: '' Comma-separated list of columns to include in the `group by`. - **window_spec** : str, default: '' Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. 
##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.var(column="value", groups="description", projected_columns="description") ``` ##### Result ```text ┌─────────────────┬───────────────────┐ │ description │ var_samp("value") │ │ varchar │ double │ ├─────────────────┼───────────────────┤ │ value is even │ 6.666666666666667 │ │ value is uneven │ 10.0 │ └─────────────────┴───────────────────┘ ``` ---- #### `var_pop` ##### Signature ```python var_pop(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Computes the population variance for a given column ##### Parameters - **column** : str The column name to calculate the population variance for. - **groups** : str, default: '' Comma-separated list of columns to include in the `group by`. - **window_spec** : str, default: '' Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.var_pop(column="value", groups="description", projected_columns="description") ``` ##### Result ```text ┌─────────────────┬──────────────────┐ │ description │ var_pop("value") │ │ varchar │ double │ ├─────────────────┼──────────────────┤ │ value is even │ 5.0 │ │ value is uneven │ 8.0 │ └─────────────────┴──────────────────┘ ``` ---- #### `var_samp` ##### Signature ```python var_samp(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Computes the sample variance for a given column **Aliases**: [`variance`](#variance), [`var`](#var) ##### Parameters - **column** : str The column name to calculate the sample variance for. - **groups** : str, default: '' Comma-separated list of columns to include in the `group by`. - **window_spec** : str, default: '' Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. 
##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.var_samp(column="value", groups="description", projected_columns="description") ``` ##### Result ```text ┌─────────────────┬───────────────────┐ │ description │ var_samp("value") │ │ varchar │ double │ ├─────────────────┼───────────────────┤ │ value is even │ 6.666666666666667 │ │ value is uneven │ 10.0 │ └─────────────────┴───────────────────┘ ``` ---- #### `variance` ##### Signature ```python variance(self: duckdb.duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Computes the sample variance for a given column **Aliases**: [`var`](#var), [`var_samp`](#var_samp) ##### Parameters - **column** : str The column name to calculate the sample variance for. - **groups** : str, default: '' Comma-separated list of columns to include in the `group by`. - **window_spec** : str, default: '' Optional window specification for window functions, provided as `over (partition by ... order by ...)` - **projected_columns** : str, default: '' Comma-separated list of columns to include in the result. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.variance(column="value", groups="description", projected_columns="description") ``` ##### Result ```text ┌─────────────────┬───────────────────┐ │ description │ var_samp("value") │ │ varchar │ double │ ├─────────────────┼───────────────────┤ │ value is even │ 6.666666666666667 │ │ value is uneven │ 10.0 │ └─────────────────┴───────────────────┘ ``` ## Output This section contains the functions which will trigger an SQL execution and retrieve the data. 
| Name | Description | |:--|:-------| | [`arrow`](#arrow) | Execute and fetch all rows as an Arrow Table | | [`close`](#close) | Closes the result | | [`create`](#create) | Creates a new table named table_name with the contents of the relation object | | [`create_view`](#create_view) | Creates a view named view_name that refers to the relation object | | [`df`](#df) | Execute and fetch all rows as a pandas DataFrame | | [`execute`](#execute) | Transform the relation into a result set | | [`fetch_arrow_reader`](#fetch_arrow_reader) | Execute and return an Arrow Record Batch Reader that yields all rows | | [`fetch_arrow_table`](#fetch_arrow_table) | Execute and fetch all rows as an Arrow Table | | [`fetch_df_chunk`](#fetch_df_chunk) | Execute and fetch a chunk of the rows | | [`fetchall`](#fetchall) | Execute and fetch all rows as a list of tuples | | [`fetchdf`](#fetchdf) | Execute and fetch all rows as a pandas DataFrame | | [`fetchmany`](#fetchmany) | Execute and fetch the next set of rows as a list of tuples | | [`fetchnumpy`](#fetchnumpy) | Execute and fetch all rows as a Python dict mapping each column to a NumPy array | | [`fetchone`](#fetchone) | Execute and fetch a single row as a tuple | | [`pl`](#pl) | Execute and fetch all rows as a Polars DataFrame | | [`record_batch`](#record_batch) | Execute and return an Arrow Record Batch Reader that yields all rows | | [`tf`](#tf) | Fetch a result as dict of TensorFlow Tensors | | [`to_arrow_table`](#to_arrow_table) | Execute and fetch all rows as an Arrow Table | | [`to_csv`](#to_csv) | Write the relation object to a CSV file in 'file_name' | | [`to_df`](#to_df) | Execute and fetch all rows as a pandas DataFrame | | [`to_parquet`](#to_parquet) | Write the relation object to a Parquet file in 'file_name' | | [`to_table`](#to_table) | Creates a new table named table_name with the contents of the relation object | | [`to_view`](#to_view) | Creates a view named view_name that refers to the relation object | | [`torch`](#torch) | Fetch a result as dict of PyTorch Tensors | | [`write_csv`](#write_csv) | Write the relation object to a CSV file in 'file_name' | | [`write_parquet`](#write_parquet) | Write the relation object to a Parquet file in 'file_name' | #### `arrow` ##### Signature ```python arrow(self: duckdb.duckdb.DuckDBPyRelation, batch_size: int = 1000000) -> pyarrow.lib.Table ``` ##### Description Execute and fetch all rows as an Arrow Table **Aliases**: [`fetch_arrow_table`](#fetch_arrow_table), [`to_arrow_table`](#to_arrow_table) ##### Parameters - **batch_size** : int, default: 1000000 The batch size used when writing the data to the Arrow table ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) pa_table = rel.arrow() pa_table ``` ##### Result ```text pyarrow.Table id: string description: string value: int64 created_timestamp: timestamp[us, tz=Europe/Amsterdam] ---- id: [["3ac9e0ba-8390-4a02-ad72-33b1caea6354","8b844392-1404-4bbc-b731-120f42c8ca27","ca5584ca-8e97-4fca-a295-ae3c16c32f5b","926d071e-5f64-488f-ae02-d19e315f9f5c","aabeedf0-5783-4eff-9963-b3967a6ea5d8","1f20db9a-bee8-4b65-b7e8-e7c36b5b8fee","795c678e-3524-4b52-96ec-7b48c24eeab1","9ffbd403-169f-4fe4-bc41-09751066f1f1","8fdb0a60-29f0-4f5b-afcc-c736a03cd083"]] description: [["value is uneven","value is 
even","value is uneven","value is even","value is uneven","value is even","value is uneven","value is even","value is uneven"]] value: [[1,2,3,4,5,6,7,8,9]] created_timestamp: [[2025-04-10 09:07:12.614000Z,2025-04-10 09:08:12.614000Z,2025-04-10 09:09:12.614000Z,2025-04-10 09:10:12.614000Z,2025-04-10 09:11:12.614000Z,2025-04-10 09:12:12.614000Z,2025-04-10 09:13:12.614000Z,2025-04-10 09:14:12.614000Z,2025-04-10 09:15:12.614000Z]] ``` ---- #### `close` ##### Signature ```python close(self: duckdb.duckdb.DuckDBPyRelation) -> None ``` ##### Description Closes the result ---- #### `create` ##### Signature ```python create(self: duckdb.duckdb.DuckDBPyRelation, table_name: str) -> None ``` ##### Description Creates a new table named table_name with the contents of the relation object **Aliases**: [`to_table`](#to_table) ##### Parameters - **table_name** : str The name of the table to be created. There shouldn't be any other table with the same name. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.create("table_code_example") duckdb_conn.table("table_code_example").limit(1) ``` ##### Result ```text ┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐ │ id │ description │ value │ created_timestamp │ │ uuid │ varchar │ int64 │ timestamp with time zone │ ├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤ │ 3ac9e0ba-8390-4a02-ad72-33b1caea6354 │ value is uneven │ 1 │ 2025-04-10 11:07:12.614+02 │ └──────────────────────────────────────┴─────────────────┴───────┴────────────────────────────┘ ``` ---- #### `create_view` ##### Signature ```python create_view(self: duckdb.duckdb.DuckDBPyRelation, view_name: str, replace: bool = True) -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Creates a view named view_name that refers to the relation object **Aliases**: [`to_view`](#to_view) ##### Parameters - **view_name** : str The name of the view to be created. - **replace** : bool, default: True If the view should be created with `CREATE OR REPLACE`. When set to `False`, there shouldn't be another view with the same `view_name`. 
##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.create_view("view_code_example", replace=True) duckdb_conn.table("view_code_example").limit(1) ``` ##### Result ```text ┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐ │ id │ description │ value │ created_timestamp │ │ uuid │ varchar │ int64 │ timestamp with time zone │ ├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤ │ 3ac9e0ba-8390-4a02-ad72-33b1caea6354 │ value is uneven │ 1 │ 2025-04-10 11:07:12.614+02 │ └──────────────────────────────────────┴─────────────────┴───────┴────────────────────────────┘ ``` ---- #### `df` ##### Signature ```python df(self: duckdb.duckdb.DuckDBPyRelation, *, date_as_object: bool = False) -> pandas.DataFrame ``` ##### Description Execute and fetch all rows as a pandas DataFrame **Aliases**: [`fetchdf`](#fetchdf), [`to_df`](#to_df) ##### Parameters - **date_as_object** : bool, default: False If the date columns should be interpreted as Python date objects. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.df() ``` ##### Result ```text id description value created_timestamp 0 3ac9e0ba-8390-4a02-ad72-33b1caea6354 value is uneven 1 2025-04-10 11:07:12.614000+02:00 1 8b844392-1404-4bbc-b731-120f42c8ca27 value is even 2 2025-04-10 11:08:12.614000+02:00 2 ca5584ca-8e97-4fca-a295-ae3c16c32f5b value is uneven 3 2025-04-10 11:09:12.614000+02:00 ... ``` ---- #### `execute` ##### Signature ```python execute(self: duckdb.duckdb.DuckDBPyRelation) -> duckdb.duckdb.DuckDBPyRelation ``` ##### Description Transform the relation into a result set ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.execute() ``` ##### Result ```text ┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐ │ id │ description │ value │ created_timestamp │ │ uuid │ varchar │ int64 │ timestamp with time zone │ ├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤ │ 3ac9e0ba-8390-4a02-ad72-33b1caea6354 │ value is uneven │ 1 │ 2025-04-10 11:07:12.614+02 │ │ 8b844392-1404-4bbc-b731-120f42c8ca27 │ value is even │ 2 │ 2025-04-10 11:08:12.614+02 │ │ ca5584ca-8e97-4fca-a295-ae3c16c32f5b │ value is uneven │ 3 │ 2025-04-10 11:09:12.614+02 │ ``` ---- #### `fetch_arrow_reader` ##### Signature ```python fetch_arrow_reader(self: duckdb.duckdb.DuckDBPyRelation, batch_size: int = 1000000) -> pyarrow.lib.RecordBatchReader ``` ##### Description Execute and return an Arrow Record Batch Reader that yields all rows ##### Parameters - **batch_size** : int, default: 1000000 The batch size for fetching the data. 
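The returned `pyarrow.RecordBatchReader` can also be consumed as an iterator, streaming the result in batches of at most `batch_size` rows. A minimal sketch, assuming a simplified version of the example relation below:

```python
import duckdb

duckdb_conn = duckdb.connect()

# simplified stand-in for the example relation used throughout this page
rel = duckdb_conn.sql("""
    select
        concat('value is ', case when mod(range, 2) = 0 then 'even' else 'uneven' end) as description,
        range as value
    from range(1, 10)
""")

# each item yielded by the reader is a pyarrow.RecordBatch with at most 4 rows
for batch in rel.fetch_arrow_reader(batch_size=4):
    print(batch.num_rows)
```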
##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) pa_reader = rel.fetch_arrow_reader(batch_size=1) pa_reader.read_next_batch() ``` ##### Result ```text pyarrow.RecordBatch id: string description: string value: int64 created_timestamp: timestamp[us, tz=Europe/Amsterdam] ---- id: ["e4ab8cb4-4609-40cb-ad7e-4304ed5ed4bd"] description: ["value is even"] value: [2] created_timestamp: [2025-04-10 09:25:51.259000Z] ``` ---- #### `fetch_arrow_table` ##### Signature ```python fetch_arrow_table(self: duckdb.duckdb.DuckDBPyRelation, batch_size: int = 1000000) -> pyarrow.lib.Table ``` ##### Description Execute and fetch all rows as an Arrow Table **Aliases**: [`arrow`](#arrow), [`to_arrow_table`](#to_arrow_table) ##### Parameters - **batch_size** : int, default: 1000000 The batch size for fetching the data. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.fetch_arrow_table() ``` ##### Result ```text pyarrow.Table id: string description: string value: int64 created_timestamp: timestamp[us, tz=Europe/Amsterdam] ---- id: [["1587b4b0-3023-49fe-82cf-06303ca136ac","e4ab8cb4-4609-40cb-ad7e-4304ed5ed4bd","3f8ad67a-290f-4a22-b41b-0173b8e45afa","9a4e37ef-d8bd-46dd-ab01-51cf4973549f","12baa624-ebc9-45ae-b73e-6f4029e31d2d","56d41292-53cc-48be-a1b8-e1f5d6ca5581","1accca18-c950-47c1-9108-aef8afbd5249","56d8db75-72c4-4d40-90d2-a3c840579c37","e19f6201-8646-401c-b019-e37c42c39632"]] description: [["value is uneven","value is even","value is uneven","value is even","value is uneven","value is even","value is uneven","value is even","value is uneven"]] value: [[1,2,3,4,5,6,7,8,9]] created_timestamp: [[2025-04-10 09:24:51.259000Z,2025-04-10 09:25:51.259000Z,2025-04-10 09:26:51.259000Z,2025-04-10 09:27:51.259000Z,2025-04-10 09:28:51.259000Z,2025-04-10 09:29:51.259000Z,2025-04-10 09:30:51.259000Z,2025-04-10 09:31:51.259000Z,2025-04-10 09:32:51.259000Z]] ``` ---- #### `fetch_df_chunk` ##### Signature ```python fetch_df_chunk(self: duckdb.duckdb.DuckDBPyRelation, vectors_per_chunk: int = 1, *, date_as_object: bool = False) -> pandas.DataFrame ``` ##### Description Execute and fetch a chunk of the rows ##### Parameters - **vectors_per_chunk** : int, default: 1 Number of data chunks to be processed before converting to dataframe. - **date_as_object** : bool, default: False If the date columns should be interpreted as Python date objects. 
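A chunk consists of `vectors_per_chunk` DuckDB vectors, so its effect only becomes visible on relations larger than a single vector. A minimal sketch on a larger relation, assuming the default vector size of 2,048 rows and that repeated calls continue from where the previous call stopped:

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("from range(10_000)")

df_first = rel.fetch_df_chunk()                    # one vector: 2,048 rows (default vector size)
df_next = rel.fetch_df_chunk(vectors_per_chunk=2)  # next two vectors: 4,096 rows
print(len(df_first), len(df_next))
```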
##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.fetch_df_chunk() ``` ##### Result ```text id description value created_timestamp 0 1587b4b0-3023-49fe-82cf-06303ca136ac value is uneven 1 2025-04-10 11:24:51.259000+02:00 1 e4ab8cb4-4609-40cb-ad7e-4304ed5ed4bd value is even 2 2025-04-10 11:25:51.259000+02:00 2 3f8ad67a-290f-4a22-b41b-0173b8e45afa value is uneven 3 2025-04-10 11:26:51.259000+02:00 ... ``` ---- #### `fetchall` ##### Signature ```python fetchall(self: duckdb.duckdb.DuckDBPyRelation) -> list ``` ##### Description Execute and fetch all rows as a list of tuples ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.limit(1).fetchall() ``` ##### Result ```text [(UUID('1587b4b0-3023-49fe-82cf-06303ca136ac'), 'value is uneven', 1, datetime.datetime(2025, 4, 10, 11, 24, 51, 259000, tzinfo=))] ``` ---- #### `fetchdf` ##### Signature ```python fetchdf(self: duckdb.duckdb.DuckDBPyRelation, *, date_as_object: bool = False) -> pandas.DataFrame ``` ##### Description Execute and fetch all rows as a pandas DataFrame **Aliases**: [`df`](#df), [`to_df`](#to_df) ##### Parameters - **date_as_object** : bool, default: False If the date columns should be interpreted as Python date objects. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.fetchdf() ``` ##### Result ```text id description value created_timestamp 0 1587b4b0-3023-49fe-82cf-06303ca136ac value is uneven 1 2025-04-10 11:24:51.259000+02:00 1 e4ab8cb4-4609-40cb-ad7e-4304ed5ed4bd value is even 2 2025-04-10 11:25:51.259000+02:00 2 3f8ad67a-290f-4a22-b41b-0173b8e45afa value is uneven 3 2025-04-10 11:26:51.259000+02:00 ... ``` ---- #### `fetchmany` ##### Signature ```python fetchmany(self: duckdb.duckdb.DuckDBPyRelation, size: int = 1) -> list ``` ##### Description Execute and fetch the next set of rows as a list of tuples >Warning Executing any operation during the retrieval of the data from an [aggregate](#aggregate) relation, >will close the result set. >```python >import duckdb > >duckdb_conn = duckdb.connect() > >rel = duckdb_conn.sql(""" > select > gen_random_uuid() as id, > concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, > range as value, > now() + concat(range,' ', 'minutes')::interval as created_timestamp > from range(1, 10) > """ >) > >agg_rel = rel.aggregate("value") > >while res := agg_rel.fetchmany(size=1): > print(res) > rel.show() >``` ##### Parameters - **size** : int, default: 1 The number of records to be fetched. 
##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) while res := rel.fetchmany(size=1): print(res) ``` ##### Result ```text [(UUID('cf4c5e32-d0aa-4699-a3ee-0092e900f263'), 'value is uneven', 1, datetime.datetime(2025, 4, 30, 16, 23, 5, 310000, tzinfo=))] [(UUID('cec335ac-24ac-49a3-ae9a-bb35f71fc88d'), 'value is even', 2, datetime.datetime(2025, 4, 30, 16, 24, 5, 310000, tzinfo=))] [(UUID('2423295d-9bb0-453c-a385-21bdacba03b6'), 'value is uneven', 3, datetime.datetime(2025, 4, 30, 16, 25, 5, 310000, tzinfo=))] [(UUID('88806b21-192d-41e7-a293-c789aad636ba'), 'value is even', 4, datetime.datetime(2025, 4, 30, 16, 26, 5, 310000, tzinfo=))] [(UUID('05837a28-dacf-4121-88a6-a374aefb8a07'), 'value is uneven', 5, datetime.datetime(2025, 4, 30, 16, 27, 5, 310000, tzinfo=))] [(UUID('b9c1f7e9-6156-4554-b80e-67d3b5d810bb'), 'value is even', 6, datetime.datetime(2025, 4, 30, 16, 28, 5, 310000, tzinfo=))] [(UUID('4709c7fa-d286-4864-bb48-69748b447157'), 'value is uneven', 7, datetime.datetime(2025, 4, 30, 16, 29, 5, 310000, tzinfo=))] [(UUID('30e48457-b103-4fa5-95cf-1c7f0143335b'), 'value is even', 8, datetime.datetime(2025, 4, 30, 16, 30, 5, 310000, tzinfo=))] [(UUID('036b7f4b-bd78-4ffb-a351-964d93f267b7'), 'value is uneven', 9, datetime.datetime(2025, 4, 30, 16, 31, 5, 310000, tzinfo=))] ``` ---- #### `fetchnumpy` ##### Signature ```python fetchnumpy(self: duckdb.duckdb.DuckDBPyRelation) -> dict ``` ##### Description Execute and fetch all rows as a Python dict mapping each column to one numpy arrays ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.fetchnumpy() ``` ##### Result ```text {'id': array([UUID('1587b4b0-3023-49fe-82cf-06303ca136ac'), UUID('e4ab8cb4-4609-40cb-ad7e-4304ed5ed4bd'), UUID('3f8ad67a-290f-4a22-b41b-0173b8e45afa'), UUID('9a4e37ef-d8bd-46dd-ab01-51cf4973549f'), UUID('12baa624-ebc9-45ae-b73e-6f4029e31d2d'), UUID('56d41292-53cc-48be-a1b8-e1f5d6ca5581'), UUID('1accca18-c950-47c1-9108-aef8afbd5249'), UUID('56d8db75-72c4-4d40-90d2-a3c840579c37'), UUID('e19f6201-8646-401c-b019-e37c42c39632')], dtype=object), 'description': array(['value is uneven', 'value is even', 'value is uneven', 'value is even', 'value is uneven', 'value is even', 'value is uneven', 'value is even', 'value is uneven'], dtype=object), 'value': array([1, 2, 3, 4, 5, 6, 7, 8, 9]), 'created_timestamp': array(['2025-04-10T09:24:51.259000', '2025-04-10T09:25:51.259000', '2025-04-10T09:26:51.259000', '2025-04-10T09:27:51.259000', '2025-04-10T09:28:51.259000', '2025-04-10T09:29:51.259000', '2025-04-10T09:30:51.259000', '2025-04-10T09:31:51.259000', '2025-04-10T09:32:51.259000'], dtype='datetime64[us]')} ``` ---- #### `fetchone` ##### Signature ```python fetchone(self: duckdb.duckdb.DuckDBPyRelation) -> typing.Optional[tuple] ``` ##### Description Execute and fetch a single row as a tuple >Warning Executing any operation during the retrieval of the data from an [aggregate](#aggregate) relation, >will close the result set. 
>```python >import duckdb > >duckdb_conn = duckdb.connect() > >rel = duckdb_conn.sql(""" > select > gen_random_uuid() as id, > concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, > range as value, > now() + concat(range,' ', 'minutes')::interval as created_timestamp > from range(1, 10) > """ >) > >agg_rel = rel.aggregate("value") > >while res := agg_rel.fetchone(): > print(res) > rel.show() >``` ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) while res := rel.fetchone(): print(res) ``` ##### Result ```text (UUID('fe036411-f4c7-4f52-9ddd-80cd2bb56613'), 'value is uneven', 1, datetime.datetime(2025, 4, 30, 12, 59, 8, 912000, tzinfo=)) (UUID('466c9b43-e9f0-4237-8f26-155f259a5b59'), 'value is even', 2, datetime.datetime(2025, 4, 30, 13, 0, 8, 912000, tzinfo=)) (UUID('5755cf16-a94f-41ef-a16d-21e856d71f9f'), 'value is uneven', 3, datetime.datetime(2025, 4, 30, 13, 1, 8, 912000, tzinfo=)) (UUID('05b52c93-bd68-45e1-b02a-a08d682c33d5'), 'value is even', 4, datetime.datetime(2025, 4, 30, 13, 2, 8, 912000, tzinfo=)) (UUID('cf61ef13-2840-4541-900d-f493767d7622'), 'value is uneven', 5, datetime.datetime(2025, 4, 30, 13, 3, 8, 912000, tzinfo=)) (UUID('033e7c68-e800-4ee8-9787-6cf50aabc27b'), 'value is even', 6, datetime.datetime(2025, 4, 30, 13, 4, 8, 912000, tzinfo=)) (UUID('8b8d6545-ff54-45d6-b69a-97edb63dfe43'), 'value is uneven', 7, datetime.datetime(2025, 4, 30, 13, 5, 8, 912000, tzinfo=)) (UUID('7da79dfe-b29c-462b-a414-9d5e3cc80139'), 'value is even', 8, datetime.datetime(2025, 4, 30, 13, 6, 8, 912000, tzinfo=)) (UUID('f83ffff2-33b9-4f86-9d14-46974b546bab'), 'value is uneven', 9, datetime.datetime(2025, 4, 30, 13, 7, 8, 912000, tzinfo=)) ``` ---- #### `pl` ##### Signature ```python pl(self: duckdb.duckdb.DuckDBPyRelation, batch_size: int = 1000000) -> duckdb::PolarsDataFrame ``` ##### Description Execute and fetch all rows as a Polars DataFrame ##### Parameters - **batch_size** : int, default: 1000000 The number of records to be fetched per batch. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.pl(batch_size=1) ``` ##### Result ```text shape: (9, 4) ┌─────────────────────────────────┬─────────────────┬───────┬────────────────────────────────┐ │ id ┆ description ┆ value ┆ created_timestamp │ │ --- ┆ --- ┆ --- ┆ --- │ │ str ┆ str ┆ i64 ┆ datetime[μs, Europe/Amsterdam] │ ╞═════════════════════════════════╪═════════════════╪═══════╪════════════════════════════════╡ │ b2f92c3c-9372-49f3-897f-2c86fc… ┆ value is uneven ┆ 1 ┆ 2025-04-10 11:49:51.886 CEST │ ``` ---- #### `record_batch` ##### Signature ```python record_batch(self: duckdb.duckdb.DuckDBPyRelation, batch_size: int = 1000000) -> pyarrow.lib.RecordBatchReader ``` ##### Description Execute and return an Arrow Record Batch Reader that yields all rows ##### Parameters - **batch_size** : int, default: 1000000 The batch size for fetching the data. 
##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) pa_batch = rel.record_batch(batch_size=1) pa_batch.read_next_batch() ``` ##### Result ```text pyarrow.RecordBatch id: string description: string value: int64 created_timestamp: timestamp[us, tz=Europe/Amsterdam] ---- id: ["908cf67c-a086-4b94-9017-2089a83e4a6c"] description: ["value is uneven"] value: [1] created_timestamp: [2025-04-10 09:52:55.249000Z] ``` ---- #### `tf` ##### Signature ```python tf(self: duckdb.duckdb.DuckDBPyRelation) -> dict ``` ##### Description Fetch a result as dict of TensorFlow Tensors ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.select("description, value").tf() ``` ##### Result ```text {'description': , 'value': } ``` ---- #### `to_arrow_table` ##### Signature ```python to_arrow_table(self: duckdb.duckdb.DuckDBPyRelation, batch_size: int = 1000000) -> pyarrow.lib.Table ``` ##### Description Execute and fetch all rows as an Arrow Table **Aliases**: [`fetch_arrow_table`](#fetch_arrow_table), [`arrow`](#arrow) ##### Parameters - **batch_size** : int, default: 1000000 The batch size for fetching the data. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.to_arrow_table() ``` ##### Result ```text pyarrow.Table id: string description: string value: int64 created_timestamp: timestamp[us, tz=Europe/Amsterdam] ---- id: [["86b2011d-3818-426f-a41e-7cd5c7321f79","07fa4f89-0bba-4049-9acd-c933332a66d5","f2f1479e-f582-4fe4-b82f-9b753b69634c","529d3c63-5961-4adb-b0a8-8249188fc82a","aa9eea7d-7fac-4dcf-8f32-4a0b5d64f864","4852aa32-03f2-40d3-8006-b8213904775a","c0127203-f2e3-4925-9810-655bc02a3c19","2a1356ba-5707-44d6-a492-abd0a67e5efb","800a1c24-231c-4dae-bd68-627654c8a110"]] description: [["value is uneven","value is even","value is uneven","value is even","value is uneven","value is even","value is uneven","value is even","value is uneven"]] value: [[1,2,3,4,5,6,7,8,9]] created_timestamp: [[2025-04-10 09:54:24.015000Z,2025-04-10 09:55:24.015000Z,2025-04-10 09:56:24.015000Z,2025-04-10 09:57:24.015000Z,2025-04-10 09:58:24.015000Z,2025-04-10 09:59:24.015000Z,2025-04-10 10:00:24.015000Z,2025-04-10 10:01:24.015000Z,2025-04-10 10:02:24.015000Z]] ``` ---- #### `to_csv` ##### Signature ```python to_csv(self: duckdb.duckdb.DuckDBPyRelation, file_name: str, *, sep: object = None, na_rep: object = None, header: object = None, quotechar: object = None, escapechar: object = None, date_format: object = None, timestamp_format: object = None, quoting: object = None, encoding: object = None, compression: object = None, overwrite: object = None, per_thread_output: object = None, use_tmp_file: object = None, partition_by: object = None, write_partition_columns: object = None) -> None ``` ##### Description Write the 
relation object to a CSV file in 'file_name' **Aliases**: [`write_csv`](#write_csv) ##### Parameters - **file_name** : str The name of the output CSV file. - **sep** : str, default: ',' Field delimiter for the output file. - **na_rep** : str, default: '' Missing data representation. - **header** : bool, default: True Whether to write column headers. - **quotechar** : str, default: '"' Character used to quote fields containing special characters. - **escapechar** : str, default: None Character used to escape the delimiter if quoting is set to QUOTE_NONE. - **date_format** : str, default: None Custom format string for DATE values. - **timestamp_format** : str, default: None Custom format string for TIMESTAMP values. - **quoting** : int, default: csv.QUOTE_MINIMAL Control field quoting behavior (e.g., QUOTE_MINIMAL, QUOTE_ALL). - **encoding** : str, default: 'utf-8' Character encoding for the output file. - **compression** : str, default: auto Compression type (e.g., 'gzip', 'bz2', 'zstd'). - **overwrite** : bool, default: False When true, all existing files inside targeted directories will be removed (not supported on remote filesystems). Only has an effect when used with `partition_by`. - **per_thread_output** : bool, default: False When `true`, write one file per thread, rather than one file in total. This allows for faster parallel writing. - **use_tmp_file** : bool, default: False Write to a temporary file before renaming to final name to avoid partial writes. - **partition_by** : list[str], default: None List of column names to partition output by (creates folder structure). - **write_partition_columns** : bool, default: False Whether or not to write partition columns into files. Only has an effect when used with `partition_by`. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.to_csv("code_example.csv") ``` ##### Result ```text The data is exported to a CSV file, named code_example.csv ``` ---- #### `to_df` ##### Signature ```python to_df(self: duckdb.duckdb.DuckDBPyRelation, *, date_as_object: bool = False) -> pandas.DataFrame ``` ##### Description Execute and fetch all rows as a pandas DataFrame **Aliases**: [`fetchdf`](#fetchdf), [`df`](#df) ##### Parameters - **date_as_object** : bool, default: False If the date columns should be interpreted as Python date objects. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.to_df() ``` ##### Result ```text id description value created_timestamp 0 e1f79925-60fd-4ee2-ae67-5eff6b0543d1 value is uneven 1 2025-04-10 11:56:04.452000+02:00 1 caa619d4-d79c-4c00-b82e-9319b086b6f8 value is even 2 2025-04-10 11:57:04.452000+02:00 2 64c68032-99b9-4e8f-b4a3-6c522d5419b3 value is uneven 3 2025-04-10 11:58:04.452000+02:00 ... 
``` ---- #### `to_parquet` ##### Signature ```python to_parquet(self: duckdb.duckdb.DuckDBPyRelation, file_name: str, *, compression: object = None, field_ids: object = None, row_group_size_bytes: object = None, row_group_size: object = None, overwrite: object = None, per_thread_output: object = None, use_tmp_file: object = None, partition_by: object = None, write_partition_columns: object = None, append: object = None) -> None ``` ##### Description Write the relation object to a Parquet file in 'file_name' **Aliases**: [`write_parquet`](#write_parquet) ##### Parameters - **file_name** : str The name of the output Parquet file. - **compression** : str, default: 'snappy' The compression format to use (`uncompressed`, `snappy`, `gzip`, `zstd`, `brotli`, `lz4`, `lz4_raw`). - **field_ids** : STRUCT The field_id for each column. Pass auto to attempt to infer automatically. - **row_group_size_bytes** : int, default: row_group_size * 1024 The target size of each row group. You can pass either a human-readable string, e.g., 2MB, or an integer, i.e., the number of bytes. This option is only used when you have issued `SET preserve_insertion_order = false;`, otherwise, it is ignored. - **row_group_size** : int, default: 122880 The target size, i.e., number of rows, of each row group. - **overwrite** : bool, default: False If True, overwrite the file if it exists. - **per_thread_output** : bool, default: False When `True`, write one file per thread, rather than one file in total. This allows for faster parallel writing. - **use_tmp_file** : bool, default: False Write to a temporary file before renaming to final name to avoid partial writes. - **partition_by** : list[str], default: None List of column names to partition output by (creates folder structure). - **write_partition_columns** : bool, default: False Whether or not to write partition columns into files. Only has an effect when used with `partition_by`. - **append** : bool, default: False When `True`, in the event a filename pattern is generated that already exists, the path will be regenerated to ensure no existing files are overwritten. Only has an effect when used with `partition_by`. ##### Example ```python import duckdb duckdb_conn = duckdb.connect() rel = duckdb_conn.sql(""" select gen_random_uuid() as id, concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description, range as value, now() + concat(range,' ', 'minutes')::interval as created_timestamp from range(1, 10) """ ) rel.to_parquet("code_example.parquet") ``` ##### Result ```text The data is exported to a Parquet file, named code_example.parquet ``` ---- #### `to_table` ##### Signature ```python to_table(self: duckdb.duckdb.DuckDBPyRelation, table_name: str) -> None ``` ##### Description Creates a new table named table_name with the contents of the relation object **Aliases**: [`create`](#create) ##### Parameters - **table_name** : str The name of the table to be created. There shouldn't be any other table with the same name. 
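Because `table_name` must not clash with an existing table, calling the method twice with the same name fails. A minimal sketch, assuming the error is raised as `duckdb.CatalogException`:

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("select range as value from range(1, 10)")

rel.to_table("table_code_example")

try:
    rel.to_table("table_code_example")  # a table with this name already exists
except duckdb.CatalogException as e:
    print(e)
```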
##### Example

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
    select
        gen_random_uuid() as id,
        concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
        range as value,
        now() + concat(range,' ', 'minutes')::interval as created_timestamp
    from range(1, 10)
    """
)

rel.to_table("table_code_example")
```

##### Result

```text
A table named table_code_example is created with the data of the relation.
```

----

#### `to_view`

##### Signature

```python
to_view(self: duckdb.duckdb.DuckDBPyRelation, view_name: str, replace: bool = True) -> duckdb.duckdb.DuckDBPyRelation
```

##### Description

Creates a view named view_name that refers to the relation object

**Aliases**: [`create_view`](#create_view)

##### Parameters

- **view_name** : str
    The name of the view to be created.
- **replace** : bool, default: True
    Whether the view should be created with `CREATE OR REPLACE`. When set to `False`, a view with the same `view_name` must not already exist.

##### Example

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
    select
        gen_random_uuid() as id,
        concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
        range as value,
        now() + concat(range,' ', 'minutes')::interval as created_timestamp
    from range(1, 10)
    """
)

rel.to_view("view_code_example", replace=True)
```

##### Result

```text
A view named view_code_example is created with the query definition of the relation.
```

----

#### `torch`

##### Signature

```python
torch(self: duckdb.duckdb.DuckDBPyRelation) -> dict
```

##### Description

Fetch the result as a dict of PyTorch Tensors

##### Example

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
    select
        gen_random_uuid() as id,
        concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
        range as value,
        now() + concat(range,' ', 'minutes')::interval as created_timestamp
    from range(1, 10)
    """
)

rel.select("value").torch()
```

##### Result

```text
{'value': tensor([1, 2, 3, 4, 5, 6, 7, 8, 9])}
```

----

#### `write_csv`

##### Signature

```python
write_csv(self: duckdb.duckdb.DuckDBPyRelation, file_name: str, *, sep: object = None, na_rep: object = None, header: object = None, quotechar: object = None, escapechar: object = None, date_format: object = None, timestamp_format: object = None, quoting: object = None, encoding: object = None, compression: object = None, overwrite: object = None, per_thread_output: object = None, use_tmp_file: object = None, partition_by: object = None, write_partition_columns: object = None) -> None
```

##### Description

Write the relation object to a CSV file in 'file_name'

**Aliases**: [`to_csv`](#to_csv)

##### Parameters

- **file_name** : str
    The name of the output CSV file.
- **sep** : str, default: ','
    Field delimiter for the output file.
- **na_rep** : str, default: ''
    Missing data representation.
- **header** : bool, default: True
    Whether to write column headers.
- **quotechar** : str, default: '"'
    Character used to quote fields containing special characters.
- **escapechar** : str, default: None
    Character used to escape the delimiter if quoting is set to QUOTE_NONE.
- **date_format** : str, default: None
    Custom format string for DATE values.
- **timestamp_format** : str, default: None
    Custom format string for TIMESTAMP values.
- **quoting** : int, default: csv.QUOTE_MINIMAL
    Controls field quoting behavior (e.g., QUOTE_MINIMAL, QUOTE_ALL).
- **encoding** : str, default: 'utf-8'
    Character encoding for the output file.
- **compression** : str, default: auto
    Compression type (e.g., 'gzip', 'bz2', 'zstd').
- **overwrite** : bool, default: False
    When `True`, all existing files inside the targeted directories are removed (not supported on remote filesystems). Only has an effect when used with `partition_by`.
- **per_thread_output** : bool, default: False
    When `True`, write one file per thread, rather than one file in total. This allows for faster parallel writing.
- **use_tmp_file** : bool, default: False
    Write to a temporary file before renaming it to the final name, to avoid partial writes.
- **partition_by** : list[str], default: None
    List of column names to partition the output by (creates a folder structure).
- **write_partition_columns** : bool, default: False
    Whether to write partition columns into the files. Only has an effect when used with `partition_by`.

##### Example

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
    select
        gen_random_uuid() as id,
        concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
        range as value,
        now() + concat(range,' ', 'minutes')::interval as created_timestamp
    from range(1, 10)
    """
)

rel.write_csv("code_example.csv")
```

##### Result

```text
The data is exported to a CSV file named code_example.csv.
```

----

#### `write_parquet`

##### Signature

```python
write_parquet(self: duckdb.duckdb.DuckDBPyRelation, file_name: str, *, compression: object = None, field_ids: object = None, row_group_size_bytes: object = None, row_group_size: object = None, overwrite: object = None, per_thread_output: object = None, use_tmp_file: object = None, partition_by: object = None, write_partition_columns: object = None, append: object = None) -> None
```

##### Description

Write the relation object to a Parquet file in 'file_name'

**Aliases**: [`to_parquet`](#to_parquet)

##### Parameters

- **file_name** : str
    The name of the output Parquet file.
- **compression** : str, default: 'snappy'
    The compression format to use (`uncompressed`, `snappy`, `gzip`, `zstd`, `brotli`, `lz4`, `lz4_raw`).
- **field_ids** : STRUCT
    The field_id for each column. Pass `auto` to attempt to infer the field_ids automatically.
- **row_group_size_bytes** : int, default: row_group_size * 1024
    The target size of each row group. You can pass either a human-readable string, e.g., 2MB, or an integer, i.e., the number of bytes. This option is only used when you have issued `SET preserve_insertion_order = false;`, otherwise it is ignored.
- **row_group_size** : int, default: 122880
    The target size, i.e., number of rows, of each row group.
- **overwrite** : bool, default: False
    If `True`, overwrite the file if it exists.
- **per_thread_output** : bool, default: False
    When `True`, write one file per thread, rather than one file in total. This allows for faster parallel writing.
- **use_tmp_file** : bool, default: False
    Write to a temporary file before renaming it to the final name, to avoid partial writes.
- **partition_by** : list[str], default: None
    List of column names to partition the output by (creates a folder structure).
- **write_partition_columns** : bool, default: False
    Whether to write partition columns into the files. Only has an effect when used with `partition_by`.
- **append** : bool, default: False
    When `True`, if a generated filename pattern already exists, the path is regenerated to ensure no existing files are overwritten. Only has an effect when used with `partition_by`.
##### Example

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
    select
        gen_random_uuid() as id,
        concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
        range as value,
        now() + concat(range,' ', 'minutes')::interval as created_timestamp
    from range(1, 10)
    """
)

rel.write_parquet("code_example.parquet")
```

##### Result

```text
The data is exported to a Parquet file named code_example.parquet.
```
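
The `partition_by`, `overwrite`, and `write_partition_columns` parameters listed above are not shown in the example. The following is a minimal sketch of a partitioned Parquet export; the `is_even` column and the `partitioned_example` target directory are illustrative and not part of the original example.

```python
import duckdb

duckdb_conn = duckdb.connect()

rel = duckdb_conn.sql("""
    select
        mod(range, 2) = 0 as is_even,
        range as value
    from range(1, 10)
""")

# Write one folder per distinct value of is_even,
# e.g., partitioned_example/is_even=true/*.parquet.
# overwrite=True replaces output from a previous run, and
# write_partition_columns=True keeps the partition column in the files.
rel.write_parquet(
    "partitioned_example",
    partition_by=["is_even"],
    overwrite=True,
    write_partition_columns=True,
)
```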