# Relation helpers

> **Note**
> The up-to-date 1.4 documentation lives under
> {doc}`versions/1.4/core/relations` in the Sphinx
> site. This legacy Markdown file is retained for historical reference.

The `duckplus.relation.Relation` wrapper keeps the lightweight DuckDB relation API ergonomic while enforcing immutability. Column helpers like [`Relation.add`](../duckplus/relation.py) let callers extend a relation with new expressions without mutating the original instance.

## Working with ODBC data sources

DuckPlus wraps the DuckDB nano-ODBC community extension so relations can be created directly from external databases. Pass ``extra_extensions=("nanodbc",)`` when constructing :class:`~duckplus.duckcon.DuckCon` to ensure the extension is installed and loaded as the connection opens. DuckDB installs community extensions per user profile; the first load may download the bundle, so offline environments should pre-install it via the DuckDB CLI or the ``duckdb-extensions`` package before creating the connection.

```python
from duckplus import DuckCon, Relation

manager = DuckCon(extra_extensions=("nanodbc",))
with manager:
    # nano-ODBC is ready once the connection is open.
    sales = Relation.from_odbc_query(
        manager,
        "Driver={DuckDB};Database=/data/sales.duckdb",
        "SELECT * FROM annual_sales",
    )
    customers = Relation.from_odbc_table(
        manager,
        "Driver={SQLite3};Database=/data/crm.sqlite",
        "customers",
    )
```

Both helpers validate that the managed connection is open and raise a helpful ``RuntimeError`` when the extension has not been loaded yet, recommending the ``extra_extensions`` workflow. Queries can supply parameter bindings via the ``parameters`` keyword, e.g. ``Relation.from_odbc_query(..., parameters=[2024])``, to avoid string interpolation.

Call ``manager.extensions()`` inside the context manager to audit which extensions are installed, their versions, and whether they are currently loaded in the session.
The information is sourced from DuckDB's ``duckdb_extensions()`` table function and reflects the machine-level installation DuckDB maintains for community bundles.

DuckPlus' own integration tests rely on the environment variables ``DUCKPLUS_TEST_ODBC_CONNECTION``, ``DUCKPLUS_TEST_ODBC_QUERY``, and ``DUCKPLUS_TEST_ODBC_TABLE`` to point at an accessible data source. Configure these variables locally to exercise the helpers end-to-end; otherwise the tests are skipped automatically in offline environments.

## Loading Excel workbooks

DuckDB's Excel community extension exposes a ``read_excel`` table function, which DuckPlus wraps via :meth:`Relation.from_excel`. Either request the extension when constructing :class:`DuckCon` or let the helper load it on demand:

```python
from duckplus import DuckCon, Relation

manager = DuckCon(extra_extensions=("excel",))
with manager:
    sales = Relation.from_excel(
        manager,
        "workbooks/sales.xlsx",
        sheet="Q1",
        header=True,
        names=("region", "amount"),
        dtype={"amount": "INTEGER"},
    )
```

If the extension has not been installed yet, DuckPlus attempts to install it via DuckDB's ``INSTALL excel`` command and surfaces a helpful error message when the machine is offline. The helper accepts the most common ``read_excel`` keyword arguments (``sheet``, ``header``, ``skip``, ``limit``, ``names``, ``dtype``, and ``all_varchar``) while keeping the underlying relation immutable.

## Adding computed columns

`Relation.add` accepts typed expressions from :mod:`duckplus.static_typed` that have been aliased with :meth:`~duckplus.static_typed.expressions.base.TypedExpression.alias`. The positional helpers avoid repeating column names while preserving dependency metadata, so DuckPlus can validate that references only target columns already present on the relation. Keyword arguments remain supported for callers that prefer explicit mappings.
```python
from duckplus import DuckCon, Relation, ducktype

manager = DuckCon()
with manager as connection:
    base = Relation.from_relation(
        manager,
        connection.sql("SELECT 3::INTEGER AS value, 5::INTEGER AS other"),
    )
    value = ducktype.Numeric("value")
    other = ducktype.Numeric("other")
    extended = base.add(
        (value + other).alias("total"),
        (value - other).alias("delta"),
    )
    assert extended.columns == ("value", "other", "total", "delta")
    assert extended.relation.fetchall() == [(3, 5, 8, -2)]
```

If an expression references a column that does not exist on the original relation (for example, referencing `total` while defining `delta`), DuckPlus raises a `ValueError` explaining that the expression references unknown columns. This mirrors the validation performed for other column helpers and keeps column creation deterministic.

## Projecting columns with a fluent builder

`Relation.select` exposes a builder that assembles complex projection lists before applying them to the underlying relation. Call :meth:`~duckplus.relation._SelectBuilder.column` for each expression to include and finish with :meth:`~duckplus.relation._SelectBuilder.from_` to materialise the projection. Typed expressions participate in dependency validation just like other helpers, while `if_exists=True` columns mirror the optional behaviour of `keep_if_exists` and friends.
```python
from duckplus import DuckCon, Relation, ducktype

manager = DuckCon()
with manager as connection:
    base = Relation.from_relation(
        manager,
        connection.sql(
            "SELECT * FROM (VALUES"
            " ('a'::VARCHAR, 1::INTEGER, 2::INTEGER),"
            " ('b'::VARCHAR, 3::INTEGER, 4::INTEGER)"
            ") AS data(category, amount, fallback)"
        ),
    )
    projected = (
        base.select()
        .column("category")
        .column(ducktype.Numeric("amount").alias("primary_amount"))
        .column(
            ducktype.Numeric("fallback").alias("fallback_amount"),
            if_exists=True,
        )
        .from_()
    )
    assert projected.columns == ("category", "primary_amount", "fallback_amount")
```

Each fluent call returns a new builder, so it is safe to branch a projection by capturing the result of ``select().column(...)`` before adding additional columns. The original builder remains unchanged, making it easy to experiment with alternative projections without rebuilding the shared prefix. Optional ``if_exists`` columns and ``replace_if_exists`` clauses still validate their typed-expression dependencies up front and simply skip rendering when the required columns are absent.

Builders also support `*` projections with optional `REPLACE`/`EXCLUDE` modifiers. DuckPlus validates dependencies for typed replacements before issuing the query, so missing or ambiguous columns surface as descriptive errors.

## Grouping and aggregating data

`Relation.aggregate` returns a transient builder that groups rows and computes named aggregate expressions. Aggregations must be provided as typed expressions from :mod:`duckplus` (which re-exports the static typed expression helpers), and optional filter predicates limit the input rows before aggregation. Call :meth:`~duckplus.relation._AggregateBuilder.by` to specify explicit grouping columns or :meth:`~duckplus.relation._AggregateBuilder.all` to group by the non-aggregate expressions passed positionally. Grouping columns are validated against the original relation so typos surface immediately.
```python
from duckplus import DuckCon, Relation
from duckplus import ducktype

manager = DuckCon()
with manager as connection:
    base = Relation.from_relation(
        manager,
        connection.sql(
            "SELECT * FROM (VALUES"
            " ('a'::VARCHAR, 1::INTEGER),"
            " ('a'::VARCHAR, 2::INTEGER),"
            " ('b'::VARCHAR, 3::INTEGER)"
            ") AS data(category, amount)"
        ),
    )
    total_sales = ducktype.Numeric("amount").sum().alias("total_sales")
    average_sale = ducktype.Numeric("amount").avg().alias("average_sale")
    summary = (
        base.aggregate()
        .start_agg()
        .component("amount > 1")
        .agg(total_sales)
        .agg(average_sale)
        .having(ducktype.Numeric("amount").avg() > 2)
        .by(ducktype.Varchar("category"))
    )
    assert summary.columns == ("category", "total_sales", "average_sale")
    assert summary.relation.fetchall() == [("b", 3, 3.0)]
```

The helper inspects each component to determine where it belongs:

* ``component`` accepts strings and typed expressions. Strings and non-aggregate boolean expressions apply pre-aggregation filters.
* Strings containing aggregate functions automatically become ``HAVING`` clauses and are rewritten to reference the projected aliases, even when the input omits identifier quoting or uses different casing.
* Non-boolean expressions without aggregate functions become additional grouping expressions that ``.all()`` includes automatically.
* ``agg`` registers aggregate expressions. Each must define an alias (either via :meth:`alias` on the typed expression or the ``alias=`` keyword) and appears after the grouping columns in the result.
* Boolean expressions containing aggregates passed to ``component`` are treated as ``HAVING`` clauses and rewritten to reference the projected aggregate aliases. You can also add them explicitly via :meth:`~duckplus.relation._AggregateBuilder.having` before materialising the query.

If a filter, aggregate, or having clause references an unknown column, DuckPlus raises a descriptive error before executing the query, keeping failures easy to diagnose.
Like the select helper, every call to ``component`` or ``agg`` produces a new builder instance. You can branch an aggregation pipeline by capturing the resulting builder before adding more expressions, and the original instance stays usable for alternative groupings. DuckPlus continues to validate dependencies for optional filters and typed expressions before the query reaches DuckDB.

## Filtering rows

`Relation.filter` applies one or more conditions to a relation while keeping the original untouched. Conditions can mix raw SQL snippets with typed boolean expressions, and each clause is validated against the source relation's column metadata before DuckDB executes the query.

```python
from duckplus import DuckCon, Relation
from duckplus import ducktype

manager = DuckCon()
with manager as connection:
    base = Relation.from_relation(
        manager,
        connection.sql(
            "SELECT * FROM (VALUES"
            " ('a'::VARCHAR, 1::INTEGER),"
            " ('a'::VARCHAR, 2::INTEGER),"
            " ('b'::VARCHAR, 3::INTEGER)"
            ") AS data(category, amount)"
        ),
    )
    filtered = base.filter(
        "amount > 1",
        ducktype.Varchar("category") == "b",
    )
    assert filtered.columns == ("category", "amount")
    assert filtered.relation.fetchall() == [("b", 3)]
```

Like aggregation filters, blank conditions or references to unknown columns raise descriptive errors so mistakes surface quickly.

## Sampling relation data for notebooks

Notebook workflows often need a quick preview of relation data. DuckPlus offers dedicated sampling helpers that stream results into popular dataframe libraries without mutating the original relation.

```python
from duckplus import DuckCon, Relation

manager = DuckCon()
with manager as connection:
    source = Relation.from_relation(
        manager,
        connection.sql(
            "SELECT * FROM (VALUES"
            " (1::INTEGER, 'alpha'::VARCHAR),"
            " (2::INTEGER, 'beta'::VARCHAR)"
            ") AS data(id, label)"
        ),
    )
    preview = source.sample_pandas(limit=10)

    for arrow_batch in source.iter_arrow_batches(batch_size=100):
        ...
        # stream into analytics tooling

    for polars_frame in source.iter_polars_batches(batch_size=1_000):
        ...
        # process in polars without loading the full relation
```

`sample_pandas`, `sample_arrow`, and `sample_polars` return a single object containing up to ``limit`` rows (defaulting to 50), while the corresponding ``iter_*_batches`` helpers return generators that respect the requested ``batch_size``. Each helper validates that the managed connection remains open and surfaces informative ``ModuleNotFoundError`` messages when optional dependencies such as pandas, PyArrow, or Polars have not been installed yet.

## Joining relations

`Relation.join`, `left_join`, `right_join`, `outer_join`, and `semi_join` wrap DuckDB's join operations while enforcing deterministic column ordering. DuckPlus automatically joins on every column shared between the two relations (matching names case-insensitively) and prefers the left relation's values when duplicates exist. Additional join pairs can be supplied with the `on` keyword argument, mapping left column names to right column names for equality comparisons.

```python
from duckplus import DuckCon, Relation

manager = DuckCon()
with manager as connection:
    customers = Relation.from_relation(
        manager,
        connection.sql(
            """
            SELECT * FROM (VALUES
                (1::INTEGER, 'north'::VARCHAR),
                (2::INTEGER, 'south'::VARCHAR)
            ) AS data(customer_id, region)
            """
        ),
    )
    orders = Relation.from_relation(
        manager,
        connection.sql(
            """
            SELECT * FROM (VALUES
                ('north'::VARCHAR, 1::INTEGER, 500::INTEGER),
                ('south'::VARCHAR, 2::INTEGER, 700::INTEGER)
            ) AS data(region, order_customer_id, total)
            """
        ),
    )
    joined = customers.join(orders, on={"customer_id": "order_customer_id"})
    assert joined.columns == (
        "customer_id",
        "region",
        "order_customer_id",
        "total",
    )
    assert joined.relation.order("customer_id").fetchall() == [
        (1, "north", 1, 500),
        (2, "south", 2, 700),
    ]
```

When no additional columns are provided, DuckPlus joins solely on shared column names.
Helpers for other join flavours work identically: `left_join` retains unmatched rows from the left relation while filling right-side columns with `NULL`, and `semi_join` filters rows using the join keys but keeps only the left relation's columns. Attempting to join relations originating from different `DuckCon` instances, or referencing unknown columns, raises clear errors so callers can rename inputs before running the query.

### As-of joins

`Relation.asof_join` aligns rows by the most recent ordering value from the right-hand relation that does not violate the requested direction (backward by default) or tolerance. The helper reuses the case-insensitive column matching logic from `Relation.join`, but requires callers to supply the ordering columns explicitly. Ordering operands can be supplied as column names or typed expressions; when using typed expressions, qualify column references with the synthetic aliases `left` and `right` that DuckPlus assigns to the underlying relations during query generation.
```python
from duckplus import DuckCon, Relation
from duckplus import ducktype

manager = DuckCon()
with manager as connection:
    trades = Relation.from_relation(
        manager,
        connection.sql(
            """
            SELECT * FROM (VALUES
                (1::INTEGER, 10::INTEGER),
                (1::INTEGER, 35::INTEGER)
            ) AS data(symbol, event_ts)
            """,
        ),
    )
    quotes = Relation.from_relation(
        manager,
        connection.sql(
            """
            SELECT * FROM (VALUES
                (1::INTEGER, 5::INTEGER, 100::INTEGER),
                (1::INTEGER, 30::INTEGER, 110::INTEGER)
            ) AS data(symbol, quote_ts, price)
            """,
        ),
    )
    joined = trades.asof_join(
        quotes,
        on={"symbol": "symbol"},
        order=("event_ts", "quote_ts"),
        tolerance=ducktype.Numeric.literal(15),
    )
    assert joined.relation.fetchall() == [
        (1, 10, 5, 100),
        (1, 35, 30, 110),
    ]
```

Per-row tolerances can be provided via typed expressions as well:

```python
joined = trades.asof_join(
    quotes,
    on={"symbol": "symbol"},
    order=(
        ducktype.Numeric.coerce(("left", "event_ts")),
        ducktype.Numeric.coerce(("right", "quote_ts")),
    ),
    tolerance=ducktype.Numeric.coerce(("left", "max_gap")),
)
```

The helper validates that all referenced columns exist on the participating relations and raises informative errors when aliases or dependencies do not resolve. In the per-row tolerance example above, `trades` exposes a `max_gap` column that controls how far each row may look back when selecting a matching quote.

## Profiling relation data

Exploratory analysis often benefits from quick metadata about the result set before diving into more involved transformations.
DuckPlus provides :meth:`Relation.row_count` and :meth:`Relation.null_ratios` so callers can profile relations without leaving Python:

```python
from duckplus import DuckCon, Relation

manager = DuckCon()
with manager as connection:
    relation = Relation.from_relation(
        manager,
        connection.sql(
            """
            SELECT * FROM (VALUES
                (1::INTEGER, 'north'::VARCHAR, NULL::VARCHAR),
                (2::INTEGER, 'south'::VARCHAR, 'prime'::VARCHAR),
                (NULL::INTEGER, 'east'::VARCHAR, NULL::VARCHAR)
            ) AS data(id, region, segment)
            """.strip(),
        ),
    )
    rows = relation.row_count()
    nulls = relation.null_ratios()
    print(rows)
    print(nulls)
```

``Relation.null_ratios`` returns a dictionary keyed by the relation's column names with floating-point ratios between ``0.0`` and ``1.0``. Empty relations produce zero ratios, while relations without columns return an empty dictionary. In the example above the helper reports ``3`` rows and a mapping such as ``{"id": 0.3333333333, "region": 0.0, "segment": 0.6666666667}``.

## Writing relations into tables

Once a relation contains the desired result set, call `DuckCon.table().insert(...)` to materialise it into a DuckDB table while keeping connection management consistent with other helpers. The wrapper reuses the same validation as the file appenders, so callers can opt into table creation, overwriting, or inserting into a subset of columns when defaults should populate the remainder.

```python
from duckplus import DuckCon, Relation

manager = DuckCon()
with manager as connection:
    # Build a relation using the fluent helpers.
    relation = Relation.from_sql(
        manager,
        "SELECT * FROM (VALUES (1::INTEGER, 'north'::VARCHAR)) AS data(id, region)",
    )

    # Persist the relation into a table, creating it on first run.
    manager.table("analytics.customers").insert(
        relation,
        create=True,
        overwrite=True,
    )

    rows = connection.sql("SELECT * FROM analytics.customers").fetchall()
    assert rows == [(1, "north")]
```

Attempting to persist a relation created from another `DuckCon` instance raises an informative error so call sites keep connection ownership explicit. When `create=True`, the helper requires inserting every column to keep table schemas deterministic; during append operations, specify `target_columns` to rely on table defaults for any omitted values.

### Materialising intermediate results

DuckPlus also exposes :meth:`Relation.materialize`, which writes the relation to a DuckDB table and returns a relation pointing at the materialised data. The helper creates a temporary table by default so intermediate results clean up automatically when the connection closes. Provide a name to control the table identifier or set ``temporary=False`` to persist the rows.

```python
from duckplus import DuckCon, Relation

manager = DuckCon()
with manager as connection:
    relation = Relation.from_sql(
        manager,
        "SELECT 42::INTEGER AS value",
    )
    materialized = relation.materialize(name="temp_values")
    print(materialized.relation.fetchall())

# Temporary tables disappear when the connection closes.
with manager as connection:
    connection.sql("SELECT * FROM temp_values")  # Raises duckdb.CatalogException
```

## Appending CSV and Parquet files

DuckPlus treats file appends as IO operations performed directly by a relation. Use :meth:`Relation.append_csv` to extend a CSV file with the relation's content while keeping the helper intentionally lightweight:

```python
from pathlib import Path

from duckplus import DuckCon, Relation

manager = DuckCon()
with manager:
    new_rows = Relation.from_sql(
        manager,
        "SELECT * FROM (VALUES (1, 'north'), (2, 'south')) AS data(id, region)",
    )

    # The CSV header is written automatically when the file does not exist.
    new_rows.append_csv(Path("data.csv"))
```

Existing files can be deduplicated before writing, either by matching on a unique identifier column (or columns) or by checking every column for a duplicate match. Set ``unique_id_column="id"`` to perform an anti join on that column, or pass ``match_all_columns=True`` to append only rows that differ from all existing values. The helper returns a relation representing the rows that would be appended, which makes it easy to inspect the anti-joined results when ``mutate=False``.

Parquet appends follow the same interface via :meth:`Relation.append_parquet`, rewriting the target file through a temporary Parquet file whenever ``mutate=True``:

```python
manager = DuckCon()
with manager:
    relation = Relation.from_sql(
        manager,
        "SELECT * FROM (VALUES (1, 'north'), (2, 'south')) AS data(id, region)",
    )

    # Rewrites data.parquet using a temporary file to keep Parquet immutable.
    relation.append_parquet(
        Path("data.parquet"),
        unique_id_column="id",
        mutate=True,
    )
```

Because Parquet files are immutable, ``mutate`` defaults to ``False`` so callers opt into the rewrite explicitly. CSV appends default to ``mutate=True`` since they can extend the file without rebuilding existing rows. In both cases the helpers reject directory targets, ensuring folder-level relations are not used accidentally. DuckPlus keeps column comparisons case-insensitive and raises clear errors when a requested identifier cannot be found on the relation or in the existing file.

## Writing partitioned Parquet datasets

When working with immutable file stores it is often preferable to maintain one Parquet file per logical partition. :meth:`Relation.write_parquet_dataset` persists a relation into a directory of Parquet files using a column to derive the partition key.
The helper creates the directory when needed and accepts a filename template so pipelines can keep their existing naming convention:

```python
target = Path("/data/events")

relation.write_parquet_dataset(
    target,
    partition_column="partition_key",
    filename_template="{partition}.parquet",
)
```

By default the helper overwrites the Parquet file associated with each partition. Provide ``partition_actions`` to mix append and overwrite behaviour within the same batch. In the following example, new rows for the ``north`` partition are appended while ``west`` is rebuilt from scratch:

```python
relation.write_parquet_dataset(
    target,
    partition_column="partition_key",
    partition_actions={"north": "append", "west": "overwrite"},
)
```

Set ``immutable=True`` to enforce append-only semantics. Existing partition files trigger an error so ingestion jobs can guarantee that each run only adds new partitions:

```python
relation.write_parquet_dataset(
    target,
    partition_column="partition_key",
    immutable=True,
)
```

All strategies require the underlying connection to remain open. The helper relies on DuckDB's Parquet writer, so compression and schema inference behaviour mirror DuckDB's native commands.