# File and table readers

DuckPlus mirrors DuckDB's file readers while integrating with :class:`~duckplus.duckcon.DuckCon`. Each helper expects an open manager and returns an immutable :class:`~duckplus.relation.Relation` with cached metadata. The functions, covering file readers and extension-backed connectors, live in :mod:`duckplus.io` and register automatically on every ``DuckCon`` instance, so you can call them directly from the connection manager without importing the module. They intentionally avoid ``**kwargs`` so editor completions surface every DuckDB option.

```python
from pathlib import Path

from duckplus import DuckCon

manager = DuckCon()

with manager:
    relation = manager.read_csv(Path("data.csv"), header=True)
    print(relation.columns)
```

Because the helpers register automatically, persisting results is just as easy when chaining to the relation-level writers:

```python
with manager:
    relation = manager.read_parquet(Path("data.parquet"))
    relation.append_csv(Path("report.csv"))
    relation.write_parquet_dataset(
        Path("dataset"),
        partition_column="country",
    )
```

## CSV reader

:meth:`duckplus.io.read_csv` exposes DuckDB's table-function keywords explicitly, so IDEs surface every option. Aliases such as ``delim`` and ``quote`` match DuckDB's own names, and DuckPlus raises a descriptive error when both the canonical and alias forms are supplied for the same argument.

Key options include:

- ``columns`` and ``dtype`` for explicit column typing.
- ``names`` and ``na_values`` to override column names and null sentinels.
- ``filename=True`` to append the absolute path of each input file.

```python
with manager:
    relation = manager.read_csv(
        Path("transactions.csv"),
        delimiter="|",
        header=True,
        na_values=["NA", ""],
        filename=True,
    )
```

Pass ``lazy=True`` to stream large CSVs lazily; DuckPlus propagates the parameter to DuckDB, letting you chain transformations before triggering materialisation.

## Parquet reader

:meth:`duckplus.io.read_parquet` mirrors DuckDB's keyword arguments, including ``union_by_name``, ``filename``, and ``hive_partitioning``. Passing a directory with ``directory=True`` loads all ``*.parquet`` files by default.

```python
with manager:
    relation = manager.read_parquet(
        [
            Path("/data/sales_2024.parquet"),
            Path("/data/sales_2025.parquet"),
        ],
        union_by_name=True,
        filename=True,
    )
```

## JSON, Arrow, and database connectors

The IO module extends to DuckDB's JSON readers, Arrow integration, and connectors backed by community extensions such as Excel and nano-ODBC. Each helper keeps parameters explicit so scripts remain self-documenting. Highlights include:

- :func:`duckplus.io.read_json` for line-delimited JSON, with ``maximum_depth`` and ``format`` options mirroring DuckDB's table function (a short sketch appears at the end of this section).
- :func:`duckplus.io.read_arrow` for zero-copy reads from Arrow datasets or ``pyarrow.dataset.Dataset`` objects.
- :func:`duckplus.io.read_odbc_query` and :func:`duckplus.io.read_odbc_table` for nano-ODBC queries and table scans.
- :func:`duckplus.io.read_excel` for Excel workbooks, which automatically installs the ``excel`` extension when it is missing.

Consult the docstrings in :mod:`duckplus.io` for the full argument lists. When an extension is required, DuckPlus attempts to install it automatically, or raises an actionable message if the environment is offline.

Call ``manager.apply_helper("read_csv", ...)`` to route through the bound helper directly, or pass ``overwrite=True`` to :meth:`~duckplus.duckcon.DuckCon.register_helper` if you need to replace the defaults with a custom implementation.
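As a quick illustration of the JSON reader highlighted above, here is a minimal sketch; the ``events.jsonl`` file name and the option values are assumptions, while ``format`` and ``maximum_depth`` mirror DuckDB's ``read_json`` table function:

```python
with manager:
    relation = manager.read_json(
        Path("events.jsonl"),        # assumed newline-delimited input
        format="newline_delimited",  # one JSON document per line
        maximum_depth=2,             # limit nested type inference
    )
    print(relation.columns)
```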
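Routing through the bound helpers looks like this in practice. A minimal sketch: ``strict_read_csv`` is a hypothetical replacement reader, and the exact arguments forwarded by ``apply_helper`` follow the ``...`` placeholder above:

```python
# Route through the bound helper explicitly; equivalent to manager.read_csv(...).
with manager:
    relation = manager.apply_helper("read_csv", Path("data.csv"), header=True)

# Hypothetical replacement reader: overwrite=True swaps out the default helper.
def strict_read_csv(connection, path):
    # Delegates to DuckDB's read_csv with a fixed header setting.
    return connection.sql(
        "SELECT * FROM read_csv(?, header = true)",
        params=[str(path)],
    )

manager.register_helper("read_csv", strict_read_csv, overwrite=True)
```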
## Composing with custom helpers

:class:`~duckplus.duckcon.DuckCon` exposes :meth:`~duckplus.duckcon.DuckCon.register_helper` and :meth:`~duckplus.duckcon.DuckCon.apply_helper` so you can wrap bespoke data sources. Register a callable that accepts the open connection, then return a DuckPlus relation to remain within the immutable flow:

```python
def read_yaml(connection, path):
    # Treats the YAML file as JSON-compatible and delegates to DuckDB's
    # read_json table function.
    return connection.sql(
        "SELECT * FROM read_json(?::VARCHAR)",
        params=[str(path)],
    )

manager.register_helper("read_yaml", read_yaml)

with manager:
    relation = manager.apply_helper("read_yaml", Path("data.yaml"))
```

The returned relation captures metadata like any built-in reader, so downstream validation and schema utilities continue to work.
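Because of that, the cached metadata is available as soon as the helper returns; a minimal sketch continuing the ``read_yaml`` example above:

```python
with manager:
    relation = manager.apply_helper("read_yaml", Path("data.yaml"))
    # Cached metadata behaves as it does for built-in readers.
    print(relation.columns)
```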