# I/O Helpers

> **Note**
> The actively maintained documentation now lives under
> {doc}`versions/1.4/io/overview ` and
> {doc}`versions/1.4/io/file_append ` in the versioned
> Sphinx site. This file remains for quick reference while older links migrate.

The `duckplus.io` module provides thin wrappers around DuckDB's file readers
that integrate with :class:`duckplus.DuckCon`. Each helper expects an open
`DuckCon` context and returns an immutable :class:`duckplus.Relation` that keeps
a reference to the managing connection.

```{note}
DuckPlus helpers register through direct Python imports. Decorating a function
with :func:`duckplus.io.duckcon_helper` attaches it to
``duckplus.duckcon.DuckCon`` when the module imports, so the fluent API stays in
sync with the shipped Python objects, with no runtime registry lookups or
``**kwargs`` funnels.
```

Because helpers are attached at import time, calling ``manager.read_csv(...)``
or ``manager.apply_helper("read_excel", ...)`` works without importing
``duckplus.io`` explicitly. Every reader exposes its full keyword signature
directly in Python so editors and type checkers surface the supported options.
The ``duckcon`` manager and ``source`` parameters can also be supplied by keyword
to make call sites more descriptive when desired.

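The import-time registration described in the note can be illustrated with a small, self-contained sketch. It does not use DuckPlus itself: `Manager`, `manager_helper`, and the toy `read_csv` below are hypothetical stand-ins, and the real `duckcon_helper` decorator may work differently in its details.

```python
from typing import Callable, Optional


class Manager:
    """Hypothetical stand-in for a connection manager such as DuckCon."""

    def apply_helper(self, name: str, *args, **kwargs):
        # Helpers are ordinary attributes, so no separate registry is consulted.
        return getattr(self, name)(*args, **kwargs)


def manager_helper(func: Callable) -> Callable:
    """Attach ``func`` to ``Manager`` as a method at decoration time."""
    setattr(Manager, func.__name__, func)
    return func


@manager_helper
def read_csv(manager: Manager, source: str, *, header: Optional[bool] = None) -> dict:
    # A real helper would query the database; this one only echoes its inputs.
    return {"source": source, "header": header}


manager = Manager()
direct = manager.read_csv("data.csv", header=True)
by_name = manager.apply_helper("read_csv", "data.csv", header=True)
```

Because the decorator runs when the defining module is imported, both the fluent call and the name-based call resolve to the same function, and the explicit keyword signature stays visible to editors.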
```python
from pathlib import Path

from duckplus import DuckCon

manager = DuckCon()
with manager:
    relation = manager.read_csv(Path("data.csv"))
    print(relation.columns)
```

Because the helpers register automatically, persisting results is equally
straightforward via the relation-level file writers:

```python
with manager:
    relation = manager.read_parquet(Path("data.parquet"))
    relation.append_csv(Path("report.csv"))
    relation.write_parquet_dataset(
        Path("dataset"),
        partition_column="country",
    )
```

## CSV

```
io.read_csv(
    duckcon,
    source,
    *,
    header=None,
    delimiter=",",
    quotechar='"',
    escapechar=None,
    sample_size=None,
    auto_detect=None,
    columns=None,
    dtype=None,
    names=None,
    na_values=None,
    null_padding=None,
    force_not_null=None,
    files_to_sniff=None,
    decimal=None,
    date_format=None,
    timestamp_format=None,
    encoding=None,
    compression=None,
    hive_types_autocast=None,
    all_varchar=None,
    hive_partitioning=None,
    comment=None,
    max_line_size=None,
    store_rejects=None,
    rejects_table=None,
    rejects_limit=None,
    rejects_scan=None,
    union_by_name=None,
    filename=None,
    normalize_names=None,
    ignore_errors=None,
    allow_quoted_nulls=None,
    auto_type_candidates=None,
    parallel=None,
    skiprows=None,
)
```

* **source** – ``pathlib.Path`` instance or sequence of ``Path`` objects pointing at CSV files or buffers.

```python
from duckplus import DuckCon, io

manager = DuckCon()
with manager:
    relation = io.read_csv(
        manager,
        "data.csv",
        header=True,
        delimiter=",",
        na_values=["NA", ""],
    )
```

All arguments are keyword-only apart from the connection and source, which may
also be passed by keyword. The explicit signature keeps IDE completions in sync
with the implementation and provides quick access to aliases such as
``delim``/``delimiter``.

* **source** – String or `PathLike` pointing at a CSV file or buffer.
* **header** – Treat the first row as column names.
* **delimiter** – Single-character field separator (alias: `delim`).
* **quotechar** / **escapechar** – Configure quoting behaviour (aliases: `quote`, `escape`).
* **columns** / **dtype** – Optional mappings describing column types.
* **names** / **na_values** – Override column names or provide strings to treat as nulls.
* **filename** – Append the originating filename (as an absolute path) as an additional column when `True`.
* Remaining keywords mirror DuckDB's `read_csv` table function and are forwarded explicitly so IDEs surface the available options. Aliases raise a descriptive `ValueError` when conflicting values are supplied.

**Example**

```python
# Assumes an open DuckCon context named `manager`, with `Path` and
# `duckplus.io` (as `io`) already imported.
relation = io.read_csv(
    manager,
    Path("transactions.csv"),
    delimiter="|",
    header=True,
    na_values=["NA", ""],
    filename=True,
)
```

Passing the alias keywords (`delim`, `quote`, `escape`, etc.) behaves identically
to their primary counterparts so long as only one spelling is used for each
option. Attempting to provide both will raise an error, making it obvious when a
typo slipped through autocomplete suggestions.

## Parquet

```
io.read_parquet(
    duckcon,
    source,
    *,
    binary_as_string=None,
    file_row_number=None,
    filename=None,
    hive_partitioning=None,
    union_by_name=None,
    compression=None,
    directory=False,
    partition_id_column=None,
    partition_glob="*.parquet",
)
```

These options map directly to DuckDB's
[`read_parquet`](https://duckdb.org/docs/data/parquet) table function. The
``source`` may be a single ``Path`` or a sequence of ``Path`` objects/globs.
Enabling ``directory=True`` treats ``source`` as a folder and loads every file
matching ``partition_glob`` (``"*.parquet"`` by default). Only explicitly
provided keyword arguments are forwarded so callers can rely on IDE completions.

**Example**

```python
# Assumes an open DuckCon context named `manager`, with `Path` and
# `duckplus.io` (as `io`) already imported.
relation = io.read_parquet(
    manager,
    [Path("/data/sales_2024.parquet"), Path("/data/sales_2025.parquet")],
    filename=True,
    union_by_name=True,
)
```

Because the signature is spelled out explicitly, editors can suggest the
available options, such as `binary_as_string` for forcing binary columns into
`VARCHAR`, without resorting to `**kwargs` guesswork.

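Two behaviours described above, alias conflicts raising `ValueError` and forwarding only the keywords a caller actually supplied, can be sketched in plain Python. The helper names `resolve_alias` and `build_options` are hypothetical, every option defaults to `None` here for simplicity (the real `delimiter` default is `","`), and the sketch assumes an error is raised only when the two spellings disagree; DuckPlus's own implementation may be stricter.

```python
from typing import Any, Dict


def resolve_alias(primary: str, primary_value: Any, alias: str, alias_value: Any) -> Any:
    """Return whichever spelling was supplied; reject conflicting values."""
    both_given = primary_value is not None and alias_value is not None
    if both_given and primary_value != alias_value:
        raise ValueError(
            f"Conflicting values for {primary!r} and its alias {alias!r}: "
            f"{primary_value!r} != {alias_value!r}"
        )
    return primary_value if primary_value is not None else alias_value


def build_options(*, delimiter=None, delim=None, header=None, filename=None) -> Dict[str, Any]:
    """Collect reader options, keeping only those the caller set."""
    candidates = {
        "delimiter": resolve_alias("delimiter", delimiter, "delim", delim),
        "header": header,
        "filename": filename,
    }
    # Options left at None are dropped, so the reader's defaults apply.
    return {name: value for name, value in candidates.items() if value is not None}


options = build_options(delim="|", header=True)
```

Calling `build_options(delimiter=",", delim="|")` raises `ValueError` in this sketch, while `build_options()` returns an empty dictionary so nothing is forwarded.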
The ``source`` may be a single path or a sequence of paths/globs, and explicit
keyword-only arguments ensure unexpected options are surfaced at call time
instead of being silently ignored. The positional ``duckcon`` and ``source``
parameters can similarly be passed by keyword if preferred.

```python
from duckplus import DuckCon, io

manager = DuckCon()
with manager:
    relation = io.read_parquet(
        manager,
        ["part-*.parquet"],
        union_by_name=True,
        filename=True,
    )
```

* **binary_as_string** – Cast DuckDB's binary columns to strings when `True`.
* **file_row_number** – Include a running row number for each file when set.
* **filename** – Append the source filename (absolute path) when enabled.
* **hive_partitioning** – Interpret partitioned directory structures.
* **union_by_name** – Align schemas by column name when files differ in column order.
* **compression** – Override DuckDB's decompression behaviour.
* **directory** – When `True`, scan a directory instead of a single file and apply ``partition_glob`` to collect matching inputs.
* **partition_id_column** – Add a derived column populated with each file's stem. The helper automatically enables ``filename=True`` so provenance remains accessible alongside the partition key.
* **partition_glob** – Glob (or sequence of globs) used to select files when ``directory=True``.
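
The interaction between ``directory``, ``partition_glob`` and ``partition_id_column`` can be pictured with a standard-library sketch. It only mimics the file selection and stem derivation described in the list above; `collect_partitions` is a hypothetical helper and the file names are made up for the illustration.

```python
import tempfile
from pathlib import Path
from typing import Dict, Iterable, Union


def collect_partitions(
    directory: Path,
    partition_glob: Union[str, Iterable[str]] = "*.parquet",
) -> Dict[str, Path]:
    """Map each matching file's stem (the partition id) to its absolute path."""
    globs = [partition_glob] if isinstance(partition_glob, str) else list(partition_glob)
    matches: Dict[str, Path] = {}
    for pattern in globs:
        for path in sorted(directory.glob(pattern)):
            matches[path.stem] = path.resolve()
    return matches


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Two data files plus one file the default glob should ignore.
    for name in ("2024_01.parquet", "2024_02.parquet", "notes.txt"):
        (root / name).touch()
    partitions = collect_partitions(root)
    partition_ids = sorted(partitions)
```

In this sketch `partition_ids` holds the stems `2024_01` and `2024_02`, and the absolute path kept next to each stem plays the role of the ``filename`` column that the helper enables alongside the partition key.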

```python
from pathlib import Path

from duckplus import DuckCon, io

manager = DuckCon()
with manager:
    relation = io.read_parquet(
        manager,
        Path("/data/events"),
        directory=True,
        partition_id_column="batch_id",
    )
    assert set(relation.relation.project("batch_id").distinct().fetchall()) == {
        ("2024_01",),
        ("2024_02",),
    }
```

## JSON

```
io.read_json(
    duckcon,
    source,
    *,
    columns=None,
    sample_size=None,
    maximum_depth=None,
    records=None,
    format=None,
    date_format=None,
    timestamp_format=None,
    compression=None,
    maximum_object_size=None,
    ignore_errors=None,
    convert_strings_to_integers=None,
    field_appearance_threshold=None,
    map_inference_threshold=None,
    maximum_sample_files=None,
    filename=None,
    hive_partitioning=None,
    union_by_name=None,
    hive_types=None,
    hive_types_autocast=None,
)
```

JSON helpers are flexible enough to cover both JSON Lines and nested JSON
inputs. ``source`` accepts either a single ``Path`` or any
``collections.abc.Sequence`` of ``Path`` objects, mirroring DuckDB's
flexibility. ``columns`` accepts the same mappings and sequences that DuckDB's
``read_json`` table function understands and is normalised to built-in
containers before forwarding. As with the CSV and Parquet helpers, only provided
keyword arguments are forwarded to DuckDB to avoid masking typos. Passing
``duckcon``/``source`` by keyword is also supported so pipelines can highlight
the data source being loaded inline.

```python
from duckplus import DuckCon, io

manager = DuckCon()
with manager:
    relation = io.read_json(
        manager,
        "events.json",
        columns={"payload": "STRUCT"},
        records=True,
        maximum_depth=5,
    )
```

* **records** – Control whether the input is JSON Lines (`True`) or arrays/objects.
* **format** – Explicitly declare a format variant DuckDB should expect.
* **convert_strings_to_integers** – Enable automatic integer coercion.
* **maximum_\* options** – Tune sampling depth, object size, and file counts.
* **hive_types** / **hive_types_autocast** – Normalise Hive-partitioned datasets.
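
The remark that ``columns`` is normalised to built-in containers before forwarding can be illustrated as follows. This is a minimal sketch under the assumption that normalisation means converting any mapping to `dict` and any non-string sequence to `list`; `to_builtin` is a hypothetical name and the column types are only illustrative.

```python
from collections.abc import Mapping, Sequence
from types import MappingProxyType
from typing import Any


def to_builtin(value: Any) -> Any:
    """Recursively convert mappings to dict and sequences to list."""
    if isinstance(value, Mapping):
        return {key: to_builtin(item) for key, item in value.items()}
    # Strings and bytes are sequences too, but must be kept as scalars.
    if isinstance(value, Sequence) and not isinstance(value, (str, bytes)):
        return [to_builtin(item) for item in value]
    return value


# A read-only mapping stands in for any non-dict Mapping a caller might pass.
columns = MappingProxyType({"id": "BIGINT", "payload": "JSON"})
normalised = to_builtin(columns)
```

After the call, `normalised` is a plain `dict` with the same keys and values, which is the shape a downstream reader function can accept without caring about the caller's container type.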

**Example**

```python
relation = io.read_json(
    manager,
    Path("events.ndjson"),
    records=True,
    maximum_depth=4,
    hive_types_autocast=True,
)
```

The helper surfaces the full keyword list so features like
`convert_strings_to_integers` are discoverable directly from IDE tooltips.

## File appenders

Previous releases exposed table-oriented append helpers under
:mod:`duckplus.io`. Those helpers have been replaced by
:meth:`Relation.append_csv` and :meth:`Relation.append_parquet`, which treat
appends as direct file operations. See :doc:`relation` for usage examples
covering mutation behaviour, duplicate avoidance, and Parquet's temporary
rewrite workflow.