--- layout: docu title: Conversion between DuckDB and Python --- This page documents the rules for converting [Python objects to DuckDB](#object-conversion-python-object-to-duckdb) and [DuckDB results to Python](#result-conversion-duckdb-results-to-python). ## Object Conversion: Python Object to DuckDB This is a mapping of Python object types to DuckDB [Logical Types]({% link docs/1.3/sql/data_types/overview.md %}): * `None` → `NULL` * `bool` → `BOOLEAN` * `datetime.timedelta` → `INTERVAL` * `str` → `VARCHAR` * `bytearray` → `BLOB` * `memoryview` → `BLOB` * `decimal.Decimal` → `DECIMAL` / `DOUBLE` * `uuid.UUID` → `UUID` The rest of the conversion rules are as follows. ### `int` Since integers can be of arbitrary size in Python, there is not a one-to-one conversion possible for ints. Instead we perform these casts in order until one succeeds: * `BIGINT` * `INTEGER` * `UBIGINT` * `UINTEGER` * `DOUBLE` When using the DuckDB Value class, it's possible to set a target type, which will influence the conversion. ### `float` These casts are tried in order until one succeeds: * `DOUBLE` * `FLOAT` ### `datetime.datetime` For `datetime` we will check `pandas.isnull` if it's available and return `NULL` if it returns `true`. We check against `datetime.datetime.min` and `datetime.datetime.max` to convert to `-inf` and `+inf` respectively. If the `datetime` has tzinfo, we will use `TIMESTAMPTZ`, otherwise it becomes `TIMESTAMP`. ### `datetime.time` If the `time` has tzinfo, we will use `TIMETZ`, otherwise it becomes `TIME`. ### `datetime.date` `date` converts to the `DATE` type. We check against `datetime.date.min` and `datetime.date.max` to convert to `-inf` and `+inf` respectively. ### `bytes` `bytes` converts to `BLOB` by default, when it's used to construct a Value object of type `BITSTRING`, it maps to `BITSTRING` instead. ### `list` `list` becomes a `LIST` type of the “most permissive” type of its children, for example: ```python my_list_value = [ 12345, "test" ] ``` Will become `VARCHAR[]` because 12345 can convert to `VARCHAR` but `test` can not convert to `INTEGER`. ```sql [12345, test] ``` ### `dict` The `dict` object can convert to either `STRUCT(...)` or `MAP(..., ...)` depending on its structure. If the dict has a structure similar to: ```python import duckdb my_map_dict = { "key": [ 1, 2, 3 ], "value": [ "one", "two", "three" ] } duckdb.values(my_map_dict) ``` Then we'll convert it to a `MAP` of key-value pairs of the two lists zipped together. The example above becomes a `MAP(INTEGER, VARCHAR)`: ```text ┌─────────────────────────┐ │ {1=one, 2=two, 3=three} │ │ map(integer, varchar) │ ├─────────────────────────┤ │ {1=one, 2=two, 3=three} │ └─────────────────────────┘ ``` If the dict is returned by a [function]({% link docs/1.3/clients/python/function.md %}), the function will return a `MAP`, therefore the function `return_type` has to be specified. Providing a return type which cannot convert to `MAP` will raise an error: ```python import duckdb duckdb_conn = duckdb.connect() def get_map() -> dict[str,list[str]|list[int]]: return { "key": [ 1, 2, 3 ], "value": [ "one", "two", "three" ] } duckdb_conn.create_function("get_map", get_map, return_type=dict[int, str]) duckdb_conn.sql("select get_map()").show() duckdb_conn.create_function("get_map_error", get_map) duckdb_conn.sql("select get_map_error()").show() ``` ```text ┌─────────────────────────┐ │ get_map() │ │ map(bigint, varchar) │ ├─────────────────────────┤ │ {1=one, 2=two, 3=three} │ └─────────────────────────┘ ConversionException: Conversion Error: Type VARCHAR can't be cast as UNION(u1 VARCHAR[], u2 BIGINT[]). VARCHAR can't be implicitly cast to any of the union member types: VARCHAR[], BIGINT[] ``` > The names of the fields matter and the two lists need to have the same size. Otherwise we'll try to convert it to a `STRUCT`. ```python import duckdb my_struct_dict = { 1: "one", "2": 2, "three": [1, 2, 3], False: True } duckdb.values(my_struct_dict) ``` Becomes: ```text ┌────────────────────────────────────────────────────────────────────┐ │ {'1': 'one', '2': 2, 'three': [1, 2, 3], 'False': true} │ │ struct("1" varchar, "2" integer, three integer[], "false" boolean) │ ├────────────────────────────────────────────────────────────────────┤ │ {'1': one, '2': 2, 'three': [1, 2, 3], 'False': true} │ └────────────────────────────────────────────────────────────────────┘ ``` If the dict is returned by a [function]({% link docs/1.3/clients/python/function.md %}), the function will return a `MAP`, due to [automatic conversion]({% link docs/1.3/clients/python/types.md %}#dictkey_type-value_type). To return a `STRUCT`, the `return_type` has to be provided: ```python import duckdb from duckdb.typing import BOOLEAN, INTEGER, VARCHAR from duckdb import list_type, struct_type duckdb_conn = duckdb.connect() my_struct_dict = { 1: "one", "2": 2, "three": [1, 2, 3], False: True } def get_struct() -> dict[str|int|bool,str|int|list[int]|bool]: return my_struct_dict duckdb_conn.create_function("get_struct_as_map", get_struct) duckdb_conn.sql("select get_struct_as_map()").show() duckdb_conn.create_function("get_struct", get_struct, return_type=struct_type({ 1: VARCHAR, "2": INTEGER, "three": list_type(duckdb.typing.INTEGER), False: BOOLEAN })) duckdb_conn.sql("select get_struct()").show() ``` ```text ┌──────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ get_struct_as_map() │ │ map(union(u1 varchar, u2 bigint, u3 boolean), union(u1 varchar, u2 bigint, u3 bigint[], u4 boolean)) │ ├──────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ {1=one, 2=2, three=[1, 2, 3], false=true} │ └──────────────────────────────────────────────────────────────────────────────────────────────────────┘ ┌────────────────────────────────────────────────────────────────────┐ │ get_struct() │ │ struct("1" varchar, "2" integer, three integer[], "false" boolean) │ ├────────────────────────────────────────────────────────────────────┤ │ {'1': one, '2': 2, 'three': [1, 2, 3], 'False': true} │ └────────────────────────────────────────────────────────────────────┘ ``` > Every `key` of the dictionary is converted to string. ### `tuple` `tuple` converts to `LIST` by default, when it's used to construct a Value object of type `STRUCT` it will convert to `STRUCT` instead. ### `numpy.ndarray` and `numpy.datetime64` `ndarray` and `datetime64` are converted by calling `tolist()` and converting the result of that. ## Result Conversion: DuckDB Results to Python DuckDB's Python client provides multiple additional methods that can be used to efficiently retrieve data. ### NumPy * `fetchnumpy()` fetches the data as a dictionary of NumPy arrays ### Pandas * `df()` fetches the data as a Pandas DataFrame * `fetchdf()` is an alias of `df()` * `fetch_df()` is an alias of `df()` * `fetch_df_chunk(vector_multiple)` fetches a portion of the results into a DataFrame. The number of rows returned in each chunk is the vector size (2048 by default) * vector_multiple (1 by default). ### Apache Arrow * `arrow()` fetches the data as an [Arrow table](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html) * `fetch_arrow_table()` is an alias of `arrow()` * `fetch_record_batch(chunk_size)` returns an [Arrow record batch reader](https://arrow.apache.org/docs/python/generated/pyarrow.ipc.RecordBatchStreamReader.html) with `chunk_size` rows per batch ### Polars * `pl()` fetches the data as a Polars DataFrame ### Examples Below are some examples using this functionality. See the [Python guides]({% link docs/1.3/guides/overview.md %}#python-client) for more examples. Fetch as Pandas DataFrame: ```python df = con.execute("SELECT * FROM items").fetchdf() print(df) ``` ```text item value count 0 jeans 20.0 1 1 hammer 42.2 2 2 laptop 2000.0 1 3 chainsaw 500.0 10 4 iphone 300.0 2 ``` Fetch as dictionary of NumPy arrays: ```python arr = con.execute("SELECT * FROM items").fetchnumpy() print(arr) ``` ```text {'item': masked_array(data=['jeans', 'hammer', 'laptop', 'chainsaw', 'iphone'], mask=[False, False, False, False, False], fill_value='?', dtype=object), 'value': masked_array(data=[20.0, 42.2, 2000.0, 500.0, 300.0], mask=[False, False, False, False, False], fill_value=1e+20), 'count': masked_array(data=[1, 2, 1, 10, 2], mask=[False, False, False, False, False], fill_value=999999, dtype=int32)} ``` Fetch as an Arrow table. Converting to Pandas afterwards just for pretty printing: ```python tbl = con.execute("SELECT * FROM items").fetch_arrow_table() print(tbl.to_pandas()) ``` ```text item value count 0 jeans 20.00 1 1 hammer 42.20 2 2 laptop 2000.00 1 3 chainsaw 500.00 10 4 iphone 300.00 2 ```