--- layout: docu title: Text Functions --- ## Text Functions and Operators This section describes functions and operators for examining and manipulating [`STRING` values]({% link docs/1.3/sql/data_types/text.md %}). | Name | Description | |:--|:-------| | [`string[index]`](#stringindex) | Extracts a single character using a (1-based) index. | | [`string[begin:end]`](#stringbeginend) | Extracts a string using [slice conventions]({% link docs/1.3/sql/functions/list.md %}#slicing) similar to Python. Missing `begin` or `end` arguments are interpreted as the beginning or end of the list respectively. Negative values are accepted. | | [`string LIKE target`](#string-like-target) | Returns `true` if the `string` matches the like specifier (see [Pattern Matching]({% link docs/1.3/sql/functions/pattern_matching.md %})). | | [`string SIMILAR TO regex`](#string-similar-to-regex) | Returns `true` if the `string` matches the `regex` (see [Pattern Matching]({% link docs/1.3/sql/functions/pattern_matching.md %})). | | [`string ^@ search_string`](#string--search_string) | Returns `true` if `string` begins with `search_string`. | | [`arg1 || arg2`](#arg1--arg2) | Concatenates two strings, lists, or blobs. Any `NULL` input results in `NULL`. See also [`concat(arg1, arg2, ...)`](#concatvalue-) and [`list_concat(list1, list2)`]({% link docs/1.3/sql/functions/list.md %}#list_concatlist1-list2). | | [`array_extract(string, index)`](#array_extractstring-index) | Extracts a single character from a `string` using a (1-based) `index`. | | [`array_slice(list, begin, end)`](#array_slicelist-begin-end) | Extracts a sublist or substring using [slice conventions]({% link docs/1.3/sql/functions/list.md %}#slicing). Negative values are accepted. | | [`ascii(string)`](#asciistring) | Returns an integer that represents the Unicode code point of the first character of the `string`. | | [`bar(x, min, max[, width])`](#barx-min-max-width) | Draws a band whose width is proportional to (`x - min`) and equal to `width` characters when `x` = `max`. `width` defaults to 80. | | [`base64(blob)`](#base64blob) | Converts a `blob` to a base64 encoded string. | | [`bin(string)`](#binstring) | Converts the `string` to binary representation. | | [`bit_length(string)`](#bit_lengthstring) | Number of bits in a `string`. | | [`char_length(string)`](#char_lengthstring) | Number of characters in `string`. | | [`character_length(string)`](#character_lengthstring) | Number of characters in `string`. | | [`chr(code_point)`](#chrcode_point) | Returns a character which is corresponding the ASCII code value or Unicode code point. | | [`concat(value, ...)`](#concatvalue-) | Concatenates multiple strings, lists, or blobs. `NULL` inputs are skipped. See also [operator `||`](#arg1--arg2). | | [`concat_ws(separator, string, ...)`](#concat_wsseparator-string-) | Concatenates many strings, separated by `separator`. `NULL` inputs are skipped. | | [`contains(string, search_string)`](#containsstring-search_string) | Returns `true` if `search_string` is found within `string`. Note that [collations]({% link docs/1.3/sql/expressions/collations.md %}) are not supported. | | [`ends_with(string, search_string)`](#ends_withstring-search_string) | Returns `true` if `string` ends with `search_string`. | | [`format(format, ...)`](#formatformat-) | Formats a string using the [fmt syntax](#fmt-syntax). | | [`formatReadableDecimalSize(integer)`](#formatreadabledecimalsizeinteger) | Converts `integer` to a human-readable representation using units based on powers of 10 (KB, MB, GB, etc.). | | [`formatReadableSize(integer)`](#formatreadablesizeinteger) | Converts `integer` to a human-readable representation using units based on powers of 2 (KiB, MiB, GiB, etc.). | | [`format_bytes(integer)`](#format_bytesinteger) | Converts `integer` to a human-readable representation using units based on powers of 2 (KiB, MiB, GiB, etc.). | | [`from_base64(string)`](#from_base64string) | Converts a base64 encoded `string` to a character string (`BLOB`). | | [`from_binary(value)`](#from_binaryvalue) | Converts a `value` from binary representation to a blob. | | [`from_hex(value)`](#from_hexvalue) | Converts a `value` from hexadecimal representation to a blob. | | [`greatest(arg1, ...)`](#greatestarg1-) | Returns the largest value in lexicographical order. Note that lowercase characters are considered larger than uppercase characters and [collations]({% link docs/1.3/sql/expressions/collations.md %}) are not supported. | | [`hash(value, ...)`](#hashvalue-) | Returns a `UBIGINT` with the hash of the `value`. Note that this is not a cryptographic hash. | | [`hex(string)`](#hexstring) | Converts the `string` to hexadecimal representation. | | [`ilike_escape(string, like_specifier, escape_character)`](#ilike_escapestring-like_specifier-escape_character) | Returns `true` if the `string` matches the `like_specifier` (see [Pattern Matching]({% link docs/1.3/sql/functions/pattern_matching.md %})) using case-insensitive matching. `escape_character` is used to search for wildcard characters in the `string`. | | [`instr(string, search_string)`](#instrstring-search_string) | Returns location of first occurrence of `search_string` in `string`, counting from 1. Returns 0 if no match found. | | [`lcase(string)`](#lcasestring) | Converts `string` to lower case. | | [`least(arg1, ...)`](#leastarg1-) | Returns the smallest value in lexicographical order. Note that uppercase characters are considered smaller than lowercase characters and [collations]({% link docs/1.3/sql/expressions/collations.md %}) are not supported. | | [`left(string, count)`](#leftstring-count) | Extracts the left-most count characters. | | [`left_grapheme(string, count)`](#left_graphemestring-count) | Extracts the left-most count grapheme clusters. | | [`len(string)`](#lenstring) | Number of characters in `string`. | | [`length(string)`](#lengthstring) | Number of characters in `string`. | | [`length_grapheme(string)`](#length_graphemestring) | Number of grapheme clusters in `string`. | | [`like_escape(string, like_specifier, escape_character)`](#like_escapestring-like_specifier-escape_character) | Returns `true` if the `string` matches the `like_specifier` (see [Pattern Matching]({% link docs/1.3/sql/functions/pattern_matching.md %})) using case-sensitive matching. `escape_character` is used to search for wildcard characters in the `string`. | | [`lower(string)`](#lowerstring) | Converts `string` to lower case. | | [`lpad(string, count, character)`](#lpadstring-count-character) | Pads the `string` with the `character` on the left until it has `count` characters. Truncates the `string` on the right if it has more than `count` characters. | | [`ltrim(string[, characters])`](#ltrimstring-characters) | Removes any occurrences of any of the `characters` from the left side of the `string`. `characters` defaults to `space`. | | [`md5(string)`](#md5string) | Returns the MD5 hash of the `string` as a `VARCHAR`. | | [`md5_number(string)`](#md5_numberstring) | Returns the MD5 hash of the `string` as a `HUGEINT`. | | [`md5_number_lower(string)`](#md5_number_lowerstring) | Returns the lower 64-bit segment of the MD5 hash of the `string` as a `UBIGINT`. | | [`md5_number_upper(string)`](#md5_number_upperstring) | Returns the upper 64-bit segment of the MD5 hash of the `string` as a `UBIGINT`. | | [`nfc_normalize(string)`](#nfc_normalizestring) | Converts `string` to Unicode NFC normalized string. Useful for comparisons and ordering if text data is mixed between NFC normalized and not. | | [`not_ilike_escape(string, like_specifier, escape_character)`](#not_ilike_escapestring-like_specifier-escape_character) | Returns `false` if the `string` matches the `like_specifier` (see [Pattern Matching]({% link docs/1.3/sql/functions/pattern_matching.md %})) using case-insensitive matching. `escape_character` is used to search for wildcard characters in the `string`. | | [`not_like_escape(string, like_specifier, escape_character)`](#not_like_escapestring-like_specifier-escape_character) | Returns `false` if the `string` matches the `like_specifier` (see [Pattern Matching]({% link docs/1.3/sql/functions/pattern_matching.md %})) using case-sensitive matching. `escape_character` is used to search for wildcard characters in the `string`. | | [`ord(string)`](#ordstring) | Returns an `INTEGER` representing the `unicode` codepoint of the first character in the `string`. | | [`parse_dirname(path[, separator])`](#parse_dirnamepath-separator) | Returns the top-level directory name from the given `path`. `separator` options: `system`, `both_slash` (default), `forward_slash`, `backslash`. | | [`parse_dirpath(path[, separator])`](#parse_dirpathpath-separator) | Returns the head of the `path` (the pathname until the last slash) similarly to Python's [`os.path.dirname`](https://docs.python.org/3.7/library/os.path.html#os.path.dirname). `separator` options: `system`, `both_slash` (default), `forward_slash`, `backslash`. | | [`parse_filename(string[, trim_extension][, separator])`](#parse_filenamestring-trim_extension-separator) | Returns the last component of the `path` similarly to Python's [`os.path.basename`](https://docs.python.org/3.7/library/os.path.html#os.path.basename) function. If `trim_extension` is `true`, the file extension will be removed (defaults to `false`). `separator` options: `system`, `both_slash` (default), `forward_slash`, `backslash`. | | [`parse_path(path[, separator])`](#parse_pathpath-separator) | Returns a list of the components (directories and filename) in the `path` similarly to Python's [`pathlib.parts`](https://docs.python.org/3/library/pathlib.html#pathlib.PurePath.parts) function. `separator` options: `system`, `both_slash` (default), `forward_slash`, `backslash`. | | [`position(search_string IN string)`](#positionsearch_string-in-string) | Return location of first occurrence of `search_string` in `string`, counting from 1. Returns 0 if no match found. | | [`prefix(string, search_string)`](#prefixstring-search_string) | Returns `true` if `string` starts with `search_string`. | | [`printf(format, ...)`](#printfformat-) | Formats a `string` using [printf syntax](#printf-syntax). | | [`read_text(source)`](#read_textsource) | Returns the content from `source` (a filename, a list of filenames, or a glob pattern) as a `VARCHAR`. The file content is first validated to be valid UTF-8. If `read_text` attempts to read a file with invalid UTF-8 an error is thrown suggesting to use `read_blob` instead. See the [`read_text` guide]({% link docs/1.3/guides/file_formats/read_file.md %}#read_text) for more details. | | [`regexp_escape(string)`](#regexp_escapestring) | Escapes special patterns to turn `string` into a regular expression similarly to Python's [`re.escape` function](https://docs.python.org/3/library/re.html#re.escape). | | [`regexp_extract(string, regex[, group][, options])`](#regexp_extractstring-regex-group-options) | If `string` contains the `regex` pattern, returns the capturing group specified by optional parameter `group`; otherwise, returns the empty string. The `group` must be a constant value. If no `group` is given, it defaults to 0. A set of optional [regex `options`]({% link docs/1.3/sql/functions/regular_expressions.md %}#options-for-regular-expression-functions) can be set. | | [`regexp_extract(string, regex, name_list[, options])`](#regexp_extractstring-regex-name_list-options) | If `string` contains the `regex` pattern, returns the capturing groups as a struct with corresponding names from `name_list`; otherwise, returns a struct with the same keys and empty strings as values. A set of optional [regex `options`]({% link docs/1.3/sql/functions/regular_expressions.md %}#options-for-regular-expression-functions) can be set. | | [`regexp_extract_all(string, regex[, group][, options])`](#regexp_extract_allstring-regex-group-options) | Finds non-overlapping occurrences of the `regex` in the `string` and returns the corresponding values of the capturing `group`. A set of optional [regex `options`]({% link docs/1.3/sql/functions/regular_expressions.md %}#options-for-regular-expression-functions) can be set. | | [`regexp_full_match(string, regex[, col2])`](#regexp_full_matchstring-regex-col2) | Returns `true` if the entire `string` matches the `regex`. A set of optional [regex `options`]({% link docs/1.3/sql/functions/regular_expressions.md %}#options-for-regular-expression-functions) can be set. | | [`regexp_matches(string, regex[, options])`](#regexp_matchesstring-regex-options) | Returns `true` if `string` contains the `regex`, `false` otherwise. A set of optional [regex `options`]({% link docs/1.3/sql/functions/regular_expressions.md %}#options-for-regular-expression-functions) can be set. | | [`regexp_replace(string, regex, replacement[, options])`](#regexp_replacestring-regex-replacement-options) | If `string` contains the `regex`, replaces the matching part with `replacement`. A set of optional [regex `options`]({% link docs/1.3/sql/functions/regular_expressions.md %}#options-for-regular-expression-functions) can be set. | | [`regexp_split_to_array(string, regex[, options])`](#regexp_split_to_arraystring-regex-options) | Splits the `string` along the `regex`. A set of optional [regex `options`]({% link docs/1.3/sql/functions/regular_expressions.md %}#options-for-regular-expression-functions) can be set. | | [`regexp_split_to_table(string, regex)`](#regexp_split_to_tablestring-regex) | Splits the `string` along the `regex` and returns a row for each part. | | [`repeat(string, count)`](#repeatstring-count) | Repeats the `string` `count` number of times. | | [`replace(string, source, target)`](#replacestring-source-target) | Replaces any occurrences of the `source` with `target` in `string`. | | [`reverse(string)`](#reversestring) | Reverses the `string`. | | [`right(string, count)`](#rightstring-count) | Extract the right-most `count` characters. | | [`right_grapheme(string, count)`](#right_graphemestring-count) | Extracts the right-most `count` grapheme clusters. | | [`rpad(string, count, character)`](#rpadstring-count-character) | Pads the `string` with the `character` on the right until it has `count` characters. Truncates the `string` on the right if it has more than `count` characters. | | [`rtrim(string[, characters])`](#rtrimstring-characters) | Removes any occurrences of any of the `characters` from the right side of the `string`. `characters` defaults to `space`. | | [`sha1(value)`](#sha1value) | Returns a `VARCHAR` with the SHA-1 hash of the `value`. | | [`sha256(value)`](#sha256value) | Returns a `VARCHAR` with the SHA-256 hash of the `value` | | [`split(string, separator)`](#splitstring-separator) | Splits the `string` along the `separator`. | | [`split_part(string, separator, index)`](#split_partstring-separator-index) | Splits the `string` along the `separator` and returns the data at the (1-based) `index` of the list. If the `index` is outside the bounds of the list, return an empty string (to match PostgreSQL's behavior). | | [`starts_with(string, search_string)`](#starts_withstring-search_string) | Returns `true` if `string` begins with `search_string`. | | [`str_split(string, separator)`](#str_splitstring-separator) | Splits the `string` along the `separator`. | | [`str_split_regex(string, regex[, options])`](#str_split_regexstring-regex-options) | Splits the `string` along the `regex`. A set of optional [regex `options`]({% link docs/1.3/sql/functions/regular_expressions.md %}#options-for-regular-expression-functions) can be set. | | [`string_split(string, separator)`](#string_splitstring-separator) | Splits the `string` along the `separator`. | | [`string_split_regex(string, regex[, options])`](#string_split_regexstring-regex-options) | Splits the `string` along the `regex`. A set of optional [regex `options`]({% link docs/1.3/sql/functions/regular_expressions.md %}#options-for-regular-expression-functions) can be set. | | [`string_to_array(string, separator)`](#string_to_arraystring-separator) | Splits the `string` along the `separator`. | | [`strip_accents(string)`](#strip_accentsstring) | Strips accents from `string`. | | [`strlen(string)`](#strlenstring) | Number of bytes in `string`. | | [`strpos(string, search_string)`](#strposstring-search_string) | Returns location of first occurrence of `search_string` in `string`, counting from 1. Returns 0 if no match found. | | [`substr(string, start[, length])`](#substrstring-start-length) | Extracts substring starting from character `start` up to the end of the string. If optional argument `length` is set, extracts a substring of `length` characters instead. Note that a `start` value of `1` refers to the first character of the `string`. | | [`substring(string, start[, length])`](#substringstring-start-length) | Extracts substring starting from character `start` up to the end of the string. If optional argument `length` is set, extracts a substring of `length` characters instead. Note that a `start` value of `1` refers to the first character of the `string`. | | [`substring_grapheme(string, start[, length])`](#substring_graphemestring-start-length) | Extracts substring starting from grapheme clusters `start` up to the end of the string. If optional argument `length` is set, extracts a substring of `length` grapheme clusters instead. Note that a `start` value of `1` refers to the `first` character of the `string`. | | [`suffix(string, search_string)`](#suffixstring-search_string) | Returns `true` if `string` ends with `search_string`. | | [`to_base(number, radix[, min_length])`](#to_basenumber-radix-min_length) | Converts `number` to a string in the given base `radix`, optionally padding with leading zeros to `min_length`. | | [`to_base64(blob)`](#to_base64blob) | Converts a `blob` to a base64 encoded string. | | [`to_binary(string)`](#to_binarystring) | Converts the `string` to binary representation. | | [`to_hex(string)`](#to_hexstring) | Converts the `string` to hexadecimal representation. | | [`translate(string, from, to)`](#translatestring-from-to) | Replaces each character in `string` that matches a character in the `from` set with the corresponding character in the `to` set. If `from` is longer than `to`, occurrences of the extra characters in `from` are deleted. | | [`trim(string[, characters])`](#trimstring-characters) | Removes any occurrences of any of the `characters` from either side of the `string`. `characters` defaults to `space`. | | [`ucase(string)`](#ucasestring) | Converts `string` to upper case. | | [`unbin(value)`](#unbinvalue) | Converts a `value` from binary representation to a blob. | | [`unhex(value)`](#unhexvalue) | Converts a `value` from hexadecimal representation to a blob. | | [`unicode(string)`](#unicodestring) | Returns an `INTEGER` representing the `unicode` codepoint of the first character in the `string`. | | [`upper(string)`](#upperstring) | Converts `string` to upper case. | | [`url_decode(string)`](#url_decodestring) | Decodes a URL from a representation using [Percent-Encoding](https://datatracker.ietf.org/doc/html/rfc3986#section-2.1). | | [`url_encode(string)`](#url_encodestring) | Encodes a URL to a representation using [Percent-Encoding](https://datatracker.ietf.org/doc/html/rfc3986#section-2.1). | #### `string[index]`
| **Description** | Extracts a single character using a (1-based) index. | | **Example** | `'DuckDB'[4]` | | **Result** | `k` | | **Alias** | `array_extract` | #### `string[begin:end]`
| **Description** | Extracts a string using [slice conventions]({% link docs/1.3/sql/functions/list.md %}#slicing) similar to Python. Missing `begin` or `end` arguments are interpreted as the beginning or end of the list respectively. Negative values are accepted. | | **Example** | `'DuckDB'[:4]` | | **Result** | `Duck` | | **Alias** | `array_slice` | #### `string LIKE target`
| **Description** | Returns `true` if the `string` matches the like specifier (see [Pattern Matching]({% link docs/1.3/sql/functions/pattern_matching.md %})). | | **Example** | `'hello' LIKE '%lo'` | | **Result** | `true` | #### `string SIMILAR TO regex`
| **Description** | Returns `true` if the `string` matches the `regex` (see [Pattern Matching]({% link docs/1.3/sql/functions/pattern_matching.md %})). | | **Example** | `'hello' SIMILAR TO 'l+'` | | **Result** | `false` | | **Alias** | `regexp_full_match` | #### `string ^@ search_string`
| **Description** | Returns `true` if `string` begins with `search_string`. | | **Example** | `'abc' ^@ 'a'` | | **Result** | `true` | | **Alias** | `starts_with` | #### `arg1 || arg2`
| **Description** | Concatenates two strings, lists, or blobs. Any `NULL` input results in `NULL`. See also [`concat(arg1, arg2, ...)`](#concatvalue-) and [`list_concat(list1, list2)`]({% link docs/1.3/sql/functions/list.md %}#list_concatlist1-list2). | | **Example 1** | `'Duck' || 'DB'` | | **Result** | `DuckDB` | | **Example 2** | `[1, 2, 3] || [4, 5, 6]` | | **Result** | `[1, 2, 3, 4, 5, 6]` | | **Example 3** | `'\xAA'::BLOB || '\xBB'::BLOB` | | **Result** | `\xAA\xBB` | #### `array_extract(string, index)`
| **Description** | Extracts a single character from a `string` using a (1-based) `index`. | | **Example** | `array_extract('DuckDB', 2)` | | **Result** | `u` | #### `array_slice(list, begin, end)`
| **Description** | Extracts a sublist or substring using [slice conventions]({% link docs/1.3/sql/functions/list.md %}#slicing). Negative values are accepted. | | **Example 1** | `array_slice('DuckDB', 3, 4)` | | **Result** | `ck` | | **Example 2** | `array_slice('DuckDB', 3, NULL)` | | **Result** | `NULL` | | **Example 3** | `array_slice('DuckDB', 0, -3)` | | **Result** | `Duck` | | **Alias** | `list_slice` | #### `ascii(string)`
| **Description** | Returns an integer that represents the Unicode code point of the first character of the `string`. | | **Example** | `ascii('Ω')` | | **Result** | `937` | #### `bar(x, min, max[, width])`
| **Description** | Draws a band whose width is proportional to (`x - min`) and equal to `width` characters when `x` = `max`. `width` defaults to 80. | | **Example** | `bar(5, 0, 20, 10)` | | **Result** | `██▌ ` | #### `base64(blob)`
| **Description** | Converts a `blob` to a base64 encoded string. | | **Example** | `base64('A'::BLOB)` | | **Result** | `QQ==` | | **Alias** | `to_base64` | #### `bin(string)`
| **Description** | Converts the `string` to binary representation. | | **Example** | `bin('Aa')` | | **Result** | `0100000101100001` | | **Alias** | `to_binary` | #### `bit_length(string)`
| **Description** | Number of bits in a `string`. | | **Example** | `bit_length('abc')` | | **Result** | `24` | #### `char_length(string)`
| **Description** | Number of characters in `string`. | | **Example** | `char_length('Hello🦆')` | | **Result** | `6` | | **Aliases** | `character_length`, `len`, `length` | #### `character_length(string)`
| **Description** | Number of characters in `string`. | | **Example** | `character_length('Hello🦆')` | | **Result** | `6` | | **Aliases** | `char_length`, `len`, `length` | #### `chr(code_point)`
| **Description** | Returns a character which is corresponding the ASCII code value or Unicode code point. | | **Example** | `chr(65)` | | **Result** | `A` | #### `concat(value, ...)`
| **Description** | Concatenates multiple strings, lists, or blobs. `NULL` inputs are skipped. See also [operator `||`](#arg1--arg2). | | **Example** | `concat('Hello', ' ', 'World')` | | **Result** | `Hello World` | #### `concat_ws(separator, string, ...)`
| **Description** | Concatenates many strings, separated by `separator`. `NULL` inputs are skipped. | | **Example** | `concat_ws(', ', 'Banana', 'Apple', 'Melon')` | | **Result** | `Banana, Apple, Melon` | #### `contains(string, search_string)`
| **Description** | Returns `true` if `search_string` is found within `string`. | | **Example** | `contains('abc', 'a')` | | **Result** | `true` | #### `ends_with(string, search_string)`
| **Description** | Returns `true` if `string` ends with `search_string`. Note that [collations]({% link docs/1.3/sql/expressions/collations.md %}) are not supported. | | **Example** | `ends_with('abc', 'bc')` | | **Result** | `true` | | **Alias** | `suffix` | #### `format(format, ...)`
| **Description** | Formats a string using the [fmt syntax](#fmt-syntax). | | **Example** | `format('Benchmark "{}" took {} seconds', 'CSV', 42)` | | **Result** | `Benchmark "CSV" took 42 seconds` | #### `formatReadableDecimalSize(integer)`
| **Description** | Converts `integer` to a human-readable representation using units based on powers of 10 (KB, MB, GB, etc.). | | **Example** | `formatReadableDecimalSize(16_000)` | | **Result** | `16.0 kB` | #### `formatReadableSize(integer)`
| **Description** | Converts `integer` to a human-readable representation using units based on powers of 2 (KiB, MiB, GiB, etc.). | | **Example** | `formatReadableSize(16_000)` | | **Result** | `15.6 KiB` | | **Alias** | `format_bytes` | #### `format_bytes(integer)`
| **Description** | Converts `integer` to a human-readable representation using units based on powers of 2 (KiB, MiB, GiB, etc.). | | **Example** | `format_bytes(16_000)` | | **Result** | `15.6 KiB` | | **Alias** | `formatReadableSize` | #### `from_base64(string)`
| **Description** | Converts a base64 encoded `string` to a character string (`BLOB`). | | **Example** | `from_base64('QQ==')` | | **Result** | `A` | #### `from_binary(value)`
| **Description** | Converts a `value` from binary representation to a blob. | | **Example** | `from_binary('0110')` | | **Result** | `\x06` | | **Alias** | `unbin` | #### `from_hex(value)`
| **Description** | Converts a `value` from hexadecimal representation to a blob. | | **Example** | `from_hex('2A')` | | **Result** | `*` | | **Alias** | `unhex` | #### `greatest(arg1, ...)`
| **Description** | Returns the largest value in lexicographical order. Note that lowercase characters are considered larger than uppercase characters and [collations]({% link docs/1.3/sql/expressions/collations.md %}) are not supported. | | **Example 1** | `greatest(42, 84)` | | **Result** | `84` | | **Example 2** | `greatest('abc', 'bcd', 'cde', 'EFG')` | | **Result** | `cde` | #### `hash(value, ...)`
| **Description** | Returns a `UBIGINT` with the hash of the `value`. Note that this is not a cryptographic hash. | | **Example** | `hash('🦆')` | | **Result** | `4164431626903154684` | #### `hex(string)`
| **Description** | Converts the `string` to hexadecimal representation. | | **Example** | `hex('Hello')` | | **Result** | `48656C6C6F` | | **Alias** | `to_hex` | #### `ilike_escape(string, like_specifier, escape_character)`
| **Description** | Returns `true` if the `string` matches the `like_specifier` (see [Pattern Matching]({% link docs/1.3/sql/functions/pattern_matching.md %})) using case-insensitive matching. `escape_character` is used to search for wildcard characters in the `string`. | | **Example** | `ilike_escape('A%c', 'a$%C', '$')` | | **Result** | `true` | #### `instr(string, search_string)`
| **Description** | Returns location of first occurrence of `search_string` in `string`, counting from 1. Returns 0 if no match found. | | **Example** | `instr('test test', 'es')` | | **Result** | `2` | | **Aliases** | `position`, `strpos` | #### `lcase(string)`
| **Description** | Converts `string` to lower case. | | **Example** | `lcase('Hello')` | | **Result** | `hello` | | **Alias** | `lower` | #### `least(arg1, ...)`
| **Description** | Returns the smallest value in lexicographical order. Note that uppercase characters are considered smaller than lowercase characters and [collations]({% link docs/1.3/sql/expressions/collations.md %}) are not supported. | | **Example 1** | `least(42, 84)` | | **Result** | `42` | | **Example 2** | `least('abc', 'bcd', 'cde', 'EFG')` | | **Result** | `EFG` | #### `left(string, count)`
| **Description** | Extracts the left-most count characters. | | **Example** | `left('Hello🦆', 2)` | | **Result** | `He` | #### `left_grapheme(string, count)`
| **Description** | Extracts the left-most count grapheme clusters. | | **Example** | `left_grapheme('🤦🏼‍♂️🤦🏽‍♀️', 1)` | | **Result** | `🤦🏼‍♂️` | #### `len(string)`
| **Description** | Number of characters in `string`. | | **Example** | `length('Hello🦆')` | | **Result** | `6` | | **Aliases** | `char_length`, `character_length`, `length` | #### `length(string)`
| **Description** | Number of characters in `string`. | | **Example** | `length('Hello🦆')` | | **Result** | `6` | | **Aliases** | `char_length`, `character_length`, `len` | #### `length_grapheme(string)`
| **Description** | Number of grapheme clusters in `string`. | | **Example** | `length_grapheme('🤦🏼‍♂️🤦🏽‍♀️')` | | **Result** | `2` | #### `like_escape(string, like_specifier, escape_character)`
| **Description** | Returns `true` if the `string` matches the `like_specifier` (see [Pattern Matching]({% link docs/1.3/sql/functions/pattern_matching.md %})) using case-sensitive matching. `escape_character` is used to search for wildcard characters in the `string`. | | **Example** | `like_escape('a%c', 'a$%c', '$')` | | **Result** | `true` | #### `lower(string)`
| **Description** | Converts `string` to lower case. | | **Example** | `lower('Hello')` | | **Result** | `hello` | | **Alias** | `lcase` | #### `lpad(string, count, character)`
| **Description** | Pads the `string` with the `character` on the left until it has `count` characters. Truncates the `string` on the right if it has more than `count` characters. | | **Example** | `lpad('hello', 8, '>')` | | **Result** | `>>>hello` | #### `ltrim(string[, characters])`
| **Description** | Removes any occurrences of any of the `characters` from the left side of the `string`. `characters` defaults to `space`. | | **Example 1** | `ltrim(' test ')` | | **Result** | `test ` | | **Example 2** | `ltrim('>>>>test<<', '><')` | | **Result** | `test<<` | #### `md5(string)`
| **Description** | Returns the MD5 hash of the `string` as a `VARCHAR`. | | **Example** | `md5('abc')` | | **Result** | `900150983cd24fb0d6963f7d28e17f72` | #### `md5_number(string)`
| **Description** | Returns the MD5 hash of the `string` as a `HUGEINT`. | | **Example** | `md5_number('abc')` | | **Result** | `152195979970564155685860391459828531600` | #### `md5_number_lower(string)`
| **Description** | Returns the lower 64-bit segment of the MD5 hash of the `string` as a `UBIGINT`. | | **Example** | `md5_number_lower('abc')` | | **Result** | `8250560606382298838` | #### `md5_number_upper(string)`
| **Description** | Returns the upper 64-bit segment of the MD5 hash of the `string` as a `UBIGINT`. | | **Example** | `md5_number_upper('abc')` | | **Result** | `12704604231530709392` | #### `nfc_normalize(string)`
| **Description** | Converts `string` to Unicode NFC normalized string. Useful for comparisons and ordering if text data is mixed between NFC normalized and not. | | **Example** | `nfc_normalize('ardèch')` | | **Result** | `ardèch` | #### `not_ilike_escape(string, like_specifier, escape_character)`
| **Description** | Returns `false` if the `string` matches the `like_specifier` (see [Pattern Matching]({% link docs/1.3/sql/functions/pattern_matching.md %})) using case-insensitive matching. `escape_character` is used to search for wildcard characters in the `string`. | | **Example** | `not_ilike_escape('A%c', 'a$%C', '$')` | | **Result** | `false` | #### `not_like_escape(string, like_specifier, escape_character)`
| **Description** | Returns `false` if the `string` matches the `like_specifier` (see [Pattern Matching]({% link docs/1.3/sql/functions/pattern_matching.md %})) using case-sensitive matching. `escape_character` is used to search for wildcard characters in the `string`. | | **Example** | `not_like_escape('a%c', 'a$%c', '$')` | | **Result** | `false` | #### `ord(string)`
| **Description** | Returns an `INTEGER` representing the `unicode` codepoint of the first character in the `string`. | | **Example** | `[unicode('âbcd'), unicode('â'), unicode(''), unicode(NULL)]` | | **Result** | `[226, 226, -1, NULL]` | | **Alias** | `unicode` | #### `parse_dirname(path[, separator])`
| **Description** | Returns the top-level directory name from the given `path`. `separator` options: `system`, `both_slash` (default), `forward_slash`, `backslash`. | | **Example** | `parse_dirname('path/to/file.csv', 'system')` | | **Result** | `path` | #### `parse_dirpath(path[, separator])`
| **Description** | Returns the head of the `path` (the pathname until the last slash) similarly to Python's [`os.path.dirname`](https://docs.python.org/3.7/library/os.path.html#os.path.dirname). `separator` options: `system`, `both_slash` (default), `forward_slash`, `backslash`. | | **Example** | `parse_dirpath('path/to/file.csv', 'forward_slash')` | | **Result** | `path/to` | #### `parse_filename(string[, trim_extension][, separator])`
| **Description** | Returns the last component of the `path` similarly to Python's [`os.path.basename`](https://docs.python.org/3.7/library/os.path.html#os.path.basename) function. If `trim_extension` is `true`, the file extension will be removed (defaults to `false`). `separator` options: `system`, `both_slash` (default), `forward_slash`, `backslash`. | | **Example** | `parse_filename('path/to/file.csv', true, 'forward_slash')` | | **Result** | `file` | #### `parse_path(path[, separator])`
| **Description** | Returns a list of the components (directories and filename) in the `path` similarly to Python's [`pathlib.parts`](https://docs.python.org/3/library/pathlib.html#pathlib.PurePath.parts) function. `separator` options: `system`, `both_slash` (default), `forward_slash`, `backslash`. | | **Example** | `parse_path('path/to/file.csv', 'system')` | | **Result** | `[path, to, file.csv]` | #### `position(search_string IN string)`
| **Description** | Return location of first occurrence of `search_string` in `string`, counting from 1. Returns 0 if no match found. | | **Example** | `position('b' IN 'abc')` | | **Result** | `2` | | **Aliases** | `instr`, `strpos` | #### `prefix(string, search_string)`
| **Description** | Returns `true` if `string` starts with `search_string`. | | **Example** | `prefix('abc', 'ab')` | | **Result** | `true` | #### `printf(format, ...)`
| **Description** | Formats a `string` using [printf syntax](#printf-syntax). | | **Example** | `printf('Benchmark "%s" took %d seconds', 'CSV', 42)` | | **Result** | `Benchmark "CSV" took 42 seconds` | #### `read_text(source)`
| **Description** | Returns the content from `source` (a filename, a list of filenames, or a glob pattern) as a `VARCHAR`. The file content is first validated to be valid UTF-8. If `read_text` attempts to read a file with invalid UTF-8 an error is thrown suggesting to use `read_blob` instead. See the [`read_text` guide]({% link docs/1.3/guides/file_formats/read_file.md %}#read_text) for more details. | | **Example** | `read_text('hello.txt')` | | **Result** | `hello\n` | #### `regexp_escape(string)`
| **Description** | Escapes special patterns to turn `string` into a regular expression similarly to Python's [`re.escape` function](https://docs.python.org/3/library/re.html#re.escape). | | **Example** | `regexp_escape('https://duckdb.org')` | | **Result** | `https\:\/\/duckdb\.org` | #### `regexp_extract(string, regex[, group][, options])`
| **Description** | If `string` contains the `regex` pattern, returns the capturing group specified by optional parameter `group`; otherwise, returns the empty string. The `group` must be a constant value. If no `group` is given, it defaults to 0. A set of optional [regex `options`]({% link docs/1.3/sql/functions/regular_expressions.md %}#options-for-regular-expression-functions) can be set. | | **Example** | `regexp_extract('ABC', '([a-z])(b)', 1, 'i')` | | **Result** | `A` | #### `regexp_extract(string, regex, name_list[, options])`
| **Description** | If `string` contains the `regex` pattern, returns the capturing groups as a struct with corresponding names from `name_list`; otherwise, returns a struct with the same keys and empty strings as values. A set of optional [regex `options`]({% link docs/1.3/sql/functions/regular_expressions.md %}#options-for-regular-expression-functions) can be set. | | **Example** | `regexp_extract('John Doe', '([a-z]+) ([a-z]+)', ['first_name', 'last_name'], 'i')` | | **Result** | `{'first_name': John, 'last_name': Doe}` | #### `regexp_extract_all(string, regex[, group][, options])`
| **Description** | Finds non-overlapping occurrences of the `regex` in the `string` and returns the corresponding values of the capturing `group`. A set of optional [regex `options`]({% link docs/1.3/sql/functions/regular_expressions.md %}#options-for-regular-expression-functions) can be set. | | **Example** | `regexp_extract_all('Peter: 33, Paul:14', '(\w+):\s*(\d+)', 2)` | | **Result** | `[33, 14]` | #### `regexp_full_match(string, regex[, col2])`
| **Description** | Returns `true` if the entire `string` matches the `regex`. A set of optional [regex `options`]({% link docs/1.3/sql/functions/regular_expressions.md %}#options-for-regular-expression-functions) can be set. | | **Example** | `regexp_full_match('anabanana', '(an)*')` | | **Result** | `false` | #### `regexp_matches(string, regex[, options])`
| **Description** | Returns `true` if `string` contains the `regex`, `false` otherwise. A set of optional [regex `options`]({% link docs/1.3/sql/functions/regular_expressions.md %}#options-for-regular-expression-functions) can be set. | | **Example** | `regexp_matches('anabanana', '(an)*')` | | **Result** | `true` | #### `regexp_replace(string, regex, replacement[, options])`
| **Description** | If `string` contains the `regex`, replaces the matching part with `replacement`. A set of optional [regex `options`]({% link docs/1.3/sql/functions/regular_expressions.md %}#options-for-regular-expression-functions) can be set. | | **Example** | `regexp_replace('hello', '[lo]', '-')` | | **Result** | `he-lo` | #### `regexp_split_to_array(string, regex[, options])`
| **Description** | Splits the `string` along the `regex`. A set of optional [regex `options`]({% link docs/1.3/sql/functions/regular_expressions.md %}#options-for-regular-expression-functions) can be set. | | **Example** | `regexp_split_to_array('hello world; 42', ';? ')` | | **Result** | `[hello, world, 42]` | | **Aliases** | `str_split_regex`, `string_split_regex` | #### `regexp_split_to_table(string, regex)`
| **Description** | Splits the `string` along the `regex` and returns a row for each part. | | **Example** | `regexp_split_to_table('hello world; 42', ';? ')` | | **Result** | Multiple rows: `'hello'`, `'world'`, `'42'` | #### `repeat(string, count)`
| **Description** | Repeats the `string` `count` number of times. | | **Example** | `repeat('A', 5)` | | **Result** | `AAAAA` | #### `replace(string, source, target)`
| **Description** | Replaces any occurrences of the `source` with `target` in `string`. | | **Example** | `replace('hello', 'l', '-')` | | **Result** | `he--o` | #### `reverse(string)`
| **Description** | Reverses the `string`. | | **Example** | `reverse('hello')` | | **Result** | `olleh` | #### `right(string, count)`
| **Description** | Extract the right-most `count` characters. | | **Example** | `right('Hello🦆', 3)` | | **Result** | `lo🦆` | #### `right_grapheme(string, count)`
| **Description** | Extracts the right-most `count` grapheme clusters. | | **Example** | `right_grapheme('🤦🏼‍♂️🤦🏽‍♀️', 1)` | | **Result** | `🤦🏽‍♀️` | #### `rpad(string, count, character)`
| **Description** | Pads the `string` with the `character` on the right until it has `count` characters. Truncates the `string` on the right if it has more than `count` characters. | | **Example** | `rpad('hello', 10, '<')` | | **Result** | `hello<<<<<` | #### `rtrim(string[, characters])`
| **Description** | Removes any occurrences of any of the `characters` from the right side of the `string`. `characters` defaults to `space`. | | **Example 1** | `rtrim(' test ')` | | **Result** | ` test` | | **Example 2** | `rtrim('>>>>test<<', '><')` | | **Result** | `>>>>test` | #### `sha1(value)`
| **Description** | Returns a `VARCHAR` with the SHA-1 hash of the `value`. | | **Example** | `sha1('🦆')` | | **Result** | `949bf843dc338be348fb9525d1eb535d31241d76` | #### `sha256(value)`
| **Description** | Returns a `VARCHAR` with the SHA-256 hash of the `value` | | **Example** | `sha256('🦆')` | | **Result** | `d7a5c5e0d1d94c32218539e7e47d4ba9c3c7b77d61332fb60d633dde89e473fb` | #### `split(string, separator)`
| **Description** | Splits the `string` along the `separator`. | | **Example** | `split('hello-world', '-')` | | **Result** | `[hello, world]` | | **Aliases** | `str_split`, `string_split`, `string_to_array` | #### `split_part(string, separator, index)`
| **Description** | Splits the `string` along the `separator` and returns the data at the (1-based) `index` of the list. If the `index` is outside the bounds of the list, return an empty string (to match PostgreSQL's behavior). | | **Example** | `split_part('a;b;c', ';', 2)` | | **Result** | `b` | #### `starts_with(string, search_string)`
| **Description** | Returns `true` if `string` begins with `search_string`. | | **Example** | `starts_with('abc', 'a')` | | **Result** | `true` | | **Alias** | `^@` | #### `str_split(string, separator)`
| **Description** | Splits the `string` along the `separator`. | | **Example** | `str_split('hello-world', '-')` | | **Result** | `[hello, world]` | | **Aliases** | `split`, `string_split`, `string_to_array` | #### `str_split_regex(string, regex[, options])`
| **Description** | Splits the `string` along the `regex`. A set of optional [regex `options`]({% link docs/1.3/sql/functions/regular_expressions.md %}#options-for-regular-expression-functions) can be set. | | **Example** | `str_split_regex('hello world; 42', ';? ')` | | **Result** | `[hello, world, 42]` | | **Aliases** | `regexp_split_to_array`, `string_split_regex` | #### `string_split(string, separator)`
| **Description** | Splits the `string` along the `separator`. | | **Example** | `string_split('hello-world', '-')` | | **Result** | `[hello, world]` | | **Aliases** | `split`, `str_split`, `string_to_array` | #### `string_split_regex(string, regex[, options])`
| **Description** | Splits the `string` along the `regex`. A set of optional [regex `options`]({% link docs/1.3/sql/functions/regular_expressions.md %}#options-for-regular-expression-functions) can be set. | | **Example** | `string_split_regex('hello world; 42', ';? ')` | | **Result** | `[hello, world, 42]` | | **Aliases** | `regexp_split_to_array`, `str_split_regex` | #### `string_to_array(string, separator)`
| **Description** | Splits the `string` along the `separator`. | | **Example** | `string_to_array('hello-world', '-')` | | **Result** | `[hello, world]` | | **Aliases** | `split`, `str_split`, `string_split` | #### `strip_accents(string)`
| **Description** | Strips accents from `string`. | | **Example** | `strip_accents('mühleisen')` | | **Result** | `muhleisen` | #### `strlen(string)`
| **Description** | Number of bytes in `string`. | | **Example** | `strlen('🦆')` | | **Result** | `4` | #### `strpos(string, search_string)`
| **Description** | Returns location of first occurrence of `search_string` in `string`, counting from 1. Returns 0 if no match found. | | **Example** | `strpos('test test', 'es')` | | **Result** | `2` | | **Aliases** | `instr`, `position` | #### `substr(string, start[, length])`
| **Description** | Extracts substring starting from character `start` up to the end of the string. If optional argument `length` is set, extracts a substring of `length` characters instead. Note that a `start` value of `1` refers to the first character of the `string`. | | **Example 1** | `substring('Hello', 2)` | | **Result** | `ello` | | **Example 2** | `substring('Hello', 2, 2)` | | **Result** | `el` | | **Alias** | `substring` | #### `substring(string, start[, length])`
| **Description** | Extracts substring starting from character `start` up to the end of the string. If optional argument `length` is set, extracts a substring of `length` characters instead. Note that a `start` value of `1` refers to the first character of the `string`. | | **Example 1** | `substring('Hello', 2)` | | **Result** | `ello` | | **Example 2** | `substring('Hello', 2, 2)` | | **Result** | `el` | | **Alias** | `substr` | #### `substring_grapheme(string, start[, length])`
| **Description** | Extracts substring starting from grapheme clusters `start` up to the end of the string. If optional argument `length` is set, extracts a substring of `length` grapheme clusters instead. Note that a `start` value of `1` refers to the `first` character of the `string`. | | **Example 1** | `substring_grapheme('🦆🤦🏼‍♂️🤦🏽‍♀️🦆', 3)` | | **Result** | `🤦🏽‍♀️🦆` | | **Example 2** | `substring_grapheme('🦆🤦🏼‍♂️🤦🏽‍♀️🦆', 3, 2)` | | **Result** | `🤦🏽‍♀️🦆` | #### `suffix(string, search_string)`
| **Description** | Returns `true` if `string` ends with `search_string`. | | **Example** | `suffix('abc', 'bc')` | | **Result** | `true` | | **Alias** | `ends_with` | #### `to_base(number, radix[, min_length])`
| **Description** | Converts `number` to a string in the given base `radix`, optionally padding with leading zeros to `min_length`. | | **Example** | `to_base(42, 16, 5)` | | **Result** | `0002A` | #### `to_base64(blob)`
| **Description** | Converts a `blob` to a base64 encoded string. | | **Example** | `to_base64('A'::BLOB)` | | **Result** | `QQ==` | | **Alias** | `base64` | #### `to_binary(string)`
| **Description** | Converts the `string` to binary representation. | | **Example** | `to_binary('Aa')` | | **Result** | `0100000101100001` | | **Alias** | `bin` | #### `to_hex(string)`
| **Description** | Converts the `string` to hexadecimal representation. | | **Example** | `to_hex('Hello')` | | **Result** | `48656C6C6F` | | **Alias** | `hex` | #### `translate(string, from, to)`
| **Description** | Replaces each character in `string` that matches a character in the `from` set with the corresponding character in the `to` set. If `from` is longer than `to`, occurrences of the extra characters in `from` are deleted. | | **Example** | `translate('12345', '143', 'ax')` | | **Result** | `a2x5` | #### `trim(string[, characters])`
| **Description** | Removes any occurrences of any of the `characters` from either side of the `string`. `characters` defaults to `space`. | | **Example 1** | `trim(' test ')` | | **Result** | `test` | | **Example 2** | `trim('>>>>test<<', '><')` | | **Result** | `test` | #### `ucase(string)`
| **Description** | Converts `string` to upper case. | | **Example** | `ucase('Hello')` | | **Result** | `HELLO` | | **Alias** | `upper` | #### `unbin(value)`
| **Description** | Converts a `value` from binary representation to a blob. | | **Example** | `unbin('0110')` | | **Result** | `\x06` | | **Alias** | `from_binary` | #### `unhex(value)`
| **Description** | Converts a `value` from hexadecimal representation to a blob. | | **Example** | `unhex('2A')` | | **Result** | `*` | | **Alias** | `from_hex` | #### `unicode(string)`
| **Description** | Returns an `INTEGER` representing the `unicode` codepoint of the first character in the `string`. | | **Example** | `[unicode('âbcd'), unicode('â'), unicode(''), unicode(NULL)]` | | **Result** | `[226, 226, -1, NULL]` | | **Alias** | `ord` | #### `upper(string)`
| **Description** | Converts `string` to upper case. | | **Example** | `upper('Hello')` | | **Result** | `HELLO` | | **Alias** | `ucase` | #### `url_decode(string)`
| **Description** | Decodes a URL from a representation using [Percent-Encoding](https://datatracker.ietf.org/doc/html/rfc3986#section-2.1). | | **Example** | `url_decode('https%3A%2F%2Fduckdb.org%2Fwhy_duckdb%23portable')` | | **Result** | `https://duckdb.org/why_duckdb#portable` | #### `url_encode(string)`
| **Description** | Encodes a URL to a representation using [Percent-Encoding](https://datatracker.ietf.org/doc/html/rfc3986#section-2.1). | | **Example** | `url_encode('this string has/ special+ characters>')` | | **Result** | `this%20string%20has%2F%20special%2B%20characters%3E` | ## Text Similarity Functions These functions are used to measure the similarity of two strings using various [similarity measures](https://en.wikipedia.org/wiki/Similarity_measure). | Name | Description | |:--|:-------| | [`damerau_levenshtein(s1, s2)`](#damerau_levenshteins1-s2) | Extension of Levenshtein distance to also include transposition of adjacent characters as an allowed edit operation. In other words, the minimum number of edit operations (insertions, deletions, substitutions or transpositions) required to change one string to another. Characters of different cases (e.g., `a` and `A`) are considered different. | | [`editdist3(s1, s2)`](#editdist3s1-s2) | The minimum number of single-character edits (insertions, deletions or substitutions) required to change one string to the other. Characters of different cases (e.g., `a` and `A`) are considered different. | | [`hamming(s1, s2)`](#hammings1-s2) | The Hamming distance between to strings, i.e., the number of positions with different characters for two strings of equal length. Strings must be of equal length. Characters of different cases (e.g., `a` and `A`) are considered different. | | [`jaccard(s1, s2)`](#jaccards1-s2) | The Jaccard similarity between two strings. Characters of different cases (e.g., `a` and `A`) are considered different. Returns a number between 0 and 1. | | [`jaro_similarity(s1, s2[, score_cutoff])`](#jaro_similaritys1-s2-score_cutoff) | The Jaro similarity between two strings. Characters of different cases (e.g., `a` and `A`) are considered different. Returns a number between 0 and 1. For similarity < `score_cutoff`, 0 is returned instead. `score_cutoff` defaults to 0. | | [`jaro_winkler_similarity(s1, s2[, score_cutoff])`](#jaro_winkler_similaritys1-s2-score_cutoff) | The Jaro-Winkler similarity between two strings. Characters of different cases (e.g., `a` and `A`) are considered different. Returns a number between 0 and 1. For similarity < `score_cutoff`, 0 is returned instead. `score_cutoff` defaults to 0. | | [`levenshtein(s1, s2)`](#levenshteins1-s2) | The minimum number of single-character edits (insertions, deletions or substitutions) required to change one string to the other. Characters of different cases (e.g., `a` and `A`) are considered different. | | [`mismatches(s1, s2)`](#mismatchess1-s2) | The Hamming distance between to strings, i.e., the number of positions with different characters for two strings of equal length. Strings must be of equal length. Characters of different cases (e.g., `a` and `A`) are considered different. | #### `damerau_levenshtein(s1, s2)`
| **Description** | Extension of Levenshtein distance to also include transposition of adjacent characters as an allowed edit operation. In other words, the minimum number of edit operations (insertions, deletions, substitutions or transpositions) required to change one string to another. Characters of different cases (e.g., `a` and `A`) are considered different. | | **Example** | `damerau_levenshtein('duckdb', 'udckbd')` | | **Result** | `2` | #### `editdist3(s1, s2)`
| **Description** | The minimum number of single-character edits (insertions, deletions or substitutions) required to change one string to the other. Characters of different cases (e.g., `a` and `A`) are considered different. | | **Example** | `editdist3('duck', 'db')` | | **Result** | `3` | | **Alias** | `levenshtein` | #### `hamming(s1, s2)`
| **Description** | The Hamming distance between to strings, i.e., the number of positions with different characters for two strings of equal length. Strings must be of equal length. Characters of different cases (e.g., `a` and `A`) are considered different. | | **Example** | `hamming('duck', 'luck')` | | **Result** | `1` | | **Alias** | `mismatches` | #### `jaccard(s1, s2)`
| **Description** | The Jaccard similarity between two strings. Characters of different cases (e.g., `a` and `A`) are considered different. Returns a number between 0 and 1. | | **Example** | `jaccard('duck', 'luck')` | | **Result** | `0.6` | #### `jaro_similarity(s1, s2[, score_cutoff])`
| **Description** | The Jaro similarity between two strings. Characters of different cases (e.g., `a` and `A`) are considered different. Returns a number between 0 and 1. For similarity < `score_cutoff`, 0 is returned instead. `score_cutoff` defaults to 0. | | **Example** | `jaro_similarity('duck', 'duckdb')` | | **Result** | `0.8888888888888888` | #### `jaro_winkler_similarity(s1, s2[, score_cutoff])`
| **Description** | The Jaro-Winkler similarity between two strings. Characters of different cases (e.g., `a` and `A`) are considered different. Returns a number between 0 and 1. For similarity < `score_cutoff`, 0 is returned instead. `score_cutoff` defaults to 0. | | **Example** | `jaro_winkler_similarity('duck', 'duckdb')` | | **Result** | `0.9333333333333333` | #### `levenshtein(s1, s2)`
| **Description** | The minimum number of single-character edits (insertions, deletions or substitutions) required to change one string to the other. Characters of different cases (e.g., `a` and `A`) are considered different. | | **Example** | `levenshtein('duck', 'db')` | | **Result** | `3` | | **Alias** | `editdist3` | #### `mismatches(s1, s2)`
| **Description** | The Hamming distance between to strings, i.e., the number of positions with different characters for two strings of equal length. Strings must be of equal length. Characters of different cases (e.g., `a` and `A`) are considered different. | | **Example** | `mismatches('duck', 'luck')` | | **Result** | `1` | | **Alias** | `hamming` | ## Formatters ### `fmt` Syntax The `format(format, parameters...)` function formats strings, loosely following the syntax of the [{fmt} open-source formatting library](https://fmt.dev/latest/syntax/). Format without additional parameters: ```sql SELECT format('Hello world'); -- Hello world ``` Format a string using {}: ```sql SELECT format('The answer is {}', 42); -- The answer is 42 ``` Format a string using positional arguments: ```sql SELECT format('I''d rather be {1} than {0}.', 'right', 'happy'); -- I'd rather be happy than right. ``` #### Format Specifiers | Specifier | Description | Example | |:-|:------|:---| | `{:d}` | integer | `654321` | | `{:E}` | scientific notation | `3.141593E+00` | | `{:f}` | float | `4.560000` | | `{:o}` | octal | `2375761` | | `{:s}` | string | `asd` | | `{:x}` | hexadecimal | `9fbf1` | | `{:tX}` | integer, `X` is the thousand separator | `654 321` | #### Formatting Types Integers: ```sql SELECT format('{} + {} = {}', 3, 5, 3 + 5); -- 3 + 5 = 8 ``` Booleans: ```sql SELECT format('{} != {}', true, false); -- true != false ``` Format datetime values: ```sql SELECT format('{}', DATE '1992-01-01'); -- 1992-01-01 SELECT format('{}', TIME '12:01:00'); -- 12:01:00 SELECT format('{}', TIMESTAMP '1992-01-01 12:01:00'); -- 1992-01-01 12:01:00 ``` Format BLOB: ```sql SELECT format('{}', BLOB '\x00hello'); -- \x00hello ``` Pad integers with 0s: ```sql SELECT format('{:04d}', 33); -- 0033 ``` Create timestamps from integers: ```sql SELECT format('{:02d}:{:02d}:{:02d} {}', 12, 3, 16, 'AM'); -- 12:03:16 AM ``` Convert to hexadecimal: ```sql SELECT format('{:x}', 123_456_789); -- 75bcd15 ``` Convert to binary: ```sql SELECT format('{:b}', 123_456_789); -- 111010110111100110100010101 ``` #### Print Numbers with Thousand Separators Integers: ```sql SELECT format('{:,}', 123_456_789); -- 123,456,789 SELECT format('{:t.}', 123_456_789); -- 123.456.789 SELECT format('{:''}', 123_456_789); -- 123'456'789 SELECT format('{:_}', 123_456_789); -- 123_456_789 SELECT format('{:t }', 123_456_789); -- 123 456 789 SELECT format('{:tX}', 123_456_789); -- 123X456X789 ``` Float, double and decimal: ```sql SELECT format('{:,f}', 123456.789); -- 123,456.78900 SELECT format('{:,.2f}', 123456.789); -- 123,456.79 SELECT format('{:t..2f}', 123456.789); -- 123.456,79 ``` ### `printf` Syntax The `printf(format, parameters...)` function formats strings using the [`printf` syntax](https://cplusplus.com/reference/cstdio/printf/). Format without additional parameters: ```sql SELECT printf('Hello world'); ``` ```text Hello world ``` Format a string using arguments in a given order: ```sql SELECT printf('The answer to %s is %d', 'life', 42); ``` ```text The answer to life is 42 ``` Format a string using positional arguments `%position$formatter`, e.g., the second parameter as a string is encoded as `%2$s`: ```sql SELECT printf('I''d rather be %2$s than %1$s.', 'right', 'happy'); ``` ```text I'd rather be happy than right. ``` #### Format Specifiers | Specifier | Description | Example | |:-|:------|:---| | `%c` | character code to character | `a` | | `%d` | integer | `654321` | | `%Xd` | integer with thousand seperarator `X` from `,`, `.`, `''`, `_` | `654_321` | | `%E` | scientific notation | `3.141593E+00` | | `%f` | float | `4.560000` | | `%hd` | integer | `654321` | | `%hhd` | integer | `654321` | | `%lld` | integer | `654321` | | `%o` | octal | `2375761` | | `%s` | string | `asd` | | `%x` | hexadecimal | `9fbf1` | #### Formatting Types Integers: ```sql SELECT printf('%d + %d = %d', 3, 5, 3 + 5); -- 3 + 5 = 8 ``` Booleans: ```sql SELECT printf('%s != %s', true, false); -- true != false ``` Format datetime values: ```sql SELECT printf('%s', DATE '1992-01-01'); -- 1992-01-01 SELECT printf('%s', TIME '12:01:00'); -- 12:01:00 SELECT printf('%s', TIMESTAMP '1992-01-01 12:01:00'); -- 1992-01-01 12:01:00 ``` Format BLOB: ```sql SELECT printf('%s', BLOB '\x00hello'); -- \x00hello ``` Pad integers with 0s: ```sql SELECT printf('%04d', 33); -- 0033 ``` Create timestamps from integers: ```sql SELECT printf('%02d:%02d:%02d %s', 12, 3, 16, 'AM'); -- 12:03:16 AM ``` Convert to hexadecimal: ```sql SELECT printf('%x', 123_456_789); -- 75bcd15 ``` Convert to binary: ```sql SELECT printf('%b', 123_456_789); -- 111010110111100110100010101 ``` #### Thousand Separators Integers: ```sql SELECT printf('%,d', 123_456_789); -- 123,456,789 SELECT printf('%.d', 123_456_789); -- 123.456.789 SELECT printf('%''d', 123_456_789); -- 123'456'789 SELECT printf('%_d', 123_456_789); -- 123_456_789 ``` Float, double and decimal: ```sql SELECT printf('%,f', 123456.789); -- 123,456.789000 SELECT printf('%,.2f', 123456.789); -- 123,456.79 ```