# String Functions

|||
|---|---
| **JEP** | 14
| **Author** | Maxime Labelle, Chris Armstrong (GorillaStack), Richard Gibson
| **Created**| 13-October-2022
| **SemVer** | MINOR
| **Status**| accepted

## Abstract

This JEP introduces a core set of useful string manipulation functions. These functions are modeled on functions found in popular programming languages such as JavaScript and Python.

## Specification

Some string manipulation functions bring the new concept of _optional arguments_ to JMESPath functions. The specification paragraph on function evaluation must therefore be changed as follows, with the changes highlighted in **bold**:

_Functions can ~~either~~ have a specific arity, **a range of valid – minimum and maximum – number of arguments** or be variadic with a minimum number of arguments. If a function-expression is encountered where the arity does not match or the minimum number of arguments for a variadic function is not provided, then implementations must indicate to the caller that an invalid-arity error occurred. How and when this error is raised is implementation specific._

Some functions accept number arguments which are further constrained to integers or even non-negative integers. This JEP specifies a new error type `invalid-value` by updating the paragraph on type constraints from the specification like so:

_Each function signature declares the types of its input parameters. If any type constraints are not met, implementations must indicate that an `invalid-type` error occurred. **If a function parameter accepts values constrained to a specific subset of a type and those constraints are not met, implementations must report that an `invalid-value` error occurred. How and when those errors are raised is implementation specific.**_

### find_first

```
int find_first(string $subject, string $sub[, int $start[, int $end]])
```

Given the `$subject` string, `find_first()` returns the zero-based index of the first occurrence where the `$sub` substring appears in `$subject` or `null` if it does not appear. If either the `$subject` or the `$sub` argument is an empty string, `find_first()` returns `null`.

The optional `$start` and `$end` parameters restrict the search to the slice `[$start:$end]` of `$subject`, within which `$sub` must be found.

- If `$start` is omitted, it defaults to `0` (which is the start of the `$subject` string).
- If `$end` is omitted, it defaults to `length($subject)` (which is past the end of the `$subject` string).

If not omitted, the `$start` and `$end` arguments are expected to be integers. Otherwise, an error MUST be raised.

Contrary to similar functions found in most popular programming languages, the `find_first()` function does not return `-1` if no occurrence of the substring can be found. Instead, it returns `null` for consistency with how JMESPath behaves.

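
The lookup rules above can be sketched in Python. This is a minimal illustration rather than a reference implementation. It assumes that `[$start:$end]` follows Python slice semantics (negative offsets count from the end, out-of-range offsets are clamped), which is consistent with the examples in this section, and it uses `ValueError` as a stand-in for the error that MUST be raised.

```python
from typing import Optional


def find_first(subject: str, sub: str,
               start: int = 0, end: Optional[int] = None) -> Optional[int]:
    """Illustrative sketch of find_first(); None plays the role of JMESPath null."""
    # Assumption: ValueError stands in for the error the JEP requires, and
    # only Python ints (not bools or floats) are treated as integers.
    for name, value in (("$start", start), ("$end", end)):
        if value is not None and (isinstance(value, bool) or not isinstance(value, int)):
            raise ValueError(f"{name} must be an integer")
    # An empty $subject or $sub never matches.
    if subject == "" or sub == "":
        return None
    # str.find() applies slice-style bounds and reports a miss as -1.
    index = subject.find(sub, start, end)
    return None if index == -1 else index
```

Mapping the `-1` sentinel to `None` in a single place keeps the "no match" case aligned with JMESPath's `null`.
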
### Examples

| Given | Expression | Result
|---|---|---
| `"subject string"` | `` find_first(@, 'string') `` | `8`
| `"subject string"` | `` find_first(@, 'string', `0`) `` | `8`
| `"subject string"` | `` find_first(@, 'string', `0`, `14`) `` | `8`
| `"subject string"` | `` find_first(@, 'string', `-99`, `100`) `` | `8`
| `"subject string"` | `` find_first(@, 'string', `-6`) `` | `8`
| `"subject string"` | `` find_first(@, 'string', `0`, `13`) `` | `null`
| `"subject string"` | `` find_first(@, 'string', `8`) `` | `8`
| `"subject string"` | `` find_first(@, 'string', `8`, `11`) `` | `null`
| `"subject string"` | `` find_first(@, 'string', `9`) `` | `null`
| `"subject string"` | `` find_first(@, 's') `` | `0`
| `"subject string"` | `` find_first(@, 's', `1`) `` | `8`
| `"subject string"` | `` find_first(@, '') `` | `null`

### find_last

```
int find_last(string $subject, string $sub[, int $start[, int $end]])
```

Given the `$subject` string, `find_last()` returns the zero-based index of the last occurrence where the `$sub` substring appears in `$subject` or `null` if it does not appear. If either the `$subject` or the `$sub` argument is an empty string, `find_last()` returns `null`.

The optional `$start` and `$end` parameters restrict the search to the slice `[$start:$end]` of `$subject`, within which `$sub` must be found.

- If `$start` is omitted, it defaults to `0` (which is the start of the `$subject` string).
- If `$end` is omitted, it defaults to `length($subject)` (which is past the end of the `$subject` string).

If not omitted, the `$start` and `$end` arguments are expected to be integers. Otherwise, an error MUST be raised.

Contrary to similar functions found in most popular programming languages, the `find_last()` function does not return `-1` if no occurrence of the substring can be found. Instead, it returns `null` for consistency with how JMESPath behaves.

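
A minimal Python sketch of this right-to-left search, assuming that `[$start:$end]` behaves like a Python slice. Validation of `$start` and `$end` is left out for brevity, so this illustrates only the matching rule.

```python
from typing import Optional


def find_last(subject: str, sub: str,
              start: int = 0, end: Optional[int] = None) -> Optional[int]:
    """Illustrative sketch of find_last(); None plays the role of JMESPath null."""
    # An empty $subject or $sub never matches.
    if subject == "" or sub == "":
        return None
    # str.rfind() scans from the right within the slice and reports a miss as -1.
    index = subject.rfind(sub, start, end)
    return None if index == -1 else index
```
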
### Examples

| Given | Expression | Result
|---|---|---
| `"subject string"` | `` find_last(@, 'string') `` | `8`
| `"subject string"` | `` find_last(@, 'string', `8`) `` | `8`
| `"subject string"` | `` find_last(@, 'string', `8`, `9`) `` | `null`
| `"subject string"` | `` find_last(@, 'string', `9`) `` | `null`
| `"subject string"` | `` find_last(@, 's') `` | `8`
| `"subject string"` | `` find_last(@, 's', `1`) `` | `8`
| `"subject string"` | `` find_last(@, 's', `0`, `7`) `` | `0`
| `"subject string"` | `` find_last(@, '') `` | `null`

### lower

```
string lower(string $subject)
```

Returns the `$subject` string converted to lowercase according to the Unicode default casing conversion specification.

### Examples

| Given | Expression | Result
|---|---|---
| `"STRING"` | `` lower(@) `` | `"string"`

### pad_left

```
string pad_left(string $subject, number $width[, string $pad])
```

Given the `$subject` string, `pad_left()` adds characters to the beginning and returns a string of length at least `$width`.

The optional `$pad` string parameter specifies the padding character. If omitted, it defaults to an ASCII space (U+0020). If present, it MUST have length 1, otherwise an error MUST be raised.

If the `$subject` string has length greater than or equal to `$width`, it is returned unmodified.

If `$width` is not an integer or is negative, an error MUST be raised.

### Examples

| Given | Expression | Result
|---|---|---
| `"string"` | `` pad_left(@, `0`) `` | `"string"`
| `"string"` | `` pad_left(@, `5`) `` | `"string"`
| `"string"` | `` pad_left(@, `10`) `` | `"    string"`
| `"string"` | `` pad_left(@, `10`, '-') `` | `"----string"`

### pad_right

```
string pad_right(string $subject, number $width[, string $pad])
```

Given the `$subject` string, `pad_right()` adds characters to the end and returns a string of length at least `$width`.

The optional `$pad` string parameter specifies the padding character. If omitted, it defaults to an ASCII space (U+0020). If present, it MUST have length 1, otherwise an error MUST be raised.

If the `$subject` string has length greater than or equal to `$width`, it is returned unmodified.

If `$width` is not an integer or is negative, an error MUST be raised.

### Examples

| Given | Expression | Result
|---|---|---
| `"string"` | `` pad_right(@, `0`) `` | `"string"`
| `"string"` | `` pad_right(@, `5`) `` | `"string"`
| `"string"` | `` pad_right(@, `10`) `` | `"string    "`
| `"string"` | `` pad_right(@, `10`, '-') `` | `"string----"`

### replace

```
string replace(string $subject, string $old, string $new[, number $count])
```

Given the `$subject` string, `replace()` replaces occurrences of the `$old` substring with the `$new` substring.

The optional `$count` integer specifies how many occurrences of the `$old` substring in `$subject` are replaced. If this parameter is omitted, all occurrences are replaced. If `$count` is not an integer or is negative, an error MUST be raised.

The `replace()` function has no effect if `$count` is `0`.

### Examples

| Given | Expression | Result
|---|---|---
| `"aabaaabaaaab"` | `` replace(@, 'aa', '-', `0`) `` | `"aabaaabaaaab"`
| `"aabaaabaaaab"` | `` replace(@, 'aa', '-', `1`) `` | `"-baaabaaaab"`
| `"aabaaabaaaab"` | `` replace(@, 'aa', '-', `2`) `` | `"-b-abaaaab"`
| `"aabaaabaaaab"` | `` replace(@, 'aa', '-', `3`) `` | `"-b-ab-aab"`
| `"aabaaabaaaab"` | `` replace(@, 'aa', '-') `` | `"-b-ab--b"`

### split

```
array[string] split(string $subject, string $search[, number $count])
```

Given the `$subject` string, `split()` breaks on occurrences of the string `$search` and returns an array.

The `split()` function returns an array containing each partial string between occurrences of `$search`. If `$subject` contains no occurrences of the `$search` string, an array containing just the original `$subject` string will be returned.

If the `$search` argument is an empty string, `split()` breaks on every character and returns an array containing each character from the `$subject` string. Thus, if `$subject` is _also_ an empty string, `split()` returns an empty array.

The optional `$count` integer specifies the maximum number of split points within the `$subject` string. If this parameter is omitted, all occurrences are split. If `$count` is not an integer or is negative, an error MUST be raised.

If `$count` is equal to `0`, `split()` returns an array containing a single element, the `$subject` string.

Otherwise, the `split()` function breaks on occurrences of the `$search` string up to `$count` times. The last string in the resulting array contains the remaining contents of `$subject` unmodified.

**Note**: The `split()` function was [originally designed by Chris Armstrong](https://github.com/GorillaStack/jmespath.site/blob/master/docs/proposals/string-manipulation.rst). However, its behavior has been slightly altered for consistency reasons.

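
The splitting rules can be sketched in Python as follows. This is a hedged illustration, not a reference implementation. `ValueError` stands in for the required error. The text does not say how `$count` interacts with an empty `$search`, so the sketch assumes `$count` caps the number of split points in that case too.

```python
from typing import List, Optional


def split(subject: str, search: str, count: Optional[int] = None) -> List[str]:
    """Illustrative sketch of split(); not a reference implementation."""
    # Assumption: ValueError stands in for the error the JEP requires.
    if count is not None and (
        isinstance(count, bool) or not isinstance(count, int) or count < 0
    ):
        raise ValueError("$count must be a non-negative integer")
    if count == 0:
        # No split points allowed: the subject comes back as the only element.
        return [subject]
    if search == "":
        # Break on every character; an empty subject yields an empty array.
        # Assumption: a given $count also caps the split points here.
        if count is None or count >= len(subject):
            return list(subject)
        return list(subject[:count]) + [subject[count:]]
    # str.split() uses -1 for "no limit" and leaves the remainder unmodified.
    return subject.split(search, -1 if count is None else count)
```

The empty-`$search` case needs its own branch because Python's `str.split()` rejects an empty separator.
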
### Examples

| Expression | Result
|---|---
| `split('', '')` | `[]`
| `split('all chars', '')` | `[ "a", "l", "l", " ", "c", "h", "a", "r", "s" ]`
| `split('/', '/')` | `[ "", "" ]`
| `` split('average\|-\|min\|-\|max\|-\|mean\|-\|median', '\|-\|') `` | `[ "average", "min", "max", "mean", "median" ]`
| `` split('average\|-\|min\|-\|max\|-\|mean\|-\|median', '\|-\|', `3`) `` | `[ "average", "min", "max", "mean\|-\|median" ]`
| `` split('average\|-\|min\|-\|max\|-\|mean\|-\|median', '\|-\|', `2`) `` | `[ "average", "min", "max\|-\|mean\|-\|median" ]`
| `` split('average\|-\|min\|-\|max\|-\|mean\|-\|median', '\|-\|', `1`) `` | `[ "average", "min\|-\|max\|-\|mean\|-\|median" ]`
| `` split('average\|-\|min\|-\|max\|-\|mean\|-\|median', '\|-\|', `0`) `` | `[ "average\|-\|min\|-\|max\|-\|mean\|-\|median" ]`
| `split('average\|-\|min\|-\|max\|-\|mean\|-\|median', '-')` | `[ "average\|", "\|min\|", "\|max\|", "\|mean\|", "\|median" ]`

### trim

```
string trim(string $subject[, string $chars])
```

Given the `$subject` string, `trim()` removes the leading and trailing characters found in `$chars`.

The optional `$chars` string parameter represents a set of characters to be removed. If this parameter is not specified, or is an empty string, whitespace characters are removed from the `$subject` string. Whitespace characters are defined by the Unicode standard as code points having the `White_Space` property set to `Yes`.

### Examples

| Given | Expression | Result
|---|---|---
| `" subject string "` | `` trim(@) `` | `"subject string"`
| `" subject string "` | `` trim(@, '') `` | `"subject string"`
| `" subject string "` | `` trim(@, ' ') `` | `"subject string"`
| `" subject string "` | `` trim(@, 's') `` | `" subject string "`
| `" subject string "` | `` trim(@, 'su') `` | `" subject string "`
| `" subject string "` | `` trim(@, 'su ') `` | `"bject string"`
| `" subject string "` | `` trim(@, 'gsu ') `` | `"bject strin"`

### trim_left

```
string trim_left(string $subject[, string $chars])
```

Given the `$subject` string, `trim_left()` removes the leading characters found in `$chars`.

As with the `trim()` function, the optional `$chars` string parameter represents a set of characters to be removed. `trim_left()` defaults to removing whitespace characters if `$chars` is not specified or is an empty string.

### Examples

| Given | Expression | Result
|---|---|---
| `" subject string "` | `` trim_left(@) `` | `"subject string "`
| `" subject string "` | `` trim_left(@, 's') `` | `" subject string "`
| `" subject string "` | `` trim_left(@, 'su') `` | `" subject string "`
| `" subject string "` | `` trim_left(@, 'su ') `` | `"bject string "`
| `" subject string "` | `` trim_left(@, 'gsu ') `` | `"bject string "`

### trim_right

```
string trim_right(string $subject[, string $chars])
```

Given the `$subject` string, `trim_right()` removes the trailing characters found in `$chars`.

As with the `trim()` and `trim_left()` functions, the optional `$chars` string parameter represents a set of characters to be removed. `trim_right()` defaults to removing whitespace characters if `$chars` is not specified or is an empty string.

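
The three trimming functions share the same "set of characters" rule, which a short Python sketch can illustrate. It is a sketch under stated assumptions, not a reference implementation. The `WHITE_SPACE` constant is a hypothetical helper listing the code points believed to carry the `White_Space` property in recent Unicode versions, and it should be checked against the Unicode Character Database before being relied on. It is spelled out because the default whitespace set used by Python's `str.strip()` may differ from that property.

```python
from typing import Optional

# Assumption: code points with White_Space=Yes in recent Unicode versions.
WHITE_SPACE = (
    "\u0009\u000a\u000b\u000c\u000d\u0020\u0085\u00a0\u1680"
    "\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a"
    "\u2028\u2029\u202f\u205f\u3000"
)


def trim(subject: str, chars: Optional[str] = None) -> str:
    # A missing or empty $chars falls back to the whitespace set.
    return subject.strip(chars or WHITE_SPACE)


def trim_left(subject: str, chars: Optional[str] = None) -> str:
    return subject.lstrip(chars or WHITE_SPACE)


def trim_right(subject: str, chars: Optional[str] = None) -> str:
    return subject.rstrip(chars or WHITE_SPACE)
```

Because `$chars` is a set rather than a prefix or suffix, `trim(" subject string ", "s")` leaves the input untouched: the outermost characters are spaces, which are not in the set.
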
### Examples

| Given | Expression | Result
|---|---|---
| `" subject string "` | `` trim_right(@) `` | `" subject string"`
| `" subject string "` | `` trim_right(@, 's') `` | `" subject string "`
| `" subject string "` | `` trim_right(@, 'su') `` | `" subject string "`
| `" subject string "` | `` trim_right(@, 'su ') `` | `" subject string"`
| `" subject string "` | `` trim_right(@, 'gsu ') `` | `" subject strin"`

### upper

```
string upper(string $subject)
```

Returns the `$subject` string converted to uppercase according to the Unicode default casing conversion specification.

### Examples

| Given | Expression | Result
|---|---|---
| `"string"` | `` upper(@) `` | `"STRING"`

## Compliance

A new `string_functions.json` file will be added to the compliance tests.

The test suite will introduce the following new error type:

- `invalid-value`

This error type would be raised by `split()`, for instance, if its `$count` parameter is negative or not an integer.
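
To illustrate how an implementation might separate the two error types, here is a minimal Python sketch. The exception classes and the `require_non_negative_integer` helper are hypothetical names introduced for this example only. How and when errors are raised remains implementation specific.

```python
from typing import Any


class InvalidTypeError(Exception):
    """Hypothetical stand-in for the invalid-type error."""


class InvalidValueError(Exception):
    """Hypothetical stand-in for the invalid-value error."""


def require_non_negative_integer(name: str, value: Any) -> int:
    # JSON booleans are not numbers, although Python's bool subclasses int.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise InvalidTypeError(f"{name} must be a number")
    # The type is right; now check the constrained subset of values.
    if isinstance(value, float) and not value.is_integer():
        raise InvalidValueError(f"{name} must be an integer")
    if value < 0:
        raise InvalidValueError(f"{name} must not be negative")
    return int(value)
```

With this split, a string passed as `$count` is reported as `invalid-type`, while `-1` or `1.5` is reported as `invalid-value`.
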