tokenize
This function breaks a string into individual tokens (pieces) and returns them as a set of strings or other types, depending on the options specified in the 2nd function parameter.
Indirect parameter passing is disabled
1-2
No. | Type | Description |
---|---|---|
1 input | string | Input string. This string will be tokenized. |
Opt. 2 input | set or string | Options. Specify one or more options. Formulation rules (applicable to the 2nd-7th function parameters): Following options are supported: |
Opt. 3 input | set or string | Token separator strings. Specify at least 1 separator string (e.g. blank, comma, tab, slash). A separator string may contain multiple characters, in which case the whole character sequence acts as one separator, e.g. { "//", "..." }. |
Opt. 4 input | set or string | Quotation marks. If only one string is specified, it is used as both the opening and the closing quotation mark. Example: "Hello World" |
Opt. 5 input | set or string | Additional tokens. Collection of token symbols to be parsed separately and suitable for categorization. This is especially important if these tokens follow one another without separators (e.g. white spaces) in between. You can also specify multi-character tokens, e.g. "<=", "<>", "while", "for", etc. |
Opt. 6 input | set or string | Block comment symbols. Pairwise collection of opening and closing block comment symbols, e.g. { "/*", "*/", "<--", "-->" }. Commented-out contents will not be tokenized. |
Opt. 7 input | set or string | Line comment symbols. Specify all comment symbols which declare the rest of the line as a comment, e.g. { "//", "#!" }. Commented-out contents will not be tokenized. |
Type | Description |
---|---|
set | Tokenized result. Every token is represented as an element in the set. |
echo( new line, "Basic use of tokenize. Separators are blank and new line" );
echo( tokenize( "This is a" + new line + " test" ) );
echo( new line, "Demonstrate 'include blanks' and 'trim token'" );
echo( tokenize( ",Ha, He ,,Hi,", {}, "," ) );
echo( tokenize( ",Ha, He ,,Hi,", include blanks, "," ) );
echo( tokenize( ";Ha; He ,,Hi,", trim token, {",",";"} ) );
echo( tokenize( ",Ha, He ,,Hi,", {include blanks, trim token}, "," ) );
echo( new line, "New line inside quotations allowed" );
echo( tokenize( "'Me, and"+new line+"You','and us'" , allow new line inside quotations, ",", "'" ) );
echo( new line, "Demonstrate usage of quotations" );
echo( tokenize( "<text>A gnu</text>,<text>A gnat</text>" , {}, ",", { "<text>", "</text>" } ) );
echo( tokenize( "<text>A gnu</text>,<text>A gnat</text>" , include quotations as tokens, ",", { "<text>", "</text>" } ) );
echo( new line, "Read numerals, dates, booleans" );
echo( tokenize( "1 true 1E+3 FALSE text 2020-05-07 15:30:00", { read numerals, scientific notation, read dates, read booleans } ) );
echo( new line, "Thousand and Decimal separators" );
echo( tokenize( "1,234 1.234", { read numerals, thousand separator, ".", decimal separator, "," } ) );
echo( new line, "Additional tokens" );
echo( tokenize( "for a=1to5 'do something'", {}, " ", "'", { "=", to, for } ) );
echo( new line, "Ignore comments" );
echo( tokenize( "for a=1to5 /: for a = 3 to 4 :/ 'do something'", {}, " ", "'", { "=", to, for }, { "/:", ":/" } ) );
echo( tokenize( "for a=1to5 // 'do something'", {}, " ", "'", { "=", to, for }, {}, "//" ) );
Basic use of tokenize. Separators are blank and new line
{'This','is','a','test'}
Demonstrate 'include blanks' and 'trim token'
{'Ha',' He ','Hi'}
{'','Ha',' He ','','Hi',''}
{'Ha','He','Hi'}
{'','Ha','He','','Hi',''}
New line inside quotations allowed
{'Me, and
You','and us'}
Demonstrate usage of quotations
{'A gnu','A gnat'}
{'<text>','A gnu','</text>','<text>','A gnat','</text>'}
Read numerals, dates, booleans
{1,true,1000,false,'text','2020-05-07','15:30:00'}
Thousand and Decimal separators
{1.234,1234}
Additional tokens
{'for','a','=','1','to','5','do something'}
Ignore comments
{'for','a','=','1','to','5','do something'}
{'for','a','=','1','to','5'}
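The parameter descriptions mention multi-character separators and several comment symbols per call, but the examples above use only single-character separators and one comment symbol at a time. The following sketch covers those cases, assuming the calls follow the same pattern as the examples above. The input strings are made up for illustration, and no output is listed because the calls have not been run.

```b4p
// Assumption: same call pattern as the examples above; outputs not verified.
echo( new line, "Multi-character separators" );
echo( tokenize( "alpha//beta...gamma", {}, { "//", "..." } ) );
echo( new line, "Two pairs of block comment symbols" );
echo( tokenize( "a /* skip */ b <-- skip --> c", {}, " ", "'", {}, { "/*", "*/", "<--", "-->" } ) );
echo( new line, "Two line comment symbols" );
echo( tokenize( "a b #! c d", {}, " ", "'", {}, {}, { "//", "#!" } ) );
```

According to the parameter descriptions, each separator string is matched as a whole and commented-out contents are not tokenized, so `skip` and the text following `#!` should not appear among the resulting tokens.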