clean
Content imported from external sources may contain exotic white space characters (e.g. the no-break space) as well as exotic new line symbols that lie outside the common ANSI character set and would only cause confusion. This function converts all such spaces and line breaks into regular spaces (ANSI / UNICODE 32) and new line symbols (ANSI / UNICODE 10, or 13+10 for contents saved in files under Windows). The table below lists the characters which are converted to spaces and new lines:
Space Symbol Code | Description | New Line Symbol Code | Description |
---|---|---|---|
U+0020 / chr(32) | Space | U+000A / chr(10) | Line feed (= 'new line' in B4P) |
U+00A0 / chr(160) | No-break space | U+000C / chr(12) | Form feed |
U+1680 / chr(5760) | Ogham space mark | U+0085 / chr(133) | Next line |
U+2000 / chr(8192) ... U+200A / chr(8202) | Various typographical spaces | U+2028 / chr(8232) | Line separator |
U+202F / chr(8239) | Narrow no-break space | U+2029 / chr(8233) | Paragraph separator |
U+205F / chr(8287) | Medium mathematical space | | |
U+3000 / chr(12288) | Ideographic space for CJK text | | |
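The conversion listed in the table can be illustrated with a short Python sketch. This is not B4P code and not the actual implementation of clean; the names SPACE_CODES, NEWLINE_CODES, CLEAN_TABLE and clean_text are hypothetical and exist only for this illustration. It assumes a plain one-to-one replacement of each listed character (the no-break space being chr(160)).

```python
# Illustrative Python sketch, not the B4P implementation of clean().
# Code points are taken from the table above; all names are hypothetical.
SPACE_CODES = [0x00A0, 0x1680, *range(0x2000, 0x200B), 0x202F, 0x205F, 0x3000]
NEWLINE_CODES = [0x000C, 0x0085, 0x2028, 0x2029]

# str.translate accepts a dict mapping code points to replacement strings.
CLEAN_TABLE = {cp: " " for cp in SPACE_CODES}
CLEAN_TABLE.update({cp: "\n" for cp in NEWLINE_CODES})


def clean_text(text: str) -> str:
    """Replace exotic spaces with U+0020 and exotic line breaks with U+000A."""
    return text.translate(CLEAN_TABLE)


# No-break space, em space and line separator become 32, 32 and 10.
print([ord(ch) for ch in clean_text("A\u00a0B\u2003C\u2028D")])
# prints [65, 32, 66, 32, 67, 10, 68]
```

Regular spaces (32) and line feeds (10) need no entry in the mapping because they are already in their target form.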
Vectorization: This function supports vectorization in the 1st function parameter.
Instead of providing a single value, you can provide a set, or even a nested set, that contains multiple values.
The function then processes every value and returns a correspondingly structured set containing all results.
Note: Other values which are not of type 'string', e.g. numbers, boolean values and dates, will be passed through.
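The vectorization rule described above can be sketched in Python as a recursive walk. This is an illustration under assumptions, not B4P code: a Python list stands in for a B4P set, and the helpers clean_text and clean_value are hypothetical. To stay short, clean_text handles only two of the characters that clean converts.

```python
from typing import Any


def clean_text(text: str) -> str:
    """Hypothetical stand-in for the string-level conversion (two characters only)."""
    return text.replace("\u00a0", " ").replace("\u2028", "\n")


def clean_value(value: Any) -> Any:
    """Clean strings, recurse into nested lists, pass all other types through."""
    if isinstance(value, str):
        return clean_text(value)
    if isinstance(value, list):  # a Python list stands in for a B4P set
        return [clean_value(item) for item in value]
    return value  # numbers, booleans, dates: returned unchanged


print(clean_value(["Test", 123, ["X\u00a0Y", "A\u2028B"], True]))
# prints ['Test', 123, ['X Y', 'A\nB'], True]
```

The result keeps the nesting of the input, and non-string members such as 123 and True come back untouched.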
Indirect parameter passing is disabled
Vectorization is allowed in the 1st function parameter
Parameter count: 1
No. | Type | Description |
---|---|---|
1 (input) | valid types | Input value: values to be cleaned up (if containing strings) |
Type | Description |
---|---|
string or set | Cleaned values: excess white spaces removed according to the function chosen. In case of vectorization, a set is returned. |
echo("Demonstrate cleaning up strange UNICODE spaces");
a[] = clean( 'ABC     DEF' );
print( a[], " - Char codes: " );
for (i[] = 0, i[] < a[]{}, i[] ++) print( code(a[]{i[]})," " );
echo(new line);
echo("Cleaning up strange UNICODE line breaks");
b[] = clean( 'ABC 
DEF' );
print( b[], " - Char codes: " );
for (i[] = 0, i[] < b[]{}, i[] ++) print( code(b[]{i[]})," " );
echo(new line);
echo("Mixed technique: put it all into a set");
c[] = {'Test',123, 'ABC     DEF', { 'X Y', 'AB'},'ABC 
DEF' };
c[] = clean( c[] );
echo(c[]);
Demonstrate cleaning up strange UNICODE spaces
ABC     DEF - Char codes: 65 66 67 32 32 32 32 32 68 69 70
Cleaning up strange UNICODE line breaks
ABC
DEF - Char codes: 65 66 67 10 10 10 68 69 70
Mixed technique: put it all into a set
{'Test',123,'ABC     DEF',{'X Y','A
B'},'ABC
DEF'}