clean
Content imported from external sources may contain exotic white spaces (e.g. the no-break space) as well as exotic new line symbols which lie outside the common ANSI character set and would only create confusion. This function converts all such spaces and line breaks into regular spaces (ANSI / UNICODE 32) and new line symbols (ANSI / UNICODE 10, or 13+10 for contents saved in files under Windows). The table below lists the characters which are converted to spaces and new lines:
Space Symbol Code | Description | New Line Symbol Code | Description |
---|---|---|---|
U+0020 / chr(32) | Space | U+000A / chr(10) | Line feed (= 'new line' in B4P) |
U+00A0 / chr(160) | No-break space | U+000C / chr(12) | Form feed |
U+1680 / chr(5760) | Ogham space mark | U+0085 / chr(133) | Next line |
U+2000 / chr(8192) ... U+200A / chr(8202) | Various typographical spaces | U+2028 / chr(8232) | Line separator |
U+202F / chr(8239) | Narrow no-break space | U+2029 / chr(8233) | Paragraph separator |
U+205F / chr(8287) | Medium mathematical space | | |
U+3000 / chr(12288) | Ideographic space for CJK text | | |
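The mapping in the table can be modelled outside B4P. The following is a minimal Python sketch, not B4P's actual implementation: the names `SPACE_CODES`, `NEWLINE_CODES` and `clean_text` are hypothetical, and the sketch assumes that only the code points listed in the table are converted (a carriage return, chr(13), is left untouched).

```python
# Hypothetical Python model of the conversion table; not B4P's implementation.
# Assumption: only the code points listed in the table are converted.
SPACE_CODES = {0x0020, 0x00A0, 0x1680, *range(0x2000, 0x200B), 0x202F, 0x205F, 0x3000}
NEWLINE_CODES = {0x000A, 0x000C, 0x0085, 0x2028, 0x2029}


def clean_text(text: str) -> str:
    """Map every listed space to U+0020 and every listed line break to U+000A."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp in SPACE_CODES:
            out.append(" ")
        elif cp in NEWLINE_CODES:
            out.append("\n")
        else:
            out.append(ch)  # all other characters pass through unchanged
    return "".join(out)


# No-break space, em space and ideographic space, then a line separator
sample = "ABC\u00a0\u2003\u3000DEF\u2028GHI"
print([ord(c) for c in clean_text(sample)])
# prints [65, 66, 67, 32, 32, 32, 68, 69, 70, 10, 71, 72, 73]
```

Each exotic character is replaced one for one, so the string length stays the same and runs of spaces are not collapsed.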
Note: If parameter sets are provided, then all elements which are of type 'string' will be cleaned up.
Note: Other values which are not of type 'string', e.g. numbers, boolean values and dates, will be passed through.
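The two notes can be illustrated with a short Python model in which nested lists stand in for B4P parameter sets. This is a sketch under assumptions, not B4P code: `clean_value` and `_TABLE` are hypothetical names, and descending into nested sets is assumed from the behaviour described for this function.

```python
# Hypothetical Python model: lists stand in for B4P parameter sets.
_SPACES = [0x00A0, 0x1680, *range(0x2000, 0x200B), 0x202F, 0x205F, 0x3000]
_NEWLINES = [0x000C, 0x0085, 0x2028, 0x2029]
_TABLE = {**{cp: " " for cp in _SPACES}, **{cp: "\n" for cp in _NEWLINES}}


def clean_value(value):
    """Clean strings, descend into lists, pass everything else through."""
    if isinstance(value, str):
        return value.translate(_TABLE)
    if isinstance(value, list):
        return [clean_value(item) for item in value]
    return value  # numbers, booleans, dates, ... are returned unchanged


data = ["Test", 123, "ABC\u2003DEF", ["X\u00a0Y", "A\u2028B"], True]
print(clean_value(data))
# prints ['Test', 123, 'ABC DEF', ['X Y', 'A\nB'], True]
```

Only the string elements change; the number and the boolean keep both their value and their type.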
Indirect parameter passing is disabled
Parameter count: 1
No. | Type | Description |
---|---|---|
1 (input) | valid types | Input value: Values to be cleaned up (if containing strings) |
Type | Description |
---|---|
string | Cleaned values: Excess white spaces removed according to function chosen |
echo("Demonstrate cleaning up strange UNICODE spaces");
a[] = clean( 'ABC     DEF' );
print( a[], " - Char codes: " );
for (i[] = 0, i[] < a[]{}, i[] ++) print( code(a[]{i[]})," " );
echo(new line);
echo("Cleaning up strange UNICODE line breaks");
b[] = clean( 'ABC 
DEF' );
print( b[], " - Char codes: " );
for (i[] = 0, i[] < a[]{}, i[] ++) print( code(b[]{i[]})," " );
echo(new line);
echo("Mixed technique: put it all into a parameter set");
c[] = {'Test',123, 'ABC     DEF', { 'X Y', 'AB'},'ABC 
DEF' };
c[] = clean( c[] );
echo(c[]);
Demonstrate cleaning up strange UNICODE spaces
ABC     DEF - Char codes: 65 66 67 32 32 32 32 32 68 69 70
Cleaning up strange UNICODE line breaks
ABC
DEF - Char codes: 65 66 67 10 10 10 68 69 70 0 0
Mixed technique: put it all into a parameter set
{'Test',123,'ABC     DEF',{'X Y','A
B'},'ABC
DEF'}