clean

Function Names

clean

Description

Content imported from external sources may contain exotic white spaces (e.g. the no-break space) as well as exotic new line symbols which lie outside the common ANSI character set and can cause confusion. This function converts all such spaces and line breaks into regular spaces (ANSI / UNICODE 32) and new line symbols (ANSI / UNICODE 10, or 13+10 for contents saved in files under Windows). The table below lists the characters which are converted to spaces and new lines:

Space Symbol Code                          Description                     New Line Symbol     Description
U+0020 / chr(32)                           Space                           U+000A / chr(10)    Line feed (= 'new line' in B4P)
U+00A0 / chr(160)                          No-break space                  U+000C / chr(12)    Form feed
U+1680 / chr(5760)                         Ogham space mark                U+0085 / chr(133)   Next line
U+2000 / chr(8192) ... U+200A / chr(8202)  Various typographical spaces    U+2028 / chr(8232)  Line separator
U+202F / chr(8239)                         Narrow no-break space           U+2029 / chr(8233)  Paragraph separator
U+205F / chr(8287)                         Medium mathematical space
U+3000 / chr(12288)                        Ideographic space for CJK text
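
The conversion listed in the table can be emulated outside B4P. The following is a minimal Python sketch, not B4P code and not B4P's actual implementation. The names `SPACE_CODES`, `NEWLINE_CODES`, `CLEAN_MAP` and `clean_text` are hypothetical, and the sketch always emits a plain line feed (it does not model the 13+10 case for Windows files).

```python
# Emulation sketch in Python (assumption: not how B4P implements clean).
# Code points follow the table: chr(160), chr(5760), chr(8192)..chr(8202),
# chr(8239), chr(8287), chr(12288) become regular spaces.
SPACE_CODES = [0x00A0, 0x1680, *range(0x2000, 0x200B), 0x202F, 0x205F, 0x3000]

# chr(12), chr(133), chr(8232), chr(8233) become line feeds.
NEWLINE_CODES = [0x000C, 0x0085, 0x2028, 0x2029]

# Translation table: code point -> replacement character.
CLEAN_MAP = {cp: " " for cp in SPACE_CODES}
CLEAN_MAP.update({cp: "\n" for cp in NEWLINE_CODES})


def clean_text(text: str) -> str:
    """Replace exotic spaces with U+0020 and exotic line breaks with U+000A."""
    return text.translate(CLEAN_MAP)


sample = "ABC\u00a0\u1680\u205f \u3000DEF"
cleaned = clean_text(sample)
print(" ".join(str(ord(ch)) for ch in cleaned))
# prints: 65 66 67 32 32 32 32 32 68 69 70
```

A lookup table keeps the mapping in one place, so adding or removing a code point does not touch the conversion logic.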



Vectorization: This function supports vectorization in the 1st function parameter. Instead of providing a single value, you can provide a set, or even a nested set, containing multiple values. The function then processes every value and returns a corresponding set containing all results.

Note: Values which are not of type 'string', e.g. numbers, boolean values and dates, are passed through unchanged.
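
The vectorization and pass-through behavior can be pictured with a short Python sketch. This is an illustration only, not B4P code: Python lists stand in for B4P sets, and `vectorize` and `demo_clean` are hypothetical names. `demo_clean` handles just two of the characters from the table to keep the example short.

```python
from typing import Any, Callable


def vectorize(value: Any, string_fn: Callable[[str], str]) -> Any:
    """Apply string_fn to every string, recursing into nested lists.

    Lists stand in for B4P sets (an assumption made for this sketch).
    Non-string values such as numbers and booleans are passed through.
    """
    if isinstance(value, str):
        return string_fn(value)
    if isinstance(value, list):
        return [vectorize(item, string_fn) for item in value]
    return value


def demo_clean(text: str) -> str:
    """Reduced stand-in for the cleaning step: no-break space and form feed only."""
    return text.replace("\u00a0", " ").replace("\x0c", "\n")


values = ["Test", 123, ["X\u00a0Y", "A\x0cB"], True]
print(vectorize(values, demo_clean))
# prints: ['Test', 123, ['X Y', 'A\nB'], True]
```

The result has the same nesting as the input, which mirrors the statement that the return value is a corresponding set containing all results.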

Call as: function

Restrictions

Indirect parameter passing is disabled
Vectorization is allowed in the 1st function parameter

Parameter count

1

Parameters

No.  Type                 Description
1    input, valid types   input value: Values to be cleaned up (if containing strings)

Return value

Type          Description
string, set   cleaned values: Strings with exotic spaces and line breaks converted to regular spaces and new line symbols. In case of vectorization, a set is returned.

Examples

       echo("Demonstrate cleaning up strange UNICODE spaces");
       a[] = clean( 'ABC     DEF' );

       print( a[], "  - Char codes: " );
       for (i[] = 0, i[] < a[]{}, i[] ++) print( code(a[]{i[]})," " );
       echo(new line);

       echo("Cleaning up strange UNICODE line breaks");
       b[] = clean( 'ABC&#12;&#10;&#8232;DEF' );

       print( b[], "  - Char codes: " );
       for (i[] = 0, i[] < b[]{}, i[] ++) print( code(b[]{i[]})," " );
       echo(new line);


       echo("Mixed technique: put it all into a set");
       c[] = {'Test',123, 'ABC&nbsp;&#5760;&#8287; &#12288;DEF', { 'X&nbsp;Y', 'A&#12;B'},'ABC&#12;&#10;&#8232;DEF' };
       c[] = clean( c[] );
       echo(c[]);

Output

Demonstrate cleaning up strange UNICODE spaces
ABC     DEF  - Char codes: 65 66 67 32 32 32 32 32 68 69 70

Cleaning up strange UNICODE line breaks
ABC


DEF  - Char codes: 65 66 67 10 10 10 68 69 70

Mixed technique: put it all into a set
{'Test',123,'ABC     DEF',{'X Y','A
B'},'ABC


DEF'}
Try it yourself: Open LIB_Function_clean.b4p in B4P_Examples.zip. Decompress before use.

See also

table clean