clean

Function Names

clean

Description

Content imported from external sources may contain exotic white space characters (e.g. the no-break space) as well as exotic new line symbols which lie outside the common ANSI character set and can cause confusion. This function converts all such spaces and line breaks into regular spaces (ANSI / UNICODE 32) and new line symbols (ANSI / UNICODE 10, or 13+10 for contents saved in files under Windows). The table below lists the characters which are converted to spaces and new lines:

Space Symbol Code                            Description
U+0020 / chr(32)                             Space
U+00A0 / chr(160)                            No-break space
U+1680 / chr(5760)                           Ogham space mark
U+2000 / chr(8192) ... U+200A / chr(8202)    Various typographical spaces
U+202F / chr(8239)                           Narrow no-break space
U+205F / chr(8287)                           Medium mathematical space
U+3000 / chr(12288)                          Ideographic space for CJK text

New Line Symbol Code                         Description
U+000A / chr(10)                             Line feed (= 'new line' in B4P)
U+000C / chr(12)                             Form feed
U+0085 / chr(133)                            Next line
U+2028 / chr(8232)                           Line separator
U+2029 / chr(8233)                           Paragraph separator
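
The mapping listed above can be sketched outside B4P as a simple translation table. The following Python sketch is an illustration only and not B4P's implementation: the names `SPACES`, `NEW_LINES`, `TABLE` and `clean_text` are hypothetical, the no-break space is taken as U+00A0, i.e. chr(160), and the 13+10 handling for Windows files is left out.

```python
# Illustrative sketch only: mirrors the character list above, not B4P's code.

# Code points converted to a regular space, chr(32).
SPACES = [0x00A0, 0x1680, *range(0x2000, 0x200B), 0x202F, 0x205F, 0x3000]

# Code points converted to a line feed, chr(10).
NEW_LINES = [0x000C, 0x0085, 0x2028, 0x2029]

# str.translate accepts a dict that maps code points to replacement strings.
TABLE = {cp: " " for cp in SPACES}
TABLE.update({cp: "\n" for cp in NEW_LINES})


def clean_text(text: str) -> str:
    """Replace exotic spaces and line breaks with chr(32) and chr(10)."""
    return text.translate(TABLE)


if __name__ == "__main__":
    sample = "ABC\u00a0\u1680\u205f \u3000DEF"
    # Print the character codes of the cleaned string.
    print([ord(ch) for ch in clean_text(sample)])
```

A dictionary lookup per character keeps the sketch short. Characters not in the list, including the regular space and line feed, pass through unchanged.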

Note: If parameter sets are provided, then all elements which are of type 'string' will be cleaned up.
Note: Other values which are not of type 'string', e.g. numbers, boolean values and dates, will be passed through.
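
The behavior described in the two notes can be sketched as a recursive walk over a nested structure. This Python sketch is an assumption-laden illustration, not B4P code: parameter sets are modelled as Python lists, and the names `_TABLE` and `clean_value` are hypothetical.

```python
# Illustrative sketch only; B4P parameter sets are modelled as Python lists.

_SPACES = [0x00A0, 0x1680, *range(0x2000, 0x200B), 0x202F, 0x205F, 0x3000]
_BREAKS = [0x000C, 0x0085, 0x2028, 0x2029]
_TABLE = {**{cp: " " for cp in _SPACES}, **{cp: "\n" for cp in _BREAKS}}


def clean_value(value):
    """Clean strings, recurse into lists, pass every other type through."""
    if isinstance(value, str):
        return value.translate(_TABLE)
    if isinstance(value, list):
        return [clean_value(item) for item in value]
    return value  # numbers, booleans, dates etc. are returned unchanged


if __name__ == "__main__":
    data = ["Test", 123, ["X\u00a0Y", "A\x0cB"], True]
    print(clean_value(data))
```

Only strings are modified; the number and the boolean in the sample come back untouched, and the nested list is cleaned element by element.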

Call as: function

Restrictions

Indirect parameter passing is disabled

Parameter count

1

Parameters

No.  Type                  Description
1    input, valid types    input value: Values to be cleaned up (if containing strings)

Return value

Type      Description
string    cleaned values: Excess white spaces removed according to function chosen

Examples

  echo("Demonstrate cleaning up strange UNICODE spaces");
  a[] = clean( 'ABC     DEF' );

  print( a[], "  - Char codes: " );
  for (i[] = 0, i[] < a[]{}, i[] ++) print( code(a[]{i[]})," " );
  echo(new line);

  echo("Cleaning up strange UNICODE line breaks");
  b[] = clean( 'ABC&#12;&#10;&#8232;DEF' );

  print( b[], "  - Char codes: " );
  for (i[] = 0, i[] < b[]{}, i[] ++) print( code(b[]{i[]})," " );
  echo(new line);


  echo("Mixed technique: put it all into a parameter set");
  c[] = {'Test',123, 'ABC&nbsp;&#5760;&#8287; &#12288;DEF', { 'X&nbsp;Y', 'A&#12;B'},'ABC&#12;&#10;&#8232;DEF' };
  c[] = clean( c[] );
  echo(c[]);

Output

Demonstrate cleaning up strange UNICODE spaces
ABC     DEF  - Char codes: 65 66 67 32 32 32 32 32 68 69 70

Cleaning up strange UNICODE line breaks
ABC


DEF  - Char codes: 65 66 67 10 10 10 68 69 70

Mixed technique: put it all into a parameter set
{'Test',123,'ABC     DEF',{'X Y','A
B'},'ABC


DEF'}

Try it yourself: Open LIB_Function_clean.b4p in B4P_Examples.zip. Decompress before use.

See also

table clean