table clean

Function Names

table clean

Description

Many tables obtained from other sources (e.g. web sites, external parties) require some initial cleanup because the data may contain redundant spaces and line breaks, or use exotic UNICODE space and new line characters that need to be replaced. The following cleanup actions are supported and are applied to the entire table, including the header row.

  • Replacing exotic UNICODE space symbols with the regular space character (UNICODE 32), see table below.
  • Replacing exotic UNICODE new line symbols with the standard 'new line' symbol (UNICODE 10, or 13+10 under Windows), see table below.
  • Option: Eliminating dangling spaces at the beginning and end of the contents
  • Option: Eliminating dangling new line symbols at the beginning and end of the contents
  • Option: Reducing the number of consecutive spaces and line breaks to 1

Note: The character replacements (first two items) are always applied and cannot be disabled in this function.

Space Symbol Code                         | Description                    | New Line Symbol    | Description
U+0020 / chr(32)                          | Space                          | U+000A / chr(10)   | Line feed (= 'new line' in B4P)
U+00A0 / chr(160)                         | No-break space                 | U+000C / chr(12)   | Form feed
U+1680 / chr(5760)                        | Ogham space mark               | U+0085 / chr(133)  | Next line
U+2000 / chr(8192) ... U+200A / chr(8202) | Various typographical spaces   | U+2028 / chr(8232) | Line separator
U+202F / chr(8239)                        | Narrow no-break space          | U+2029 / chr(8233) | Paragraph separator
U+205F / chr(8287)                        | Medium mathematical space      |                    |
U+3000 / chr(12288)                       | Ideographic space for CJK text |                    |

Call as: procedure

Restrictions

Indirect parameter passing is enabled

Parameter count

1-2

Parameters

No. | Type                           | Description
1   | input, string                  | Name of existing table
2   | input, parameter set or string | Cleanup options

Specify one option in a string, or multiple options either in a parameter set or as a comma-separated list in one softquoted string. For example, {trim spaces, trim line breaks} and 'trim spaces, trim line breaks' are equivalent, but "trim spaces, trim line breaks" is invalid: a string in double quotes is treated as one single string and does not match any of the given options.
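The valid notations can be sketched as follows. This is a minimal illustration that reuses only the functions from the Examples section; the table name demo and its contents are hypothetical.

```b4p
  table initialize    ( demo, { { Text }, { '  Hello  ' } } ); // Hypothetical table: one header, one cell with dangling spaces

  table clean         ( demo, trim spaces );                        // One option in a plain string
  table clean         ( demo, { trim spaces, trim line breaks } );  // Multiple options in a parameter set
  table clean         ( demo, 'trim spaces, trim line breaks' );    // The same options in one softquoted string
  // "trim spaces, trim line breaks" in double quotes is read as one single string and matches no option
```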

Option                      | Description
trim spaces                 | Eliminate dangling spaces before the beginning and after the end of the contents
trim spaces before          | ... before the beginning only
trim spaces after           | ... after the end only
trim line breaks            | Eliminate dangling new line symbols before the beginning and after the end of the contents
trim line breaks before     | ... before the beginning only
trim line breaks after      | ... after the end only
remove spaces               | Remove all white spaces
remove line breaks          | Remove all line breaks
remove adjacent spaces      | Remove all adjacent spaces (e.g. 2 or more consecutive spaces are reduced to 1 space)
remove adjacent line breaks | Remove all adjacent line breaks
line breaks to spaces       | Convert all line breaks to spaces. This is done before any other option is executed.
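Several options can be combined in one call, as the Examples section does. The sketch below pairs a one-sided trim with the reduction of adjacent spaces; the table name demo and its contents are hypothetical, and the call pattern is taken from the Examples section.

```b4p
  table initialize    ( demo, { { Text }, { '  Hello    World  ' } } ); // Hypothetical cell: spaces at both ends and in the middle

  table clean         ( demo, { trim spaces before, remove adjacent spaces } );
  table process cells ( demo, [.] = "'" + [.] + "'" ); // Quotation marks reveal any remaining spaces
  table list          ( demo );
```

Going by the option descriptions, the leading spaces should disappear and the run of spaces between the two words should shrink to one, while the end of the cell is not trimmed.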

Examples


  table initialize    ( dirty, // Initialize table with messy line breaks and white spaces
  { { Descr,        Dangling at Begin, Dangling at End, Dangling both, Multiple Spaces },
    { Spaces,      '  Hello', 'Hello  ',      '  Hello  ', 'Hello    World'},
    { Line Breaks, '
Hello',   'Hello …
','
Hello

',  'Hello
World'} } );

  echo("Demonstrate a simple clean-up:");

  table copy table    ( dirty, clean );
  table clean         ( clean );
  table process cells ( clean, [.] = "'" + [.] + "'" ); // Put some quotation marks around to see the true extent of the contents
  table list          ( clean );

  echo("As a next step, demonstrate removing adjacent line breaks");

  table copy table    ( dirty, clean );
  table clean         ( clean, remove adjacent line breaks );
  table process cells ( clean, [.] = "'" + [.] + "'" ); // Put some quotation marks around to see the true extent of the contents
  table list          ( clean );

  echo("As a final step, demonstrate a good clean-up task:");

  table copy table    ( dirty, clean );
  table clean         ( clean, {trim spaces, line breaks to spaces, remove adjacent spaces} );
  table process cells ( clean, [.] = "'" + [.] + "'" ); // Put some quotation marks around to see the true extent of the contents
  table list          ( clean );

Output

Demonstrate a simple clean-up:
    0 : Descr         | Dangling at Begin | Dangling at End | Dangling both | Multiple Spaces
    1 : 'Spaces'      | '  Hello'         | 'Hello  '       | '  Hello  '   | 'Hello    World'
    2 : 'Line Breaks' | '                 | 'Hello          | '             | 'Hello          
      :               |                   |                 | Hello         |                 
      :               | Hello'            | '               |               | World'          
      :               |                   |                 | '             |                 

As a next step, demonstrate removing adjacent line breaks
    0 : Descr         | Dangling at Begin | Dangling at End | Dangling both | Multiple Spaces
    1 : 'Spaces'      | '  Hello'         | 'Hello  '       | '  Hello  '   | 'Hello    World'
    2 : 'Line Breaks' | '                 | 'Hello          | '             | 'Hello          
      :               | Hello'            | '               | Hello         | World'          
      :               |                   |                 | '             |                 

As a final step, demonstrate a good clean-up task:
    0 : Descr         | Dangling at Begin | Dangling at End | Dangling both | Multiple Spaces
    1 : 'Spaces'      | 'Hello'           | 'Hello'         | 'Hello'       | 'Hello World'  
    2 : 'Line Breaks' | 'Hello'           | 'Hello'         | 'Hello'       | 'Hello World'  

Try it yourself: Open LIB_Function_table_clean.b4p in B4P_Examples.zip. Decompress before use.

See also

clean