table clean
Many tables obtained from other sources (e.g. web sites or external parties) require some initial cleanup because the data may contain redundant spaces and line breaks, or may use exotic Unicode space and new line characters. The following cleanup actions are supported and are applied to the entire table, including the header row.
Note: The character replacements (first two items) cannot be opted out in this function. The space and new line symbols listed below are always replaced by a regular space, chr(32), and a line feed, chr(10), respectively.
Space Symbol | Description | New Line Symbol | Description |
---|---|---|---|
U+0020 / chr(32) | Space | U+000A / chr(10) | Line feed (= 'new line' in B4P) |
U+00A0 / chr(160) | No-break space | U+000C / chr(12) | Form feed |
U+1680 / chr(5760) | Ogham space mark | U+0085 / chr(133) | Next line |
U+2000 / chr(8192) ... U+200A / chr(8202) | Various typographical spaces | U+2028 / chr(8232) | Line separator |
U+202F / chr(8239) | Narrow no-break space | U+2029 / chr(8233) | Paragraph separator |
U+205F / chr(8287) | Medium mathematical space | | |
U+3000 / chr(12288) | Ideographic space (CJK text) | | |
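The mandatory character replacements amount to a simple code-point mapping. The Python sketch below is only an illustration, not B4P code: it assumes that every space symbol in the table is mapped to chr(32) and every new line symbol to chr(10). The names `SPACE_CODES`, `NEWLINE_CODES` and `normalize_chars` are hypothetical.

```python
# Illustrative Python model (not B4P code) of the mandatory character replacements.
# Assumption: each listed space symbol becomes chr(32), each new line symbol chr(10).
SPACE_CODES = [0x00A0, 0x1680, *range(0x2000, 0x200B), 0x202F, 0x205F, 0x3000]
NEWLINE_CODES = [0x000C, 0x0085, 0x2028, 0x2029]

# str.translate accepts a dict that maps code points (ints) to replacement strings.
NORMALIZE = {cp: " " for cp in SPACE_CODES}
NORMALIZE.update({cp: "\n" for cp in NEWLINE_CODES})


def normalize_chars(text: str) -> str:
    """Replace exotic Unicode spaces and line breaks by plain ones."""
    return text.translate(NORMALIZE)


print(repr(normalize_chars("Hello\u00a0World\u2028Next\u3000line")))
# → 'Hello World\nNext line'
```

Regular spaces and line feeds are not in the mapping, so they pass through unchanged.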
Indirect parameter passing is enabled
Parameter count: 1-2
No. | Type | Description |
---|---|---|
1. input | string | Name of existing table |
2. input (optional) | set or string | Cleanup options. Specify a single option as a string, or multiple options either as a set or as one softquoted string containing a comma-separated list. For example, {trim spaces, trim line breaks} and 'trim spaces, trim line breaks' are equivalent, whereas "trim spaces, trim line breaks" is invalid: a quoted string is treated as one single string and will not match any of the available options. |
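The rule for the second parameter can be illustrated with a small Python model. Python has no softquoted strings, so the hypothetical flag `softquoted` stands in for the choice between 'single' and "double" quotes. `KNOWN_OPTIONS` contains only the option names that appear on this page; the real function may accept more.

```python
# Hypothetical Python model (not B4P code) of how the options parameter is read.
# Only option names that appear on this page are listed; B4P may support more.
KNOWN_OPTIONS = {
    "trim spaces",
    "trim line breaks",
    "remove adjacent spaces",
    "remove adjacent line breaks",
    "line breaks to spaces",
}


def parse_options(value, softquoted=True):
    """Return the set of requested options; raise ValueError for unknown ones.

    value      -- a set/list of option names, or a single string
    softquoted -- True models a 'softquoted' string (may hold a comma-separated
                  list); False models a "quoted" string (always one option)
    """
    if isinstance(value, str):
        parts = value.split(",") if softquoted else [value]
    else:
        parts = list(value)
    options = {p.strip() for p in parts}
    unknown = options - KNOWN_OPTIONS
    if unknown:
        raise ValueError(f"unknown cleanup option(s): {sorted(unknown)}")
    return options


# A set and a softquoted list are equivalent ...
print(parse_options({"trim spaces", "trim line breaks"})
      == parse_options("trim spaces, trim line breaks"))  # True

# ... but a quoted string is one single (unknown) option.
try:
    parse_options("trim spaces, trim line breaks", softquoted=False)
except ValueError as err:
    print(err)
```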
table initialize ( dirty, // Initialize table with messy line breaks and whitespace
{ { Descr, Dangling at Begin, Dangling at End, Dangling both, Multiple Spaces },
{ Spaces, ' Hello', 'Hello  ', '  Hello  ', 'Hello    World'},
{ Line Breaks, ' Hello', 'Hello …
',' Hello
 ', 'Hello World'} } );
echo("Demonstrate a simple clean-up:");
table copy table ( dirty, clean );
table clean ( clean );
table process cells ( clean, [.] = "'" + [.] + "'" ); // Enclose each cell in quotation marks to show the exact extent of its contents
table list ( clean );
echo("As a next step, demonstrate removing adjacent line breaks");
table copy table ( dirty, clean );
table clean ( clean, remove adjacent line breaks );
table process cells ( clean, [.] = "'" + [.] + "'" ); // Enclose each cell in quotation marks to show the exact extent of its contents
table list ( clean );
echo("As a final step, demonstrate a good clean-up task:");
table copy table ( dirty, clean );
table clean ( clean, {trim spaces, line breaks to spaces, remove adjacent spaces} );
table process cells ( clean, [.] = "'" + [.] + "'" ); // Enclose each cell in quotation marks to show the exact extent of its contents
table list ( clean );
Demonstrate a simple clean-up:
0 : Descr | Dangling at Begin | Dangling at End | Dangling both | Multiple Spaces
1 : 'Spaces' | ' Hello' | 'Hello ' | ' Hello ' | 'Hello World'
2 : 'Line Breaks' | ' | 'Hello | ' | 'Hello
: | | | Hello |
: | Hello' | ' | | World'
: | | | ' |
As a next step, demonstrate removing adjacent line breaks
0 : Descr | Dangling at Begin | Dangling at End | Dangling both | Multiple Spaces
1 : 'Spaces' | ' Hello' | 'Hello ' | ' Hello ' | 'Hello World'
2 : 'Line Breaks' | ' | 'Hello | ' | 'Hello
: | Hello' | ' | Hello | World'
: | | | ' |
As a final step, demonstrate a good clean-up task:
0 : Descr | Dangling at Begin | Dangling at End | Dangling both | Multiple Spaces
1 : 'Spaces' | 'Hello' | 'Hello' | 'Hello' | 'Hello World'
2 : 'Line Breaks' | 'Hello' | 'Hello' | 'Hello' | 'Hello World'
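The effect of the individual options in the example can be approximated in Python. This is a sketch, not B4P's implementation. The hypothetical function `clean_cell` applies the options in a fixed order (line breaks to spaces first, trimming last), which is an assumption. The cell values are modelled on the example output rather than copied from B4P.

```python
import re


# Illustrative Python model (not B4P code) of the cleanup options used above.
# Assumption: options are applied in this fixed order, trimming last.
def clean_cell(text: str, options: set) -> str:
    if "line breaks to spaces" in options:
        text = text.replace("\n", " ")
    if "remove adjacent line breaks" in options:
        text = re.sub(r"\n{2,}", "\n", text)  # collapse runs of line breaks
    if "remove adjacent spaces" in options:
        text = re.sub(r" {2,}", " ", text)  # collapse runs of spaces
    if "trim line breaks" in options:
        text = text.strip("\n")  # drop leading/trailing line breaks
    if "trim spaces" in options:
        text = text.strip(" ")  # drop leading/trailing spaces
    return text


# Cell values modelled on the example table (exact contents are assumed).
line_breaks = ["\n\nHello", "Hello\n\n", "\nHello\n\n", "Hello\n\nWorld"]
spaces = ["  Hello", "Hello  ", "  Hello  ", "Hello    World"]

print([clean_cell(c, {"remove adjacent line breaks"}) for c in line_breaks])
good = {"trim spaces", "line breaks to spaces", "remove adjacent spaces"}
print([clean_cell(c, good) for c in line_breaks + spaces])
```

With this order, the results reproduce the second and third listings above. Trimming before converting line breaks would instead leave a space at each end of 'Dangling both' in the line break row.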