B - Clean

Prev Next

Overview

Data obtained from external sources may be named and structured differently, and the table may contain unnecessary contents which need to be cleaned up first, and/or semantic alignments to be made. Some typical functions are listed below:



Wikipedia Example (continued from step 1)

The file downloaded from Wikipedia does indeed need some cleanup.

  • First, the table contain two header rows where the 2nd header is the row of interest containing column header names.
  • The header names are sophisticated, e.g. "Internet ccTLD".
  • In addition, lots of cross references like [1] are included and they distract.
  • Various fields, particularly country names, contain leading spaces.

Simple Example
  // The table contains an exotic 'hairline spatium' space symbol, various fields begin with spaces,
  // and references with 1 or 2 characters in brackets, e.g. [a], [aa].  The following two statements do the cleanup.
  // Note: str(...) is needed in order to avoid error messages when numbers are read from the table.

  table process all cells        ( countries, [.] = replace all( str([.]), {'[?]', '[??]' }, '' ) );
  table clean                    ( countries, trim spaces );

  table lift header row          ( countries, Name ); // Find header row automatically
  table correct headers          ( countries, {Name, 'A-2', 'A-3'}, {Country name, 'Alpha-2 code', 'Alpha-3 code' } );
  table process                  ( countries, [Country name]= left([Country name], ' (') ); // Remove name additions

  table list( countries, briefly, 5, 1st col, 0, last col, 1 ); // List first and last 5 rows and 2 columns only
The table has been cleaned up.
Row #: Code Text _______________________________________________________
    1:   include ( Support Library );
    2:   table load                     ( countries, temp.csv); // Hide!  Step 2 begins
    3:   // The table contains an exotic 'hairline spatium' space symbol, various fields begin with spaces,
    4:   // and references with 1 or 2 characters in brackets, e.g. [a], [aa].  The following two statements do the cleanup.
    5:   // Note: str(...) is needed in order to avoid error messages when numbers are read from the table.
    6:
    7:   table process all cells        ( countries, [.] = replace all( str([.]), {'[?]', '[??]' }, '' ) );
    8:   table clean                    ( countries, trim spaces );
    9:
   10:   table lift header row          ( countries, Name ); // Find header row automatically
                                                          ^
Calling: table lift header row (countries,{'Name'}

2nd function parameter (columns to check) = {'Name'}:
No row with matching header contents have been found
Table row not found (row number wrong or no match).
Code execution will stop.  Going to interactive mode
-------------------------------------------------------------------------------
Type 'help' for help, 'docs' or 'toc' for online B4P documentation.
>>
Try it yourself: Open GUI_Features_Clean_Data.b4p in B4P_Examples.zip. Decompress before use.