C - Validate

Prev Next

Overview

In the third step, consider carrying out further checks and alignments. Such steps may include the following:

  • Check if all columns are existing and assert exceptions (automatic error messages) right not and not later.
  • Clean up and align header names.
  • Insert missing columns (in case you want to combine multiple tables later on).
  • Remove any inconsistencies or disturbing contents.
  • Fill in gaps, e.g. repeating contents from the rows above
  • Align contents. Example: You merge 2 country tables, one lists Swasiland and the other Eswatini. Decide for one.

Checking and Aligning Columns and Header Names


table check headers() Check if all columns are existing.
table correct headers() Check for header names matching with specified patterns and correct them. Useful if tables originate from different sources and everyone uses slightly different header names.
table insert columns() Insert additional columns.
table insert missing columns() Insert additional columns only if they are still missing.
table rearrange columns() Put the columns in the order you desire.
table check duplicates() Checks repeated entries in case redundant entries need to be removed or consolidated.



Useful Processing functions to validate the Tables


table process() Process every row in the table with the assignments specified. No loops and variables need to be specified.
table process selected rows() Same as above, but only on the rows fulfilling the condition critiera
table process cells() General-purpose function to process all table cells: Example: table process cells( [.] = trim([.]) );
table process columns() Work through the table columns.
table describe() B4P checks the contents in selected columns and provides description. E.g. you can easily identify if a particular column contains telephone numbers, simple names, dates, and not something else.
table compare() is useful to check a current and an older data and isolate all relevant differences for further processing.



Look up Contents from other Tables


table lookup() Look up for data from other tables.
table lookup ignore case() Also useful, as lookup comparisons ignore cases.
table lookup smart() In this case, the lookup table allows wildcards, commas for multiple choices, etc.
table lookup smart ignore case() Also useful, as lookup comparisons ignore cases.



Wikipedia Example (continued from step 2)

Do some validations

  • Change column headers to more meaningful names in English.
  • Remove disturbing thousad comma signs and decimal points in the numbers
  • Remove summary rows applicable to the whole world.
  • For table c2: Use the most recent population data (multiple heders exist)
  • Some rows contain messy data. Remove these rows in table c1. Aplies to Denmark and China.
  • Use a lookup table to align diverting or ambiguous country names

Simple Example


table correct headers           ( c1, { 'Einwohner*', 'Fläche*', Englischer Name },
                                     { Inhabitants,  Area,      Country,        } );
table keep columns              ( c1, { Country, Area, Inhabitants } );
table process cells in columns  ( c1, { Area, Inhabitants }, [.] = replace(str([.]) / '.',',','.') );
// Let's ignore messy data in CHINA and DENMARK (Free text on HK, Macao, Greenland, Faroer Is.)

table keep selected rows        ( c1, is numeric( [Area] ) );
table process                   ( c1, [Country] = left([Country],' oder ') ); // Remove country name in other language after ' oder'
table lookup smart ignore case  ( c1, Country, Country, country names resolved, Country Name Pattern, Country Name to use );


most recent population[]        = filter( [c2:..,0], 'Population*' ) { -1 }; // Take right most matching header name from row 0

table delete selected rows      ( c2, [Country]=World );
table correct headers           ( c2, { most recent population[], '*Continent*' }, { Population, Continent } );
table process cells in columns  ( c2, Population, [.] = str([.]) / ',' );
table keep columns              ( c2, { Country, Continent, Population } );
table delete selected rows      ( c2, [Country]='*(*)*' );             // Overseas departements with home countries in parentheses
table lookup smart ignore case  ( c2, Country, Country, country names resolved, Country Name Pattern, Country Name to use );

echo( "Table C1: ");
table list                      ( c1, briefly, 4, last col, 2 ); // List just 3 columns and first and last 4 rows

The table has been validated.
Table C1:
    0 : Country                  | Area    | Inhabitants
    1 : Abkhazia                 | 8600    | 240000     
    2 : Afghanistan              | 652230  | 41100000   
    3 : Egypt                    | 1001450 | 103500000  
    4 : Albania                  | 28748   | 2800000    
  ... :
  200 : Vietnam                  | 331210  | 99400000   
  201 : Western Sahara           | 266000  | 600000     
  202 : Central African Republic | 622984  | 5600000    
  203 : Cyprus                   | 9251    | 1300000    

Try it yourself: Open TAB_Features_Validate_and_Explore.b4p in B4P_Examples.zip. Decompress before use.