Most data obtained from database and external sources are not in the right format so you can start processing and analyzing the contents. Typical issues include, bugt are not limited to:
B4P provides rich set of functions which allows you to prepare and clean the data
table lift header row() | Identify the header row, then lift the header row to top row, with data blow moving up, too. |
table rename headers() | Rename table headers. |
table correct headers() | Correct the header names if wrong. Do nothing if OK. Useful to eliminate spelling and upper/lower case issues. |
table insert columns() | Introduce additional columns. |
table insert missing columns() | Introduce additional columns if missing, otherwise do nothing. |
table delete columns() | Delete columns not needed. |
table keep columns() | Keep specified columns and delete the other columns. |
table delete blank rows() | Remove all blank rows. |
table delete rows() | Remove 1 or more rows using a specified row number. |
table delete remaining rows() | Delete remaining rows below a specified row number. Various functions are available to search for the end of the table and obtain the row number automatically, for example with table search row(). |
table keep selected rows() | Keep selected rows matching the condition specified. |
table delete selected rows() | The opposite. |
table clean() | General replacement of exotic spaces and line breaks in the UNICODE space to the familar spaces and 'new line', and trimming options for white spaces and line breaks. |
clean num() | Cleans numbers by removing distracting symbols such as 1000-separators, currency symbols, etc. Numbers in parentheses are treated as negative numbers, e.g. (1200) changes to -1200 . |
table process selected rows() | General-purpse function call to process table rows and contents without need to specify loops and variables |
table process cells() | General-purpose function to process all table cells: Example: table process cells( [.] = trim([.]) ); |
The file downloaded from Wikipedia does indeed need some cleanup.
table clean ( c1 );
table delete selected rows ( c1, [0]='',Erde ); // Remove blank and summary rows
table process all cells ( c1, [.] = left( [.], '[' ) ); // Remove cited references
table delete rows ( c1, -1 ); // Delete last row (bottom header row - repeated)
table clean ( c2, trim spaces );
table delete selected rows ( c2, [0]='',World ); // Remove blank and summary rows
table process all cells ( c2, [.] = left( [.], '[' ) ); // Remove cited references
echo( "Table C2: ");
table list ( c2, briefly, 4, last col, 2 ); // List just 3 columns and first and last 4 rows
Table C2:
0 : Country | Population (1 July 2022) | Population (1 July 2023)
1 : India | 1,425,423,212 | 1,438,069,596
2 : China | 1,425,179,569 | 1,422,584,933
3 : United States | 341,534,046 | 343,477,335
4 : Indonesia | 278,830,529 | 281,190,067
... :
234 : Falkland Islands (United Kingdom) | 3.490 | 3.477
235 : Tokelau (New Zealand) | 2.290 | 2.397
236 : Niue (New Zealand) | 1.821 | 1.817
237 : Vatican City | 505 | 496