D - Merge

Prev Next

Overview

In many application cases, information from multiple sources need to be merged and overlapping information consolidated based on rules provided. Data management associated with merging or rearanging mulitple organizations are typical examples. Typical functions to choose from are:

  • table merge() and related functions: Merge two tables where identification columns are defined to identify common data to be consoldated.
  • table lookup() is a suitable alternative combining data from two tables into one.
  • table integrate() works similar liketable lookup(), but applies rules to combine existing data with data looked up, e.g. adding them up.
  • table compare() is useful to check a current and an older data and isolate all relevant differences for further processing.
  • table consolidate() lets you condense the contens in the table using rules provided in order to focus on the essential information you need.



Wikipedia Example (continued from step 3)

Do some validation on this table:

  • Load a list of languages by countries from Wikipedia
  • Align country names in the country table, e.g. 'Viet Nam' to 'Vietnam'.
  • Merge the language data into the country data.

Simple Example

  // Download a list of languages from Wikipedia and do the clean-up first.

  file download overwrite        ( "https://en.wikipedia.org/wiki/List_of_official_languages_by_country_and_territory", languages.html );
  table load                     ( languages, languages.html, HTML, References );
  table process all cells        ( languages, [.] = replace all( str([.]), {'[?]', '[??]', '[citation needed]' }, '' ) );
  table clean              ( languages, trim spaces );

  table delete selected rows     ( languages, ['Country/Region']=Country ); // Headers are repeated inside the long table.
  table keep columns             ( languages, {'Country/Region', Official language} );
  table rename headers           ( languages, {'Country/Region', Official language}, { Country name, Languages } );

  // Merge the data.
  // Note Some languages come without countries (see bottom of table), and vice versa.

  table merge extend columns     ( languages, countries, Country name );
  table list( countries, briefly, 8 ); // List first and last 8 rows only
Merging has been done.
Row #: Code Text _______________________________________________________
    9:   table clean              ( languages, trim spaces );
   10:
   11:   table delete selected rows     ( languages, ['Country/Region']=Country ); // Headers are repeated inside the long table.
   12:   table keep columns             ( languages, {'Country/Region', Official language} );
   13:   table rename headers           ( languages, {'Country/Region', Official language}, { Country name, Languages } );
   14:
   15:   // Merge the data.
   16:   // Note Some languages come without countries (see bottom of table), and vice versa.
   17:
   18:   table merge extend columns     ( languages, countries, Country name );
                                                                             ^
Function call: table merge extend columns(languages,countries,{'Country name'})
Table headers: ISO 3166[1] name[5],World Factbook[6] official state name[a],Sovereignty [6][7][8],ISO 3166-1[2],ISO 3166-1[2] (1),ISO 3166-1[2] (2),ISO 3166-2[3] subdivision codes link,TLD [9]

3rd function parameter (identifier columns to match rows) = {'Country name'}:

Header name 'Country name' not found.
Specified table column name or number is invalid.
Code execution will stop.  Going to interactive mode
-------------------------------------------------------------------------------
Type 'help' for help, 'docs' or 'toc' for online B4P documentation.
>>
Try it yourself: Open GUI_Features_Merge_and_Consolidate.b4p in B4P_Examples.zip. Decompress before use.