Full UNICODE Support

Overview

B4P supports the full UNICODE character set which includes

  • The Basic Multilingual Plane (codes 0 ... 65,535 / U+0000 ... U+FFFF), as well as
  • All 16 additional extended planes (U+10000 ... U+10FFFF), which include many emojis.
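
As a side illustration of how code points map onto planes, here is a minimal Python sketch (not B4P code; the helper name `plane_of` is made up for this example). Each plane holds 65,536 code points, so the plane number is the code point divided by 65,536.

```python
def plane_of(code_point: int) -> int:
    """Return the Unicode plane (0..16) that a code point belongs to."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a valid Unicode code point")
    # Each plane spans 0x10000 code points, so shifting by 16 bits
    # is the same as an integer division by 65,536.
    return code_point >> 16

print(plane_of(0x0041))    # Latin capital letter A: plane 0 (BMP)
print(plane_of(0x1F600))   # an emoji code point: plane 1
print(plane_of(0x10FFFF))  # last valid code point: plane 16
```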

B4P treats every UNICODE character as one single character. For example, 'Café' counts as 4 characters. The full character set is available for variable names, table names, table header names, user function names, path and file names, etc. Internally, to conserve memory when handling large data, all text data is stored and processed in UTF-8 format, but you don't need to worry about this.
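
The difference between counting characters and counting UTF-8 bytes can be sketched in Python (an illustration of the concept only, not B4P code; the helper name `sizes` is an assumption of this example). Python strings also count code points, so the numbers match the character counts described above.

```python
def sizes(text: str) -> tuple[int, int]:
    """Return (number of characters, number of bytes when stored as UTF-8)."""
    return len(text), len(text.encode("utf-8"))

# "Caf\u00e9" is 'Café' written with the precomposed letter é (U+00E9).
print(sizes("Caf\u00e9"))    # (4, 5): é needs two bytes in UTF-8
# U+1F600 is an emoji from one of the additional planes.
print(sizes("\U0001F600"))   # (1, 4): one character, four bytes
```

Plain ASCII text needs only one byte per character in UTF-8, which is why this format keeps memory use low for typical data.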

Note that the Latin, Greek and Cyrillic alphabets contain various characters which look identical, e.g. the capital letter 'A'. Even if these characters are visually identical, they are different characters and compare as unequal. As another example, the Greek letter mu (μ) and the micro sign (µ) are different, too.
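
To make the look-alike issue concrete, the following Python sketch (not B4P code; the variable names are chosen for this example) prints the code points and official names of these characters and shows that they compare as different.

```python
import unicodedata

# Three capital letters that usually render identically.
latin_a, greek_alpha, cyrillic_a = "\u0041", "\u0391", "\u0410"
# The micro sign and the Greek small letter mu.
micro, mu = "\u00b5", "\u03bc"

for ch in (latin_a, greek_alpha, cyrillic_a, micro, mu):
    # Print the code point and official name, e.g. U+0391 GREEK CAPITAL LETTER ALPHA
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

print(latin_a == greek_alpha)   # False
print(latin_a == cyrillic_a)    # False
print(micro == mu)              # False
```

If two names that look the same on screen do not match, checking the code points in this way is a quick method to find a mixed-in character from another alphabet.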

  inhabitants [ Zürich ] = 402000;
  Пётр Чайкoвский [ famous concert ] = Nutcracker; // Piotr Tschaikowski
  echo( inhabitants [ Zürich ] );
  echo( Пётр Чайкoвский [ famous concert ] );
402000
Nutcracker
Try it yourself: Open LAN_Features_Full_UNICODE_Support.b4p in B4P_Examples.zip. Decompress before use.

Loading and Saving Files

B4P will automatically detect the character encoding of input files, e.g. plain ANSI, WIN-1252, Unicode UTF-8, and UTF-16 (little and big endian).
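
How such a detection can work in principle is sketched below in Python. This is a common heuristic and an assumption of this example, not a description of B4P's actual detection logic; the function name `guess_encoding` is hypothetical. It first looks for a byte order mark (BOM) and otherwise checks whether the data is valid UTF-8.

```python
import codecs

def guess_encoding(data: bytes) -> str:
    """Guess a text encoding from raw file bytes (illustrative heuristic only)."""
    # A byte order mark at the start of the file identifies the encoding directly.
    # The BOM itself is not stripped here; only a label is returned.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    # No BOM: text that decodes cleanly as UTF-8 is very likely UTF-8
    # (pure ASCII also lands here, since ASCII is a subset of UTF-8).
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Fall back to a single-byte Windows code page.
        return "cp1252"

print(guess_encoding("Z\u00fcrich".encode("utf-8")))    # utf-8
print(guess_encoding("Z\u00fcrich".encode("cp1252")))   # cp1252
```

The sketch does not handle UTF-16 files without a BOM or UTF-32 files; a real detector needs further checks for those cases.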