The Character Encoding Translator filter converts the contents of a message from one character encoding to another. It reads characters from the input stream using the specified Input Character Encoding and writes them to the output stream using the specified Output Character Encoding. To keep converted messages readable, you can specify a custom mapping that is applied before the translation, so that an alternative character is used wherever the output encoding has no match for an input character. The replacement character, however, must belong to the output character set.
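The read, map, write sequence described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the filter's actual implementation: the function name `translate_with_mapping` and the representation of the custom mapping as a dictionary of single characters are hypothetical, and Python codec names are used for the encodings.

```python
def translate_with_mapping(data: bytes, input_encoding: str,
                           output_encoding: str,
                           mapping: dict) -> bytes:
    """Hypothetical sketch: decode, apply a custom mapping, re-encode."""
    # Read characters from the raw input bytes using the input encoding.
    text = data.decode(input_encoding)
    # Apply the custom mapping before writing, replacing characters the
    # output encoding cannot represent (assumption: single-character keys/values).
    text = text.translate({ord(src): dst for src, dst in mapping.items()})
    # Write the characters using the output encoding; anything still
    # unrepresentable becomes the default substitution character "?".
    return text.encode(output_encoding, errors="replace")
```

For example, translating the UTF-8 text `£5` to ASCII yields `?5` with an empty mapping, but `L5` with the mapping `{"£": "L"}`, which is the kind of readability improvement the custom mapping is meant to provide.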

Because messages are stored as raw bytes, they must be decoded into characters before they can be manipulated, so the filter needs to know which encoding applies on each side of the translation. For example, if you receive ADT messages in EBCDIC from an IBM mainframe and need to send them in US ASCII to a laboratory system running on a UNIX® system, specify the following:

  • Input Character Encoding: IBM EBCDIC USA, Canada (Bilingual, French), Netherlands, Portugal, Brazil, Australia (Cp037)
  • Output Character Encoding: American Standard Code for Information Interchange (ASCII)

When messages are received from the mainframe, they are translated from EBCDIC to ASCII.
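The EBCDIC-to-ASCII example can be illustrated at the byte level with Python's built-in `cp037` and `ascii` codecs. This is only a sketch of what the translation does to the bytes, not how the filter is implemented. The sample text `MSH|` is the start of an HL7 v2 message header, the segment with which an ADT message begins.

```python
# EBCDIC (code page 037) bytes for the characters "MSH|".
ebcdic = "MSH|".encode("cp037")
print(ebcdic.hex())        # d4e2c84f

# Read the characters using the input encoding (Cp037),
# then write them using the output encoding (ASCII).
ascii_bytes = ebcdic.decode("cp037").encode("ascii")
print(ascii_bytes.hex())   # 4d53487c
```

The characters are unchanged; only their byte values differ between the two encodings.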

When it runs, the filter checks whether the message is XML. If the message is well-formed XML, the filter sets the encoding attribute correctly and changes the byte encoding; if not, it changes only the byte encoding.

This filter does not change the XML preprocessor directive.

Configuration Properties

Property

Description

Input Character Encoding

Identifies the character set used to format the input message.

  • Current Message Encoding (default) - uses the current message encoding, if known, as the input encoding. If the current encoding is not known, the filter attempts to auto-detect it from byte order marks, null characters (for UTF-16 and UTF-32), or the XML preprocessor directive; if none of these identifies the encoding, it uses the default system encoding.
  • XML Auto-Detect - auto-detects the character encoding of an XML document by looking at the preprocessor directive.
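The fallback order described for Current Message Encoding can be sketched as follows. This is a hypothetical illustration, not the filter's code: the function name `detect_encoding`, the exact byte patterns checked, and the use of Python's `locale.getpreferredencoding(False)` as the "default system encoding" are all assumptions. The XML Auto-Detect option roughly corresponds to the XML-declaration step on its own.

```python
import locale
import re
from typing import Optional

# Byte order marks, checked longest first so that a UTF-32 LE mark
# (FF FE 00 00) is not mistaken for a UTF-16 LE mark (FF FE).
BOMS = [
    (b"\x00\x00\xfe\xff", "utf-32"),
    (b"\xff\xfe\x00\x00", "utf-32"),
    (b"\xef\xbb\xbf", "utf-8-sig"),
    (b"\xfe\xff", "utf-16"),
    (b"\xff\xfe", "utf-16"),
]

# encoding="..." inside a leading <?xml ... ?> declaration.
XML_DECLARATION = re.compile(rb'^<\?xml[^>]*encoding\s*=\s*["\']([\w.:-]+)["\']')


def detect_encoding(data: bytes, current: Optional[str] = None) -> str:
    # 1. Use the current message encoding if it is known.
    if current:
        return current
    # 2. Byte order mark.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # 3. Null bytes around the first character suggest UTF-32 or UTF-16
    #    without a byte order mark.
    head = data[:4]
    if len(head) == 4 and head[:3] == b"\x00\x00\x00":
        return "utf-32-be"
    if len(head) == 4 and head[1:] == b"\x00\x00\x00":
        return "utf-32-le"
    if len(head) >= 2 and head[0] == 0 and head[1] != 0:
        return "utf-16-be"
    if len(head) >= 2 and head[0] != 0 and head[1] == 0:
        return "utf-16-le"
    # 4. XML declaration (roughly what XML Auto-Detect does on its own).
    match = XML_DECLARATION.match(data)
    if match:
        return match.group(1).decode("ascii")
    # 5. Fall back to the platform's default encoding.
    return locale.getpreferredencoding(False)
```

Checking byte order marks before null-byte patterns matters: a UTF-16 document with a mark also contains null bytes, and the mark is the more reliable signal.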

Output Character Encoding

Identifies the character set used to format the output message (the default is UTF-8).

Character Mapping File

An auxiliary mapping file that will override the global mapping in characterMapping.xml. Refer to Global Mapping File for details.

Global Mapping File

The characterMapping.xml file:

  • Provides replacements for characters that the destination encoding does not cover and that would otherwise be output as a default character such as ?.
  • Describes the custom mappings for individual encodings and can be edited.
  • Applies globally; however, individual filters can override it by specifying another mapping file in the Character Mapping File property.
  • Must follow the format of the example shown below and be placed in the <RhapsodyInstallDirectory>\Rhapsody\rhapsody\data\charactermap directory. Filters detect this file automatically if it is present.

    Example XML Format
    <?xml version="1.0" encoding="UTF-8" ?>
    <encodings>
      <encoding name="EUC-JP">
        <entry>
          <input>?</input>
          <output>0x00a3</output>
        </entry>
        <entry>
          <input>0xc2a3</input>
          <output>0x0040</output>
        </entry>
        <!-- more entries -->
      </encoding>
      <encoding name="MS932">
        <entry>
          <input>2385</input>
          <output>f</output>
        </entry>
      </encoding>
      <!-- more encodings -->
    </encodings>
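The structure of the mapping file can be read with Python's standard `xml.etree.ElementTree` module. The sketch below is hypothetical (the function name `load_character_mappings` is illustrative, and it keeps the `<input>` and `<output>` values as raw strings); it also reflects the stated rule that when an encoding name is repeated, only the last `<encoding>` element counts.

```python
import xml.etree.ElementTree as ET


def load_character_mappings(xml_bytes: bytes) -> dict:
    """Parse characterMapping.xml content into
    {encoding name: [(input, output), ...]} with values kept as raw strings."""
    root = ET.fromstring(xml_bytes)  # the file is expected to be UTF-8
    mappings = {}
    for encoding in root.findall("encoding"):
        entries = [(entry.findtext("input"), entry.findtext("output"))
                   for entry in encoding.findall("entry")]
        # A later <encoding> element with the same name replaces an earlier one.
        mappings[encoding.get("name")] = entries
    return mappings
```

Passing the file's bytes rather than decoded text lets the parser honour the `encoding="UTF-8"` declaration at the top of the file.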
    

Auxiliary mapping files that override the global file must be attached to the individual filters.

You can specify how often the global mapping file is checked for modifications by changing the CharacterMapService.modificationCheckInterval property in the \Rhapsody\rhapsody\rhapsody.properties file. The value is in seconds, with a minimum of 10, a maximum of 3600, and a default of 60.
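The modification check can be pictured as interval-limited polling of the file's modification time. The class below is a hypothetical sketch of that idea, not Rhapsody's implementation: the class and method names are illustrative, and rejecting out-of-range intervals is an assumption, since the documentation states the limits but not what happens when they are exceeded.

```python
import os
import time


class MappingFileWatcher:
    """Hypothetical sketch: reports whether a mapping file has changed,
    touching the file system at most once per interval (in seconds)."""

    MIN_INTERVAL, MAX_INTERVAL = 10, 3600

    def __init__(self, path, interval=60, clock=time.monotonic):
        # Assumption: values outside 10..3600 are rejected.
        if not self.MIN_INTERVAL <= interval <= self.MAX_INTERVAL:
            raise ValueError(f"interval must be 10-3600 seconds, got {interval}")
        self.path = path
        self.interval = interval
        self.clock = clock
        self.last_check = clock()
        self.last_mtime = os.stat(path).st_mtime_ns

    def modified(self) -> bool:
        now = self.clock()
        if now - self.last_check < self.interval:
            return False  # too soon since the last check; skip it
        self.last_check = now
        mtime = os.stat(self.path).st_mtime_ns
        changed = mtime != self.last_mtime
        self.last_mtime = mtime
        return changed
```

A shorter interval picks up edits sooner at the cost of more frequent file-system checks; the default of 60 seconds is a compromise between the two.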

The mappings for the input and output characters can be specified in any of the following forms:

  • Unicode code points in hexadecimal, prefixed with 0x (for example, 0x00a3).
  • Unicode code points in decimal (for example, 2385).
  • The character itself (for example, f).

The following rules also apply:

  • Duplicate encoding names are not allowed. If duplicate <encoding> elements are present, only the last one is processed.
  • The file must be UTF-8 encoded. This matters most when literal characters are used, because otherwise they will not be read correctly.
  • Unicode hexadecimal values are recommended.
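The three value forms listed above can be sketched as a small parser that turns each `<input>`/`<output>` value into a character and builds a table for Python's `str.translate()`. The function names are hypothetical, and treating every all-digit value as a decimal code point is an assumption made to resolve the ambiguity between decimal values and literal digits.

```python
def parse_mapping_value(token: str) -> str:
    """Return the character denoted by one <input> or <output> value."""
    if token[:2].lower() == "0x":
        # Unicode code point in hexadecimal, e.g. 0x00a3.
        return chr(int(token[2:], 16))
    if token.isascii() and token.isdigit():
        # Unicode code point in decimal, e.g. 2385.
        # Assumption: an all-digit value is always read as decimal.
        return chr(int(token))
    if len(token) == 1:
        # The character itself, e.g. f.
        return token
    raise ValueError(f"unrecognised mapping value: {token!r}")


def build_translation_table(entries):
    """Turn (input, output) value pairs into a table for str.translate()."""
    return {ord(parse_mapping_value(src)): parse_mapping_value(dst)
            for src, dst in entries}
```

Under this sketch's rule, a lone digit such as 5 is read as a decimal code point, so a literal digit would have to be written in hexadecimal (0x0035). Hexadecimal values avoid that ambiguity entirely.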