See http://www.unicode.org/reports/tr44/#General_Category_Values
for the categories and their corresponding values.

upper: 1022 chars in old ctype and 1610 chars in new ctype
Missing 31 characters of old ctype in new ctype.
Titlecase (Lt):
['0x1c5', '0x1c8', '0x1cb', '0x1f2',
 '0x1f88', '0x1f89', '0x1f8a', '0x1f8b', '0x1f8c', '0x1f8d', '0x1f8e', '0x1f8f',
 '0x1f98', '0x1f99', '0x1f9a', '0x1f9b', '0x1f9c', '0x1f9d', '0x1f9e', '0x1f9f',
 '0x1fa8', '0x1fa9', '0x1faa', '0x1fab', '0x1fac', '0x1fad', '0x1fae', '0x1faf',
 '0x1fbc', '0x1fcc', '0x1ffc']
These characters were wrongly mapped in the earlier i18n file.
All of them have General_Category 'Lt' (Titlecase_Letter) [1],
so they should not be in the upper class.
****************************************************
lower: 1046 chars in old ctype and 2030 chars in new ctype
Missing 5 characters of old ctype in new ctype
Titlecase (Lt): '0x1c5', '0x1c8', '0x1cb', '0x1f2'
Letter Number (Nl): '0x2188'
These 5 characters were wrongly mapped in the earlier i18n file.
Their General_Category is 'Lt' (Titlecase_Letter) or 'Nl' (Letter_Number),
so they should not be in the lower class.
****************************************************
combining3: 794 chars in old ctype and 1239 chars in new ctype
Missing 5 characters of old ctype in new ctype
7.0: 103F;MYANMAR LETTER GREAT SA;Lo;0;L;;;;;N;;;;; (Lo = Other_Letter, syllable)
7.0: 06DE;ARABIC START OF RUB EL HIZB;So;0;ON;;;;;N;;;;; (So = Other_Symbol)
7.0: 0BD0;TAMIL OM;Lo;0;L;;;;;N;;;;; (Lo = Other_Letter, syllable)
7.0: 0375;GREEK LOWER NUMERAL SIGN;Sk;0;ON;;;;;N;;;;; (Sk = Modifier_Symbol, a non-letterlike modifier symbol)
As per the comment in gen-unicode-ctype.c:
/* Up to Unicode 3.0.1 we took the Combining property from the PropList.txt
   file. In 3.0.1 it was identical to the union of the general categories
   "Mn", "Mc", "Me". In Unicode 3.1 this property has been dropped from the
   PropList.txt file, so we take the latter definition.  */
The above characters do not satisfy this criterion.
These characters should not be in the CTYPE combining class.
****************************************************
space: 22 chars in old ctype and 21 chars in new ctype
Missing 1 character of old ctype in new ctype
Format control character:
['0x180e']
U+180E has General_Category "Cf" (Format),
so it should not be in the space class.
****************************************************
graph: 238046 chars in old ctype and 250408 chars in new ctype
Missing 0 characters of old ctype in new ctype
[]
****************************************************
blank: 16 chars in old ctype and 15 chars in new ctype
Missing 1 character of old ctype in new ctype
Format control character:
['0x180e']
U+180E has General_Category "Cf" (Format),
so it should not be in the blank class.
****************************************************
combining: 1189 chars in old ctype and 1830 chars in new ctype
Missing 3 characters of old ctype in new ctype
['0x103f', '0x6de', '0xbd0']
7.0: 103F;MYANMAR LETTER GREAT SA;Lo;0;L;;;;;N;;;;; (Lo = Other_Letter, syllable)
7.0: 06DE;ARABIC START OF RUB EL HIZB;So;0;ON;;;;;N;;;;; (So = Other_Symbol)
7.0: 0BD0;TAMIL OM;Lo;0;L;;;;;N;;;;; (Lo = Other_Letter, syllable)
As per the comment in gen-unicode-ctype.c:
/* Up to Unicode 3.0.1 we took the Combining property from the PropList.txt
   file. In 3.0.1 it was identical to the union of the general categories
   "Mn", "Mc", "Me". In Unicode 3.1 this property has been dropped from the
   PropList.txt file, so we take the latter definition.  */
The above characters do not satisfy this criterion.
These characters should not be in the CTYPE combining class.
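
The reasoning in the sections above can be spot-checked with a short script. This is only a sketch, not the tool that produced this report: it assumes Python 3 and relies on the Unicode database bundled with the interpreter (unicodedata), whose version may differ from Unicode 7.0. The names REMOVED, general_category and is_combining are made up for illustration, and only a few of the titlecase code points are listed.

```python
import unicodedata

# Code points copied from the report, grouped by the class they were
# removed from.  The "upper" list is a subset of the 31 titlecase letters.
REMOVED = {
    "upper": [0x1C5, 0x1C8, 0x1CB, 0x1F2, 0x1F88, 0x1FBC, 0x1FCC, 0x1FFC],
    "lower": [0x1C5, 0x1C8, 0x1CB, 0x1F2, 0x2188],
    "space": [0x180E],
    "combining": [0x103F, 0x06DE, 0x0BD0, 0x0375],
}

# Definition quoted from the gen-unicode-ctype.c comment: the union of the
# general categories "Mn", "Mc" and "Me".
COMBINING_CATEGORIES = {"Mn", "Mc", "Me"}


def general_category(cp):
    """Return the General_Category of a code point, e.g. 'Lt' or 'Cf'."""
    return unicodedata.category(chr(cp))


def is_combining(cp):
    """True if the code point falls under the Mn/Mc/Me definition."""
    return general_category(cp) in COMBINING_CATEGORIES


if __name__ == "__main__":
    print("Unicode version in use:", unicodedata.unidata_version)
    for ctype_class, code_points in REMOVED.items():
        for cp in code_points:
            print(f"{ctype_class:10s} U+{cp:04X} {general_category(cp)}")
```

Running it prints one line per code point with its General_Category, which can be compared against the explanations given for each class.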
****************************************************
digit: 10 chars in old ctype and 10 chars in new ctype
Missing 0 characters of old ctype in new ctype
[]
****************************************************
punct: 143506 chars in old ctype and 146679 chars in new ctype
Missing 16 characters of old ctype in new ctype
Other letter (Lo): '0x103f'
Decimal number (Nd): '0x1090', '0x1091', '0x1092', '0x1093', '0x1094', '0x1095', '0x1096', '0x1097', '0x1098', '0x1099'
Modifier letter (Lm): '0x2e2f', '0x2ec', '0x374', '0xa60c', '0xa67f'
As per http://www.unicode.org/reports/tr44/#General_Category_Values
"Punctuation Pc | Pd | Ps | Pe | Pi | Pf | Po"
Lm, Nd and Lo characters should not be in the punct class.
****************************************************
printc: 238198 chars in old ctype and 250422 chars in new ctype
Missing 0 characters of old ctype in new ctype
[]
****************************************************
xdigit: 22 chars in old ctype and 22 chars in new ctype
Missing 0 characters of old ctype in new ctype
[]
****************************************************
alpha: 94325 chars in old ctype and 104077 chars in new ctype
Missing 447 characters of old ctype in new ctype
The missing characters belong to the following classes.
Number: (Nd) '0x1040', '0x1041', '0x1042', '0x1043', '0x1044', '0x1045', '0x1046', '0x1047', '0x1048', '0x1049', '0x17e0', '0x17e1', '0x17e2', '0x17e3', '0x17e4', '0x17e5', '0x17e6', '0x17e7', '0x17e8', '0x17e9', '0x1810', '0x1811', '0x1812', '0x1813', '0x1814', '0x1815', '0x1816', '0x1817', '0x1818', '0x1819', '0x1946', '0x1947', '0x1948', '0x1949', '0x194a', '0x194b', '0x194c', '0x194d', '0x194e', '0x194f', '0x19d0', '0x19d1', '0x19d2', '0x19d3', '0x19d4', '0x19d5', '0x19d6', '0x19d7', '0x19d8', '0x19d9', '0x1b50', '0x1b51', '0x1b52', '0x1b53', '0x1b54', '0x1b55', '0x1b56', '0x1b57', '0x1b58', '0x1b59', '0x1bb0', '0x1bb1', '0x1bb2', '0x1bb3', '0x1bb4', '0x1bb5', '0x1bb6', '0x1bb7', '0x1bb8', '0x1bb9', '0x1c40', '0x1c41', '0x1c42', '0x1c43', '0x1c44', '0x1c45', '0x1c46', '0x1c47', '0x1c48', '0x1c49', '0x1c50', '0x1c51', '0x1c52', '0x1c53', '0x1c54', '0x1c55', '0x1c56', '0x1c57', '0x1c58', '0x1c59', '0x1d7ce', '0x1d7cf', '0x1d7d0', '0x1d7d1', '0x1d7d2', '0x1d7d3', '0x1d7d4', '0x1d7d5', '0x1d7d6', '0x1d7d7', '0x1d7d8', '0x1d7d9', '0x1d7da', '0x1d7db', '0x1d7dc', '0x1d7dd', '0x1d7de', '0x1d7df', '0x1d7e0', '0x1d7e1', '0x1d7e2', '0x1d7e3', '0x1d7e4', '0x1d7e5', '0x1d7e6', '0x1d7e7', '0x1d7e8', '0x1d7e9', '0x1d7ea', '0x1d7eb', '0x1d7ec', '0x1d7ed', '0x1d7ee', '0x1d7ef', '0x1d7f0', '0x1d7f1', '0x1d7f2', '0x1d7f3', '0x1d7f4', '0x1d7f5', '0x1d7f6', '0x1d7f7', '0x1d7f8', '0x1d7f9', '0x1d7fa', '0x1d7fb', '0x1d7fc', '0x1d7fd', '0x1d7fe', '0x1d7ff', '0x660', '0x661', '0x662', '0x663', '0x664', '0x665', '0x666', '0x667', '0x668', '0x669', '0x6f0', '0x6f1', '0x6f2', '0x6f3', '0x6f4', '0x6f5', '0x6f6', '0x6f7', '0x6f8', '0x6f9', '0x7c0', '0x7c1', '0x7c2', '0x7c3', '0x7c4', '0x7c5', '0x7c6', '0x7c7', '0x7c8', '0x7c9', '0x966', '0x967', '0x968', '0x969', '0x96a', '0x96b', '0x96c', '0x96d', '0x96e', '0x96f', '0x9e6', '0x9e7', '0x9e8', '0x9e9', '0x9ea', '0x9eb', '0x9ec', '0x9ed', '0x9ee', '0x9ef', '0x9f4', '0x9f5', '0x9f6', '0x9f7', '0x9f8', '0x9f9', '0xa620', '0xa621', '0xa622', 
'0xa623', '0xa624', '0xa625', '0xa626', '0xa627', '0xa628', '0xa629',
'0xa66', '0xa67', '0xa68', '0xa69', '0xa6a', '0xa6b', '0xa6c', '0xa6d', '0xa6e', '0xa6f',
'0xa789', '0xa78a',
'0xa900', '0xa901', '0xa902', '0xa903', '0xa904', '0xa905', '0xa906', '0xa907', '0xa908', '0xa909',
'0xaa50', '0xaa51', '0xaa52', '0xaa53', '0xaa54', '0xaa55', '0xaa56', '0xaa57', '0xaa58', '0xaa59',
'0xae6', '0xae7', '0xae8', '0xae9', '0xaea', '0xaeb', '0xaec', '0xaed', '0xaee', '0xaef',
'0xb66', '0xb67', '0xb68', '0xb69', '0xb6a', '0xb6b', '0xb6c', '0xb6d', '0xb6e', '0xb6f', '0xb70',
'0xbe6', '0xbe7', '0xbe8', '0xbe9', '0xbea', '0xbeb', '0xbec', '0xbed', '0xbee', '0xbef',
'0xbf0', '0xbf1', '0xbf2', '0xbf3',
'0xc66', '0xc67', '0xc68', '0xc69', '0xc6a', '0xc6b', '0xc6c', '0xc6d', '0xc6e', '0xc6f',
'0xc78', '0xc79', '0xc7a', '0xc7b', '0xc7c', '0xc7d', '0xc7e', '0xc7f',
'0xce6', '0xce7', '0xce8', '0xce9', '0xcea', '0xceb', '0xcec', '0xced', '0xcee', '0xcef',
'0xd66', '0xd67', '0xd68', '0xd69', '0xd6a', '0xd6b', '0xd6c', '0xd6d', '0xd6e', '0xd6f',
'0xd70', '0xd71', '0xd72', '0xd73', '0xd74', '0xd75', '0xd79',
'0xe50', '0xe51', '0xe52', '0xe53', '0xe54', '0xe55', '0xe56', '0xe57', '0xe58', '0xe59',
'0xed0', '0xed1', '0xed2', '0xed3', '0xed4', '0xed5', '0xed6', '0xed7', '0xed8', '0xed9',
'0xf20', '0xf21', '0xf22', '0xf23', '0xf24', '0xf25', '0xf26', '0xf27', '0xf28', '0xf29',
'0xff10', '0xff11', '0xff12', '0xff13', '0xff14', '0xff15', '0xff16', '0xff17', '0xff18', '0xff19',
'0x104a0', '0x104a1', '0x104a2', '0x104a3', '0x104a4', '0x104a5', '0x104a6', '0x104a7', '0x104a8', '0x104a9'
Punct (Po):
'0xdf4'
Symbol (So):
'0x2129',
'0x249c', '0x249d', '0x249e', '0x249f', '0x24a0', '0x24a1', '0x24a2', '0x24a3', '0x24a4', '0x24a5',
'0x24a6', '0x24a7', '0x24a8', '0x24a9', '0x24aa', '0x24ab', '0x24ac', '0x24ad', '0x24ae', '0x24af',
'0x24b0', '0x24b1', '0x24b2', '0x24b3', '0x24b4', '0x24b5',
'0x9f2', '0x9f3', '0xaf1',
'0xbf4', '0xbf5', '0xbf6', '0xbf7', '0xbf8', '0xbf9', '0xbfa'
Nonspacing_Mark (Mn) (Nukta and Virama):
'0x93c', '0x94d', '0x951', '0x952', '0x953', '0x954', '0x9bc', '0x9cd', '0x9fa',
'0xa3c', '0xa4d', '0xa92b', '0xa92c', '0xa92d', '0xabc', '0xacd',
'0xb3c', '0xb4d', '0xbcd', '0xc4d', '0xcbc', '0xccd', '0xd4d', '0xdca',
'0xe47', '0xe48', '0xe49', '0xe4a', '0xe4b', '0xe4c', '0xe4e'
gen-unicode-ctype.c derived the alpha class with hand-written rules,
which is why these characters were wrongly mapped.
Unicode provides a dedicated file, DerivedCoreProperties.txt, which lists
the characters with the Alphabetic property directly.
The rule used in DerivedCoreProperties.txt for Alphabetic is:
# Derived Property: Alphabetic
#  Generated from: Uppercase + Lowercase + Lt + Lm + Lo + Nl + Other_Alphabetic
****************************************************
cntrl: 67 chars in old ctype and 67 chars in new ctype
Missing 0 characters of old ctype in new ctype
[]
****************************************************
Processing for TOUPPER pair group
Completed processing for TOUPPER pair group
Processing for TOLOWER pair group
Completed processing for TOLOWER pair group
Processing for TOTITLE pair group
Completed processing for TOTITLE pair group

1. http://www.unicode.org/reports/tr44/#General_Category_Values
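
The punct and alpha reasoning above can be illustrated in the same spirit. The sketch below is an assumption about how such a check might look, not the code used by gen-unicode-ctype.c or by the script that produced this report. The function names (is_punct, parse_property) and the SAMPLE lines are hypothetical; SAMPLE is hand-written in the DerivedCoreProperties.txt line format rather than read from the real file, and the category lookups depend on the Unicode version bundled with Python.

```python
import unicodedata

# Punctuation as listed in UAX #44: Pc | Pd | Ps | Pe | Pi | Pf | Po.
PUNCT_CATEGORIES = {"Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po"}

# Code points dropped from punct (first of the Nd run only), per the report.
REMOVED_FROM_PUNCT = [0x103F, 0x1090, 0x2E2F, 0x2EC, 0x374, 0xA60C, 0xA67F]


def is_punct(cp):
    """True if the code point's General_Category is a punctuation category."""
    return unicodedata.category(chr(cp)) in PUNCT_CATEGORIES


def parse_property(lines, prop="Alphabetic"):
    """Collect the code points carrying `prop` from lines in the
    'XXXX..YYYY ; Property # comment' format."""
    code_points = set()
    for line in lines:
        data = line.split("#", 1)[0].strip()   # drop trailing comment
        if not data:
            continue                           # blank or comment-only line
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != prop:
            continue                           # some other property
        first, _, last = fields[0].partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        code_points.update(range(start, end + 1))
    return code_points


# Hand-written sample in the file's format (illustrative, not the full file).
SAMPLE = """\
# Derived Property: Alphabetic
0041..005A    ; Alphabetic # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0061..007A    ; Alphabetic # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
00AA          ; Alphabetic # Lo       FEMININE ORDINAL INDICATOR
0061..007A    ; Lowercase # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
""".splitlines()

alphabetic = parse_property(SAMPLE)

if __name__ == "__main__":
    for cp in REMOVED_FROM_PUNCT:
        print(f"U+{cp:04X} {unicodedata.category(chr(cp))} punct={is_punct(cp)}")
    print("alphabetic code points in sample:", len(alphabetic))
```

Reading the Alphabetic set from the data file, instead of re-deriving it from general categories by hand, is the design choice the alpha section argues for: digits such as U+0660 simply never appear under that property.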