2024-11-05  Theppitak Karoonboonyanan

	Remove standardized terms from tdict-science.

	* data/tdict-science.txt, data/tdict-spell.txt:
	- Remove "กาแล็กโทส" and "แล็กโทส", which are already in tdict-std.
	- Move spelling variations to tdict-spell.

2024-11-05  Theppitak Karoonboonyanan

	Add entries to std, std-compound.

	* data/tdict-std.txt, data/tdict-std-compound.txt:
	- Add some new entries from the web.

2024-10-30  Arthit Suriyawongkul

	Add data README explaining word lists.

	* +data/README: Added to describe the purpose of each word list.

2024-08-22  Theppitak Karoonboonyanan

	Update std. spelling ทุกฏ -> ทุกกฏ in dictionary.

	According to RI dict 2542 -> 2554 update.

	* tdict-std.txt:
	- Add new spelling 'ทุกกฏ'.
	* tdict-std.txt, tdict-spell.txt:
	- Move old spelling 'ทุกฏ' to tdict-spell.

2024-08-22  Theppitak Karoonboonyanan

	Update word break dictionary.

	* data/tdict-common.txt:
	* data/tdict-proper.txt:
	* data/tdict-science.txt:
	* data/tdict-spell.txt:
	- Add words.
	* data/tdict-common.txt:
	- Correct collation order of 'บิสซิเนส'.

2023-12-06  Theppitak Karoonboonyanan

	Update word break dictionary.

	* data/tdict-common.txt:
	* data/tdict-currency.txt:
	* data/tdict-district.txt:
	* data/tdict-history.txt:
	* data/tdict-ict.txt:
	* data/tdict-lang-ethnic.txt:
	* data/tdict-proper.txt:
	* data/tdict-science.txt:
	* data/tdict-spell.txt:
	* data/tdict-std.txt:
	- Add words.

2023-12-06  Montree Phromthong

	Add country "คอซอวอ".

	* data/tdict-country.txt:
	- Add "คอซอวอ".

2023-12-06  Arthit Suriyawongkul

	Update README.

	* README: Add note on first release date.

2022-11-18  Theppitak Karoonboonyanan

	Update word break dictionary.

	* data/tdict-std-compound.txt:
	- Remove rare compound "สู่สม", which caused weird word
	  segmentation "สู่สม|ดุล".

2022-11-18  Theppitak Karoonboonyanan

	Update word break dictionary.

	* data/tdict-city.txt:
	* data/tdict-common.txt:
	* data/tdict-geo.txt:
	* data/tdict-history.txt:
	* data/tdict-ict.txt:
	* data/tdict-proper.txt:
	* data/tdict-science.txt:
	* data/tdict-slang.txt:
	- Add words.

2022-06-25  Theppitak Karoonboonyanan

	Update word break dictionary.

	* data/tdict-std.txt:
	- Add some entries from on-line Royal Academy dictionary, as found
	  while working on thai-synonym.

2021-12-30  Theppitak Karoonboonyanan

	Update word break dictionary.

	* data/tdict-std.txt:
	- Update some entries from on-line Royal Academy dictionary, as
	  found while working on TeX hyphenation patterns.

2021-12-21  Theppitak Karoonboonyanan

	* NEWS:
	=== Version 0.1.29 ===

2021-12-21  Theppitak Karoonboonyanan

	Update word break dictionary.

	* data/tdict-city.txt:
	* data/tdict-collection.txt:
	* data/tdict-common.txt:
	* data/tdict-ict.txt:
	* data/tdict-proper.txt:
	* data/tdict-science.txt:
	* data/tdict-slang.txt:
	- Add words.

2021-09-18  Theppitak Karoonboonyanan

	Handle possible malloc failures.

	* src/thbrk/brk-maximal.c (brk_maximal_do_impl, brk_recover_try):
	- Handle cases where brk_pool_node_new() possibly returns NULL.
	  In both places, fail gracefully by behaving as if the dict state
	  were single there.

	Partially addresses Issue #15.

2021-09-18  Theppitak Karoonboonyanan

	Fix typo in TIS-620 character name.

	* include/thai/tis.h:
	- Fix typo TIS_[S]YMBOL_BAHT, with backward compatibility macro.

2021-08-31  Theppitak Karoonboonyanan

	Update Doxyfile for doxygen 1.9.1.

	* doc/Doxyfile.in:
	- Apply 'doxygen -u'.
	- Fix invalid PAPER_TYPE 'a4wide' to 'a4'.
	* configure.ac:
	- Bump doxygen required version.

2021-08-31  Theppitak Karoonboonyanan

	Apply 'autoupdate' for autoconf 2.71

	* configure.ac:
	- Quote m4 strings in AC_INIT() parameters.
	- Replace obsolete AC_PROG_LIBTOOL with LT_INIT.
	  With this, drop AC_LIBTOOL_WIN32_DLL, as we haven't really
	  declared dllexport anywhere yet.
	- Replace obsolete AC_LIBTOOL_LINKER_OPTION() with
	  _LT_LINKER_OPTION(). Also quote an m4 string.
	- Replace obsolete AC_HELP_STRING() with AS_HELP_STRING().
	- Replace parameterized AC_OUTPUT() with AC_CONFIG_FILES() and
	  parameter-less AC_OUTPUT.
	- Update AC_PREREQ() from 2.59 to 2.71.

2021-08-19  Theppitak Karoonboonyanan

	Update word break dictionary.

	* data/tdict-common.txt, data/tdict-spell.txt:
	- Move "องคชาติ" from common to spell.

2021-08-19  Theppitak Karoonboonyanan

	Update word break dictionary.

	* data/tdict-common.txt:
	* data/tdict-district.txt:
	* data/tdict-geo.txt:
	* data/tdict-history.txt:
	* data/tdict-lang-ethnic.txt:
	* data/tdict-proper.txt:
	* data/tdict-science.txt:
	* data/tdict-slang.txt:
	* data/tdict-spell.txt:
	- Add words.

2021-03-22  Theppitak Karoonboonyanan

	Prohibit break between Thai/Alpha and Percent

	LB24 of UAX#14 prohibits break between AL and PO, and between
	PO and AL, without example. But cases like "ten%" can be found
	in the internet, and similar usage like "สิบ%" sounds sensible
	in Thai.

	* tests/test_thbrk.c (TestSamples):
	- Add test case
	* src/thbrk/brk-ctype.c (_break_table):
	- (THAI, NUM_NBB): _A -> _I
	- (ALPHA, NUM_NBB): _A -> _I

2021-03-19  Theppitak Karoonboonyanan

	Prohibit break between Currency and Alpha

	LB24 of UAX#14 prohibits break between PR and AL, and between
	AL and PR, without example. But some cases can be found from
	Wikipedia:
	- Brazilian real (usually written as R$)
	- Nicaraguan córdoba (usually written as C$)
	- Other uses: Micro$oft, George Luca$, etc.
	https://en.wikipedia.org/wiki/Dollar_sign

	Similar usage is not clear for Thai.

	* tests/test_thbrk.c (TestSamples):
	- Add test case, with "C++" to catch side-effect of PR and PR
	* src/thbrk/brk-ctype.c (_break_table):
	- (ALPHA, NUM_CUR): _A -> _I
	- (NUM_CUR, ALPHA): _A -> _I
	- (NUM_CUR, NUM_CUR): _A -> _I

2021-03-15  Theppitak Karoonboonyanan

	Prohibit break on the other side of parenthesis

	LB30 of UAX#14 prohibits break in "(s)he" and "person(s)".
	The rule for Thai is more complicated. So, we only cover Latin
	for now.

	* tests/test_thbrk.c (TestSamples):
	- Add test case
	* src/thbrk/brk-ctype.c (_break_table):
	- (ALPHA, NBA): _A -> _I
	- (CLOSE, ALPHA): _A -> _I

2021-03-09  Theppitak Karoonboonyanan

	Prohibit break between Thai/Alpha and number

	LB23 of UAX#14 prohibits break between AL and NU, and between
	NU and AL. This makes sense for Latin. For Thai, prohibiting
	break between AL and NU makes sense for codes like "กข43" as
	a rice strain, but "40บาท" should still be broken.

	* tests/test_thbrk.c (TestSamples):
	- Add test case
	* src/thbrk/brk-ctype.c (_break_table):
	- (ALPHA, NUM): _A -> _I
	- (NUM, ALPHA): _A -> _I
	- (THAI, NUM): _A -> _I

2021-03-05  Theppitak Karoonboonyanan

	Handle Latin acronyms

	LB29 of UAX#14 prohibits break between IS and AL for acronyms
	such as "e.g.", but this causes side-effects on other IS
	punctuations like comma, colon, semicolon as well. This leads to
	an error in case of ",ก" in Unicode LineBreakTest.txt as tested
	by Pango, which I think it makes perfect sense to allow break.

	So, I'm not trying to fix it, but try to extend acronym handling
	to cover Latin instead.

	* tests/test_thbrk.c (TestSamples):
	- Add test case
	* src/thbrk/thbrk.c (th_brk_find_breaks):
	- Rename 'thai_chunk' variable to 'chunk'.
	- Move brk_op() lookup to after chunk handling.
	- Adjust acronym handling to cover BRK_CLASS_ALPHA acronyms.
	- Reset chunk pointers every time break class changes, not just
	  when entering Thai chunk.

2021-03-01  Theppitak Karoonboonyanan

	Prohibit some breaks around number sign

	UAX#14 prohibits break within "#ก", which makes sense for
	hashtags, but its classification as AL would affect cases like
	"hello|สวัสดี". So, re-classify it as QUOTE instead.

	* tests/test_thbrk.c (TestSamples):
	- Add test cases
	* src/thbrk/brk-ctype.c (_char_class):
	- Change class of '#' from BRK_CLASS_ALPHA to BRK_CLASS_QUOTE.

	Partially addresses issue #6.

2021-02-22  Theppitak Karoonboonyanan

	Prohibit break between right parenthesis and currency

	It's common to write "12๎๎฿" instead of "฿12".
	So, line break between right parenthesis and currency, such as
	in "(12)฿", should be prohibited similarly to the left-side form.

	* tests/test_thbrk.c (TestSamples):
	- Add test cases
	* src/thbrk/brk-ctype.h (brk_class_t):
	- Add BRK_CLASS_CLOSE to separate ')', ']', '}' from
	  BRK_CLASS_NBB.
	* src/thbrk/brk-ctype.c (_char_class):
	- Change class of ')', ']', '}' from BRK_CLASS_NBB to
	  BRK_CLASS_CLOSE.
	* src/thbrk/brk-ctype.c (_break_table):
	- Insert row and column for CLOSE, with contents copied from NBB.
	- Change (CLOSE, NUM_CUR): _A -> _P

2021-02-19  Theppitak Karoonboonyanan

	Prohibit break between currency and left parenthesis

	In UAX#14, although LB18 (break after spaces) makes prohibited
	breaks between PR and {NU,OP} in LB25 equivalent to indirect
	breaks, the description of PR Property insists that it must be
	prohibited "even if a space character intervenes", with
	"$ (100.00)" as an example with no break opportunity. [1]

	So, we take them as prohibited breaks.

	[1] https://www.unicode.org/reports/tr14/#PR

	* tests/test_thbrk.c (TestSamples):
	- Add test case as interpreted from UAX#14
	* src/thbrk/brk-ctype.c (_break_table):
	- (NUM_CUR, NBA): _A -> _P

2021-02-18  Theppitak Karoonboonyanan

	Prohibit 2 break cases before percent

	In UAX#14, although LB18 (break after spaces) makes prohibited
	breaks between {NU,CL,CP} and PO in LB25 equivalent to indirect
	breaks, the description of PO Property insists that it must be
	prohibited "even if one or more space characters intervene",
	with "(12.00) %" as an example with no break opportunity. [1]

	So, we take them as prohibited breaks.

	[1] https://www.unicode.org/reports/tr14/#PO

	* tests/test_thbrk.c (TestSamples):
	- Add test case as interpreted from UAX#14
	* src/thbrk/brk-ctype.c (_break_table):
	- (NUM, NUM_NBB): _I -> _P
	- (NBB, NUM_NBB): _A -> _P

2021-02-14  Theppitak Karoonboonyanan

	Rewrite thbrk test for extensibility.

	* tests/test_thbrk.c:
	- Rewrite to read and test from sample list.
	- Encode samples in UTF-8 and convert them as required.

2021-02-14  Theppitak Karoonboonyanan

	Simplify thbrk test.

	* tests/test_thbrk.c:
	- Remove interactive mode.
	- Continue testing, rather than exit, on failures.

2021-02-05  Theppitak Karoonboonyanan

	Provide our own version of INSTALL instruction.

	* INSTALL:
	- Explain simple installation steps and build requirements.
	* .gitignore:
	- Remove INSTALL ignoring.

	This is to address a frequently found issue when
	'autoconf-archive' is missing, like in issue #9.

2021-01-28  Theppitak Karoonboonyanan

	Another protection against invalid array access.

	In addition to commit 8e7ab6967f862a409352f7e8985fd4b445d37f4d.

	* src/thbrk/thbrk.c (th_brk_find_breaks):
	- Protect against accessing pos[-1] which may be caused by
	  a failure in brk_maximal_do().

	Thanks Ratchanan Srirattanamet for the report via personal mail.

2020-08-16  Theppitak Karoonboonyanan

	Update word break dictionary.

	* data/Makefile.am, +data/tdict-currency.txt:
	- Add new category.
	* data/tdict-city.txt:
	* data/tdict-common.txt:
	* data/tdict-district.txt:
	* data/tdict-proper.txt:
	* data/tdict-science.txt:
	* data/tdict-slang.txt:
	* data/tdict-spell.txt:
	- Add words.
	* data/tdict-science.txt:
	- Fix typo "อัลลอยด์" -> "อัลลอย".

2020-03-23  Ross Burton

	configure.ac: remove duplicate AC_CONFIG_MACRO_DIR

	Autoconf 2.70 will fatally error out if AC_CONFIG_MACRO_DIR is
	called more than once:

	| configure.ac:25: error: AC_CONFIG_MACRO_DIR can only be used once

	* configure.ac:
	- Remove duplicate AC_CONFIG_MACRO_DIR

2020-01-14  Theppitak Karoonboonyanan

	Update word break dictionary.

	* data/tdict-city.txt:
	* data/tdict-common.txt:
	* data/tdict-district.txt:
	* data/tdict-geo.txt:
	* data/tdict-history.txt:
	* data/tdict-ict.txt:
	* data/tdict-lang-ethnic.txt:
	* data/tdict-proper.txt:
	* data/tdict-science.txt:
	* data/tdict-slang.txt:
	* data/tdict-spell.txt:
	- Add words.
	* data/tdict-proper.txt, data/tdict-city.txt:
	- Move "ลิเวอร์พูล" from proper to city.
	* data/tdict-proper.txt, data/tdict-history.txt:
	- Move "เมจิ" from proper to history.
	* data/tdict-common.txt:
	- Correct some sorting order.

2019-12-20  Theppitak Karoonboonyanan

	Use GitHub issue tracker as bug report address.

	* configure.ac:
	- Replace bug report e-mail address with GitHub issue tracker URL.

2018-12-17  Theppitak Karoonboonyanan

	Split compound in dictionary.

	* data/tdict-std.txt:
	- Split "ชิงช้าชาลี" into a new entry "ชาลี", as is the case in
	  recent version of Royal Institute dictionary.

2018-08-01  Theppitak Karoonboonyanan

	* NEWS:
	=== Version 0.1.28 ===

2018-08-01  Theppitak Karoonboonyanan

	Update README.

	* README:
	- Use HTTPS in project URL.

2018-08-01  Theppitak Karoonboonyanan

	Update library versioning.

	* configure.ac:
	- Bump library revision to reflect code change.

2018-08-01  Theppitak Karoonboonyanan

	Update word break dictionary.

	* data/tdict-city.txt:
	* data/tdict-common.txt:
	* data/tdict-proper.txt:
	* data/tdict-science.txt:
	* data/tdict-spell.txt:
	- Add words.

2018-06-21  Theppitak Karoonboonyanan

	Fix warning on comma at the end of enum list

	* src/thbrk/brk-ctype.h (brk_op_t):
	- Remove comma at the end of enum list (-Wpedantic)

2018-06-14  Theppitak Karoonboonyanan

	Adjust pointer declaration style

	* src/thbrk/brk-common.c (brk_load_default_dict):
	- Adjust pointer declaration style for 'dict_trie'

2018-06-14  Theppitak Karoonboonyanan

	Avoid non-ANSI C snprintf()

	* src/thbrk/brk-common.c (+full_path, brk_load_default_dict):
	- Instead of preparing full path name with snprintf(), which
	  is non-ANSI, and still risks path name trimming, do it with
	  size-calculated malloc().
	- free() it as needed.

2018-06-12  Theppitak Karoonboonyanan

	Update word break dictionary.

	* data/tdict-common.txt:
	* data/tdict-district.txt:
	* data/tdict-ict.txt:
	* data/tdict-proper.txt:
	* data/tdict-spell.txt:
	- Add words suggested by @nuttee15 in issue #2
	  https://github.com/tlwg/libthai/issues/2#issuecomment-396442354
	  https://github.com/tlwg/libthai/issues/2#issuecomment-396456551
	  Thank you very much!

2018-06-12  Theppitak Karoonboonyanan

	Add 2 words from new RI dictionary.

	* data/tdict-std.txt:
	- Add 'กายินทรีย์' and 'กาเยนทรีย์' from new RI dict.

2018-06-12  Theppitak Karoonboonyanan

	Update word break dictionary.

	* data/tdict-city.txt:
	* data/tdict-common.txt:
	* data/tdict-district.txt:
	* data/tdict-geo.txt:
	* data/tdict-history.txt:
	* data/tdict-ict.txt:
	* data/tdict-lang-ethnic.txt:
	* data/tdict-proper.txt:
	* data/tdict-science.txt:
	* data/tdict-slang.txt:
	* data/tdict-spell.txt:
	- Add words.

2018-05-31  Theppitak Karoonboonyanan

	Add missing includes in header files.

	* include/thai/thwbrk.h:
	* include/thai/thwcoll.h:
	* include/thai/thwinp.h:
	* include/thai/thwrend.h:
	- Add missing #include

2018-01-08  Theppitak Karoonboonyanan

	Move some entries in tdict-common to proper files.

	* data/Makefile.am, +data/tdict-slang.txt:
	- Add file for keeping slangs.
	* data/tdict-common.txt:
	* data/tdict-ict.txt:
	* data/tdict-lang-ethnic.txt:
	* data/tdict-proper.txt:
	* data/tdict-science.txt:
	* data/tdict-slang.txt:
	* data/tdict-spell.txt:
	- Move entries in tdict-common to more proper files.

2018-01-08  Theppitak Karoonboonyanan

	Update word break dictionary.

	* data/tdict-common.txt:
	* data/tdict-country.txt:
	* data/tdict-district.txt:
	* data/tdict-ict.txt:
	* data/tdict-lang-ethnic.txt:
	* data/tdict-proper.txt:
	* data/tdict-science.txt:
	* data/tdict-spell.txt:
	- Add words.
	* data/tdict-history.txt:
	- Replace "ลาลูแบร์" with just "ลูแบร์".

2017-10-25  Theppitak Karoonboonyanan

	* NEWS:
	=== Version 0.1.27 ===

2017-10-25  Theppitak Karoonboonyanan

	Update word break dictionary.

	* data/tdict-city.txt:
	* data/tdict-common.txt:
	* data/tdict-district.txt:
	* data/tdict-geo.txt:
	* data/tdict-history.txt:
	* data/tdict-ict.txt:
	* data/tdict-lang-ethnic.txt:
	* data/tdict-proper.txt:
	* data/tdict-science.txt:
	* data/tdict-spell.txt:
	- Add words.
	* data/tdict-proper.txt:
	- Correct order for some entries.

2017-08-07  Theppitak Karoonboonyanan

	Strip prefix for C source files in doc

	* doc/Doxyfile.in:
	- Append "@top_srcdir@/src" and "@top_srcdir@/tests" to
	  STRIP_FROM_PATH, so that references to source directories in
	  the generated document do not capture full build path.

	Defect caught by Debian Reproducible

2017-06-29  Theppitak Karoonboonyanan

	Remove duplicated dict entry.

	* data/tdict-common.txt:
	- Remove 'ดัมพ์' which is already in tdict-std.
	* data/tdict-common.txt:
	* data/tdict-spell.txt:
	- Move 'ดั๊มพ์' from tdict-common to tdict-spell.

2017-06-29  Theppitak Karoonboonyanan

	Update word break dictionary.

	* data/tdict-city.txt:
	* data/tdict-common.txt:
	* data/tdict-district.txt:
	* data/tdict-geo.txt:
	* data/tdict-ict.txt:
	* data/tdict-lang-ethnic.txt:
	* data/tdict-proper.txt:
	* data/tdict-science.txt:
	* data/tdict-spell.txt:
	- Add words.

2016-12-14  Theppitak Karoonboonyanan

	* NEWS:
	=== Version 0.1.26 ===

2016-12-14  Theppitak Karoonboonyanan

	Include git-version-gen in tarball

	* Makefile.am:
	- Add build-aux/git-version-gen to EXTRA_DIST.

2016-12-14  Theppitak Karoonboonyanan

	Move word.

	* data/tdict-common.txt, data/tdict-ict.txt:
	- Move 'แอดมิน' from tdict-common to tdict-ict.

2016-12-14  Theppitak Karoonboonyanan

	Update word break dictionary.

	* data/tdict-city.txt:
	* data/tdict-common.txt:
	* data/tdict-geo.txt:
	* data/tdict-ict.txt:
	* data/tdict-proper.txt:
	* data/tdict-science.txt:
	* data/tdict-spell.txt:
	- Add words.

2016-07-09  Theppitak Karoonboonyanan

	Move some uncommon compound names out of tdict-std dictionary.

	* data/tdict-std.txt, data/tdict-std-compound.txt:
	- Move 'กระดองหาย', 'กระดาดขาว', 'นางรำ', 'นางหงส์' to
	  tdict-std-compound.

2016-07-08  Theppitak Karoonboonyanan

	Move compounds out of tdict-std dictionary.

	* data/tdict-std.txt:
	- Remove long compound 'ตะลึงพรึงเพริด'.
	- Keep only 'สะพรึง' instead of 'สะพรึงกลัว'.
	* data/tdict-std.txt, data/tdict-std-compound.txt:
	- Move compounds 'ตะลุ่มนก', 'ตากวาง', 'ตานนกกด' to
	  tdict-std-compound.

2016-07-08  Theppitak Karoonboonyanan

	Match git tag with version pattern.

	* build-aux/git-version-gen:
	- Apply --match pattern to git-describe to prevent confusion
	  with other tags.
	- Drop unused LF var.

2016-07-07  Theppitak Karoonboonyanan

	Use versioning based on Git snapshot.

	* Makefile.am:
	- Add dist-hook to generate VERSION file on tarball generation.
	* +build-aux/git-version-gen:
	- Add script to generate version based on 'git describe' if in
	  git tree, or using VERSION file if in release tarball.
	* configure.ac:
	- Call git-version-gen to get package version.

2016-06-28  Theppitak Karoonboonyanan

	* configure.ac, NEWS:
	=== Version 0.1.25 ===

2016-06-28  Theppitak Karoonboonyanan

	Update word break dictionary.

	* data/tdict-common.txt:
	* data/tdict-district.txt:
	* data/tdict-ict.txt:
	* data/tdict-proper.txt:
	* data/tdict-science.txt:
	* data/tdict-spell.txt:
	- Add words.

2016-06-27  Theppitak Karoonboonyanan

	Document about multi-thread use.

	* src/thbrk/thbrk.c (th_brk_new):
	- Add documentation about deleting after use, and about
	  creating/destroying word breaker instance in critical sections.

2016-06-26  Theppitak Karoonboonyanan

	Use 'wc' prefix instead of suffix for wide-char word break APIs.

	To be consistent with other modules, wide-char APIs should be
	prefixed, not suffixed with 'wc'.

	* include/thai/thwbrk.h:
	* src/libthai.c:
	* src/libthai.def:
	* src/libthai.map:
	* src/thwbrk/thwbrk.c:
	* tests/test_thwbrk.c:
	- Rename th_brk_find_breaks_wc() -> th_brk_wc_find_breaks().
	- Rename th_brk_insert_breaks_wc() -> th_brk_wc_insert_breaks().

2016-06-25  Theppitak Karoonboonyanan

	Rename the new APIs to be more meaningful.

	The methods for ThBrk I proposed to Mark Brown in our discussion
	were too confusing. Before they get public, let's pick more
	meaningful names instead:

	- th_brk_brk() -> th_brk_find_breaks()
	- th_brk_brk_line() -> th_brk_insert_breaks()
	- th_brk_wbrk() -> th_brk_find_breaks_wc()
	- th_brk_wbrk_line() -> th_brk_insert_breaks_wc()

	* include/thai/thbrk.h:
	* include/thai/thwbrk.h:
	* src/libthai.c:
	* src/libthai.def:
	* src/libthai.map:
	* src/thbrk/thbrk.c:
	* src/thwbrk/thwbrk.c:
	* tests/test_thbrk.c:
	* tests/test_thwbrk.c:
	- Rename functions as listed above.
	- Rename the 'n' argument in the functions to indicate whose size
	  it describes.

2016-06-24  Theppitak Karoonboonyanan

	Add function deprecation warnings.

	* include/thai/thailib.h:
	- Add TH_DEPRECATED and TH_DEPRECATED_FOR() macros, taken from
	  GLib.
	* include/thai/thbrk.h (th_brk, th_brk_line):
	* include/thai/thwbrk.h (th_wbrk, th_wbrk_line):
	- Apply TH_DEPRECATED_FOR() to deprecated functions, with
	  suggestions for alternatives.

2016-06-24  Theppitak Karoonboonyanan

	Update tests to test new APIs.

	* tests/test_thbrk.c (main), tests/test_thwbrk.c (main):
	- Create, use, and free ThBrk instance with the new APIs instead
	  of the deprecated ones.

2016-06-23  Theppitak Karoonboonyanan

	Update documentation for API deprecation.

	* src/thbrk/thbrk.c, src/thwbrk/thwbrk.c:
	- Add info for deprecated functions.
	- Add availability info for new functions.
	* src/thinp/thinp.c:
	- Add availability info for th_validate_leveled().
	* src/libthai.c:
	- Add missing link to th_validate_leveled().

2016-06-23  Theppitak Karoonboonyanan

	Update library versioning.

	* configure.ac:
	- Bump library versions to reflect the added API.

2016-06-22  Mark Brown

	Implement a new thread-safe interface for word break.

	To achieve more thread-safety without depending on mutex
	mechanisms, a new set of APIs is added so that the client can
	create a shared instance of word break engine by him/herself
	under appropriate mutex.
	Then, the word break functions can be safely called in parallel
	using the shared engine.

	* include/thai/thbrk.h:
	* include/thai/thwbrk.h:
	* src/libthai.c:
	* src/libthai.def:
	* src/libthai.map:
	- Add new exported APIs: th_brk_new(), th_brk_delete(),
	  th_brk_brk(), th_brk_brk_line(), th_brk_wbrk(),
	  th_brk_wbrk_line().
	* src/thbrk/brk-common.h, src/thbrk/brk-common.c
	(-brk_on_unload, -brk_get_dict, +brk_load_default_dict):
	- Remove old shared dict management. It is to be part of the
	  ThBrk implementation in ThBrk layer instead.
	- The logic for finding and loading dictionary at default paths
	  is still retained here.
	* src/thbrk/thbrk.c (th_brk_new, th_brk_delete):
	- Implement ThBrk (de)allocation, with dictionary loading at
	  specified path or at default paths if not specified.
	* src/thbrk/brk-maximal.h, src/thbrk/brk-maximal.c
	(struct _BrkEnv, brk_env_new):
	- Add ThBrk engine as a member of BrkEnv.
	* src/thbrk/brk-maximal.c (brk_root_pool):
	- Access dict trie from ThBrk in BrkEnv instead of getting
	  shared dict directly.
	* src/thbrk/thbrk.c
	(th_brk -> th_brk_brk, th_brk_line -> th_brk_brk_line):
	* src/thwbrk/thwbrk.c
	(th_wbrk -> th_brk_wbrk, th_wbrk_line -> th_brk_wbrk_line):
	- Modify old functions to new ones by adding ThBrk* parameter.
	* src/thbrk/Makefile.am, +src/thbrk/thbrk-priv.h,
	src/thbrk/thbrk.c (brk_get_shared_brk, brk_free_shared_brk):
	- Add functions for managing the shared engine to preserve old
	  behavior.
	* src/libthai.c (_libthai_on_unload):
	- Call brk_free_shared_brk() on unload.
	* src/thbrk/thbrk.c (th_brk, th_brk_line):
	* src/thwbrk/thwbrk.c (th_wbrk, th_wbrk_line):
	- Provide old APIs as wrappers to the new APIs, for backward
	  compatibility.

	Merging pull request #1.

2016-05-02  Theppitak Karoonboonyanan

	Update word break dictionary.

	* data/tdict-common.txt:
	* data/tdict-spell.txt: Add words.

2016-03-28  Theppitak Karoonboonyanan

	Do not test word breaking if dict is disabled.

	* tests/Makefile.am:
	- Only add test-thbrk.sh and test-thwbrk.sh to TESTS when
	  dictionary is enabled.

2016-03-03  Theppitak Karoonboonyanan

	Get rid of unused variable.

	* src/thrend/thrend.c (tis620_0_shape_table_):
	- Comment out unused variable reported by GCC 6.

2016-01-28  Theppitak Karoonboonyanan

	Make source more readable.

	* src/thbrk/brk-common.c (brk_get_dict):
	- Break assignment and comparison chain on reading
	  LIBTHAI_DICTDIR environment.

2016-01-20  Theppitak Karoonboonyanan

	Fix compilation error with GCC 6

	* include/thai/thctype.h (_th_bitmsk):
	- Use unsigned int instead of unsigned short for bitmask base,
	  as GCC 6 takes ~0 as -1 for unsigned short, and causes
	  compilation error for scim-thai.

	Thanks Martin Michlmayr for the report via Debian #811690
	http://bugs.debian.org/811690

2016-01-20  Theppitak Karoonboonyanan

	* configure.ac: Post-release version suffix added.

2015-11-22  Theppitak Karoonboonyanan

	* configure.ac:
	- Bump library revision to reflect code changes.
	* configure.ac, NEWS:
	=== Version 0.1.24 ===

2015-11-22  Theppitak Karoonboonyanan

	Update word break dictionary.

	* data/tdict-city.txt:
	* data/tdict-common.txt:
	* data/tdict-proper.txt:
	* data/tdict-science.txt: Add words.

2015-11-22  Theppitak Karoonboonyanan

	Fix infinite loop introduced by recent optimization.

	* src/thbrk/brk-maximal.c (brk_recover_try):
	- Update 'pool' correctly when deleting matched BrkPool node.
	  brk_pool_delete_node() cannot be started at pool_tail indeed.

	Thanks Zack Weinberg for the report via Debian #805703
	http://bugs.debian.org/805703

2015-11-22  Theppitak Karoonboonyanan

	* configure.ac: Post-release version suffix added.

2015-10-22  Theppitak Karoonboonyanan

	* configure.ac:
	- Bump library revision to reflect code changes.
	* configure.ac, NEWS:
	=== Version 0.1.23 ===

2015-10-21  Theppitak Karoonboonyanan

	Update word break dictionary.

	* data/tdict-city.txt:
	* data/tdict-common.txt:
	* data/tdict-country.txt:
	* data/tdict-ict.txt:
	* data/tdict-lang-ethnic.txt:
	* data/tdict-proper.txt:
	* data/tdict-science.txt:
	* data/tdict-spell.txt: Add words.

2015-10-16  Theppitak Karoonboonyanan

	Share BrkEnv longer.

	The code before the threadsafe commit used to initialize the
	free list earlier in th_brk() and then called brk_maximal_do()
	multiple times. This allowed the free list to be shared among
	all such calls. The threadsafe commit did it deep in
	brk_maximal_do_impl(), limiting the share within the single
	brk_maximal_do() call only. We should widen the use to be the
	same as previously done.

	* src/thbrk/brk-maximal.h (+brk_env_new, +brk_env_free):
	- Re-add initialization/quit functions in terms of BrkEnv.
	- Add BrkEnv* arg to brk_maximal_do().
	* src/thbrk/brk-maximal.c
	(brk_env_init -> brk_env_new, brk_env_destroy -> brk_env_free):
	- Turn the init/destroy functions into new/free version.
	* src/thbrk/brk-maximal.c (brk_maximal_do, brk_maximal_do_impl):
	- Add the BrkEnv* arg and use it instead of the locally created
	  one.
	* src/thbrk/thbrk.c (th_brk):
	- Create BrkEnv object at start and free it at end.
	- Add the env arg to brk_maximal_do() calls.

	This appears to save 2.4% more of the total runtime and makes it
	on par with the code before the threadsafe commit again.

2015-10-14  Theppitak Karoonboonyanan

	Call brk_pool_match() linearly.

	* src/thbrk/brk-maximal.c (brk_maximal_do_impl, brk_recover_try):
	- Start next brk_pool_match() on node next to previously matched
	  one, instead of the pool head, making the match O(n) instead of
	  O(n^2).
	- [brk_maximal_do_impl] Also prevent using the deleted node to
	  start the next round.
	- [brk_recover_try] Also start brk_pool_delete_node() on the pool
	  tail, making it also linear. This is not possible for
	  brk_maximal_do_impl, as the node to be deleted is not fixed.

	This appears to save time by 8.85% on brk_pool_match() calls,
	lowering its bottleneck rank by 2 positions, and saves time by
	0.0388% of total run time.

2015-10-14  Theppitak Karoonboonyanan

	Declare variable inside block.

	* src/thbrk/brk-maximal.c (brk_maximal_do_impl, brk_recover_try):
	- Move "BrkPool *match" declaration into the block it's used.

2015-10-12  Theppitak Karoonboonyanan

	Declare const arg for brk_pool_match().

	* src/thbrk/brk-maximal.c (brk_pool_match):
	- Add const-ness to the read-only 'node' arg.

2015-10-11  Theppitak Karoonboonyanan

	Minor optimization on brk_pool_match().

	Callgrind reports brk_pool_match() among top bottlenecks.
	So, try to optimize it as we can.

	* src/thbrk/brk-maximal.c (brk_pool_match):
	- Instead of sharing the loop for two different cases, check the
	  case once and run loop for each case separately, to minimize
	  branching down to a single time.
	- Evaluate the unchanged 'node->shot.brk_pos[node_cur_pos - 1]'
	  expression only once, instead of on every round.

	This appears to save time by 9.3% for total brk_pool_match()
	calls, and by 0.067% of total run time.

2015-10-11  Theppitak Karoonboonyanan

	Fix 'make check' error on long path names.

	* tests/thsort.c (main):
	- Increase file name buffers from 64 to 512, to afford long path
	  names in command-line arguments.

2015-10-10  Theppitak Karoonboonyanan

	Replace static global free-list with local one.

	The static global free-list in brk-maximal.c makes it not
	thread-safe.

	* src/thbrk/brk-maximal.c:
	- Add type BrkEnv to keep break pool free list, with
	  brk_env_init() and brk_env_destroy() methods.
	- (brk_pool_node_new, brk_pool_node_free, brk_pool_free,
	  brk_pool_delete, brk_root_pool, brk_recover_try, brk_recover):
	  Add (BrkEnv *) arg to relevant functions and use it properly.
	- (brk_maximal_do): Create and destroy BrkEnv instance for use
	  locally. Pass it to the required function calls.
	- (brk_pool_delete -> brk_pool_delete_node): Rename function to
	  avoid confusion with brk_pool_free().
	* src/thbrk/brk-maximal.c:
	* src/thbrk/brk-maximal.h:
	* src/thbrk/thbrk.c:
	- Get rid of now-unnecessary brk_maximal_init() and
	  brk_maximal_quit().

	Thanks Behdad Esfahbod for the report.

2015-09-07  Theppitak Karoonboonyanan

	Fix doxygen version checking.

	* configure.ac:
	- Correctly compare doxygen versions. Simple expr comparison
	  didn't work with version 1.8.10.

	Taken from Petr Gajdos's patch for libdatrie.

2015-06-04  Theppitak Karoonboonyanan

	Protect against invalid array access.

	* src/thbrk/thbrk.c (th_brk):
	- Add a condition to protect against accessing pos[-1] which may
	  be caused by a failure in brk_maximal_do() on the first Thai
	  chunk, which makes cur_pos remain 0.

	Thanks Behdad Esfahbod for the report.

2015-06-04  Theppitak Karoonboonyanan

	* configure.ac: Post-release version suffix added.

2015-05-08  Theppitak Karoonboonyanan

	* configure.ac, NEWS:
	=== Version 0.1.22 ===

2015-05-08  Theppitak Karoonboonyanan

	* configure.ac: Bump library revision, due to code change.

2015-05-08  Theppitak Karoonboonyanan

	Fix 'make distcheck' failure.

	* doc/Makefile.am:
	- Remove doxygen db file on clean.

2015-05-06  Theppitak Karoonboonyanan

	Update word break dictionary.

	* data/tdict-common.txt:
	* data/tdict-ict.txt:
	* data/tdict-proper.txt:
	* data/tdict-science.txt: Add words.

2015-05-06  Theppitak Karoonboonyanan

	Use my Gmail e-mail address everywhere.

	* AUTHORS:
	* include/thai/thailib.h:
	* include/thai/thbrk.h:
	* include/thai/thcell.h:
	* include/thai/thcoll.h:
	* include/thai/thctype.h:
	* include/thai/thinp.h:
	* include/thai/thrend.h:
	* include/thai/thstr.h:
	* include/thai/thwbrk.h:
	* include/thai/thwchar.h:
	* include/thai/thwcoll.h:
	* include/thai/thwctype.h:
	* include/thai/thwinp.h:
	* include/thai/thwrend.h:
	* include/thai/thwstr.h:
	* include/thai/tis.h:
	* include/thai/wtt.h:
	* man/man3/libthai.3:
	* man/man3/thctype.3:
	* man/man3/wtt.3:
	* man/template.3:
	* src/libthai.c:
	* src/thbrk/brk-common.c:
	* src/thbrk/brk-common.h:
	* src/thbrk/brk-ctype.c:
	* src/thbrk/brk-ctype.h:
	* src/thbrk/brk-maximal.c:
	* src/thbrk/brk-maximal.h:
	* src/thbrk/thbrk.c:
	* src/thcell/thcell.c:
	* src/thcoll/cweight.c:
	* src/thcoll/cweight.h:
	* src/thcoll/thcoll.c:
	* src/thctype/thctype.c:
	* src/thctype/wtt.c:
	* src/thinp/thinp.c:
	* src/thrend/thrend.c:
	* src/thstr/thstr.c:
	* src/thwbrk/thwbrk.c:
	* src/thwchar/thwchar.c:
	* src/thwctype/thwctype.c:
	* src/thwstr/thwstr.c:
	* tests/thsort.c:
	- Replace my e-mail address with the Gmail one.

2015-05-06  Theppitak Karoonboonyanan

	More LIKELY/UNLIKELY hints.

	* src/thbrk/brk-maximal.c (brk_recover):
	- Add UNLIKELY to recovery history hits, reducing time by 0.03%

2015-05-06  Theppitak Karoonboonyanan

	Split LIKELY/UNLIKELY to a separate header.

	This prevents modules from depending on brk-common.h, which
	provides low-level APIs for breaker backends.

	* src/thbrk/Makefile.am, src/thbrk/brk-common.h
	+src/thbrk/thbrk-utils.h:
	- Move LIKELY/UNLIKELY macro definitions from brk-common.h to
	  a new header thbrk-utils.h
	* src/thbrk/brk-common.c, src/thbrk/brk-maximal.c,
	src/thbrk/thbrk.c:
	- Include thbrk-utils.h instead of brk-common.h for
	  LIKELY/UNLIKELY.

2015-05-04  Theppitak Karoonboonyanan

	More LIKELY/UNLIKELY hints.

	* src/thbrk/thbrk.c (th_brk_line):
	- Add UNLIKELY() on error conditions.

	Note: adding UNLIKELY() to th_brk() first if-statement makes it
	slower, which is weird. So, not added there.

2015-05-02 Theppitak Karoonboonyanan * configure.ac: Bump library revision, due to code change. 2015-05-02 Theppitak Karoonboonyanan Update word break dictionary. * data/tdict-common.txt: * data/tdict-ict.txt: * data/tdict-proper.txt: * data/tdict-science.txt: * data/tdict-spell.txt: Add words. 2015-04-30 Theppitak Karoonboonyanan Catch error on break hints allocation. * src/thbrk/brk-maximal.c (brk_maximal_do): - Do not continue if brkpos_hints allocation fails. 2015-04-30 Theppitak Karoonboonyanan Do TIS-to-Unicode conversion only once on word breaking. * src/thbrk/brk-maximal.c (brk_maximal_do): - Convert input string to Unicode before passing to brk_maximal_do_impl(). * src/thbrk/brk-maximal.c (brk_maximal_do_impl, brk_recover, brk_recover_try): - Redeclare to accept wide char string. - Do not call th_tis2uni() on every input char. This appears to reduce word breaking time by 0.28% on test case. Thanks edgehogapp@gmail.com for the suggestion. https://groups.google.com/forum/#!topic/thai-linux-foss-devel/Be-OLMRYF7M 2015-04-29 Theppitak Karoonboonyanan Bump doxygen required version. * configure.ac: - Bump doxygen required version to 1.8.8, according to recent Doxyfile update. 2015-03-07 Theppitak Karoonboonyanan Also check for 'trietool' (without -0.2 suffix) * configure.ac: - Check for both trietool-0.2 (for libdatrie < 0.2.9) and trietool (for libdatrie >= 0.2.9) utility. 2015-03-06 Theppitak Karoonboonyanan Update Doxyfile. * doc/Doxyfile.in: - Updated for doxygen 1.8.8 with 'doxygen -u'. 2015-03-06 Theppitak Karoonboonyanan Fix compiler warning. * src/thbrk/brk-maximal.c (brk_recover): - Initialize last_brk_pos on declaration. 2015-02-05 Theppitak Karoonboonyanan Disable timestamp in Doxygen-generated doc. * doc/Doxyfile.in: - Set HTML_TIMESTAMP to NO to make the document reproducible. (reported by Debian Reproducible) 2015-02-04 Theppitak Karoonboonyanan Micro-optimize with likely/unlikely hints. 
* src/thbrk/brk-common.h: - Add LIKELY() and UNLIKELY() macros based on compiler extension. * src/thbrk/brk-common.c (brk_get_dict): * src/thbrk/brk-maximal.c (brk_maximal_do_impl, brk_root_pool, brk_shot_init, brk_pool_node_new, best_brk_new): - Use LIKELY() and UNLIKELY() where it is known to be so, mostly for one-time initialization and failure handling. Callgrind says it does help speed up a little bit. 2014-12-16 Theppitak Karoonboonyanan Update word break dictionary. * data/tdict-city.txt: * data/tdict-collection.txt: * data/tdict-common.txt: * data/tdict-country.txt: * data/tdict-ict.txt: * data/tdict-proper.txt: * data/tdict-science.txt: * data/tdict-spell.txt: Add words. 2014-08-23 Theppitak Karoonboonyanan * configure.ac: Post-release version suffix added. 2014-08-21 Theppitak Karoonboonyanan * configure.ac, NEWS: === Version 0.1.21 === 2014-08-19 Theppitak Karoonboonyanan Be more careful about CFLAGS. * src/Makefile.am: - Add $(DATRIE_CFLAGS) to CFLAGS so that thbrk/brk-common.h included from libthai.c can safely find datrie/trie.h header. 2014-08-19 Theppitak Karoonboonyanan Fix build failure due to missing header. * src/Makefile.am: - Add CFLAGS so that thbrk/brk-common.h included from libthai.c can find thai/thctype.h header. 2014-08-18 Theppitak Karoonboonyanan * data/tdict-city.txt: * data/tdict-common.txt: * data/tdict-ict.txt: * data/tdict-proper.txt: * data/tdict-science.txt: Add words. 2014-08-09 Theppitak Karoonboonyanan thbrk: Be greedy when recovering from error. * src/thbrk/brk-maximal.c (brk_recover_try): - Loop to match as many words as possible on every node picked from pool, to increase the chance of early sufficient words matched, and to avoid too much possibility exploration. 2014-08-09 Theppitak Karoonboonyanan Minor optimization. Callgrind says brk_recover_try() is the most frequently called. So, try to optimize it as we can. 
* src/thbrk/brk-maximal.c (brk_recover, brk_recover_try): - Only return the last break position instead of all. * src/thbrk/brk-maximal.c (brk_recover_try): - Remove unneeded assignment on is_terminal for the case in which its value is irrelevant. - Remove unneeded check on is_terminal where its value is known. 2014-08-09 Theppitak Karoonboonyanan * src/thbrk/brk-maximal.c: - Add missing #include . 2014-08-05 Theppitak Karoonboonyanan Code clean up. * src/thbrk/brk-maximal.c (brk_shot_init): - Split assignment and if-condition. * src/thbrk/brk-maximal.c (brk_pool_free_node -> brk_pool_node_free): - Rename the function to obj-action form, in accordance with brk_pool_node_new(). - Add forward declaration for it. 2014-08-04 Theppitak Karoonboonyanan Refactor common thbrk code into brk-common. This prevents mutual dependency between brk-* and thbrk. * src/thbrk/Makefile.am: * src/thbrk/thbrk-private.h -> src/thbrk/brk-common.h: - Rename file, as thbrk-private.h contains only functions for common brk-* codes. - Rename thbrk_* functions back to brk_*. * src/thbrk/Makefile.am: * src/thbrk/thbrk.c: * +src/thbrk/brk-common.c: - Move common thbrk_* functions implementations to brk-common.c. - Rename the functions according to changes in brk-common.h. * src/thbrk/brk-maximal.c: - Update #include for thbrk-private.h to brk-common.h. - Update calls to thbrk_* to brk_*. - Remove #include's previously used by the common code. * src/libthai.c (_libthai_on_unload): - Update #include for thbrk-private.h to brk-common.h. - Update call to thbrk_on_unload() to brk_on_unload(). 2014-08-03 Theppitak Karoonboonyanan Move brk_get_dict() to common thbrk code. * src/thbrk/brk-maximal.c, src/thbrk/thbrk-private.h: - Move brk_get_dict() declaration to thbrk-private.h. - Rename the function to thbrk_get_dict() for consistency. * src/thbrk/brk-maximal.c, src/thbrk/thbrk.c: - Move implementation of brk_get_dict() to thbrk.c. - Rename function & variable accordingly. 
* src/thbrk/brk-maximal.c (brk_root_pool): - Rename function call to thbrk_get_dict(). * src/thbrk/brk-maximal.h, src/thbrk/brk-maximal.c, src/thbrk/thbrk.c: - Move implementation of brk_maximal_on_unload() to thbrk_on_unload(). - Remove brk_maximal_on_unload() which is no longer needed. 2014-08-03 Theppitak Karoonboonyanan Rename th_brkpos_hints() to thbrk_brkpos_hints(). * src/thbrk/thbrk-private.h, src/thbrk/thbrk.c, src/thbrk/brk-maximal.c: - Change function prefix for consistency. 2014-08-03 Theppitak Karoonboonyanan Move th_brkpos_hints() to common thbrk code. This is a preparation for alternative word break algorithms. * src/thbrk/brk-maximal.c, src/thbrk/thbrk-private.h: - Move declaration of th_brkpos_hints() to thbrk-private.h. * src/thbrk/brk-maximal.c: - Include thbrk-private.h to get th_brkpos_hints() declaration. * src/thbrk/brk-maximal.c, src/thbrk/thbrk.c: - Move implementation of th_brkpos_hints() to thbrk.c. 2014-07-31 Theppitak Karoonboonyanan * data/tdict-city.txt: * data/tdict-common.txt: * data/tdict-district.txt: * data/tdict-lang-ethnic.txt: Add words. 2014-06-30 Theppitak Karoonboonyanan * data/tdict-common.txt: * data/tdict-district.txt: * data/tdict-geo.txt: * data/tdict-proper.txt: * data/tdict-science.txt: Add words. * data/tdict-std.txt: - Remove rare word 'จอแส' which causes wrong segmentation for 'จอแสดงผล' as 'จอแส|ดง|ผล'. 2014-03-06 Theppitak Karoonboonyanan * data/tdict-city.txt: * data/tdict-common.txt: * data/tdict-district.txt: * data/tdict-ict.txt: * data/tdict-proper.txt: * data/tdict-spell.txt: Add words. * data/tdict-common.txt: - Remove 'บ๊วย' which duplicates an entry in tdict-std.txt. * data/tdict-proper.txt, data/tdict-district.txt: - Move 'กรือเซะ' from tdict-proper.txt to tdict-district.txt. * data/tdict-proper.txt, data/tdict-geo.txt: - Move 'แอตแลนติส' from tdict-proper.txt to tdict-geo.txt. 2014-01-11 Theppitak Karoonboonyanan Fix warning in test suite. 
* tests/test_thbrk.c (main): - Fix compiler warning on unused return value from fgets(). Caught by debian package building with test suites enabled. 2013-12-21 Theppitak Karoonboonyanan * data/tdict-std-compound.txt, data/tdict-std.txt: - Move 'ชุติมา' from -std-compound to -std. 2013-12-21 Theppitak Karoonboonyanan * data/tdict-city.txt: * data/tdict-common.txt: * data/tdict-proper.txt: * data/tdict-science.txt: Add words. 2013-12-21 Theppitak Karoonboonyanan * configure.in: Post-release version suffix added. 2013-10-25 Theppitak Karoonboonyanan * configure.ac, NEWS: === Version 0.1.20 === 2013-10-25 Theppitak Karoonboonyanan Fix warnings in tests. * tests/test_thwbrk.c (main): - Do some useful checks for 'unicodeCutCodeLength' unused var. - Cast (size_t) returned value of wcslen() to (long) and use "%ld" format to print it. - Remove trailing spaces in source. 2013-10-25 Theppitak Karoonboonyanan Fix warnings in tests. * tests/test_thwchar.c (main): - Cast (thchar_t *) to (char *) to fix signedness differences. - Cast (thwchar_t) to (unsigned long) for "%lx" string format. - Cast (size_t) return values from wcslen() and strlen() to long, and use "%ld" format to print them. - Replace 'outputLength' var with 'tisLength' and 'uniLength' and do some useful checks on return values of th_tis2uni_line() and th_uni2tis_line(). 2013-10-25 Theppitak Karoonboonyanan Fix warnings in tests. * tests/test_thinp.c (test_th_validate): - Cast (thchar_t *) to (char *), to fix signedness differences. * tests/thsort.c (readData): - Cast (size_t) to (long) and use "%ld" format to print it. - Redeclare maxData from int to size_t, to match that of nData. 2013-10-24 Theppitak Karoonboonyanan * README: Remove version-specific text. 2013-10-22 Theppitak Karoonboonyanan * data/tdict-common.txt: * data/tdict-geo.txt: * data/tdict-history.txt: * data/tdict-ict.txt: * data/tdict-proper.txt: * data/tdict-science.txt: * data/tdict-spell.txt: Add words. 
2013-09-24 Theppitak Karoonboonyanan * data/tdict-common.txt: * data/tdict-ict.txt: * data/tdict-proper.txt: * data/tdict-science.txt: Add words. 2013-09-23 Theppitak Karoonboonyanan Check for doxygen required version. * configure.ac: - When doxygen-doc is enabled, also check doxygen version. 2013-09-23 Theppitak Karoonboonyanan Update Doxyfile. * doc/Doxyfile.in: - Updated with 'doxygen -u'. 2013-09-23 Theppitak Karoonboonyanan Fix automake warnings. * src/thbrk/Makefile.am: * src/thcell/Makefile.am: * src/thcoll/Makefile.am: * src/thctype/Makefile.am: * src/thinp/Makefile.am: * src/thrend/Makefile.am: * src/thstr/Makefile.am: * src/thwbrk/Makefile.am: * src/thwchar/Makefile.am: * src/thwctype/Makefile.am: * src/thwstr/Makefile.am: * tests/Makefile.am: - Replace deprecated INCLUDES with AM_CPPFLAGS. 2013-08-02 Theppitak Karoonboonyanan * data/tdict-common.txt: * data/tdict-district.txt: * data/tdict-ict.txt: * data/tdict-proper.txt: Add words. * data/tdict-std.txt: Remove 'พาส' whose use is rare and which potentially causes ambiguity like in 'พาสมองเสื่อม'. 2013-02-24 Theppitak Karoonboonyanan * data/tdict-common.txt: * data/tdict-geo.txt: * data/tdict-ict.txt: * data/tdict-lang-ethnic.txt: * data/tdict-proper.txt: * data/tdict-science.txt: Add words. 2013-02-24 Theppitak Karoonboonyanan Remove compound words that potentially cause ambiguity. * data/tdict-std-compound.txt: - Remove 'มีหน้า' which can cause 'มีหน้า|ที่' instead of 'มี|หน้าที่'. - Remove 'ลืมตา' which can cause 'ลืมตา|มหา' instead of 'ลืม|ตาม|หา'. 2013-02-24 Theppitak Karoonboonyanan * configure.in: Post-release version suffix added. 2013-01-29 Theppitak Karoonboonyanan * configure.ac, NEWS: === Version 0.1.19 === 2013-01-28 Theppitak Karoonboonyanan Modernize autoconf & switch to XZ tarball compression. * configure.in -> configure.ac: - Rename file for modern autoconf. - Replace deprecated AC_INIT() form with PACKAGE, VERSION form. - Use AC_CONFIG_SRCDIR() instead for the old AC_INIT() form. 
- Add AC_CONFIG_MACRO_DIR(). - Replace deprecated AC_INIT_AUTOMAKE() form with OPTIONS form, passing "dist-xz no-dist-gzip" options to switch to XZ tarball. * Makefile.am: - Add "-I m4" to ACLOCAL_AMFLAGS, as suggested by libtool. 2013-01-28 Theppitak Karoonboonyanan Change th_validate() default level back to ISC_STRICT. * src/thinp/thinp.c (th_validate): - Change ISC level back from ISC_BASICCHECK to ISC_STRICT. ISC_BASICCHECK fails 'make check', and the previous value ISC_STRICT should be better for general input with correction. 2013-01-28 Theppitak Karoonboonyanan * data/tdict-science.txt: Fix typo. 2013-01-27 Theppitak Karoonboonyanan * data/tdict-common.txt: * data/tdict-ict.txt: * data/tdict-science.txt: * data/tdict-geo.txt: * data/tdict-lang-ethnic.txt: * data/tdict-proper.txt: * data/tdict-city.txt: * data/tdict-district.txt: Add words. * data/tdict-std-compound.txt: Drop loose compound 'สว่างไสว'. 2012-11-07 Theppitak Karoonboonyanan Add input method API for correction with levels. * include/thai/thinp.h: * src/libthai.def: * src/libthai.map (+th_validate_leveled): - Add API for input validation with strictness arg. * src/thinp/thinp.c (th_validate, th_validate_leveled): - Convert old th_validate() code to th_validate_leveled() and replace the hard-coded ISC_STRICT with the arg. - Make the old th_validate() a wrapper to th_validate_leveled() with default param (ISC_BASICCHECK, not ISC_STRICT, though). * configure.in: - Bump library versions to reflect the added API. 2012-11-07 Theppitak Karoonboonyanan * data/tdict-city.txt: * data/tdict-science.txt: * data/tdict-ict.txt: * data/tdict-lang-ethnic.txt: Add words. 2012-10-16 Theppitak Karoonboonyanan * data/tdict-common.txt: * data/tdict-district.txt: * data/tdict-ict.txt: * data/tdict-proper.txt: * data/tdict-science.txt: Add words. 2012-07-30 Theppitak Karoonboonyanan * doc/Doxyfile.in: Upgrade to doxygen 1.8.1.2 format. 
2012-07-29 Theppitak Karoonboonyanan * data/tdict-city.txt: * data/tdict-common.txt: * data/tdict-proper.txt: * data/tdict-science.txt: * data/tdict-spell.txt: Add words. 2012-06-15 Theppitak Karoonboonyanan * configure.in: Post-release version suffix added. 2012-06-12 Theppitak Karoonboonyanan * configure.in, NEWS: === Version 0.1.18 === 2012-06-12 Theppitak Karoonboonyanan * data/tdict-city.txt: * data/tdict-common.txt: * data/tdict-lang-ethnic.txt: * data/tdict-proper.txt: Add words. 2012-06-11 Theppitak Karoonboonyanan * data/tdict-std.txt: Remove some prefix-words found while working on thailatex hyphenation. 2012-06-10 Theppitak Karoonboonyanan * data/tdict-common.txt: * data/tdict-proper.txt: * data/tdict-city.txt: * data/tdict-ict.txt: Add words. 2012-06-08 Theppitak Karoonboonyanan * configure.in: Belated post-release version suffix added. 2012-06-08 Theppitak Karoonboonyanan * data/tdict-city.txt: * data/tdict-common.txt: * data/tdict-ict.txt: * data/tdict-proper.txt: * data/tdict-spell.txt: Add words. 2012-03-07 Theppitak Karoonboonyanan * data/tdict-std-compound.txt, data/tdict-std.txt: Fix typos. 2012-02-24 Theppitak Karoonboonyanan * data/tdict-common.txt: Add words. 2012-02-21 Theppitak Karoonboonyanan * configure.in, NEWS: === Version 0.1.17 === 2012-02-21 Theppitak Karoonboonyanan * configure.in: Bump library revision, due to code change. 2012-02-21 Theppitak Karoonboonyanan * tests/test_thbrk.c, tests/test_thwbrk.c: Add acronym to the tests. 2012-02-21 Theppitak Karoonboonyanan * data/tdict-history.txt: * data/tdict-proper.txt: * data/tdict-city.txt: * data/tdict-spell.txt: * data/tdict-ict.txt: * data/tdict-common.txt: Add words. 2012-02-20 Theppitak Karoonboonyanan * src/thbrk/thbrk.c (th_brk): Do not break the last Thai chunk with the dictionary if it is an acronym.
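The acronym rule in the 2012-02-07 entry ("groups of <= 3 Thai characters + '.'") can be illustrated with a small sketch. This is not the code in th_brk(); the helper names are hypothetical, and the sketch assumes TIS-620 input, where assigned Thai characters fall in 0xA1..0xDA and 0xDF..0xFB.

```c
#include <stddef.h>

/* Assumption: input is TIS-620, so Thai characters are single bytes
 * in the assigned ranges 0xA1..0xDA and 0xDF..0xFB. */
static int
is_thai_byte(unsigned char c)
{
    return (c >= 0xA1 && c <= 0xDA) || (c >= 0xDF && c <= 0xFB);
}

/* Hypothetical helper: if text starts with 1..3 Thai bytes followed
 * by '.', return the group length including the dot; otherwise 0. */
static size_t
acronym_group_len(const unsigned char *text, size_t len)
{
    size_t n = 0;

    while (n < len && n < 3 && is_thai_byte(text[n]))
        ++n;

    if (n == 0 || n >= len || text[n] != '.')
        return 0;

    return n + 1;
}

/* Hypothetical helper: length of the leading run of acronym groups,
 * e.g. all four bytes of "ส.ส." (0xCA '.' 0xCA '.'). */
static size_t
acronym_span_len(const unsigned char *text, size_t len)
{
    size_t total = 0, group;

    while ((group = acronym_group_len(text + total, len - total)) > 0)
        total += group;

    return total;
}
```

A caller could skip dictionary-based breaking over the span returned by acronym_span_len(), which is the effect the 2012-02-20 entry describes for a trailing acronym chunk.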
2012-02-14 Theppitak Karoonboonyanan * data/tdict-history.txt: * data/tdict-spell.txt: * data/tdict-common.txt: Add words. * data/tdict-geo.txt: Fix typo. 2012-02-08 Theppitak Karoonboonyanan * data/tdict-history.txt: * data/tdict-proper.txt: * data/tdict-city.txt: * data/tdict-science.txt: * data/tdict-geo.txt: * data/tdict-ict.txt: * data/tdict-spell.txt: * data/tdict-district.txt: * data/tdict-common.txt: Add words. 2012-02-07 Theppitak Karoonboonyanan * src/thbrk/thbrk.c (th_brk): Handle Thai acronyms by detecting groups of <= 3 Thai characters + '.'. Thanks John Tapsell for the report and initial patch. 2012-02-07 Theppitak Karoonboonyanan * src/thrend/thrend.c (th_render_cell_): Reformat source a little bit. 2012-01-18 Theppitak Karoonboonyanan * data/tdict-proper.txt: * data/tdict-city.txt: * data/tdict-science.txt: * data/tdict-ict.txt: * data/tdict-common.txt: Add words. * data/tdict-proper.txt, data/tdict-history.txt: Move "ศรีวิชัย" to -history. 2012-01-18 Theppitak Karoonboonyanan * configure.in: Post-release version suffix added. 2011-11-05 Theppitak Karoonboonyanan * configure.in, NEWS: === Version 0.1.16 === 2011-11-05 Theppitak Karoonboonyanan * configure.in: - Remove remaining '$Id'. - Remove unnecessary AC_SUBST(VERSION). 2011-11-05 Theppitak Karoonboonyanan * -TODO: Removed, the plan is obsolete. 2011-11-05 Theppitak Karoonboonyanan * README: Update project URL & e-mail. * configure.in: Use googlegroups as bug report address. 2011-11-05 Theppitak Karoonboonyanan * configure.in: Add bug report address. 2011-11-05 Theppitak Karoonboonyanan * data/tdict-history.txt: * data/tdict-proper.txt: * data/tdict-spell.txt: * data/tdict-common.txt: Add words. * data/tdict-country.txt: Use more popular short name for Saudi Arabia. 2011-10-25 Theppitak Karoonboonyanan * data/tdict-common.txt, data/tdict-city.txt: "ไดฟุกุ" is city name. 2011-10-25 Theppitak Karoonboonyanan * data/tdict-science.txt: * data/tdict-common.txt: Add words.
2011-09-03 Theppitak Karoonboonyanan * data/tdict-proper.txt: * data/tdict-common.txt: Add words. * data/tdict-std.txt: Remove "มาย" which is rare but can potentially cause ambiguity (e.g. "มายกร่าง"). A valid compound, "เมามาย" is already in tdict-std-compound.txt. 2011-07-19 Theppitak Karoonboonyanan * data/tdict-proper.txt: * data/tdict-science.txt: * data/tdict-geo.txt: * data/tdict-ict.txt: * data/tdict-common.txt: Add words. * data/tdict-std-compound.txt: Remove loose compound. * data/tdict-std.txt: - Remove rare words which potentially cause ambiguities. - Remove a duplicated entry. 2011-06-08 Theppitak Karoonboonyanan * data/tdict-proper.txt: * data/tdict-city.txt: * data/tdict-science.txt: * data/tdict-geo.txt: * data/tdict-ict.txt: * data/tdict-spell.txt: * data/tdict-common.txt: * data/tdict-lang-ethnic.txt: Add words. 2011-06-08 Theppitak Karoonboonyanan * configure.in: Post-release version suffix added. 2011-03-24 Theppitak Karoonboonyanan * configure.in, NEWS: === Version 0.1.15 === 2011-03-24 Theppitak Karoonboonyanan * data/tdict-history.txt: * data/tdict-proper.txt: * data/tdict-city.txt: * data/tdict-science.txt: * data/tdict-spell.txt: * data/tdict-district.txt: * data/tdict-common.txt: Add words. 2011-03-22 Theppitak Karoonboonyanan * data/tdict-proper.txt: * data/tdict-science.txt: * data/tdict-spell.txt: * data/tdict-district.txt: * data/tdict-common.txt: Add words. 2011-03-22 Theppitak Karoonboonyanan * data/tdict-history.txt: * data/tdict-proper.txt: * data/tdict-district.txt: Move 2 names from proper, district to history. * data/tdict-proper.txt: * data/tdict-city.txt: * data/tdict-district.txt: * data/tdict-common.txt: Add words. 2011-03-20 Theppitak Karoonboonyanan * data/tdict-district.txt: * data/tdict-history.txt: Move "หริภุญชัย" from district to history list. * data/tdict-city.txt: Change "ฟูกุโอกะ" to "ฟูกูโอกะ", which is statistically more popular. 
* data/tdict-history.txt: * data/tdict-proper.txt: * data/tdict-city.txt: * data/tdict-science.txt: * data/tdict-geo.txt: * data/tdict-spell.txt: * data/tdict-common.txt: Add words. 2011-03-15 Theppitak Karoonboonyanan Split historical names out of tdict-city.txt and tdict-district.txt. * data/Makefile.am: * +data/tdict-history.txt: * data/tdict-city.txt: * data/tdict-district.txt: Split historical names into tdict-history.txt. * data/tdict-city.txt: * data/tdict-geo.txt: * data/tdict-lang-ethnic.txt: Move 2 names from tdict-city.txt to tdict-geo.txt and tdict-lang-ethnic.txt. 2011-03-15 Theppitak Karoonboonyanan Split tdict-country.txt and tdict-city.txt out of tdict-geo.txt and let tdict-geo.txt keep only geographical names. * data/Makefile.am: * +data/tdict-country.txt: * +data/tdict-city.txt: * data/tdict-geo.txt: - Split country names into tdict-country.txt - Split city names into tdict-city.txt * data/tdict-geo.txt: * data/tdict-lang-ethnic.txt: - Split language/ethnic names into tdict-lang-ethnic.txt 2011-03-15 Theppitak Karoonboonyanan * data/tdict-proper.txt: * data/tdict-common.txt: * data/tdict-geo.txt: Add words. 2011-03-14 Theppitak Karoonboonyanan * data/tdict-spell.txt: * data/tdict-proper.txt: * data/tdict-common.txt: * data/tdict-geo.txt: Add words. 2011-03-09 Theppitak Karoonboonyanan * data/tdict-ict.txt: * data/tdict-district.txt: * data/tdict-common.txt: * data/tdict-science.txt: Add words. 2010-12-27 Theppitak Karoonboonyanan Adjust dictionary, as per Widhaya's feedback: http://groups.google.com/group/thai-linux-foss-devel/browse_thread/ thread/986f74d786c9bace * data/tdict-std-compound.txt: * data/tdict-std-common.txt: - Remove some compounds that may cause potential ambiguities. * data/tdict-spell.txt: * data/tdict-common.txt: - Add words. 2010-12-23 Theppitak Karoonboonyanan * data/tdict-{common,geo}.txt: Add words. 2010-05-17 Theppitak Karoonboonyanan * data/tdict-{district,common,science}.txt: Add words.
2010-03-14 Theppitak Karoonboonyanan * configure.in: Post-release version suffix added. Turn off manpages generation * doc/Doxyfile.in: Turn GENERATE_MAN off. * doc/Makefile.am: Remove man targets. 2010-02-28 Theppitak Karoonboonyanan * configure.in: Bump library revision. * configure.in, NEWS: === Version 0.1.14 === 2010-02-28 Theppitak Karoonboonyanan * data/tdict-{ict,proper,common}.txt: Add words. 2010-02-28 Theppitak Karoonboonyanan * src/thcoll/cweight.c (char_weight_tbl_): Replace TIS-620 characters in comments with character names. * src/thctype/thctype.c (_th_ctype_tbl, _th_chlevel_tbl): Remove non-TIS-620 characters from comments. Convert source to UTF-8. 2010-02-27 Theppitak Karoonboonyanan * src/thbrk/brk-maximal.c (brk_get_dict): Limit path[] scope. 2010-02-27 Theppitak Karoonboonyanan * src/thbrk/brk-maximal.c (brk_get_dict): Print some warning when dictionary opening fails. 2010-02-27 Theppitak Karoonboonyanan * src/thbrk/brk-maximal.c (brk_get_dict): Do not try to open dictionary again if once tried. Multiple failing trials degraded performance. 2010-02-25 Theppitak Karoonboonyanan * include/thai/*.h, src/*.c, src/*/*.c: Add my e-mail address to license header as the contact address. * src/thctype/wtt.c, * src/thctype/thctype.c, * src/thcoll/cweight.h, * src/thcoll/cweight.c, * src/thwctype/thwctype.c: Add editor modeline. 2010-02-24 Theppitak Karoonboonyanan * src/thinp/thinp.c, * src/thrend/thrend.c, * src/thcoll/thcoll.c, * src/thcoll/cweight.c, * src/thwstr/thwstr.c, * src/thwchar/thwchar.c, * src/thwbrk/thwbrk.c, * src/thstr/thstr.c, * src/thcell/thcell.c: Reformat source. Add editor modeline. 2010-02-24 Theppitak Karoonboonyanan * tests/thsort.c: Yet another GPL header leftover removal. Update my e-mail address. 2010-02-24 Theppitak Karoonboonyanan Replace with a rewritten one, so we can enforce the license. * include/thai/tis.h: Rewritten based on Unicode names, with TIS_ prefix. 
* src/thinp/thinp.c (corrections, th_validate), * src/thrend/thrend.c (shiftdown_tone_ad, shiftdownleft_tone_ad, shiftleft_tone_ad, shiftdown_bv_bd, tailcutcons, th_render_cell_, th_render_cell_tis), * src/thbrk/brk-maximal.c (th_brkpos_hints), * src/thcell/thcell.c (th_next_cell, th_prev_cell), * tests/test_thcell.c (test_ans_nodecomp_am, test_ans_decomp_am): Update character names to the new TIS_* ones. 2010-02-24 Theppitak Karoonboonyanan * include/thai/*.h, src/*.c, src/*/*.c, * Makefile.am, include/Makefile.am, include/thai/Makefile.am, * src/Makefile.am, src/*/Makefile.am, * tests/*.c, tests/*.sh, * tests/Makefile.am, * data/Makefile.am, * man/template.3, man/man3/*.3, * man/Makefile.am, man/man3/Makefile.am: Remove CVS $Id in all files. * src/thctype/wtt.c: Fix file name in comment. 2010-02-24 Theppitak Karoonboonyanan * include/thai/*.h, src/*.c, src/*/*.c: Add license header to every source file. 2010-02-23 Theppitak Karoonboonyanan * doc/Makefile.am: The added *.c dependencies must be from src/*/*.c, not include/thai/*.c. Silly copy-and-paste error. 2010-02-23 Theppitak Karoonboonyanan * doc/Makefile.am: Add *.c to doxygen.stamp dependency, as the doc comments have been moved there. 2010-02-17 Theppitak Karoonboonyanan * src/thbrk/brk-maximal.c (brk_root_pool): Fail if brk_get_dict() returns NULL. 2010-02-17 Theppitak Karoonboonyanan * data/tdict-{ict,proper,common,science}.txt: Add words. 2010-01-30 Theppitak Karoonboonyanan * src/thcoll/cweight.c, src/thcoll/cweight.h, src/thcoll/thcoll.c: Remove GPL header left over. The library is LGPL. 2010-01-25 Theppitak Karoonboonyanan * src/brk-maximal.c (brk_root_pool): Fail if brk_pool_node_new() returns NULL. * src/brk-maximal.c (brk_maximal_do_impl): Fail if best_brk_new() returns NULL. 2010-01-25 Theppitak Karoonboonyanan * configure.in: Post-release version suffix added. 2010-01-15 Theppitak Karoonboonyanan * configure.in: Bump library revision. 
* configure.in, NEWS: === Version 0.1.13 === 2010-01-15 Theppitak Karoonboonyanan * configure.in: Add 'AC_CONFIG_MACRO_DIR([m4])' for the new libtool. 2010-01-15 Theppitak Karoonboonyanan * src/thwchar/thwchar.c (th_tis2uni_line, th_uni2tis_line): Fix type of 'result' argument to match that in header file, so doxygen recognizes and generates doc for it. 2010-01-15 Theppitak Karoonboonyanan * data/tdict-{proper,common}.txt: Add words. 2010-01-14 Theppitak Karoonboonyanan * include/thai/thctype.h, src/thctype/thctype.c: Replace non-portable isascii() with most-significant bit check. 2010-01-14 Theppitak Karoonboonyanan * data/tdict-{common,geo}.txt: Add words. 2010-01-14 Theppitak Karoonboonyanan * include/thai/thailib.h, src/libthai.c: Move more doc comments. 2010-01-09 Theppitak Karoonboonyanan Move documentation from *.h to *.c, so libthai developers have the doc at hand. Users can still read the doxygen-generated doc BTW. * include/thai/*.h, src/*/*.c: Move doc comments from *.h to *.c. * doc/Doxyfile.in: Set INPUT to @top_srcdir@. 2010-01-08 Theppitak Karoonboonyanan * data/tdict-{proper,common,geo}.txt: Add words. 2010-01-08 Theppitak Karoonboonyanan * src/thbrk/brk-maximal.c (brk_shot_init): Return failure/success status. * src/thbrk/brk-maximal.c (brk_pool_node_new): Check & handle return code from brk_shot_init(). 2010-01-08 Theppitak Karoonboonyanan * src/thwbrk/thwbrk.c (th_wbrk, th_wbrk_line): Rewritten for consistency with the rest of the source, and to remove a potential integer overflow vulnerability. 2010-01-08 Theppitak Karoonboonyanan Fix potential integer overflow vulnerabilities. Thanks Tim Starling for pointing them out. * src/thbrk/thbrk.c (th_brk_line): Return zero if malloc size overflows or malloc fails. * src/thbrk/brk-maximal.c (best_brk_new): Return NULL if malloc size overflows. 2009-08-22 Theppitak Karoonboonyanan * configure.in: Post-release version suffix added. * data/tdict-std-compound.txt: Remove loose compounds.
* data/tdict-{ict,district,proper,common,science,lang-ethnic,geo}.txt: Add words. 2009-06-18 Theppitak Karoonboonyanan * configure.in: Bump library revision. * configure.in, NEWS: === Version 0.1.12 === 2009-06-18 Theppitak Karoonboonyanan * data/tdict-std-compound.txt: Remove loose compounds. 2009-06-18 Theppitak Karoonboonyanan * data/tdict-std-compound.txt: Remove loose compounds. * data/tdict-{ict,district,proper,common,science,geo}.txt: Add words. 2009-06-17 Theppitak Karoonboonyanan * src/thbrk/brk-maximal.c (brk_maximal_do_impl, best_brk_contest): Back out weighted scoring for solution contest, as it sometimes tries too hard to minimize word count, even though another solution with more known words can match more text and results in smaller unknown zone. Accidental words problem, which this scheme tried to address, has already been alleviated with increased recovery threshold. 2009-06-17 Theppitak Karoonboonyanan * src/thbrk/brk-maximal.c (brk_maximal_do_impl): Add break pos at string end and let it enter the contest, instead of backing it out by moving back str_pos as done in previous commit. That commit just missed the point. Doing the right thing also guarantees consistency with the case when some text of valid words is appended to the string. 2009-06-17 Theppitak Karoonboonyanan * src/thbrk/brk-maximal.c (brk_maximal_do_impl): Set str_pos for cases that fail at string end back to the last progress position, to prevent it from wrongly winning the contest over some good candidates. 2009-06-17 Theppitak Karoonboonyanan * src/thbrk/brk-maximal.c (RECOVERED_WORDS): Increase recovery threshold, 2 is too few and can trigger wrong recovery position too easily. 2009-06-15 Theppitak Karoonboonyanan * data/Makefile.am, +data/tdict-lang-ethnic.txt: Add a new category. * data/tdict-{common,geo}.txt, data/tdict-lang-ethnic.txt: Move some words to the new category. * data/tdict-{lang-ethnic,ict,spell,proper,common,science,geo}.txt: Add words.
* data/tdict-std.txt: Remove a duplicated compound. 2009-06-13 Theppitak Karoonboonyanan * src/thbrk/brk-maximal.c (best_brk_contest): Compare penalties with '<=' like in previous code, instead of '<', to retain longest-matching effect in case of equal scores. 2009-06-10 Theppitak Karoonboonyanan * data/tdict-std-compound.txt: Remove loose compounds. * data/tdict-{common,proper}.txt: Move a proper noun to tdict-proper. * data/tdict-{ict,spell,district,proper,common,science,geo}.txt: Add words. 2009-06-10 Theppitak Karoonboonyanan * src/thbrk/brk-maximal.c (brk_maximal_do_impl, best_brk_contest): Use weighted scoring, rather than parameters prioritization, to choose better solution. 2009-06-10 Theppitak Karoonboonyanan * src/thbrk/brk-maximal.c (brk_maximal_do_impl): When recovering from error, try to recover from the position next to the last break, rather than right at the crash position. We care the input string, not what it's being partially matched against. 2009-06-10 Theppitak Karoonboonyanan * src/thbrk/brk-maximal.c (brk_recover): When checking a character in text, brk pos returned from brk_recover_try() are relative to 'text + p', not 'text'. 2009-06-10 Theppitak Karoonboonyanan Split normal word break and recovery routines apart. * src/thbrk/brk-maximal.c (brk_maximal_do_impl, +brk_recover_try): - Split brk_maximal_do_impl() into itself with (do_recover == 1) and brk_recover_try() with (do_recover == 0), for logic simplification. - Strip unnecessary tasks like best break contests and penalties from brk_recover_try(). All it needs is existence of a solution, regardless of its quality. - Use memcpy() to copy brk pos arrays instead of for loop. * src/thbrk/brk-maximal.c (brk_maximal_do, brk_recover): - Adjust calls to brk_maximal_do_impl() and brk_recover_try() according to the changed prototypes. 2009-06-10 Theppitak Karoonboonyanan * src/thbrk/brk-maximal.c (brk_maximal_do_impl): Add one missing change for last commit. 
2009-06-10 Theppitak Karoonboonyanan * src/thbrk/brk-maximal.c (brk_maximal_do_impl): Update 'shot' after 'node' is changed, instead of referencing through the possibly new 'node' value, for code readability. 2009-06-10 Theppitak Karoonboonyanan Beautify code. * src/thbrk/brk-maximal.c (th_brkpos_hints): Adjust indents. * src/thbrk/brk-maximal.c (brk_recover): Adjust brackets. 2009-06-10 Theppitak Karoonboonyanan * src/thbrk/brk-maximal.c (th_brkpos_hints): Add check for cases like 'เฉพาะ'. 2009-06-07 Theppitak Karoonboonyanan * configure.in: Remove obsolete Win32 checking stuffs. 2009-06-07 Theppitak Karoonboonyanan Add symbol versioning. * configure.in: Check for -version-script support in linker. * +src/libthai.map: Add symbol map. * src/Makefile.am: - Conditionally supply -version-script, if supported, or -export-symbols otherwise, to linker. - Conditionally specify libthai.map or libthai.def as libthai.la dependency, according to corresponding linker option applied. - Include libthai.map in EXTRA_DISTS. 2009-06-05 Theppitak Karoonboonyanan * src/thbrk/brk-maximal.c (th_brkpos_hints): Fix comment style to C, not C++. 2009-06-05 Theppitak Karoonboonyanan Fix wrong case of syllable like 'ช็อก'. * src/thbrk/brk-maximal.c (th_brkpos_hints): - The case of SARA E/AE with next third character as MAITAIKHU should not apply when the MAITAIKHU is followed by O ANG or WO WAEN. That's a signature of new syllable, such as 'ช็อก'. - Add the case of syllable like 'ช็อก' or 'ช็วก'. 2009-06-05 Theppitak Karoonboonyanan Refactor so brk hints is prepared only once per brk_maximal_do() call. * src/thbrk/brk-maximal.c (brk_maximal_do, +brk_maximal_do_impl): Split implementation code into _impl() function, and let the original brk_maximal_do() be a wrapper doing necessary preparations. * src/thbrk/brk-maximal.c (brk_recover): Call brk_maximal_do_impl() instead of brk_maximal_do(), passing the appropriate brk hints subarray to it. 
* src/thbrk/brk-maximal.{c,h} (brk_maximal_do): * src/thbrk/thbrk.c (th_brk): Drop the do_recover arg from brk_maximal_do(), as it's now a wrapper which can seed initial condition for the mutual recursion. * src/thbrk/brk-maximal.c (th_brkpos_hints): Add prototype. 2009-06-04 Theppitak Karoonboonyanan * src/thbrk/brk-maximal.c (th_isleadable -> th_brkpos_hints): - Turn simple leadable check into a pre-processed break position hinting. Instead of doing the complicated checks on every loop, now the clients just check the prepared boolean array. - Add more complicated cases, such as Thai composite vowels. * src/thbrk/brk-maximal.c (brk_recover): - Change prototype to accept brk pos hints, so it doesn't need to be recalculated. - Drop additional checks of preceding ldvowel, as it's already covered by the brk pos hints. * src/thbrk/brk-maximal.c (brk_maximal_do): - Pre-calculate brk pos hints on entry, and free it on exit. - Check the hints instead of calling the obsolete th_isleadable(). - Pass the hints to brk_recover(). 2009-06-03 Theppitak Karoonboonyanan * data/tdict-{ict,proper,common,science,geo}.txt: Add words. 2009-05-27 Theppitak Karoonboonyanan * data/tdict-{std-compound,common}.txt: Remove some loose compounds. * data/tdict-common.txt: Reorder some words. * data/tdict-{ict,proper,common,science,geo}.txt: Add words. 2009-05-24 Theppitak Karoonboonyanan * src/Makefile.am: Refactor and add sublibs to libthai.la dependencies, so the final library gets rebuilt when its sub-libraries are changed. 2009-05-22 Theppitak Karoonboonyanan * src/thbrk/brk-maximal.c (th_isleadable): Consonant following MAITAIKHU is not always unleadable, such as after "ก็". So, drop the case. 2009-05-18 Theppitak Karoonboonyanan * src/thbrk/brk-maximal.c (th_isleadable): Also check for certain vowels like MAITAIKHU, MAIHUNAKAT, SARA UEE in previous cell for non-leadable consonants. 
* src/thbrk/brk-maximal.c (brk_maximal_do, brk_recover): Adjust th_isleadable() calls according to prototype change. 2009-05-16 Theppitak Karoonboonyanan * src/thbrk/brk-maximal.c (th_isleadable): Prohibit consonants with THANTHAKHAT from being leadable. * src/thbrk/brk-maximal.c (brk_maximal_do, brk_recover): Adjust th_isleadable() calls according to prototype change. 2009-05-15 Theppitak Karoonboonyanan * data/tdict-{ict,proper,common,science,geo}.txt: Add words. 2009-05-13 Theppitak Karoonboonyanan * data/Makefile.am, data/tdict-*.txt: Convert wordlists from TIS-620 to UTF-8, for easy management with modern tools. 2009-05-13 Theppitak Karoonboonyanan * data/tdict-std-compound.txt: Remove some loose compounds. * data/tdict-{ict,spell,district,proper,common,science,geo}.txt: Add words. 2009-05-07 Theppitak Karoonboonyanan * data/tdict-{ict,district,proper,common,science,geo}.txt: Add words. * data/tdict-proper.txt: Sort some words properly. 2009-04-29 Theppitak Karoonboonyanan * data/tdict-{ict,proper,common,science,geo}.txt: Add words. 2009-04-22 Theppitak Karoonboonyanan * configure.in: Post-release version prefix added. * data/tdict-{ict,district,proper,common,geo}.txt: Add words. * data/tdict-{common,collection}.txt: Move some collection of words to appropriate category. 2009-04-06 Theppitak Karoonboonyanan * configure.in: Bump library revision. * configure.in, NEWS: === Version 0.1.11 === 2009-04-05 Theppitak Karoonboonyanan * data/tdict-{ict,district,proper,common,science,geo}.txt: Add words. 2009-03-31 Theppitak Karoonboonyanan * configure.in: Post-release version suffix added. * libthai.pc.in: Require datrie-0.2 privately, to avoid exposing it unnecessarily to applications. Exposing it had even caused upgrade problems when libdatrie builds with different soname versions were loaded simultaneously, one from the application and the other from libthai. Combined with libdatrie's poor symbol handling, this had caused name clashes. Thanks Loic Minier for pointing out the cause. 
2009-03-30 Theppitak Karoonboonyanan * libthai.pc.in: Requires datrie-0.2 instead of datrie. * configure.in: Bump library revision. * configure.in, NEWS: === Version 0.1.10 === 2009-03-30 Theppitak Karoonboonyanan * tdict-{common,district,ict,proper}.txt: Add words. * tdict-std.txt: Remove 2 potentially ambiguous words. 2009-03-30 Theppitak Karoonboonyanan * tdict-{district,proper,common,geo}.txt: Add words. * tdict-collection.txt: Add another variant of "x" transcription. 2009-03-20 Theppitak Karoonboonyanan * data/tdict-std-compound.txt: Removed potentially ambiguous word. * data/tdict-{ict,district,science}.txt: Add words. 2009-03-12 Theppitak Karoonboonyanan * data/tdict-{common,geo,ict,spell}.txt: Add words. Move word from -common to -spell as appropriate. 2009-01-21 Theppitak Karoonboonyanan First SVN commit. * data/tdict-{common,geo}.txt: Add words. 2009-01-08 Theppitak Karoonboonyanan * data/tdict-common.txt, data/tdict-proper.txt: Move more proper nouns from tdict-common. * data/tdict-{collection,common,district,geo,ict,proper,science, spell}.txt: Add words. 2008-12-28 Theppitak Karoonboonyanan * src/thbrk/brk-maximal.c (brk_get_dict): Move comment to proper place. 2008-12-28 Theppitak Karoonboonyanan * src/thbrk/brk-maximal.c (brk_get_dict): Adjust code so that brk_dict is not checked twice for normal cases. 2008-12-28 Theppitak Karoonboonyanan * src/thbrk/brk-maximal.c (brk_maximal_do): Call trie_state_is_single() instead of trie_state_is_leaf() when a state is known to be terminal, eliminating redundant check. (Requires libdatrie >= 0.1.99.2+cvs20081228) 2008-12-27 Theppitak Karoonboonyanan * src/thbrk/brk-maximal.c (brk_shot_init, brk_shot_reuse, best_brk_contest): Use memcpy() to copy arrays, rather than iteration via array index. Yet another minor performance improvement. * src/thbrk/brk-maximal.c (brk_shot_reuse): Re-allocate 'brk_pos' array only when growing, and never bother shrinking it. 
2008-12-26 Theppitak Karoonboonyanan Reuse break pool nodes more deeply, improving performance a little bit. (Requires libdatrie >= 0.1.99.2+cvs20081226) * src/thbrk/brk-maximal.c (brk_shot_copy -> brk_shot_init, +brk_shot_reuse): Split brk_shot_copy() into brk_shot_init() and brk_shot_reuse(). The former is for newly created nodes, while the latter is for reusing old nodes, with memory reallocation minimized. * src/thbrk/brk-maximal.c (brk_pool_node_new): Separately call brk_shot_init() and brk_shot_reuse() per case. * src/thbrk/brk-maximal.c (brk_pool_free): Do not destruct BrkShot member. Keep it for further reuses. * src/thbrk/brk-maximal.c (brk_pool_allocator_clear): ..And actually destruct it here. * src/thbrk/brk-maximal.c (brk_root_pool): Save one malloc() call for temporary 'root_shot'. 2008-12-26 Theppitak Karoonboonyanan * src/thbrk/brk-maximal.c (brk_pool_get_node, brk_maximal_do): Get rid of unused arg 's' for brk_pool_get_node(). 2008-12-26 Theppitak Karoonboonyanan * data/tdict-ict.txt: Add words. 2008-12-23 Theppitak Karoonboonyanan Add housekeeping routine to clean up dictionary. * src/Makefile.am, src/dummy.c -> libthai.c, src/libthai.def: Add destructor function in .fini section, to be called on library unload. * src/thbrk/Makefile.am, +src/thbrk/thbrk-private.h, src/thbrk/thbrk.c (thbrk_on_unload): Add on-unload function for thbrk module. * src/thbrk/brk-maximal.h, src/thbrk/brk-maximal.c (brk_get_dict, brk_maximal_on_unload): - Move 'brk_dict' static var from function to file scope - Add on-unload function to clean up dictionary 2008-12-23 Theppitak Karoonboonyanan * src/brk-maximal.c (brk_maximal_do): Adjust if-block a little bit. 2008-12-23 Theppitak Karoonboonyanan * data/tdict-{geo,ict}.txt: Add words. 2008-12-17 Theppitak Karoonboonyanan * data/Makefile.am, +data/tdict-proper.txt, data/tdict-common.txt: Split proper nouns from tdict-common.txt into a new list. 
* data/tdict-{common,proper}.txt: Add words 2008-12-16 Theppitak Karoonboonyanan * data/Makefile.am: No need to install 'thbrk.abm' any more. 2008-12-15 Theppitak Karoonboonyanan * data/tdict-{common,geo,ict}.txt: Add words. 2008-12-15 Theppitak Karoonboonyanan Switch to libdatrie 0.2. (Requires libdatrie >= 0.1.99.2) * configure.in: - Check for 'trietool-0.2' binary instead of 'trietool' - Check for 'datrie-0.2' pkg-config instead of 'datrie' * src/thbrk/brk-maximal.c: - Replace 'SBTrie' with 'Trie' - Replace 'SBTrieState' with 'TrieState' - Replace 'sb_trie_*' function calls with corresponding 'trie_*' - For 'sb_trie_state_walk()', also convert TIS-620 char to UCS-4 before actually walking - For 'sb_trie_open()', replace with 'trie_new_from_file()', plus dir + path combined * data/Makefile.am: - Replace '*.br' and '*.tl' targets with '*.tri' - Add '-e tis-620' command-line argument to trietool * data/Makefile.am, data/thbrk.sbm -> data/thbrk.abm: - Replace 'thbrk.sbm' (TIS-620) with 'thbrk.abm' (Unicode) * tests/Makefile.am: - Add 'libthwchar.la' where 'libthbrk.la' is used, as th_tis2uni() is required there 2008-06-24 Theppitak Karoonboonyanan * configure.in, Makefile.am: Add --disable-dict option to prevent dictionary data generation. 2007-10-18 Theppitak Karoonboonyanan * include/thai/thctype.h: * src/thcell/thcell.c: * src/thcoll/cweight.c: * src/thcoll/cweight.h: * src/thcoll/thcoll.c: * src/thctype/thctype.c: * src/thctype/wtt.c: * src/thinp/thinp.c: * src/thrend/thrend.c: * src/thstr/thstr.c: * src/thwchar/thwchar.c: * src/thwstr/thwstr.c: Update my e-mail address. 2007-10-17 Theppitak Karoonboonyanan * src/libthai.def: List only symbols in plain format, for Mac build. Thanks Vee Satayamas for the report. * src/Makefile.am: Add libthai.def as libthai_la_DEPENDENCIES. 
2007-08-28 Theppitak Karoonboonyanan * NEWS: === Version 0.1.9 === 2007-08-28 Theppitak Karoonboonyanan * tests/test_thbrk.c (main), tests/test_thwbrk.c (main): Update check values, according to the new compound words support. 2007-08-28 Theppitak Karoonboonyanan * doc/Doxyfile.in: Update for doxygen 1.5.3. 2007-08-22 Theppitak Karoonboonyanan * data/tdict-std[-compound].txt: Remove rare words. Rearrange and add compounds. (O Ang - Ho Nokhuk) * data/tdict-{common,spell}.txt: Add words. 2007-08-15 Theppitak Karoonboonyanan * data/tdict-std[-compound].txt: Remove rare words. Rearrange and add compounds. (Ho Hip) * data/tdict-{common,geo}.txt: Add words. 2007-08-07 Theppitak Karoonboonyanan * data/tdict-std[-compound].txt: Remove rare words. Rearrange and add compounds. (So Sua) * data/tdict-{common,district,geo,spell}.txt: Add words. 2007-07-20 Theppitak Karoonboonyanan * data/tdict-std[-compound].txt: Remove rare words. Rearrange and add compounds. (Wo Waen - So Rusi) * data/tdict-{common,ict,science,spell}.txt: Add words. 2007-07-12 Theppitak Karoonboonyanan * data/tdict-std[-compound].txt: Remove rare words. Rearrange and add compounds. (Ro Rua - Lu) * data/tdict-{common,science}.txt: Add words. * data/tdict-district.txt: Move non-province names to the bottom, so they are separated from provinces. Add two more names. 2007-07-09 Theppitak Karoonboonyanan * data/tdict-std[-compound].txt: Remove rare words. Rearrange and add compounds. (Mo Ma - Yo Yak) * data/tdict-{common,geo,ict,science,spell}.txt: Add words. 2007-07-06 Theppitak Karoonboonyanan * data/tdict-std[-compound].txt: Remove rare words. Rearrange and add compounds. (Pho Phan - Pho Samphao) * data/tdict-{common,spell,ict}.txt: Add words. 2007-06-30 Theppitak Karoonboonyanan * data/tdict-std[-compound].txt: Remove rare words. Rearrange and add compounds. (Tho Thahan - Fo Fa) * data/tdict-{common,spell}.txt: Add words. 
2007-06-25 Theppitak Karoonboonyanan * data/tdict-std[-compound].txt: Remove more rare words. Move some compound words from -std to -std-compound. Add some missing entries found. (Restarted from Ko Kai - Tho Thung. We need more space to continue adding compounds from last commit.) * data/tdict-{common,ict}.txt: Add words. 2007-06-21 Theppitak Karoonboonyanan * data/tdict-std[-compound].txt: Add more compound words. Move some compound words from -std to -std-compound. Remove some rare entries, to make room for more entries. (~80% done) * data/tdict-{common,ict}.txt: Add words. 2007-06-18 Theppitak Karoonboonyanan * data/Makefile.am, +data/tdict-std-compound.txt, data/tdict-std.txt: Split compound words into a new file. Selectively add compound words. (Half done.) * data/tdict-{common,ict,science,spell}.txt: Add words. 2007-06-12 Theppitak Karoonboonyanan * src/thbrk/thbrk.c (th_brk): Don't break between CR and LF. Remove last break if at string end. * tests/test_th[w]brk.c (main): Update test values. 2007-06-11 Theppitak Karoonboonyanan Redesign itemization code for th_brk(), aiming at Unicode UAX #14 compatibility. * src/thbrk/Makefile.am, +src/thbrk/brk-ctype.{c,h}: Add character classification table, as well as operation table for breaking between all class combinations. * src/thbrk/thbrk.c (th_brk): Rewrite the itemization code, based on the break class table. * tests/test_th[w]brk.c (main): Update test values. 2007-06-08 Theppitak Karoonboonyanan * configure.in: Post-release version bump. * data/tdict-{std,common}.txt: Add words. 2007-03-03 Theppitak Karoonboonyanan * NEWS: Updated. === Version 0.1.8 === 2007-03-03 Theppitak Karoonboonyanan * tests/test_thbrk.c (main), tests/test_thwbrk.c (main): Fix word break check values, as white space handling has now been changed. 2007-03-03 Theppitak Karoonboonyanan * configure.in: Add AC_LIBTOOL_WIN32_DLL as required for Win32. 
* src/Makefile.am (libthai_la_LDFLAGS): Always pass -no-undefined to enforce all resolved symbols. Thanks Loïc Minier. 2007-03-03 Theppitak Karoonboonyanan * src/Makefile.am (EXTRA_DIST, libthai_la_LDFLAGS), +src/libthai.def: Add -export-symbols flag to limit exported symbols. Thanks Loïc Minier for suggestion. * src/thwstr/thwstr.c (th_wthaichunk): Declare the non-extern func as static. 2007-03-02 Theppitak Karoonboonyanan * data/tdict-common.txt: Added words. 2007-02-04 Theppitak Karoonboonyanan Yet another fix to white space bug in th_brk(), as spotted by Suppachoke Santiwitchaya. This is just temporary fix for use while the planned redesign does not happen. * src/thbrk/thbrk.c (th_brk): Allow break between Thai punct and white space. * src/thbrk/thbrk.c (is_breakable): Allow break between punct and white space. Remove rule that inhibited break between space and MAIYAMOK. It was not sufficient anyway, as the space before MAIYAMOK was still wrapped. * ChangeLog: Fix wrong date in previous commit. 2007-02-02 Theppitak Karoonboonyanan * src/thbrk/thbrk.c (is_breakable): Allow break before white space. This fixes wrong treatment of whitespace in HTML in mozlibthai component, which caused glitches in webpages. 2007-01-13 Theppitak Karoonboonyanan * data/tdict-{common,geo,std}.txt: Added words. 2006-10-14 Theppitak Karoonboonyanan * configure.in: Post-release version bump. 2006-10-14 Theppitak Karoonboonyanan * ChangeLog: Converted to UTF-8. * NEWS: Updated. === Version 0.1.7 === 2006-10-14 Theppitak Karoonboonyanan * data/Makefile.am: Specify LC_ALL=C to make sure 'sort' always works. 2006-10-14 Theppitak Karoonboonyanan Fix 'make distcheck', plus a little enhancement on dict location. * src/thbrk/brk-maximal.c (brk_get_dict): Try opening dict at $LIBTHAI_DICTDIR environment before the default location. 
* tests/Makefile.am, +tests/test-thbrk.sh, +tests/test-thwbrk.sh: Added wrapper scripts to call test_th[w]brk programs with LIBTHAI_DICTDIR set to trie in build tree. * data/Makefile.am (EXTRA_DIST): Do not ship the auto-generated tdict.txt in source. 2006-10-14 Theppitak Karoonboonyanan * tests/test_thbrk.c (main), tests/test_thwbrk.c (main): Rearrange source. Remove some unnecessary variables. Adjust style. Fix warnings. 2006-10-13 Theppitak Karoonboonyanan * tests/test_thbrk.c (main), tests/test_thwbrk.c (main): Fix checking value for the length of the output from th_[w]brk_line() tests. 2006-10-11 Theppitak Karoonboonyanan * data/tdict-std.txt: Added some compound words. * data/tdict-{common,ict}.txt: Added words. 2006-10-01 Theppitak Karoonboonyanan * data/tdict-std.txt: Added some compound words. * data/tdict-{common,ict}.txt: Added words. 2006-09-19 Theppitak Karoonboonyanan * src/thbrk/Makefile.am (libthbrk_la_SOURCES), src/thbrk/thbrk.c, +src/thbrk/brk-maximal.{h,c}: Split low-level mechanisms to brk-maximal module. * -src/thbrk/cttex.c, -src/thbrk/dict2state.c: Removed unused files. 2006-09-19 Theppitak Karoonboonyanan * data/tdict-std.txt: Removed rare word (ยาจนก) which potentially causes weird ambiguity. Added some more compound words. * data/tdict-{common,ict,spell}.txt: Added words. 2006-09-17 Theppitak Karoonboonyanan * TODO: Updated plan. Cleared what have been done. 2006-09-17 Theppitak Karoonboonyanan * data/tdict-std.txt: Removed rare words (มาระ, มาริ) which potentially cause weird ambiguities. Added some more compound words. * data/tdict-{common,geo}.txt: Added words. 2006-09-15 Theppitak Karoonboonyanan * data/tdict-std.txt: Added some compound words. * data/tdict-{common,district,geo,ict,spell}.txt: Added words. 2006-09-12 Theppitak Karoonboonyanan * data/tdict-std.txt: Removed rare word (มมาก) which potentially causes ambiguity. Moved entry (มาย) into its compound forms, as it alone can cause ambiguity. Added some more compound words. 
* data/tdict-{common,geo,ict,spell}.txt: Added words. 2006-09-11 Theppitak Karoonboonyanan * src/thbrk/thbrk.c (brk_do): Used is_breakable() to determine breakability at the end of Thai chunk, instead of hard-coded condition. * src/thbrk/thbrk.c (is_breakable): Added condition for Thai chunk ending. Also added condition so text is not breakable right after period, comma and semicolon. * data/tdict-std.txt: Broke "{เมทิล|เอทิล}แอลกอฮอล์" into two words. * data/tdict-{common,geo,science,spell}.txt: Added words. 2006-09-11 Theppitak Karoonboonyanan * src/thbrk/thbrk.c (is_breakable): Added non-breakable cases: space + Mai Yamok; * + {right parenthesis|Khomut|...}. 2006-09-11 Theppitak Karoonboonyanan * src/thbrk/thbrk.c (th_brk, is_breakable): Do not break after certain punctuations like left quote, left parenthesis, etc., also covering Paiyan Yai sequence. * data/tdict-common.txt: Removed Paiyan Yai. Added some more words. 2006-09-11 Theppitak Karoonboonyanan * data/tdict-std.txt: Removed rare word (ทิวสะ) that caused weird ambiguities. Added some compound words. * data/tdict-{common,geo,ict,spell}.txt: Added words. 2006-09-07 Theppitak Karoonboonyanan * data/tdict-{std,common,district,geo,ict,science,spell}.txt: Added words. 2006-09-06 Theppitak Karoonboonyanan * data/tdict-std.txt: Removed two rare words (การก, ผลอ) that caused weird ambiguities. Added some compound words. * data/tdict-{common,district,geo,ict,science}.txt: Added words. 2006-09-05 Theppitak Karoonboonyanan * src/thbrk/thbrk.c (brk_recover): Guarded against accessing beyond string end. * data/tdict-{std,common,geo,ict}.txt: Added more entries. 2006-09-03 Theppitak Karoonboonyanan * src/thbrk/thbrk.c (brk_do): Adjusted condition in previous change a little bit. 2006-09-03 Theppitak Karoonboonyanan * src/thbrk/thbrk.c (brk_do): (Optimization) In recovery mode, stop immediately when first solution is found. 
2006-09-02 Theppitak Karoonboonyanan * src/thbrk/thbrk.c (brk_recover, brk_do): (Optimization) Remembered previous recovery result for reuse, cutting off a few repeated recoveries at the same position. * data/tdict-{std,common}.txt: Added entries. 2006-09-01 Theppitak Karoonboonyanan * src/thbrk/thbrk.c (best_brk_contest): Adjusted condition so that equally scored solution that comes later overrides previous one. Longest matching is preferred as a result for such situation. * data/tdict-std.txt: Added three more words. 2006-09-01 Theppitak Karoonboonyanan * src/thbrk/thbrk.c (th_isleadable): RU and LU are also leadable. And don't bother checking for Thai digits. They are never passed. * src/thbrk/thbrk.c (brk_do): Fixed wrong choosing of nodes with error at end of string. Added penalty for such cases, and made sure the break position is not marked. * data/tdict-{std,science,ict,common}.txt: Added & removed entries. 2006-09-01 Theppitak Karoonboonyanan * data/Makefile.am, +data/tdict-collection.txt, data/tdict-common.txt: Split collection sets into tdict-collection. * data/Makefile.am, +data/tdict-spell.txt, data/tdict-std.txt: Split common typos or variations into tdict-spell. * data/tdict-{std,common,ict,district,geo,science}.txt: Moved more words out of tdict-std. Removed more redundant entries. Fixed typos. Added more words. 2006-09-01 Theppitak Karoonboonyanan * src/thbrk/Makefile.am (dictdatadir): Added variable missed during the tdict split. * data/tdict-{std,ict,common}.txt: Moved some words out of tdict-std. Removed duplicated and redundant entries. Added some more words. 2006-08-31 Theppitak Karoonboonyanan * configure.in, Makefile.am, src/thbrk/Makefile.am, +data/Makefile.am, src/thbrk/tdict.sbm -> data/tdict.sbm, src/thbrk/tdict.txt -> data/tdict-{common,district,geo,ict,science,std}.txt: Moved tdict generation from source to data directory. 
2006-08-31 Theppitak Karoonboonyanan === merged from datrie_wbrk-branch into HEAD === 2006-08-30 Theppitak Karoonboonyanan * src/thbrk/thbrk.c (brk_do): (Optimization) Unified the recovered node immediately. Chance is that it gets superseded, rather than picked up in later loop. Rearranged code to eliminate source duplications. 2006-08-30 Theppitak Karoonboonyanan * src/thbrk/thbrk.c (brk_do): (Optimization) When unifying converted nodes, clear all matches rather than just the first. 2006-08-30 Theppitak Karoonboonyanan * src/thbrk/thbrk.c (brk_do): (Optimization) When successfully recovered, stop walking immediately, increasing chance to be superseded earlier by better candidate. Also removed unnecessary check for str_pos < len, because it's guaranteed by brk_recover() when return value is not -1. 2006-08-29 Theppitak Karoonboonyanan * src/thbrk/thbrk.c (+brk_pool_allocator_use, brk_pool_allocator_clear): Guarded the free list with ref count, for thread safety. * src/thbrk/thbrk.c (th_brk): Requested to use the break pool allocator at the beginning. 2006-08-29 Theppitak Karoonboonyanan * src/thbrk/thbrk.c (brk_pool_node_new, brk_pool_free_node): ~(Optimization) Kept freed BrkPool nodes for reuse in next allocation, reducing calls to malloc(). * src/thbrk/thbrk.c (+brk_pool_allocator_clear, th_brk): Cleared the free list when work is done. 2006-08-29 Theppitak Karoonboonyanan * src/thbrk/thbrk.c (brk_do): Calculated penalty for unrecoverable string with (len - recent break), not (strlen(s) - recent break). 2006-08-29 Theppitak Karoonboonyanan * configure.in (LT_REVISION): Incremented library revision. * src/thbrk/thbrk.c (brk_do): (Optimization) Do not contest best break when trie walking crashes in recover mode. It won't win recovery criterion anyway. Also got rid of one inner loop condition. 2006-08-29 Theppitak Karoonboonyanan * src/thbrk/thbrk.c (brk_recover): Do not try to recover after a leading vowel. 
2006-08-26 Theppitak Karoonboonyanan * src/thbrk/thbrk.c (th_brk): Tokenized mixed Thai-English text and called brk_do() chunk by chunk. * src/thbrk/thbrk.c (brk_do, brk_recover): Accepted string and length rather than null-terminated string, to support chunk-wise breaking. 2006-08-26 Theppitak Karoonboonyanan * src/thbrk/thbrk.c (brk_pool_delete): Adjusted code, for tiny performance improvement, esp. when deleting first node. 2006-08-25 Theppitak Karoonboonyanan * src/thbrk/tdict.txt: Manually revised word list. Removed some archaic or obsolete words. Added some new terms. 2006-08-24 Theppitak Karoonboonyanan * src/thbrk/thbrk.c: s/penulty/penalty/. :-P 2006-08-24 Theppitak Karoonboonyanan * src/thbrk/thbrk.c (brk_do): Calculated penalty more accurately by measuring distance from recent break pos, rather than the crash pos. Also added penalty on recovery failure. * src/thbrk/thbrk.c (best_brk_contest): Fixed boolean expression by adding parentheses. 2006-08-23 Theppitak Karoonboonyanan * libthai.pc.in: Added datrie to Requires. * src/thbrk/Makefile.am: Removed old dict before rebuilding. * src/thbrk/thbrk.c (th_brk_line): Added implementation. * src/thbrk/thbrk.c (brk_do): Be satisfied with terminal state only if the following character can begin a word. * src/thbrk/thbrk.c (BrkShot, BestBrk, brk_root_pool, brk_do, brk_shot_copy, best_brk_new, best_brk_contest): Added penalty on crash recovery, and considered it when contesting shots. This can prevent long crash shots from showing up as maximally matched. 2006-08-22 Theppitak Karoonboonyanan === begin of datrie_wbrk-branch === * configure.in, src/thbrk/Makefile.am, src/thbrk/thbrk.c, +src/thbrk/thbrk.sbm: Replaced old thbrk from cttex with my new version written from scratch. 2006-08-22 Theppitak Karoonboonyanan * configure.in: Post-release version bump. 2006-08-05 Theppitak Karoonboonyanan * NEWS, configure.in: Version 0.1.6. 
2006-08-05 Theppitak Karoonboonyanan * configure.in: Update library version info, according to recent API additions. 2006-08-02 Theppitak Karoonboonyanan * configure.in: Disable debug by default. 2006-08-02 Theppitak Karoonboonyanan * configure.in: Enable doxygen by default. * configure.in, doc/Makefile.am: Use htmldocdir variable instead of pkgdocdir/docdir. Let default behavior install html under html/ subdir. * configure.in, Makefile.am: Exclude man/ subdir, and use doxygen-generated man pages instead. * doc/Doxyfile.in: Disable SHOW_INCLUDE_FILES, as showing include files included from the documented file just confuses readers in the man pages. 2006-08-02 Theppitak Karoonboonyanan * include/thai/thctype.h, man/man3/thctype.3: Clarify character level 2 and 3 in the document. 2006-08-02 Theppitak Karoonboonyanan * src/thinp/thinp.c: Use WTTOp enum instead of int, as per recent change in wtt.h 2006-08-01 Theppitak Karoonboonyanan * doc/Makefile.am: Rearrange the man pages. 2006-08-01 Theppitak Karoonboonyanan * doc/Makefile.am, doc/Doxyfile.in: Generate man links and selectively install man pages. 2006-08-01 Theppitak Karoonboonyanan * doc/Makefile.am: Use html.stamp target to prevent unnecessary generation of document. 2006-08-01 Theppitak Karoonboonyanan * include/thai/thctype.h: Fix HTML errors in documentation comments. * include/thai/thctype.h, man/man3/thctype.3: Fix typo. * include/thai/thrend.h: Fix errors in doxygen tags in comments. 2006-08-01 Theppitak Karoonboonyanan * include/thai/wtt.h, src/thctype/wtt.c: Add documentation from wtt.3. Use enum instead of #define. * man/man3/wtt.c: Revised. 2006-08-01 Theppitak Karoonboonyanan * include/thai/thctype.h: Add documentation taken from thctype.3. * man/man3/thctype.3: Revised. * doc/Doxyfile.in: Append "include" to STRIP_FROM_PATH. 2006-07-31 Theppitak Karoonboonyanan * include/thailib.h: Add main page using content from libthai.3. * man/man3/libthai.3: Fix typos. 
2006-07-31 Theppitak Karoonboonyanan * include/thai/{thbrk,thcoll}.h: Fix typos in comments. * include/thai/{thwbrk,thwchar,thwcoll,thwctype,thwstr}.h: Edit comments for documentation. * include/thai/thwctype.h, src/thwctype/thwctype.c: Add missing functions corresponding to thctype: th_wcisthdiac, th_wcistaillesscons, th_wcisovershootcons, th_wcisundershootcons, th_wcisundersplitcons. 2006-07-31 Theppitak Karoonboonyanan * configure.in, Makefile.am, +doc/, +doc/Doxyfile.in, +doc/Makefile.am: Add document generation with doxygen. * include/thai/{thailib,thbrk,thcell,thcoll,thctype,thinp,thrend, thstr}.h: Edit comments for documentation. 2006-07-03 Theppitak Karoonboonyanan * src/thcoll/cweight.c (th_char_weight_()): Remove unnecessary check of c's range that always yields true. * src/thwchar/thwchar.c (uni2thai_ext_()): Fix loop condition that always yielded true due to range of data type. 2006-07-03 Theppitak Karoonboonyanan * src/thinp/thinp.c, src/thrend/thrend.c: Add missing . 2006-05-04 Theppitak Karoonboonyanan * configure.in, src/Makefile.am: Add check and condition for Win32 build with cygwin/mingw. 2006-03-13 Theppitak Karoonboonyanan * NEWS, configure.in: Version 0.1.5. 2006-03-13 Theppitak Karoonboonyanan * tests/test_thwbrk.c: Allocate more buffer for th_brk_line() output, fixing bogus bug that blocks the test. 2006-03-13 Theppitak Karoonboonyanan * tests/test_thwbrk.c: Print out the strings in problem when the result from th_brk_line() and th_wbrk_line() do not match. 2006-03-13 Theppitak Karoonboonyanan * tests/test_thcell.c: Fix bug in test_th_prev_cell() due to wrong variable referencing. 2006-02-18 Theppitak Karoonboonyanan * src/tkbrk/Makefile.am: Remove map.c from cttex sources, to fix compiling bug caused by building map.o with and without libtool. 2006-02-18 Theppitak Karoonboonyanan * configure.in, src/Makefile.am: Add -version-info into LDFLAGS. We start maintaining library version info from now on. 
2005-10-28 Theppitak Karoonboonyanan * COPYING: Update FSF address. 2005-10-09 Theppitak Karoonboonyanan * configure.in: Formatted configure options help strings with AC_HELP_STRING(). Used --disable/--enable help style rather than --enable with default yes or no. 2005-03-15 Theppitak Karoonboonyanan * src/thrend/thrend.c: Add Mac PUAs for baseless Tho Than and Yo Ying, as well as lowered version of lower marks. * tests/test_thrend.c: Update tester accordingly. 2005-03-02 Pattara Kiatisevi * src/thbrk/: Debug information (default commented out) Some new words added to tdict.txt 2005-01-13 Pattara Kiatisevi * src/thbrk/thbrk.c: A space is considered and returned as a breakable point 2004-10-16 Theppitak Karoonboonyanan * NEWS, configure.in: Version 0.1.4. 2004-10-13 Theppitak Karoonboonyanan * src/thstr/thstr.c (th_normalize): Handle char of "Level 3". * src/thinp/thinp.c (th_validate): Handle correction of char of "Level 3". 2004-10-13 Theppitak Karoonboonyanan * src/thcell/thcell.c (th_next_cell): Check for acell.hilo before placing repeated SARA AM into the same cell. * tests/test_thcell.c: Add test case for repeated SARA AM. 2004-10-12 Theppitak Karoonboonyanan * tests/test_thctype.c (test_bool_funcs): Rewrite if-do-while loops with clearer while loops. Get rid of warning of range checking. * tests/test_thbrk.c, tests/test_thwbrk.c, test_thwchar.c: Add missing string.h includes. 2004-10-12 Theppitak Karoonboonyanan Handle the case of NIKHAHIT and MAITAIKHU, which can be in either above or top level. This behavior is defined as "level 3" in th_chlevel() returning value. * src/thcell/thcell.c (th_next_cell, th_prev_cell): Conditionally place character of weight 3 at above or top level. * src/thctype/thctype.h (th_chlevel): Document the "level 3". * src/thctype/thctype.c (_th_chlevel_tbl): Change levels of NIKHAHIT and MAITAIKHU to 3. * tests/test_thcell.c, tests/test_thrend.c: Add the test cases of NIKHAHIT and MAITAIKHU at top level. 
2004-05-15 Pattara Kiatisevi * set the mode of cttex to firstmode=1 (we want it fast) 2004-04-08 Theppitak Karoonboonyanan * AUTHORS: Update e-mail addresses. 2004-02-22 Theppitak Karoonboonyanan * tests/test_thrend.c: Update check values as per new blank base glyph. * NEWS, configure.in: Version 0.1.3 2004-01-20 Theppitak Karoonboonyanan * include/thai/thrend.h, src/thrend/thrend.c: Use TH_BLANK_BASE_GLYPH as a base for floating upper/lower vowels. 2003-09-10 Theppitak Karoonboonyanan *** Stop using CVS-generated ChangeLog, as it's hard to read *** * include/thai/Makefile.am: List the missing thwstr.h. * man/man3/Makefile.am: Add EXTRA_DIST to include missing man pages. * man/template.3, man/man3/*.3: Change my e-mail address. * tests/test-thcoll.sh: Also remove out.txt, to pass 'make distcheck'. * TODO: Add TODO about normalization rules. * NEWS, configure.in: Version 0.1.2. 2003-03-23 Sunday 20:26 Theppitak Karoonboonyanan * ChangeLog (1.55), src/thctype/wtt.c (1.3): Fix TACio_op_[] data for (FV1,NON) and (FV1,LV). 2003-03-23 Sunday 20:23 Theppitak Karoonboonyanan * ChangeLog (1.54), include/thai/thcell.h (1.6), src/thcell/thcell.c (1.7): Add th_cell_init() function. 
2003-02-02 Sunday 15:45 Pattara Kiatisevi * debian/: ex.package.doc-base (1.2), files (1.2), libthai0.postinst.debhelper (1.2), libthai0.postrm.debhelper (1.2), libthai0.substvars (1.2): -some more bullshits 2003-02-02 Sunday 15:44 Pattara Kiatisevi * debian/: conffiles.ex (1.2), cron.d.ex (1.2), emacsen-install.ex (1.2), emacsen-remove.ex (1.2), emacsen-startup.ex (1.2), init.d.ex (1.2), manpage.1.ex (1.2), manpage.sgml.ex (1.2), menu.ex (1.2), postinst.ex (1.2), postrm.ex (1.2), preinst.ex (1.2), prerm.ex (1.2), shlibs.local.ex (1.2), watch.ex (1.2): -remove these bullshits 2003-02-02 Sunday 15:35 Pattara Kiatisevi * debian/: README.Debian (1.1), changelog (1.1), conffiles.ex (1.1), control (1.1), copyright (1.1), cron.d.ex (1.1), dirs (1.1), docs (1.1), emacsen-install.ex (1.1), emacsen-remove.ex (1.1), emacsen-startup.ex (1.1), ex.package.doc-base (1.1), files (1.1), init.d.ex (1.1), libthai-dev.dirs (1.1), libthai-dev.files (1.1), libthai0.dirs (1.1), libthai0.files (1.1), libthai0.postinst.debhelper (1.1), libthai0.postrm.debhelper (1.1), libthai0.substvars (1.1), manpage.1.ex (1.1), manpage.sgml.ex (1.1), menu.ex (1.1), postinst.ex (1.1), postrm.ex (1.1), preinst.ex (1.1), prerm.ex (1.1), rules (1.1), shlibs.local.ex (1.1), watch.ex (1.1): -let's debianize it 2002-09-20 Friday 21:40 Theppitak Karoonboonyanan * libthai.pc.in (1.2): Fix Cflags to not descend to thai/ subdir in include path 2001-10-03 Wednesday 21:52 poonlap * man/man3/libthai.3 (1.1): Main manpage. 2001-09-30 Sunday 21:03 Theppitak Karoonboonyanan * ChangeLog (1.53), configure.in (1.18), include/thai/thwstr.h (1.2), src/Makefile.am (1.13), src/thwstr/.cvsignore (1.1), src/thwstr/Makefile.am (1.1), src/thwstr/thwstr.c (1.1): Add a quick-and-dirty implementation of thwstr. 2001-09-30 Sunday 18:48 Theppitak Karoonboonyanan * ChangeLog (1.52), tests/test_thwchar.c (1.4): Add tests for uni<->win/mac conversion. Use stderr in output messages. 
2001-09-25 Tuesday 19:26 Theppitak Karoonboonyanan * ChangeLog (1.51), Makefile.am (1.4), configure.in (1.17), man/.cvsignore (1.1), man/Makefile.am (1.1), man/man3/.cvsignore (1.1), man/man3/Makefile.am (1.1): Add Makefiles for man pages. 2001-09-22 Saturday 15:19 poonlap * src/Makefile.am (1.12): Thwbrk is not listed in the library(typo-error), fixed. 2001-09-17 Monday 08:53 Theppitak Karoonboonyanan * AUTHORS (1.4), ChangeLog (1.50), include/thai/thwinp.h (1.4): Add K.Poonlap to AUTHORS. Rename thwciscompose() to thwcisaccept(). 2001-09-16 Sunday 21:59 poonlap * man/: template.3 (1.1), man3/thctype.3 (1.1), man3/wtt.3 (1.1): Create first man pages for thctype.3, wtt.3 Also template.3 2001-09-14 Friday 21:36 Theppitak Karoonboonyanan * ChangeLog (1.48), configure.in (1.16), include/thai/thwctype.h (1.3), src/Makefile.am (1.11), src/thwctype/.cvsignore (1.1), src/thwctype/Makefile.am (1.1), ChangeLog (1.49), src/thwctype/thwctype.c (1.1): Add a quick implementation of thwctype. 2001-09-14 Friday 21:07 Theppitak Karoonboonyanan * tests/: test_thbrk.c (1.5), test_thcell.c (1.5): Fix warning. 2001-09-14 Friday 21:05 Theppitak Karoonboonyanan * ChangeLog (1.47), src/thwchar/Makefile.am (1.3), src/thwchar/hashtbl.c (1.2), src/thwchar/hashtbl.h (1.2), src/thwchar/thwchar.c (1.7): Remove hashing. Use simple linear search, to get rid of memory leak. 2001-09-14 Friday 20:48 Theppitak Karoonboonyanan * ChangeLog (1.46), include/thai/thwchar.h (1.8), src/thwchar/thwchar.c (1.6): Use lazy creation for hash tables. 2001-09-14 Friday 00:54 Theppitak Karoonboonyanan * ChangeLog (1.45), include/thai/thwchar.h (1.7), src/thwchar/Makefile.am (1.2), src/thwchar/hashtbl.c (1.1), src/thwchar/hashtbl.h (1.1), src/thwchar/thwchar.c (1.5): Implement uni2winthai(), uni2macthai() with hash table added in hashtbl.[ch]. 
2001-08-22 Wednesday 11:41 Theppitak Karoonboonyanan * ChangeLog (1.44), include/thai/wtt.h (1.2), src/thcell/thcell.c (1.6), src/thctype/wtt.c (1.2), src/thinp/thinp.c (1.8), src/thrend/thrend.c (1.10), src/thstr/thstr.c (1.3), src/thwchar/thwchar.c (1.4): Add comments. Check naming convention. Refine coding style. 2001-08-14 Tuesday 13:18 Theppitak Karoonboonyanan * ChangeLog (1.43), NEWS (1.5) (utags: r0_1_1): Release Version 0.1.1. 2001-08-14 Tuesday 13:16 Theppitak Karoonboonyanan * ChangeLog (1.42): Prepare to release Version 0.1.1. 2001-08-14 Tuesday 13:14 Theppitak Karoonboonyanan * configure.ac (1.3): Remove configure.ac. 2001-08-14 Tuesday 13:11 Theppitak Karoonboonyanan * configure.in (1.15), include/thai/thctype.h (1.8), src/dummy.c (1.2), src/thbrk/dict2state.c (1.3), src/thcoll/cweight.c (1.3), src/thcoll/cweight.h (1.3), src/thinp/thinp.c (1.7), src/thstr/thstr.c (1.2) (utags: r0_1_1): Require autoconf 2.50. Add --enable-ansi and --enable-debug flags. Turn on all warnings. Fix warnings. 2001-08-13 Monday 13:07 Theppitak Karoonboonyanan * ChangeLog (1.41), NEWS (1.4), README (1.6, r0_1_1), TODO (1.2, r0_1_1), configure.ac (1.2), configure.in (1.14): Version 0.1.1 2001-08-10 Friday 18:09 Theppitak Karoonboonyanan * ChangeLog (1.40), src/thrend/thrend.c (1.9, r0_1_1): Merge special case of NIKHAHIT with tone/ad in general. 2001-08-10 Friday 18:07 Theppitak Karoonboonyanan * ChangeLog (1.39), include/thai/thrend.h (1.8, r0_1_1), src/thrend/thrend.c (1.8), tests/Makefile.am (1.10, r0_1_1), tests/test_thrend.c (1.1, r0_1_1): Add test suite for thrend. Fix bugs in thrend: th_chlevel() typo; treat NIKHAHIT and MAITAIKHU with shift_left_tone_ad() in th_render_cell(); also shift top-level character left when collapsed with NIKHAHIT split from SARA_AM. 
2001-08-10 Friday 15:25 Theppitak Karoonboonyanan * ChangeLog (1.38), include/thai/thcell.h (1.5, r0_1_1), src/thcell/thcell.c (1.5, r0_1_1): Allow null pointer to struct thcell_t to be passed in th_next_cell() and th_prev_cell(), where no cell data will be written back. 2001-08-09 Thursday 20:33 Theppitak Karoonboonyanan * ChangeLog (1.37), tests/test_thcell.c (1.4, r0_1_1), tests/test_thstr.c (1.3, r0_1_1): Add header comments. 2001-08-09 Thursday 20:27 Theppitak Karoonboonyanan * ChangeLog (1.35), tests/Makefile.am (1.9), ChangeLog (1.36), tests/test_thinp.c (1.1, r0_1_1): Add thinp test suite. 2001-08-09 Thursday 18:52 Theppitak Karoonboonyanan * ChangeLog (1.34), include/thai/thcell.h (1.4), src/thcell/thcell.c (1.4), src/thrend/thrend.c (1.7), tests/test_thcell.c (1.3): Change return types and arguments of thcell functions to reduce information redundancy. 2001-08-09 Thursday 18:13 Theppitak Karoonboonyanan * ChangeLog (1.33), tests/test_thcell.c (1.2): Add test_th_make_cells(). 2001-08-09 Thursday 17:55 Theppitak Karoonboonyanan * ChangeLog (1.32), tests/Makefile.am (1.8), tests/test_thcell.c (1.1): Add thcell test suite. 2001-08-09 Thursday 00:57 Theppitak Karoonboonyanan * ChangeLog (1.31), include/thai/thcell.h (1.3), include/thai/thinp.h (1.8, r0_1_1), include/thai/thrend.h (1.7), src/thcell/thcell.c (1.3), src/thinp/thinp.c (1.6), src/thrend/thrend.c (1.6): Rename struct thcell to struct thcell_t, for naming consistency. 2001-08-09 Thursday 00:52 Theppitak Karoonboonyanan * ChangeLog (1.30), include/thai/thinp.h (1.7), src/thinp/thinp.c (1.5): Change th_validate() to use a cell as context. Define struct thinpconv_t for describing input buffer conversion. 2001-08-08 Wednesday 20:58 Theppitak Karoonboonyanan * ChangeLog (1.29), include/thai/thcell.h (1.2), src/thcell/thcell.c (1.2): Add th_prev_cell(). 2001-08-08 Wednesday 19:49 Theppitak Karoonboonyanan * ChangeLog (1.28), autogen.sh (1.4, r0_1_1), configure.ac (1.1): Add result checking in autogen.sh. 
Add configure.ac for autoconf 2.50+. 2001-08-08 Wednesday 19:39 Theppitak Karoonboonyanan * ChangeLog (1.27), configure.in (1.13), include/thai/Makefile.am (1.5, r0_1_1), include/thai/thcell.h (1.1), include/thai/thrend.h (1.6), src/Makefile.am (1.10, r0_1_1), src/thcell/.cvsignore (1.1, r0_1_1), src/thcell/Makefile.am (1.1, r0_1_1), src/thcell/thcell.c (1.1), src/thrend/thrend.c (1.5): Split thcell functions into a separate module. 2001-08-08 Wednesday 11:45 Theppitak Karoonboonyanan * ChangeLog (1.26), src/thrend/thrend.c (1.4): Add guards for cells without base or hilo char. 2001-08-07 Tuesday 19:43 Theppitak Karoonboonyanan * ChangeLog (1.25), src/thinp/thinp.c (1.4): Remove uncommon predefined sequence correction. 2001-08-07 Tuesday 19:40 Theppitak Karoonboonyanan * ChangeLog (1.24), include/thai/thrend.h (1.5), include/thai/tis.h (1.2, r0_1_1), src/thrend/thrend.c (1.3): Fix NULL redefinition in tis.h. Change rendering function API to return total glyphs. Implement Win/Mac shaping. Implement string rendering functions. 2001-08-07 Tuesday 17:29 Theppitak Karoonboonyanan * ChangeLog (1.23), include/thai/thrend.h (1.4), src/thrend/thrend.c (1.2): Treat SARA_AM cases. Add is_decomp_am argument to determine its appearance. Redefine WTT "cell" for decomposed SARA_AM case. Add code for plain TIS rendering with SARA_AM concerned. 2001-08-07 Tuesday 16:12 Theppitak Karoonboonyanan * ChangeLog (1.22), include/thai/thctype.h (1.7), src/thctype/thctype.c (1.7, r0_1_1), tests/test_thctype.c (1.2, r0_1_1): Add consonant shapes classification in thctype, for the use of elegant renderers. 2001-08-06 Monday 21:08 Theppitak Karoonboonyanan * ChangeLog (1.21), configure.in (1.12), include/thai/thrend.h (1.3), src/Makefile.am (1.9), src/thinp/thinp.c (1.3), src/thrend/.cvsignore (1.1, r0_1_1), src/thrend/Makefile.am (1.1, r0_1_1), src/thrend/thrend.c (1.1): Add partial implementation for thrend (cell clustering). 
Change cell definition (WTT hilo byte encoding is too complicated). Use mnemonic TIS character names in thinp.c. 2001-08-05 Sunday 19:42 Theppitak Karoonboonyanan * ChangeLog (1.20), include/thai/thinp.h (1.6), src/thinp/thinp.c (1.2): Change th_validate() to take more context. Define conversion mechanism more deliberately. Implement th_validate(). 2001-08-04 Saturday 23:04 Pattara Kiatisevi * ChangeLog (1.19, r0_1_0): -before release 0.1.0 2001-08-04 Saturday 22:46 Theppitak Karoonboonyanan * include/thai/thailib.h (1.5, r0_1_1, r0_1_0): Resolve conflict. 2001-08-04 Saturday 22:32 Pattara Kiatisevi * include/thai/thailib.h (1.4): -sunCC prefers ((thchar_t) ~0) than ~((thchar_t) 0) (from P'Thep) 2001-08-04 Saturday 22:26 Pattara Kiatisevi * src/thwbrk/thwbrk.c (1.5), tests/test_thbrk.c (1.4) (utags: r0_1_1, r0_1_0): -casting 2001-08-04 Saturday 22:11 Pattara Kiatisevi * tests/test_thwbrk.c (1.3, r0_1_1, r0_1_0): -casting 2001-08-04 Saturday 22:08 Pattara Kiatisevi * tests/test_thwchar.c (1.3, r0_1_1, r0_1_0): -change comments to C-style 2001-08-04 Saturday 21:59 Pattara Kiatisevi * src/thbrk/thbrk.c (1.8, r0_1_1, r0_1_0): -fix missing casting between char * and unsigned char * 2001-08-04 Saturday 21:58 Theppitak Karoonboonyanan * include/thai/thwchar.h (1.6, r0_1_1, r0_1_0): Remove duplicated macro THCHAR_ERR (already defined in thailib.h) 2001-08-04 Saturday 21:45 Pattara Kiatisevi * src/: thbrk/thbrk.c (1.7), thwbrk/thwbrk.c (1.4): -change comments to C-style :( 2001-08-04 Saturday 21:40 Pattara Kiatisevi * tests/test_thbrk.c (1.3): -change comments to C-style :( 2001-08-04 Saturday 21:24 Theppitak Karoonboonyanan * tests/test_thstr.c (1.2, r0_1_0): Fix signed/unsigned char warning. 2001-08-04 Saturday 20:57 Theppitak Karoonboonyanan * src/thctype/thctype.c (1.6, r0_1_0): Remove weird characters. 2001-08-04 Saturday 20:41 Pattara Kiatisevi * ChangeLog (1.18): -hmm, do I always need to commit this ChangeLog? 
2001-08-04 Saturday 20:35 Pattara Kiatisevi * tests/test_thwchar.c (1.2): -remove wprintf, useless 2001-08-04 Saturday 20:29 Theppitak Karoonboonyanan * ChangeLog (1.17), NEWS (1.3, r0_1_0): More detailed info in NEWS. 2001-08-04 Saturday 20:16 Pattara Kiatisevi * NEWS (1.2), README (1.5, r0_1_0): -move release information to NEWS 2001-08-04 Saturday 20:11 Pattara Kiatisevi * README (1.4): -more info in RELEASE INFORMATION 2001-08-04 Saturday 20:07 Pattara Kiatisevi * README (1.3), AUTHORS (1.3, r0_1_1, r0_1_0): -more descriptive 2001-08-04 Saturday 20:04 Theppitak Karoonboonyanan * configure.in (1.11), include/thai/Makefile.am (1.4), include/thai/thinp.h (1.5), include/thai/wtt.h (1.1, r0_1_1), src/Makefile.am (1.8), src/thctype/Makefile.am (1.2, r0_1_1), src/thctype/wtt.c (1.1, r0_1_1), src/thinp/.cvsignore (1.1, r0_1_1), src/thinp/Makefile.am (1.1, r0_1_1), src/thinp/thinp.c (1.1) (utags: r0_1_0): Add WTT 2.0 tables. Partially implement thinp. 2001-08-04 Saturday 18:08 Theppitak Karoonboonyanan * AUTHORS (1.2), ChangeLog (1.16), README (1.2), TODO (1.1, r0_1_0), configure.in (1.10): Release first version 0.1.0 2001-08-03 Friday 18:20 Theppitak Karoonboonyanan * ChangeLog (1.15), src/thbrk/Makefile.am (1.7, r0_1_1, r0_1_0), src/thbrk/cttex.c (1.3, r0_1_1, r0_1_0), src/thbrk/thbrk.c (1.6), tests/Makefile.am (1.7, r0_1_0): Make thbrk and cttex use thstr instead of fixline() and adj(). Fix a memory leak. 2001-08-03 Friday 17:51 Theppitak Karoonboonyanan * ChangeLog (1.14), configure.in (1.9), include/thai/Makefile.am (1.3), include/thai/thstr.h (1.1, r0_1_1, r0_1_0), include/thai/thwstr.h (1.1, r0_1_1, r0_1_0), src/Makefile.am (1.7), src/thctype/thctype.c (1.5), src/thstr/.cvsignore (1.1, r0_1_1, r0_1_0), src/thstr/Makefile.am (1.1, r0_1_1, r0_1_0), src/thstr/thstr.c (1.1, r0_1_0), tests/.cvsignore (1.5, r0_1_1, r0_1_0), tests/Makefile.am (1.6), tests/test_thstr.c (1.1): Add thstr, thwstr API set. Add th_normalize() implementation (extracted from cttex). 
Fix level table for top characters in thctype. 2001-08-01 Wednesday 20:13 Pattara Kiatisevi * src/thwbrk/thwbrk.c (1.3): -new implementation, not convert cutCode to tis620 back and forth 2001-08-01 Wednesday 05:21 Pattara Kiatisevi * tests/test_thbrk.c (1.2): -cosmetic on error messages 2001-08-01 Wednesday 05:20 Pattara Kiatisevi * src/thbrk/thbrk.c (1.5): -fix const casting problem 2001-07-31 Tuesday 22:03 Theppitak Karoonboonyanan * ChangeLog (1.13), src/thwbrk/thwbrk.c (1.2), tests/test_thwbrk.c (1.2): Fix changes affected by the revised thwchar. 2001-07-31 Tuesday 22:01 Theppitak Karoonboonyanan * src/thwchar/thwchar.c (1.3, r0_1_1, r0_1_0): Revise thwchar. 2001-07-31 Tuesday 20:51 Theppitak Karoonboonyanan * tests/test_thctype.c (1.1, r0_1_0): Add test_thctype.c to repository. 2001-07-31 Tuesday 20:50 Theppitak Karoonboonyanan * ChangeLog (1.12), tests/.cvsignore (1.4): Add .cvsignore entries in tests/. 2001-07-31 Tuesday 20:47 Theppitak Karoonboonyanan * include/thai/thctype.h (1.6, r0_1_0), src/thctype/thctype.c (1.4), tests/Makefile.am (1.5): Add test program for thctype. Fix many bugs in thctype. 2001-07-31 Tuesday 18:14 Theppitak Karoonboonyanan * ChangeLog (1.11), include/thai/thctype.h (1.5), src/thcoll/thcoll.c (1.3, r0_1_1, r0_1_0), src/thctype/thctype.c (1.3): Refine thctype interface & implementation (remove win/mac checking, add combining char checking, fix vowel class bitmasks). Make thcoll use thctype. 
2001-07-30 Monday 18:43 Pattara Kiatisevi * src/thwbrk/.cvsignore (1.1, r0_1_1, r0_1_0): -.cvsignore 2001-07-30 Monday 18:42 Pattara Kiatisevi * configure.in (1.8): -add thwbrk 2001-07-30 Monday 18:41 Pattara Kiatisevi * tests/: Makefile.am (1.4), test_thwbrk.c (1.1), test_thwchar.c (1.1): -test drivers for thwbrk and thwchar 2001-07-30 Monday 18:40 Pattara Kiatisevi * src/thwbrk/: Makefile.am (1.1, r0_1_1, r0_1_0), thwbrk.c (1.1): -thwbrk implementation 2001-07-30 Monday 18:38 Pattara Kiatisevi * src/: Makefile.am (1.6), thwchar/thwchar.c (1.2): -thwchar uni2tis620-0 implementation 2001-07-30 Monday 18:37 Pattara Kiatisevi * include/thai/: thbrk.h (1.4, r0_1_1, r0_1_0), thwbrk.h (1.3, r0_1_1, r0_1_0), thwchar.h (1.5): -add new interfaces and descriptions. 2001-07-27 Friday 17:43 Theppitak Karoonboonyanan * configure.in (1.7), include/thai/thwchar.h (1.4), src/Makefile.am (1.5), src/thwchar/.cvsignore (1.1, r0_1_1, r0_1_0), src/thwchar/Makefile.am (1.1, r0_1_1, r0_1_0), src/thwchar/thwchar.c (1.1): Add half-implementation of thwchar. 2001-07-27 Friday 17:17 Theppitak Karoonboonyanan * ChangeLog (1.10), src/thbrk/.cvsignore (1.4, r0_1_1, r0_1_0), tests/.cvsignore (1.3): Update .cvsignore's in accordance with test_thbrk 2001-07-27 Friday 17:12 Theppitak Karoonboonyanan * src/thbrk/Makefile.am (1.6), src/thbrk/test_thbrk.c (1.5), tests/Makefile.am (1.3), tests/test_thbrk.c (1.1): Move test_thbrk to tests/. 2001-07-25 Wednesday 09:11 Theppitak Karoonboonyanan * .cvsignore (1.3, r0_1_1, r0_1_0), ChangeLog (1.9), include/.cvsignore (1.2, r0_1_1, r0_1_0), include/thai/.cvsignore (1.2, r0_1_1, r0_1_0), src/.cvsignore (1.2, r0_1_1, r0_1_0), src/thbrk/.cvsignore (1.3), src/thcoll/.cvsignore (1.2, r0_1_1, r0_1_0), src/thctype/.cvsignore (1.2, r0_1_1, r0_1_0), tests/.cvsignore (1.2): List more auto-generated files to .cvsignore's. 
2001-07-25 Wednesday 04:31 Pattara Kiatisevi * include/thai/thbrk.h (1.3), src/thbrk/test_thbrk.c (1.4), src/thbrk/thbrk.c (1.4): -modified the th_brk_line to accept cutCode as char* instead of int. -modified according to the changed interface the test program. 2001-07-24 Tuesday 21:23 Theppitak Karoonboonyanan * Makefile.am (1.3, r0_1_1, r0_1_0), configure.in (1.6), libthai.pc.in (1.1, r0_1_1, r0_1_0): Add pkg-config metadata. 2001-07-24 Tuesday 20:14 Theppitak Karoonboonyanan * autogen.sh (1.3, r0_1_0): Remove ineffective warning message about 'configure'. 2001-07-24 Tuesday 20:00 Theppitak Karoonboonyanan * include/.cvsignore (1.1), include/thai/.cvsignore (1.1), src/.cvsignore (1.1), src/thbrk/.cvsignore (1.2), src/thcoll/.cvsignore (1.1), src/thctype/.cvsignore (1.1), tests/.cvsignore (1.1): Add .cvsignore to subdirs. 2001-07-24 Tuesday 19:51 Theppitak Karoonboonyanan * COPYING (1.3, r0_1_1, r0_1_0), src/thbrk/Makefile (1.3): Change license to LGPL. Remove excessive Makefile. 2001-07-24 Tuesday 19:49 Theppitak Karoonboonyanan * autogen.sh (1.2), configure.in (1.5), src/thbrk/Makefile (1.2): Remove 'configure' invoking from autogen.sh, output some message. 2001-07-24 Tuesday 19:41 Theppitak Karoonboonyanan * .cvsignore (1.2), COPYING (1.2), ChangeLog (1.8), INSTALL (1.2), Makefile.in (1.7), aclocal.m4 (1.3), config.guess (1.2), config.sub (1.2), configure (1.4), install-sh (1.2), ltmain.sh (1.2), missing (1.2), mkinstalldirs (1.2), include/Makefile.in (1.6), include/thai/Makefile.in (1.6), src/Makefile.in (1.6), src/thbrk/Makefile.in (1.6), src/thcoll/Makefile.in (1.6), src/thctype/Makefile.in (1.5), tests/Makefile.in (1.6): Remove autotool-generated files. 
2001-07-24 Tuesday 19:31 Theppitak Karoonboonyanan * .cvsignore (1.1), Makefile.in (1.6), autogen.sh (1.1), include/Makefile.in (1.5), include/thai/Makefile.in (1.5), src/Makefile.in (1.5), src/thbrk/Makefile.in (1.5), src/thcoll/Makefile.in (1.5), src/thctype/Makefile.in (1.4), tests/Makefile.in (1.5): Add .cvsignore and autogen.sh 2001-07-23 Monday 04:44 Pattara Kiatisevi * src/thbrk/Makefile.am (1.5): -changed to use dynamic link instead of static for test_thwbk 2001-07-23 Monday 04:32 Pattara Kiatisevi * src/thbrk/: test_thbrk.c (1.3), thbrk.c (1.3): -fix the segfault in char* stuff 2001-07-23 Monday 04:30 Pattara Kiatisevi * src/thbrk/.cvsignore (1.1): -testing 2001-07-19 Thursday 06:43 Pattara Kiatisevi * src/thbrk/test_thbrk.c (1.2): -change to be non-interactive unless "-i" option is specified. -still segfault, not yet finished 2001-07-17 Tuesday 12:23 Chanop Silpa-Anan * ChangeLog (1.7): latest updated 2001-07-17 Tuesday 12:20 Chanop Silpa-Anan * src/thbrk/: Makefile.am (1.4), Makefile.in (1.4): merge map.c and map.h into one target, save typo error 2001-07-17 Tuesday 12:00 Chanop Silpa-Anan * src/thbrk/: Makefile.am (1.3), Makefile.in (1.3), map.c (1.2), map.h (1.2): Dynamically generate map.c and map.h in a build directory. Remove map.c and map.h from the repository. 2001-07-17 Tuesday 10:54 Theppitak Karoonboonyanan * Makefile.in (1.5), include/Makefile.in (1.4), include/thai/Makefile.in (1.4), src/Makefile.in (1.4), src/thbrk/Makefile.am (1.2), src/thbrk/Makefile.in (1.2), src/thcoll/Makefile.in (1.4), src/thctype/Makefile.in (1.3), tests/Makefile.in (1.4): Fix dependencies for test_thbrk, so map.c won't always be rebuilt. 
2001-07-17 Tuesday 05:30 Pattara Kiatisevi * ChangeLog (1.6): -ChangeLog is now automatically generated by manually run ../bin/cvs2cl.pl :) 2001-07-17 Tuesday 05:27 Pattara Kiatisevi * ChangeLog (1.5): -add cvs2cl.pl 2001-07-16 Monday 17:54 Theppitak Karoonboonyanan * ChangeLog (1.4), Makefile.in (1.4), aclocal.m4 (1.2), configure (1.3), configure.in (1.4), include/Makefile.in (1.3), include/thai/Makefile.in (1.3), src/Makefile.am (1.4), src/Makefile.in (1.3), src/thbrk/Makefile.am (1.1), src/thbrk/Makefile.in (1.1), src/thbrk/Makefile.ott (1.1, r0_1_1, r0_1_0), src/thbrk/cttex.c (1.2), src/thbrk/dict2state.c (1.2, r0_1_0), src/thbrk/map.c (1.1), src/thbrk/map.h (1.1), src/thbrk/thbrk.c (1.2), src/thcoll/Makefile.in (1.3), src/thctype/Makefile.in (1.2), tests/Makefile.in (1.3): Add automake files for building thbrk. 2001-07-15 Sunday 23:35 Pattara Kiatisevi * include/thai/thbrk.h (1.2): -add th_brk_line interface 2001-07-15 Sunday 23:31 Pattara Kiatisevi * src/thbrk/: Makefile (1.1), README.TXT (1.1, r0_1_1, r0_1_0), cttex.c (1.1), dict2state.c (1.1), tdict.txt (1.1, r0_1_1, r0_1_0), test_thbrk.c (1.1), thbrk.c (1.1): -first version. -wrapper for P'Hui's cttex. 
2001-07-09 Monday 21:54 Pattara Kiatisevi * src/thbrk/tdict-preparation/: Makefile (1.1), README.TXT (1.1), dictsort (1.1), dictsort.c (1.1), tdict.local (1.1), tdict.nonsorted (1.1), tdict.org (1.1), tdict.txt (1.1), nectec/adjv.txt (1.1), nectec/archaic.txt (1.1), nectec/conj.txt (1.1), nectec/exclam.txt (1.1), nectec/formal.txt (1.1), nectec/idiom.txt (1.1), nectec/local.txt (1.1), nectec/nibat.txt (1.1), nectec/nouns.txt (1.1), nectec/obsolete.txt (1.1), nectec/poem.txt (1.1), nectec/prefix.txt (1.1), nectec/prep.txt (1.1), nectec/pron.txt (1.1), nectec/raja.txt (1.1), nectec/riheads.txt (1.1), nectec/riheads_org.txt (1.1), nectec/riwords.txt (1.1), nectec/riwords_org.txt (1.1), nectec/slang.txt (1.1), nectec/suffix.txt (1.1), nectec/verb.txt (1.1) (utags: r0_1_1, r0_1_0): -tdict.txt preparation part from cttex with some modifications 2001-06-15 Friday 19:01 Theppitak Karoonboonyanan * ChangeLog (1.3), Makefile.in (1.3), include/thai/thctype.h (1.4), src/thctype/thctype.c (1.2): Add th_isdiac() API. Add function body for thctype functions. 2001-06-14 Thursday 10:29 Theppitak Karoonboonyanan * ChangeLog (1.2): Add ChangeLog item. 
2001-06-13 Wednesday 22:35 Theppitak Karoonboonyanan * Makefile.in (1.2), configure (1.2), configure.in (1.3), include/Makefile.in (1.2), include/thai/Makefile.in (1.2), include/thai/thctype.h (1.3), src/Makefile.am (1.3), src/Makefile.in (1.2), src/blank.c (1.2), src/dummy.c (1.1, r0_1_0), src/thcoll/Makefile.in (1.2), src/thctype/Makefile.am (1.1), src/thctype/Makefile.in (1.1), src/thctype/thctype.c (1.1), tests/Makefile.in (1.2): Add thctype implementation (first draft, so dirty) 2001-06-12 Tuesday 20:18 Theppitak Karoonboonyanan * Makefile.am (1.2), configure.in (1.2), include/Makefile.am (1.2, r0_1_1, r0_1_0), include/thai/Makefile.am (1.2), src/Makefile.am (1.2), src/thcoll/Makefile.am (1.2, r0_1_1, r0_1_0), tests/Makefile.am (1.2): Add $Id$ keyword. 2001-06-12 Tuesday 20:14 Theppitak Karoonboonyanan * src/thcoll/cweight.c (1.2, r0_1_0), src/thcoll/cweight.h (1.2, r0_1_0), src/thcoll/thcoll.c (1.2), tests/test-thcoll.sh (1.2, r0_1_1, r0_1_0), tests/thsort.c (1.2, r0_1_1, r0_1_0): Add $Id$ keyword. 2001-06-12 Tuesday 20:12 Theppitak Karoonboonyanan * AUTHORS (1.1), COPYING (1.1), ChangeLog (1.1), INSTALL (1.1), Makefile.am (1.1), Makefile.in (1.1), NEWS (1.1), README (1.1), aclocal.m4 (1.1), config.guess (1.1), config.sub (1.1), configure (1.1), configure.in (1.1), install-sh (1.1), ltmain.sh (1.1), missing (1.1), mkinstalldirs (1.1), include/Makefile.am (1.1), include/Makefile.in (1.1), include/thai/Makefile.am (1.1), include/thai/Makefile.in (1.1), src/Makefile.am (1.1), src/Makefile.in (1.1), src/blank.c (1.1), src/cweight.c (1.2), src/cweight.h (1.2), src/thcoll.c (1.2), src/thcoll/Makefile.am (1.1), src/thcoll/Makefile.in (1.1), src/thcoll/cweight.c (1.1), src/thcoll/cweight.h (1.1), src/thcoll/thcoll.c (1.1), tests/Makefile.am (1.1), tests/Makefile.in (1.1), tests/sorted.txt (1.1, r0_1_1, r0_1_0), tests/test-thcoll.sh (1.1): add building system 2001-06-11 Monday 21:16 
Theppitak Karoonboonyanan * src/cweight.c (1.1), src/cweight.h (1.1), src/thcoll.c (1.1), tests/sorttest.txt (1.1, r0_1_1, r0_1_0), tests/thsort.c (1.1): add a draft thcoll implementation 2001-05-18 Friday 11:51 Theppitak Karoonboonyanan * include/thai/: thailib.h (1.3), thcoll.h (1.3, r0_1_1, r0_1_0), thinp.h (1.4), thwcoll.h (1.3, r0_1_1, r0_1_0), thwinp.h (1.3, r0_1_1, r0_1_0): size_t definition in , strict_t to thstrict_t in 2001-05-18 Friday 01:15 Theppitak Karoonboonyanan * include/thai/thinp.h (1.3): Change the spec (comment) of th_validate() 2001-05-18 Friday 01:07 Theppitak Karoonboonyanan * include/thai/thwchar.h (1.3): fix arg & return types of uni2* 2001-05-18 Friday 01:04 Theppitak Karoonboonyanan * include/thai/thbrk.h (1.1): use thbrk.h for 8-bit code, to prevent confusion 2001-05-18 Friday 00:58 Theppitak Karoonboonyanan * include/thai/: thailib.h (1.2), thcoll.h (1.2), thctype.h (1.2), thinp.h (1.2), thrend.h (1.2, r0_1_0), thwbrk.h (1.2), thwchar.h (1.2), thwcoll.h (1.2), thwctype.h (1.2, r0_1_1, r0_1_0), thwinp.h (1.2), thwrend.h (1.2, r0_1_1, r0_1_0): Add comments. Fix many declarations. Add thglyph_t. 2001-05-17 Thursday 23:12 Theppitak Karoonboonyanan * include/thai/: thailib.h (1.1.1.1), thcoll.h (1.1.1.1), thctype.h (1.1.1.1), thinp.h (1.1.1.1), thrend.h (1.1.1.1), thwbrk.h (1.1.1.1), thwchar.h (1.1.1.1), thwcoll.h (1.1.1.1), thwctype.h (1.1.1.1), thwinp.h (1.1.1.1), thwrend.h (1.1.1.1), tis.h (1.1.1.1, r0_1_0) (utags: libthai_0_1): first draft API 2001-05-17 Thursday 23:12 Theppitak Karoonboonyanan * include/thai/: thailib.h (1.1), thcoll.h (1.1), thctype.h (1.1), thinp.h (1.1), thrend.h (1.1), thwbrk.h (1.1), thwchar.h (1.1), thwcoll.h (1.1), thwctype.h (1.1), thwinp.h (1.1), thwrend.h (1.1), tis.h (1.1): Initial revision