/** * @typedef {[string, number]} TrigramTuple * @typedef {TrigramTuple[]} TrigramTuples * @typedef {Record} TrigramDictionary */ import {trigram} from 'n-gram' import {collapseWhiteSpace} from 'collapse-white-space' const own = {}.hasOwnProperty /** * Clean `value`. * Removed general non-important (as in, for language detection) punctuation * marks, symbols, and digits. * * @param {string|null} [value] * @returns {string} */ export function clean(value) { if (value === null || value === undefined) { return '' } return collapseWhiteSpace(String(value).replace(/[\u0021-\u0040]+/g, ' ')) .trim() .toLowerCase() } /** * Get clean, padded, trigrams. * * @param {string} [value] * @returns {string[]} */ export function trigrams(value) { return trigram(' ' + clean(value) + ' ') } /** * Get an `Object` with trigrams as its attributes, and their occurence count as * their values. * * @param {string} value * @returns {TrigramDictionary} */ export function asDictionary(value) { const values = trigrams(value) /** @type {TrigramDictionary} */ const dictionary = {} let index = -1 while (++index < values.length) { if (own.call(dictionary, values[index])) { dictionary[values[index]]++ } else { dictionary[values[index]] = 1 } } return dictionary } /** * Get an `Array` containing trigram--count tuples from a given value. * * @param {string} value * @returns {TrigramTuples} */ export function asTuples(value) { const dictionary = asDictionary(value) /** @type {TrigramTuples} */ const tuples = [] /** @type {string} */ let trigram for (trigram in dictionary) { if (own.call(dictionary, trigram)) { tuples.push([trigram, dictionary[trigram]]) } } tuples.sort(sort) return tuples } /** * Get an `Array` containing trigram--count tuples from a given value. * * @param {TrigramTuples} tuples * @returns {TrigramDictionary} */ export function tuplesAsDictionary(tuples) { /** @type {TrigramDictionary} */ const dictionary = {} let index = -1 while (++index < tuples.length) { dictionary[tuples[index][0]] = tuples[index][1] } return dictionary } /** * Deep regular sort on item at `1` in both `Object`s. * * @param {TrigramTuple} a * @param {TrigramTuple} b * @returns {number} */ function sort(a, b) { return a[1] - b[1] }