text_one_hot.Rd
One-hot encode a text into a list of word indexes in a vocabulary of size n.
text_one_hot(text, n, filters = "!\"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n", lower = TRUE, split = " ")
Argument | Description |
---|---|
text | Input text (string). |
n | Size of vocabulary (integer). |
filters | Sequence of characters to filter out, such as punctuation. Default includes basic punctuation, tabs, and newlines. |
lower | Whether to convert the input to lowercase. |
split | Separator used to split the text into words (string). |
List of integers in [1, n]. Each integer encodes a word (uniqueness is not guaranteed: distinct words may map to the same integer).
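
The following base R sketch illustrates why the word-to-integer mapping need not be unique. It is not the package's implementation: `one_hot_sketch` is a hypothetical helper, and its toy polynomial hash is an assumption chosen only to keep the example self-contained. It follows the documented steps (lowercase, filter, split) and then maps each word to an integer in [1, n].

```r
# Hypothetical helper for illustration only; not part of the keras package.
one_hot_sketch <- function(text, n,
                           filters = "!\"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n",
                           lower = TRUE, split = " ") {
  if (lower) text <- tolower(text)

  # Replace every filtered character with the split marker.
  chars <- strsplit(text, "", fixed = TRUE)[[1]]
  filter_chars <- strsplit(filters, "", fixed = TRUE)[[1]]
  chars[chars %in% filter_chars] <- split
  text <- paste(chars, collapse = "")

  # Split into words and drop the empty strings left by repeated markers.
  words <- strsplit(text, split, fixed = TRUE)[[1]]
  words <- words[nzchar(words)]

  # Toy polynomial hash (an assumption, not the hash keras uses),
  # reduced modulo n at each step and shifted into [1, n].
  hash_word <- function(w) {
    h <- 0
    for (code in utf8ToInt(w)) h <- (h * 31 + code) %% n
    as.integer(h) + 1L
  }
  vapply(words, hash_word, integer(1), USE.NAMES = FALSE)
}

# The same word always receives the same integer, regardless of case.
ids <- one_hot_sketch("The cat sat. The CAT sat!", n = 5)

# With only two available indexes, three distinct words must collide.
small <- one_hot_sketch("alpha beta gamma", n = 2)
```

Here `ids` has one entry per word, and its first three entries equal its last three because the sentence repeats. `small` contains at most two distinct values for three distinct words, which is the collision the return value description warns about. Choosing `n` well above the expected number of distinct words makes collisions less likely but does not rule them out.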
Other text preprocessing: make_sampling_table, pad_sequences, skipgrams, text_hashing_trick, text_to_word_sequence