Text statisticsΒΆ
nn_search2 can display some useful text statistics. For example, the number of tokens, sentences, most common ngrams, POS-tags distribution, lexical diversity (how many unique words are used), sentiment polarity (how positive or negative is the text), subjectivity and correctness. Statistic properties have a scale of [0, 1] or [-1, 1] where 0 and 1 are min and max values. For example “polarity” of 0.65 means the text is rather positive than negative and vice versa.
Statistics panel is represented by 3 buttons: “Numbers”, “Graphs” and “Search stats”.
“Numbers” displays the information about the number of tokens, words, sentences, subjectivity, lexical diversity, polarity and correctness. This button uses the output of TextBlob library which unfortunately appeared to be rather slow. It might take a while to calculate all this info so be patient.
“Graphs” displays POS-tags distribution, functional vs normal word plots. It also displays most frequent ngram counts. nn_search2 does not count 1-grams like “be”, “the”, “a”, etc. Only contentful words are included all functional are excluded from ngram calculations.
“Search stats” displays the number of matches found by the search query and the ratio of matched text length / total text length.