2008-04-10 created
2011-11-28 fixed typo (PR vs. RP)
2011-11-31 added Punctuation (not addressed by Santorini 1990)
Christian Chiarcos, chiarcos@uni-potsdam.de

OLiA Annotation Model for Penn Treebank (PTB) part-of-speech annotation (Santorini 1990)
Unless specified otherwise, all comments are taken from Santorini (1990).

References
Beatrice Santorini (1990), Part-of-Speech tagging guidelines for the Penn Treebank Project, 3rd revision, 2nd printing, ftp://ftp.cis.upenn.edu/pub/treebank/doc/tagguide.ps.gz

These are adjectives and ordinal numerals. Hyphenated compounds that are used as modifiers are tagged as adjectives, e.g. "happy-go-lucky", "one-of-a-kind", "run-of-the-mill". Ordinal numbers are tagged as adjectives, as are compounds of the form "n-th" or "X-est", like "fourth-largest".

This category includes most words that end in -ly as well as degree words like "quite", "too" and "very", posthead modifiers like "enough" and "indeed" (as in "good enough", "very well indeed"), and negative markers like "not", "n't" and "never".

This tag subsumes imperatives, infinitives and subjunctives.
EXAMPLES:
Imperative: Do/VB it.
Infinitive: You should do/VB it. We want them to do/VB it. We made them do/VB it.
Subjunctive: We suggested that he do/VB it.

This category includes the conditional form of the verb "to be".
EXAMPLES:
If I were/VBD rich...
If I were/VBD to win the lottery...

These are verb forms in the present tense. This class covers both 3rd person singular forms and forms other than 3rd person singular.

These are cardinal numbers.

These are common nouns in singular or plural, or mass nouns.

These are adjectives, mostly with the comparative ending -er and a comparative meaning. "More" or "less" should be tagged as a comparative adjective when it is used without a head noun and it corresponds to the object of a verb or preposition.

These are comparative adverbs.
This category includes "and", "but", "nor", "or", "yet" (as in "Yet it's cheap", "cheap yet good"), as well as the mathematical operators "plus", "minus", "less", "times" (in the sense of "multiplied by") and "over" (in the sense of "divided by"), when they are spelled out. "For" in the sense of "because" is a coordinating conjunction.

This category includes the articles "a(n)", "every", "no" and "the", the indefinite determiners "another", "any" and "some", "each", "either" (as in "either way"), "neither" (as in "neither decision"), "that", "these", "this" and "those", and instances of "all" and "both" when they do not precede a determiner or possessive pronoun (as in "all roads" or "both times").

Existential "there" is the unstressed "there" that triggers inversion of the inflected verb and the logical subject of a sentence, e.g. "There/EX was a party in progress.", "There/EX ensued a melee.".

These are foreign words.

This is a verb in its present participle or gerund form.

This category includes "my" (as in "My, what a gorgeous day"), "oh", "please", "see" (as in "See it's like this"), "uh", "well" and "yes", among others.

This category includes letters and numerals when they are used to identify items in a list.

This category includes all verbs that don't take an -s ending in the third person singular present: "can", "could", ("dare"), "may", "might", "must", "ought", "shall", "should", "will", "would".

This is a class we inserted to structure the tagset.

This category includes a number of mostly monosyllabic words that also double as directional adverbs and prepositions.

This is a verb in its past participle form.

This category includes the personal pronouns proper, without regard for case distinctions ("I", "me", "you", "he", "him", etc.), the reflexive pronouns ending in -self or -selves, and the nominal possessive pronouns "mine", "yours", "his", "hers", "ours" and "theirs".
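The examples in these comments annotate tokens in word/TAG form (for instance "There/EX" or "do/VB"). As a minimal sketch of how such strings can be read programmatically, the hypothetical helper `parse_tagged` below splits a string into (word, tag) pairs. The function name and the rule that a tag consists of uppercase letters with an optional trailing "$" are assumptions made for illustration; they are not part of the annotation model or of Santorini (1990).

```python
def parse_tagged(text):
    """Split a 'word/TAG' annotated string into (word, tag) pairs.

    Tokens without a recognisable tag are returned with tag None.
    Assumption: a tag is uppercase letters, optionally followed by '$'
    (e.g. VB, NNS, PRP$); punctuation tags are not handled here.
    """
    pairs = []
    for token in text.split():
        # Split on the last slash so words containing '/' keep their prefix.
        word, sep, tag = token.rpartition("/")
        looks_like_tag = tag.isupper() and tag.rstrip("$").isalpha()
        if sep and word and looks_like_tag:
            pairs.append((word, tag))
        else:
            pairs.append((token, None))
    return pairs


print(parse_tagged("There/EX was a party in progress."))
```

Untagged words are kept with a tag of `None`, so the original token order is preserved.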
The possessive ending on nouns ending in 's or ' is split off by the tagging algorithm and tagged as if it were a separate word, e.g. "John/NP 's/POS idea", "the parents/NNS '/POS distress".

This category includes the adjectival possessive forms "my", "your", "his", "her", "its", "one's", "our" and "their".

This category includes the following determinerlike elements when they precede an article or possessive pronoun.
EXAMPLES:
all/PDT his marbles
nary/PDT a soul
both/PDT the girls
quite/PDT a mess
half/PDT his time
rather/PDT a nuisance
many/PDT a moon
such/PDT a good time

We make no explicit distinction between prepositions and subordinating conjunctions. (The distinction is not lost, however - a preposition is an IN that precedes a noun phrase or a prepositional phrase, and a subordinate conjunction is an IN that precedes a clause). The preposition "to" has its own special tag TO.

This is a class we inserted to structure the tagset.

These are singular or plural proper nouns.

Not addressed by Santorini (1990), but produced by real-world taggers, also cf. http://www.cis.upenn.edu/~treebank/tokenization.html for the treatment of PTB punctuation tags in the parsed Penn Treebank.

These are adjectives with the superlative ending -est (as well as "worst"). "Most" and "least" can also be tagged as superlative adjectives when they occur by themselves.

These are superlative adverbs.

This tag should be used for mathematical, scientific and technical symbols or expressions that aren't words of English. It should not be used for any and all technical expressions. For instance, the names of chemicals, units of measurement (including abbreviations thereof) and the like should be tagged as nouns.

"To" is tagged TO, regardless of whether it is a preposition or an infinitival marker.

This category includes "how", "where", "why", etc. "When" in a temporal sense is tagged as a wh-adverb. In the sense of "if", on the other hand, it is a subordinating conjunction.
EXAMPLES: "When/WRB he finally arrived, I was on my way out."

This category includes "which", as well as "that" when it is used as a relative pronoun.

This category includes "what", "who" and "whom".

CC
CD
DT determiner, also article
EX
FW
IN
JJ adjective, ordinal numeral, ordinal number
JJR
JJS
LS
MD
NN Noun singular or mass
NNP Proper noun singular
NNPS Proper noun plural
NNS Noun plural
PDT
POS
PP PP is the tag used in "Part-of-Speech Tagging Guidelines for the Penn Treebank Project", Beatrice Santorini, 15.03.1991 (http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/Penn-Treebank-Tagset.ps 21.11.07)
PP$ PP$ is the tag used in "Part-of-Speech Tagging Guidelines for the Penn Treebank Project", Beatrice Santorini, 15.03.1991 (http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/Penn-Treebank-Tagset.ps 21.11.07)
PRP PRP is the tag used in "Part of Speech Tagging Guidelines for the Penn Treebank Project, June 1990" (ftp://ftp.cis.upenn.edu/pub/treebank/doc/tagguide.ps.gz 21.11.07)
PRP$ PRP$ is the tag used in "Part of Speech Tagging Guidelines for the Penn Treebank Project, June 1990" (ftp://ftp.cis.upenn.edu/pub/treebank/doc/tagguide.ps.gz 21.11.07)
RB adverb, negation
RBR
RBS
RP
SYM
TO
UH interjections, exclamation
VB
VBD
VBG
VBN
VBP Verb non-3rd person singular present
VBZ Verb 3rd person singular present
WDT
WP
WP$
WRB
'' : , $ " `` .
{ -LCB-
( -LRB-
[ -LSB-
} -RCB-
) -RRB-
] -RSB-
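The bracket entries at the end of the tag list pair each literal bracket with its escaped PTB token (-LRB-, -RRB-, -LSB-, -RSB-, -LCB-, -RCB-). A minimal sketch of converting the escaped tokens back to literal brackets follows; the names `PTB_BRACKETS` and `unescape_brackets` are hypothetical and chosen only for this illustration.

```python
# Escaped PTB bracket tokens and the literal characters they stand for.
PTB_BRACKETS = {
    "-LRB-": "(",
    "-RRB-": ")",
    "-LSB-": "[",
    "-RSB-": "]",
    "-LCB-": "{",
    "-RCB-": "}",
}


def unescape_brackets(tokens):
    """Replace escaped bracket tokens with literal brackets; leave other tokens unchanged."""
    return [PTB_BRACKETS.get(token, token) for token in tokens]


print(unescape_brackets(["-LRB-", "see", "below", "-RRB-"]))
```

Tokens that are not in the mapping pass through untouched, so the helper can be applied to a whole token sequence.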