Coverage for nltk.ccg.lexicon : 82%
# Natural Language Toolkit: Combinatory Categorial Grammar
#
# Copyright (C) 2001-2012 NLTK Project
# Author: Graeme Gange <ggange@csse.unimelb.edu.au>
# URL: <http://www.nltk.org/>
# For license information, see LICENSE.TXT

#------------
# Regular expressions used for parsing components of the lexicon
#------------
# Parses a primitive category and subscripts
# Separates the next primitive category from the remainder of the
# string
# Separates the next application operator from the remainder
# Parses the definition of the category of either a word or a family
# Strips comments from a line
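The comments above describe small regular expressions for picking apart lexicon text. The following is a minimal sketch of that idea, not the module's own code: the names `PRIMITIVE`, `NEXT_PRIMITIVE`, `COMMENT`, `strip_comment` and `split_primitive` are hypothetical, and the patterns assume that primitives are alphabetic names with an optional bracketed, comma-separated subscript list.

```python
import re

# Illustrative patterns only; the expressions used by the module may differ.
# A primitive category with optional subscripts, e.g. "NP" or "NP[sg]".
PRIMITIVE = re.compile(r"([A-Za-z]+)(\[[A-Za-z,]+\])?")
# The next primitive category, followed by whatever remains of the string.
NEXT_PRIMITIVE = re.compile(r"([A-Za-z]+(?:\[[A-Za-z,]+\])?)(.*)")
# Everything before the first '#', with the comment itself left uncaptured.
COMMENT = re.compile(r"([^#]*)(?:#.*)?")


def strip_comment(line):
    """Drop a trailing '#' comment and surrounding whitespace."""
    return COMMENT.match(line).group(1).strip()


def split_primitive(text):
    """Return (name, subscripts) for a primitive; subscripts may be None."""
    match = PRIMITIVE.fullmatch(text)
    if match is None:
        raise ValueError("not a primitive category: %r" % text)
    return match.groups()
```

For instance, `NEXT_PRIMITIVE.match("NP[sg]/N").groups()` separates `NP[sg]` from the remainder `/N`, which is the kind of step a category parser repeats as it walks along the string.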
#----------
# Lexicons
#----------

'''
Class representing a lexicon for CCG grammars.
primitives - The list of primitive categories for the lexicon
families - Families of categories
entries - A mapping of words to possible categories
'''
# Returns all the possible categories for a word
# Returns the target category for the parser
# String representation of the lexicon
# Used for debugging
st = ""
first = True
for ident in self._entries:
    if not first:
        st = st + "\n"
    st = st + ident + " => "

    first = True
    for cat in self._entries[ident]:
        if not first:
            st = st + " | "
        else:
            first = False
        st = st + str(cat)
return st
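The string-building loop above emits one `word => cat | cat` line per entry. A minimal sketch of the same output format using `str.join` is shown here; `format_entries` is a hypothetical standalone helper, and it assumes the entries are a plain dict mapping each word to a list of objects that support `str()`.

```python
def format_entries(entries):
    """Render a word -> categories mapping as 'word => cat | cat' lines."""
    lines = []
    for word, categories in entries.items():
        # Join the alternatives for one word with ' | '.
        alternatives = " | ".join(str(cat) for cat in categories)
        lines.append(word + " => " + alternatives)
    # One line per word, with no trailing newline.
    return "\n".join(lines)
```

Joining avoids the manual `first` flags, at the cost of building intermediate lists.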
#-----------
# Parsing lexicons
#-----------

# Separates the contents matching the first set of brackets
# from the rest of the input.
(part,rest) = matchBrackets(rest)
inside = inside + part
else:
raise AssertionError('Unmatched bracket in string \'' + string + '\'')
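Bracket matching of this kind can also be done with a depth counter instead of recursion. The sketch below is an illustration under that assumption rather than the module's implementation; `split_first_bracket_group` is a hypothetical name, and it raises `ValueError` where the fragment above raises `AssertionError`.

```python
def split_first_bracket_group(text):
    """Split '(A)rest' into ('(A)', 'rest'), honouring nested brackets."""
    if not text.startswith("("):
        raise ValueError("expected an opening bracket in %r" % text)
    depth = 0
    for index, char in enumerate(text):
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth == 0:
                # The bracket opened at position 0 has just been closed.
                return text[:index + 1], text[index + 1:]
    raise ValueError("Unmatched bracket in string %r" % text)
```

Applied to `(N\N)/(S/NP)`, this separates the first group `(N\N)` from the remainder `/(S/NP)`.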
# Separates the string for the next portion of the category
# from the rest of the string
# Parses an application operator
# Parses the subscripts for a primitive category
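As a rough illustration of the two helpers described above, the sketch below splits a leading slash from the rest of a category string and turns a bracketed subscript list into a Python list. The names `split_application` and `parse_subscripts` are hypothetical, and any restriction markers a grammar might attach to the slash are deliberately ignored here.

```python
import re

# A leading application slash followed by the remainder (illustrative only).
APPLICATION = re.compile(r"([\\/])(.*)")


def split_application(text):
    """Return (direction, rest) for a string that starts with '/' or '\\'."""
    match = APPLICATION.match(text)
    if match is None:
        raise ValueError("no application operator at start of %r" % text)
    slash, rest = match.groups()
    # '/' looks for its argument to the right, '\' to the left.
    direction = "forward" if slash == "/" else "backward"
    return direction, rest


def parse_subscripts(text):
    """Turn '[sg,pl]' into ['sg', 'pl']; None or '' gives an empty list."""
    if not text:
        return []
    return text[1:-1].split(",")
```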
# Parse a primitive category
# If the primitive is the special category 'var',
# replace it with the correct CCGVar
else: cat = cat.substitute([(cvar,var)])
raise AssertionError('String \'' + catstr + '\' is neither a family nor primitive category.')
# parseCategory drops the 'var' from the tuple
return augParseCategory(line,primitives,families)[0]

# Parses a string representing a category, and returns
# a tuple with (possibly) the CCG variable for the category
else: # print rePrim.match(str).groups()
else:
# Takes an input string, and converts it into a lexicon for CCGs.
# Strip comments and leading/trailing whitespace.
# A line of primitive categories.
# The first primitive listed is the target category
# e.g., :- S, N, NP, VP
else:
# Either a family definition, or a word definition
# Family definition
# e.g., Det :: NP/N
else:
# Word definition
# e.g., which => (N\N)/(S/NP)
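The three kinds of lexicon line named in these comments (`:-` for primitives, `::` for a family, `=>` for a word) can be told apart with one small function. This is a minimal sketch under the assumption that identifiers are alphabetic or underscore characters; `ENTRY` and `classify_line` are hypothetical names, not part of the module.

```python
import re

# identifier, separator ('::' or an arrow such as '=>'), category text
ENTRY = re.compile(r"([A-Za-z_]+)\s*(::|[-=]+>)\s*(.+)")


def classify_line(line):
    """Classify one lexicon line; returns None for blank or comment lines."""
    # Strip comments and leading/trailing whitespace first.
    line = line.split("#", 1)[0].strip()
    if not line:
        return None
    if line.startswith(":-"):
        # A line of primitive categories, comma separated.
        names = [name.strip() for name in line[2:].split(",")]
        return ("primitives", names)
    match = ENTRY.match(line)
    if match is None:
        raise ValueError("unrecognised lexicon line: %r" % line)
    ident, separator, category = match.groups()
    kind = "family" if separator == "::" else "word"
    return (kind, ident, category)
```

A full parser would go on to turn the category text into category objects; this sketch stops at deciding which branch a line belongs to.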
# Rather minimal lexicon based on the openccg `tinytiny' grammar.
# Only incorporates a subset of the morphological subcategories, however.
    :- S,NP,N                    # Primitive categories
    Det :: NP/N                  # Determiners
    Pro :: NP
    IntransVsg :: S\\NP[sg]      # Tensed intransitive verbs (singular)
    IntransVpl :: S\\NP[pl]      # Plural
    TransVsg :: S\\NP[sg]/NP     # Tensed transitive verbs (singular)
    TransVpl :: S\\NP[pl]/NP     # Plural

    the => NP[sg]/N[sg]
    the => NP[pl]/N[pl]

    I => Pro
    me => Pro
    we => Pro
    us => Pro

    book => N[sg]
    books => N[pl]

    peach => N[sg]
    peaches => N[pl]

    policeman => N[sg]
    policemen => N[pl]

    boy => N[sg]
    boys => N[pl]

    sleep => IntransVsg
    sleep => IntransVpl

    eat => IntransVpl
    eat => TransVpl
    eats => IntransVsg
    eats => TransVsg

    see => TransVpl
    sees => TransVsg
    ''')