Coverage for nltk.classify.megam : 56%
# Natural Language Toolkit: Interface to Megam Classifier
#
# Copyright (C) 2001-2012 NLTK Project
# Author: Edward Loper <edloper@gradient.cis.upenn.edu>
# URL: <http://www.nltk.org/>
# For license information, see LICENSE.TXT
"""
A set of functions used to interface with the external megam_ maxent
optimization package. Before megam can be used, you should tell NLTK where
it can find the megam binary, using the ``config_megam()`` function.
Typical usage:

.. doctest::
    :options: +SKIP

    >>> from nltk.classify import megam
    >>> megam.config_megam() # pass path to megam if not found in PATH
    [Found megam: ...]

To train a MaxentClassifier with megam, pass ``'megam'`` as the training
algorithm, as below; see the MaxentClassifier documentation for details::

    nltk.classify.MaxentClassifier.train(corpus, 'megam')

.. _megam: http://www.cs.utah.edu/~hal/megam/
"""
import subprocess

from nltk.internals import find_binary

try:
    import numpy
except ImportError:
    numpy = None
######################################################################
#{ Configuration
######################################################################
_megam_bin = None


def config_megam(bin=None):
    """
    Configure NLTK's interface to the ``megam`` maxent optimization
    package.

    :param bin: The full path to the ``megam`` binary.  If not
        specified, NLTK searches the system for a ``megam`` binary
        (also checking the ``MEGAM`` and ``MEGAMHOME`` environment
        variables) and raises a ``LookupError`` if none is found.
    :type bin: str
    """
    global _megam_bin
    _megam_bin = find_binary(
        'megam', bin,
        env_vars=['MEGAM', 'MEGAMHOME'],
        binary_names=['megam.opt', 'megam', 'megam_686', 'megam_i686.opt'],
        url='http://www.cs.utah.edu/~hal/megam/')
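config_megam hands the actual search to find_binary. As a rough illustration of what the env_vars and binary_names arguments imply, here is a minimal standalone sketch of such a lookup. locate_binary is a hypothetical helper, and its search order (explicit path, then environment variables, then PATH) is an assumption made for illustration, not a description of find_binary's exact behavior.

```python
import os
import shutil


def locate_binary(name, path_to_bin=None, env_vars=(), binary_names=None):
    """Hypothetical helper: return the first file found for `name`.

    The search order used here is an assumption for this sketch: an explicit
    path, then each environment variable (naming a file or a directory),
    then the directories on PATH.
    """
    candidates = binary_names or [name]

    # 1. An explicitly supplied path wins if it points at a file.
    if path_to_bin is not None:
        if os.path.isfile(path_to_bin):
            return path_to_bin
        raise LookupError('%s not found at %s' % (name, path_to_bin))

    # 2. An environment variable may name the binary itself or its directory.
    for var in env_vars:
        value = os.environ.get(var)
        if not value:
            continue
        if os.path.isfile(value):
            return value
        for candidate in candidates:
            path = os.path.join(value, candidate)
            if os.path.isfile(path):
                return path

    # 3. Fall back to executables on PATH, trying each candidate name.
    for candidate in candidates:
        found = shutil.which(candidate)
        if found is not None:
            return found

    raise LookupError('Unable to find %s; set one of %s or pass a path'
                      % (name, ', '.join(env_vars) or 'PATH'))
```

In this sketch every candidate name is tried in order, which is the point of passing several spellings in binary_names.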
######################################################################
#{ Megam Interface Functions
######################################################################
def write_megam_file(train_toks, encoding, stream,
                     bernoulli=True, explicit=True):
    """
    Generate an input file for ``megam`` based on the given corpus of
    classified tokens.

    :type train_toks: list(tuple(dict, str))
    :param train_toks: Training data, represented as a list of
        pairs, the first member of which is a feature dictionary,
        and the second of which is a classification label.

    :type encoding: MaxentFeatureEncodingI
    :param encoding: A feature encoding, used to convert featuresets
        into feature vectors.  May optionally implement a ``cost()``
        method in order to assign different costs to different class
        predictions.

    :type stream: stream
    :param stream: The stream to which the megam input file should be
        written.

    :param bernoulli: If True, use the 'bernoulli' format: all joint
        features have binary values and are listed if and only if they
        are true.  Otherwise, list feature values explicitly.  If
        ``bernoulli=False``, then you must call ``megam`` with the
        ``-fvals`` option.

    :param explicit: If True, use the 'explicit' format: for each
        token, list the features that would fire for each of the
        possible labels.  If ``explicit=True``, then you must call
        ``megam`` with the ``-explicit`` option.
    """
    # Look up the set of labels.
    labels = encoding.labels()
    labelnum = dict((label, i) for (i, label) in enumerate(labels))

    # Write the file, which contains one line per instance.
    for featureset, label in train_toks:
        # First, the index of the gold label (or, in the weighted
        # multiclass case, the cost of each label).
        if hasattr(encoding, 'cost'):
            stream.write(':'.join(str(encoding.cost(featureset, label, l))
                                  for l in labels))
        else:
            stream.write('%d' % labelnum[label])

        # For implicit file formats, just list the features that fire
        # for this instance's actual label.
        if not explicit:
            _write_megam_features(encoding.encode(featureset, label),
                                  stream, bernoulli)

        # For explicit formats, list the features that would fire for
        # each of the possible labels, one '#'-separated block per label.
        else:
            for l in labels:
                stream.write(' #')
                _write_megam_features(encoding.encode(featureset, l),
                                      stream, bernoulli)

        # End of the instance.
        stream.write('\n')
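The explicit Bernoulli layout is easiest to see on a toy example. The sketch below is a self-contained, simplified re-implementation of the write_megam_file loop for the explicit=True, bernoulli=True case only, without the cost() branch. ToyEncoding, write_features and write_explicit_bernoulli are hypothetical names; ToyEncoding only imitates the labels() and encode() methods the docstring expects of a MaxentFeatureEncodingI, and gives every label an always-on feature so no encoded vector is empty.

```python
import io


class ToyEncoding:
    """Hypothetical stand-in for a MaxentFeatureEncodingI.

    Each (feature name, label) pair gets its own feature id, plus one
    always-on feature per label, so every encoded vector is non-empty.
    """

    def __init__(self, labels, feature_names):
        self._labels = list(labels)
        self._index = {}
        for label in self._labels:
            self._index[('__bias__', label)] = len(self._index)
            for fname in feature_names:
                self._index[(fname, label)] = len(self._index)

    def labels(self):
        return self._labels

    def encode(self, featureset, label):
        # (feature id, value) pairs for the features that fire with `label`.
        vector = [(self._index[('__bias__', label)], 1)]
        for fname, fval in sorted(featureset.items()):
            if fval:
                vector.append((self._index[(fname, label)], 1))
        return vector


def write_features(vector, stream):
    # Bernoulli format: list only the ids of features whose value is 1.
    for fid, fval in vector:
        if fval == 1:
            stream.write(' %s' % fid)
        elif fval != 0:
            raise ValueError('bernoulli format needs binary features')


def write_explicit_bernoulli(train_toks, encoding, stream):
    labels = encoding.labels()
    labelnum = {label: i for i, label in enumerate(labels)}
    for featureset, label in train_toks:
        stream.write('%d' % labelnum[label])   # gold label index
        for l in labels:                       # one block per candidate label
            stream.write(' #')
            write_features(encoding.encode(featureset, l), stream)
        stream.write('\n')
```

For the toy data in the test below (labels pos and neg, features good and bad), the writer produces the two lines `0 # 0 1 # 3 4` and `1 # 0 2 # 3 5`: the gold label index, then one '#'-prefixed block of feature ids per candidate label.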
def parse_megam_weights(s, features_count, explicit=True):
    """
    Given the stdout output generated by ``megam`` when training a
    model, return a ``numpy`` array containing the corresponding weight
    vector.  This function does not currently handle bias features.
    """
    if numpy is None:
        raise ValueError('This function requires that numpy be installed')
    assert explicit, 'non-explicit not supported yet'
    lines = s.strip().split('\n')
    weights = numpy.zeros(features_count, 'd')
    for line in lines:
        if line.strip():
            fid, weight = line.split()
            weights[int(fid)] = float(weight)
    return weights
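For readers without numpy, here is a minimal plain-list variant of the same parsing logic. parse_weights is a hypothetical name, and it returns a Python list instead of a numpy float array. The input shape it assumes, one 'feature_id weight' pair per line, is inferred from the parser above rather than from megam's own documentation.

```python
def parse_weights(s, features_count):
    """Parse 'feature_id weight' lines into a dense list of floats.

    Plain-list variant of the numpy-based parser: feature ids that do not
    appear in the text keep a weight of 0.0, and blank lines are skipped.
    """
    weights = [0.0] * features_count
    for line in s.strip().split('\n'):
        if line.strip():
            fid, weight = line.split()
            weights[int(fid)] = float(weight)
    return weights
```

For example, parse_weights('0 1.5\n2 -0.25\n', 4) returns [1.5, 0.0, -0.25, 0.0].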
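def _write_megam_features(vector, stream, bernoulli):
    if not vector:
        raise ValueError('MEGAM classifier requires the use of an '
                         'always-on feature.')
    for (fid, fval) in vector:
        if bernoulli:
            if fval == 1:
                stream.write(' %s' % fid)
            elif fval != 0:
                raise ValueError('If bernoulli=True, then all '
                                 'features must be binary.')
        else:
            stream.write(' %s %s' % (fid, fval))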
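def call_megam(args):
    """
    Call the ``megam`` binary with the given arguments, and return
    its standard output as a string.
    """
    if isinstance(args, str):
        raise TypeError('args should be a list of strings')
    if _megam_bin is None:
        config_megam()

    # Call megam via a subprocess, capturing both output streams so
    # that stderr is available if the command fails.
    cmd = [_megam_bin] + args
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    (stdout, stderr) = p.communicate()

    # Check the return code.
    if p.returncode != 0:
        print()
        print(stderr.decode('utf-8', errors='replace'))
        raise OSError('megam command failed!')

    return stdout.decode('utf-8')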
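When an external binary is run with only stdout piped, Popen.communicate() returns None for stderr, so there are no diagnostics to report on failure, and under Python 3 stdout comes back as bytes. The sketch below shows one way to run a binary with both streams captured and stdout decoded, using subprocess.run (Python 3.5+). run_binary is a hypothetical helper, not part of NLTK.

```python
import subprocess


def run_binary(binary, args):
    """Hypothetical helper: run `binary` with `args`, return decoded stdout.

    Rejects a bare string (a common mistake), captures stdout and stderr,
    and raises OSError carrying stderr when the exit status is non-zero.
    """
    if isinstance(args, str):
        raise TypeError('args should be a list of strings')
    result = subprocess.run([binary] + list(args),
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    if result.returncode != 0:
        raise OSError('%s failed (exit %d): %s'
                      % (binary, result.returncode,
                         result.stderr.decode('utf-8', errors='replace')))
    return result.stdout.decode('utf-8')
```

Putting stderr in the exception message, rather than printing it, keeps the diagnostics attached to the error for whoever catches it.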