Experimental Setup

We used documents from the movie review dataset and ran 3-fold cross validation in a number of test configurations. We ignored case and treated punctuation marks as separate lexical items.

Our testbed supported testing various parameters: frequency vs. presence of features vs. term frequency-inverse document frequency, unigrams vs. bigrams vs. both, number of features, and type of feature tagging. The types of feature tagging were negation, part of speech (POS), and position. We additionally supported training and testing on only adjectives and verbs. We additionally supported the ability to use the full movie dataset as a training set and using the yelp dataset as a test set. Please see the end for the full test results.

Pranjal Vachaspati 2012-02-05