CS 432/532 Web Science
Spring 2017
http://phonedude.github.io/cs532-s17/

Assignment #9
Due: 11:59pm, May 1, 2017

Support your answers: include all relevant discussion, assumptions, examples, etc.

1. Choose a blog or a newsfeed (or something similar with an Atom or RSS feed).
   Every student should do a unique feed, so please "claim" the feed on the
   class email list (first come, first served). It should be on a topic or
   topics about which you are qualified to provide classification training
   data. Find something with at least 100 entries (or items, if RSS).

   Create between four and eight different categories for the entries in the
   feed. Examples:

      work, class, family, news, deals
      liberal, conservative, moderate, libertarian
      sports, local, financial, national, international, entertainment
      metal, electronic, ambient, folk, hip-hop, pop

   Download and process the pages of the feed as per the week 12 class slides
   (a feed-parsing sketch appears at the end of this assignment). Be sure to
   upload the raw data (Atom or RSS) to your GitHub account.

   Create a table with 100 rows, like:

   title                                                    classification
   -----                                                    --------------
   Ric Ocasek - "Something To Grab For" (forgotten song)    80s
   Weezer - "Pinkerton" (LP Review)                         alternative
   Schon & Hammer - "No More Lies" (forgotten song)         80s
   etc.

   This is your "ground truth" (or "gold standard") data.

2. Train the Fisher classifier on the first 50 entries (the "training set"),
   then use the classifier to guess the classification of the next 50 entries
   (the "test set"); see the training/testing sketch at the end of this
   assignment. Create a table with 50 rows, like:

   title                                         actual         predicted
   -----                                         ------         ---------
   Donnie Iris - "Ah! Leah!" (Forgotten Song)    80s            80s
   Black Sabbath - "Vol. 4" (LP Review)          metal          metal
   Catherine Wheel - "Ferment" (LP Review)       alternative    metal

   Assess the performance of your classifier in each of your categories by
   computing precision, recall, and F-measure. Use the "macro-averaged",
   label-based method, as per:

   http://stats.stackexchange.com/questions/21551/how-to-compute-precision-recall-for-multiclass-multilabel-classification

   For example, if you have 5 categories (e.g., 80s, metal, alternative,
   electronic, cover), you will compute precision, recall, and F-measure for
   each category, and then compute the average across the 5 categories (a
   scoring sketch appears at the end of this assignment).

3. Repeat question #2, but use the first 90 entries to train your classifier
   and the last 10 entries for testing.

===================================================================
========The question below is for 5 points extra credit===========
===================================================================

4. Rerun question #3, but with "10-fold cross validation" (a sketch appears at
   the end of this assignment). What was the change, if any, in precision and
   recall (and thus F-measure)?
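For question 1, a minimal sketch of fetching a feed and listing its entry titles. The feed URL is a placeholder for whatever feed you claim, and feedparser is one assumed choice of library; follow the week 12 slides if they prescribe a different approach.

    import urllib.request

    import feedparser   # pip install feedparser (an assumed choice of library)

    FEED_URL = 'http://example.com/feed.xml'   # placeholder -- your claimed feed

    # keep a copy of the raw Atom/RSS so it can be committed to your GitHub account
    urllib.request.urlretrieve(FEED_URL, 'raw-feed.xml')

    feed = feedparser.parse('raw-feed.xml')    # feedparser also accepts the URL directly
    print('entries found: %d' % len(feed.entries))

    for entry in feed.entries:
        # title and link are available for both Atom entries and RSS items;
        # each title still needs a hand-assigned category to become a row in
        # the (title, classification) ground-truth table used in questions 2-4
        print('%s\t%s' % (entry.title, entry.link))

If your feed only exposes its most recent entries, you may need to fetch additional pages (per the "pages of the feed" wording in question 1) to reach 100.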
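For question 2, a sketch of the 50/50 train/test split, assuming the docclass module from the week 12 slides (the Fisher classifier from Programming Collective Intelligence), which provides getwords(), train(), and classify(); adjust names if your copy of the code differs. The entries list is a stand-in for your own 100 labeled titles.

    import docclass   # the classifier module from the week 12 slides (assumed)

    # placeholder ground truth -- replace with your own 100 (title, category) pairs,
    # in feed order; how you load them (CSV, list literal, etc.) is up to you
    entries = [
        ('Donnie Iris - "Ah! Leah!" (Forgotten Song)', '80s'),
        ('Black Sabbath - "Vol. 4" (LP Review)', 'metal'),
        # ... the rest of your labeled entries ...
    ]

    cl = docclass.fisherclassifier(docclass.getwords)

    for title, category in entries[:50]:        # first 50 = training set
        cl.train(title, category)

    results = []
    for title, actual in entries[50:]:          # next 50 = test set
        predicted = cl.classify(title, default='unknown')
        results.append((title, actual, predicted))
        print('%s\t%s\t%s' % (title, actual, predicted))

Question 3 is the same sketch with entries[:90] for training and entries[90:] for testing.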
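For the macro-averaged scoring in questions 2 and 3, a sketch that computes precision, recall, and F-measure per category from the (title, actual, predicted) triples produced above and then averages across categories, following the method described at the stats.stackexchange link.

    def macro_scores(results, categories):
        # results: list of (title, actual, predicted); categories: your 4-8 labels
        precisions, recalls, fmeasures = [], [], []
        for cat in categories:
            tp = sum(1 for _, a, p in results if p == cat and a == cat)
            fp = sum(1 for _, a, p in results if p == cat and a != cat)
            fn = sum(1 for _, a, p in results if p != cat and a == cat)
            precision = tp / (tp + fp) if (tp + fp) else 0.0
            recall = tp / (tp + fn) if (tp + fn) else 0.0
            f = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)
            print('%s\tP=%.3f\tR=%.3f\tF=%.3f' % (cat, precision, recall, f))
            precisions.append(precision)
            recalls.append(recall)
            fmeasures.append(f)
        n = len(categories)
        # macro-average: unweighted mean of the per-category scores
        return sum(precisions) / n, sum(recalls) / n, sum(fmeasures) / n

    # e.g.: print(macro_scores(results, ['80s', 'metal', 'alternative']))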
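For the extra-credit question 4, a sketch of 10-fold cross validation over the full 100-entry ground truth: ten folds of 10, training a fresh classifier on the other 90 entries each time and pooling the predictions. It reuses the assumed docclass module and the entries list from the question 2 sketch; consecutive folds are used here for simplicity, and shuffling the entries first is a reasonable variation.

    import docclass   # same assumed classifier module as in the question 2 sketch

    def ten_fold(entries):
        # 10 folds of 10: train on the other 90, test on the held-out 10
        all_results = []
        for k in range(10):
            test = entries[k * 10:(k + 1) * 10]
            train = entries[:k * 10] + entries[(k + 1) * 10:]
            cl = docclass.fisherclassifier(docclass.getwords)   # fresh classifier per fold
            for title, category in train:
                cl.train(title, category)
            for title, actual in test:
                all_results.append((title, actual, cl.classify(title, default='unknown')))
        return all_results

    # score the pooled predictions exactly as in questions 2 and 3, e.g.:
    # print(macro_scores(ten_fold(entries), categories))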