{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "# Analysing the Edinburgh Fringe Festival Jokes\n", "\n", "**This is the IPython notebook that revisits the blog post: [Python, natural language processing and predicting funny](http://vknight.org/unpeudemath/code/2015/06/14/natural-language-and-predicting-funny/)**.\n", "\n", "I am updating the data to include the jokes from [2015](http://www.bbc.co.uk/news/uk-scotland-edinburgh-east-fife-34039927).\n", "\n", "Here are the libraries we are going to need:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false, "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "import pandas # To handle our data nicely\n", "import nltk # For all the clever stuff" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "## Loading and tidying the data" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Read the jokes; skipinitialspace drops the space after each comma\n", "df = pandas.read_csv('jokes.csv', quotechar='\"', skipinitialspace=True)" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "collapsed": false, "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/html": [ "
| | Year | Author | Rank | Raw_joke |
|---|---|---|---|---|
| 0 | 2015 | Darren Walsh | 1 | I just deleted all the German names off my pho... |
| 1 | 2015 | Stewart Francis | 2 | Kim Kardashian is saddled with a huge arse ...... |
| 2 | 2015 | Adam Hess | 3 | Surely every car is a people carrier? |
| 3 | 2015 | Masai Graham | 4 | What's the difference between a 'hippo' and a ... |
| 4 | 2015 | Dave Green | 5 | If I could take just one thing to a desert isl... |

| | Year | Author | Rank | Raw_joke |
|---|---|---|---|---|
| 65 | 2009 | Adam Hills | 6 | Going to Starbucks for coffee is like going to... |
| 66 | 2009 | Marcus Brigstocke | 7 | To the people who've got iPhones: you just bou... |
| 67 | 2009 | Rhod Gilbert | 8 | A spa hotel? It's like a normal hotel, only in... |
| 68 | 2009 | Dan Antopolski | 9 | I've been reading the news about there being a... |
| 69 | 2009 | Simon Brodkin | 10 | I started so many fights at my school - I had ... |

| | Year | Author | Rank | Raw_joke | Joke |
|---|---|---|---|---|---|
| 0 | 2015 | Darren Walsh | 1 | I just deleted all the German names off my pho... | [DELETED, GERMAN, NAMES, PHONE, HANS, FREE] |
| 1 | 2015 | Stewart Francis | 2 | Kim Kardashian is saddled with a huge arse ...... | [KIM, KARDASHIAN, SADDLED, HUGE, ARSE, ENOUGH,... |
| 2 | 2015 | Adam Hess | 3 | Surely every car is a people carrier? | [SURELY, EVERY, CAR, PEOPLE, CARRIER] |
| 3 | 2015 | Masai Graham | 4 | What's the difference between a 'hippo' and a ... | [DIFFERENCE, HIPPO, ZIPPO, ONE, REALLY, HEAVY,... |
| 4 | 2015 | Dave Green | 5 | If I could take just one thing to a desert isl... | [COULD, TAKE, ONE, THING, DESERT, ISLAND, PROB... |

| | Year | Author | Rank | Raw_joke | Joke | Features | Funny |
|---|---|---|---|---|---|---|---|
| 0 | 2015 | Darren Walsh | 1 | I just deleted all the German names off my pho... | [DELETED, GERMAN, NAMES, PHONE, HANS, FREE] | {u'contains(DELETED)': False, u'contains(PHONE... | True |
| 1 | 2015 | Stewart Francis | 2 | Kim Kardashian is saddled with a huge arse ...... | [KIM, KARDASHIAN, SADDLED, HUGE, ARSE, ENOUGH,... | {u'contains(ENOUGH)': False, u'contains(KANYE)... | True |
| 2 | 2015 | Adam Hess | 3 | Surely every car is a people carrier? | [SURELY, EVERY, CAR, PEOPLE, CARRIER] | {u'contains(EVERY)': False, u'contains(PEOPLE)... | True |
| 3 | 2015 | Masai Graham | 4 | What's the difference between a 'hippo' and a ... | [DIFFERENCE, HIPPO, ZIPPO, ONE, REALLY, HEAVY,... | {u'contains(DIFFERENCE)': False, u'contains(HI... | True |
| 4 | 2015 | Dave Green | 5 | If I could take just one thing to a desert isl... | [COULD, TAKE, ONE, THING, DESERT, ISLAND, PROB... | {u'contains(THING)': True, u'contains(WOULDN)'... | True |

| | Year | Author | Rank | Raw_joke | Joke | Features | Funny |
|---|---|---|---|---|---|---|---|
| 0 | 2015 | Darren Walsh | 1 | I just deleted all the German names off my pho... | [DELETED, GERMAN, NAMES, PHONE, HANS, FREE] | {u'contains(DELETED)': False, u'contains(PHONE... | True |
| 1 | 2015 | Stewart Francis | 2 | Kim Kardashian is saddled with a huge arse ...... | [KIM, KARDASHIAN, SADDLED, HUGE, ARSE, ENOUGH,... | {u'contains(ENOUGH)': False, u'contains(KANYE)... | True |
| 2 | 2015 | Adam Hess | 3 | Surely every car is a people carrier? | [SURELY, EVERY, CAR, PEOPLE, CARRIER] | {u'contains(EVERY)': False, u'contains(PEOPLE)... | True |
| 3 | 2015 | Masai Graham | 4 | What's the difference between a 'hippo' and a ... | [DIFFERENCE, HIPPO, ZIPPO, ONE, REALLY, HEAVY,... | {u'contains(DIFFERENCE)': False, u'contains(HI... | True |
| 4 | 2015 | Dave Green | 5 | If I could take just one thing to a desert isl... | [COULD, TAKE, ONE, THING, DESERT, ISLAND, PROB... | {u'contains(THING)': True, u'contains(WOULDN)'... | True |
| 5 | 2015 | Mark Nelson | 6 | Jesus fed 5,000 people with two fishes and a l... | [JESUS, FED, 5, 000, PEOPLE, TWO, FISHES, LOAF... | {u'contains(5)': False, u'contains(PEOPLE)': T... | False |
| 6 | 2015 | Tom Parry | 7 | Red sky at night. Shepherd's delight. Blue sky... | [RED, SKY, NIGHT, SHEPHERD, DELIGHT, BLUE, SKY... | {u'contains(BLUE)': False, u'contains(DAY)': T... | False |
| 7 | 2015 | Alun Cochrane | 8 | The first time I met my wife, I knew she was a... | [FIRST, TIME, MET, WIFE, KNEW, KEEPER, WEARING... | {u'contains(KNEW)': False, u'contains(MASSIVE)... | False |
| 8 | 2015 | Simon Munnery | 9 | Clowns divorce. Custardy battle | [CLOWNS, DIVORCE, CUSTARDY, BATTLE] | {u'contains(BATTLE)': False, u'contains(CUSTAR... | False |
| 9 | 2015 | Grace The Child | 10 | They're always telling me to live my dreams. B... | [RE, ALWAYS, TELLING, LIVE, DREAMS, WANT, NAKE... | {u'contains(DREAMS)': False, u'contains(LIVE)'... | False |

| | Year | Author | Rank | Raw_joke | Joke | Features | Funny | Labeled_Feature |
|---|---|---|---|---|---|---|---|---|
| 0 | 2015 | Darren Walsh | 1 | I just deleted all the German names off my pho... | [DELETED, GERMAN, NAMES, PHONE, HANS, FREE] | {u'contains(DELETED)': False, u'contains(PHONE... | True | ({u'contains(DELETED)': False, u'contains(PHON... |
| 1 | 2015 | Stewart Francis | 2 | Kim Kardashian is saddled with a huge arse ...... | [KIM, KARDASHIAN, SADDLED, HUGE, ARSE, ENOUGH,... | {u'contains(ENOUGH)': False, u'contains(KANYE)... | True | ({u'contains(ENOUGH)': False, u'contains(KANYE... |
| 2 | 2015 | Adam Hess | 3 | Surely every car is a people carrier? | [SURELY, EVERY, CAR, PEOPLE, CARRIER] | {u'contains(EVERY)': False, u'contains(PEOPLE)... | True | ({u'contains(EVERY)': False, u'contains(PEOPLE... |
| 3 | 2015 | Masai Graham | 4 | What's the difference between a 'hippo' and a ... | [DIFFERENCE, HIPPO, ZIPPO, ONE, REALLY, HEAVY,... | {u'contains(DIFFERENCE)': False, u'contains(HI... | True | ({u'contains(DIFFERENCE)': False, u'contains(H... |
| 4 | 2015 | Dave Green | 5 | If I could take just one thing to a desert isl... | [COULD, TAKE, ONE, THING, DESERT, ISLAND, PROB... | {u'contains(THING)': True, u'contains(WOULDN)'... | True | ({u'contains(THING)': True, u'contains(WOULDN)... |

| | Raw_joke | Funny | Prediction |
|---|---|---|---|
| 0 | I just deleted all the German names off my pho... | True | False |
| 1 | Kim Kardashian is saddled with a huge arse ...... | True | True |
| 2 | Surely every car is a people carrier? | True | True |
| 3 | What's the difference between a 'hippo' and a ... | True | True |
| 4 | If I could take just one thing to a desert isl... | True | True |
| 5 | Jesus fed 5,000 people with two fishes and a l... | False | True |
| 6 | Red sky at night. Shepherd's delight. Blue sky... | False | True |
| 7 | The first time I met my wife, I knew she was a... | False | False |
| 8 | Clowns divorce. Custardy battle | False | True |
| 9 | They're always telling me to live my dreams. B... | False | False |