"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%matplotlib inline\n",
"sonnetsSentimentFreqs.plot()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Even if the x (bottom) axis is illegible, we can see that there are both positively and negatively scored sonnets. Can we determine an average? Sure, let's use numpy.mean() on the list of values."
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1.0786493506493506"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import numpy\n",
"numpy.mean([val for doc, val in sonnetsSentimentFreqs.items()])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This suggests that on average the sonnets are more positive than negative."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Outputting HTML"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We might want to show the most positive and negative sonnets, but to make it even more useful we'd want to show which words are positive and which are negative. Let's create a function to get HTML for any of our documents."
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import re\n",
"\n",
"def get_html_for_sentiment_data(text, positives, negatives):\n",
"    # the regular expression combines all of the positive and negative words for a search, e.g. (love|like)\n",
"    # it then surrounds each word found with styling, green for positive, red for negative\n",
"    if len(positives) > 0:\n",
"        text = re.sub(r'\\b(' + '|'.join(positives) + r')\\b', r'<span style=\"color: green\">\\1</span>', text)\n",
"    if len(negatives) > 0:\n",
"        text = re.sub(r'\\b(' + '|'.join(negatives) + r')\\b', r'<span style=\"color: red\">\\1</span>', text)\n",
"    return text"
]
},
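{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see the matching technique in isolation, here's a small sketch (the sentence and word list are made up for illustration): the `\\b` word boundaries mean the alternation only matches whole words, so \"like\" is not matched inside \"dislike\".\n",
"\n",
"```python\n",
"import re\n",
"\n",
"def highlight(text, words, color):\n",
"    # wrap each whole-word match in a styled span\n",
"    return re.sub(r'\\b(' + '|'.join(words) + r')\\b',\n",
"                  r'<span style=\"color: ' + color + r'\">\\1</span>', text)\n",
"\n",
"highlight(\"I love this, I dislike that.\", [\"love\", \"like\"], \"green\")\n",
"```"
]
},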
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can use this to generate an HTML snippet."
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"fileid = sonnetsSentimentFreqs.max() # most positive\n",
"text = sonnetsCorpus.raw(fileid)\n",
"html = get_html_for_sentiment_data(text, sonnetsPositives, sonnetsNegatives)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We have the HTML above, but it appears as code, so we need to use IPython's facilities for embedding HTML."
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<h3>032.txt</h3><pre>If thou survive my well-contented day,\r\n",
" When that churl Death my bones with dust shall cover\r\n",
" And shalt by fortune once more re-survey\r\n",
" These poor rude lines of thy deceased lover,\r\n",
" Compare them with the bett'ring of the time,\r\n",
" And though they be outstripp'd by every pen,\r\n",
" Reserve them for my love, not for their rhyme,\r\n",
" Exceeded by the height of happier men.\r\n",
" O! then vouchsafe me but this loving thought:\r\n",
" 'Had my friend's Muse grown with this growing age,\r\n",
" A dearer birth than this his love had brought,\r\n",
" To march in ranks of better equipage:\r\n",
" But since he died and poets better prove,\r\n",
" Theirs for their style I'll read, his for his love'.</pre>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"execution_count": 56,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from IPython.display import HTML\n",
"HTML(\"<h3>\" + fileid + \"</h3><pre>\" + html + \"</pre>\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And now the most negative sonnet."
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<h3>090.txt</h3><pre>Then hate me when thou wilt; if ever, now;\r\n",
" Now, while the world is bent my deeds to cross,\r\n",
" Join with the spite of fortune, make me bow,\r\n",
" And do not drop in for an after-loss:\r\n",
" Ah! do not, when my heart hath 'scap'd this sorrow,\r\n",
" Come in the rearward of a conquer'd woe;\r\n",
" Give not a windy night a rainy morrow,\r\n",
" To linger out a purpos'd overthrow.\r\n",
" If thou wilt leave me, do not leave me last,\r\n",
" When other petty griefs have done their spite,\r\n",
" But in the onset come: so shall I taste\r\n",
" At first the very worst of fortune's might;\r\n",
" And other strains of woe, which now seem woe,\r\n",
" Compar'd with loss of thee, will not seem so.</pre>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"execution_count": 57,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fileid = sonnetsSentimentFreqs.most_common()[-1][0] # most negative (fileid of the last element in the most common list)\n",
"text = sonnetsCorpus.raw(fileid)\n",
"html = get_html_for_sentiment_data(text, sonnetsPositives, sonnetsNegatives)\n",
"HTML(\"<h3>\" + fileid + \"</h3><pre>\" + html + \"</pre>\")"
]
},
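{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an aside, the most_common()[-1] trick works because most_common() sorts entries from highest to lowest value, so the last entry holds the minimum. A small sketch with made-up scores (NLTK's FreqDist is a subclass of Python's Counter, which behaves the same way):\n",
"\n",
"```python\n",
"from collections import Counter\n",
"\n",
"# hypothetical sentiment scores keyed by file id\n",
"scores = Counter({\"032.txt\": 3.0, \"018.txt\": 1.0, \"090.txt\": -2.5})\n",
"scores.most_common()[-1][0]  # the key with the lowest score\n",
"```"
]
},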
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There's a lot that we could look at and question here, and we should do so, but there are also some intriguing aspects, particularly for such an automated process.\n",
"\n",
"For the sake of convenience, here are the main functions that we defined for sentiment analysis of a corpus:\n",
"\n",
"```python\n",
"import nltk\n",
"from nltk.corpus import sentiwordnet as swn\n",
"\n",
"def get_sentiments_data_from_corpus(corpus, skipWordNetPos=[]):\n",
"    documents = {}\n",
"    all_positives = []\n",
"    all_negatives = []\n",
"    for fileid in corpus.fileids():\n",
"        tokens = corpus.words(fileid)\n",
"        score, positives, negatives = get_sentiment_data_from_tokens(tokens, skipWordNetPos)\n",
"        documents[fileid] = score\n",
"        all_positives.extend(positives)\n",
"        all_negatives.extend(negatives)\n",
"    return documents, set(all_positives), set(all_negatives)\n",
"\n",
"def get_sentiment_data_from_tokens(tokens, skipWordNetPos=[]):\n",
"    tagged = nltk.pos_tag(tokens)\n",
"    positives = []\n",
"    negatives = []\n",
"    tokens_score = 0\n",
"    for word, treebank in tagged:\n",
"        score = get_sentiment_score_from_tagged(word, treebank, skipWordNetPos)\n",
"        if score:\n",
"            tokens_score += score\n",
"            if score > 0:\n",
"                positives.append(word.lower())\n",
"            else:\n",
"                negatives.append(word.lower())\n",
"    return tokens_score, set(positives), set(negatives)\n",
"\n",
"def get_sentiment_score_from_tagged(token, treebank, skipWordNetPos=[]):\n",
"    wordnet_pos = treebank_to_wordnet_pos(treebank, skipWordNetPos)\n",
"    if wordnet_pos: # only score tokens with a usable part of speech\n",
"        senti_synsets = list(swn.senti_synsets(token, wordnet_pos))\n",
"        if senti_synsets:\n",
"            return senti_synsets[0].pos_score() - senti_synsets[0].neg_score()\n",
"\n",
"def treebank_to_wordnet_pos(treebank, skipWordNetPos=[]):\n",
"    if \"NN\" in treebank and \"n\" not in skipWordNetPos: # singular and plural nouns (NN, NNS)\n",
"        return \"n\"\n",
"    elif \"JJ\" in treebank and \"a\" not in skipWordNetPos: # adjectives (JJ, JJR, JJS)\n",
"        return \"a\"\n",
"    elif \"VB\" in treebank and \"v\" not in skipWordNetPos: # verbs (VB, VBD, VBG, VBN, VBP, VBZ)\n",
"        return \"v\"\n",
"    elif \"RB\" in treebank and \"r\" not in skipWordNetPos: # adverbs (RB, RBR, RBS)\n",
"        return \"r\"\n",
"    # if we don't match any of these we implicitly return None\n",
"```"
]
},
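{
"cell_type": "markdown",
"metadata": {},
"source": [
"The aggregation logic in get_sentiment_data_from_tokens() can be exercised without the SentiWordNet data files by swapping in a tiny stand-in lexicon; the words and scores below are made up for illustration:\n",
"\n",
"```python\n",
"# hypothetical word scores standing in for SentiWordNet lookups\n",
"LEXICON = {\"love\": 0.5, \"happier\": 0.25, \"hate\": -0.75, \"woe\": -0.5}\n",
"\n",
"def score_tokens(tokens):\n",
"    positives, negatives, total = set(), set(), 0.0\n",
"    for word in tokens:\n",
"        score = LEXICON.get(word.lower())\n",
"        if score:\n",
"            total += score\n",
"            (positives if score > 0 else negatives).add(word.lower())\n",
"    return total, positives, negatives\n",
"\n",
"score_tokens(\"Then hate me when thou wilt\".split())\n",
"```"
]
},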
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Next Steps"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Try these tasks:\n",
"\n",
"* Experiment with different values for *skipWordNetPos* – which combination seems to give the best results?\n",
"* Can you set a threshold of positive and negative values to improve results?\n",
"* Can you add your own stop-word list to the function signatures and have those words skipped when looking for sentiment?\n",
"\n",
"In the next notebook we're going to look at [Document Similarity](DocumentSimilarity.ipynb)."
]
},
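{
"cell_type": "markdown",
"metadata": {},
"source": [
"For the stop-word task, one possible approach (the function and word list here are just a sketch) is to filter the tokens before they're scored:\n",
"\n",
"```python\n",
"def filter_stopwords(tokens, stopwords):\n",
"    # drop stop words, comparing case-insensitively\n",
"    stopset = set(word.lower() for word in stopwords)\n",
"    return [token for token in tokens if token.lower() not in stopset]\n",
"\n",
"filter_stopwords([\"The\", \"world\", \"is\", \"bent\"], [\"the\", \"is\"])\n",
"```"
]
},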
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"[CC BY-SA](https://creativecommons.org/licenses/by-sa/4.0/) From [The Art of Literary Text Analysis](ArtOfLiteraryTextAnalysis.ipynb) by [Stéfan Sinclair](http://stefansinclair.name) & [Geoffrey Rockwell](http://geoffreyrockwell.com). Edited and revised by [Melissa Mony](http://melissamony.com).<br />\n",
"Created March 9, 2015 and last modified December 9, 2015 (Jupyter 4)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 1
}