{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Simple Sentiment Analysis\n", "\n", "This notebook shows how to analyze the sentiment of short passages such as tweets.\n", "\n", "This is based on Neal Caren's [An introduction to text analysis with Python, Part 1](http://nealcaren.web.unc.edu/an-introduction-to-text-analysis-with-python-part-1/).\n", "\n", "In this notebook we analyze a single tweet." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Setting up our data\n", "\n", "Here we define a sample tweet and the lists of positive and negative words that serve as our dictionaries." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "theTweet = \"No food is good food. Ha. I'm on a diet and the food is awful and lame.\"\n", "positive_words = ['awesome', 'good', 'nice', 'super', 'fun', 'delightful']\n", "negative_words = ['awful', 'lame', 'horrible', 'bad']" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "list" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(positive_words)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Tokenizing the text\n", "\n", "Now we will tokenize the text: we lowercase the tweet and use a regular expression to extract its word tokens." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['no', 'food', 'is', 'good', 'food', 'ha', 'i', 'm', 'on', 'a']\n" ] } ], "source": [ "import re\n", "theTokens = re.findall(r'\\b\\w[\\w-]*\\b', theTweet.lower())\n", "print(theTokens[:10])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Calculating positive words\n", "\n", "Now we will count the number of positive words by checking each token against the list of positive words."
] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1\n" ] } ], "source": [ "numPosWords = 0\n", "for word in theTokens:\n", "    if word in positive_words:\n", "        numPosWords += 1\n", "print(numPosWords)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Calculating negative words\n", "\n", "Now we will count the number of negative words in the same way." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2\n" ] } ], "source": [ "numNegWords = 0\n", "for word in theTokens:\n", "    if word in negative_words:\n", "        numNegWords += 1\n", "print(numNegWords)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "v1 = \"0\"\n", "v2 = 0\n", "v3 = str(v2)\n", "v1 == v3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Calculating percentages\n", "\n", "Now we calculate the percentage of tokens that are positive and the percentage that are negative." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Positive: 6% Negative: 11%\n" ] } ], "source": [ "numWords = len(theTokens)\n", "percntPos = numPosWords / numWords\n", "percntNeg = numNegWords / numWords\n", "print(\"Positive: \" + \"{:.0%}\".format(percntPos) + \" Negative: \" + \"{:.0%}\".format(percntNeg))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Deciding if it is positive or negative\n", "\n", "We are going to assume that a simple majority decides whether the tweet is positive or negative: whichever count is larger wins, and a tie counts as neither."
] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Negative 1:2\n", "\n" ] } ], "source": [ "if numPosWords > numNegWords:\n", "    print(\"Positive \" + str(numPosWords) + \":\" + str(numNegWords))\n", "elif numNegWords > numPosWords:\n", "    print(\"Negative \" + str(numPosWords) + \":\" + str(numNegWords))\n", "else:\n", "    print(\"Neither \" + str(numPosWords) + \":\" + str(numNegWords))\n", "\n", "print()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Next Steps" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's try another utility example, this time looking at a more [Complex Sentiment Analysis](ComplexSentimentAnalysis.ipynb)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "[CC BY-SA](https://creativecommons.org/licenses/by-sa/4.0/) From [The Art of Literary Text Analysis](../ArtOfLiteraryTextAnalysis.ipynb) by [Stéfan Sinclair](http://stefansinclair.name) & [Geoffrey Rockwell](http://geoffreyrockwell.com). Edited and revised by [Melissa Mony](http://melissamony.com).\n",
"Created August 8, 2014 (Jupyter 4.2.1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.3" } }, "nbformat": 4, "nbformat_minor": 1 }