{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Sentiment\n",
    "\n",
    "Polyglot has polarity lexicons for 136 languages.\n",
    "The scale of the words' polarity consisted of three degrees: +1 for positive words, and -1 for negatives words.\n",
    "Neutral words will have a score of 0."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Languages Coverage"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  1. Turkmen                    2. Thai                       3. Latvian                  \n",
      "  4. Zazaki                     5. Tagalog                    6. Tamil                    \n",
      "  7. Tajik                      8. Telugu                     9. Luxembourgish, Letzeb... \n",
      " 10. Alemannic                 11. Latin                     12. Turkish                  \n",
      " 13. Limburgish, Limburgan...  14. Egyptian Arabic           15. Tatar                    \n",
      " 16. Lithuanian                17. Spanish; Castilian        18. Basque                   \n",
      " 19. Estonian                  20. Asturian                  21. Greek, Modern            \n",
      " 22. Esperanto                 23. English                   24. Ukrainian                \n",
      " 25. Marathi (Marāṭhī)         26. Maltese                   27. Burmese                  \n",
      " 28. Kapampangan               29. Uighur, Uyghur            30. Uzbek                    \n",
      " 31. Malagasy                  32. Yiddish                   33. Macedonian               \n",
      " 34. Urdu                      35. Malayalam                 36. Mongolian                \n",
      " 37. Breton                    38. Bosnian                   39. Bengali                  \n",
      " 40. Tibetan Standard, Tib...  41. Belarusian                42. Bulgarian                \n",
      " 43. Bashkir                   44. Vietnamese                45. Volapük                  \n",
      " 46. Gan Chinese               47. Manx                      48. Gujarati                 \n",
      " 49. Yoruba                    50. Occitan                   51. Scottish Gaelic; Gaelic  \n",
      " 52. Irish                     53. Galician                  54. Ossetian, Ossetic        \n",
      " 55. Oriya                     56. Walloon                   57. Swedish                  \n",
      " 58. Silesian                  59. Lombard language          60. Divehi; Dhivehi; Mald... \n",
      " 61. Danish                    62. German                    63. Armenian                 \n",
      " 64. Haitian; Haitian Creole   65. Hungarian                 66. Croatian                 \n",
      " 67. Bishnupriya Manipuri      68. Hindi                     69. Hebrew (modern)          \n",
      " 70. Portuguese                71. Afrikaans                 72. Pashto, Pushto           \n",
      " 73. Amharic                   74. Aragonese                 75. Bavarian                 \n",
      " 76. Assamese                  77. Panjabi, Punjabi          78. Polish                   \n",
      " 79. Azerbaijani               80. Italian                   81. Arabic                   \n",
      " 82. Icelandic                 83. Ido                       84. Scots                    \n",
      " 85. Sicilian                  86. Indonesian                87. Chinese Word             \n",
      " 88. Interlingua               89. Waray-Waray               90. Piedmontese language     \n",
      " 91. Quechua                   92. French                    93. Dutch                    \n",
      " 94. Norwegian Nynorsk         95. Norwegian                 96. Western Frisian          \n",
      " 97. Upper Sorbian             98. Nepali                    99. Persian                  \n",
      "100. Ilokano                  101. Finnish                  102. Faroese                  \n",
      "103. Romansh                  104. Javanese                 105. Romanian, Moldavian, ... \n",
      "106. Malay                    107. Japanese                 108. Russian                  \n",
      "109. Catalan; Valencian       110. Fiji Hindi               111. Chinese                  \n",
      "112. Cebuano                  113. Czech                    114. Chuvash                  \n",
      "115. Welsh                    116. West Flemish             117. Kirghiz, Kyrgyz          \n",
      "118. Kurdish                  119. Kazakh                   120. Korean                   \n",
      "121. Kannada                  122. Khmer                    123. Georgian                 \n",
      "124. Sakha                    125. Serbian                  126. Albanian                 \n",
      "127. Swahili                  128. Chechen                  129. Sundanese                \n",
      "130. Sanskrit (Saṁskṛta)      131. Venetian                 132. Northern Sami            \n",
      "133. Slovak                   134. Sinhala, Sinhalese       135. Bosnian-Croatian-Serbian \n",
      "136. Slovene                  \n"
     ]
    }
   ],
   "source": [
    "from polyglot.downloader import downloader\n",
    "print(downloader.supported_languages_table(\"sentiment2\", 3))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from polyglot.text import Text"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Polarity\n",
    "\n",
    "To inquiry the polarity of a word, we can just call its own attribute `polarity`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "text = Text(\"The movie was really good.\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Word            Polarity\n",
      "------------------------------\n",
      "The              0\n",
      "movie            0\n",
      "was              0\n",
      "really           0\n",
      "good             1\n",
      ".                0\n"
     ]
    }
   ],
   "source": [
    "print(\"{:<16}{}\".format(\"Word\", \"Polarity\")+\"\\n\"+\"-\"*30)\n",
    "for w in text.words:\n",
    "    print(\"{:<16}{:>2}\".format(w, w.polarity))"
   ]
  },
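  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The lookup behind `polarity` can be pictured as a dictionary query. The following cell is a minimal sketch under stated assumptions, not polyglot's implementation: `TOY_LEXICON` and `toy_polarity` are hypothetical names, and the four-word lexicon is invented for illustration. It shows only how a three-valued scale assigns +1 to listed positive words, -1 to listed negative words, and 0 to everything else."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Hypothetical toy lexicon, invented for illustration only.\n",
    "TOY_LEXICON = {\"good\": 1, \"fantastic\": 1, \"bad\": -1, \"terrible\": -1}\n",
    "\n",
    "def toy_polarity(word, lexicon=TOY_LEXICON):\n",
    "    # Words missing from the lexicon are treated as neutral (0).\n",
    "    return lexicon.get(word.lower(), 0)\n",
    "\n",
    "for w in [\"The\", \"movie\", \"was\", \"really\", \"good\", \".\"]:\n",
    "    print(\"{:<16}{:>2}\".format(w, toy_polarity(w)))"
   ]
  },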
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Entity Sentiment\n",
    "\n",
    "We can calculate a more sphosticated sentiment score for an entity that is mentioned in text as the following:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "blob = (\"Barack Obama gave a fantastic speech last night. \"\n",
    "        \"Reports indicate he will move next to New Hampshire.\")\n",
    "text = Text(blob)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, we need split the text into sentneces, this will limit the words tha affect the sentiment of an entity to the words mentioned in the sentnece."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The movie was really good.\n"
     ]
    }
   ],
   "source": [
    "first_sentence = text.sentences[0]\n",
    "print(first_sentence)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Second, we extract the entities"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[u'Obama']\n"
     ]
    }
   ],
   "source": [
    "first_entity = first_sentence.entities[0]\n",
    "print(first_entity)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, for each entity we identified, we can calculate the strength of the positive or negative sentiment it has on a scale from 0-1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.9375"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "first_entity.positive_sentiment"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "first_entity.negative_sentiment"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Citation\n",
    "\n",
    "This work is a direct implementation of the research being described in the [Building sentiment lexicons for all major languages](http://aclweb.org/anthology/P14-2063) paper.\n",
    "The author of this library strongly encourage you to cite the following paper if you are using this software."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "```\n",
    "   @inproceedings{chen2014building,\n",
    "   title={Building sentiment lexicons for all major languages},\n",
    "   author={Chen, Yanqing and Skiena, Steven},\n",
    "   booktitle={Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Short Papers)},\n",
    "   pages={383--389},\n",
    "   year={2014}}\n",
    "```"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}