{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Sentiment\n", "\n", "Polyglot has polarity lexicons for 136 languages.\n", "The scale of the words' polarity consisted of three degrees: +1 for positive words, and -1 for negatives words.\n", "Neutral words will have a score of 0." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Languages Coverage" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 1. Turkmen 2. Thai 3. Latvian \n", " 4. Zazaki 5. Tagalog 6. Tamil \n", " 7. Tajik 8. Telugu 9. Luxembourgish, Letzeb... \n", " 10. Alemannic 11. Latin 12. Turkish \n", " 13. Limburgish, Limburgan... 14. Egyptian Arabic 15. Tatar \n", " 16. Lithuanian 17. Spanish; Castilian 18. Basque \n", " 19. Estonian 20. Asturian 21. Greek, Modern \n", " 22. Esperanto 23. English 24. Ukrainian \n", " 25. Marathi (Marāṭhī) 26. Maltese 27. Burmese \n", " 28. Kapampangan 29. Uighur, Uyghur 30. Uzbek \n", " 31. Malagasy 32. Yiddish 33. Macedonian \n", " 34. Urdu 35. Malayalam 36. Mongolian \n", " 37. Breton 38. Bosnian 39. Bengali \n", " 40. Tibetan Standard, Tib... 41. Belarusian 42. Bulgarian \n", " 43. Bashkir 44. Vietnamese 45. Volapük \n", " 46. Gan Chinese 47. Manx 48. Gujarati \n", " 49. Yoruba 50. Occitan 51. Scottish Gaelic; Gaelic \n", " 52. Irish 53. Galician 54. Ossetian, Ossetic \n", " 55. Oriya 56. Walloon 57. Swedish \n", " 58. Silesian 59. Lombard language 60. Divehi; Dhivehi; Mald... \n", " 61. Danish 62. German 63. Armenian \n", " 64. Haitian; Haitian Creole 65. Hungarian 66. Croatian \n", " 67. Bishnupriya Manipuri 68. Hindi 69. Hebrew (modern) \n", " 70. Portuguese 71. Afrikaans 72. Pashto, Pushto \n", " 73. Amharic 74. Aragonese 75. Bavarian \n", " 76. Assamese 77. Panjabi, Punjabi 78. Polish \n", " 79. Azerbaijani 80. Italian 81. Arabic \n", " 82. Icelandic 83. Ido 84. Scots \n", " 85. Sicilian 86. Indonesian 87. Chinese Word \n", " 88. Interlingua 89. Waray-Waray 90. Piedmontese language \n", " 91. Quechua 92. French 93. Dutch \n", " 94. Norwegian Nynorsk 95. Norwegian 96. Western Frisian \n", " 97. Upper Sorbian 98. Nepali 99. Persian \n", "100. Ilokano 101. Finnish 102. Faroese \n", "103. Romansh 104. Javanese 105. Romanian, Moldavian, ... \n", "106. Malay 107. Japanese 108. Russian \n", "109. Catalan; Valencian 110. Fiji Hindi 111. Chinese \n", "112. Cebuano 113. Czech 114. Chuvash \n", "115. Welsh 116. West Flemish 117. Kirghiz, Kyrgyz \n", "118. Kurdish 119. Kazakh 120. Korean \n", "121. Kannada 122. Khmer 123. Georgian \n", "124. Sakha 125. Serbian 126. Albanian \n", "127. Swahili 128. Chechen 129. Sundanese \n", "130. Sanskrit (Saṁskṛta) 131. Venetian 132. Northern Sami \n", "133. Slovak 134. Sinhala, Sinhalese 135. Bosnian-Croatian-Serbian \n", "136. Slovene \n" ] } ], "source": [ "from polyglot.downloader import downloader\n", "print(downloader.supported_languages_table(\"sentiment2\", 3))" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from polyglot.text import Text" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Polarity\n", "\n", "To inquiry the polarity of a word, we can just call its own attribute `polarity`" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "text = Text(\"The movie was really good.\")" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Word Polarity\n", "------------------------------\n", "The 0\n", "movie 0\n", "was 0\n", "really 0\n", "good 1\n", ". 0\n" ] } ], "source": [ "print(\"{:<16}{}\".format(\"Word\", \"Polarity\")+\"\\n\"+\"-\"*30)\n", "for w in text.words:\n", " print(\"{:<16}{:>2}\".format(w, w.polarity))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Entity Sentiment\n", "\n", "We can calculate a more sphosticated sentiment score for an entity that is mentioned in text as the following:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [], "source": [ "blob = (\"Barack Obama gave a fantastic speech last night. \"\n", " \"Reports indicate he will move next to New Hampshire.\")\n", "text = Text(blob)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, we need split the text into sentneces, this will limit the words tha affect the sentiment of an entity to the words mentioned in the sentnece." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The movie was really good.\n" ] } ], "source": [ "first_sentence = text.sentences[0]\n", "print(first_sentence)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Second, we extract the entities" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[u'Obama']\n" ] } ], "source": [ "first_entity = first_sentence.entities[0]\n", "print(first_entity)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, for each entity we identified, we can calculate the strength of the positive or negative sentiment it has on a scale from 0-1" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0.9375" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "first_entity.positive_sentiment" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "first_entity.negative_sentiment" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Citation\n", "\n", "This work is a direct implementation of the research being described in the [Building sentiment lexicons for all major languages](http://aclweb.org/anthology/P14-2063) paper.\n", "The author of this library strongly encourage you to cite the following paper if you are using this software." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "```\n", " @inproceedings{chen2014building,\n", " title={Building sentiment lexicons for all major languages},\n", " author={Chen, Yanqing and Skiena, Steven},\n", " booktitle={Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Short Papers)},\n", " pages={383--389},\n", " year={2014}}\n", "```" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 0 }