{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "text_wordcloud1.ipynb",
"provenance": [],
"collapsed_sections": [],
"authorship_tag": "ABX9TyMJ4iddkEBEELv5/Slk2oxQ",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"source": [
"### Purpose: demonstrate how to visualize a text as a word cloud using the `wordcloud` package"
],
"metadata": {
"id": "s41p5q3OBj3V"
}
},
{
"cell_type": "markdown",
"source": [
"### Source: the text of \"Romeo and Juliet\" used in the book \"*Automate the Boring Stuff with Python*\" by Al Sweigart, 2nd edition (2020)\n",
"#### https://automatetheboringstuff.com/2e/chapter12/#:~:text=The%20iter_content()%20method,be%20on%20your%20computer"
],
"metadata": {
"id": "iSSgmSoK9CS6"
}
},
{
"cell_type": "code",
"source": [
"!pip install wordcloud"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "3ROMUKnI5lw7",
"outputId": "82a6a601-5145-4d4c-d334-449336129452"
},
"execution_count": 1,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
"Requirement already satisfied: wordcloud in /usr/local/lib/python3.7/dist-packages (1.5.0)\n",
"Requirement already satisfied: pillow in /usr/local/lib/python3.7/dist-packages (from wordcloud) (7.1.2)\n",
"Requirement already satisfied: numpy>=1.6.1 in /usr/local/lib/python3.7/dist-packages (from wordcloud) (1.21.6)\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"!pip install --user requests"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "xqsLAqZX5fCe",
"outputId": "0829b4fc-185e-4359-d77f-f4562b299a80"
},
"execution_count": 2,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
"Requirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (2.23.0)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests) (2022.6.15)\n",
"Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests) (3.0.4)\n",
"Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests) (2.10)\n",
"Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests) (1.24.3)\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"##### `Requests` lets you send HTTP/1.1 requests. You can pass headers, form data, multi-part files, and query parameters as plain Python dictionaries, and read the response data just as easily."
],
"metadata": {
"id": "djdXBikXd8Ec"
}
},
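{
"cell_type": "markdown",
"source": [
"A minimal sketch of the dictionary-based interface described above. The URL, query parameters, and header value are hypothetical placeholders and are not part of this notebook's workflow. The request is only prepared, never sent, so the encoded URL can be inspected without network access.\n",
"\n",
"```python\n",
"import requests\n",
"\n",
"# Hypothetical values, for illustration only\n",
"params = {'q': 'romeo', 'page': 1}\n",
"headers = {'User-Agent': 'wordcloud-demo'}\n",
"\n",
"# Build the request without sending it\n",
"req = requests.Request('GET', 'https://example.com/search', params=params, headers=headers)\n",
"prepared = req.prepare()\n",
"\n",
"print(prepared.url)      # the params dictionary is encoded into the query string\n",
"print(prepared.headers)  # the headers dictionary is attached to the request\n",
"```\n",
"\n",
"To actually send such a request, the same dictionaries can be passed to `requests.get(url, params=params, headers=headers)`."
],
"metadata": {}
},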
{
"cell_type": "code",
"source": [
"import requests  # used to download files from a URL"
],
"metadata": {
"id": "LKS6qznKiSqB"
},
"execution_count": 3,
"outputs": []
},
{
"cell_type": "code",
"source": [
"res = requests.get('https://automatetheboringstuff.com/files/rj.txt')\n",
"type(res)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "e4JRRdMG5fAB",
"outputId": "8bcdc848-e985-45c7-ae75-863d6b1aa809"
},
"execution_count": 4,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"requests.models.Response"
]
},
"metadata": {},
"execution_count": 4
}
]
},
{
"cell_type": "code",
"source": [
"# Check that the request succeeded (status code 200)\n",
"\n",
"res.status_code == requests.codes.ok"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "0NdzCWb9-FRN",
"outputId": "58aee383-2a33-4db4-fde6-345f1629bc74"
},
"execution_count": 5,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"True"
]
},
"metadata": {},
"execution_count": 5
}
]
},
{
"cell_type": "code",
"source": [
"len(res.text)  # the text is 178,000+ characters long"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Ioltq8gk5e9p",
"outputId": "ef51e9af-1a27-4a37-b9fb-0a0af86576af"
},
"execution_count": 6,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"178978"
]
},
"metadata": {},
"execution_count": 6
}
]
},
{
"cell_type": "code",
"source": [
"print(res.text[:0])"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "g_Hxco4N8kU3",
"outputId": "6baf30c4-97f1-4112-a40b-2f840fa21952"
},
"execution_count": 7,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"print(res.text[:1])"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "DKFwLU0A8mJg",
"outputId": "58b2bb25-492e-4830-b344-efb941099b03"
},
"execution_count": 8,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"T\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"print(res.text[0])"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "n5watRlR8mGh",
"outputId": "6f5c6ad5-bd33-49be-d8df-692c69407962"
},
"execution_count": 9,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"T\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"print(res.text[:2])"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "ba82fkxm8mDc",
"outputId": "6f394405-2a96-443b-e157-2058a84a8dea"
},
"execution_count": 10,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Th\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"print(res.text[:3])"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "zxxRFrkw9Gty",
"outputId": "5b8eb123-d321-4e20-aa1e-cd2728899a3b"
},
"execution_count": 11,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"The\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"print(res.text[:100])"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "g-L7gC0a8mAk",
"outputId": "6d20ca80-6b35-417c-b78e-5439d169b83b"
},
"execution_count": 12,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"The Project Gutenberg EBook of Romeo and Juliet, by William Shakespeare\r\n",
"\r\n",
"This eBook is for the use\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"print(res.text[:500])"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "pha6lrV25e7W",
"outputId": "87a509bf-4eab-4eb6-aee9-46e5fa02d568"
},
"execution_count": 13,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"The Project Gutenberg EBook of Romeo and Juliet, by William Shakespeare\r\n",
"\r\n",
"This eBook is for the use of anyone anywhere at no cost and with\r\n",
"almost no restrictions whatsoever. You may copy it, give it away or\r\n",
"re-use it under the terms of the Project Gutenberg License included\r\n",
"with this eBook or online at www.gutenberg.org/license\r\n",
"\r\n",
"\r\n",
"Title: Romeo and Juliet\r\n",
"\r\n",
"Author: William Shakespeare\r\n",
"\r\n",
"Posting Date: May 25, 2012 [EBook #1112]\r\n",
"Release Date: November, 1997 [Etext #1112]\r\n",
"\r\n",
"Language: Eng\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"# Check for errors: raise_for_status() raises requests.exceptions.HTTPError\n",
"# if the response has a 4xx or 5xx status code, and returns None otherwise.\n",
"res.raise_for_status()"
],
"metadata": {
"id": "eEpO-4Lz55I_"
},
"execution_count": 14,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Open in 'wb' (write binary) mode: the downloaded bytes are written as-is,\n",
"# which preserves the original encoding of the text (see Chapter 12 of the book).\n",
"playFile = open('RomeoAndJuliet.txt', 'wb')"
],
"metadata": {
"id": "0nQKip8055Gg"
},
"execution_count": 15,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# https://www.geeksforgeeks.org/response-iter_content-python-requests/\n",
"\n",
"# Write the response body to the file in chunks of up to 200,000 bytes\n",
"for chunk in res.iter_content(200000):\n",
"    playFile.write(chunk)"
],
"metadata": {
"id": "zWVqF85zAZoV"
},
"execution_count": 16,
"outputs": []
},
{
"cell_type": "code",
"source": [
"playFile.close()"
],
"metadata": {
"id": "6j5i3RpiAZg_"
},
"execution_count": 17,
"outputs": []
},
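{
"cell_type": "code",
"source": [
"# Read the file back in text mode. Text mode translates each '\\r\\n' line ending\n",
"# to '\\n', so the resulting string is shorter than res.text.\n",
"with open(\"RomeoAndJuliet.txt\", 'r') as fh:\n",
"    filedata = fh.read()\n",
"\n",
"print(type(filedata))\n",
"print(len(filedata))\n",
"print('---------------')\n",
"print(filedata[:500])\n"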
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "hz4LNlJjf4b7",
"outputId": "268c7a68-fc84-4f47-d927-18978d2d831a"
},
"execution_count": 18,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"<class 'str'>\n",
"174126\n",
"---------------\n",
"The Project Gutenberg EBook of Romeo and Juliet, by William Shakespeare\n",
"\n",
"This eBook is for the use of anyone anywhere at no cost and with\n",
"almost no restrictions whatsoever. You may copy it, give it away or\n",
"re-use it under the terms of the Project Gutenberg License included\n",
"with this eBook or online at www.gutenberg.org/license\n",
"\n",
"\n",
"Title: Romeo and Juliet\n",
"\n",
"Author: William Shakespeare\n",
"\n",
"Posting Date: May 25, 2012 [EBook #1112]\n",
"Release Date: November, 1997 [Etext #1112]\n",
"\n",
"Language: English\n",
"\n",
"\n",
"*** STAR\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"# Library that builds the word cloud\n",
"from wordcloud import WordCloud, STOPWORDS\n",
"\n",
"# Library that plots the word cloud\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# Common English words to leave out of the cloud\n",
"stopwords = set(STOPWORDS)\n",
"\n",
"# Generate the word cloud from the text, keeping at most 100 words\n",
"wordcloud = WordCloud(stopwords=stopwords, max_words=100).generate(filedata)\n",
"\n",
"# Plot the word cloud\n",
"plt.figure(figsize=(10, 10))\n",
"plt.imshow(wordcloud)\n",
"\n",
"# Hide the axes\n",
"plt.axis(\"off\")\n",
"plt.show()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 310
},
"id": "WTwY7QaM1dqj",
"outputId": "5702d08e-efec-44e8-ddb6-d2a8c87798bb"
},
"execution_count": 19,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"