{ "cells": [ { "cell_type": "markdown", "id": "136200aa", "metadata": {}, "source": [ "## GGIS 527\n", "### Lab 7 Analyzing Geo-Text Data with Natural Language Processing (NLP) Tools\n", "#### Developed by Zhaonan Wang in Fall 2023\n", "In this lab, you will go through the data wrangling process for two types of geo-text data: text with explicit geo-tags and text with implicit location mentions.\n", "- [**Explicit geo-text dataset**](#explicit): 2,145 businesses located on the Illinois side of St. Louis, derived from the [Yelp Academic Dataset](https://www.yelp.com/dataset). Each data record also comes with a user review in plain text. See the [data format documentation](https://www.yelp.com/dataset/documentation/main) for the business and review data used here. Your task is to perform sentiment analysis on each review and plot the polarity scores on a map.\n", "- [**Implicit geo-text dataset**](#implicit): news reports usually mention various locations, such as countries, states, and even local toponyms (place names). In this notebook, you will work with a toy corpus containing three chunks of online news about dam failure events. Your task is to extract the location mentions buried in the unstructured text." ] }, { "cell_type": "markdown", "id": "f7fabf37", "metadata": {}, "source": [ "\n", "### Explicit Geo-Text Data Analysis" ] }, { "cell_type": "code", "execution_count": 1, "id": "f82f02e2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(2145, 25)\n" ] }, { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>Unnamed: 0</th><th>Unnamed: 0_x</th><th>business_id</th><th>name</th><th>address</th><th>city</th><th>state</th><th>postal_code</th><th>latitude</th><th>longitude</th><th>...</th><th>hours</th><th>Unnamed: 0_y</th><th>review_id</th><th>user_id</th><th>stars_y</th><th>useful</th><th>funny</th><th>cool</th><th>text</th><th>date</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>0</td><td>38</td><td>LcAozWCMLGjwRbokaJAKMg</td><td>Edwardsville Children's Museum</td><td>722 Holyoake Rd</td><td>Edwardsville</td><td>IL</td><td>62025</td><td>38.804395</td><td>-89.949733</td><td>...</td><td>{'Monday': '10:0-15:0', 'Tuesday': '9:30-14:0'...</td><td>313</td><td>LfsU2lVUr1-pC802v0o32A</td><td>mRgAqvxz9jHYpm8ccIjZUQ</td><td>5.0</td><td>0</td><td>0</td><td>0</td><td>Place rocks excellent children's activities an...</td><td>2016-07-04 20:56:17</td></tr>\n",
"    <tr><th>1</th><td>13</td><td>41</td><td>ljxNT9p0y7YMPx0fcNBGig</td><td>Tony's Restaurant &amp; 3rd Street Cafe</td><td>312 Piasa St</td><td>Alton</td><td>IL</td><td>62002</td><td>38.896563</td><td>-90.186203</td><td>...</td><td>{'Monday': '0:0-0:0', 'Tuesday': '16:0-21:30',...</td><td>20</td><td>uiqzlDEsUN_y1awEw_HHDA</td><td>qmQPWMV_YYmwV2DyvmIDYQ</td><td>5.0</td><td>0</td><td>0</td><td>0</td><td>We had been driving around for some time, on a...</td><td>2018-07-17 01:07:49</td></tr>\n",
"    <tr><th>2</th><td>118</td><td>48</td><td>bCBPXIVfVzBZBEpFu29dcg</td><td>All In Shipping</td><td>5343 Belleville Crossing St</td><td>Belleville</td><td>IL</td><td>62226</td><td>38.517586</td><td>-90.021929</td><td>...</td><td>NaN</td><td>1378</td><td>oZqb2LRrJFaEjTz9ETzpPA</td><td>BHrWZS0J0FuJuLqeNk6J7w</td><td>5.0</td><td>0</td><td>0</td><td>0</td><td>I love this little local business. They have e...</td><td>2017-01-20 14:13:47</td></tr>\n",
"    <tr><th>3</th><td>123</td><td>86</td><td>sE6jSnvMts_MAn-b4OkMAw</td><td>K-9 Groom Room</td><td>820 Industrial Dr</td><td>Troy</td><td>IL</td><td>62294</td><td>38.716244</td><td>-89.885830</td><td>...</td><td>{'Monday': '8:0-16:0', 'Tuesday': '8:0-16:0', ...</td><td>194</td><td>UjBwlySBW4iPpFWGOw5Xkw</td><td>SE85OT0FKxeL28izk-5POg</td><td>4.0</td><td>3</td><td>0</td><td>0</td><td>This is another great local business. Our two...</td><td>2011-03-25 17:36:39</td></tr>\n",
"    <tr><th>4</th><td>128</td><td>102</td><td>EuRGgOwJ0g1vTj2R04j37Q</td><td>Crafty Crab</td><td>51 Ludwig Dr</td><td>Fairview Heights</td><td>IL</td><td>62208</td><td>38.601298</td><td>-89.989683</td><td>...</td><td>{'Monday': '12:0-22:0', 'Tuesday': '12:0-22:0'...</td><td>3261</td><td>DrWMCBMRweRydBEk-OLKYg</td><td>h3o-SqWjDeMI2fCJI63-jg</td><td>1.0</td><td>0</td><td>0</td><td>0</td><td>Waiter was absolutely terrible ordered our foo...</td><td>2021-11-06 02:07:15</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>5 rows × 25 columns</p>\n",
"</div>
" ], "text/plain": [ " Unnamed: 0 Unnamed: 0_x business_id \\\n", "0 0 38 LcAozWCMLGjwRbokaJAKMg \n", "1 13 41 ljxNT9p0y7YMPx0fcNBGig \n", "2 118 48 bCBPXIVfVzBZBEpFu29dcg \n", "3 123 86 sE6jSnvMts_MAn-b4OkMAw \n", "4 128 102 EuRGgOwJ0g1vTj2R04j37Q \n", "\n", " name address \\\n", "0 Edwardsville Children's Museum 722 Holyoake Rd \n", "1 Tony's Restaurant & 3rd Street Cafe 312 Piasa St \n", "2 All In Shipping 5343 Belleville Crossing St \n", "3 K-9 Groom Room 820 Industrial Dr \n", "4 Crafty Crab 51 Ludwig Dr \n", "\n", " city state postal_code latitude longitude ... \\\n", "0 Edwardsville IL 62025 38.804395 -89.949733 ... \n", "1 Alton IL 62002 38.896563 -90.186203 ... \n", "2 Belleville IL 62226 38.517586 -90.021929 ... \n", "3 Troy IL 62294 38.716244 -89.885830 ... \n", "4 Fairview Heights IL 62208 38.601298 -89.989683 ... \n", "\n", " hours Unnamed: 0_y \\\n", "0 {'Monday': '10:0-15:0', 'Tuesday': '9:30-14:0'... 313 \n", "1 {'Monday': '0:0-0:0', 'Tuesday': '16:0-21:30',... 20 \n", "2 NaN 1378 \n", "3 {'Monday': '8:0-16:0', 'Tuesday': '8:0-16:0', ... 194 \n", "4 {'Monday': '12:0-22:0', 'Tuesday': '12:0-22:0'... 3261 \n", "\n", " review_id user_id stars_y useful funny cool \\\n", "0 LfsU2lVUr1-pC802v0o32A mRgAqvxz9jHYpm8ccIjZUQ 5.0 0 0 0 \n", "1 uiqzlDEsUN_y1awEw_HHDA qmQPWMV_YYmwV2DyvmIDYQ 5.0 0 0 0 \n", "2 oZqb2LRrJFaEjTz9ETzpPA BHrWZS0J0FuJuLqeNk6J7w 5.0 0 0 0 \n", "3 UjBwlySBW4iPpFWGOw5Xkw SE85OT0FKxeL28izk-5POg 4.0 3 0 0 \n", "4 DrWMCBMRweRydBEk-OLKYg h3o-SqWjDeMI2fCJI63-jg 1.0 0 0 0 \n", "\n", " text date \n", "0 Place rocks excellent children's activities an... 2016-07-04 20:56:17 \n", "1 We had been driving around for some time, on a... 2018-07-17 01:07:49 \n", "2 I love this little local business. They have e... 2017-01-20 14:13:47 \n", "3 This is another great local business. Our two... 2011-03-25 17:36:39 \n", "4 Waiter was absolutely terrible ordered our foo... 
2021-11-06 02:07:15 \n", "\n", "[5 rows x 25 columns]" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "# read the prepared Yelp data\n", "yelp_data = pd.read_csv('./data/yelp_STL_IL.csv')\n", "\n", "# check the data\n", "print(yelp_data.shape)\n", "yelp_data.head()\n", "\n", "# we are mainly interested in the 'text' column and its geotag columns ['latitude', 'longitude']" ] }, { "cell_type": "markdown", "id": "e41f1af6", "metadata": {}, "source": [ "#### Introduction to spaCy and Sentiment Analysis\n", "We will use [spaCy](https://spacy.io/), a free, open-source, and easy-to-use Python library for fundamental NLP tasks such as pre-processing, information extraction, and natural language understanding. Specifically, we will add the [spacytextblob](https://spacy.io/universe/project/spacy-textblob) component, which brings TextBlob's sentiment analysis into a spaCy pipeline. Depending on whether the user likes the reviewed business or not, it returns a sentiment polarity score on a scale from -1 to 1. A negative score indicates dislike and a positive score indicates like; the magnitude reflects how strong the sentiment is." 
] }, { "cell_type": "code", "execution_count": 2, "id": "3e7e7596", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Defaulting to user installation because normal site-packages is not writeable\n", "Requirement already satisfied: pip in /home/jovyan/.local/python3-0.9.0/lib/python3.8/site-packages (23.3.1)\n", "Requirement already satisfied: setuptools in /home/jovyan/.local/python3-0.9.0/lib/python3.8/site-packages (69.0.2)\n", "Requirement already satisfied: wheel in /home/jovyan/.local/python3-0.9.0/lib/python3.8/site-packages (0.42.0)\n", "Defaulting to user installation because normal site-packages is not writeable\n", "Requirement already satisfied: spacy in /home/jovyan/.local/python3-0.9.0/lib/python3.8/site-packages (3.7.2)\n", "Defaulting to user installation because normal site-packages is not writeable\n", "Collecting en-core-web-sm==3.7.1\n", " Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl (12.8 MB)\n", "✔ Download and installation successful\n", "You can now load the package via spacy.load('en_core_web_sm')\n", "Defaulting to user installation because normal site-packages is not writeable\n", "Requirement already satisfied: spacytextblob in /home/jovyan/.local/python3-0.9.0/lib/python3.8/site-packages (4.0.0)\n", "Requirement already satisfied: spacy<4.0,>=3.0 in /home/jovyan/.local/python3-0.9.0/lib/python3.8/site-packages (from spacytextblob) (3.7.2)\n", "Requirement already satisfied: textblob<0.16.0,>=0.15.3 in /home/jovyan/.local/python3-0.9.0/lib/python3.8/site-packages (from spacytextblob) (0.15.3)\n", "[nltk_data] Downloading package brown to /home/jovyan/nltk_data...\n", "[nltk_data] Package brown is already up-to-date!\n", "[nltk_data] Downloading package punkt to /home/jovyan/nltk_data...\n", "[nltk_data] Package punkt is already up-to-date!\n", "[nltk_data] Downloading package wordnet to /home/jovyan/nltk_data...\n", "[nltk_data] Package wordnet is already up-to-date!\n", "[nltk_data] Downloading package averaged_perceptron_tagger to\n", "[nltk_data] /home/jovyan/nltk_data...\n", "[nltk_data] Package averaged_perceptron_tagger is already up-to-\n", "[nltk_data] date!\n", "[nltk_data] Downloading package conll2000 to /home/jovyan/nltk_data...\n", "[nltk_data] Package conll2000 
is already up-to-date!\n", "[nltk_data] Downloading package movie_reviews to\n", "[nltk_data] /home/jovyan/nltk_data...\n", "[nltk_data] Package movie_reviews is already up-to-date!\n", "Finished.\n" ] } ], "source": [ "# install required libraries\n", "# spacy\n", "!pip install -U pip setuptools wheel\n", "!pip install -U spacy\n", "!python -m spacy download en_core_web_sm\n", "\n", "# spacytextblob\n", "!pip install spacytextblob\n", "!python -m textblob.download_corpora" ] }, { "cell_type": "code", "execution_count": 3, "id": "ad682545", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# import required libraries\n", "import spacy\n", "# importing SpacyTextBlob registers the 'spacytextblob' pipeline factory\n", "from spacytextblob.spacytextblob import SpacyTextBlob\n", "\n", "# load the English pipeline and add the sentiment component\n", "nlp = spacy.load('en_core_web_sm')\n", "nlp.add_pipe('spacytextblob')" ] }, { "cell_type": "code", "execution_count": 4, "id": "13b0a809", "metadata": {}, "outputs": [], "source": [ "# define a function to be applied to each row of the pandas dataframe\n", "def sentiment_score(text):\n", "    # run the pipeline and return the polarity score in [-1, 1]\n", "    doc = nlp(text)\n", "    return doc._.blob.polarity" ] }, { "cell_type": "code", "execution_count": 5, "id": "9c16668d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 1min 2s, sys: 0 ns, total: 1min 2s\n", "Wall time: 1min 2s\n" ] } ], "source": [ "%%time\n", "\n", "# apply sentiment analysis to each row\n", "yelp_data['sentiment'] = yelp_data['text'].apply(sentiment_score)\n", "# this takes about 1 minute to run" ] }, { "cell_type": "code", "execution_count": 6, "id": "d81f5297", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-1.0 1.0\n" ] }, { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>Unnamed: 0</th><th>Unnamed: 0_x</th><th>business_id</th><th>name</th><th>address</th><th>city</th><th>state</th><th>postal_code</th><th>latitude</th><th>longitude</th><th>...</th><th>Unnamed: 0_y</th><th>review_id</th><th>user_id</th><th>stars_y</th><th>useful</th><th>funny</th><th>cool</th><th>text</th><th>date</th><th>sentiment</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>0</td><td>38</td><td>LcAozWCMLGjwRbokaJAKMg</td><td>Edwardsville Children's Museum</td><td>722 Holyoake Rd</td><td>Edwardsville</td><td>IL</td><td>62025</td><td>38.804395</td><td>-89.949733</td><td>...</td><td>313</td><td>LfsU2lVUr1-pC802v0o32A</td><td>mRgAqvxz9jHYpm8ccIjZUQ</td><td>5.0</td><td>0</td><td>0</td><td>0</td><td>Place rocks excellent children's activities an...</td><td>2016-07-04 20:56:17</td><td>0.436623</td></tr>\n",
"    <tr><th>1</th><td>13</td><td>41</td><td>ljxNT9p0y7YMPx0fcNBGig</td><td>Tony's Restaurant &amp; 3rd Street Cafe</td><td>312 Piasa St</td><td>Alton</td><td>IL</td><td>62002</td><td>38.896563</td><td>-90.186203</td><td>...</td><td>20</td><td>uiqzlDEsUN_y1awEw_HHDA</td><td>qmQPWMV_YYmwV2DyvmIDYQ</td><td>5.0</td><td>0</td><td>0</td><td>0</td><td>We had been driving around for some time, on a...</td><td>2018-07-17 01:07:49</td><td>0.200250</td></tr>\n",
"    <tr><th>2</th><td>118</td><td>48</td><td>bCBPXIVfVzBZBEpFu29dcg</td><td>All In Shipping</td><td>5343 Belleville Crossing St</td><td>Belleville</td><td>IL</td><td>62226</td><td>38.517586</td><td>-90.021929</td><td>...</td><td>1378</td><td>oZqb2LRrJFaEjTz9ETzpPA</td><td>BHrWZS0J0FuJuLqeNk6J7w</td><td>5.0</td><td>0</td><td>0</td><td>0</td><td>I love this little local business. They have e...</td><td>2017-01-20 14:13:47</td><td>0.266146</td></tr>\n",
"    <tr><th>3</th><td>123</td><td>86</td><td>sE6jSnvMts_MAn-b4OkMAw</td><td>K-9 Groom Room</td><td>820 Industrial Dr</td><td>Troy</td><td>IL</td><td>62294</td><td>38.716244</td><td>-89.885830</td><td>...</td><td>194</td><td>UjBwlySBW4iPpFWGOw5Xkw</td><td>SE85OT0FKxeL28izk-5POg</td><td>4.0</td><td>3</td><td>0</td><td>0</td><td>This is another great local business. Our two...</td><td>2011-03-25 17:36:39</td><td>0.481250</td></tr>\n",
"    <tr><th>4</th><td>128</td><td>102</td><td>EuRGgOwJ0g1vTj2R04j37Q</td><td>Crafty Crab</td><td>51 Ludwig Dr</td><td>Fairview Heights</td><td>IL</td><td>62208</td><td>38.601298</td><td>-89.989683</td><td>...</td><td>3261</td><td>DrWMCBMRweRydBEk-OLKYg</td><td>h3o-SqWjDeMI2fCJI63-jg</td><td>1.0</td><td>0</td><td>0</td><td>0</td><td>Waiter was absolutely terrible ordered our foo...</td><td>2021-11-06 02:07:15</td><td>-0.124603</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>5 rows × 26 columns</p>\n",
"</div>
" ], "text/plain": [ " Unnamed: 0 Unnamed: 0_x business_id \\\n", "0 0 38 LcAozWCMLGjwRbokaJAKMg \n", "1 13 41 ljxNT9p0y7YMPx0fcNBGig \n", "2 118 48 bCBPXIVfVzBZBEpFu29dcg \n", "3 123 86 sE6jSnvMts_MAn-b4OkMAw \n", "4 128 102 EuRGgOwJ0g1vTj2R04j37Q \n", "\n", " name address \\\n", "0 Edwardsville Children's Museum 722 Holyoake Rd \n", "1 Tony's Restaurant & 3rd Street Cafe 312 Piasa St \n", "2 All In Shipping 5343 Belleville Crossing St \n", "3 K-9 Groom Room 820 Industrial Dr \n", "4 Crafty Crab 51 Ludwig Dr \n", "\n", " city state postal_code latitude longitude ... \\\n", "0 Edwardsville IL 62025 38.804395 -89.949733 ... \n", "1 Alton IL 62002 38.896563 -90.186203 ... \n", "2 Belleville IL 62226 38.517586 -90.021929 ... \n", "3 Troy IL 62294 38.716244 -89.885830 ... \n", "4 Fairview Heights IL 62208 38.601298 -89.989683 ... \n", "\n", " Unnamed: 0_y review_id user_id stars_y \\\n", "0 313 LfsU2lVUr1-pC802v0o32A mRgAqvxz9jHYpm8ccIjZUQ 5.0 \n", "1 20 uiqzlDEsUN_y1awEw_HHDA qmQPWMV_YYmwV2DyvmIDYQ 5.0 \n", "2 1378 oZqb2LRrJFaEjTz9ETzpPA BHrWZS0J0FuJuLqeNk6J7w 5.0 \n", "3 194 UjBwlySBW4iPpFWGOw5Xkw SE85OT0FKxeL28izk-5POg 4.0 \n", "4 3261 DrWMCBMRweRydBEk-OLKYg h3o-SqWjDeMI2fCJI63-jg 1.0 \n", "\n", " useful funny cool text \\\n", "0 0 0 0 Place rocks excellent children's activities an... \n", "1 0 0 0 We had been driving around for some time, on a... \n", "2 0 0 0 I love this little local business. They have e... \n", "3 3 0 0 This is another great local business. Our two... \n", "4 0 0 0 Waiter was absolutely terrible ordered our foo... 
\n", "\n", " date sentiment \n", "0 2016-07-04 20:56:17 0.436623 \n", "1 2018-07-17 01:07:49 0.200250 \n", "2 2017-01-20 14:13:47 0.266146 \n", "3 2011-03-25 17:36:39 0.481250 \n", "4 2021-11-06 02:07:15 -0.124603 \n", "\n", "[5 rows x 26 columns]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# check the derived column\n", "print(yelp_data['sentiment'].min(), yelp_data['sentiment'].max())\n", "yelp_data.head()" ] }, { "cell_type": "markdown", "id": "eca30f17", "metadata": {}, "source": [ "#### Visualization of Explicit Geo-Text Data\n", "We will use [folium](https://python-visualization.github.io/folium/latest/), a Python library for building interactive maps with Leaflet.js." ] }, { "cell_type": "code", "execution_count": 7, "id": "f626df91", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Defaulting to user installation because normal site-packages is not writeable\n", "Requirement already satisfied: folium in /cvmfs/cybergis.illinois.edu/software/conda/cybergisx/python3-0.9.0/lib/python3.8/site-packages (0.12.1.post1)\n", "Requirement already satisfied: branca>=0.3.0 in /cvmfs/cybergis.illinois.edu/software/conda/cybergisx/python3-0.9.0/lib/python3.8/site-packages (from folium) (0.4.2)\n", "Requirement already satisfied: jinja2>=2.9 in /cvmfs/cybergis.illinois.edu/software/conda/cybergisx/python3-0.9.0/lib/python3.8/site-packages (from folium) (3.0.3)\n", "Requirement already satisfied: numpy in /cvmfs/cybergis.illinois.edu/software/conda/cybergisx/python3-0.9.0/lib/python3.8/site-packages (from folium) (1.22.0)\n", "Requirement already satisfied: requests in /cvmfs/cybergis.illinois.edu/software/conda/cybergisx/python3-0.9.0/lib/python3.8/site-packages (from folium) (2.27.1)\n", "Requirement already satisfied: MarkupSafe>=2.0 in /cvmfs/cybergis.illinois.edu/software/conda/cybergisx/python3-0.9.0/lib/python3.8/site-packages (from jinja2>=2.9->folium) (2.0.1)\n", "Requirement 
already satisfied: urllib3<1.27,>=1.21.1 in /cvmfs/cybergis.illinois.edu/software/conda/cybergisx/python3-0.9.0/lib/python3.8/site-packages (from requests->folium) (1.25.11)\n", "Requirement already satisfied: certifi>=2017.4.17 in /cvmfs/cybergis.illinois.edu/software/conda/cybergisx/python3-0.9.0/lib/python3.8/site-packages (from requests->folium) (2021.10.8)\n", "Requirement already satisfied: charset-normalizer~=2.0.0 in /cvmfs/cybergis.illinois.edu/software/conda/cybergisx/python3-0.9.0/lib/python3.8/site-packages (from requests->folium) (2.0.10)\n", "Requirement already satisfied: idna<4,>=2.5 in /cvmfs/cybergis.illinois.edu/software/conda/cybergisx/python3-0.9.0/lib/python3.8/site-packages (from requests->folium) (3.3)\n" ] } ], "source": [ "# install folium\n", "!pip install folium\n", "# alternative conda install\n", "# conda install -c conda-forge folium" ] }, { "cell_type": "code", "execution_count": 8, "id": "ffb09705", "metadata": {}, "outputs": [], "source": [ "# import libraries\n", "import folium\n", "import branca.colormap as cm\n", "from branca.element import Figure" ] }, { "cell_type": "code", "execution_count": 9, "id": "a2521fba", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(274, 26)\n" ] } ], "source": [ "# first, filter the dataset to a selected neighborhood\n", "select_neighbor = yelp_data[yelp_data['city']=='Edwardsville']\n", "\n", "print(select_neighbor.shape)\n", "# there are 274 businesses after filtering" ] }, { "cell_type": "code", "execution_count": 10, "id": "52ebb6a5", "metadata": {}, "outputs": [ { "data": { "text/html": [ "-11" ], "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# build a color map to visualize sentiment polarity\n", "rainbow = cm.StepColormap(['purple', 'lightblue', 'lightgreen', 'yellow', 'orange', 'red'], vmin=-1, vmax=1)\n", "rainbow" ] }, { "cell_type": "code", "execution_count": 11, "id": "85478d5e", 
"metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a map instance with a frame\n", "fig = Figure(width=800, height=500)\n", "m = folium.Map(location=[38.8039, -89.9583], zoom_start=11)\n", "fig.add_child(m)\n", "\n", "# iterate over the businesses and add a marker for each onto the basemap\n", "for index, row in select_neighbor.iterrows():\n", "    iframe = folium.IFrame(row['text'])\n", "    folium.Marker([row['latitude'], row['longitude']],\n", "                  popup=folium.Popup(iframe, min_width=300, max_width=300),\n", "                  icon=folium.Icon(color='lightgray', icon_color=rainbow(row['sentiment']))).add_to(m)\n", "\n", "m\n", "# Any observation about the spatial distribution pattern?" ] }, { "cell_type": "markdown", "id": "6de9d699", "metadata": {}, "source": [ "You can experiment by replacing the visualized attribute with another column, e.g., the star rating in `stars_y`, or by filtering to a different neighborhood. You are also welcome to explore other regions for a course project or out of personal interest. Please feel free to reach out to me (znwang@illinois.edu) about data or your cool project." ] }, { "cell_type": "markdown", "id": "f18a78ff", "metadata": {}, "source": [ "\n", "### Implicit Geo-Text Data Analysis\n", "According to [Twitter](https://developer.twitter.com/en/docs/tutorials/advanced-filtering-for-geo-data#:~:text=As%20mentioned%20in%20the%20review,contain%20some%20profile%20location%20information.), while only 1-2% of Tweets are geotagged, 30-40% of Tweets contain some location information. Similarly, [this GIScience'21 paper](https://arxiv.org/pdf/2009.12914.pdf) confirms that over 10% of Tweets contain some location references in their content. 
Thus, it is important to perform text mining to extract this implicit geographic information from unstructured text data.\n", "\n", "Recently, researchers have been using advanced NLP techniques for this task, which can be considered a sub-task of Named Entity Recognition (NER). Rather than all named entities, such as person names and time expressions, we focus mainly on geospatial named entities, such as geopolitical entities and local organizations. We will use [spaCy](https://spacy.io/) again, as a general-purpose NER tool, to recognize geo-entities in text data." ] }, { "cell_type": "code", "execution_count": 12, "id": "6f48906f", "metadata": {}, "outputs": [], "source": [ "# import required libraries\n", "import json\n", "import spacy\n", "from spacy import displacy # visualizer\n", "from collections import defaultdict\n", "from tqdm import tqdm" ] }, { "cell_type": "code", "execution_count": 13, "id": "96f0501d", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|██████████| 3/3 [00:00<00:00, 5236.33it/s]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Length of data_list: 3\n", "Dozens, if not more than a hundred, Midland-area residents gathered to seek refuge within the walls of Midland High School Tuesday night after the Edenville Dam failed to hold back a deluge of water. Midland officials warned residents living near the Tittabawassee River to evacuate. They are concerned the Sanford Dam, located a few miles northwest of the city and downstream of the Edenville Dam, will also fail. Some drove to the school at 1301 Eastlawn Drive to seek shelter. Others were brought in by bus.\n", "Videos and images captured by witnesses show just how much water was unleashed when Michigan's Edenville Dam failed. Officials had been warning nearby residents to evacuate all day Tuesday because of fears the hydroelectric dam holding back Wixom Lake would break. It was announced on Facebook around 6 p.m. 
Tuesday that the dam had failed -- and a torrent of water was rushing down the Tittabawassee River. The water's unrelenting flow continued overnight and daylight on Wednesday showed how little was left of the lake. An aerial image taken by a drone shows the Edenville dam breach on Wednesday.\n", "Soaking rains from the remnants of Hurricane Ida prompted the evacuations of thousands of people Wednesday after water reached dangerous levels at a dam near Johnstown, PA. The storm moved east in the evening, with the National Weather Service confirming at least one tornado and social media posts showing homes blown to rubble and roofs torn from buildings in a southern New Jersey county just outside Philadelphia. Pennsylvania was blanketed with rain after high water drove some from their homes in Maryland and Virginia. The storm killed one person, two people were not accounted for, and a tornado was believed to have touched down along the Chesapeake Bay in Maryland. Ida caused countless school and business closures in Pennsylvania. About 150 roadways maintained by the Pennsylvania Department of Transportation were closed and many smaller roadways also were impassable.\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "# read text corpus and save into a list\n", "data_list = []\n", "with open('./data/news_samples.txt', encoding='utf-8') as f:\n", "    readin = f.readlines()\n", "    for line in tqdm(readin):\n", "        data_list.append(line.strip())\n", "\n", "print(f'Length of data_list: {len(data_list)}')\n", "for text in data_list:\n", "    print(text)" ] }, { "cell_type": "code", "execution_count": 14, "id": "d906fb58", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ " 0%| | 0/3 [00:00
Dozens, if not more than a hundred, \n", "\n", " Midland\n", " GPE\n", "\n", "-area residents gathered to seek refuge within the walls of Midland High School Tuesday night after \n", "\n", " the Edenville Dam\n", " LOC\n", "\n", " failed to hold back a deluge of water. \n", "\n", " Midland\n", " GPE\n", "\n", " officials warned residents living near \n", "\n", " the Tittabawassee River\n", " LOC\n", "\n", " to evacuate. They are concerned \n", "\n", " the Sanford Dam\n", " LOC\n", "\n", ", located a few miles northwest of the city and downstream of \n", "\n", " the Edenville Dam\n", " LOC\n", "\n", ", will also fail. Some drove to the school at 1301 \n", "\n", " Eastlawn Drive\n", " LOC\n", "\n", " to seek shelter. Others were brought in by bus.
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", " Videos\n", " GPE\n", "\n", " and images captured by witnesses show just how much water was unleashed when \n", "\n", " Michigan\n", " GPE\n", "\n", "'s \n", "\n", " Edenville Dam\n", " LOC\n", "\n", " failed. Officials had been warning nearby residents to evacuate all day Tuesday because of fears the hydroelectric dam holding back Wixom Lake would break. It was announced on Facebook around 6 p.m. Tuesday that the dam had failed -- and a torrent of water was rushing down \n", "\n", " the Tittabawassee River\n", " LOC\n", "\n", ". The water's unrelenting flow continued overnight and daylight on Wednesday showed how little was left of the lake. An aerial image taken by a drone shows the Edenville dam breach on Wednesday.
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ " 67%|██████▋ | 2/3 [00:00<00:00, 10.84it/s]" ] }, { "data": { "text/html": [ "
Soaking rains from the remnants of Hurricane Ida prompted the evacuations of thousands of people Wednesday after water reached dangerous levels at a dam near \n", "\n", " Johnstown\n", " GPE\n", "\n", ", \n", "\n", " PA\n", " GPE\n", "\n", ". The storm moved east in the evening, with the National Weather Service confirming at least one tornado and social media posts showing homes blown to rubble and roofs torn from buildings in a southern \n", "\n", " New Jersey\n", " GPE\n", "\n", " county just outside \n", "\n", " Philadelphia\n", " GPE\n", "\n", ". \n", "\n", " Pennsylvania\n", " GPE\n", "\n", " was blanketed with rain after high water drove some from their homes in \n", "\n", " Maryland\n", " GPE\n", "\n", " and \n", "\n", " Virginia\n", " GPE\n", "\n", ". The storm killed one person, two people were not accounted for, and a tornado was believed to have touched down along \n", "\n", " the Chesapeake Bay\n", " LOC\n", "\n", " in \n", "\n", " Maryland\n", " GPE\n", "\n", ". Ida caused countless school and business closures in \n", "\n", " Pennsylvania\n", " GPE\n", "\n", ". About 150 roadways maintained by the Pennsylvania Department of Transportation were closed and many smaller roadways also were impassable.
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "100%|██████████| 3/3 [00:00<00:00, 9.67it/s]\n" ] } ], "source": [ "# load spacy pipeline\n", "nlp = spacy.load('en_core_web_sm')\n", "\n", "# iterate through the news\n", "for i, text in enumerate(tqdm(data_list)):\n", "    doc = nlp(text)\n", "    \n", "    entity_dict = defaultdict(int)\n", "    for entity in doc.ents:\n", "        if entity.label_ in ['LOC', 'GPE']: # LOCation, GeoPolitical Entity (i.e. countries, cities, states)\n", "            entity_dict[entity.label_ + '_' + entity.text] += 1\n", "    \n", "    # visualize NER results\n", "    displacy.render(doc, style='ent', options={\"ents\": ['LOC', 'GPE']}, jupyter=True)\n", "    \n", "    # save recognized entities into json\n", "    with open(f'./data/NER_{i}.txt', 'w') as fout:\n", "        fout.write(json.dumps(entity_dict) + '\\n')" ] }, { "cell_type": "markdown", "id": "ebe1ed93", "metadata": {}, "source": [ "#### Optional: Visualization of Implicit Geo-Text Data (Geocoding Service Required)" ] }, { "cell_type": "code", "execution_count": null, "id": "ea0b074e", "metadata": {}, "outputs": [], "source": [ "# import libraries\n", "import requests\n", "import folium\n", "from branca.element import Figure\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": null, "id": "13e4efaa", "metadata": {}, "outputs": [], "source": [ "# load the NER results of the target news item (index into data_list)\n", "target_news = 1\n", "with open(f'./data/NER_{target_news}.txt') as f:\n", "    ner = f.read()\n", "ner_list = ner.split('\\n')\n", "ner_num = ner_list[0]\n", "ner_js = json.loads(ner_num)\n", "ner_js" ] }, { "cell_type": "code", "execution_count": null, "id": "53f63342", "metadata": {}, "outputs": [], "source": [ "# create one (initially empty) dict per entity class, e.g. 'LOC' and 'GPE'\n", "ner_class = {}\n", "for key in ner_js.keys():\n", "    class_ = key.split('_', 1)[0]\n", "    if class_ not in ner_class.keys():\n", "        ner_class[class_] = {}" ] }, { "cell_type": "code", "execution_count": null, "id": "fcc8c0b2", "metadata": {}, "outputs": [], "source": 
[ "# need Google Maps API key\n", "my_Google_Maps_API_key = 'your_Google_Maps_API_key'\n", "geocode_url = 'https://maps.googleapis.com/maps/api/geocode/json'\n", "for key in ner_js.keys():\n", "    # split on the first underscore only, since a place name may itself contain '_'\n", "    class_, place_name = key.split('_', 1)\n", "    if place_name not in ner_class[class_].keys():\n", "        # pass the query via params so that place names are URL-encoded\n", "        response = requests.get(geocode_url, params={'address': place_name, 'key': my_Google_Maps_API_key})\n", "        results = response.json()['results']\n", "        if results:\n", "            ner_class[class_][place_name] = results[0]['geometry']['location']" ] }, { "cell_type": "code", "execution_count": null, "id": "aba29188", "metadata": {}, "outputs": [], "source": [ "# Create a map instance with a frame\n", "fig = Figure(width=800, height=500)\n", "m = folium.Map(location=[38, -97], tiles=\"cartodbpositron\", zoom_start=6)\n", "fig.add_child(m)\n", "\n", "# LOC (red markers); .get() avoids a KeyError when no LOC entity was found\n", "for key in ner_class.get('LOC', {}):\n", "    lat, lon = ner_class['LOC'][key]['lat'], ner_class['LOC'][key]['lng']\n", "    folium.Marker([lat, lon], popup=key, icon=folium.Icon(color='red')).add_to(m)\n", "# GPE (blue markers)\n", "for key in ner_class.get('GPE', {}):\n", "    lat, lon = ner_class['GPE'][key]['lat'], ner_class['GPE'][key]['lng']\n", "    folium.Marker([lat, lon], popup=key, icon=folium.Icon(color='blue')).add_to(m)\n", "\n", "m" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3-0.9.0", "language": "python", "name": "python3-0.9.0" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.12" } }, "nbformat": 4, "nbformat_minor": 5 }