{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Integrating CyberGIS and Urban Sensing for Reproducible Streaming Analytics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This Jupyter notebook demostrates the book chapter named Integrating CyberGIS and Urban Sensing for Reproducible Streaming Analytics, including showing the locations for AoT sensors and the temperature curve for AoT nodes in Chicago.\n", "\n", "We are using Chicago, IL, US as our study area. And this notebook uses geospatial libraries to show the deployment of AoT Nodes, spatial distrinution for AoT nodes in Chicago, and Temperature trends in Chiacago for one week." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Notebook Outline\n", "- [Data preparing](#Data)\n", " - [Setup](#setup)\n", " - [AoT Data](#AOT)\n", " - [Boundary and Street data](#SHP)\n", "- [Exploratory Analysis](#explore)\n", " - [Deployment of AoT Nodes](#deploy)\n", " - [Spatial distrinution for AoT nodes in Chicago](#spatial)\n", " - [Temperature trends in Chiacago for one week](#statistical)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Data Preparation\n", "\n", "The first part is a demostration that shows user how to prepare AoT Data,boundary data, and street data in Chicago." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### Set up the environment by importing libraries\n", "Import numpy, pandas, geopandas, shapely and other libraries available in CyberGIS-Jupyter to set up an environment to store and manipulate the AoT data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pathlib\n", "import os\n", "import tarfile\n", "\n", "#show Chicago GTFS data\n", "import requests\n", "import shutil\n", "import zipfile\n", "\n", "# using pandas to \n", "import pandas as pd\n", "\n", "# Plotting the deployment of AoT nodes in Chicago\n", "import matplotlib.pyplot as plt\n", "import datetime\n", "%matplotlib inline\n", "\n", "import numpy as np\n", "import geopandas as gpd\n", "from shapely.geometry import Point\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### AoT data for Chicago\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Download the AoT data for Chicago in one week (between Sep 30th to Oct 6th in 2019.)\n", "\n", "The AoT data was downloaded from https://api.arrayofthings.org/.\n", "The boundary data and street data were generated from https://www.openstreetmap.org/.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "file = pathlib.Path(\"chicago-complete.weekly.2019-09-30-to-2019-10-06.tar\")\n", "if file.exists ():\n", " print (\"AoT data exist\")\n", "else:\n", " print (\"AoT data not exist, Downloading the AoT data...\")\n", " #!wget https://s3.amazonaws.com/aot-tarballs/chicago-complete.weekly.2019-09-30-to-2019-10-06.tar\n", " !wget https://www.hydroshare.org/resource/f184c03de5df442aa5a2ec01fa3252f7/data/contents/chicago-complete.weekly.2019-09-30-to-2019-10-06.tar\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create data directory if the directory not exist" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if not os.path.exists('./data'):\n", " print (\"Directory not exist, create directory\")\n", " os.mkdir('./data')\n", "else:\n", " print (\"Directory exist\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Extract the weekly chicago AoT data into data directory" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "file_tar = tarfile.open('chicago-complete.weekly.2019-09-30-to-2019-10-06.tar')\n", "#specify which folder to extract\n", "file_tar.extractall('./data') \n", "file_tar.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Define the file path for the data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fn = './data/chicago-complete.weekly.2019-09-30-to-2019-10-06/data.csv.gz'\n", "#print(os.path.isfile(fn))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Define the data directory" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# data path\n", "AoT_Data_Directory = './data/chicago-complete.weekly.2019-09-30-to-2019-10-06/'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### Boundary and Street data for Chicago\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Download the boundary and street data for Chicago" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "shpfile = pathlib.Path(\"shp.zip\")\n", "if shpfile.exists ():\n", " print (\"Shp Data exist\")\n", "else:\n", " print (\"Shp Data not exist, Downloading the data...\")\n", " !wget https://www.hydroshare.org/resource/f184c03de5df442aa5a2ec01fa3252f7/data/contents/shp.zip\n", "\n", "with zipfile.ZipFile('shp.zip', 'r') as file:\n", " file.extractall('./data')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "read sensor information from nodes.csv where values are seperated by \",\" and display the first 15 rows of data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nodes = pd.read_csv(AoT_Data_Directory + 'nodes.csv', sep=\",\")\n", "\n", "# the first 15 rows of data is displayed below\n", "nodes.head(15)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Show the statistical number of the AoT Sensors" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nodes.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "## Exploratory Analysis\n", "\n", "The part is a demostration that shows Deployment of AoT Nodes, spatial distrinution for AoT nodes in Chicago, and Temperature trends in Chiacago for one week." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### Deployment of AoT Nodes in Chicago" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Show the Deployment of AoT Nodes of Chicago in each month" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Convert that column into a datetime datatype\n", "nodes['start_timestamp'] = pd.to_datetime(nodes['start_timestamp'])\n", "# Set the datetime column as the index\n", "nodes.index = nodes['start_timestamp'] \n", "\n", "# Count up the number of nodes deployed in each month\n", "nodecount = nodes['node_id'].resample('M').count()\n", "\n", "plt.style.use('seaborn-darkgrid')\n", "\n", "#plot data as a bar chart\n", "fig, ax = plt.subplots()\n", "nodecount.plot(kind='bar', ax = ax, figsize=[20,9], color=(0.2, 0.4, 0.6, 0.6))\n", "\n", "f = lambda x: datetime.datetime.strptime(x, '%Y-%m-%d %H:%M:%S').strftime('%b %Y')\n", "ax.set_xticklabels([ f(x.get_text()) for x in ax.get_xticklabels()])\n", "plt.xticks(fontsize = 20, rotation=60)\n", "plt.yticks(fontsize = 20)\n", "\n", "# Set title and labels\n", "ax.set_title('Deployment of AoT Nodes in Chicago',fontsize = 25, fontdict = {'verticalalignment':'bottom'})\n", "ax.set_xlabel('AoT Node Date',fontsize = 25, labelpad=25)\n", "ax.set_ylabel('Number of Nodes',fontsize = 25, labelpad=25)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Show the Deployment of AoT Nodes in Chicago using cumulative value" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nodes['start_timestamp'] = pd.to_datetime(nodes['start_timestamp'])\n", "nodes.index = nodes['start_timestamp'] \n", "\n", "nodecount = nodes['node_id'].resample('M').count()\n", "\n", "plt.style.use('seaborn-darkgrid')\n", "\n", "fig, ax = plt.subplots()\n", "nodecount.sort_index().cumsum().plot(ax = ax, figsize=[20,9], color=(0.2, 0.4, 0.6, 0.6), linewidth=4)\n", "\n", "plt.tick_params(axis='both', which='both', labelsize=20)\n", "\n", "ax.set_title('Deployment of AoT Nodes in Chicago',fontsize = 25, fontdict = {'verticalalignment':'bottom'})\n", "ax.set_xlabel('AoT Node Date',fontsize = 25, labelpad=25)\n", "ax.set_ylabel('Number of Nodes',fontsize = 25, labelpad=25)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### Spatial distrinution for AoT nodes in Chicago" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Setting global variables for shapefiles for Chicago street and region" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#\n", "Chicago_Streets_Shapefiles = './data/shp/street-chicago.shp'\n", "Chicago_Boundary_Shapefile = './data/shp/il-chicago.shp'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Show the sensor data in Chicago, red nodes are active and blue nodes are no longer reporting data " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "node_locations = pd.DataFrame()\n", "# Pull the longitude and latitude information from the \"nodes\" dataframe\n", "# Use zip to create set of lon, lat tuples then put into a list\n", "node_locations['Coordinates'] = list(zip(nodes.lon, nodes.lat))\n", "\n", "node_locations['Coordinates'] = node_locations['Coordinates'].apply(Point)\n", "\n", "node_locationsDF = gpd.GeoDataFrame(node_locations, geometry='Coordinates')\n", "\n", "# create a new column in the nodes dataframe called status, if the end timestamp is NaN then r is placed in \n", "# the column otherwise a b is placed in column - this translates to red nodes are active and blue nodes\n", "# are no longer reporting data\n", "nodes['status'] = np.where(nodes['end_timestamp'].isnull(), 'r', 'b')\n", "\n", "plt.style.use('default')\n", "\n", "# This reads in a shapefile\n", "streetmap = gpd.read_file(Chicago_Streets_Shapefiles)\n", "\n", "f, ax = plt.subplots(1, figsize=(20, 20))\n", "\n", "# Plots the Chicago streets\n", "streetmap.plot(ax=ax, color='silver', zorder = 0)\n", "\n", "boundary = gpd.read_file(Chicago_Boundary_Shapefile)\n", "# Plots the boundary of Chicago \n", "#boundary.plot(ax = ax, alpha = 0.1, linewidth = 1.0, edgecolor = 'red', zorder = 5)\n", "boundary.plot(ax=ax, color='white', alpha = 0.4, linewidth=5.5, edgecolor='red', zorder = 5)\n", "\n", "# Plot locations on the map as a new layer - zorder ensures the nodes are on top and\n", "for node_status, node in nodes.groupby('status'):\n", " ax.plot(node['lon'], node['lat'], marker='o', linestyle='', ms=5, label=node_status, zorder = 10)\n", "\n", "ax.legend(['Not Active','Active'],fontsize=20)\n", "\n", "ax.set_axis_off()\n", "ax.set_title('Spatial distribution for AoT nodes in Chicago', fontsize = 25, fontdict = {'verticalalignment':'bottom'})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### Temperature trends in Chiacago for one week" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Load one week data (about 50 seconds) and show the first 10 rows of data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#load full dataset (time the loading via %time)\n", "#about 50 seconds\n", "%time weekdata = pd.read_csv(fn, compression='gzip')\n", "#%time weekdata = pd.read_csv(AoT_Data_Directory + 'data.csv',sep=\",\")\n", "weekdata.head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Converts the timestamp type to datetime and convert the temperature values into floats" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%time temperature = weekdata[weekdata['parameter'] == 'temperature'].copy()\n", "# Convert that column into a datetime datatype\n", "temperature['timestamp'] = pd.to_datetime(temperature['timestamp'])\n", "\n", "# Converts the timestamp type to datetime\n", "temperature.timestamp = pd.to_datetime(temperature.timestamp)\n", "\n", "# Convert the temperature values into floats so they can be plotted\n", "%time temperature.value_hrf = pd.to_numeric(temperature['value_hrf'], errors='coerce').fillna(0)\n", "temperature.index = temperature['timestamp']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Show the temperature trends in Chiacago for one week using \"°C\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "TSYS01 = temperature[temperature['sensor'] == 'tsys01']\n", "TSYS01.index = TSYS01['timestamp']\n", "\n", "del TSYS01['timestamp']\n", "\n", "# Set the graphing style\n", "plt.style.use('seaborn-darkgrid')\n", "\n", "#plot data as a bar chart\n", "fig, ax = plt.subplots()\n", "\n", "plt.xticks(fontsize = 20, rotation=60)\n", "plt.yticks(fontsize = 20)\n", "\n", " \n", "ax.set_ylim(0, 50)\n", "\n", "# Set title and labels\n", "#Temperature for All Nodes in Chicago within one week\n", "ax.set_title('Temperature for All Nodes for Sensor TSYS01',fontsize = 25, fontdict = {'verticalalignment':'bottom'})\n", "ax.set_ylabel('Temperature (°C)',fontsize = 25, labelpad=25)\n", "\n", "\n", "for nodeId, t in TSYS01.groupby('node_id'):\n", " t['value_hrf'].plot(ax = ax, figsize=[20,9], linewidth=0.1)\n", " \n", "ax.set_xlabel('Week of Data', fontsize = 25, labelpad=25)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Convert celsius to fahrenheit" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# function for converting celsius to fahrenheit\n", "def celsius_to_fahrenheit(temp):\n", " newtemp = temp*1.8 + 32.0\n", " return newtemp" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#ignore warning with chained copy warning\n", "pd.set_option('mode.chained_assignment', None)\n", "TSYS01[\"common_temp\"] = celsius_to_fahrenheit(TSYS01[\"value_hrf\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Show the temperature trends in Chiacago for one week using \"F\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Set the graphing style \n", "plt.style.use('seaborn-darkgrid')\n", "\n", "#plot data as a bar chart\n", "fig, ax = plt.subplots()\n", "\n", "plt.xticks(fontsize = 20, rotation=60)\n", "plt.yticks(fontsize = 20)\n", "\n", "ax.set_ylim(0, 120)\n", "\n", "# Set title and labels\n", "ax.set_title('Temperature for All Nodes for Sensor TSYS01',fontsize = 25, fontdict = {'verticalalignment':'bottom'})\n", "ax.set_ylabel('Temperature (F)',fontsize = 25, labelpad=25)\n", " \n", "for nodeId, t in TSYS01.groupby('node_id'):\n", " t['common_temp'].plot(ax = ax, figsize=[20,9], linewidth=0.1)\n", " \n", "ax.set_xlabel('Week of Data', fontsize = 25, labelpad=25)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12" } }, "nbformat": 4, "nbformat_minor": 2 }