{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### Spatial exploratory data analysis\n", "\n", "In the previous lessons, we learnt several ways of analyzing data visually, such as bar plots, scatter plots, and categorical plots. When our dataset contains geospatial information such as zip codes, states, and geographical coordinates, we can explore it further by overlaying the data on maps. In this lesson, you will learn to perform simple visual analysis of spatial data with pandas and the `folium` library, using on-time performance data for domestic flights. The data for this exercise was downloaded from Kaggle. We will explore the data visually to see whether it shows a geospatial pattern.\n", "\n", "The U.S. Department of Transportation's (DOT) Bureau of Transportation Statistics tracks the on-time performance of domestic flights operated by large air carriers. Summary information on the number of on-time, delayed, canceled, and diverted flights is published in DOT's monthly Air Travel Consumer Report and in this dataset of 2015 flight delays and cancellations." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### What is folium\n", "\n", "Folium is a Python library that makes it easier to create maps using Leaflet, an open-source JavaScript library for creating interactive maps. 
Folium maps can be used for a range of purposes, from simple visualization to interactive dashboard applications.\n", "\n", "To install folium with pip:\n", "```bash\n", "pip install folium\n", "```\n", "or with conda:\n", "\n", "```bash\n", "conda install -c conda-forge folium\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Visualize the number of flights by airport" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Read data into pandas dataframes" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "import os\n", "import pandas as pd\n", "import folium" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "DATA_DIR = \"/home/asimbanskota/t81_577_data_science/weekly_materials/week7/files\"\n" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [], "source": [ "airlines = os.path.join(DATA_DIR, 'airlines.csv')\n", "airports = os.path.join(DATA_DIR, 'airports.csv')\n", "flights = os.path.join(DATA_DIR, 'flights.csv')" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/asimbanskota/anaconda3/envs/api/lib/python3.8/site-packages/IPython/core/interactiveshell.py:3062: DtypeWarning: Columns (7,8) have mixed types.Specify dtype option on import or set low_memory=False.\n", " has_raised = await self.run_ast_nodes(code_ast.body, cell_name,\n" ] } ], "source": [ "df_air = pd.read_csv(airlines)\n", "df_ap = pd.read_csv(airports)\n", "df = pd.read_csv(flights)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's create a feature that holds the total number of flights per airport." 
] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# Count flights per origin airport; naming the columns explicitly keeps this independent of the pandas version\n", "count_flights = df['ORIGIN_AIRPORT'].value_counts().rename_axis('IATA_CODE').reset_index(name='count_flights')\n", "df_ap = df_ap.merge(count_flights, on='IATA_CODE')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Maps are defined as `folium.Map` objects. We start by creating a base map, providing the latitude and longitude of its center. This instantiates a map object for the given location (45 degrees north, 96 degrees west). Once the base map is created, other map objects can be added incrementally on top of the `folium.Map`.\n" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [], "source": [ "m = folium.Map(location=[45, -96], zoom_start=4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can display the map object within the notebook simply by referring to its name, `m`." ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/html": [ "