{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Web Scraping and EDA in Python 3 using Requests, BeautifulSoup, Pandas, Matplotlib, Seaborn" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Source: \n", "## Thomas Brinkhoff: City Population, http://www.citypopulation.de\n", "\n", "In this Jupyter notebook I explain how to scrape content from a website using the Requests and BeautifulSoup libraries.\n", "\n", "Please note that a website may have policies and rules governing the use of its data, so read the data usage policy before you scrape.\n", "\n", "For this article I am scraping data from www.citypopulation.de, which publishes population statistics for different countries.\n", "\n", "Data use policy: http://citypopulation.de/termsofuse.html (DATA -> Population Data)\n", "\n", "\n", "\n", "### The data extracted in this Jupyter notebook is for Oceania -> NEW ZEALAND http://citypopulation.de/en/newzealand/\n", "- I am only scraping data for the North and South Islands (the Chatham Islands are excluded)\n", "- North island: http://citypopulation.de/en/newzealand/northisland/\n", "- South island: http://citypopulation.de/en/newzealand/southisland/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## Let us scrape the website and fetch the information using the Requests and BeautifulSoup libraries\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. 
Scraping website" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Python environment versions (for reference)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Python 3.7.0\n" ] } ], "source": [ "!python --version" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "pip 19.3.1 from c:\\python37\\lib\\site-packages\\pip (python 3.7)\n", "\n" ] } ], "source": [ "!pip --version" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# Importing libraries\n", "import requests\n", "import bs4\n", "from bs4 import BeautifulSoup\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Requests version: 2.19.1\n", "BeautifulSoup version: 4.7.1\n", "Pandas version: 0.23.4\n" ] } ], "source": [ "print('Requests version: {}'.format(requests.__version__))\n", "print('BeautifulSoup version: {}'.format(bs4.__version__))\n", "print('Pandas version: {}'.format(pd.__version__))" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "# URLs to scrape\n", "# This is a dictionary object with URLs. 
\n", "# We will use this dictionary to scrape one URL at a time.\n", "urls = {\n", "    'north': 'http://citypopulation.de/en/newzealand/northisland',\n", "    'south': 'http://citypopulation.de/en/newzealand/southisland/'\n", "}" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<Response [200]>\n" ] } ], "source": [ "# Using requests to get the information\n", "output = requests.get(urls['north'])\n", "print(output)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### An HTTP status code of 200 means our request was a success\n", "\n", "#### Reference link here: https://en.wikipedia.org/wiki/List_of_HTTP_status_codes" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'<!DOCTYPE html>\\r\\n<html lang=\"en\">\\r\\n<head>\\r\\n<meta charset=\"utf-8\">\\r\\n<meta name=\"description\" content=\"North Island (New Zealand): Regions & Settlements with population statistics, charts and maps.\"'" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# What's in the output?\n", "# Let's display the first 200 characters\n", "output.text[:200]" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "233654\n", "<class 'str'>\n" ] } ], "source": [ "print(len(output.text))\n", "print(type(output.text))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### As the output above shows, output.text holds the HTML tags and content of the webpage.\n", "\n", "#### Note that you need a basic understanding of HTML and its tags.\n", "\n", "#### Now let's use BeautifulSoup to parse the output.text string." 
] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "bs_output = BeautifulSoup(markup=output.text, features=\"html.parser\")" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "7" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(bs_output.contents)" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "list" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(bs_output.contents)" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['html', '\\n']" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bs_output.contents[:2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Feel free to print the whole output of the contents\n", "#### bs_output.contents displays the html tags and content of the URL\n", "\n", "The beauty of BeautifulSoup's parser is that you can interact with each element of the page, including tag attributes such as class and id values.\n", "\n", "You might wonder what the difference is between requests' output.text and bs4's bs_output.contents.\n", "- requests returns the page as a single string, whereas bs4 parses it into objects (bs_output.contents is a list of the top-level nodes) that you can search and navigate\n", "\n", "Example below:" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[<a href=\"/\">Home</a>,\n", " <a href=\"/Oceania.html\" itemprop=\"url\"><span itemprop=\"name\">Oceania</span></a>,\n", " <a href=\"/en/newzealand/\" itemprop=\"url\"><span itemprop=\"name\">New Zealand</span></a>,\n", " <a href=\"javascript:cp.changePageLang('en','de')\"><img alt=\"\" src=\"/images/icons/de.svg\" title=\"Deutsch\"/></a>,\n", " <a href=\"javascript:openMap()\"><img alt=\"Show Map\" id=\"smap\" 
src=\"/images/smaps/newzealand-cities.png\" title=\"Show Map\"/></a>]" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bs_output.find_all('a')[:5]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### You see the output above? \n", "\n", "I've printed 5 items from the list output of BeautifulSoup's find_all function. I passed <a> tag as 'a' to find all <a> tag elements in the bs_output. Likewise you can extract and play around with all the html tags and their contents.\n", " \n", "#### Before we do further extraction let us try to understand which parts of our URL page we would like to extract the data from." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 2. Exploring HTML content and extraction\n", "\n", "" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [], "source": [ "# Our data is in the <table> tag with id='ts'\n", "#\n", "table_output = bs_output.find(name='table', attrs={'id': 'ts'})" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['\\n', <thead>\n", " <tr id=\"tsh\"><th class=\"rname\" data-coltype=\"name\" onclick=\"javascript:sort('ts',0,false)\"><a href=\"javascript:sort('ts',0,false)\">Name</a></th>\n", " <th class=\"rstatus\" data-coltype=\"status\" onclick=\"javascript:sort('ts',1,false)\"><a href=\"javascript:sort('ts',1,false)\">Status</a></th><th class=\"radm rarea\" data-coltype=\"adm\" onclick=\"javascript:sort('ts',2,false)\"><a href=\"javascript:sort('ts',2,false)\">Region</a></th><th class=\"rpop prio5\" data-coldate=\"1996-06-30\" data-colhead=\"E 1996-06-30\" data-coltype=\"pop\" onclick=\"javascript:sort('ts',3,true)\"><a href=\"javascript:sort('ts',3,true)\">Population</a><br/><span class=\"unit\">Estimate<br/>1996-06-30</span></th><th class=\"rpop prio4\" data-coldate=\"2001-06-30\" data-colhead=\"E 2001-06-30\" data-coltype=\"pop\" 
onclick=\"javascript:sort('ts',4,true)\"><a href=\"javascript:sort('ts',4,true)\">Population</a><br/><span class=\"unit\">Estimate<br/>2001-06-30</span></th><th class=\"rpop prio3\" data-coldate=\"2006-06-30\" data-colhead=\"E 2006-06-30\" data-coltype=\"pop\" onclick=\"javascript:sort('ts',5,true)\"><a href=\"javascript:sort('ts',5,true)\">Population</a><br/><span class=\"unit\">Estimate<br/>2006-06-30</span></th><th class=\"rpop prio2\" data-coldate=\"2013-06-30\" data-colhead=\"E 2013-06-30\" data-coltype=\"pop\" onclick=\"javascript:sort('ts',6,true)\"><a href=\"javascript:sort('ts',6,true)\">Population</a><br/><span class=\"unit\">Estimate<br/>2013-06-30</span></th><th class=\"rpop prio1\" data-coldate=\"2018-06-30\" data-colhead=\"E 2018-06-30\" data-coltype=\"pop\" onclick=\"javascript:sort('ts',7,true)\"><a href=\"javascript:sort('ts',7,true)\">Population</a><br/><span class=\"unit\">Estimate<br/>2018-06-30</span></th><th class=\"sc\" data-coltype=\"other\"> </th></tr>\n", " </thead>]" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "table_output.contents[:2] # prints a list of tag elements" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Name',\n", " 'Status',\n", " 'Region',\n", " 'PopulationEstimate1996-06-30',\n", " 'PopulationEstimate2001-06-30',\n", " 'PopulationEstimate2006-06-30',\n", " 'PopulationEstimate2013-06-30',\n", " 'PopulationEstimate2018-06-30',\n", " '\\xa0']" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Extracting column names from <tr> tag\n", "[x.text for x in table_output.find_all('th')] # outputs a list of values of the <th> elements in the table output" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Notice the output above, we don't need the last value" ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [ { "data": { "text/plain": 
[ "['Name',\n", " 'Status',\n", " 'Region',\n", " 'PopulationEstimate1996-06-30',\n", " 'PopulationEstimate2001-06-30',\n", " 'PopulationEstimate2006-06-30',\n", " 'PopulationEstimate2013-06-30',\n", " 'PopulationEstimate2018-06-30']" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Collect the column names in table_columns, dropping the last (empty) header\n", "#\n", "table_columns = [x.get_text() for x in table_output.find_all('th')][:-1]\n", "table_columns" ] }, { "cell_type": "code", "execution_count": 84, "metadata": {}, "outputs": [], "source": [ "# Extracting the table rows, which are inside the <tbody> tag\n", "#\n", "table_body = table_output.find_all('tbody')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Feel free to print table_body, but expect a long output" ] }, { "cell_type": "code", "execution_count": 98, "metadata": {}, "outputs": [], "source": [ "north_island_output = []\n", "for item in table_body:\n", "    rows = item.find_all('tr') # extracts <tr> elements in <tbody>\n", "    for row in rows:\n", "        td = row.find_all('td') # extracts <td> elements in each row i.e. 
<tr>\n", "        td_values = [val.text for val in td] # extracts value of each <td>\n", "        north_island_output.append(td_values) # appends values to the list" ] }, { "cell_type": "code", "execution_count": 99, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Algies Bay',\n", " 'Rural Settlement',\n", " 'Auckland',\n", " '550',\n", " '690',\n", " '800',\n", " '870',\n", " '980',\n", " '→']" ] }, "execution_count": 99, "metadata": {}, "output_type": "execute_result" } ], "source": [ "north_island_output[1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Again, in the output above we don't need the last value, so let's fix that" ] }, { "cell_type": "code", "execution_count": 100, "metadata": {}, "outputs": [], "source": [ "north_island_output = []\n", "for item in table_body:\n", "    rows = item.find_all('tr') # extracts <tr> elements in <tbody>\n", "    for row in rows:\n", "        td = row.find_all('td') # extracts <td> elements in each row i.e. <tr>\n", "        td_values = [val.text for val in td] # extracts value of each <td>\n", "        north_island_output.append(td_values[:-1]) # appends values to the list, also excludes the last value that is not needed" ] }, { "cell_type": "code", "execution_count": 101, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Algies Bay',\n", " 'Rural Settlement',\n", " 'Auckland',\n", " '550',\n", " '690',\n", " '800',\n", " '870',\n", " '980']" ] }, "execution_count": 101, "metadata": {}, "output_type": "execute_result" } ], "source": [ "north_island_output[1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### That's it: we have the table column names and the table data, so let's combine the two into a dataframe" ] }, { "cell_type": "code", "execution_count": 102, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Name',\n", " 'Status',\n", " 'Region',\n", " 'PopulationEstimate1996-06-30',\n", " 'PopulationEstimate2001-06-30',\n", " 'PopulationEstimate2006-06-30',\n", " 'PopulationEstimate2013-06-30',\n", 
" 'PopulationEstimate2018-06-30']" ] }, "execution_count": 102, "metadata": {}, "output_type": "execute_result" } ], "source": [ "table_columns" ] }, { "cell_type": "code", "execution_count": 103, "metadata": {}, "outputs": [], "source": [ "north_data = pd.DataFrame(\n", " data=north_island_output,\n", " columns=table_columns\n", ")" ] }, { "cell_type": "code", "execution_count": 104, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>Name</th>\n", " <th>Status</th>\n", " <th>Region</th>\n", " <th>PopulationEstimate1996-06-30</th>\n", " <th>PopulationEstimate2001-06-30</th>\n", " <th>PopulationEstimate2006-06-30</th>\n", " <th>PopulationEstimate2013-06-30</th>\n", " <th>PopulationEstimate2018-06-30</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>Ahipara</td>\n", " <td>Rural Settlement</td>\n", " <td>Northland</td>\n", " <td>930</td>\n", " <td>1,050</td>\n", " <td>1,120</td>\n", " <td>1,130</td>\n", " <td>1,180</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>Algies Bay</td>\n", " <td>Rural Settlement</td>\n", " <td>Auckland</td>\n", " <td>550</td>\n", " <td>690</td>\n", " <td>800</td>\n", " <td>870</td>\n", " <td>980</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>Arapuni</td>\n", " <td>Rural Settlement</td>\n", " <td>Waikato</td>\n", " <td>290</td>\n", " <td>260</td>\n", " <td>230</td>\n", " <td>250</td>\n", " <td>260</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>Ashhurst</td>\n", " <td>Small Urban Area</td>\n", " <td>Manawatu-Wanganui</td>\n", " <td>2,530</td>\n", " <td>2,520</td>\n", " <td>2,510</td>\n", " 
<td>2,750</td>\n", " <td>2,990</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>Athenree</td>\n", " <td>Rural Settlement</td>\n", " <td>Bay of Plenty</td>\n", " <td>510</td>\n", " <td>530</td>\n", " <td>630</td>\n", " <td>700</td>\n", " <td>740</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " Name Status Region \\\n", "0 Ahipara Rural Settlement Northland \n", "1 Algies Bay Rural Settlement Auckland \n", "2 Arapuni Rural Settlement Waikato \n", "3 Ashhurst Small Urban Area Manawatu-Wanganui \n", "4 Athenree Rural Settlement Bay of Plenty \n", "\n", " PopulationEstimate1996-06-30 PopulationEstimate2001-06-30 \\\n", "0 930 1,050 \n", "1 550 690 \n", "2 290 260 \n", "3 2,530 2,520 \n", "4 510 530 \n", "\n", " PopulationEstimate2006-06-30 PopulationEstimate2013-06-30 \\\n", "0 1,120 1,130 \n", "1 800 870 \n", "2 230 250 \n", "3 2,510 2,750 \n", "4 630 700 \n", "\n", " PopulationEstimate2018-06-30 \n", "0 1,180 \n", "1 980 \n", "2 260 \n", "3 2,990 \n", "4 740 " ] }, "execution_count": 104, "metadata": {}, "output_type": "execute_result" } ], "source": [ "north_data.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### We can follow the same procedure above for south island data as well. \n", "\n", "### For a better approach let us put the whole process using a function." 
] }, { "cell_type": "code", "execution_count": 105, "metadata": {}, "outputs": [], "source": [ "# Importing libraries\n", "import requests\n", "from bs4 import BeautifulSoup\n", "import pandas as pd\n", "\n", "# Variables\n", "south_island_output = []\n", "north_island_output = []\n", "table_columns = []\n", "\n", "# URLs for North and South islands\n", "urls = {\n", "    'north': 'http://citypopulation.de/en/newzealand/northisland',\n", "    'south': 'http://citypopulation.de/en/newzealand/southisland/'\n", "}\n", "\n", "# Function that downloads the data\n", "def download_data():\n", "    \"\"\"\n", "    Function extracts td values by looping each child element of the parent.\n", "\n", "    Two empty lists south_island_output and north_island_output are initialised.\n", "\n", "    A urls dictionary object with north island and south island urls is also initialised.\n", "\n", "    Pseudo code:\n", "    - for each item in the dictionary\n", "        - connect to the url\n", "        - if success (response code == 200), then loop through the page data\n", "            - for each row_item in body (loop - look for <tr> element):\n", "                - for each row in the row_item (loop and look for <td> element):\n", "                    - for each <td> element, extract the text value\n", "                    - finally append those text values into output list\n", "    \"\"\"\n", "    for url in urls:\n", "        print(url, urls[url])\n", "\n", "        ## response\n", "        response = requests.get(urls[url])\n", "\n", "        if response.status_code == 200:\n", "            print('Response code is 200. 
Success!')\n", "            try:\n", "                ## web scraping\n", "                soup = BeautifulSoup(response.text, \"html.parser\")\n", "                table = soup.find(name='table', attrs={'id': 'ts'})\n", "                table_columns.append([x.get_text() for x in table.find_all('th')][:-1])\n", "                body = table.find_all('tbody')\n", "                for item in body:\n", "                    rows = item.find_all('tr')\n", "                    for row in rows:\n", "                        td = row.find_all('td')\n", "                        td_values = [val.text for val in td]\n", "                        if url == 'north':\n", "                            north_island_output.append(td_values[:-1]) # excluding last column that has an arrow as a value\n", "                        else:\n", "                            south_island_output.append(td_values[:-1]) # excluding last column that has an arrow as a value\n", "            except Exception as ex:\n", "                print(str(ex))\n", "        else:\n", "            print('Oops! {0}'.format(response.status_code))" ] }, { "cell_type": "code", "execution_count": 106, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "north http://citypopulation.de/en/newzealand/northisland\n", "Response code is 200. 
Success!\n" ] } ], "source": [ "download_data()" ] }, { "cell_type": "code", "execution_count": 108, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Name',\n", " 'Status',\n", " 'Region',\n", " 'PopulationEstimate1996-06-30',\n", " 'PopulationEstimate2001-06-30',\n", " 'PopulationEstimate2006-06-30',\n", " 'PopulationEstimate2013-06-30',\n", " 'PopulationEstimate2018-06-30']" ] }, "execution_count": 108, "metadata": {}, "output_type": "execute_result" } ], "source": [ "table_columns[0]" ] }, { "cell_type": "code", "execution_count": 110, "metadata": {}, "outputs": [], "source": [ "# North island dataframe\n", "north_island_data = pd.DataFrame(data=north_island_output, columns=table_columns[0])\n", "\n", "# South island dataframe\n", "south_island_data = pd.DataFrame(data=south_island_output, columns=table_columns[0])" ] }, { "cell_type": "code", "execution_count": 111, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>Name</th>\n", " <th>Status</th>\n", " <th>Region</th>\n", " <th>PopulationEstimate1996-06-30</th>\n", " <th>PopulationEstimate2001-06-30</th>\n", " <th>PopulationEstimate2006-06-30</th>\n", " <th>PopulationEstimate2013-06-30</th>\n", " <th>PopulationEstimate2018-06-30</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>Ahipara</td>\n", " <td>Rural Settlement</td>\n", " <td>Northland</td>\n", " <td>930</td>\n", " <td>1,050</td>\n", " <td>1,120</td>\n", " <td>1,130</td>\n", " <td>1,180</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>Algies Bay</td>\n", " <td>Rural Settlement</td>\n", " <td>Auckland</td>\n", " 
<td>550</td>\n", " <td>690</td>\n", " <td>800</td>\n", " <td>870</td>\n", " <td>980</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>Arapuni</td>\n", " <td>Rural Settlement</td>\n", " <td>Waikato</td>\n", " <td>290</td>\n", " <td>260</td>\n", " <td>230</td>\n", " <td>250</td>\n", " <td>260</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>Ashhurst</td>\n", " <td>Small Urban Area</td>\n", " <td>Manawatu-Wanganui</td>\n", " <td>2,530</td>\n", " <td>2,520</td>\n", " <td>2,510</td>\n", " <td>2,750</td>\n", " <td>2,990</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>Athenree</td>\n", " <td>Rural Settlement</td>\n", " <td>Bay of Plenty</td>\n", " <td>510</td>\n", " <td>530</td>\n", " <td>630</td>\n", " <td>700</td>\n", " <td>740</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " Name Status Region \\\n", "0 Ahipara Rural Settlement Northland \n", "1 Algies Bay Rural Settlement Auckland \n", "2 Arapuni Rural Settlement Waikato \n", "3 Ashhurst Small Urban Area Manawatu-Wanganui \n", "4 Athenree Rural Settlement Bay of Plenty \n", "\n", " PopulationEstimate1996-06-30 PopulationEstimate2001-06-30 \\\n", "0 930 1,050 \n", "1 550 690 \n", "2 290 260 \n", "3 2,530 2,520 \n", "4 510 530 \n", "\n", " PopulationEstimate2006-06-30 PopulationEstimate2013-06-30 \\\n", "0 1,120 1,130 \n", "1 800 870 \n", "2 230 250 \n", "3 2,510 2,750 \n", "4 630 700 \n", "\n", " PopulationEstimate2018-06-30 \n", "0 1,180 \n", "1 980 \n", "2 260 \n", "3 2,990 \n", "4 740 " ] }, "execution_count": 111, "metadata": {}, "output_type": "execute_result" } ], "source": [ "north_island_data.head()" ] }, { "cell_type": "code", "execution_count": 112, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", 
"</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>Name</th>\n", " <th>Status</th>\n", " <th>Region</th>\n", " <th>PopulationEstimate1996-06-30</th>\n", " <th>PopulationEstimate2001-06-30</th>\n", " <th>PopulationEstimate2006-06-30</th>\n", " <th>PopulationEstimate2013-06-30</th>\n", " <th>PopulationEstimate2018-06-30</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>Ahaura</td>\n", " <td>Rural Settlement</td>\n", " <td>West Coast</td>\n", " <td>120</td>\n", " <td>140</td>\n", " <td>110</td>\n", " <td>100</td>\n", " <td>80</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>Akaroa</td>\n", " <td>Rural Settlement</td>\n", " <td>Canterbury</td>\n", " <td>680</td>\n", " <td>610</td>\n", " <td>620</td>\n", " <td>670</td>\n", " <td>630</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>Alexandra</td>\n", " <td>Small Urban Area</td>\n", " <td>Otago</td>\n", " <td>4,690</td>\n", " <td>4,480</td>\n", " <td>4,940</td>\n", " <td>4,920</td>\n", " <td>5,510</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>Allanton</td>\n", " <td>Rural Settlement</td>\n", " <td>Otago</td>\n", " <td>220</td>\n", " <td>240</td>\n", " <td>260</td>\n", " <td>260</td>\n", " <td>290</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>Amberley</td>\n", " <td>Small Urban Area</td>\n", " <td>Canterbury</td>\n", " <td>1,050</td>\n", " <td>1,160</td>\n", " <td>1,340</td>\n", " <td>1,620</td>\n", " <td>1,800</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " Name Status Region PopulationEstimate1996-06-30 \\\n", "0 Ahaura Rural Settlement West Coast 120 \n", "1 Akaroa Rural Settlement Canterbury 680 \n", "2 Alexandra Small Urban Area Otago 4,690 \n", "3 Allanton Rural Settlement Otago 220 \n", "4 Amberley Small Urban Area Canterbury 1,050 \n", "\n", " PopulationEstimate2001-06-30 PopulationEstimate2006-06-30 \\\n", "0 140 110 \n", "1 610 620 
\n", "2 4,480 4,940 \n", "3 240 260 \n", "4 1,160 1,340 \n", "\n", " PopulationEstimate2013-06-30 PopulationEstimate2018-06-30 \n", "0 100 80 \n", "1 670 630 \n", "2 4,920 5,510 \n", "3 260 290 \n", "4 1,620 1,800 " ] }, "execution_count": 112, "metadata": {}, "output_type": "execute_result" } ], "source": [ "south_island_data.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 3. EDA" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### For data analysis, let's take out unwanted columns from our dataframes" ] }, { "cell_type": "code", "execution_count": 113, "metadata": {}, "outputs": [], "source": [ "north_island_data = north_island_data[['Name', 'Status','Region', 'PopulationEstimate2018-06-30']]\n", "south_island_data = south_island_data[['Name', 'Status','Region', 'PopulationEstimate2018-06-30']]" ] }, { "cell_type": "code", "execution_count": 114, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>Name</th>\n", " <th>Status</th>\n", " <th>Region</th>\n", " <th>PopulationEstimate2018-06-30</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>Ahipara</td>\n", " <td>Rural Settlement</td>\n", " <td>Northland</td>\n", " <td>1,180</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>Algies Bay</td>\n", " <td>Rural Settlement</td>\n", " <td>Auckland</td>\n", " <td>980</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " Name Status Region PopulationEstimate2018-06-30\n", "0 Ahipara Rural Settlement Northland 1,180\n", "1 Algies Bay Rural Settlement Auckland 980" ] }, "execution_count": 114, 
"metadata": {}, "output_type": "execute_result" } ], "source": [ "north_island_data.head(2)" ] }, { "cell_type": "code", "execution_count": 115, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>Name</th>\n", " <th>Status</th>\n", " <th>Region</th>\n", " <th>PopulationEstimate2018-06-30</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>Ahaura</td>\n", " <td>Rural Settlement</td>\n", " <td>West Coast</td>\n", " <td>80</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>Akaroa</td>\n", " <td>Rural Settlement</td>\n", " <td>Canterbury</td>\n", " <td>630</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " Name Status Region PopulationEstimate2018-06-30\n", "0 Ahaura Rural Settlement West Coast 80\n", "1 Akaroa Rural Settlement Canterbury 630" ] }, "execution_count": 115, "metadata": {}, "output_type": "execute_result" } ], "source": [ "south_island_data.head(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3.1 North island" ] }, { "cell_type": "code", "execution_count": 116, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(350, 4)" ] }, "execution_count": 116, "metadata": {}, "output_type": "execute_result" } ], "source": [ "north_island_data.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Some data cleaning\n", "\n", "Population column values have \",\" comma in the values. Let's replace , and convert the column from string to integer type." 
] }, { "cell_type": "code", "execution_count": 119, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Name object\n", "Status object\n", "Region object\n", "PopulationEstimate2018-06-30 object\n", "dtype: object" ] }, "execution_count": 119, "metadata": {}, "output_type": "execute_result" } ], "source": [ "north_island_data.dtypes" ] }, { "cell_type": "code", "execution_count": 123, "metadata": {}, "outputs": [], "source": [ "north_island_data['PopulationEstimate2018-06-30'] = north_island_data['PopulationEstimate2018-06-30'].str.replace(',', '')" ] }, { "cell_type": "code", "execution_count": 124, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>Name</th>\n", " <th>Status</th>\n", " <th>Region</th>\n", " <th>PopulationEstimate2018-06-30</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>Ahipara</td>\n", " <td>Rural Settlement</td>\n", " <td>Northland</td>\n", " <td>1180</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>Algies Bay</td>\n", " <td>Rural Settlement</td>\n", " <td>Auckland</td>\n", " <td>980</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>Arapuni</td>\n", " <td>Rural Settlement</td>\n", " <td>Waikato</td>\n", " <td>260</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>Ashhurst</td>\n", " <td>Small Urban Area</td>\n", " <td>Manawatu-Wanganui</td>\n", " <td>2990</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>Athenree</td>\n", " <td>Rural Settlement</td>\n", " <td>Bay of Plenty</td>\n", " <td>740</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " Name Status Region \\\n", "0 
Ahipara Rural Settlement Northland \n", "1 Algies Bay Rural Settlement Auckland \n", "2 Arapuni Rural Settlement Waikato \n", "3 Ashhurst Small Urban Area Manawatu-Wanganui \n", "4 Athenree Rural Settlement Bay of Plenty \n", "\n", " PopulationEstimate2018-06-30 \n", "0 1180 \n", "1 980 \n", "2 260 \n", "3 2990 \n", "4 740 " ] }, "execution_count": 124, "metadata": {}, "output_type": "execute_result" } ], "source": [ "north_island_data.head()" ] }, { "cell_type": "code", "execution_count": 125, "metadata": {}, "outputs": [], "source": [ "north_island_data['PopulationEstimate2018-06-30'] = pd.to_numeric(north_island_data['PopulationEstimate2018-06-30'])" ] }, { "cell_type": "code", "execution_count": 126, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Name object\n", "Status object\n", "Region object\n", "PopulationEstimate2018-06-30 int64\n", "dtype: object" ] }, "execution_count": 126, "metadata": {}, "output_type": "execute_result" } ], "source": [ "north_island_data.dtypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Renaming population column" ] }, { "cell_type": "code", "execution_count": 142, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>Name</th>\n", " <th>Status</th>\n", " <th>Region</th>\n", " <th>PopulationEstimate2018</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>Ahipara</td>\n", " <td>Rural Settlement</td>\n", " <td>Northland</td>\n", " <td>1180</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>Algies Bay</td>\n", " <td>Rural Settlement</td>\n", " <td>Auckland</td>\n", " <td>980</td>\n", 
" </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " Name Status Region PopulationEstimate2018\n", "0 Ahipara Rural Settlement Northland 1180\n", "1 Algies Bay Rural Settlement Auckland 980" ] }, "execution_count": 142, "metadata": {}, "output_type": "execute_result" } ], "source": [ "north_island_data = north_island_data.rename(columns={'PopulationEstimate2018-06-30': 'PopulationEstimate2018'})\n", "north_island_data.head(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Top 10 most populated places in north island" ] }, { "cell_type": "code", "execution_count": 152, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>Name</th>\n", " <th>Status</th>\n", " <th>Region</th>\n", " <th>PopulationEstimate2018</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>6</th>\n", " <td>Auckland</td>\n", " <td>Main Urban Area</td>\n", " <td>Auckland</td>\n", " <td>1467800</td>\n", " </tr>\n", " <tr>\n", " <th>333</th>\n", " <td>Wellington</td>\n", " <td>Main Urban Area</td>\n", " <td>Wellington</td>\n", " <td>215400</td>\n", " </tr>\n", " <tr>\n", " <th>46</th>\n", " <td>Hamilton</td>\n", " <td>Main Urban Area</td>\n", " <td>Waikato</td>\n", " <td>169300</td>\n", " </tr>\n", " <tr>\n", " <th>267</th>\n", " <td>Tauranga</td>\n", " <td>Main Urban Area</td>\n", " <td>Bay of Plenty</td>\n", " <td>135000</td>\n", " </tr>\n", " <tr>\n", " <th>105</th>\n", " <td>Lower Hutt</td>\n", " <td>Main Urban Area</td>\n", " <td>Wellington</td>\n", " <td>104900</td>\n", " </tr>\n", " <tr>\n", " <th>198</th>\n", " <td>Palmerston North</td>\n", " <td>Large Urban 
Area</td>\n", " <td>Manawatu-Wanganui</td>\n", " <td>80300</td>\n", " </tr>\n", " <tr>\n", " <th>141</th>\n", " <td>Napier</td>\n", " <td>Large Urban Area</td>\n", " <td>Hawke's Bay</td>\n", " <td>62800</td>\n", " </tr>\n", " <tr>\n", " <th>219</th>\n", " <td>Porirua</td>\n", " <td>Large Urban Area</td>\n", " <td>Wellington</td>\n", " <td>55500</td>\n", " </tr>\n", " <tr>\n", " <th>143</th>\n", " <td>New Plymouth</td>\n", " <td>Large Urban Area</td>\n", " <td>Taranaki</td>\n", " <td>55300</td>\n", " </tr>\n", " <tr>\n", " <th>241</th>\n", " <td>Rotorua</td>\n", " <td>Large Urban Area</td>\n", " <td>Bay of Plenty</td>\n", " <td>54500</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " Name Status Region \\\n", "6 Auckland Main Urban Area Auckland \n", "333 Wellington Main Urban Area Wellington \n", "46 Hamilton Main Urban Area Waikato \n", "267 Tauranga Main Urban Area Bay of Plenty \n", "105 Lower Hutt Main Urban Area Wellington \n", "198 Palmerston North Large Urban Area Manawatu-Wanganui \n", "141 Napier Large Urban Area Hawke's Bay \n", "219 Porirua Large Urban Area Wellington \n", "143 New Plymouth Large Urban Area Taranaki \n", "241 Rotorua Large Urban Area Bay of Plenty \n", "\n", " PopulationEstimate2018 \n", "6 1467800 \n", "333 215400 \n", "46 169300 \n", "267 135000 \n", "105 104900 \n", "198 80300 \n", "141 62800 \n", "219 55500 \n", "143 55300 \n", "241 54500 " ] }, "execution_count": 152, "metadata": {}, "output_type": "execute_result" } ], "source": [ "north_island_data.sort_values(by='PopulationEstimate2018', ascending=False).head(10)" ] }, { "cell_type": "code", "execution_count": 179, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "<Figure size 1224x432 with 1 Axes>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "top_10 = north_island_data.sort_values(by='PopulationEstimate2018', ascending=False).head(10)\n", "\n", "plt.figure(figsize=(17, 6))\n", "sns.barplot(x='Name', 
y='PopulationEstimate2018', data=top_10)\n", "plt.title(\"Top 10 most populated places in North Island NZ\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Top 10 least populated places in North Island" ] }, { "cell_type": "code", "execution_count": 151, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>Name</th>\n", " <th>Status</th>\n", " <th>Region</th>\n", " <th>PopulationEstimate2018</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>20</th>\n", " <td>Castlepoint</td>\n", " <td>Rural Settlement</td>\n", " <td>Wellington</td>\n", " <td>50</td>\n", " </tr>\n", " <tr>\n", " <th>328</th>\n", " <td>Waitotara</td>\n", " <td>Rural Settlement</td>\n", " <td>Taranaki</td>\n", " <td>60</td>\n", " </tr>\n", " <tr>\n", " <th>234</th>\n", " <td>Raurimu</td>\n", " <td>Rural Settlement</td>\n", " <td>Manawatu-Wanganui</td>\n", " <td>70</td>\n", " </tr>\n", " <tr>\n", " <th>309</th>\n", " <td>Waiinu Beach</td>\n", " <td>Rural Settlement</td>\n", " <td>Taranaki</td>\n", " <td>70</td>\n", " </tr>\n", " <tr>\n", " <th>340</th>\n", " <td>Whangapoua</td>\n", " <td>Rural Settlement</td>\n", " <td>Waikato</td>\n", " <td>70</td>\n", " </tr>\n", " <tr>\n", " <th>5</th>\n", " <td>Atiamuri</td>\n", " <td>Rural Settlement</td>\n", " <td>Waikato</td>\n", " <td>70</td>\n", " </tr>\n", " <tr>\n", " <th>184</th>\n", " <td>Ormondville</td>\n", " <td>Rural Settlement</td>\n", " <td>Manawatu-Wanganui</td>\n", " <td>70</td>\n", " </tr>\n", " <tr>\n", " <th>8</th>\n", " <td>Baddeleys Beach - Campbells Beach</td>\n", " <td>Rural Settlement</td>\n", " 
<td>Auckland</td>\n", " <td>70</td>\n", " </tr>\n", " <tr>\n", " <th>324</th>\n", " <td>Waitangi</td>\n", " <td>Rural Settlement</td>\n", " <td>Northland</td>\n", " <td>80</td>\n", " </tr>\n", " <tr>\n", " <th>230</th>\n", " <td>Rainbows End</td>\n", " <td>Rural Settlement</td>\n", " <td>Auckland</td>\n", " <td>80</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " Name Status Region \\\n", "20 Castlepoint Rural Settlement Wellington \n", "328 Waitotara Rural Settlement Taranaki \n", "234 Raurimu Rural Settlement Manawatu-Wanganui \n", "309 Waiinu Beach Rural Settlement Taranaki \n", "340 Whangapoua Rural Settlement Waikato \n", "5 Atiamuri Rural Settlement Waikato \n", "184 Ormondville Rural Settlement Manawatu-Wanganui \n", "8 Baddeleys Beach - Campbells Beach Rural Settlement Auckland \n", "324 Waitangi Rural Settlement Northland \n", "230 Rainbows End Rural Settlement Auckland \n", "\n", " PopulationEstimate2018 \n", "20 50 \n", "328 60 \n", "234 70 \n", "309 70 \n", "340 70 \n", "5 70 \n", "184 70 \n", "8 70 \n", "324 80 \n", "230 80 " ] }, "execution_count": 151, "metadata": {}, "output_type": "execute_result" } ], "source": [ "north_island_data.sort_values(by='PopulationEstimate2018', ascending=True).head(10)" ] }, { "cell_type": "code", "execution_count": 178, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "<Figure size 1944x432 with 1 Axes>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "bottom_10 = north_island_data.sort_values(by='PopulationEstimate2018', ascending=True).head(10)\n", "\n", "plt.figure(figsize=(27, 6))\n", "sns.barplot(x='Name', y='PopulationEstimate2018', data=bottom_10)\n", "plt.title(\"Top 10 least populated places in North Island NZ\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Population counts grouped by Region" ] }, { "cell_type": "code", "execution_count": 270, "metadata": {}, "outputs": [ { "data": { "text/html": 
[ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>sum</th>\n", " <th>count</th>\n", " </tr>\n", " <tr>\n", " <th>Region</th>\n", " <th></th>\n", " <th></th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>Auckland</th>\n", " <td>1612730</td>\n", " <td>59</td>\n", " </tr>\n", " <tr>\n", " <th>Bay of Plenty</th>\n", " <td>259400</td>\n", " <td>38</td>\n", " </tr>\n", " <tr>\n", " <th>Gisborne</th>\n", " <td>39320</td>\n", " <td>9</td>\n", " </tr>\n", " <tr>\n", " <th>Hawke's Bay</th>\n", " <td>140620</td>\n", " <td>21</td>\n", " </tr>\n", " <tr>\n", " <th>Manawatu-Wanganui</th>\n", " <td>199140</td>\n", " <td>50</td>\n", " </tr>\n", " <tr>\n", " <th>Northland</th>\n", " <td>109850</td>\n", " <td>58</td>\n", " </tr>\n", " <tr>\n", " <th>Taranaki</th>\n", " <td>93240</td>\n", " <td>22</td>\n", " </tr>\n", " <tr>\n", " <th>Waikato</th>\n", " <td>353070</td>\n", " <td>77</td>\n", " </tr>\n", " <tr>\n", " <th>Wellington</th>\n", " <td>498300</td>\n", " <td>16</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " sum count\n", "Region \n", "Auckland 1612730 59\n", "Bay of Plenty 259400 38\n", "Gisborne 39320 9\n", "Hawke's Bay 140620 21\n", "Manawatu-Wanganui 199140 50\n", "Northland 109850 58\n", "Taranaki 93240 22\n", "Waikato 353070 77\n", "Wellington 498300 16" ] }, "execution_count": 270, "metadata": {}, "output_type": "execute_result" } ], "source": [ "region_totals = north_island_data.groupby('Region')['PopulationEstimate2018'].agg(['sum', 'count'])\n", "region_totals" ] }, { "cell_type": "code", "execution_count": 271, "metadata": {}, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAABAQAAAFzCAYAAACpeNsxAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8li6FKAAAgAElEQVR4nO3de7xdVX3v/c+3RG6lIJdobQIGBS9IW5UY8VoqNlCtQi1oeLzEFh+ecqjWtlrl9GmhIK3Uo7RqxXI05VKPSPFCbEtpCipeEAjXcBFJBSVCBQxSbAWF/s4fcyyysll7ZxOyL8n8vF+v9dpz/eYYY4215p5zrfWbY46VqkKSJEmSJPXLT810ByRJkiRJ0vQzISBJkiRJUg+ZEJAkSZIkqYdMCEiSJEmS1EMmBCRJkiRJ6iETApIkSZIk9dCcme7A5mC33XarBQsWzHQ3JEmSJEl6VK644oq7q2ruqHUmBCZhwYIFrFy5cqa7IUmSJEnSo5Lk2+Ot85IBSZIkSZJ6yISAJEmSJEk9ZEJAkiRJkqQeMiEgSZIkSVIPmRCQJEmSJKmHTAhIkiRJktRDJgQkSZIkSeohEwKSJEmSJPWQCQFJkiRJknrIhIAkSZIkST1kQkCSJEmSpB4yISBJkiRJUg9NWUIgybIkdya5bkz8rUluSnJ9kr8Yih+bZHVbd9BQfL8kq9q6DyZJi2+T5FMtfmmSBUN1lia5ud2WDsX3bGVvbnW3nqrnL0mSJEnSbDZnCts+HfgwcOYgkOSXgUOAX6iqB5I8ocX3AZYAzwJ+DvjXJE+rqoeAU4GjgK8D/wQcDJwPHAncU1V7JVkCnAy8LskuwHHAQqCAK5Isr6p7WplTqursJB9tbZy6qZ7wfu88c8OF9Khc8b43zXQXJEmSJGmLNGUjBKrqYmDtmPDRwHur6oFW5s4WPwQ4u6oeqKpbgNXAoiRPAnasqkuqquiSC4cO1TmjLZ8LHNhGDxwErKiqtS0JsAI4uK17WStLqztoS5IkSZKkXpnuOQSeBrykDdv/UpLntfg84LahcmtabF5bHhtfr05VPQjcC+w6QVu7Aj9oZce2JUmSJElSr0zlJQPjPd7OwP7A84BzkjwFyIiyNUGcjagzUVuPkOQouksV2GOPPcYrJkmSJEnSZmm6RwisAT5TncuA/wZ2a/Hdh8rNB25v8fkj4gzXSTIH2InuEoXx2robeHwrO7atR6iq06pqYVUtnDt37kY8VUmSJEmSZq/pTgh8ju46fpI8Ddia7ov6cmBJ++WAPYG9gcuq6g7gviT7tzkA3gSc19paDgx+QeAw4KI2z8AFwOIkOyfZGVgMXNDWfaGVpdUdtCVJkiRJUq9M2SUDST4JHADslmQN3cz/y4Bl7acIfwwsbV/Ur09yDnAD8CBwTPuFAegmIjwd2I7u1wXOb/GPA2clWU03MmAJQFWtTXIicHkrd0JVDSY3fBdwdpL3AFe1NiRJkiRJ6p0pSwhU1RHjrHrDOOVPAk4aEV8J7Dsifj9w+DhtLaNLPoyNfwtYNH6vJUmSJEnqh+m+ZECSJEmSJM0CJgQkSZIkSeohEwKSJEmSJPWQCQFJkiRJknrIhIAkSZIkST1kQkCSJEmSpB4yISBJkiRJUg+ZEJAkSZIkqYdMCEiSJEmS1EMmBCRJkiRJ6iETApIkSZIk9ZAJAUmSJEmSesiEgCRJkiRJPWRCQJIkSZKkHjIhIEmSJElSD5kQkCRJkiSph0wISJIkSZLUQyYEJEmSJEnqIRMCkiRJkiT1kAkBSZIkSZJ6yISAJEmSJEk9ZEJAkiRJkqQeMiEgSZIkSVIPmRCQJEmSJKmHTAhIkiRJktRDJgQkSZIkSeohEwKSJEmSJPWQCQFJkiRJknrIhIAkSZIkST1kQkCSJEmSpB6asoRAkmVJ7kxy3Yh170hSSXYbih2bZHWSm5IcNBTfL8mqtu6DSdLi2yT5VItfmmTBUJ2lSW5ut6V
D8T1b2Ztb3a2n6vlLkiRJkjSbTeUIgdOBg8cGk+wO/ArwnaHYPsAS4FmtzkeSbNVWnwocBezdboM2jwTuqaq9gFOAk1tbuwDHAc8HFgHHJdm51TkZOKWq9gbuaW1IkiRJktQ7U5YQqKqLgbUjVp0C/CFQQ7FDgLOr6oGqugVYDSxK8iRgx6q6pKoKOBM4dKjOGW35XODANnrgIGBFVa2tqnuAFcDBbd3LWlla3UFbkiRJkiT1yrTOIZDk1cB3q+qaMavmAbcN3V/TYvPa8tj4enWq6kHgXmDXCdraFfhBKzu2rVF9PSrJyiQr77rrrkk/R0mSJEmSNgfTlhBIsj3wR8CfjFo9IlYTxDemzkRtPXJF1WlVtbCqFs6dO3e8YpIkSZIkbZamc4TAU4E9gWuS3ArMB65M8rN0Z+t3Hyo7H7i9xeePiDNcJ8kcYCe6SxTGa+tu4PGt7Ni2JEmSJEnqlWlLCFTVqqp6QlUtqKoFdF/cn1tV/w4sB5a0Xw7Yk27ywMuq6g7gviT7tzkA3gSc15pcDgx+QeAw4KI2z8AFwOIkO7fJBBcDF7R1X2hlaXUHbUmSJEmS1CtT+bODnwQuAZ6eZE2ScWf0r6rrgXOAG4B/Bo6pqofa6qOBj9FNNPhvwPkt/nFg1ySrgd8H3t3aWgucCFzebie0GMC7gN9vdXZtbUiSJEmS1DtzNlxk41TVERtYv2DM/ZOAk0aUWwnsOyJ+P3D4OG0vA5aNiH+L7qcIJUmSJEnqtWn9lQFJkiRJkjQ7mBCQJEmSJKmHTAhIkiRJktRDJgQkSZIkSeohEwKSJEmSJPWQCQFJkiRJknrIhIAkSZIkST1kQkCSJEmSpB4yISBJkiRJUg+ZEJAkSZIkqYdMCEiSJEmS1EMmBCRJkiRJ6iETApIkSZIk9ZAJAUmSJEmSesiEgCRJkiRJPWRCQJIkSZKkHjIhIEmSJElSD5kQkCRJkiSph0wISJIkSZLUQyYEJEmSJEnqIRMCkiRJkiT1kAkBSZIkSZJ6yISAJEmSJEk9ZEJAkiRJkqQeMiEgSZIkSVIPmRCQJEmSJKmHTAhIkiRJktRDJgQkSZIkSeohEwKSJEmSJPXQlCUEkixLcmeS64Zi70vyjSTXJvlskscPrTs2yeokNyU5aCi+X5JVbd0Hk6TFt0nyqRa/NMmCoTpLk9zcbkuH4nu2sje3ultP1fOXJEmSJGk2m8oRAqcDB4+JrQD2rapfAL4JHAuQZB9gCfCsVucjSbZqdU4FjgL2brdBm0cC91TVXsApwMmtrV2A44DnA4uA45Ls3OqcDJxSVXsD97Q2JEmSJEnqnSlLCFTVxcDaMbF/qaoH292vA/Pb8iHA2VX1QFXdAqwGFiV5ErBjVV1SVQWcCRw6VOeMtnwucGAbPXAQsKKq1lbVPXRJiIPbupe1srS6g7YkSZIkSeqVmZxD4LeA89vyPOC2oXVrWmxeWx4bX69OSzLcC+w6QVu7Aj8YSkgMtyVJkiRJUq/MSEIgyR8BDwKfGIRGFKsJ4htTZ6K2RvXxqCQrk6y86667xismSZIkSdJmadoTAm2Sv18DXt8uA4DubP3uQ8XmA7e3+PwR8fXqJJkD7ER3icJ4bd0NPL6VHdvWI1TVaVW1sKoWzp0799E+TUmSJEmSZrVpTQgkORh4F/DqqvqvoVXLgSXtlwP2pJs88LKqugO4L8n+bQ6ANwHnDdUZ/ILAYcBFLcFwAbA4yc5tMsHFwAVt3RdaWVrdQVuSJEmSJPXKnA0X2ThJPgkcAOyWZA3dzP/HAtsAK9qvB369qn67qq5Pcg5wA92lBMdU1UOtqaPpfrFgO7o5BwbzDnwcOCvJarqRAUsAqmptkhOBy1u5E6pqMLnhu4Czk7wHuKq1IUmSJElS70xZQqCqjhgRHvcLeFWdBJw0Ir4S2HdE/H7g8HHaWgYsGxH/Ft1PEUqSJEmS1Gsz+SsDkiRJkiRphpgQkCRJkiSph0wISJIkSZLUQyYEJEm
SJEnqIRMCkiRJkiT1kAkBSZIkSZJ6yISAJEmSJEk9ZEJAkiRJkqQeMiEgSZIkSVIPmRCQJEmSJKmHTAhIkiRJktRDJgQkSZIkSeohEwKSJEmSJPWQCQFJkiRJknrIhIAkSZIkST1kQkCSJEmSpB4yISBJkiRJUg+ZEJAkSZIkqYdMCEiSJEmS1EMmBCRJkiRJ6iETApIkSZIk9ZAJAUmSJEmSesiEgCRJkiRJPWRCQJIkSZKkHjIhIEmSJElSD5kQkCRJkiSph0wISJIkSZLUQyYEJEmSJEnqIRMCkiRJkiT1kAkBSZIkSZJ6aMoSAkmWJbkzyXVDsV2SrEhyc/u789C6Y5OsTnJTkoOG4vslWdXWfTBJWnybJJ9q8UuTLBiqs7Q9xs1Jlg7F92xlb251t56q5y9JkiRJ0mw2lSMETgcOHhN7N3BhVe0NXNjuk2QfYAnwrFbnI0m2anVOBY4C9m63QZtHAvdU1V7AKcDJra1dgOOA5wOLgOOGEg8nA6e0x7+ntSFJkiRJUu9MWUKgqi4G1o4JHwKc0ZbPAA4dip9dVQ9U1S3AamBRkicBO1bVJVVVwJlj6gzaOhc4sI0eOAhYUVVrq+oeYAVwcFv3slZ27ONLkiRJktQr0z2HwBOr6g6A9vcJLT4PuG2o3JoWm9eWx8bXq1NVDwL3ArtO0NauwA9a2bFtPUKSo5KsTLLyrrvuepRPU5IkSZKk2W22TCqYEbGaIL4xdSZq65Erqk6rqoVVtXDu3LnjFZMkSZIkabM03QmB77XLAGh/72zxNcDuQ+XmA7e3+PwR8fXqJJkD7ER3icJ4bd0NPL6VHduWJEmSJEm9Mt0JgeXAYNb/pcB5Q/El7ZcD9qSbPPCydlnBfUn2b3MAvGlMnUFbhwEXtXkGLgAWJ9m5TSa4GLigrftCKzv28SVJkiRJ6pU5Gy6ycZJ8EjgA2C3JGrqZ/98LnJPkSOA7wOEAVXV9knOAG4AHgWOq6qHW1NF0v1iwHXB+uwF8HDgryWq6kQFLWltrk5wIXN7KnVBVg8kN3wWcneQ9wFWtDUmSJEmSemfKEgJVdcQ4qw4cp/xJwEkj4iuBfUfE76clFEasWwYsGxH/Ft1PEUqSJEmS1GuzZVJBSZIkSZI0jUwISJIkSZLUQyYEJEmSJEnqIRMCkiRJkiT1kAkBSZIkSZJ6yISAJEmSJEk9ZEJAkiRJkqQeMiEgSZIkSVIPmRCQJEmSJKmHTAhIkiRJktRDJgQkSZIkSeohEwKSJEmSJPWQCQFJkiRJknrIhIAkSZIkST1kQkCSJEmSpB4yISBJkiRJUg/NmUyhJFsBrwQWDNepqg9MTbckSZIkSdJUmlRCAPg8cD+wCvjvqeuOJEmSJEmaDpNNCMyvql+Y0p5IkiRJkqRpM9k5BM5PsnhKeyJJkiRJkqbNZEcIfB34bJKfAn4CBKiq2nHKeiZJkiRJkqbMZBMC7wdeAKyqqprC/kiSJEmSpGkw2UsGbgauMxkgSZIkSdKWYbIjBO4AvpjkfOCBQdCfHZQkSZIkafM02YTALe22dbtJkiRJkqTN2KQSAlX1p1PdEUmSJEmSNH0mlRBI8gXgEfMHVNXLNnmPJEmSJEnSlJvsJQPvGFreFvgN4MFN3x1JkiRJkjQdJnvJwBVjQl9N8qUp6I8kSZIkSZoGk/rZwSS7DN12S3Iw8LMb+6BJfi/J9UmuS/LJJNu2tlckubn93Xmo/LFJVie5KclBQ/H9kqxq6z6YJC2+TZJPtfilSRYM1VnaHuPmJEs39jlIkiRJkrQ5m1RCALgCWNn+fg34feDIjXnAJPOAtwELq2pfYCtgCfBu4MKq2hu4sN0nyT5t/bOAg4GPJNmqNXcqcBSwd7sd3OJHAvdU1V7AKcDJra1dgOOA5wOLgOOGEw+SJEmSJPXFZBMC7wKeXVV7AmcB/wn812N43DnAdknmANsDtwOHAGe09WcAh7blQ4Czq+qBqroFWA0sSvIkYMequqSqCjh
zTJ1BW+cCB7bRAwcBK6pqbVXdA6xgXRJBkiRJkqTemGxC4P+vqv9I8mLgV4DT6c7OP2pV9V3gfwHfAe4A7q2qfwGeWFV3tDJ3AE9oVeYBtw01sabF5rXlsfH16lTVg8C9wK4TtCVJkiRJUq9MNiHwUPv7SuCjVXUesPXGPGAbon8IsCfwc8BPJ3nDRFVGxGqC+MbWGdvPo5KsTLLyrrvumqB7kiRJkiRtfiabEPhukr8BXgv8U5JtHkXdsV4O3FJVd1XVT4DPAC8EvtcuA6D9vbOVXwPsPlR/Pt0lBmva8tj4enXaZQk7AWsnaOsRquq0qlpYVQvnzp27kU9VkiRJkqTZabJf6l8LXAAcXFU/AHYB3rmRj/kdYP8k27fr+g8EbgSWA4NZ/5cC57Xl5cCS9ssBe9JNHnhZu6zgviT7t3beNKbOoK3DgIvaPAMXAIuT7NxGKixuMUmSJEmSemXOZApV1X/Rnckf3L+D7vr/R62qLk1yLnAl8CBwFXAasANwTpIj6ZIGh7fy1yc5B7ihlT+mqgaXMBxNN5/BdsD57QbwceCsJKvpRgYsaW2tTXIicHkrd0JVrd2Y5yFJkiRJ0uYs3YlzTWThwoW1cuXKDZbb751nTkNv+uWK971pprsgSZIkSZutJFdU1cJR6zZ2HgBJkiRJkrQZMyEgSZIkSVIPmRCQJEmSJKmHTAhIkiRJktRDJgQkSZIkSeohEwKSJEmSJPWQCQFJkiRJknrIhIAkSZIkST1kQkCSJEmSpB4yISBJkiRJUg+ZEJAkSZIkqYdMCEiSJEmS1EMmBCRJkiRJ6iETApIkSZIk9ZAJAUmSJEmSesiEgCRJkiRJPWRCQJIkSZKkHjIhIEmSJElSD5kQkCRJkiSph0wISJIkSZLUQyYEJEmSJEnqIRMCkiRJkiT1kAkBSZIkSZJ6yISAJEmSJEk9ZEJAkiRJkqQeMiEgSZIkSVIPmRCQJEmSJKmHTAhIkiRJktRDJgQkSZIkSeohEwKSJEmSJPXQjCQEkjw+yblJvpHkxiQvSLJLkhVJbm5/dx4qf2yS1UluSnLQUHy/JKvaug8mSYtvk+RTLX5pkgVDdZa2x7g5ydLpfN6SJEmSJM0WMzVC4K+Af66qZwC/CNwIvBu4sKr2Bi5s90myD7AEeBZwMPCRJFu1dk4FjgL2breDW/xI4J6q2gs4BTi5tbULcBzwfGARcNxw4kGSJEmSpL6Y9oRAkh2BlwIfB6iqH1fVD4BDgDNasTOAQ9vyIcDZVfVAVd0CrAYWJXkSsGNVXVJVBZw5ps6grXOBA9vogYOAFVW1tqruAVawLokgSZIkSVJvzMQIgacAdwF/m+SqJB9L8tPAE6vqDoD29wmt/DzgtqH6a1psXlseG1+vTlU9CNwL7DpBW4+Q5KgkK5OsvOuuuzb2uUqSJEmSNCvNREJgDvBc4NSqeg7wn7TLA8aREbGaIL6xddYPVp1WVQurauHcuXMn6J4kSZIkSZufmUgIrAHWVNWl7f65dAmC77XLAGh/7xwqv/tQ/fnA7S0+f0R8vTpJ5gA7AWsnaEuSJEmSpF6Z9oRAVf07cFuSp7fQgcANwHJgMOv/UuC8trwcWNJ+OWBPuskDL2uXFdyXZP82P8CbxtQZtHUYcFGbZ+ACYHGSndtkgotbTJIkSZKkXpkzQ4/7VuATSbYGvgX8Jl1y4pwkRwLfAQ4HqKrrk5xDlzR4EDimqh5q7RwNnA5sB5zfbtBNWHhWktV0IwOWtLbWJjkRuLyVO6Gq1k7lE5UkSZIkaTaakYRAVV0NLByx6sBxyp8EnDQivhLYd0T8flpCYcS6ZcCyR9NfSZIkSZK2NDMxh4AkSZIkSZphJgQkSZIkSeohEwKSJEmSJPWQCQFJkiRJknrIhIAkSZIkST1kQkCSJEmSpB4yISBJkiRJUg+ZEJAkSZIkqYdMCEiSJEmS1EMmBCRJkiRJ6iETApIkSZIk9ZAJAUmSJEmSesiEgCRJkiRJPWR
CQJIkSZKkHpoz0x2QJEmSJG35zvn7RTPdhS3Kaw+/7DG34QgBSZIkSZJ6yISAJEmSJEk9ZEJAkiRJkqQeMiEgSZIkSVIPmRCQJEmSJKmHTAhIkiRJktRDJgQkSZIkSeohEwKSJEmSJPWQCQFJkiRJknrIhIAkSZIkST1kQkCSJEmSpB4yISBJkiRJUg+ZEJAkSZIkqYdMCEiSJEmS1EMzlhBIslWSq5L8Q7u/S5IVSW5uf3ceKntsktVJbkpy0FB8vySr2roPJkmLb5PkUy1+aZIFQ3WWtse4OcnS6XvGkiRJkiTNHjM5QuB3gRuH7r8buLCq9gYubPdJsg+wBHgWcDDwkSRbtTqnAkcBe7fbwS1+JHBPVe0FnAKc3NraBTgOeD6wCDhuOPEgSZIkSVJfzEhCIMl84JXAx4bChwBntOUzgEOH4mdX1QNVdQuwGliU5EnAjlV1SVUVcOaYOoO2zgUObKMHDgJWVNXaqroHWMG6JIIkSZIkSb0xUyME/hL4Q+C/h2JPrKo7ANrfJ7T4POC2oXJrWmxeWx4bX69OVT0I3AvsOkFbkiRJkiT1yrQnBJL8GnBnVV0x2SojYjVBfGPrrP+gyVFJViZZedddd02qo5IkSZIkbS5mYoTAi4BXJ7kVOBt4WZK/A77XLgOg/b2zlV8D7D5Ufz5we4vPHxFfr06SOcBOwNoJ2nqEqjqtqhZW1cK5c+du3DOVJEmSJGmWmvaEQFUdW1Xzq2oB3WSBF1XVG4DlwGDW/6XAeW15ObCk/XLAnnSTB17WLiu4L8n+bX6AN42pM2jrsPYYBVwALE6yc5tMcHGLSZIkSZLUK3NmugND3guck+RI4DvA4QBVdX2Sc4AbgAeBY6rqoVbnaOB0YDvg/HYD+DhwVpLVdCMDlrS21iY5Ebi8lTuhqtZO9ROTJEmSJGm2mdGEQFV9EfhiW/4+cOA45U4CThoRXwnsOyJ+Py2hMGLdMmDZxvZZkiRJkqQtwUz9yoAkSZIkSZpBJgQkSZIkSeohEwKSJEmSJPWQCQFJkiRJknpoNv3KgCRJkiQ9ascff/xMd2GL4uvZH44QkCRJkiSph0wISJIkSZLUQyYEJEmSJEnqIRMCkiRJkiT1kAkBSZIkSZJ6yISAJEmSJEk9ZEJAkiRJkqQeMiEgSZIkSVIPmRCQJEmSJKmHTAhIkiRJktRDJgQkSZIkSeohEwKSJEmSJPWQCQFJkiRJknrIhIAkSZIkST1kQkCSJEmSpB4yISBJkiRJUg+ZEJAkSZIkqYfmzHQHpOn2nRN+fqa7sEXZ409WzXQXJEmSJG0ERwhIkiRJktRDJgQkSZIkSeohEwKSJEmSJPWQCQFJkiRJknrIhIAkSZIkST1kQkCSJEmSpB4yISBJkiRJUg9Ne0Igye5JvpDkxiTXJ/ndFt8lyYokN7e/Ow/VOTbJ6iQ3JTloKL5fklVt3QeTpMW3SfKpFr80yYKhOkvbY9ycZOn0PXNJkiRJkmaPmRgh8CDwB1X1TGB/4Jgk+wDvBi6sqr2BC9t92rolwLOAg4GPJNmqtXUqcBSwd7sd3OJHAvdU1V7AKcDJra1dgOOA5wOLgOOGEw+SJEmSJPXFtCcEquqOqrqyLd8H3AjMAw4BzmjFzgAObcuHAGdX1QNVdQuwGliU5EnAjlV1SVUVcOaYOoO2zgUObKMHDgJWVNXaqroHWMG6JIIkSZIkSb0xZyYfvA3lfw5wKfDEqroDuqRBkie0YvOArw9VW9NiP2nLY+ODOre1th5Mci+w63B8RJ2xfTuKbvQBe+yxx0Y9P0mSZtKH/+DzM92FLcrvvP9VM90FSZI2qRmbVDDJDsCngbdX1X9MVHRErCaIb2yd9YNVp1XVwqpaOHfu3Am6J0mSJEnS5mdGEgJJHkeXDPhEVX2mhb/XLgOg/b2zxdcAuw9Vnw/c3uLzR8TXq5NkDrATsHaCtiRJkiRJ6pWZ+JWBAB8HbqyqDwytWg4MZv1fCpw3FF/Sfjl
gT7rJAy9rlxfcl2T/1uabxtQZtHUYcFGbZ+ACYHGSndtkgotbTJIkSZKkXpmJOQReBLwRWJXk6hb7n8B7gXOSHAl8BzgcoKquT3IOcAPdLxQcU1UPtXpHA6cD2wHntxt0CYezkqymGxmwpLW1NsmJwOWt3AlVtXaqnqgkSZIkSbPVtCcEquorjL6WH+DAceqcBJw0Ir4S2HdE/H5aQmHEumXAssn2V5IkSZKkLdGMTSooSZIkSZJmjgkBSZIkSZJ6yISAJEmSJEk9ZEJAkiRJkqQeMiEgSZIkSVIPmRCQJEmSJKmHpv1nByVJW4YvvfSXZroLW5RfuvhLM90FSZLUM44QkCRJkiSph0wISJIkSZLUQyYEJEmSJEnqIRMCkiRJkiT1kAkBSZIkSZJ6yF8ZkCRJmiEnveGwme7CFueP/u7cme6CJG02HCEgSZIkSVIPmRCQJEmSJKmHvGRAkiRJmsCNJ100013Yojzzj142012Q1DhCQJIkSZKkHjIhIEmSJElSD5kQkCRJkiSph0wISJIkSZLUQyYEJEmSJEnqIRMCkiRJkiT1kAkBSZIkSZJ6yISAJEmSJEk9ZEJAkiRJkqQemjPTHZCksV70oRfNdBe2KF9961dnuguSJEmahRwhIEmSJElSD5kQkCRJkiSph0wISJIkSZLUQ71MCCQ5OMlNSVYnefdM90eSJEmSpOnWu4RAkq2AvwZ+FdgHOCLJPjPbK0mSJEmSplfvEgLAImB1VX2rqn4MnA0cMsN9kiRJkiRpWvUxITAPuG3o/poWkyRJkiSpN1JVM92HaZXkcOCgqnpLu/9GYFFVvXVMuaOAo9rdpwM3TWtHp95uwN0z3QlNyG00+7mNNg9up9nPbbR5cDvNfm6j2c9ttHnY0rbTk6tq7qgVc6a7J7PAGmD3ofvzgdvHFqqq04DTpqtT0y3JyqpaONP90PjcRrOf22jz4Haa/dxGmwe30+znNpr93Eabhz5tpz5eMnA5sHeSPZNsDSwBls9wnyRJkiRJmla9GyFQVQ8m+R3gAmArYFlVXT/D3ZIkSZIkaVr1LiEAUFX/BPzTTPdjhm2xl0NsQdxGs5/baPPgdpr93EabB7fT7Oc2mv3cRmElBH8AABRQSURBVJuH3myn3k0qKEmSJEmS+jmHgCRJkiRJvWdCYJZL8utJKskzNrL+m5N8eET8+CTveOw9hCSnJzlsU7Q1myR5KMnVSa5JcmWSF07x470tyY1JPjEmfkCSe5Nc1dYfNxT/h8fweP/zsfZ5c5LkiUn+T5JvJbkiySVt/1qY5IMT1HtMr7M6SX445v7IY9OjbPNRtdHKHz/JsguS/GjoGPC1JE/f6M5uYu194ayh+3OS3DWb/leTHJpkn0dZ5xeTXD10/4gk/5Xkce3+zye5dlP3dVNK8rUZfOxK8v6h+++Y7P/8UJ0Dht/vxnuPb/vIdY+pw+s/5qz5350OSXZtx5erk/x7ku8O3d96mvvy8iSfGxH/9STvnM6+zEZJTkny9qH7FyT52ND99yf5/Qnqf639fVT/50meneQVG9vvLdEm2BY/bH8fPn5t6HPgJPr05iQ/t7H1ZwMTArPfEcBX6H4NQdPrR1X17Kr6ReBY4M+n+PH+B/CKqnr9iHVfrqrnAAuBNyTZbxM8Xm8SAkkCfA64uKqeUlX70e1T86tqZVW9bQofe6upaltT7t+GjgFnMLv2mf8E9k2yXbv/K8B3Z7A/oxwKPKqEALAKeHKSn2n3Xwh8A3jO0P2vbpruTY2qmtLk8QY8ALwmyW4bUznJHOAAutdZU6iqvt+OL88GPgqcMrhfVT/eUP3peG+pqs9W1fum+nE2A1+j7RNJforu9+mfNbR+wuPSYzgmPBswIbC+x7QtRtkEnwPfDJgQ0NRIsgPwIuBIWkJgbHYxyYeTvLktP6+dxbomyWVDH6gGZV/ZzoruNib+/ya5vNX7dJLtW/z0JB9sbX5rcIYgnQ8nuSHJPwJPmMKXYbbYEbg
Huu2S5MJ0owZWJTmkxU9M8ruDCklOSvKIA0yS309yXbu9vcU+CjwFWJ7k98brRFX9J3AF8NQxbf50kmVtO1411Kc3J/lMkn9OcnOSv2jx9wLbtTMRn5hs3zdjLwN+XFUfHQSq6ttV9aHhfSrJLw2doblqaB/aMcln2//8R9ub0ODs5aq2LU8etJ3kh0lOSHIp8IIktyb506H/mWe0ciO3W98keVWSS9tr8K9Jntjiq5I8vh1zvp/kTS1+VpKXj2nj4eNbkrntWHZ5u72oFfsRMDg7cHjbbtckuXgS3Rw+BixI8uW2PR8ePdT69fA2bPvWqx/r6zOB84FXtuUjgE8OPfaiduy+KkOjG8Y7JrR1pyZZmeT6JH861M5n2vIh6UZNbJ1k2yTfavFHvIe01+TVwPva/vTUJF9MsrDV2S3JrWOfUFX9N93PAz+/hfYD/pp1X1BfSPeBkCR/0h73uiSnJUmLfzHJyeneB7+Z5CUtvn2Sc5Jcm+RT7X9u0J9HPPcWH2/fXW+UXevDgra83miYafYg3URYj3gfSfLkdO9d17a/e7T46Uk+kOQLwKeA3wZ+r223l7TqL82YzwJj2h5vnzigbY9zk3yj7ROD7XRwi30FeM2UvBqbqSSfTzeS7fokb2mxOUl+kOQ9SS4DFrX/zcE+8NGh1/YrSd7b9oGbhrbHU9t2uqq1//wRj/38tg0XJHlLkr+c1ic/O32VdcegZwHXAfcl2TnJNsAzgRsz4rMhjD4mpPvMflWSp2TE8TrdKJETgNe1ffF1SXZJ8rm2D389yS9M/VOfdSazLa5K8s62b1w7fEwfJet/Djw+3eeyL7bj3duGyv1xO2atSPLJdCOwDqM7WfeJtp22S3Jg25arWlvbtPoj309mharyNktvwBuAj7flrwHPpcvc/8NQmQ/TZaa2Br4FPK/Fd6T7FYk3tzK/DnwZ2LmtPx54R1vedai99wBvbcunA39PlzjaB1jd4q8BVtD9bOPPAT8ADpvp12sKXv+HgKvpzk7dC+zX4nOAHdvybsBqIMAC4MoW/yng34Zf2xbfj+4M2E8DOwDXA89p624FdhvRj4e3ObBrK/esMfE/A97Qlh8PfLM9xpvb/8VOwLbAt4HdW7kfDj3GBvu+Od+At9GdfRm1bvh1/Dzwora8Q9vWBwD30yVstmr/+4e1//3vAHNbuYuAQ1vdAl479Bi3Du1X/wP42ETbbaZfrynaBoP9aXD7DvDhtm5n1k1y+xbg/W35o3RfePel+5L4v1v85rZ93szo49v/AV7clvcAbhzRn1XAvMFrP2L9AroEwtVtf7gD2KOt2x7Yti3vDaxsy78EfK4t7wTcAsyZotfzh8AvAOe2ffvqMf/LOw4eG3g58Om2/GbGPybs0v5uBXyxtT8HuKXF/1fbDi9qz/WTLT7Re8hhQ+u+CCxsy7sBt47z3I4H/oTuGPYVugToOUPb/inD/W3LZwGvGnqcwf/QK4B/bcvvAP6mLe9L9+V54XjPfQP77vG099B2/zpgwWDbzOB+9sO27W9t2/gdwPFt3eeBpW35t4b+V08H/gHYapzndjqjPwssAK7bwD5xAN375/xW/xLgxXT/e7e1sgHOYeizTd9uI17zwf/j9sANdMfIOXTvLa8ZUS50CcFfbfe/Apzcll8N/POI7fQM4NK2/HK6UXQvAVbSjZ6D7nj8lzP9+syGW9un9gD+P7qk2Yl0x5cXARczzmfDdv+H7e8BbV97Id3JncF7ykTH6w8P9eFDwHFt+WXA1TP9uszSbbGYLjGadtz5B+ClY7bF8PHrANa9dx5P951rm7Ydvw88ju5L/9XAdsDP0L0XDb5HfZF17yWDY9vT2v0zgbcP9fsR7yez4dbLnx3cjBwBDDKzZ7f7/zhO2acDd1TV5QBV9R8ALVn8y3T/yIsH8TH2TfIeui8kOwAXDK37XHVnbG5IO2sHvJTug+BDwO1JLtrI5zfb/ai6oXwkeQFwZpJ96Q4wf5bkpcB
/A/OAJ1bVrenOYj4HeCJwVVV9f0ybLwY+W92ZftKdeXsJcNUG+vKSJFe1x3tvVV2f5ICh9YuBV2fdGatt6Q6WABdW1b3t8W4Ankx3sHrYJPu+xUjy13Tb4sfA8PWRXwU+kG4eh89U1Zq2D11WVYOzoZ9sdX8CfLGq7mrxT9DtG5+j+/L76TEP+5n29wrWnQ0bb7vduIme6mzy8P4E3ZlquuMSdF8WPpXkSXTJzVta/Mt0r+m3gVOBo5LMA9ZW1Q8nOL69HNinrYduhMfPVNV9Q/35KnB6knNYt23G+rehY8Dr6D5gHEz34eDDSZ5Nt62fBlBVX0ry10meQLeNP11VDz6aF+nRqKpr052VPoJH/pTuTsAZSfam+xLxuKF14x0TXpvkKLoPtk8C9mmPsTrJM4FFwAfotslWdNsHJn4P2RhfBf6gtX95Vf1bkr2SzAV2GOyLwC8n+UO6Lzm70CVYP9/WDe9vC9ryi4G/Aqiq67L+XASPeO7AYP2ofXfWqqr/SHImXSL0R0OrXsC6/p8F/MXQur9v7+njGfVZYNjIfaK5rKrWAKSbH2IBXeLilqq6ucX/Djhqkk+xD34v60YXzadLil1N95712aFyB6a7xn9bui8vV9CNHILR+8A2dNvpF+kSYsOjDfcFPgL8SlX9+yZ9NluGwZnpF9IdB+e15XvpvkCO/GwIjH0tn0n3XrK4qm5vsYmO18NeDPwGQFVdlG4eip0Gx/Me2dC2WNxug8/WO9AlHyczGhDgH6vqAeCBJHfSbccXA+dV1Y+gG8UzTt2n0x3bvtnunwEcw7rvc7Py/cSEwCyVZFe67N++SYruw1cBy1n/Uo9tB1Xa+lG+RXd282l0md+xTqc7s3lN+5B+wNC6B4a7NbTcq9+rrKrBpRZz6bKQc+lGDPwk3bDXwXb4GF1G92eBZSOayojYZHy5qn5tgvUBfqOqblov2A0HHN6GDzH+fr+hvm/Orqe9iQJU1TFte663P1TVe9NdBvMK4OtZNyx97P97MfG2vH/Eh+vBdhjeBiO3Ww99CPhAVS1via7jW/xiujfSPYA/ohsJcBjrvojC6OPbTwEvGLxxj1JVv932j1cCVyd59gaSYMuBv23Lvwd8D/jF9lj3D5U7C3g93WVevzVBe5vKcroz9wfQjSAaOBH4QlX9eksafHFo3SOOCUn2pDub/LyquifJ6aw7rn0Z+FW6JNi/0r1nbNXKw8TvIcMeZN3716Btkvwt3RwBt1fVK4CvA8+j+wB2SSu2hu41HVwusC3dl5eFVXVbuonzHm6T8fe3R9jAcx+vreHnwpjys8FfAley7n92lOHj2n9uoL3xPgsMTLRPjPce1KvPEZPV3ndeCuxfVT9Kd0nF4P/rR9VOL6a7vPPDwHOr6rstKbeh/9s/oEv+vYHuS+fwUPbb6UblPBv4503+xDZ/g2vXf55uRNBtdK/nf9B9Zno94382HHZHiz+H7jWHiY/Xw0bte33cjza0LQ4A/ryq/mYj2x91zJrs5/cNlRu1X8445xCYvQ4DzqyqJ1fVgqranXVnzfZJsk2SnYADW+wbwM8leR5Akp9JNzkQdGfXXkN3hnt44o2BnwHuSDeL86gJ7ca6GFiSZKt2Ru+XN+oZbkbadT5b0Q0d2gm4sx3wf5nu7NrAZ+nOID6P0WfJLgYOTXct60+zbqjzY3UB8Nbk4esHn7OB8gA/adt8YEN935xdBGyb5Oih2PZjCyV5alWtqqqT6b5cDq7vWpRkz3RzB7yObjjmpcAvpbsWeiu6s7RfepT92pjttiXaiXUT4i0dBKvqNrqzXnu3s8JfofviNrzPjDq+/QvwO4MC7azletq2vrSq/gS4G9h9A318Md2lA4P+3tHOmL6R7tgwcDrw9tb/6zfQ5qawDDihqlaNiQ+/pm+eRDs70n0pvLedAf7VoXUX0z2nS9qImF3p9o3B8xvvPeS+tm7gVrrLpqB7jwOgqn6zuonUXtHu30f3Ae/NrEsIXNL
6MJjBf/BB++508+1M5pduvgK8FiDdrx/8/CSe+3hupbuMjyTPBfacRJ1pU1Vr6YbhHzkU/hrrJih+Pd3rMcrY7TYZE+0To3wD2DPJ4Az1EY/y8bZkO9GNgvpRO6Y9b5xy29Gdib473Xw3vzFOubFt39GSCktZ/8vLWuDXgL/IurkjtM5X6V6ftVX1UNvHHk838uYSJv5sOOwHdInoPxsa6Tne8Xrsvngx7Rjb6t49zsjfLd2GtsUFwG+19waSzGsj9x6LrwCvSjd/zg6sm78H1t9O3wAWJNmr3X8jj/6z4bQzITB7HcH6w8KgG4L8/9C9yV8LfII2HKa6GWlfB3woyTV01zk/nJlsZyBfD/z90BvwwB/TfblZQfePvCGfpbt2ZhXdMN5Z/4++kQaT7l1NN9HS0nbW9xPAwiQr6V7Th1+zth2+QHe96yOGX1bVlXRfGC6je80/VlUbulxgMk6ky/Zfm+5nVE6cRJ3TWvlPTKbvm7P24edQui/wt6SbkOkM4F1jir49baI5uqG2g6GXlwDvpctE30J32ccddL8+8QXgGro5GM57lF3bmO22JTqe7tj0Zbov58MupZtbAbpEwDzGfJEZcXx7G90+em0bEv/bIx7zfWkTQtJ9yLpmRJmntmPANXTzPbylxT8CLE3ydbqRCQ+fXa2q79Fd8jHRmdlNpqrWVNVfjVj1F8CfJ/kqG/5yRlVdQ/d+cj1dkmF4luZL6YZMDoZbXgtcOzhTyfjvIWcD70w3udJT6UYyHJ3uJ7g2NAv+V4FtWlIIun3wKbSEQFX9APjfdO9Dn6Ob22BDPgLMbZcKvKs9j3s38NzH82lgl/b+cDTr/kdnk/ez/uv8NuA32/N/I/C7I2t1l138etafVHBDxt0nRqmq++kuEfjHdgb825N8nD74R2D7dtz5E7p96xHaiKYz6N6XPjteuTE+DLylbacns/6ZUNr72quBv0mbcFMPW0W3P319TOzeqrqbCT4bjtXeJ14F/HUbqTbe8foLdCcBr0532drx7TGupftMspR+mnBbVNW/0M0ldEmSVXRz7TzaJOd6qrskezndZ4XP0J00GlyqcTrw0fZ+EOA36T6PrKJL2n30EQ3OMln3fi7psWpnkK8EDh9cG7m52Jz7Ls0WbRjvKrphvH27rnNWayN5HldV97cExYV0Ez9t8CfeJEn9lmSHNn/R9nQJ8qPaib7NniMEpE2kDUFdTTdh12b1hXpz7rs0W7Rrf78BfMhkwKy0PfCVdub1s8DRJgMkSZN0WhsFcCXdpMFbRDIAHCEgSZIkSVIvOUJAkiRJkqQeMiEgSZIkSVIPmRCQJEmSJKmHTAhIkqTHJMlD7aexrkvy+SSPfwxtndAmaJQkSVPMSQUlSdJjkuSHVbVDWz4D+GZVnTTD3ZIkSRvgCAFJkrQpXQLMG9xJ8s4klye5NsmfDsX/OMk3kqxI8skk72jx05Mc1pYPTHJVklVJliXZpsVvTfKnSa5s654xzc9RkqQtggkBSZK0SSTZCjgQWN7uLwb2BhYBzwb2S/LSJAuB3wCeA7wGWDiirW2B04HXVdXPA3OAo4eK3F1VzwVOBd4xVc9JkqQtmQkBSZL0WG2X5Grg+8AuwIoWX9xuVwFXAs+gSxC8GDivqn5UVfcBnx/R5tOBW6rqm+3+GcBLh9Z/pv29Aliw6Z6KJEn9YUJAkiQ9Vj+qqmcDTwa2Bo5p8QB/XlXPbre9qurjLb4hGyrzQPv7EN3oAUmS9CiZEJAkSZtEVd0LvA14R5LHARcAv5VkMOHgvCRPAL4CvCrJtm3dK0c09w1gQZK92v03Al+a8ichSVKPmFGXJEmbTFVdleQaYElVnZXkmcAlSQB+CLyhqi5Pshy4Bvg2sBK4d0w79yf5TeDvk8wBLgc+Op3PRZKkLZ0/OyhJkqZdkh2q6odJtgcuBo6qqitnul+SJPWJIwQkSdJMOC3JPsC2wBkmAyRJmn6OEJAkSZIkqYecVFCSJEmSpB4
yISBJkiRJUg+ZEJAkSZIkqYdMCEiSJEmS1EMmBCRJkiRJ6iETApIkSZIk9dD/BaYGRX4+no7LAAAAAElFTkSuQmCC\n", "text/plain": [ "<Figure size 1224x432 with 1 Axes>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.figure(figsize=(17, 6))\n", "sns.barplot(x=region_totals.index, y='sum', data=region_totals)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3.2 South island" ] }, { "cell_type": "code", "execution_count": 272, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(227, 4)" ] }, "execution_count": 272, "metadata": {}, "output_type": "execute_result" } ], "source": [ "south_island_data.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Some data cleaning\n", "\n", "Population column values have \",\" comma in the values. Let's replace , and convert the column from string to integer type." ] }, { "cell_type": "code", "execution_count": 273, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Name object\n", "Status object\n", "Region object\n", "PopulationEstimate2018-06-30 object\n", "dtype: object" ] }, "execution_count": 273, "metadata": {}, "output_type": "execute_result" } ], "source": [ "south_island_data.dtypes" ] }, { "cell_type": "code", "execution_count": 274, "metadata": {}, "outputs": [], "source": [ "south_island_data['PopulationEstimate2018-06-30'] = south_island_data['PopulationEstimate2018-06-30'].str.replace(',', '')" ] }, { "cell_type": "code", "execution_count": 275, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>Name</th>\n", " <th>Status</th>\n", " <th>Region</th>\n", " 
<th>PopulationEstimate2018-06-30</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>Ahaura</td>\n", " <td>Rural Settlement</td>\n", " <td>West Coast</td>\n", " <td>80</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>Akaroa</td>\n", " <td>Rural Settlement</td>\n", " <td>Canterbury</td>\n", " <td>630</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>Alexandra</td>\n", " <td>Small Urban Area</td>\n", " <td>Otago</td>\n", " <td>5510</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>Allanton</td>\n", " <td>Rural Settlement</td>\n", " <td>Otago</td>\n", " <td>290</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>Amberley</td>\n", " <td>Small Urban Area</td>\n", " <td>Canterbury</td>\n", " <td>1800</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " Name Status Region PopulationEstimate2018-06-30\n", "0 Ahaura Rural Settlement West Coast 80\n", "1 Akaroa Rural Settlement Canterbury 630\n", "2 Alexandra Small Urban Area Otago 5510\n", "3 Allanton Rural Settlement Otago 290\n", "4 Amberley Small Urban Area Canterbury 1800" ] }, "execution_count": 275, "metadata": {}, "output_type": "execute_result" } ], "source": [ "south_island_data.head()" ] }, { "cell_type": "code", "execution_count": 276, "metadata": {}, "outputs": [], "source": [ "south_island_data['PopulationEstimate2018-06-30'] = pd.to_numeric(south_island_data['PopulationEstimate2018-06-30'])" ] }, { "cell_type": "code", "execution_count": 277, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Name object\n", "Status object\n", "Region object\n", "PopulationEstimate2018-06-30 int64\n", "dtype: object" ] }, "execution_count": 277, "metadata": {}, "output_type": "execute_result" } ], "source": [ "south_island_data.dtypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Renaming population column" ] }, { "cell_type": "code", "execution_count": 278, "metadata": {}, "outputs": [ { "data": { "text/html": [ 
"<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>Name</th>\n", " <th>Status</th>\n", " <th>Region</th>\n", " <th>PopulationEstimate2018</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>Ahaura</td>\n", " <td>Rural Settlement</td>\n", " <td>West Coast</td>\n", " <td>80</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>Akaroa</td>\n", " <td>Rural Settlement</td>\n", " <td>Canterbury</td>\n", " <td>630</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " Name Status Region PopulationEstimate2018\n", "0 Ahaura Rural Settlement West Coast 80\n", "1 Akaroa Rural Settlement Canterbury 630" ] }, "execution_count": 278, "metadata": {}, "output_type": "execute_result" } ], "source": [ "south_island_data = south_island_data.rename(columns={'PopulationEstimate2018-06-30': 'PopulationEstimate2018'})\n", "south_island_data.head(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Top 10 most populated places in south island" ] }, { "cell_type": "code", "execution_count": 279, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>Name</th>\n", " <th>Status</th>\n", " <th>Region</th>\n", " <th>PopulationEstimate2018</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " 
<th>28</th>\n", " <td>Christchurch</td>\n", " <td>Main Urban Area</td>\n", " <td>Canterbury</td>\n", " <td>377200</td>\n", " </tr>\n", " <tr>\n", " <th>40</th>\n", " <td>Dunedin</td>\n", " <td>Main Urban Area</td>\n", " <td>Otago</td>\n", " <td>104500</td>\n", " </tr>\n", " <tr>\n", " <th>125</th>\n", " <td>Nelson</td>\n", " <td>Large Urban Area</td>\n", " <td>Nelson</td>\n", " <td>49300</td>\n", " </tr>\n", " <tr>\n", " <th>73</th>\n", " <td>Invercargill</td>\n", " <td>Large Urban Area</td>\n", " <td>Southland</td>\n", " <td>48700</td>\n", " </tr>\n", " <tr>\n", " <th>195</th>\n", " <td>Timaru</td>\n", " <td>Medium Urban Area</td>\n", " <td>Canterbury</td>\n", " <td>28300</td>\n", " </tr>\n", " <tr>\n", " <th>19</th>\n", " <td>Blenheim</td>\n", " <td>Medium Urban Area</td>\n", " <td>Marlborough</td>\n", " <td>26400</td>\n", " </tr>\n", " <tr>\n", " <th>11</th>\n", " <td>Ashburton</td>\n", " <td>Medium Urban Area</td>\n", " <td>Canterbury</td>\n", " <td>19600</td>\n", " </tr>\n", " <tr>\n", " <th>158</th>\n", " <td>Rangiora</td>\n", " <td>Medium Urban Area</td>\n", " <td>Canterbury</td>\n", " <td>18400</td>\n", " </tr>\n", " <tr>\n", " <th>166</th>\n", " <td>Rolleston</td>\n", " <td>Medium Urban Area</td>\n", " <td>Canterbury</td>\n", " <td>16350</td>\n", " </tr>\n", " <tr>\n", " <th>154</th>\n", " <td>Queenstown</td>\n", " <td>Medium Urban Area</td>\n", " <td>Otago</td>\n", " <td>15650</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " Name Status Region PopulationEstimate2018\n", "28 Christchurch Main Urban Area Canterbury 377200\n", "40 Dunedin Main Urban Area Otago 104500\n", "125 Nelson Large Urban Area Nelson 49300\n", "73 Invercargill Large Urban Area Southland 48700\n", "195 Timaru Medium Urban Area Canterbury 28300\n", "19 Blenheim Medium Urban Area Marlborough 26400\n", "11 Ashburton Medium Urban Area Canterbury 19600\n", "158 Rangiora Medium Urban Area Canterbury 18400\n", "166 Rolleston Medium Urban Area Canterbury 16350\n", 
"154 Queenstown Medium Urban Area Otago 15650" ] }, "execution_count": 279, "metadata": {}, "output_type": "execute_result" } ], "source": [ "south_island_data.sort_values(by='PopulationEstimate2018', ascending=False).head(10)" ] }, { "cell_type": "code", "execution_count": 283, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "<Figure size 1224x432 with 1 Axes>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "top_10 = south_island_data.sort_values(by='PopulationEstimate2018', ascending=False).head(10)\n", "\n", "plt.figure(figsize=(17, 6))\n", "sns.barplot(x='Name', y='PopulationEstimate2018', data=top_10)\n", "plt.title(\"Top 10 most populated places in South Island NZ\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Top 10 least populated places in south island" ] }, { "cell_type": "code", "execution_count": 284, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>Name</th>\n", " <th>Status</th>\n", " <th>Region</th>\n", " <th>PopulationEstimate2018</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>59</th>\n", " <td>Haast</td>\n", " <td>Rural Settlement</td>\n", " <td>West Coast</td>\n", " <td>50</td>\n", " </tr>\n", " <tr>\n", " <th>9</th>\n", " <td>Arthur's Pass</td>\n", " <td>Rural Settlement</td>\n", " <td>Canterbury</td>\n", " <td>60</td>\n", " </tr>\n", " <tr>\n", " <th>127</th>\n", " <td>Ngakuta Bay</td>\n", " <td>Rural Settlement</td>\n", " <td>Marlborough</td>\n", " <td>60</td>\n", " </tr>\n", " <tr>\n", " <th>150</th>\n", " <td>Pounawea</td>\n", " <td>Rural 
Settlement</td>\n", " <td>Otago</td>\n", " <td>60</td>\n", " </tr>\n", " <tr>\n", " <th>112</th>\n", " <td>Milford Huts</td>\n", " <td>Rural Settlement</td>\n", " <td>Canterbury</td>\n", " <td>60</td>\n", " </tr>\n", " <tr>\n", " <th>153</th>\n", " <td>Purau</td>\n", " <td>Rural Settlement</td>\n", " <td>Canterbury</td>\n", " <td>60</td>\n", " </tr>\n", " <tr>\n", " <th>116</th>\n", " <td>Moana</td>\n", " <td>Rural Settlement</td>\n", " <td>West Coast</td>\n", " <td>70</td>\n", " </tr>\n", " <tr>\n", " <th>174</th>\n", " <td>Selwyn Huts</td>\n", " <td>Rural Settlement</td>\n", " <td>Canterbury</td>\n", " <td>70</td>\n", " </tr>\n", " <tr>\n", " <th>103</th>\n", " <td>Makikihi</td>\n", " <td>Rural Settlement</td>\n", " <td>Canterbury</td>\n", " <td>80</td>\n", " </tr>\n", " <tr>\n", " <th>209</th>\n", " <td>Waipopo</td>\n", " <td>Rural Settlement</td>\n", " <td>Canterbury</td>\n", " <td>80</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " Name Status Region PopulationEstimate2018\n", "59 Haast Rural Settlement West Coast 50\n", "9 Arthur's Pass Rural Settlement Canterbury 60\n", "127 Ngakuta Bay Rural Settlement Marlborough 60\n", "150 Pounawea Rural Settlement Otago 60\n", "112 Milford Huts Rural Settlement Canterbury 60\n", "153 Purau Rural Settlement Canterbury 60\n", "116 Moana Rural Settlement West Coast 70\n", "174 Selwyn Huts Rural Settlement Canterbury 70\n", "103 Makikihi Rural Settlement Canterbury 80\n", "209 Waipopo Rural Settlement Canterbury 80" ] }, "execution_count": 284, "metadata": {}, "output_type": "execute_result" } ], "source": [ "south_island_data.sort_values(by='PopulationEstimate2018', ascending=True).head(10)" ] }, { "cell_type": "code", "execution_count": 285, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "<Figure size 1944x432 with 1 Axes>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "bottom_10 = south_island_data.sort_values(by='PopulationEstimate2018', 
ascending=True).head(10)\n", "\n", "plt.figure(figsize=(27, 6))\n", "sns.barplot(x='Name', y='PopulationEstimate2018', data=bottom_10)\n", "plt.title(\"Top 10 least populated places in South Island NZ\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Total population and number of places by Region" ] }, { "cell_type": "code", "execution_count": 286, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>sum</th>\n", " <th>count</th>\n", " </tr>\n", " <tr>\n", " <th>Region</th>\n", " <th></th>\n", " <th></th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>Canterbury</th>\n", " <td>544370</td>\n", " <td>87</td>\n", " </tr>\n", " <tr>\n", " <th>Marlborough</th>\n", " <td>37490</td>\n", " <td>17</td>\n", " </tr>\n", " <tr>\n", " <th>Nelson</th>\n", " <td>49300</td>\n", " <td>1</td>\n", " </tr>\n", " <tr>\n", " <th>Otago</th>\n", " <td>200050</td>\n", " <td>58</td>\n", " </tr>\n", " <tr>\n", " <th>Southland</th>\n", " <td>72450</td>\n", " <td>23</td>\n", " </tr>\n", " <tr>\n", " <th>Tasman</th>\n", " <td>35640</td>\n", " <td>19</td>\n", " </tr>\n", " <tr>\n", " <th>West Coast</th>\n", " <td>23000</td>\n", " <td>22</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " sum count\n", "Region \n", "Canterbury 544370 87\n", "Marlborough 37490 17\n", "Nelson 49300 1\n", "Otago 200050 58\n", "Southland 72450 23\n", "Tasman 35640 19\n", "West Coast 23000 22" ] }, "execution_count": 286, "metadata": {}, "output_type": "execute_result" } ], "source": [ "region_totals = 
south_island_data.groupby('Region')['PopulationEstimate2018'].agg(['sum', 'count'])\n", "region_totals" ] }, { "cell_type": "code", "execution_count": 288, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "<Figure size 1224x432 with 1 Axes>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Bar chart of the total estimated population per region\n", "plt.figure(figsize=(17, 6))\n", "sns.barplot(x=region_totals.index, y='sum', data=region_totals)\n", "plt.ylabel('Total population (2018 estimate)')\n", "plt.title(\"Total population by region in South Island NZ\")\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.0" } }, "nbformat": 4, "nbformat_minor": 4 }