{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Food Standards Agency\n",
"\n",
"Simple demo of how to get data out out of Food Standards Agency. This uses a scraper I've built previously for grabbing data from FSA into a SQLite database.\n",
"\n",
"To install the scraper, uncomment and run the following:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"#%pip install git+https://github.com/ouseful-datasupply/food_gov_uk.git"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With the scraper installed, grab the data from the FSA website into a SQLite db (`food.sqlite`):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#!rm food.sqlite\n",
"\n",
"# The full download takes quite a lot of time (20 mins+); I've raised an issue:\n",
"# https://github.com/ouseful-datasupply/food_gov_uk/issues/1\n",
"#Please feel free to submit a PR\n",
"# If you fany making PRs, there's also a lot of try:except: blocks that should be properly trapped.\n",
"# It would perhaps also make sense to include a \"demo\" that just grabs one data file.\n",
"# It would be useful to also automagically use a tqdm notebook widget if we are in a notebook\n",
"# I guess we are calling from commandline here though...\n",
"!oi_fsa collect --dbname food.sqlite"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Check what tables are available:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" name | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" fsa_ratings_metadata | \n",
"
\n",
" \n",
" 1 | \n",
" ratingstable | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" name\n",
"0 fsa_ratings_metadata\n",
"1 ratingstable"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"import sqlite3\n",
"\n",
"# Read sqlite query results into a pandas DataFrame\n",
"con = sqlite3.connect(\"food.sqlite\")\n",
"df = pd.read_sql_query(\"SELECT name FROM sqlite_master WHERE type='table';\", con)\n",
"\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Preview each table, strating with the `fsa_ratings_metadata` table:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Local authority | \n",
" Last update | \n",
" Number of businesses | \n",
" Link | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Babergh | \n",
" 20/07/2020 | \n",
" 864 | \n",
" http://ratings.food.gov.uk/OpenDataFiles/FHRS2... | \n",
"
\n",
" \n",
" 1 | \n",
" Basildon | \n",
" 31/07/2020 | \n",
" 1,213 | \n",
" http://ratings.food.gov.uk/OpenDataFiles/FHRS1... | \n",
"
\n",
" \n",
" 2 | \n",
" Bedford | \n",
" 10/07/2020 | \n",
" 1,419 | \n",
" http://ratings.food.gov.uk/OpenDataFiles/FHRS7... | \n",
"
\n",
" \n",
" 3 | \n",
" Braintree | \n",
" 30/07/2020 | \n",
" 1,145 | \n",
" http://ratings.food.gov.uk/OpenDataFiles/FHRS1... | \n",
"
\n",
" \n",
" 4 | \n",
" Breckland | \n",
" 28/07/2020 | \n",
" 1,276 | \n",
" http://ratings.food.gov.uk/OpenDataFiles/FHRS2... | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Local authority Last update Number of businesses \\\n",
"0 Babergh 20/07/2020 864 \n",
"1 Basildon 31/07/2020 1,213 \n",
"2 Bedford 10/07/2020 1,419 \n",
"3 Braintree 30/07/2020 1,145 \n",
"4 Breckland 28/07/2020 1,276 \n",
"\n",
" Link \n",
"0 http://ratings.food.gov.uk/OpenDataFiles/FHRS2... \n",
"1 http://ratings.food.gov.uk/OpenDataFiles/FHRS1... \n",
"2 http://ratings.food.gov.uk/OpenDataFiles/FHRS7... \n",
"3 http://ratings.food.gov.uk/OpenDataFiles/FHRS1... \n",
"4 http://ratings.food.gov.uk/OpenDataFiles/FHRS2... "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.read_sql_query(\"SELECT * FROM fsa_ratings_metadata LIMIT 5;\", con)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And the `ratingstable`:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" FHRSID | \n",
" LocalAuthorityBusinessID | \n",
" BusinessName | \n",
" BusinessType | \n",
" BusinessTypeID | \n",
" AddressLine1 | \n",
" AddressLine2 | \n",
" AddressLine3 | \n",
" PostCode | \n",
" RatingValue | \n",
" ... | \n",
" LocalAuthorityEmailAddress | \n",
" SchemeType | \n",
" NewRatingPending | \n",
" AddressLine4 | \n",
" Hygiene | \n",
" Structural | \n",
" ConfidenceInManagement | \n",
" Longitude | \n",
" Latitude | \n",
" RightToReply | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 861699 | \n",
" PI/000109856 | \n",
" A G Lifestyle | \n",
" Restaurant/Cafe/Canteen | \n",
" 1 | \n",
" 38 Station Road | \n",
" SUDBURY | \n",
" Suffolk | \n",
" CO10 2SS | \n",
" 5 | \n",
" ... | \n",
" foodsafety@baberghmidsuffolk.gov.uk | \n",
" FHRS | \n",
" False | \n",
" None | \n",
" 5 | \n",
" 0 | \n",
" 0 | \n",
" 0.73160300000000 | \n",
" 52.03688800000000 | \n",
" None | \n",
"
\n",
" \n",
" 1 | \n",
" 526543 | \n",
" PI/000036352 | \n",
" A Leeder Butchers | \n",
" Retailers - other | \n",
" 4613 | \n",
" 33 Swan Street | \n",
" Boxford | \n",
" SUDBURY | \n",
" CO10 5NZ | \n",
" 5 | \n",
" ... | \n",
" foodsafety@baberghmidsuffolk.gov.uk | \n",
" FHRS | \n",
" False | \n",
" Suffolk | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0.85756600000000 | \n",
" 52.02934900000000 | \n",
" None | \n",
"
\n",
" \n",
" 2 | \n",
" 1067344 | \n",
" PI/000179206 | \n",
" A Slice of NY | \n",
" Restaurant/Cafe/Canteen | \n",
" 1 | \n",
" 11a Friars Street | \n",
" SUDBURY | \n",
" Suffolk | \n",
" CO10 2AA | \n",
" 5 | \n",
" ... | \n",
" foodsafety@baberghmidsuffolk.gov.uk | \n",
" FHRS | \n",
" False | \n",
" None | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0.72908000000000 | \n",
" 52.03735200000000 | \n",
" None | \n",
"
\n",
" \n",
" 3 | \n",
" 526626 | \n",
" PI/000046693 | \n",
" A Taste Of Home | \n",
" Other catering premises | \n",
" 7841 | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" AwaitingInspection | \n",
" ... | \n",
" foodsafety@baberghmidsuffolk.gov.uk | \n",
" FHRS | \n",
" False | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
" None | \n",
"
\n",
" \n",
" 4 | \n",
" 526987 | \n",
" PI/000111131 | \n",
" Abc Preschool | \n",
" Hospitals/Childcare/Caring Premises | \n",
" 5 | \n",
" ABC Pre School | \n",
" Country Primary School | \n",
" Strickmere | \n",
" CO7 6YG | \n",
" Exempt | \n",
" ... | \n",
" foodsafety@baberghmidsuffolk.gov.uk | \n",
" FHRS | \n",
" False | \n",
" Stratford St Mary | \n",
" None | \n",
" None | \n",
" None | \n",
" 0.97423600000000 | \n",
" 51.97134600000000 | \n",
" None | \n",
"
\n",
" \n",
"
\n",
"
5 rows × 25 columns
\n",
"
"
],
"text/plain": [
" FHRSID LocalAuthorityBusinessID BusinessName \\\n",
"0 861699 PI/000109856 A G Lifestyle \n",
"1 526543 PI/000036352 A Leeder Butchers \n",
"2 1067344 PI/000179206 A Slice of NY \n",
"3 526626 PI/000046693 A Taste Of Home \n",
"4 526987 PI/000111131 Abc Preschool \n",
"\n",
" BusinessType BusinessTypeID AddressLine1 \\\n",
"0 Restaurant/Cafe/Canteen 1 38 Station Road \n",
"1 Retailers - other 4613 33 Swan Street \n",
"2 Restaurant/Cafe/Canteen 1 11a Friars Street \n",
"3 Other catering premises 7841 None \n",
"4 Hospitals/Childcare/Caring Premises 5 ABC Pre School \n",
"\n",
" AddressLine2 AddressLine3 PostCode RatingValue ... \\\n",
"0 SUDBURY Suffolk CO10 2SS 5 ... \n",
"1 Boxford SUDBURY CO10 5NZ 5 ... \n",
"2 SUDBURY Suffolk CO10 2AA 5 ... \n",
"3 None None None AwaitingInspection ... \n",
"4 Country Primary School Strickmere CO7 6YG Exempt ... \n",
"\n",
" LocalAuthorityEmailAddress SchemeType NewRatingPending \\\n",
"0 foodsafety@baberghmidsuffolk.gov.uk FHRS False \n",
"1 foodsafety@baberghmidsuffolk.gov.uk FHRS False \n",
"2 foodsafety@baberghmidsuffolk.gov.uk FHRS False \n",
"3 foodsafety@baberghmidsuffolk.gov.uk FHRS False \n",
"4 foodsafety@baberghmidsuffolk.gov.uk FHRS False \n",
"\n",
" AddressLine4 Hygiene Structural ConfidenceInManagement \\\n",
"0 None 5 0 0 \n",
"1 Suffolk 0 0 0 \n",
"2 None 0 0 0 \n",
"3 None None None None \n",
"4 Stratford St Mary None None None \n",
"\n",
" Longitude Latitude RightToReply \n",
"0 0.73160300000000 52.03688800000000 None \n",
"1 0.85756600000000 52.02934900000000 None \n",
"2 0.72908000000000 52.03735200000000 None \n",
"3 None None None \n",
"4 0.97423600000000 51.97134600000000 None \n",
"\n",
"[5 rows x 25 columns]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.read_sql_query(\"SELECT * FROM ratingstable LIMIT 5;\", con)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `ratingstable` is the interesting one. Let's just have a guess that there may be things called *supermarkets*:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" FHRSID | \n",
" LocalAuthorityBusinessID | \n",
" BusinessName | \n",
" BusinessType | \n",
" BusinessTypeID | \n",
" AddressLine1 | \n",
" AddressLine2 | \n",
" AddressLine3 | \n",
" PostCode | \n",
" RatingValue | \n",
" ... | \n",
" LocalAuthorityEmailAddress | \n",
" SchemeType | \n",
" NewRatingPending | \n",
" AddressLine4 | \n",
" Hygiene | \n",
" Structural | \n",
" ConfidenceInManagement | \n",
" Longitude | \n",
" Latitude | \n",
" RightToReply | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 911046 | \n",
" PI/000174005 | \n",
" Aldi Foodstore | \n",
" Retailers - supermarkets/hypermarkets | \n",
" 7840 | \n",
" Aldi Store Ltd | \n",
" 2 Donald Mackintosh Way | \n",
" Pinewood | \n",
" IP8 3LQ | \n",
" 5 | \n",
" ... | \n",
" foodsafety@baberghmidsuffolk.gov.uk | \n",
" FHRS | \n",
" False | \n",
" IPSWICH | \n",
" 5 | \n",
" 5 | \n",
" 5 | \n",
" None | \n",
" None | \n",
" None | \n",
"
\n",
" \n",
" 1 | \n",
" 526264 | \n",
" PI/000011690 | \n",
" Aldi Foodstore Ltd | \n",
" Retailers - supermarkets/hypermarkets | \n",
" 7840 | \n",
" Girling Street | \n",
" SUDBURY | \n",
" Suffolk | \n",
" CO10 1NB | \n",
" 5 | \n",
" ... | \n",
" foodsafety@baberghmidsuffolk.gov.uk | \n",
" FHRS | \n",
" False | \n",
" None | \n",
" 5 | \n",
" 0 | \n",
" 0 | \n",
" 0.72920400000000 | \n",
" 52.04225900000000 | \n",
" None | \n",
"
\n",
" \n",
" 2 | \n",
" 1025780 | \n",
" PI/000177356 | \n",
" Budgens | \n",
" Retailers - supermarkets/hypermarkets | \n",
" 7840 | \n",
" The Pharmacy | \n",
" Hall Street | \n",
" Long Melford | \n",
" CO10 9JG | \n",
" 5 | \n",
" ... | \n",
" foodsafety@baberghmidsuffolk.gov.uk | \n",
" FHRS | \n",
" False | \n",
" SUDBURY | \n",
" 0 | \n",
" 0 | \n",
" 5 | \n",
" 0.71929200000000 | \n",
" 52.07931500000000 | \n",
" None | \n",
"
\n",
" \n",
" 3 | \n",
" 782482 | \n",
" PI/000170924 | \n",
" Co-operative Food Group Ltd | \n",
" Retailers - supermarkets/hypermarkets | \n",
" 7840 | \n",
" Co-operative Supermarket | \n",
" 46 Canhams Road | \n",
" Great Cornard | \n",
" CO10 0ER | \n",
" 5 | \n",
" ... | \n",
" foodsafety@baberghmidsuffolk.gov.uk | \n",
" FHRS | \n",
" False | \n",
" SUDBURY | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0.75501000000000 | \n",
" 52.02886900000000 | \n",
" None | \n",
"
\n",
" \n",
" 4 | \n",
" 526594 | \n",
" PI/000039208 | \n",
" Co-Operative Group Society Ltd | \n",
" Retailers - supermarkets/hypermarkets | \n",
" 7840 | \n",
" 80a High Street | \n",
" Lavenham | \n",
" SUDBURY | \n",
" CO10 9PT | \n",
" 5 | \n",
" ... | \n",
" foodsafety@baberghmidsuffolk.gov.uk | \n",
" FHRS | \n",
" False | \n",
" Suffolk | \n",
" 0 | \n",
" 5 | \n",
" 0 | \n",
" 0.79535500000000 | \n",
" 52.10948700000000 | \n",
" None | \n",
"
\n",
" \n",
"
\n",
"
5 rows × 25 columns
\n",
"
"
],
"text/plain": [
" FHRSID LocalAuthorityBusinessID BusinessName \\\n",
"0 911046 PI/000174005 Aldi Foodstore \n",
"1 526264 PI/000011690 Aldi Foodstore Ltd \n",
"2 1025780 PI/000177356 Budgens \n",
"3 782482 PI/000170924 Co-operative Food Group Ltd \n",
"4 526594 PI/000039208 Co-Operative Group Society Ltd \n",
"\n",
" BusinessType BusinessTypeID \\\n",
"0 Retailers - supermarkets/hypermarkets 7840 \n",
"1 Retailers - supermarkets/hypermarkets 7840 \n",
"2 Retailers - supermarkets/hypermarkets 7840 \n",
"3 Retailers - supermarkets/hypermarkets 7840 \n",
"4 Retailers - supermarkets/hypermarkets 7840 \n",
"\n",
" AddressLine1 AddressLine2 AddressLine3 PostCode \\\n",
"0 Aldi Store Ltd 2 Donald Mackintosh Way Pinewood IP8 3LQ \n",
"1 Girling Street SUDBURY Suffolk CO10 1NB \n",
"2 The Pharmacy Hall Street Long Melford CO10 9JG \n",
"3 Co-operative Supermarket 46 Canhams Road Great Cornard CO10 0ER \n",
"4 80a High Street Lavenham SUDBURY CO10 9PT \n",
"\n",
" RatingValue ... LocalAuthorityEmailAddress SchemeType \\\n",
"0 5 ... foodsafety@baberghmidsuffolk.gov.uk FHRS \n",
"1 5 ... foodsafety@baberghmidsuffolk.gov.uk FHRS \n",
"2 5 ... foodsafety@baberghmidsuffolk.gov.uk FHRS \n",
"3 5 ... foodsafety@baberghmidsuffolk.gov.uk FHRS \n",
"4 5 ... foodsafety@baberghmidsuffolk.gov.uk FHRS \n",
"\n",
" NewRatingPending AddressLine4 Hygiene Structural ConfidenceInManagement \\\n",
"0 False IPSWICH 5 5 5 \n",
"1 False None 5 0 0 \n",
"2 False SUDBURY 0 0 5 \n",
"3 False SUDBURY 0 0 0 \n",
"4 False Suffolk 0 5 0 \n",
"\n",
" Longitude Latitude RightToReply \n",
"0 None None None \n",
"1 0.72920400000000 52.04225900000000 None \n",
"2 0.71929200000000 52.07931500000000 None \n",
"3 0.75501000000000 52.02886900000000 None \n",
"4 0.79535500000000 52.10948700000000 None \n",
"\n",
"[5 rows x 25 columns]"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ratings_df = pd.read_sql_query(\"SELECT * FROM ratingstable WHERE BusinessType LIKE '%supermarket%' LIMIT 5;\", con)\n",
"ratings_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What other business types are there?"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" BusinessType | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Restaurant/Cafe/Canteen | \n",
"
\n",
" \n",
" 1 | \n",
" Retailers - other | \n",
"
\n",
" \n",
" 2 | \n",
" Other catering premises | \n",
"
\n",
" \n",
" 3 | \n",
" Hospitals/Childcare/Caring Premises | \n",
"
\n",
" \n",
" 4 | \n",
" School/college/university | \n",
"
\n",
" \n",
" 5 | \n",
" Retailers - supermarkets/hypermarkets | \n",
"
\n",
" \n",
" 6 | \n",
" Manufacturers/packers | \n",
"
\n",
" \n",
" 7 | \n",
" Pub/bar/nightclub | \n",
"
\n",
" \n",
" 8 | \n",
" Takeaway/sandwich shop | \n",
"
\n",
" \n",
" 9 | \n",
" Farmers/growers | \n",
"
\n",
" \n",
" 10 | \n",
" Hotel/bed & breakfast/guest house | \n",
"
\n",
" \n",
" 11 | \n",
" Mobile caterer | \n",
"
\n",
" \n",
" 12 | \n",
" Distributors/Transporters | \n",
"
\n",
" \n",
" 13 | \n",
" Importers/Exporters | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" BusinessType\n",
"0 Restaurant/Cafe/Canteen\n",
"1 Retailers - other\n",
"2 Other catering premises\n",
"3 Hospitals/Childcare/Caring Premises\n",
"4 School/college/university\n",
"5 Retailers - supermarkets/hypermarkets\n",
"6 Manufacturers/packers\n",
"7 Pub/bar/nightclub\n",
"8 Takeaway/sandwich shop\n",
"9 Farmers/growers\n",
"10 Hotel/bed & breakfast/guest house\n",
"11 Mobile caterer\n",
"12 Distributors/Transporters\n",
"13 Importers/Exporters"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"business_types = pd.read_sql_query(\"SELECT DISTINCT BusinessType FROM ratingstable;\", con)\n",
"business_types"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You could also filter down on business names:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" BusinessName | \n",
" Num | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Tesco | \n",
" 1664 | \n",
"
\n",
" \n",
" 1 | \n",
" Sainsbury's | \n",
" 974 | \n",
"
\n",
" \n",
" 2 | \n",
" Lidl | \n",
" 444 | \n",
"
\n",
" \n",
" 3 | \n",
" Aldi | \n",
" 421 | \n",
"
\n",
" \n",
" 4 | \n",
" Iceland | \n",
" 398 | \n",
"
\n",
" \n",
" 5 | \n",
" Co-op | \n",
" 368 | \n",
"
\n",
" \n",
" 6 | \n",
" Tesco Express | \n",
" 363 | \n",
"
\n",
" \n",
" 7 | \n",
" Morrisons | \n",
" 288 | \n",
"
\n",
" \n",
" 8 | \n",
" Asda | \n",
" 275 | \n",
"
\n",
" \n",
" 9 | \n",
" Marks & Spencer | \n",
" 214 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" BusinessName Num\n",
"0 Tesco 1664\n",
"1 Sainsbury's 974\n",
"2 Lidl 444\n",
"3 Aldi 421\n",
"4 Iceland 398\n",
"5 Co-op 368\n",
"6 Tesco Express 363\n",
"7 Morrisons 288\n",
"8 Asda 275\n",
"9 Marks & Spencer 214"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"q='''\n",
"SELECT DISTINCT BusinessName, COUNT(BusinessName) AS Num FROM ratingstable\n",
"WHERE BusinessType LIKE '%supermarket%'\n",
"GROUP BY BusinessName ORDER BY Num DESC;\n",
"'''\n",
"\n",
"supermarket_brands = pd.read_sql_query(q, con)\n",
"supermarket_brands.head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Simple Mapping\n",
"\n",
"We can map things easily enough:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" BusinessName | \n",
" Latitude | \n",
" Longitude | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Aldi | \n",
" 52.00415000000000 | \n",
" -0.79434300000000 | \n",
"
\n",
" \n",
" 1 | \n",
" Aldi | \n",
" 52.00359344482420 | \n",
" -0.72759300470352 | \n",
"
\n",
" \n",
" 2 | \n",
" Asda | \n",
" 52.00531500000000 | \n",
" -0.72898600000000 | \n",
"
\n",
" \n",
" 3 | \n",
" Co-op | \n",
" 51.99034900000000 | \n",
" -0.75381900000000 | \n",
"
\n",
" \n",
" 4 | \n",
" Co-op | \n",
" 52.00190700000000 | \n",
" -0.77912200000000 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" BusinessName Latitude Longitude\n",
"0 Aldi 52.00415000000000 -0.79434300000000\n",
"1 Aldi 52.00359344482420 -0.72759300470352\n",
"2 Asda 52.00531500000000 -0.72898600000000\n",
"3 Co-op 51.99034900000000 -0.75381900000000\n",
"4 Co-op 52.00190700000000 -0.77912200000000"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import folium\n",
"\n",
"q='''\n",
"SELECT DISTINCT BusinessName, Latitude, Longitude FROM ratingstable\n",
"WHERE BusinessType LIKE '%supermarket%' AND PostCode LIKE 'MK_ %' AND Longitude NOT NULL;\n",
"'''\n",
"\n",
"mk_supermarkets = pd.read_sql_query(q, con)\n",
"\n",
"mk_supermarkets.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Hmm... so we're not? Maybe my scraper is borked:-(\n",
"\n",
"Anyway, example of how to work with what we've got..."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"BusinessName object\n",
"Latitude object\n",
"Longitude object\n",
"dtype: object"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mk_supermarkets.dtypes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Cast the lat/lon to numerics:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"mk_supermarkets[['Latitude', 'Longitude']] = mk_supermarkets[['Latitude', 'Longitude']].apply(pd.to_numeric, errors='coerce')\n",
"# We could probably alternatively CAST() in the query"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Get the average location to center the map:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Latitude 52.012905\n",
"Longitude -0.751281\n",
"dtype: float64"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"AVERAGE_LOCATION = mk_supermarkets[['Latitude', 'Longitude']].mean()\n",
"AVERAGE_LOCATION"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"m = folium.Map(AVERAGE_LOCATION, width=500, height=800, zoom_start=11)\n",
"\n",
"def add_marker(row, m):\n",
" folium.Circle(location=[row['Latitude'], row['Longitude']],\n",
" color = 'red', radius=50, fill=True, fill_opacity=1.0,\n",
" tooltip=row['BusinessName']).add_to(m)\n",
"\n",
"mk_supermarkets.apply(add_marker, m=m, axis=1)\n",
"\n",
"m"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Index Of Deprivation — Example\n",
"\n",
"The following is cribbed from https://blog.ouseful.info/2019/03/26/we-need-to-talk-about-geo/ and is a demo at an LA level. You'd probably want to go to lower geography (I'm not sure how low deprivation stats collection areas go? LSOA at least, I think...) but this is just a proof of concept.\n",
"\n",
"There are other examples in [Components for Rolling Your Own GIS Inside Jupyter Notebooks](https://blog.ouseful.info/2019/04/02/fragment-components-for-rolling-your-own-gis-inside-jupyter-notebooks/) showing how to create draggable search areas and run geo-queries; I've also previosuly done some [LSOA examples](https://github.com/psychemedia/crime-data-demo/blob/master/LSOA%20Sketches.ipynb). The [Police API demo](https://github.com/psychemedia/crime-data-demo/blob/master/Police%20API%20Demo.ipynb) has some other interesting examples, like plotting crimes along a route, which could be reversioned here to give a map of food outlets along a road, for example."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Data Ingest\n",
"Start by grabbing some data. There's possibly an API way somewhere for doing this, or you can do it manually by searching http://geoportal.statistics.gov.uk/ eg for LAs http://geoportal.statistics.gov.uk/datasets/1d78d47c87df4212b79fe2323aae8e08_0\n",
"\n",
"Get a shapefile from [ONS geoportal](http://geoportal.statistics.gov.uk):"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"import geopandas\n",
" \n",
"#From the downloads area of the page, grab the link for the shapefile download\n",
"# This may take some time...\n",
"url='https://opendata.arcgis.com/datasets/7ff28788e1e640de8150fb8f35703f6e_2.zip?outSR=%7B%22wkid%22%3A27700%2C%22latestWkid%22%3A27700%7D'\n",
"gdf = geopandas.read_file(url)\n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Get some deprivation data - there may be more recent data available; it'd also be neater if this were done via an API call to something, perhaps on https://opendatacommunities.org/ or similar:"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dict_keys(['Notes', 'IMD', 'Income', 'Employment', 'Education', 'Health', 'Crime', 'Barriers', 'Living', 'IDACI', 'IDAOPI'])"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#https://www.gov.uk/government/statistics/english-indices-of-deprivation-2015\n",
"#File 10: local authority district summaries\n",
"data_url = 'https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/464464/File_10_ID2015_Local_Authority_District_Summaries.xlsx'\n",
" \n",
"#Download and read in the deprivation data Excel file\n",
"df = pd.read_excel(data_url, sheet_name=None)\n",
" \n",
"#Preview the name of the sheets in the data loaded from the Excel file\n",
"df.keys()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Merge boundaries and deprivation data:"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"#Merge in data\n",
"gdf = pd.merge(gdf, df['Education'],\n",
" how='inner', #The type of join (what happens if data is in one dataset and not the other)\n",
" left_on='lad16cd', #Column we're merging on in left dataframe\n",
" right_on='Local Authority District code (2013)'#Column we're merging on in right dataframe\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Preview the merged data:"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" objectid | \n",
" lad16cd | \n",
" lad16nm | \n",
" lad16nmw | \n",
" bng_e | \n",
" bng_n | \n",
" long | \n",
" lat | \n",
" st_areasha | \n",
" st_lengths | \n",
" geometry | \n",
" Local Authority District code (2013) | \n",
" Local Authority District name (2013) | \n",
" Education, Skills and Training - Average rank | \n",
" Education, Skills and Training - Rank of average rank | \n",
" Education, Skills and Training - Average score | \n",
" Education, Skills and Training - Rank of average score | \n",
" Education, Skills and Training - Proportion of LSOAs in most deprived 10% nationally | \n",
" Education, Skills and Training - Rank of proportion of LSOAs in most deprived 10% nationally | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" E06000001 | \n",
" Hartlepool | \n",
" None | \n",
" 447157 | \n",
" 531476 | \n",
" -1.27023 | \n",
" 54.676159 | \n",
" 9.359786e+07 | \n",
" 69382.685924 | \n",
" MULTIPOLYGON (((447097.001 537152.001, 447228.... | \n",
" E06000001 | \n",
" Hartlepool | \n",
" 20101.48 | \n",
" 72 | \n",
" 30.510 | \n",
" 47 | \n",
" 0.2069 | \n",
" 37 | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" E06000002 | \n",
" Middlesbrough | \n",
" None | \n",
" 451141 | \n",
" 516887 | \n",
" -1.21099 | \n",
" 54.544670 | \n",
" 5.387900e+07 | \n",
" 42085.584812 | \n",
" MULTIPOLYGON (((449861.900 521260.700, 449853.... | \n",
" E06000002 | \n",
" Middlesbrough | \n",
" 22728.01 | \n",
" 24 | \n",
" 40.640 | \n",
" 3 | \n",
" 0.4419 | \n",
" 1 | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" E06000003 | \n",
" Redcar and Cleveland | \n",
" None | \n",
" 464359 | \n",
" 519597 | \n",
" -1.00611 | \n",
" 54.567520 | \n",
" 2.448388e+08 | \n",
" 96189.660709 | \n",
" MULTIPOLYGON (((455776.701 528322.499, 455684.... | \n",
" E06000003 | \n",
" Redcar and Cleveland | \n",
" 19185.28 | \n",
" 95 | \n",
" 27.875 | \n",
" 71 | \n",
" 0.1818 | \n",
" 54 | \n",
"
\n",
" \n",
" 3 | \n",
" 4 | \n",
" E06000004 | \n",
" Stockton-on-Tees | \n",
" None | \n",
" 444937 | \n",
" 518183 | \n",
" -1.30669 | \n",
" 54.556911 | \n",
" 2.049366e+08 | \n",
" 115439.477112 | \n",
" MULTIPOLYGON (((444126.099 528005.799, 444165.... | \n",
" E06000004 | \n",
" Stockton-on-Tees | \n",
" 16660.09 | \n",
" 150 | \n",
" 24.637 | \n",
" 110 | \n",
" 0.1750 | \n",
" 59 | \n",
"
\n",
" \n",
" 4 | \n",
" 5 | \n",
" E06000005 | \n",
" Darlington | \n",
" None | \n",
" 428029 | \n",
" 515649 | \n",
" -1.56835 | \n",
" 54.535351 | \n",
" 1.974820e+08 | \n",
" 105799.568559 | \n",
" POLYGON ((423475.701 524731.596, 423497.204 52... | \n",
" E06000005 | \n",
" Darlington | \n",
" 16385.06 | \n",
" 155 | \n",
" 22.569 | \n",
" 129 | \n",
" 0.1385 | \n",
" 75 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" objectid lad16cd lad16nm lad16nmw bng_e bng_n \\\n",
"0 1 E06000001 Hartlepool None 447157 531476 \n",
"1 2 E06000002 Middlesbrough None 451141 516887 \n",
"2 3 E06000003 Redcar and Cleveland None 464359 519597 \n",
"3 4 E06000004 Stockton-on-Tees None 444937 518183 \n",
"4 5 E06000005 Darlington None 428029 515649 \n",
"\n",
" long lat st_areasha st_lengths \\\n",
"0 -1.27023 54.676159 9.359786e+07 69382.685924 \n",
"1 -1.21099 54.544670 5.387900e+07 42085.584812 \n",
"2 -1.00611 54.567520 2.448388e+08 96189.660709 \n",
"3 -1.30669 54.556911 2.049366e+08 115439.477112 \n",
"4 -1.56835 54.535351 1.974820e+08 105799.568559 \n",
"\n",
" geometry \\\n",
"0 MULTIPOLYGON (((447097.001 537152.001, 447228.... \n",
"1 MULTIPOLYGON (((449861.900 521260.700, 449853.... \n",
"2 MULTIPOLYGON (((455776.701 528322.499, 455684.... \n",
"3 MULTIPOLYGON (((444126.099 528005.799, 444165.... \n",
"4 POLYGON ((423475.701 524731.596, 423497.204 52... \n",
"\n",
" Local Authority District code (2013) Local Authority District name (2013) \\\n",
"0 E06000001 Hartlepool \n",
"1 E06000002 Middlesbrough \n",
"2 E06000003 Redcar and Cleveland \n",
"3 E06000004 Stockton-on-Tees \n",
"4 E06000005 Darlington \n",
"\n",
" Education, Skills and Training - Average rank \\\n",
"0 20101.48 \n",
"1 22728.01 \n",
"2 19185.28 \n",
"3 16660.09 \n",
"4 16385.06 \n",
"\n",
" Education, Skills and Training - Rank of average rank \\\n",
"0 72 \n",
"1 24 \n",
"2 95 \n",
"3 150 \n",
"4 155 \n",
"\n",
" Education, Skills and Training - Average score \\\n",
"0 30.510 \n",
"1 40.640 \n",
"2 27.875 \n",
"3 24.637 \n",
"4 22.569 \n",
"\n",
" Education, Skills and Training - Rank of average score \\\n",
"0 47 \n",
"1 3 \n",
"2 71 \n",
"3 110 \n",
"4 129 \n",
"\n",
" Education, Skills and Training - Proportion of LSOAs in most deprived 10% nationally \\\n",
"0 0.2069 \n",
"1 0.4419 \n",
"2 0.1818 \n",
"3 0.1750 \n",
"4 0.1385 \n",
"\n",
" Education, Skills and Training - Rank of proportion of LSOAs in most deprived 10% nationally \n",
"0 37 \n",
"1 1 \n",
"2 54 \n",
"3 59 \n",
"4 75 "
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"gdf.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Example of plotting boundary overlaid on previous map (this will probably clobber the tooltip/popup; choropleth should be plotted first or on a lower layer):"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/usr/local/lib/python3.7/site-packages/pyproj/crs.py:77: FutureWarning: '+init=:' syntax is deprecated. ':' is the preferred initialization method.\n",
" return _prepare_from_string(\" \".join(pjargs))\n"
]
},
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"folium.Choropleth(gdf[gdf['lad16nm']=='Milton Keynes'], key_on='feature.properties.lad16cd',\n",
" data=df['Education'],\n",
" columns=['Local Authority District code (2013)',\n",
" 'Education, Skills and Training - Rank of average rank'],\n",
" fill_color='YlOrBr').add_to(m)\n",
"m"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here's the start of a simple recipe for finding areas that neighbour a particular area:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#Via https://gis.stackexchange.com/a/300262/119781\n",
" \n",
"def plotNeighbours(gdf, region='Milton Keynes',\n",
" indicator='Education, Skills and Training - Rank of average rank',\n",
" cmap='OrRd'):\n",
" ''' Plot choropleth for an indicator relative to a specified region and its neighbours. '''\n",
" \n",
" targetBoundary = gdf[gdf['lad16nm']==region]['geometry'].values[0]\n",
" neighbours = gdf.apply(lambda row: row['geometry'].touches(targetBoundary) or row['geometry']==targetBoundary ,\n",
" axis=1)\n",
" # neighbours is a gdf that contains areas that neighbour the specified area\n",
" # Now plot these etc..."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Tidy Up\n",
"\n",
"Close the database connection:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"con.close()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}