{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Food Standards Agency\n", "\n", "Simple demo of how to get data out out of Food Standards Agency. This uses a scraper I've built previously for grabbing data from FSA into a SQLite database.\n", "\n", "To install the scraper, uncomment and run the following:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "#%pip install git+https://github.com/ouseful-datasupply/food_gov_uk.git" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With the scraper installed, grab the data from the FSA website into a SQLite db (`food.sqlite`):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#!rm food.sqlite\n", "\n", "# The full download takes quite a lot of time (20 mins+); I've raised an issue:\n", "# https://github.com/ouseful-datasupply/food_gov_uk/issues/1\n", "#Please feel free to submit a PR\n", "# If you fany making PRs, there's also a lot of try:except: blocks that should be properly trapped.\n", "# It would perhaps also make sense to include a \"demo\" that just grabs one data file.\n", "# It would be useful to also automagically use a tqdm notebook widget if we are in a notebook\n", "# I guess we are calling from commandline here though...\n", "!oi_fsa collect --dbname food.sqlite" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check what tables are available:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
name
0fsa_ratings_metadata
1ratingstable
\n", "
" ], "text/plain": [ " name\n", "0 fsa_ratings_metadata\n", "1 ratingstable" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "import sqlite3\n", "\n", "# Read sqlite query results into a pandas DataFrame\n", "con = sqlite3.connect(\"food.sqlite\")\n", "df = pd.read_sql_query(\"SELECT name FROM sqlite_master WHERE type='table';\", con)\n", "\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Preview each table, strating with the `fsa_ratings_metadata` table:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Local authorityLast updateNumber of businessesLink
0Babergh20/07/2020864http://ratings.food.gov.uk/OpenDataFiles/FHRS2...
1Basildon31/07/20201,213http://ratings.food.gov.uk/OpenDataFiles/FHRS1...
2Bedford10/07/20201,419http://ratings.food.gov.uk/OpenDataFiles/FHRS7...
3Braintree30/07/20201,145http://ratings.food.gov.uk/OpenDataFiles/FHRS1...
4Breckland28/07/20201,276http://ratings.food.gov.uk/OpenDataFiles/FHRS2...
\n", "
" ], "text/plain": [ " Local authority Last update Number of businesses \\\n", "0 Babergh 20/07/2020 864 \n", "1 Basildon 31/07/2020 1,213 \n", "2 Bedford 10/07/2020 1,419 \n", "3 Braintree 30/07/2020 1,145 \n", "4 Breckland 28/07/2020 1,276 \n", "\n", " Link \n", "0 http://ratings.food.gov.uk/OpenDataFiles/FHRS2... \n", "1 http://ratings.food.gov.uk/OpenDataFiles/FHRS1... \n", "2 http://ratings.food.gov.uk/OpenDataFiles/FHRS7... \n", "3 http://ratings.food.gov.uk/OpenDataFiles/FHRS1... \n", "4 http://ratings.food.gov.uk/OpenDataFiles/FHRS2... " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.read_sql_query(\"SELECT * FROM fsa_ratings_metadata LIMIT 5;\", con)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And the `ratingstable`:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
FHRSIDLocalAuthorityBusinessIDBusinessNameBusinessTypeBusinessTypeIDAddressLine1AddressLine2AddressLine3PostCodeRatingValue...LocalAuthorityEmailAddressSchemeTypeNewRatingPendingAddressLine4HygieneStructuralConfidenceInManagementLongitudeLatitudeRightToReply
0861699PI/000109856A G LifestyleRestaurant/Cafe/Canteen138 Station RoadSUDBURYSuffolkCO10 2SS5...foodsafety@baberghmidsuffolk.gov.ukFHRSFalseNone5000.7316030000000052.03688800000000None
1526543PI/000036352A Leeder ButchersRetailers - other461333 Swan StreetBoxfordSUDBURYCO10 5NZ5...foodsafety@baberghmidsuffolk.gov.ukFHRSFalseSuffolk0000.8575660000000052.02934900000000None
21067344PI/000179206A Slice of NYRestaurant/Cafe/Canteen111a Friars StreetSUDBURYSuffolkCO10 2AA5...foodsafety@baberghmidsuffolk.gov.ukFHRSFalseNone0000.7290800000000052.03735200000000None
3526626PI/000046693A Taste Of HomeOther catering premises7841NoneNoneNoneNoneAwaitingInspection...foodsafety@baberghmidsuffolk.gov.ukFHRSFalseNoneNoneNoneNoneNoneNoneNone
4526987PI/000111131Abc PreschoolHospitals/Childcare/Caring Premises5ABC Pre SchoolCountry Primary SchoolStrickmereCO7 6YGExempt...foodsafety@baberghmidsuffolk.gov.ukFHRSFalseStratford St MaryNoneNoneNone0.9742360000000051.97134600000000None
\n", "

5 rows × 25 columns

\n", "
" ], "text/plain": [ " FHRSID LocalAuthorityBusinessID BusinessName \\\n", "0 861699 PI/000109856 A G Lifestyle \n", "1 526543 PI/000036352 A Leeder Butchers \n", "2 1067344 PI/000179206 A Slice of NY \n", "3 526626 PI/000046693 A Taste Of Home \n", "4 526987 PI/000111131 Abc Preschool \n", "\n", " BusinessType BusinessTypeID AddressLine1 \\\n", "0 Restaurant/Cafe/Canteen 1 38 Station Road \n", "1 Retailers - other 4613 33 Swan Street \n", "2 Restaurant/Cafe/Canteen 1 11a Friars Street \n", "3 Other catering premises 7841 None \n", "4 Hospitals/Childcare/Caring Premises 5 ABC Pre School \n", "\n", " AddressLine2 AddressLine3 PostCode RatingValue ... \\\n", "0 SUDBURY Suffolk CO10 2SS 5 ... \n", "1 Boxford SUDBURY CO10 5NZ 5 ... \n", "2 SUDBURY Suffolk CO10 2AA 5 ... \n", "3 None None None AwaitingInspection ... \n", "4 Country Primary School Strickmere CO7 6YG Exempt ... \n", "\n", " LocalAuthorityEmailAddress SchemeType NewRatingPending \\\n", "0 foodsafety@baberghmidsuffolk.gov.uk FHRS False \n", "1 foodsafety@baberghmidsuffolk.gov.uk FHRS False \n", "2 foodsafety@baberghmidsuffolk.gov.uk FHRS False \n", "3 foodsafety@baberghmidsuffolk.gov.uk FHRS False \n", "4 foodsafety@baberghmidsuffolk.gov.uk FHRS False \n", "\n", " AddressLine4 Hygiene Structural ConfidenceInManagement \\\n", "0 None 5 0 0 \n", "1 Suffolk 0 0 0 \n", "2 None 0 0 0 \n", "3 None None None None \n", "4 Stratford St Mary None None None \n", "\n", " Longitude Latitude RightToReply \n", "0 0.73160300000000 52.03688800000000 None \n", "1 0.85756600000000 52.02934900000000 None \n", "2 0.72908000000000 52.03735200000000 None \n", "3 None None None \n", "4 0.97423600000000 51.97134600000000 None \n", "\n", "[5 rows x 25 columns]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.read_sql_query(\"SELECT * FROM ratingstable LIMIT 5;\", con)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `ratingstable` is the interesting one. Let's just have a guess that there may be things called *supermarkets*:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
FHRSIDLocalAuthorityBusinessIDBusinessNameBusinessTypeBusinessTypeIDAddressLine1AddressLine2AddressLine3PostCodeRatingValue...LocalAuthorityEmailAddressSchemeTypeNewRatingPendingAddressLine4HygieneStructuralConfidenceInManagementLongitudeLatitudeRightToReply
0911046PI/000174005Aldi FoodstoreRetailers - supermarkets/hypermarkets7840Aldi Store Ltd2 Donald Mackintosh WayPinewoodIP8 3LQ5...foodsafety@baberghmidsuffolk.gov.ukFHRSFalseIPSWICH555NoneNoneNone
1526264PI/000011690Aldi Foodstore LtdRetailers - supermarkets/hypermarkets7840Girling StreetSUDBURYSuffolkCO10 1NB5...foodsafety@baberghmidsuffolk.gov.ukFHRSFalseNone5000.7292040000000052.04225900000000None
21025780PI/000177356BudgensRetailers - supermarkets/hypermarkets7840The PharmacyHall StreetLong MelfordCO10 9JG5...foodsafety@baberghmidsuffolk.gov.ukFHRSFalseSUDBURY0050.7192920000000052.07931500000000None
3782482PI/000170924Co-operative Food Group LtdRetailers - supermarkets/hypermarkets7840Co-operative Supermarket46 Canhams RoadGreat CornardCO10 0ER5...foodsafety@baberghmidsuffolk.gov.ukFHRSFalseSUDBURY0000.7550100000000052.02886900000000None
4526594PI/000039208Co-Operative Group Society LtdRetailers - supermarkets/hypermarkets784080a High StreetLavenhamSUDBURYCO10 9PT5...foodsafety@baberghmidsuffolk.gov.ukFHRSFalseSuffolk0500.7953550000000052.10948700000000None
\n", "

5 rows × 25 columns

\n", "
" ], "text/plain": [ " FHRSID LocalAuthorityBusinessID BusinessName \\\n", "0 911046 PI/000174005 Aldi Foodstore \n", "1 526264 PI/000011690 Aldi Foodstore Ltd \n", "2 1025780 PI/000177356 Budgens \n", "3 782482 PI/000170924 Co-operative Food Group Ltd \n", "4 526594 PI/000039208 Co-Operative Group Society Ltd \n", "\n", " BusinessType BusinessTypeID \\\n", "0 Retailers - supermarkets/hypermarkets 7840 \n", "1 Retailers - supermarkets/hypermarkets 7840 \n", "2 Retailers - supermarkets/hypermarkets 7840 \n", "3 Retailers - supermarkets/hypermarkets 7840 \n", "4 Retailers - supermarkets/hypermarkets 7840 \n", "\n", " AddressLine1 AddressLine2 AddressLine3 PostCode \\\n", "0 Aldi Store Ltd 2 Donald Mackintosh Way Pinewood IP8 3LQ \n", "1 Girling Street SUDBURY Suffolk CO10 1NB \n", "2 The Pharmacy Hall Street Long Melford CO10 9JG \n", "3 Co-operative Supermarket 46 Canhams Road Great Cornard CO10 0ER \n", "4 80a High Street Lavenham SUDBURY CO10 9PT \n", "\n", " RatingValue ... LocalAuthorityEmailAddress SchemeType \\\n", "0 5 ... foodsafety@baberghmidsuffolk.gov.uk FHRS \n", "1 5 ... foodsafety@baberghmidsuffolk.gov.uk FHRS \n", "2 5 ... foodsafety@baberghmidsuffolk.gov.uk FHRS \n", "3 5 ... foodsafety@baberghmidsuffolk.gov.uk FHRS \n", "4 5 ... foodsafety@baberghmidsuffolk.gov.uk FHRS \n", "\n", " NewRatingPending AddressLine4 Hygiene Structural ConfidenceInManagement \\\n", "0 False IPSWICH 5 5 5 \n", "1 False None 5 0 0 \n", "2 False SUDBURY 0 0 5 \n", "3 False SUDBURY 0 0 0 \n", "4 False Suffolk 0 5 0 \n", "\n", " Longitude Latitude RightToReply \n", "0 None None None \n", "1 0.72920400000000 52.04225900000000 None \n", "2 0.71929200000000 52.07931500000000 None \n", "3 0.75501000000000 52.02886900000000 None \n", "4 0.79535500000000 52.10948700000000 None \n", "\n", "[5 rows x 25 columns]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ratings_df = pd.read_sql_query(\"SELECT * FROM ratingstable WHERE BusinessType LIKE '%supermarket%' LIMIT 5;\", con)\n", "ratings_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What other business types are there?" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
BusinessType
0Restaurant/Cafe/Canteen
1Retailers - other
2Other catering premises
3Hospitals/Childcare/Caring Premises
4School/college/university
5Retailers - supermarkets/hypermarkets
6Manufacturers/packers
7Pub/bar/nightclub
8Takeaway/sandwich shop
9Farmers/growers
10Hotel/bed & breakfast/guest house
11Mobile caterer
12Distributors/Transporters
13Importers/Exporters
\n", "
" ], "text/plain": [ " BusinessType\n", "0 Restaurant/Cafe/Canteen\n", "1 Retailers - other\n", "2 Other catering premises\n", "3 Hospitals/Childcare/Caring Premises\n", "4 School/college/university\n", "5 Retailers - supermarkets/hypermarkets\n", "6 Manufacturers/packers\n", "7 Pub/bar/nightclub\n", "8 Takeaway/sandwich shop\n", "9 Farmers/growers\n", "10 Hotel/bed & breakfast/guest house\n", "11 Mobile caterer\n", "12 Distributors/Transporters\n", "13 Importers/Exporters" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "business_types = pd.read_sql_query(\"SELECT DISTINCT BusinessType FROM ratingstable;\", con)\n", "business_types" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You could also filter down on business names:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
BusinessNameNum
0Tesco1664
1Sainsbury's974
2Lidl444
3Aldi421
4Iceland398
5Co-op368
6Tesco Express363
7Morrisons288
8Asda275
9Marks & Spencer214
\n", "
" ], "text/plain": [ " BusinessName Num\n", "0 Tesco 1664\n", "1 Sainsbury's 974\n", "2 Lidl 444\n", "3 Aldi 421\n", "4 Iceland 398\n", "5 Co-op 368\n", "6 Tesco Express 363\n", "7 Morrisons 288\n", "8 Asda 275\n", "9 Marks & Spencer 214" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "q='''\n", "SELECT DISTINCT BusinessName, COUNT(BusinessName) AS Num FROM ratingstable\n", "WHERE BusinessType LIKE '%supermarket%'\n", "GROUP BY BusinessName ORDER BY Num DESC;\n", "'''\n", "\n", "supermarket_brands = pd.read_sql_query(q, con)\n", "supermarket_brands.head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Simple Mapping\n", "\n", "We can map things easily enough:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
BusinessNameLatitudeLongitude
0Aldi52.00415000000000-0.79434300000000
1Aldi52.00359344482420-0.72759300470352
2Asda52.00531500000000-0.72898600000000
3Co-op51.99034900000000-0.75381900000000
4Co-op52.00190700000000-0.77912200000000
\n", "
" ], "text/plain": [ " BusinessName Latitude Longitude\n", "0 Aldi 52.00415000000000 -0.79434300000000\n", "1 Aldi 52.00359344482420 -0.72759300470352\n", "2 Asda 52.00531500000000 -0.72898600000000\n", "3 Co-op 51.99034900000000 -0.75381900000000\n", "4 Co-op 52.00190700000000 -0.77912200000000" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import folium\n", "\n", "q='''\n", "SELECT DISTINCT BusinessName, Latitude, Longitude FROM ratingstable\n", "WHERE BusinessType LIKE '%supermarket%' AND PostCode LIKE 'MK_ %' AND Longitude NOT NULL;\n", "'''\n", "\n", "mk_supermarkets = pd.read_sql_query(q, con)\n", "\n", "mk_supermarkets.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hmm... so we're not? Maybe my scraper is borked:-(\n", "\n", "Anyway, example of how to work with what we've got..." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BusinessName object\n", "Latitude object\n", "Longitude object\n", "dtype: object" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mk_supermarkets.dtypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Cast the lat/lon to numerics:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "mk_supermarkets[['Latitude', 'Longitude']] = mk_supermarkets[['Latitude', 'Longitude']].apply(pd.to_numeric, errors='coerce')\n", "# We could probably alternatively CAST() in the query" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Get the average location to center the map:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Latitude 52.012905\n", "Longitude -0.751281\n", "dtype: float64" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "AVERAGE_LOCATION = mk_supermarkets[['Latitude', 'Longitude']].mean()\n", "AVERAGE_LOCATION" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "m = folium.Map(AVERAGE_LOCATION, width=500, height=800, zoom_start=11)\n", "\n", "def add_marker(row, m):\n", " folium.Circle(location=[row['Latitude'], row['Longitude']],\n", " color = 'red', radius=50, fill=True, fill_opacity=1.0,\n", " tooltip=row['BusinessName']).add_to(m)\n", "\n", "mk_supermarkets.apply(add_marker, m=m, axis=1)\n", "\n", "m" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Index Of Deprivation — Example\n", "\n", "The following is cribbed from https://blog.ouseful.info/2019/03/26/we-need-to-talk-about-geo/ and is a demo at an LA level. You'd probably want to go to lower geography (I'm not sure how low deprivation stats collection areas go? LSOA at least, I think...) but this is just a proof of concept.\n", "\n", "There are other examples in [Components for Rolling Your Own GIS Inside Jupyter Notebooks](https://blog.ouseful.info/2019/04/02/fragment-components-for-rolling-your-own-gis-inside-jupyter-notebooks/) showing how to create draggable search areas and run geo-queries; I've also previosuly done some [LSOA examples](https://github.com/psychemedia/crime-data-demo/blob/master/LSOA%20Sketches.ipynb). The [Police API demo](https://github.com/psychemedia/crime-data-demo/blob/master/Police%20API%20Demo.ipynb) has some other interesting examples, like plotting crimes along a route, which could be reversioned here to give a map of food outlets along a road, for example." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data Ingest\n", "Start by grabbing some data. There's possibly an API way somewhere for doing this, or you can do it manually by searching http://geoportal.statistics.gov.uk/ eg for LAs http://geoportal.statistics.gov.uk/datasets/1d78d47c87df4212b79fe2323aae8e08_0\n", "\n", "Get a shapefile from [ONS geoportal](http://geoportal.statistics.gov.uk):" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "import geopandas\n", " \n", "#From the downloads area of the page, grab the link for the shapefile download\n", "# This may take some time...\n", "url='https://opendata.arcgis.com/datasets/7ff28788e1e640de8150fb8f35703f6e_2.zip?outSR=%7B%22wkid%22%3A27700%2C%22latestWkid%22%3A27700%7D'\n", "gdf = geopandas.read_file(url)\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Get some deprivation data - there may be more recent data available; it'd also be neater if this were done via an API call to something, perhaps on https://opendatacommunities.org/ or similar:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dict_keys(['Notes', 'IMD', 'Income', 'Employment', 'Education', 'Health', 'Crime', 'Barriers', 'Living', 'IDACI', 'IDAOPI'])" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#https://www.gov.uk/government/statistics/english-indices-of-deprivation-2015\n", "#File 10: local authority district summaries\n", "data_url = 'https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/464464/File_10_ID2015_Local_Authority_District_Summaries.xlsx'\n", " \n", "#Download and read in the deprivation data Excel file\n", "df = pd.read_excel(data_url, sheet_name=None)\n", " \n", "#Preview the name of the sheets in the data loaded from the Excel file\n", "df.keys()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Merge boundaries and deprivation data:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "#Merge in data\n", "gdf = pd.merge(gdf, df['Education'],\n", " how='inner', #The type of join (what happens if data is in one dataset and not the other)\n", " left_on='lad16cd', #Column we're merging on in left dataframe\n", " right_on='Local Authority District code (2013)'#Column we're merging on in right dataframe\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Preview the merged data:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
objectidlad16cdlad16nmlad16nmwbng_ebng_nlonglatst_areashast_lengthsgeometryLocal Authority District code (2013)Local Authority District name (2013)Education, Skills and Training - Average rankEducation, Skills and Training - Rank of average rankEducation, Skills and Training - Average scoreEducation, Skills and Training - Rank of average scoreEducation, Skills and Training - Proportion of LSOAs in most deprived 10% nationallyEducation, Skills and Training - Rank of proportion of LSOAs in most deprived 10% nationally
01E06000001HartlepoolNone447157531476-1.2702354.6761599.359786e+0769382.685924MULTIPOLYGON (((447097.001 537152.001, 447228....E06000001Hartlepool20101.487230.510470.206937
12E06000002MiddlesbroughNone451141516887-1.2109954.5446705.387900e+0742085.584812MULTIPOLYGON (((449861.900 521260.700, 449853....E06000002Middlesbrough22728.012440.64030.44191
23E06000003Redcar and ClevelandNone464359519597-1.0061154.5675202.448388e+0896189.660709MULTIPOLYGON (((455776.701 528322.499, 455684....E06000003Redcar and Cleveland19185.289527.875710.181854
34E06000004Stockton-on-TeesNone444937518183-1.3066954.5569112.049366e+08115439.477112MULTIPOLYGON (((444126.099 528005.799, 444165....E06000004Stockton-on-Tees16660.0915024.6371100.175059
45E06000005DarlingtonNone428029515649-1.5683554.5353511.974820e+08105799.568559POLYGON ((423475.701 524731.596, 423497.204 52...E06000005Darlington16385.0615522.5691290.138575
\n", "
" ], "text/plain": [ " objectid lad16cd lad16nm lad16nmw bng_e bng_n \\\n", "0 1 E06000001 Hartlepool None 447157 531476 \n", "1 2 E06000002 Middlesbrough None 451141 516887 \n", "2 3 E06000003 Redcar and Cleveland None 464359 519597 \n", "3 4 E06000004 Stockton-on-Tees None 444937 518183 \n", "4 5 E06000005 Darlington None 428029 515649 \n", "\n", " long lat st_areasha st_lengths \\\n", "0 -1.27023 54.676159 9.359786e+07 69382.685924 \n", "1 -1.21099 54.544670 5.387900e+07 42085.584812 \n", "2 -1.00611 54.567520 2.448388e+08 96189.660709 \n", "3 -1.30669 54.556911 2.049366e+08 115439.477112 \n", "4 -1.56835 54.535351 1.974820e+08 105799.568559 \n", "\n", " geometry \\\n", "0 MULTIPOLYGON (((447097.001 537152.001, 447228.... \n", "1 MULTIPOLYGON (((449861.900 521260.700, 449853.... \n", "2 MULTIPOLYGON (((455776.701 528322.499, 455684.... \n", "3 MULTIPOLYGON (((444126.099 528005.799, 444165.... \n", "4 POLYGON ((423475.701 524731.596, 423497.204 52... \n", "\n", " Local Authority District code (2013) Local Authority District name (2013) \\\n", "0 E06000001 Hartlepool \n", "1 E06000002 Middlesbrough \n", "2 E06000003 Redcar and Cleveland \n", "3 E06000004 Stockton-on-Tees \n", "4 E06000005 Darlington \n", "\n", " Education, Skills and Training - Average rank \\\n", "0 20101.48 \n", "1 22728.01 \n", "2 19185.28 \n", "3 16660.09 \n", "4 16385.06 \n", "\n", " Education, Skills and Training - Rank of average rank \\\n", "0 72 \n", "1 24 \n", "2 95 \n", "3 150 \n", "4 155 \n", "\n", " Education, Skills and Training - Average score \\\n", "0 30.510 \n", "1 40.640 \n", "2 27.875 \n", "3 24.637 \n", "4 22.569 \n", "\n", " Education, Skills and Training - Rank of average score \\\n", "0 47 \n", "1 3 \n", "2 71 \n", "3 110 \n", "4 129 \n", "\n", " Education, Skills and Training - Proportion of LSOAs in most deprived 10% nationally \\\n", "0 0.2069 \n", "1 0.4419 \n", "2 0.1818 \n", "3 0.1750 \n", "4 0.1385 \n", "\n", " Education, Skills and Training - Rank of proportion of LSOAs in most deprived 10% nationally \n", "0 37 \n", "1 1 \n", "2 54 \n", "3 59 \n", "4 75 " ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gdf.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Example of plotting boundary overlaid on previous map (this will probably clobber the tooltip/popup; choropleth should be plotted first or on a lower layer):" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/usr/local/lib/python3.7/site-packages/pyproj/crs.py:77: FutureWarning: '+init=:' syntax is deprecated. ':' is the preferred initialization method.\n", " return _prepare_from_string(\" \".join(pjargs))\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "folium.Choropleth(gdf[gdf['lad16nm']=='Milton Keynes'], key_on='feature.properties.lad16cd',\n", " data=df['Education'],\n", " columns=['Local Authority District code (2013)',\n", " 'Education, Skills and Training - Rank of average rank'],\n", " fill_color='YlOrBr').add_to(m)\n", "m" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's the start of a simple recipe for finding areas that neighbour a particular area:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Via https://gis.stackexchange.com/a/300262/119781\n", " \n", "def plotNeighbours(gdf, region='Milton Keynes',\n", " indicator='Education, Skills and Training - Rank of average rank',\n", " cmap='OrRd'):\n", " ''' Plot choropleth for an indicator relative to a specified region and its neighbours. '''\n", " \n", " targetBoundary = gdf[gdf['lad16nm']==region]['geometry'].values[0]\n", " neighbours = gdf.apply(lambda row: row['geometry'].touches(targetBoundary) or row['geometry']==targetBoundary ,\n", " axis=1)\n", " # neighbours is a gdf that contains areas that neighbour the specified area\n", " # Now plot these etc..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Tidy Up\n", "\n", "Close the database connection:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "con.close()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }