{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Explore places associated with collection objects\n", "\n", "In this notebook we'll explore the spatial dimensions of the `object` data. Where were objects created or collected? To do that we'll extract the nested spatial data, see what's there, and create a few maps.\n", "\n", "[See here](exploring_object_records.ipynb) for an introduction to the `object` data, and [here to explore objects over time](explore_collection_object_over_time.ipynb).\n", "\n", "If you haven't already, you'll either need to [harvest the `object` data](harvest_records.ipynb), or [unzip a pre-harvested dataset](unzip_preharvested_data.ipynb)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

If you haven't used one of these notebooks before, they're basically web pages in which you can write, edit, and run live code. They're meant to encourage experimentation, so don't feel nervous. Just try running a few cells and see what happens!

\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Import what we need" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "from ipyleaflet import Map, Marker, Popup, MarkerCluster, basemap_to_tiles, CircleMarker\n", "import ipywidgets as widgets\n", "from tinydb import TinyDB, Query\n", "import reverse_geocode\n", "from pandas import json_normalize\n", "import altair as alt\n", "from IPython.display import display, HTML, FileLink\n", "from vega_datasets import data as vega_data\n", "import country_converter as coco" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load the harvested data" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Get the JSON data from the local db\n", "db = TinyDB('nma_object_db.json')\n", "records = db.all()\n", "Object = Query()" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Convert to a dataframe\n", "df = pd.DataFrame(records)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How many different places are referred to in object records?\n", "\n", "Places are linked to object records through the `spatial` field. One object record can be linked to multiple places. Let's get a list of all the places linked to object records.\n", "\n", "First we'll use `json_normalize()` to extract the nested lists in the `spatial` field, creating one row for every linked place." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n",
"<table>\n",
"<thead>\n",
"<tr><th></th><th>spatial_id</th><th>spatial_type</th><th>spatial_title</th><th>spatial_roleName</th><th>spatial_interactionType</th><th>spatial_geo</th><th>spatial_description</th><th>id</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>693</td><td>Place</td><td>Ernabella, South Australia, Australia</td><td>Place made</td><td>Production</td><td>-26.2642,132.176</td><td>NaN</td><td>124081</td></tr>\n",
"<tr><th>1</th><td>1187</td><td>Place</td><td>Mandiupi</td><td>NaN</td><td>Production</td><td>NaN</td><td>NaN</td><td>20174</td></tr>\n",
"<tr><th>2</th><td>333</td><td>Place</td><td>Broken Hill, New South Wales, Australia</td><td>Place collected</td><td>NaN</td><td>NaN</td><td>NaN</td><td>188741</td></tr>\n",
"<tr><th>3</th><td>4600</td><td>Place</td><td>Canbbage Tree Island Public School, Cabbage Tr...</td><td>Place created</td><td>Production</td><td>-28.9842,153.457</td><td>NaN</td><td>42084</td></tr>\n",
"<tr><th>4</th><td>80126</td><td>Place</td><td>France</td><td>Associated place</td><td>NaN</td><td>NaN</td><td>NaN</td><td>148323</td></tr>\n",
"</tbody>\n",
"</table>
\n", "
" ], "text/plain": [ " spatial_id spatial_type spatial_title \\\n", "0 693 Place Ernabella, South Australia, Australia \n", "1 1187 Place Mandiupi \n", "2 333 Place Broken Hill, New South Wales, Australia \n", "3 4600 Place Canbbage Tree Island Public School, Cabbage Tr... \n", "4 80126 Place France \n", "\n", " spatial_roleName spatial_interactionType spatial_geo \\\n", "0 Place made Production -26.2642,132.176 \n", "1 NaN Production NaN \n", "2 Place collected NaN NaN \n", "3 Place created Production -28.9842,153.457 \n", "4 Associated place NaN NaN \n", "\n", " spatial_description id \n", "0 NaN 124081 \n", "1 NaN 20174 \n", "2 NaN 188741 \n", "3 NaN 42084 \n", "4 NaN 148323 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_places = json_normalize(df.loc[df['spatial'].notnull()].to_dict('records'), record_path='spatial', meta=['id'], record_prefix='spatial_')\n", "df_places.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This list will include many duplicates as more than one object will be linked to a particular place. Let's drop duplicates based on the `spatial_id` and count how many there are." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3336" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_places.drop_duplicates(subset=['spatial_id']).shape[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's put the places on a map. First, we'll filter the records to show only those that have geo-coordinates, and then remove the duplicates as before." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "df_places_with_geo = df_places.loc[df_places['spatial_geo'].notnull()].drop_duplicates(subset=['spatial_id'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You might have noticed that the `spatial_geo` field contains the latitude and longitude, separated by a comma. Let's split the coordinates into separate columns." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n",
"<table>\n",
"<thead>\n",
"<tr><th></th><th>spatial_id</th><th>spatial_type</th><th>spatial_title</th><th>spatial_roleName</th><th>spatial_interactionType</th><th>spatial_geo</th><th>spatial_description</th><th>id</th><th>lat</th><th>lon</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>693</td><td>Place</td><td>Ernabella, South Australia, Australia</td><td>Place made</td><td>Production</td><td>-26.2642,132.176</td><td>NaN</td><td>124081</td><td>-26.2642</td><td>132.176</td></tr>\n",
"<tr><th>3</th><td>4600</td><td>Place</td><td>Canbbage Tree Island Public School, Cabbage Tr...</td><td>Place created</td><td>Production</td><td>-28.9842,153.457</td><td>NaN</td><td>42084</td><td>-28.9842</td><td>153.457</td></tr>\n",
"<tr><th>11</th><td>80019</td><td>Place</td><td>Central Australia, Northern Territory, Australia</td><td>NaN</td><td>Production</td><td>-24.3617,133.735</td><td>NaN</td><td>6840</td><td>-24.3617</td><td>133.735</td></tr>\n",
"<tr><th>15</th><td>4329</td><td>Place</td><td>Mount Hagen, Western Highlands Province, Papua...</td><td>Place made</td><td>Production</td><td>-5.8581,144.243</td><td>NaN</td><td>203086</td><td>-5.8581</td><td>144.243</td></tr>\n",
"<tr><th>26</th><td>1883</td><td>Place</td><td>Tasmania, Australia</td><td>NaN</td><td>Production</td><td>-41.9253,146.497</td><td>NaN</td><td>70975</td><td>-41.9253</td><td>146.497</td></tr>\n",
"</tbody>\n",
"</table>
\n", "
" ], "text/plain": [ " spatial_id spatial_type spatial_title \\\n", "0 693 Place Ernabella, South Australia, Australia \n", "3 4600 Place Canbbage Tree Island Public School, Cabbage Tr... \n", "11 80019 Place Central Australia, Northern Territory, Australia \n", "15 4329 Place Mount Hagen, Western Highlands Province, Papua... \n", "26 1883 Place Tasmania, Australia \n", "\n", " spatial_roleName spatial_interactionType spatial_geo \\\n", "0 Place made Production -26.2642,132.176 \n", "3 Place created Production -28.9842,153.457 \n", "11 NaN Production -24.3617,133.735 \n", "15 Place made Production -5.8581,144.243 \n", "26 NaN Production -41.9253,146.497 \n", "\n", " spatial_description id lat lon \n", "0 NaN 124081 -26.2642 132.176 \n", "3 NaN 42084 -28.9842 153.457 \n", "11 NaN 6840 -24.3617 133.735 \n", "15 NaN 203086 -5.8581 144.243 \n", "26 NaN 70975 -41.9253 146.497 " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_places_with_geo[['lat', 'lon']] = df_places_with_geo['spatial_geo'].str.split(',', expand=True)\n", "df_places_with_geo.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok, let's make a map!" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.LayerChart(...)" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# This loads the country boundaries data\n", "countries = alt.topo_feature(vega_data.world_110m.url, feature='countries')\n", "\n", "# First we'll create the world map using the boundaries\n", "background = alt.Chart(countries).mark_geoshape(\n", "    fill='lightgray',\n", "    stroke='white'\n", ").project('equirectangular').properties(width=700)\n", "\n", "# Then we'll plot the positions of places using circles\n", "points = alt.Chart(df_places_with_geo).mark_circle(\n", "    \n", "    # Style the circles\n", "    size=10,\n", "    color='steelblue'\n", ").encode(\n", "    \n", "    # Provide the coordinates\n", "    longitude='lon:Q',\n", "    latitude='lat:Q',\n", "    \n", "    # More info on hover\n", "    tooltip=[alt.Tooltip('spatial_title', title='Place')]\n", ").properties(width=700)\n", "\n", "# Finally we layer the plotted points on top of the background map\n", "alt.layer(background, points)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What's missing?\n", "\n", "In order to put the places on a map, we filtered out places that didn't have geo-coordinates. But how many of the linked places have coordinates?" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'32.22% of linked places have geo-coordinates'" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'{:.2%} of linked places have geo-coordinates'.format(df_places_with_geo.shape[0] / df_places.drop_duplicates(subset=['spatial_id']).shape[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hmmm, so a majority of linked places are actually missing from our map. Let's dig a bit deeper into the `spatial` records to see if we can work out why there are only geo-coordinates for some records."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Relationships to places\n", "\n", "The relationships between places and objects are described in the `spatial_roleName` column. Let's see what's in there." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Place collected 14604\n", "Associated place 12293\n", "Place made 5516\n", "Place depicted 4037\n", "Place of event 2158\n", "Place created 1858\n", "Place used 1828\n", "Place of use 1554\n", "Subject 1298\n", "Place of publication 1276\n", "Place printed 614\n", "Place of issue 605\n", "Place of production 576\n", "Place photographed 488\n", "Place worn 364\n", "Place compiled 205\n", "Place written 174\n", "Content created 147\n", "Place designed 122\n", "Place of execution 113\n", "Place Made 66\n", "Place purchased 57\n", "Place of restoration 13\n", "Place of component manufacture 10\n", "Place of Origin 9\n", "place made 6\n", "Associated Place 5\n", "Place assembled 5\n", "Place of conversion 4\n", "Place of death 4\n", "Place of birth 2\n", "Place Collected 2\n", "Place of Publication 1\n", "Place of Execution 1\n", "Place of Use 1\n", "place of Publication 1\n", "Name: spatial_roleName, dtype: int64" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_places['spatial_roleName'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, there are quite a few variations in format and capitalisation, which makes the values hard to aggregate. Fortunately the NMA has already applied some normalisation, grouping together all of the relationships that relate to creation or production. These are identified by the value 'Production' in the `interactionType` field. Let's see which of the `roleName` values are aggregated by `interactionType`."
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Place made 5516\n", "Place created 1858\n", "Place of publication 1276\n", "Place printed 614\n", "Place of production 576\n", "Place of issue 543\n", "Place photographed 488\n", "Place compiled 205\n", "Place written 174\n", "Content created 147\n", "Place designed 122\n", "Place of execution 113\n", "Place Made 66\n", "Place of restoration 13\n", "Place of component manufacture 10\n", "place made 6\n", "Place assembled 5\n", "Place of conversion 4\n", "place of Publication 1\n", "Place of Execution 1\n", "Place of Publication 1\n", "Name: spatial_roleName, dtype: int64" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_places.loc[(df_places['spatial_interactionType'] == 'Production')]['spatial_roleName'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How many of the places relate to production?" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "16831" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_places.loc[(df_places['spatial_interactionType'] == 'Production')].shape[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looking at the numbers above, you might think that the counts by `roleName` don't seem to add up to the total number with `interactionType` set to 'Production'. Let's check by finding the number of 'Production' records that have no `roleName`." 
] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5092" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_places.loc[(df_places['spatial_interactionType'] == 'Production') & (df_places['spatial_roleName'].isnull())].shape[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So, quite a lot of the places with a 'Production' relationship have no `roleName`. Let's look at a few." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n",
"<table>\n",
"<thead>\n",
"<tr><th></th><th>spatial_id</th><th>spatial_type</th><th>spatial_title</th><th>spatial_roleName</th><th>spatial_interactionType</th><th>spatial_geo</th><th>spatial_description</th><th>id</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>1</th><td>1187</td><td>Place</td><td>Mandiupi</td><td>NaN</td><td>Production</td><td>NaN</td><td>NaN</td><td>20174</td></tr>\n",
"<tr><th>11</th><td>80019</td><td>Place</td><td>Central Australia, Northern Territory, Australia</td><td>NaN</td><td>Production</td><td>-24.3617,133.735</td><td>NaN</td><td>6840</td></tr>\n",
"<tr><th>24</th><td>621</td><td>Place</td><td>Djinmalinjera, Northern Territory, Australia</td><td>NaN</td><td>Production</td><td>NaN</td><td>NaN</td><td>19877</td></tr>\n",
"<tr><th>26</th><td>1883</td><td>Place</td><td>Tasmania, Australia</td><td>NaN</td><td>Production</td><td>-41.9253,146.497</td><td>NaN</td><td>70975</td></tr>\n",
"<tr><th>37</th><td>20</td><td>Place</td><td>Darwin, Northern Territory, Australia</td><td>NaN</td><td>Production</td><td>-12.45,130.83</td><td>NaN</td><td>213694</td></tr>\n",
"</tbody>\n",
"</table>
\n", "
" ], "text/plain": [ " spatial_id spatial_type spatial_title \\\n", "1 1187 Place Mandiupi \n", "11 80019 Place Central Australia, Northern Territory, Australia \n", "24 621 Place Djinmalinjera, Northern Territory, Australia \n", "26 1883 Place Tasmania, Australia \n", "37 20 Place Darwin, Northern Territory, Australia \n", "\n", " spatial_roleName spatial_interactionType spatial_geo \\\n", "1 NaN Production NaN \n", "11 NaN Production -24.3617,133.735 \n", "24 NaN Production NaN \n", "26 NaN Production -41.9253,146.497 \n", "37 NaN Production -12.45,130.83 \n", "\n", " spatial_description id \n", "1 NaN 20174 \n", "11 NaN 6840 \n", "24 NaN 19877 \n", "26 NaN 70975 \n", "37 NaN 213694 " ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_places.loc[(df_places['spatial_interactionType'] == 'Production') & (df_places['spatial_roleName'].isnull())].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hmmm, that seems rather odd, but it shouldn't affect us too much. It just makes you wonder how those 'Production' values were set." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Created vs Collected\n", "\n", "So according to the data above, it seems we have two major ways of categorising the relationships between places and objects. We can filter the `roleName` field by 'Place collected', or we can filter `interactionType` by 'Production'. Is there any overlap between these two groups?" 
] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# How many records have both an interactionType equal to 'Production' and a roleName equal to 'Place collected'?\n", "df_places.loc[(df_places['spatial_interactionType'] == 'Production') & (df_places['spatial_roleName'] == 'Place collected')].shape[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Nope, no overlap. These two groups don't capture all of the place relationships, but they do represent distinct types of relationships and are roughly equal in size. But before we start making more maps, let's see how many places in each group have geo-coordinates.\n", "\n", "First the 'created' places:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "16546 of 16831 places (98.31%) with a \"created\" relationship have coordinates\n" ] } ], "source": [ "created_count = df_places.loc[(df_places['spatial_interactionType'] == 'Production')].shape[0]\n", "created_geo_count = df_places.loc[(df_places['spatial_interactionType'] == 'Production') & (df_places['spatial_geo'].notnull())].shape[0]\n", "print('{} of {} places ({:.2%}) with a \"created\" relationship have coordinates'.format(created_geo_count, created_count, created_geo_count / created_count))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now the 'collected' places:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 of 14604 places (0.00%) with a \"collected\" relationship have coordinates\n" ] } ], "source": [ "collected_count = df_places.loc[(df_places['spatial_roleName'] == 'Place collected')].shape[0]\n", "collected_geo_count = df_places.loc[(df_places['spatial_roleName'] == 'Place collected') & 
(df_places['spatial_geo'].notnull())].shape[0]\n", "print('{} of {} places ({:.2%}) with a \"collected\" relationship have coordinates'.format(collected_geo_count, collected_count, collected_geo_count / collected_count))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok... So in answer to our question above about what's missing, it seems that only places with a 'created' relationship have geo-coordinates. Let's see if we can fix that so we can map both 'created' and 'collected' records." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Enriching the place data\n", "\n", "As well as the place data that's embedded in object records, the NMA provides access to all of the place records in its system. These [can be harvested](harvest_records.ipynb) from the `/place` endpoint. Assuming that you've harvested all the place records, we can now use them to enrich the object records.\n", "\n", "First we'll load all the place records." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "db_places = TinyDB('nma_place_db.json')\n", "place_records = db_places.all()\n", "df_all_places = pd.DataFrame(place_records)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We're going to merge the place records with the object records, but before we do that, let's see if we can add information about country to the records.\n", "\n", "The `spatial_title` field is a string that often (but not always) includes the country as well as the place name, so it isn't a reliable way of identifying countries. An alternative is to use the geo-coordinates. Through a process known as reverse-geocoding, we can look up the country for a set of coordinates. To do this we're going to use the [reverse-geocode package](https://pypi.org/project/reverse-geocode/)."
] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "def find_country(row):\n", "    '''\n", "    Use reverse-geocode to get country details for a set of coordinates.\n", "    '''\n", "    try:\n", "        # search() takes a sequence of (lat, lon) tuples, so wrap the single pair in a one-item tuple\n", "        coords = (tuple(float(c) for c in row['geo'].split(',')),)\n", "        location = reverse_geocode.search(coords)\n", "        country = [location[0]['country_code'], location[0]['country']]\n", "    except AttributeError:\n", "        # Rows without coordinates hold NaN (a float) in 'geo', which has no split() method\n", "        country = []\n", "    return pd.Series(country, dtype='object')\n", "\n", "df_all_places[['country_code', 'country']] = df_all_places.apply(find_country, axis=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Did it work? Let's look at the `country` values." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Australia 2973\n", "United Kingdom 185\n", "United States 139\n", "Papua New Guinea 77\n", "Italy 38\n", " ... \n", "Cambodia 1\n", "Afghanistan 1\n", "Sudan 1\n", "Mozambique 1\n", " 1\n", "Name: country, Length: 126, dtype: int64" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_all_places['country'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now it's time to merge the place data we extracted from the object records with the complete set of place records. By linking records using the place `id`, we can append the information from the place records to the object records." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n",
"<table>\n",
"<thead>\n",
"<tr><th></th><th>spatial_id</th><th>spatial_type</th><th>spatial_title</th><th>spatial_roleName</th><th>spatial_interactionType</th><th>spatial_geo</th><th>spatial_description</th><th>id_x</th><th>id_y</th><th>type</th><th>title</th><th>geo</th><th>country_code</th><th>country</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>693</td><td>Place</td><td>Ernabella, South Australia, Australia</td><td>Place made</td><td>Production</td><td>-26.2642,132.176</td><td>NaN</td><td>124081</td><td>693</td><td>place</td><td>Ernabella, South Australia, Australia</td><td>-26.2642,132.176</td><td>AU</td><td>Australia</td></tr>\n",
"<tr><th>1</th><td>1187</td><td>Place</td><td>Mandiupi</td><td>NaN</td><td>Production</td><td>NaN</td><td>NaN</td><td>20174</td><td>1187</td><td>place</td><td>Mandiupi</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"<tr><th>2</th><td>333</td><td>Place</td><td>Broken Hill, New South Wales, Australia</td><td>Place collected</td><td>NaN</td><td>NaN</td><td>NaN</td><td>188741</td><td>333</td><td>place</td><td>Broken Hill, New South Wales, Australia</td><td>-31.95,141.45</td><td>AU</td><td>Australia</td></tr>\n",
"<tr><th>3</th><td>4600</td><td>Place</td><td>Canbbage Tree Island Public School, Cabbage Tr...</td><td>Place created</td><td>Production</td><td>-28.9842,153.457</td><td>NaN</td><td>42084</td><td>4600</td><td>place</td><td>Canbbage Tree Island Public School, Cabbage Tr...</td><td>-28.9842,153.457</td><td>AU</td><td>Australia</td></tr>\n",
"<tr><th>4</th><td>80126</td><td>Place</td><td>France</td><td>Associated place</td><td>NaN</td><td>NaN</td><td>NaN</td><td>148323</td><td>80126</td><td>place</td><td>France</td><td>46.5592,2.2742</td><td>FR</td><td>France</td></tr>\n",
"</tbody>\n",
"</table>
\n", "
" ], "text/plain": [ " spatial_id spatial_type spatial_title \\\n", "0 693 Place Ernabella, South Australia, Australia \n", "1 1187 Place Mandiupi \n", "2 333 Place Broken Hill, New South Wales, Australia \n", "3 4600 Place Canbbage Tree Island Public School, Cabbage Tr... \n", "4 80126 Place France \n", "\n", " spatial_roleName spatial_interactionType spatial_geo \\\n", "0 Place made Production -26.2642,132.176 \n", "1 NaN Production NaN \n", "2 Place collected NaN NaN \n", "3 Place created Production -28.9842,153.457 \n", "4 Associated place NaN NaN \n", "\n", " spatial_description id_x id_y type \\\n", "0 NaN 124081 693 place \n", "1 NaN 20174 1187 place \n", "2 NaN 188741 333 place \n", "3 NaN 42084 4600 place \n", "4 NaN 148323 80126 place \n", "\n", " title geo \\\n", "0 Ernabella, South Australia, Australia -26.2642,132.176 \n", "1 Mandiupi NaN \n", "2 Broken Hill, New South Wales, Australia -31.95,141.45 \n", "3 Canbbage Tree Island Public School, Cabbage Tr... -28.9842,153.457 \n", "4 France 46.5592,2.2742 \n", "\n", " country_code country \n", "0 AU Australia \n", "1 NaN NaN \n", "2 AU Australia \n", "3 AU Australia \n", "4 FR France " ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Merging on the place id in each dataframe -- in the objects data it's 'spatial_id', in the places it's just 'id'\n", "df_places_merged = pd.merge(df_places, df_all_places, how='left', left_on='spatial_id', right_on='id')\n", "df_places_merged.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The point of this was to try to get geo-coordinates for more of the places in the object records. Let's see if it worked by repeating our check on 'collected' places. Note that the appended field is `geo` rather than `spatial_geo`."
] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "14512 of 14604 places (99.37%) with a \"collected\" relationship have coordinates\n" ] } ], "source": [ "collected_count = df_places_merged.loc[(df_places_merged['spatial_roleName'] == 'Place collected')].shape[0]\n", "collected_geo_count = df_places_merged.loc[(df_places_merged['spatial_roleName'] == 'Place collected') & (df_places_merged['geo'].notnull())].shape[0]\n", "print('{} of {} places ({:.2%}) with a \"collected\" relationship have coordinates'.format(collected_geo_count, collected_count, collected_geo_count / collected_count))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Huzzah! Now we have geo-coordinates for almost all of the 'collected' places. Let's split the `geo` field into lats and lons as before." ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "df_places_merged[['lat', 'lon']] = df_places_merged['geo'].str.split(',', expand=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Objects by country\n", "\n", "Now that we have a `country_code` column we can use it to filter our data. For example, let's look at places where objects were created in Australia.\n", "\n", "First we'll filter our data by `interactionType` and `country_code`." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "df_created_aus = df_places_merged.loc[(df_places_merged['spatial_interactionType'] == 'Production') & (df_places_merged['country_code'] == 'AU')]" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n",
"<table>\n",
"<thead>\n",
"<tr><th></th><th>spatial_title</th><th>lat</th><th>lon</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>Ernabella, South Australia, Australia</td><td>-26.2642</td><td>132.176</td></tr>\n",
"<tr><th>3</th><td>Canbbage Tree Island Public School, Cabbage Tr...</td><td>-28.9842</td><td>153.457</td></tr>\n",
"<tr><th>11</th><td>Central Australia, Northern Territory, Australia</td><td>-24.3617</td><td>133.735</td></tr>\n",
"<tr><th>26</th><td>Tasmania, Australia</td><td>-41.9253</td><td>146.497</td></tr>\n",
"<tr><th>27</th><td>Melville Island, Tiwi Islands, Northern Territ...</td><td>-11.55</td><td>130.93</td></tr>\n",
"<tr><th>...</th><td>...</td><td>...</td><td>...</td></tr>\n",
"<tr><th>59568</th><td>Sandy Blight Junction, Northern Territory, Aus...</td><td>-23.1925</td><td>129.56</td></tr>\n",
"<tr><th>59582</th><td>Merigal, New South Wales, Australia</td><td>-31.5025</td><td>148.262</td></tr>\n",
"<tr><th>59697</th><td>Hampden, South Australia, Australia</td><td>-34.15</td><td>139.05</td></tr>\n",
"<tr><th>59830</th><td>Horn (Ngurupai) Island, Torres Strait, Queensl...</td><td>-10.6069</td><td>142.29</td></tr>\n",
"<tr><th>60110</th><td>Morley, Perth, Western Australia, Australia</td><td>-31.8872</td><td>115.907</td></tr>\n",
"</tbody>\n",
"</table>
\n", "

789 rows × 3 columns

\n", "
" ], "text/plain": [ " spatial_title lat lon\n", "0 Ernabella, South Australia, Australia -26.2642 132.176\n", "3 Canbbage Tree Island Public School, Cabbage Tr... -28.9842 153.457\n", "11 Central Australia, Northern Territory, Australia -24.3617 133.735\n", "26 Tasmania, Australia -41.9253 146.497\n", "27 Melville Island, Tiwi Islands, Northern Territ... -11.55 130.93\n", "... ... ... ...\n", "59568 Sandy Blight Junction, Northern Territory, Aus... -23.1925 129.56\n", "59582 Merigal, New South Wales, Australia -31.5025 148.262\n", "59697 Hampden, South Australia, Australia -34.15 139.05\n", "59830 Horn (Ngurupai) Island, Torres Strait, Queensl... -10.6069 142.29\n", "60110 Morley, Perth, Western Australia, Australia -31.8872 115.907\n", "\n", "[789 rows x 3 columns]" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_created_aus[['spatial_title', 'lat', 'lon']].drop_duplicates()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can create a map. Note that we're changing the map layer in this chart to use just Australian boundaries, not the world." ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.LayerChart(...)" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# remove duplicate places\n", "places = df_created_aus[['spatial_title', 'lat', 'lon']].drop_duplicates()\n", "\n", "# Load Australian boundaries\n", "australia = alt.topo_feature('https://raw.githubusercontent.com/GLAM-Workbench/trove-newspapers/master/data/aus_state.geojson', feature='features')\n", "\n", "# Create the map of Australia using the boundaries\n", "aus_background = alt.Chart(australia).mark_geoshape(\n", " \n", " # Style the map\n", " fill='lightgray',\n", " stroke='white'\n", ").project('equirectangular').properties(width=700)\n", "\n", "# Plot the places\n", "points = alt.Chart(places).mark_circle(\n", " \n", " # Style circle markers\n", " size=10,\n", " color='steelblue'\n", ").encode(\n", " \n", " # Set position of each place using lat and lon\n", " longitude='lon:Q',\n", " latitude='lat:Q',\n", " \n", " # More details on hover\n", " tooltip=[alt.Tooltip('spatial_title', title='Place'), 'lat', 'lon']\n", ").properties(width=700)\n", "\n", "# Combine map and points\n", "alt.layer(aus_background, points)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hmmm, what's with all that white space at the bottom? If you look closely, you'll see one blue dot right at the bottom of the chart. It's Commonwealth Bay in the Australian Antarctic Territory – technically part of Australia, but perhaps not what we expected. If we want our map centred on the Australian continent, we can filter out points with a latitude of less than -50.\n", "\n", "Pandas is fussy about comparing different types of things, so let's make sure it knows that the `lat` field contains floats." 
] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "df_places_merged['lat'] = df_places_merged['lat'].astype('float')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can filter the data." ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "df_created_aus = df_places_merged.loc[(df_places_merged['spatial_interactionType'] == 'Production') & (df_places_merged['country_code'] == 'AU') & (df_places_merged['lat'] > -50)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And update our chart." ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.LayerChart(...)" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Remove duplicate places\n", "places = df_created_aus[['spatial_title', 'lat', 'lon']].drop_duplicates()\n", "\n", "# Plot the places\n", "points = alt.Chart(places).mark_circle(\n", " \n", " # Style circle markers\n", " size=10,\n", " color='steelblue'\n", ").encode(\n", " \n", " # Set position of each place using lat and lon\n", " longitude='lon:Q',\n", " latitude='lat:Q',\n", " \n", " # More details on hover\n", " tooltip=[alt.Tooltip('spatial_title', title='Place'), 'lat', 'lon']\n", ").properties(width=700)\n", "\n", "# Combine map and points\n", "alt.layer(aus_background, points)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Created vs Collected – second try\n", "\n", "Now that we have locations for the 'collected' records, we can put both 'created' and 'collected' places on a map.\n", "\n", "To make things a bit easier, let's create a new column that indicates whether the place relationship is 'collected' or 'created'." ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [], "source": [ "def add_place_status(row):\n", "    '''\n", "    Determine relationship between object and place.\n", "    '''\n", "    if row['spatial_interactionType'] == 'Production':\n", "        status = 'created'\n", "    elif str(row['spatial_roleName']).lower() == 'place collected':\n", "        status = 'collected'\n", "    else:\n", "        status = None\n", "    return status\n", "\n", "# Add a new column to the dataframe showing the relationship between place and object\n", "df_places_merged['place_relation'] = df_places_merged.apply(add_place_status, axis=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll also filter out places without coordinates." 
] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [], "source": [ "df_places_merged_with_geo = df_places_merged.loc[(df_places_merged['geo'].notnull()) & (df_places_merged['place_relation'].notnull())]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And remove duplicates, based on both `spatial_id` and the new `place_relation` field." ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [], "source": [ "df_places_merged_with_geo = df_places_merged_with_geo.copy().drop_duplicates(subset=['spatial_id', 'place_relation'])" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.LayerChart(...)" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "background = alt.Chart(countries).mark_geoshape(\n", " fill='lightgray',\n", " stroke='white'\n", ").project('equirectangular').properties(width=700)\n", "\n", "# Plot the places\n", "points = alt.Chart(df_places_merged_with_geo).mark_circle(\n", " size=10,\n", ").encode(\n", " # Plot places by lat and lon\n", " longitude='lon:Q',\n", " latitude='lat:Q',\n", " \n", " # Details on hover\n", " tooltip=[alt.Tooltip('spatial_title', title='Place')],\n", " \n", " # Color will show whether 'collected' or 'created'\n", " color=alt.Color('place_relation:N', legend=alt.Legend(title='Relationship to place'))\n", ").properties(width=700)\n", "\n", "# Combine map and points\n", "alt.layer(background, points)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Number of objects associated with places\n", "\n", "So far we've only looked at the places themselves, but we can also find out how many objects are associated with each place. To do this, we'll create separate dataframes for our 'created' and 'collected' places, then we'll group them by place and count the number of grouped objects.\n", "\n", "First we'll filter out places without coordinates or place relations." 
] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": [ "# Filter places\n", "df_created_collected = df_places_merged.loc[(df_places_merged['geo'].notnull()) & (df_places_merged['place_relation'].notnull())]" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [], "source": [ "# New df with created places\n", "df_created = df_created_collected.loc[df_created_collected['place_relation'] == 'created']\n", "\n", "# Group created places by place fields -- count the number of grouped records, then convert the results to a new dataframe\n", "df_created_groups = df_created.groupby(by=['spatial_id', 'spatial_title', 'place_relation', 'lat', 'lon'])['id_x'].count().to_frame().reset_index().rename({'id_x': 'count'}, axis=1)\n", "\n", "# New df with collected places\n", "df_collected = df_created_collected.loc[df_created_collected['place_relation'] == 'collected']\n", "\n", "# Group collected places by place fields -- count the number of grouped records, then convert the results to a new dataframe\n", "df_collected_groups = df_collected.groupby(by=['spatial_id', 'spatial_title', 'place_relation', 'lat', 'lon'])['id_x'].count().to_frame().reset_index().rename({'id_x': 'count'}, axis=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we have new dataframes with counts by place and relationship, let's peek inside one." ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n",
"<table>\n",
"<thead>\n",
"<tr><th></th><th>spatial_id</th><th>spatial_title</th><th>place_relation</th><th>lat</th><th>lon</th><th>count</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>1</td><td>Yogjakarta, Java, Indonesia</td><td>created</td><td>-7.7956</td><td>110.369</td><td>2</td></tr>\n",
"<tr><th>1</th><td>10</td><td>Point Puer, Carnarvon Bay, Tasmania, Australia</td><td>created</td><td>-43.1500</td><td>147.86</td><td>2</td></tr>\n",
"<tr><th>2</th><td>100</td><td>Amoonguna, Northern Territory, Australia</td><td>created</td><td>-23.7700</td><td>133.93</td><td>1</td></tr>\n",
"<tr><th>3</th><td>1001</td><td>Government House, Yarralumla, Canberra, Austra...</td><td>created</td><td>-35.3014</td><td>149.077</td><td>2</td></tr>\n",
"<tr><th>4</th><td>1003</td><td>Kerang, Victoria, Australia</td><td>created</td><td>-35.7200</td><td>143.92</td><td>1</td></tr>\n",
"</tbody>\n",
"</table>\n", "
" ], "text/plain": [ " spatial_id spatial_title \\\n", "0 1 Yogjakarta, Java, Indonesia \n", "1 10 Point Puer, Carnarvon Bay, Tasmania, Australia \n", "2 100 Amoonguna, Northern Territory, Australia \n", "3 1001 Government House, Yarralumla, Canberra, Austra... \n", "4 1003 Kerang, Victoria, Australia \n", "\n", " place_relation lat lon count \n", "0 created -7.7956 110.369 2 \n", "1 created -43.1500 147.86 2 \n", "2 created -23.7700 133.93 1 \n", "3 created -35.3014 149.077 2 \n", "4 created -35.7200 143.92 1 " ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_created_groups.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can map the results! In this case we'll create two maps, but display them combined with a single legend." ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.VConcatChart(...)" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create the world map\n", "background = alt.Chart(countries).mark_geoshape(\n", " fill='lightgray',\n", " stroke='white'\n", ").project('equirectangular').properties(width=700)\n", "\n", "# First we'll plot the created places\n", "points = alt.Chart(df_created_groups).mark_circle().encode(\n", " \n", " # Position the circles\n", " longitude='lon:Q',\n", " latitude='lat:Q',\n", " \n", " # Hover for more details\n", " tooltip=[alt.Tooltip('count:Q', title='Number of objects'), alt.Tooltip('spatial_title', title='Place')],\n", " \n", " # Color shows the relationship types\n", " color=alt.Color('place_relation:N', title='Relationship to place'),\n", " \n", " # The size of the circles is determined by the number of objects\n", " size=alt.Size('count:Q',\n", " scale=alt.Scale(range=[0, 1000]),\n", " legend=alt.Legend(title='Number of objects')\n", " )\n", ").properties(width=700)\n", "\n", "# Create a map by combining the background map and the points\n", "created = alt.layer(background, points)\n", "\n", "# Now we'll plot the collected places\n", "points = alt.Chart(df_collected_groups).mark_circle().encode(\n", " \n", " # Position the circles\n", " longitude='lon:Q',\n", " latitude='lat:Q',\n", " \n", " # Hover for more details\n", " tooltip=[alt.Tooltip('count:Q', title='Number of objects'), alt.Tooltip('spatial_title', title='Place')],\n", " \n", " # Color shows the relationship types\n", " color=alt.Color('place_relation:N', title='Relationship to place'),\n", " \n", " # The size of the circles is determined by the number of objects\n", " size=alt.Size('count:Q',\n", " scale=alt.Scale(range=[0, 1000]),\n", " legend=alt.Legend(title='Number of objects')\n", " )\n", ").properties(width=700)\n", "\n", "# Create a map by combining the background map and the points\n", "collected = alt.layer(background, points)\n", "\n", "# 
Display the two maps together with a single legend.\n", "created & collected" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Why were 243 objects collected in Namibia?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Focus on Australia\n", "\n", "As we did above, we can zoom in on Australia by using the `country_code` to filter the data and dropping points with a latitude of less than -50. As before, we then group by place and count the number of objects in each group." ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [], "source": [ "# Filter by created places, limit country to AU, and exclude points south of latitude -50\n", "df_created_aus = df_created_collected.loc[(df_created_collected['place_relation'] == 'created') & (df_created_collected['country_code'] == 'AU') & (df_created_collected['lat'] > -50)]\n", "\n", "# Group created places by place fields -- count the number of grouped records, then convert the results to a new dataframe\n", "df_created_aus_groups = df_created_aus.groupby(by=['spatial_id', 'spatial_title', 'place_relation', 'lat', 'lon'])['id_x'].count().to_frame().reset_index().rename({'id_x': 'count'}, axis=1)\n", "\n", "# Filter by collected places, limit country to AU, and exclude points south of latitude -50\n", "df_collected_aus = df_created_collected.loc[(df_created_collected['place_relation'] == 'collected') & (df_created_collected['country_code'] == 'AU') & (df_created_collected['lat'] > -50)]\n", "\n", "# Group collected places by place fields -- count the number of grouped records, then convert the results to a new dataframe\n", "df_collected_aus_groups = df_collected_aus.groupby(by=['spatial_id', 'spatial_title', 'place_relation', 'lat', 'lon'])['id_x'].count().to_frame().reset_index().rename({'id_x': 'count'}, axis=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can map the results! As above, we'll create two maps, but display them combined with a single legend." ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.VConcatChart(...)" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Make map of Australia\n", "aus_background = alt.Chart(australia).mark_geoshape(\n", " fill='lightgray',\n", " stroke='white'\n", ").project('equirectangular').properties(width=700)\n", "\n", "# Plot points for created places\n", "points = alt.Chart(df_created_aus_groups).mark_circle().encode(\n", " \n", " # Position the markers\n", " longitude='lon:Q',\n", " latitude='lat:Q',\n", " \n", " # More detail on hover\n", " tooltip=[alt.Tooltip('count:Q', title='Number of objects'), alt.Tooltip('spatial_title', title='Place')],\n", " \n", " # Color determined by relationship type\n", " color=alt.Color('place_relation:N', title='Relationship to place'),\n", " \n", " # Size determined by the number of objects\n", " size=alt.Size('count:Q',\n", " scale=alt.Scale(range=[0, 1000]),\n", " legend=alt.Legend(title='Number of objects')\n", " )\n", ").properties(width=700)\n", "\n", "# Create a map by combining background and points\n", "created_aus = alt.layer(aus_background, points)\n", "\n", "# Plot points for collected places\n", "points = alt.Chart(df_collected_aus_groups).mark_circle().encode(\n", "\n", " # Position the markers\n", " longitude='lon:Q',\n", " latitude='lat:Q',\n", " \n", " # More detail on hover\n", " tooltip=[alt.Tooltip('count:Q', title='Number of objects'), alt.Tooltip('spatial_title', title='Place')],\n", " \n", " # Color determined by relationship type\n", " color=alt.Color('place_relation:N', title='Relationship to place'),\n", " \n", " # Size determined by the number of objects\n", " size=alt.Size('count:Q',\n", " scale=alt.Scale(range=[0, 1000]),\n", " legend=alt.Legend(title='Number of objects')\n", " )\n", ").properties(width=700)\n", "\n", "# Create a map by combining background and points\n", "collected_aus = alt.layer(aus_background, points)\n", "\n", "# Display the two maps together with a single 
legend.\n", "created_aus & collected_aus" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Number of objects by country\n", "\n", "Finally, let's look at the number of objects created/collected in each country. In this case we'll group by `country_code` rather than by `spatial_id`. Note that because the world boundaries use ISO numeric ids for countries, we have to use [country_converter](https://pypi.org/project/country_converter/) to get the numeric id for each country code and add it to our dataframe. We'll use this field to link the data and the map." ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [], "source": [ "# Group created places by country and count the results\n", "df_created_country_groups = df_created.groupby(by=['country_code', 'country'])['id_x'].count().to_frame().reset_index().rename({'id_x': 'count'}, axis=1)\n", "\n", "# Create a new field and add the numeric version of the country code using country_converter\n", "df_created_country_groups['numeric'] = df_created_country_groups['country_code'].apply(lambda x: coco.convert(x, to='ISOnumeric'))\n", "\n", "# Group collected places by country and count the results\n", "df_collected_country_groups = df_collected.groupby(by=['country_code', 'country'])['id_x'].count().to_frame().reset_index().rename({'id_x': 'count'}, axis=1)\n", "\n", "# Create a new field and add the numeric version of the country code using country_converter\n", "df_collected_country_groups['numeric'] = df_collected_country_groups['country_code'].apply(lambda x: coco.convert(x, to='ISOnumeric'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we'll create the choropleth maps. These are a little more complicated, as we need to link the country boundaries with the counts of objects." ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.VConcatChart(...)" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# We'll use the world map as a background to the choropleth, otherwise countries with no objects will be invisible!\n", "background = alt.Chart(countries).mark_geoshape(\n", " fill='lightgray',\n", " stroke='white'\n", ").project('equirectangular').properties(width=700)\n", "\n", "# Chart the created numbers by country - once again we use the world boundaries to define countries\n", "choro = alt.Chart(countries).mark_geoshape(\n", " stroke='white'\n", ").encode(\n", " \n", " # Color is determined by the number of objects\n", " color=alt.Color('count:Q', scale=alt.Scale(scheme='greenblue'), legend=alt.Legend(title='Number of objects')),\n", " \n", " # Hover for details\n", " tooltip=[alt.Tooltip('country:N', title='Country'), alt.Tooltip('count:Q', title='Number of objects')]\n", " \n", " # This is the critical section that links the map to the object data\n", ").transform_lookup(\n", " \n", " # This is the field that contains the country ids in the boundaries file\n", " lookup='id',\n", " \n", " # This is where we link the dataframe with the counts by country\n", " # The numeric field is the country identifier and will be used to connect data with country\n", " # We also need the count and country fields\n", " from_=alt.LookupData(df_created_country_groups, 'numeric', ['count', 'country'])\n", ").project('equirectangular').properties(width=700, title='Countries where objects were created')\n", "\n", "# Create the map by combining the background and the choropleth\n", "created_choro = alt.layer(background, choro)\n", "\n", "# Chart the collected numbers by country - once again we use the world boundaries to define countries\n", "choro = alt.Chart(countries).mark_geoshape(\n", " stroke='white'\n", ").encode(\n", " \n", " # Color is determined by the number of objects\n", " color=alt.Color('count:Q', 
scale=alt.Scale(scheme='greenblue'), legend=alt.Legend(title='Number of objects')),\n", " \n", " # Hover for details\n", " tooltip=[alt.Tooltip('country:N', title='Country'), alt.Tooltip('count:Q', title='Number of objects')]\n", " \n", " # This is the critical section that links the map to the object data\n", ").transform_lookup(\n", " \n", " # This is the field that contains the country ids in the boundaries file\n", " lookup='id',\n", " \n", " # This is where we link the dataframe with the counts by country\n", " # The numeric field is the country identifier and will be used to connect data with country\n", " # We also need the count and country fields\n", " from_=alt.LookupData(df_collected_country_groups, 'numeric', ['count', 'country'])\n", ").project('equirectangular').properties(width=700, title='Countries where objects were collected')\n", "\n", "# Create the map by combining the background and the choropleth\n", "collected_choro = alt.layer(background, choro)\n", "\n", "# Combine the two maps for display\n", "created_choro & collected_choro" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Save counts by country as a CSV file\n", "\n", "It might be handy to have a CSV file that shows the count of objects by relationship type and country. Let's combine the two dataframes, clean things up, and save." ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/html": [ "nma_object_numbers_by_country.csv
" ], "text/plain": [ "/Volumes/Workspace/mycode/glam-workbench/national-museum-australia/notebooks/nma_object_numbers_by_country.csv" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Merge the collected and created country data, linking on country\n", "df_countries = pd.merge(df_created_country_groups, df_collected_country_groups, on=['country_code', 'country', 'numeric'], how='outer')\n", "\n", "# Rename columns and fill null values with zeros\n", "df_countries = df_countries.rename({'count_x': 'number_created', 'count_y': 'number_collected'}, axis=1).fillna(0)\n", "\n", "# Convert count fields from floats to ints\n", "df_countries[['number_created', 'number_collected']] = df_countries[['number_created', 'number_collected']].astype(int)\n", "\n", "# Save to CSV\n", "df_countries[['country_code', 'country', 'number_created', 'number_collected']].to_csv('nma_object_numbers_by_country.csv', index=False)\n", "\n", "display(FileLink('nma_object_numbers_by_country.csv'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "\n", "Created by [Tim Sherratt](https://timsherratt.org/) for the [GLAM Workbench](https://glam-workbench.github.io/).\n", "\n", "Work on this notebook was supported by the [Humanities, Arts and Social Sciences (HASS) Data Enhanced Virtual Lab](https://tinker.edu.au/)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.7" } }, "nbformat": 4, "nbformat_minor": 4 }