{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Explore collection objects over time\n", "\n", "In this notebook we'll explore the temporal dimensions of the `object` data. When were objects created, collected, or used? To do that we'll extract the nested temporal data, see what's there, and create a few charts.\n", "\n", "[See here](exploring_object_records.ipynb) for an introduction to the `object` data, and [here to explore places](explore_objects_and_places.ipynb) associated with objects.\n", "\n", "If you haven't already, you'll either need to [harvest the `object` data](harvest_records.ipynb), or [unzip a pre-harvested dataset](unzip_preharvested_data.ipynb)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
If you haven't used one of these notebooks before, they're basically web pages in which you can write, edit, and run live code. They're meant to encourage experimentation, so don't feel nervous. Just try running a few cells and see what happens!\n", "\n", "Is this thing on? If you can't edit or run any of the code cells, you might be viewing a static (read only) version of this notebook. Click here to load a live version running on Binder.
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Import what we need" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "from ipyleaflet import Map, Marker, Popup, MarkerCluster\n", "import ipywidgets as widgets\n", "from tinydb import TinyDB, Query\n", "from pandas import json_normalize\n", "import altair as alt\n", "from IPython.display import display, HTML, FileLink" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load the harvested data" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Load JSON data from file\n", "db = TinyDB('nma_object_db.json')\n", "records = db.all()\n", "Object = Query()" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Convert to a dataframe\n", "df = pd.DataFrame(records)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Extract the nested events data\n", "\n", "Events are linked to objects through the `temporal` field. This field contains nested data that we need to extract and flatten so we can work with it easily. We'll use `json_normalize` to extract the nested data and save each event to a new row." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"<thead>\n",
"<tr><th></th><th>temporal_type</th><th>temporal_title</th><th>temporal_startDate</th><th>temporal_endDate</th><th>temporal_interactionType</th><th>temporal_roleName</th><th>temporal_description</th><th>id</th><th>title</th><th>additionalType</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>Event</td><td>21 February 2009</td><td>2009-02-21</td><td>2009-02-21</td><td>Production</td><td>NaN</td><td>NaN</td><td>195843</td><td>Reproduction cartoon titled 'Better than the b...</td><td>[Political cartoons]</td></tr>\n",
"<tr><th>1</th><td>Event</td><td>June 1908</td><td>1908-06</td><td>1908-06</td><td>NaN</td><td>Date of use</td><td>NaN</td><td>31257</td><td>Kind Regards From Newtown</td><td>[Postcards]</td></tr>\n",
"<tr><th>2</th><td>Event</td><td>26 January 1982</td><td>1982-01-26</td><td>1982-01-26</td><td>NaN</td><td>Associated date</td><td>NaN</td><td>135579</td><td>Protests during the campaign to save the Frank...</td><td>[Photographs]</td></tr>\n",
"<tr><th>3</th><td>Event</td><td>1926</td><td>1926</td><td>1926</td><td>NaN</td><td>Date acquired by donor</td><td>by Australian Institute of Anatomy</td><td>6840</td><td>Spinning top</td><td>[Centre of gravity toys]</td></tr>\n",
"<tr><th>4</th><td>Event</td><td>1872</td><td>1872</td><td>1872</td><td>NaN</td><td>Associated date</td><td>NaN</td><td>251967</td><td>Financial document from Tirranna Picnic Race C...</td><td>[Financial records]</td></tr>\n",
"</tbody>\n",
"</table>
" ], "text/plain": [ " temporal_type temporal_title temporal_startDate temporal_endDate \\\n", "0 Event 21 February 2009 2009-02-21 2009-02-21 \n", "1 Event June 1908 1908-06 1908-06 \n", "2 Event 26 January 1982 1982-01-26 1982-01-26 \n", "3 Event 1926 1926 1926 \n", "4 Event 1872 1872 1872 \n", "\n", " temporal_interactionType temporal_roleName \\\n", "0 Production NaN \n", "1 NaN Date of use \n", "2 NaN Associated date \n", "3 NaN Date acquired by donor \n", "4 NaN Associated date \n", "\n", " temporal_description id \\\n", "0 NaN 195843 \n", "1 NaN 31257 \n", "2 NaN 135579 \n", "3 by Australian Institute of Anatomy 6840 \n", "4 NaN 251967 \n", "\n", " title additionalType \n", "0 Reproduction cartoon titled 'Better than the b... [Political cartoons] \n", "1 Kind Regards From Newtown [Postcards] \n", "2 Protests during the campaign to save the Frank... [Photographs] \n", "3 Spinning top [Centre of gravity toys] \n", "4 Financial document from Tirranna Picnic Race C... [Financial records] " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Use json_normalise() to explode the temporal into multiple rows and columns\n", "# Then merge the exploded rows back with the original dataset using the id value\n", "# df_dates = pd.merge(df.loc[df['temporal'].notnull()], json_normalize(df.loc[df['temporal'].notnull()].to_dict('records'), record_path='temporal', meta=['id'], record_prefix='temporal_'), how='inner', on='id')\n", "df_dates = json_normalize(df.loc[df['temporal'].notnull()].to_dict('records'), record_path='temporal', meta=['id', 'title', 'additionalType'], record_prefix='temporal_')\n", "df_dates.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now instead of having one row for each object, we have one row for each object event.\n", "\n", "How many date records do we have?" 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(39219, 10)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_dates.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exploring events" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's extract years from the dates to make comparisons a bit easier." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# Use a regular expression to find the first four digits in the date fields\n", "df_dates['start_year'] = df_dates['temporal_startDate'].str.extract(r'^(\\d{4})').fillna(0).astype('int')\n", "df_dates['end_year'] = df_dates['temporal_endDate'].str.extract(r'^(\\d{4})').fillna(0).astype('int')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What's the earliest `start_year` (greater than 0)?" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1001" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_dates.loc[df_dates['start_year'] > 0]['start_year'].min()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What is it?" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "Poster titled 'Celebrating Indigenous Sport', 'Prime Minister's XI v ATSIC Chairman's XI', 19 April 2001" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "earliest = df_dates.loc[df_dates.loc[df_dates['start_year'] > 0]['start_year'].idxmin()]\n", "display(HTML('{}'.format(earliest['id'], earliest['title'])))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What's the latest end date?" 
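, "\n", "\n", "
As an aside, the year extraction used above can be sketched in plain Python. This is only an illustration with made-up date strings, not the notebook's pandas code: `extract_year` is a hypothetical helper, and the 1000 to 2019 plausibility window is an assumption chosen for the example rather than a rule from the NMA data.

```python
import re

# Made-up date strings in the formats seen in temporal_startDate, plus a missing value
dates = ['2009-02-21', '1908-06', '1926', None, '3001-01-01']


def extract_year(value):
    '''Return the leading four-digit year as an int, or 0 if there is none.'''
    match = re.match('[0-9]{4}', value) if value else None
    return int(match.group(0)) if match else 0


years = [extract_year(d) for d in dates]
print(years)

# Hypothetical plausibility window: both bounds are assumptions for this sketch
suspect = [y for y in years if y and not (1000 <= y <= 2019)]
print(suspect)
```

Using 0 for missing years mirrors the `fillna(0)` step above, which is why later filters in this notebook test for years greater than 0.
"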
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2992" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_dates['end_year'].max()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Oh, that doesn't look quite right! Let's look to see how many of the dates are in the future!" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"<thead>\n",
"<tr><th></th><th>temporal_type</th><th>temporal_title</th><th>temporal_startDate</th><th>temporal_endDate</th><th>temporal_interactionType</th><th>temporal_roleName</th><th>temporal_description</th><th>id</th><th>title</th><th>additionalType</th><th>start_year</th><th>end_year</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>5787</th><td>Event</td><td>17 September 2082</td><td>2082-09-17</td><td>2082-09-17</td><td>Production</td><td>NaN</td><td>NaN</td><td>213266</td><td>Courtroom sketch 'NT Ranger, Mr. Roth.' by Ver...</td><td>[Courtroom drawings]</td><td>2082</td><td>2082</td></tr>\n",
"<tr><th>6360</th><td>Event</td><td>7 January 2085</td><td>2085-01-07</td><td>2085-01-07</td><td>Production</td><td>NaN</td><td>NaN</td><td>195336</td><td>Woven basket with feathers and ochre</td><td>[Baskets]</td><td>2085</td><td>2085</td></tr>\n",
"<tr><th>12505</th><td>Event</td><td>20 March 2085</td><td>2085-03-20</td><td>2085-03-20</td><td>Production</td><td>NaN</td><td>NaN</td><td>146492</td><td>Feathered stick with handle</td><td>[Ornaments]</td><td>2085</td><td>2085</td></tr>\n",
"<tr><th>23828</th><td>Event</td><td>12 December 2992</td><td>2992-12-12</td><td>2992-12-12</td><td>NaN</td><td>Associated date</td><td>NaN</td><td>67099</td><td>Souvenir beaker - Princess Anne</td><td>[Commemorative mugs]</td><td>2992</td><td>2992</td></tr>\n",
"</tbody>\n",
"</table>
" ], "text/plain": [ " temporal_type temporal_title temporal_startDate temporal_endDate \\\n", "5787 Event 17 September 2082 2082-09-17 2082-09-17 \n", "6360 Event 7 January 2085 2085-01-07 2085-01-07 \n", "12505 Event 20 March 2085 2085-03-20 2085-03-20 \n", "23828 Event 12 December 2992 2992-12-12 2992-12-12 \n", "\n", " temporal_interactionType temporal_roleName temporal_description id \\\n", "5787 Production NaN NaN 213266 \n", "6360 Production NaN NaN 195336 \n", "12505 Production NaN NaN 146492 \n", "23828 NaN Associated date NaN 67099 \n", "\n", " title \\\n", "5787 Courtroom sketch 'NT Ranger, Mr. Roth.' by Ver... \n", "6360 Woven basket with feathers and ochre \n", "12505 Feathered stick with handle \n", "23828 Souvenir beaker - Princess Anne \n", "\n", " additionalType start_year end_year \n", "5787 [Courtroom drawings] 2082 2082 \n", "6360 [Baskets] 2085 2085 \n", "12505 [Ornaments] 2085 2085 \n", "23828 [Commemorative mugs] 2992 2992 " ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_dates.loc[(df_dates['start_year'] > 2019) | (df_dates['end_year'] > 2019)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looks like these records need some editing." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Types of events" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Events are linked to objects in many different ways, they might document when the object was created, collected, or acquired by the museum. We can examine the types of relationships that have been documented between events and objects by looking in the `temporal_roleName` field." 
] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Date of publication 5022\n", "Associated date 4015\n", "Date made 3950\n", "Date of event 2995\n", "Associated period 2973\n", "Date collected 2503\n", "Date of voyage 2477\n", "Date photographed 1979\n", "Period of use 1706\n", "Date created 1473\n", "Date of production 1030\n", "Date of use 936\n", "Date of issue 857\n", "Date acquired by donor 544\n", "Date acquired by NMA 451\n", "Date written 422\n", "Date of work 399\n", "Date compiled 201\n", "Date worn 198\n", "Date drawn 162\n", "Date of Event 139\n", "Date acquired 128\n", "Content created 120\n", "Date posted 116\n", "Date of purchase 78\n", "Date awarded 76\n", "Date printed 70\n", "Date presented 69\n", "Production date 58\n", "Date designed 47\n", "Date of death 24\n", "Date painted 20\n", "Date of restoration 18\n", "Date of conversion 14\n", "Date reprinted 12\n", "Date of correspondence 10\n", "Date of birth 9\n", "Date built 9\n", "Date of patent 9\n", "Date of Publication 7\n", "date created 6\n", "date of publication 5\n", "Date Acquired 4\n", "Date of Production 4\n", "date of production 2\n", "Date of Correspondence 2\n", "Date repographed 1\n", "date of correspondence 1\n", "Date reproduced 1\n", "date made 1\n", "Period of Use 1\n", "Date of Work 1\n", "Associated Period 1\n", "date painted 1\n", "Name: temporal_roleName, dtype: int64" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_dates['temporal_roleName'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hmmm, you can see that data entry into this field wasn't closely controlled – there are a number of minor variations in capitalisation, format and word order. 
For example, we have: 'Date of production', 'Date of Production', 'Production date', and 'date of production'!\n", "\n", "Some normalisation has taken place though, because the creation- and production-related events can be identified through the `temporal_interactionType` field. What sorts of values does it contain?" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Production 18012\n", "Name: temporal_interactionType, dtype: int64" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_dates['temporal_interactionType'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There's only one value – 'Production'. According to the [documentation](https://github.com/NationalMuseumAustralia/Collection-API/wiki/Searching-the-API#retrieving-objects-by-date-place-or-party), a value of 'Production' in `interactionType` indicates the event was related to the creation of the item. Let's look to see which of the values in `roleName` have been aggregated by the 'Production' value." 
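, "\n", "\n", "
As an aside on the capitalisation and word-order variants noted above, here is a minimal plain-Python sketch of how such labels could be collapsed by hand. It is not something the notebook or the NMA API does: `normalise_role` and the `ALIASES` table are hypothetical names, and the input list is a small made-up sample of the role names shown in the counts.

```python
from collections import Counter

# A small made-up sample of role names, including a missing value
role_names = [
    'Date of production', 'Date of Production', 'date of production',
    'Production date', 'Date of Event', 'Date of event', None,
]

# Hypothetical hand-made mapping for variants that differ in word order
ALIASES = {'production date': 'date of production'}


def normalise_role(name):
    '''Lower-case, collapse whitespace, and apply the alias table.'''
    if name is None:
        return None
    cleaned = ' '.join(name.lower().split())
    return ALIASES.get(cleaned, cleaned)


# Count the cleaned labels, leaving out missing values
counts = Counter(normalise_role(name) for name in role_names if name is not None)
print(counts.most_common())
```

Lower-casing handles the capitalisation variants automatically, but word-order variants like 'Production date' still need a manually maintained mapping, which is one reason a controlled field such as `temporal_interactionType` is easier to work with.
"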
] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Date of publication 5016\n", "Date made 3950\n", "Date photographed 1761\n", "Date created 1473\n", "Date of production 1030\n", "Date of issue 674\n", "Date written 422\n", "Date of work 374\n", "Date compiled 201\n", "Date drawn 162\n", "Content created 120\n", "Date posted 116\n", "Date printed 70\n", "Production date 58\n", "Date designed 47\n", "Date painted 20\n", "Date of restoration 18\n", "Date of conversion 14\n", "Date reprinted 12\n", "Date of correspondence 10\n", "Date of patent 9\n", "Date of Publication 7\n", "date created 6\n", "date of publication 5\n", "Date of Production 4\n", "Date of Correspondence 2\n", "date of production 2\n", "date of correspondence 1\n", "Date repographed 1\n", "date made 1\n", "Date reproduced 1\n", "Date of Work 1\n", "date painted 1\n", "Name: temporal_roleName, dtype: int64" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_dates.loc[(df_dates['temporal_interactionType'] == 'Production')]['temporal_roleName'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So the `temporal_interactionType` field helps us find all the creation-related events without dealing with the variations in the ways event types are described. Yay for normalisation!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creation dates" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's create a dataframe that contains just the creation dates." 
] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "df_creation_dates = df_dates.loc[(df_dates['temporal_interactionType'] == 'Production')].copy()" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(18012, 12)" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_creation_dates.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One other thing to note is that not every event has a start date. Some just have an end date. To make sure we have at least one date for every event, let's create a new `year` column – we'll set its value to `start_year` if it exists, or `end_year` if not." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "df_creation_dates['year'] = df_creation_dates.apply(lambda x: x['start_year'] if x['start_year'] else x['end_year'], axis=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Time to make a chart! Let's show how the creation events are distributed over time." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "# First we'll get the number of objects per year\n", "year_counts = df_creation_dates['year'].value_counts().to_frame().reset_index()\n", "year_counts.columns = ['year', 'count']" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a bar chart (limit to years greater than 0)\n", "alt.Chart(year_counts.loc[year_counts['year'] > 0]).mark_bar(size=2).encode(\n", " \n", " # Year on the X axis\n", " x=alt.X('year:Q', axis=alt.Axis(format='c', title='Year of production')),\n", " \n", " # Number of objects on the Y axis\n", " y=alt.Y('count:Q', title='Number of objects'),\n", " \n", " # Show details on hover (the data is already aggregated, so use the count column)\n", " tooltip=[alt.Tooltip('year:Q', title='Year'), alt.Tooltip('count:Q', title='Objects', format=',')]\n", ").properties(width=700)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok, so something interesting was happening in 1980 and 1913. Let's see if we can find out what." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In another notebook I showed how you can [use the `additionalType` column](exploring_object_records.ipynb#The-additionalType-field) to find out about the types of things in the collection. Let's use it to see what types of objects were created in 1980.\n", "\n", "Let's explode `additionalType` and create a new dataframe with the results!" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"<thead>\n",
"<tr><th></th><th>id</th><th>title</th><th>year</th><th>additionalType</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>195843</td><td>Reproduction cartoon titled 'Better than the b...</td><td>2009</td><td>Political cartoons</td></tr>\n",
"<tr><th>5</th><td>59924</td><td>Walka design from Ernabella</td><td>1954</td><td>Acrylic paintings</td></tr>\n",
"<tr><th>8</th><td>33064</td><td>Wonderland city, Sydney, 1908</td><td>1906</td><td>Photographic postcards</td></tr>\n",
"<tr><th>10</th><td>19877</td><td>Cylindrical hollow wood pipe with protruding bowl</td><td>1973</td><td>Smoking pipes</td></tr>\n",
"<tr><th>12</th><td>124027</td><td>Oak oil stone</td><td>1790</td><td>Sharpening stones</td></tr>\n",
"</tbody>\n",
"</table>
" ], "text/plain": [ " id title year \\\n", "0 195843 Reproduction cartoon titled 'Better than the b... 2009 \n", "5 59924 Walka design from Ernabella 1954 \n", "8 33064 Wonderland city, Sydney, 1908 1906 \n", "10 19877 Cylindrical hollow wood pipe with protruding bowl 1973 \n", "12 124027 Oak oil stone 1790 \n", "\n", " additionalType \n", "0 Political cartoons \n", "5 Acrylic paintings \n", "8 Photographic postcards \n", "10 Smoking pipes \n", "12 Sharpening stones " ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_creation_dates_types = df_creation_dates.loc[df_creation_dates['additionalType'].notnull()][['id', 'title', 'year', 'additionalType']].explode('additionalType')\n", "df_creation_dates_types.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can filter by year to see what types of things were created in 1980." ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"<thead>\n",
"<tr><th></th><th>id</th><th>title</th><th>year</th><th>additionalType</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>16</th><td>166857</td><td>Spergularia media</td><td>1980</td><td>Mounts</td></tr>\n",
"<tr><th>28</th><td>166221</td><td>Centranthera cochinchinensis</td><td>1980</td><td>Engravings</td></tr>\n",
"<tr><th>36</th><td>167935</td><td>Carpha alpina var. schoenoides</td><td>1980</td><td>Mounts</td></tr>\n",
"<tr><th>60</th><td>166367</td><td>Persoonia levis</td><td>1980</td><td>Engravings</td></tr>\n",
"<tr><th>79</th><td>165539</td><td>Triumfetta repens</td><td>1980</td><td>Engravings</td></tr>\n",
"</tbody>\n",
"</table>
" ], "text/plain": [ " id title year additionalType\n", "16 166857 Spergularia media 1980 Mounts\n", "28 166221 Centranthera cochinchinensis 1980 Engravings\n", "36 167935 Carpha alpina var. schoenoides 1980 Mounts\n", "60 166367 Persoonia levis 1980 Engravings\n", "79 165539 Triumfetta repens 1980 Engravings" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "created_1980 = df_creation_dates_types.loc[df_creation_dates_types['year'] == 1980]\n", "created_1980.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's look at the top twenty types of things created in 1980!" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Engravings 1486\n", "Mounts 743\n", "Folders 100\n", "Lists 42\n", "Notes 36\n", "Boxes 35\n", "Technical notes 34\n", "Cartoons 5\n", "Paintings 4\n", "Placards 3\n", "Journals 3\n", "Storybooks 2\n", "Advertising posters 2\n", "Jugs 2\n", "Books 2\n", "Botanical drawings 2\n", "Passes 2\n", "Textbooks 2\n", "Netballs 1\n", "Event posters 1\n", "Name: additionalType, dtype: int64" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "created_1980['additionalType'].value_counts()[:20]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So the vast majority are either 'Engravings' or 'Mounts'. Let's look at one of the 'Engravings' in more detail." ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"<thead>\n",
"<tr><th></th><th>id</th><th>title</th><th>year</th><th>additionalType</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>28</th><td>166221</td><td>Centranthera cochinchinensis</td><td>1980</td><td>Engravings</td></tr>\n",
"<tr><th>60</th><td>166367</td><td>Persoonia levis</td><td>1980</td><td>Engravings</td></tr>\n",
"<tr><th>79</th><td>165539</td><td>Triumfetta repens</td><td>1980</td><td>Engravings</td></tr>\n",
"<tr><th>155</th><td>167443</td><td>Hibiscus tiliaceus subsp. hastatus Malvaceae</td><td>1980</td><td>Engravings</td></tr>\n",
"<tr><th>195</th><td>167685</td><td>Lecanthus solandri</td><td>1980</td><td>Engravings</td></tr>\n",
"</tbody>\n",
"</table>
" ], "text/plain": [ " id title year additionalType\n", "28 166221 Centranthera cochinchinensis 1980 Engravings\n", "60 166367 Persoonia levis 1980 Engravings\n", "79 165539 Triumfetta repens 1980 Engravings\n", "155 167443 Hibiscus tiliaceus subsp. hastatus Malvaceae 1980 Engravings\n", "195 167685 Lecanthus solandri 1980 Engravings" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Filter by Engravings\n", "created_1980.loc[created_1980['additionalType'] == 'Engravings'].head()" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/html": [ "Centranthera cochinchinensis" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Get the first item\n", "item = created_1980.loc[created_1980['additionalType'] == 'Engravings'].iloc[0]\n", "\n", "# Create a link to the collection db\n", "display(HTML('{}'.format(item['id'], item['title'])))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you follow the link you'll find that the engravings were created for a new publication of Banks' *Florilegium*.\n", "\n", "Can you repeat this process to find out what happened in 1913?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creation dates by object type\n", "\n", "Now that we have a dataframe that combines creation dates with object types, we can look at how the creation of particular object types changes over time. For example let's look at 'Photographs' and 'Postcards'." ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a dataframe containing just Photographs and Postcards -- use .isin() to filter the additionalType field\n", "df_photos_postcards = df_creation_dates_types.loc[(df_creation_dates_types['year'] > 0) & (df_creation_dates_types['additionalType'].isin(['Photographs', 'Postcards']))]\n", "\n", "# Create a stacked bar chart\n", "alt.Chart(df_photos_postcards).mark_bar(size=3).encode(\n", " \n", " # Year on the X axis\n", " x=alt.X('year:Q', axis=alt.Axis(format='c', title='Year of production')),\n", " \n", " # Number of objects on the Y axis\n", " y=alt.Y('count()', title='Number of objects'),\n", " \n", " # Color according to the type\n", " color='additionalType:N',\n", " \n", " # Details on hover\n", " tooltip=[alt.Tooltip('additionalType:N', title='Type'), alt.Tooltip('year:Q', title='Year'), alt.Tooltip('count():Q', title='Objects', format=',')]\n", ").properties(width=700)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There's 1913 again... It's also interesting to see a shift from postcards to photos in the early decades of the 20th century.\n", "\n", "We could add additional types to this chart, but it will get a bit confusing. Let's try another way of charting changes in the creation of the most common object types over time.\n", "\n", "First we'll get the top twenty-five object types (which have creation dates) as a list." 
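, "\n", "\n", "
To make the explode-then-count idea concrete, here is a minimal sketch in plain Python. It uses a few made-up rows rather than the harvested data, and the names `rows`, `exploded`, and `n` are assumptions for illustration. The pandas calls in this notebook do the equivalent work on the real dataframe.

```python
from collections import Counter

# Made-up rows: each object can carry several additionalType values, or none
rows = [
    {'id': 1, 'year': 1913, 'additionalType': ['Photographs', 'Postcards']},
    {'id': 2, 'year': 1913, 'additionalType': ['Photographs']},
    {'id': 3, 'year': 1980, 'additionalType': ['Engravings']},
    {'id': 4, 'year': 1980, 'additionalType': ['Engravings']},
    {'id': 5, 'year': 1980, 'additionalType': ['Engravings']},
    {'id': 6, 'year': 1980, 'additionalType': None},
]

# Explode: one row per (object, type) pair, skipping objects with no type
exploded = [
    {'id': row['id'], 'year': row['year'], 'additionalType': type_name}
    for row in rows
    for type_name in (row['additionalType'] or [])
]

# Count the types and keep the names of the n most common
n = 2
type_counts = Counter(r['additionalType'] for r in exploded)
top_types = [name for name, _ in type_counts.most_common(n)]
print(top_types)
```

Because an object with two types becomes two rows after exploding, counts taken this way are counts of type assignments, not of distinct objects.
"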
] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Engravings',\n", " 'Bark paintings',\n", " 'Cartoons',\n", " 'Negatives',\n", " 'Mounts',\n", " 'Photographs',\n", " 'Paintings',\n", " 'Prints',\n", " 'Drawings',\n", " 'Photographic postcards',\n", " 'Acrylic paintings',\n", " 'Letters',\n", " 'Books',\n", " 'Photographic slides',\n", " 'Postcards',\n", " 'Courtroom drawings',\n", " 'Glass plate negatives',\n", " 'Cards',\n", " 'Botanical specimens',\n", " 'Prize certificates',\n", " 'Collecting cards',\n", " 'Posters',\n", " 'Sculptures',\n", " 'Portrait photographs',\n", " 'Telegrams']" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get most common 25 values and convert to a list\n", "top_types = df_creation_dates_types['additionalType'].value_counts()[:25].index.to_list()\n", "top_types" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we'll use the list of `top_types` to filter the creation dates, so we only have events relating to those types of objects." ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "# Only include records where the additionalType value is in the list of top_types\n", "df_top_types = df_creation_dates_types.loc[(df_creation_dates_types['year'] > 0) & (df_creation_dates_types['additionalType'].isin(top_types))]" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [], "source": [ "# Get the counts for year / type\n", "top_type_counts = df_top_types.groupby('year')['additionalType'].value_counts().to_frame()\n", "top_type_counts.columns = ['count']\n", "top_type_counts.reset_index(inplace=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To chart this data we're going to use circles for each point and create 'bubble lines' for each object type to show how the number of objects created varied year by year." 
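, "\n", "\n", "
The year / type counts built above have a simple shape: one row per combination of year and object type, with a count. A minimal plain-Python sketch of that shape, using made-up pairs rather than the real data (`pairs` and `tidy` are hypothetical names), might look like this:

```python
from collections import Counter

# Made-up (year, type) pairs standing in for the exploded creation-date rows
pairs = [
    (1913, 'Photographs'), (1913, 'Photographs'), (1913, 'Postcards'),
    (1980, 'Engravings'), (1980, 'Engravings'), (1980, 'Engravings'),
]

# Count each (year, type) combination, then lay the result out as tidy rows
pair_counts = Counter(pairs)
tidy = [
    {'year': year, 'additionalType': type_name, 'count': count}
    for (year, type_name), count in sorted(pair_counts.items())
]
for row in tidy:
    print(row)
```

Each tidy row maps directly onto one circle in a bubble chart: the year gives the horizontal position, the type gives the row, and the count gives the size.
"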
] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a chart\n", "alt.Chart(top_type_counts).mark_circle(\n", " \n", " # Style the circles\n", " opacity=0.8,\n", " stroke='black',\n", " strokeWidth=1\n", ").encode(\n", " \n", " # Year on the X axis\n", " x=alt.X('year:O', axis=alt.Axis(format='c', title='Year of production', labelAngle=0)),\n", " \n", " # Object type on the Y axis\n", " y=alt.Y('additionalType:N', title='Object type'),\n", " \n", " # Size of the circles represents the number of objects\n", " size=alt.Size('count:Q',\n", " scale=alt.Scale(range=[0, 2000]),\n", " legend=alt.Legend(title='Number of objects')\n", " ),\n", " \n", " # Color the circles by object type\n", " color=alt.Color('additionalType:N', legend=None),\n", " \n", " # More details on hover\n", " tooltip=[alt.Tooltip('additionalType:N', title='Type'), alt.Tooltip('year:O', title='Year'), alt.Tooltip('count:Q', title='Objects', format=',')]\n", ").properties(\n", " width=700\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What patterns can you see? Hover over the circles for more information. Once again the engravings dominate, but also look at the bark paintings and cartoons. What might be happening there?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "\n", "Created by [Tim Sherratt](https://timsherratt.org/) for the [GLAM Workbench](https://glam-workbench.github.io/).\n", "\n", "Work on this notebook was supported by the [Humanities, Arts and Social Sciences (HASS) Data Enhanced Virtual Lab](https://tinker.edu.au/)." 
] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.7" } }, "nbformat": 4, "nbformat_minor": 4 }