{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Exploring object records" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this notebook we'll have a preliminary poke around in the `object` data harvested from the [NMA Collection API](https://www.nma.gov.au/about/our-collection/our-apis). I'll focus here on the basic shape and stats of the data; other notebooks explore the object data over [time](explore_collection_object_over_time.ipynb) and [space](explore_objects_and_places.ipynb).\n", "\n", "If you haven't already, you'll need to either [harvest the `object` data](harvest_records.ipynb) or [unzip a pre-harvested dataset](unzip_preharvested_data.ipynb).\n", "\n", "* [The shape of the data](#The-shape-of-the-data)\n", "* [Nested data](#Nested-data)\n", "* [The `additionalType` field](#The-additionalType-field)\n", "* [The `extent` field](#The-extent-field)\n", "* [How big is the collection?](#How-big-is-the-collection?)\n", "* [The biggest object?](#The-biggest-object?)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

If you haven't used one of these notebooks before, they're basically web pages in which you can write, edit, and run live code. They're meant to encourage experimentation, so don't feel nervous. Just try running a few cells and see what happens!

\n", "\n", "

\n", "

\n", "

\n", "\n", "

Is this thing on? If you can't edit or run any of the code cells, you might be viewing a static (read only) version of this notebook. Click here to load a live version running on Binder.

\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Import what we need" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import math\n", "from IPython.display import display, HTML, FileLink\n", "from tinydb import TinyDB, Query\n", "from pandas.io.json import json_normalize" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load the harvested data" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "# Load the harvested data from the json db\n", "db = TinyDB('nma_object_db.json')\n", "records = db.all()\n", "Object = Query()" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id type title \\\n", "0 145400 object Wahlo and Tribal law by Kevin Gilbert, reprint... \n", "1 251390 object Pair of woven shoes made from feathers and hair \n", "2 124081 object Pair of ceremonial shoes \n", "3 21507 object Grinding stone \n", "4 142308 object 'time CHange' [sic] \n", "\n", " _meta additionalType \\\n", "0 {'modified': '2018-07-09', 'issued': '2011-10-... NaN \n", "1 {'modified': '2019-01-17', 'issued': '2018-04-... [Shoes] \n", "2 {'modified': '2018-12-04', 'issued': '2006-10-... NaN \n", "3 {'modified': '2018-06-19', 'issued': '2014-12-... [Grinding stones] \n", "4 {'modified': '2019-04-15', 'issued': '2012-06-... [Compact discs] \n", "\n", " collection identifier \\\n", "0 NaN NaN \n", "1 {'id': '5244', 'type': 'Collection', 'title': ... 2000.0014.0495 \n", "2 {'id': '1892', 'type': 'Collection', 'title': ... 1992.0089.0165 \n", "3 {'id': '2229', 'type': 'Collection', 'title': ... 1985.0288.0109 \n", "4 {'id': '3893', 'type': 'Collection', 'title': ... AR00213.012 \n", "\n", " medium \\\n", "0 NaN \n", "1 [{'type': 'Material', 'title': 'Feather'}, {'t... \n", "2 [{'type': 'Material', 'title': 'Feather'}] \n", "3 NaN \n", "4 NaN \n", "\n", " extent \\\n", "0 NaN \n", "1 {'type': 'Measurement', 'length': 260, 'width'... \n", "2 {'type': 'Measurement', 'length': 246, 'width'... \n", "3 NaN \n", "4 NaN \n", "\n", " physicalDescription ... isPartOf seeAlso \\\n", "0 NaN ... NaN NaN \n", "1 Shoes, the soles of which are made from woven ... ... NaN NaN \n", "2 A pair of ceremonial shoes made with several m... ... NaN NaN \n", "3 NaN ... NaN NaN \n", "4 A compact disc, housed within a clear and blac... ... 
NaN NaN \n", "\n", " description hasVersion temporal relation hasPart location acknowledgement \\\n", "0 NaN NaN NaN NaN NaN NaN NaN \n", "1 NaN NaN NaN NaN NaN NaN NaN \n", "2 NaN NaN NaN NaN NaN NaN NaN \n", "3 NaN NaN NaN NaN NaN NaN NaN \n", "4 NaN NaN NaN NaN NaN NaN NaN \n", "\n", " educationalSignificance \n", "0 NaN \n", "1 NaN \n", "2 NaN \n", "3 NaN \n", "4 NaN \n", "\n", "[5 rows x 25 columns]" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Convert to a dataframe\n", "df = pd.DataFrame(records)\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The shape of the data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How many objects are there?" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "There are 86,679 objects in the collection\n" ] } ], "source": [ "print('There are {:,} objects in the collection'.format(df.shape[0]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Obviously not every record has a value for every field, let's create a quick count of the number of values in each field." 
] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "id 86679\n", "type 86679\n", "title 86463\n", "_meta 86679\n", "additionalType 86652\n", "collection 84256\n", "identifier 86654\n", "medium 73743\n", "extent 64077\n", "physicalDescription 86359\n", "significanceStatement 32437\n", "creator 25076\n", "spatial 46658\n", "contributor 40796\n", "isAggregatedBy 4353\n", "isPartOf 10718\n", "seeAlso 467\n", "description 9097\n", "hasVersion 19845\n", "temporal 29399\n", "relation 3066\n", "hasPart 2345\n", "location 1364\n", "acknowledgement 785\n", "educationalSignificance 201\n", "dtype: int64" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.count()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's express those counts as a percentage of the total number of records, and display them as a bar chart using Pandas." ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", "
    field                    count  proportion
0   id                       86679  100.00%
1   type                     86679  100.00%
2   title                    86463  99.75%
3   _meta                    86679  100.00%
4   additionalType           86652  99.97%
5   collection               84256  97.20%
6   identifier               86654  99.97%
7   medium                   73743  85.08%
8   extent                   64077  73.92%
9   physicalDescription      86359  99.63%
10  significanceStatement    32437  37.42%
11  creator                  25076  28.93%
12  spatial                  46658  53.83%
13  contributor              40796  47.07%
14  isAggregatedBy           4353   5.02%
15  isPartOf                 10718  12.37%
16  seeAlso                  467    0.54%
17  description              9097   10.50%
18  hasVersion               19845  22.89%
19  temporal                 29399  33.92%
20  relation                 3066   3.54%
21  hasPart                  2345   2.71%
22  location                 1364   1.57%
23  acknowledgement          785    0.91%
24  educationalSignificance  201    0.23%
" ], "text/plain": [ "" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get field counts and convert to dataframe\n", "field_counts = df.count().to_frame().reset_index()\n", "\n", "# Change column headings\n", "field_counts.columns = ['field', 'count']\n", "\n", "# Calculate proportion of the total\n", "field_counts['proportion'] = field_counts['count'].apply(lambda x: x / df.shape[0])\n", "\n", "# Style the results as a barchart\n", "field_counts.style.bar(subset=['proportion'], color='#d65f5f').format({'proportion': '{:.2%}'.format})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Nested data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One thing you might note is that some of the fields contain nested JSON arrays or objects. For example `additionalType` contains a list of object types, while `extent` is a dictionary with keys and values. Let's unpack these columns for the second row (index of 1)." ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Shoes'" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['additionalType'][1][0]" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'type': 'Measurement',\n", " 'length': 260,\n", " 'width': 120,\n", " 'depth': 40,\n", " 'unitText': 'mm'}" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['extent'][1]" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "260" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['extent'][1]['length']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The `additionalType` field" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How many objects have values in the 
`additionalType` column?" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(86652, 25)" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[df['additionalType'].notnull()].shape" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "99.968851% of objects have an additionalType value\n" ] } ], "source": [ "print('{:%} of objects have an additionalType value'.format(df.loc[df['additionalType'].notnull()].shape[0] / df.shape[0]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So which ones don't have an `additionalType`?" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id type title \\\n", "0 145400 object Wahlo and Tribal law by Kevin Gilbert, reprint... \n", "2 124081 object Pair of ceremonial shoes \n", "1054 224632 object Glass plate negative of family and horse stand... \n", "1276 180161 object Awelye- panel 1 by Lily Kngwarreye \n", "2333 180168 object Awelye- panel 5 by Lily Kngwarreye \n", "\n", " _meta additionalType \\\n", "0 {'modified': '2018-07-09', 'issued': '2011-10-... NaN \n", "2 {'modified': '2018-12-04', 'issued': '2006-10-... NaN \n", "1054 {'copyright': '', 'licence': ''} NaN \n", "1276 {'copyright': '', 'licence': ''} NaN \n", "2333 {'copyright': '', 'licence': ''} NaN \n", "\n", " collection identifier \\\n", "0 NaN NaN \n", "2 {'id': '1892', 'type': 'Collection', 'title': ... 1992.0089.0165 \n", "1054 NaN NaN \n", "1276 NaN NaN \n", "2333 NaN NaN \n", "\n", " medium \\\n", "0 NaN \n", "2 [{'type': 'Material', 'title': 'Feather'}] \n", "1054 NaN \n", "1276 NaN \n", "2333 NaN \n", "\n", " extent \\\n", "0 NaN \n", "2 {'type': 'Measurement', 'length': 246, 'width'... \n", "1054 NaN \n", "1276 NaN \n", "2333 NaN \n", "\n", " physicalDescription ... isPartOf seeAlso \\\n", "0 NaN ... NaN NaN \n", "2 A pair of ceremonial shoes made with several m... ... NaN NaN \n", "1054 NaN ... NaN NaN \n", "1276 NaN ... NaN NaN \n", "2333 NaN ... 
NaN NaN \n", "\n", " description hasVersion temporal relation hasPart location \\\n", "0 NaN NaN NaN NaN NaN NaN \n", "2 NaN NaN NaN NaN NaN NaN \n", "1054 NaN NaN NaN NaN NaN NaN \n", "1276 NaN NaN NaN NaN NaN NaN \n", "2333 NaN NaN NaN NaN NaN NaN \n", "\n", " acknowledgement educationalSignificance \n", "0 NaN NaN \n", "2 NaN NaN \n", "1054 NaN NaN \n", "1276 NaN NaN \n", "2333 NaN NaN \n", "\n", "[5 rows x 25 columns]" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Just show the first 5 rows\n", "df.loc[df['additionalType'].isnull()].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How many rows have more than one `additionalType`?" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1037" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[df['additionalType'].str.len() > 1].shape[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's have a look at a sample." ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id type title \\\n", "45 202601 object Album of Newspaper clippings \n", "118 223557 object Receipt issued to Tirranna Race Club, 1878 \n", "155 227915 object Two toned ceramic toy tea set \n", "173 256766 object Handmade wolf figurine in yellow dress likely ... \n", "564 224635 object Photograph of'Freda Mitchell' \n", "\n", " _meta \\\n", "45 {'modified': '2019-04-22', 'issued': '2010-11-... \n", "118 {'modified': '2019-04-23', 'issued': '2017-11-... \n", "155 {'modified': '2019-05-17', 'issued': '2018-08-... \n", "173 {'modified': '2018-12-13', 'issued': '2018-10-... \n", "564 {'modified': '2019-07-01', 'issued': '2018-11-... \n", "\n", " additionalType \\\n", "45 [Albums, Newspaper clippings] \n", "118 [Invoices, Receipts] \n", "155 [Tea sets, Toy tea sets] \n", "173 [Novelty toys, Toys] \n", "564 [Photographs, Sepia photographs] \n", "\n", " collection identifier \\\n", "45 {'id': '4760', 'type': 'Collection', 'title': ... 1989.0009.0108 \n", "118 {'id': '6139', 'type': 'Collection', 'title': ... 2012.0019.0170 \n", "155 {'id': '6773', 'type': 'Collection', 'title': ... 2013.0038.0255 \n", "173 {'id': '6773', 'type': 'Collection', 'title': ... 2013.0038.0556.005 \n", "564 {'id': '6339', 'type': 'Collection', 'title': ... 2013.0062.0017.002 \n", "\n", " medium \\\n", "45 [{'type': 'Material', 'title': 'Cardboard'}, {... \n", "118 [{'type': 'Material', 'title': 'Ink'}, {'type'... \n", "155 [{'type': 'Material', 'title': 'Ceramic'}, {'t... \n", "173 [{'type': 'Material', 'title': 'Cotton thread'... \n", "564 [{'type': 'Material', 'title': 'Card'}, {'type... \n", "\n", " extent \\\n", "45 {'type': 'Measurement', 'height': 345, 'width'... \n", "118 {'type': 'Measurement', 'height': 114, 'width'... \n", "155 {'type': 'Measurement', 'height': 15, 'diamete... \n", "173 {'type': 'Measurement', 'height': 88, 'width':... \n", "564 {'type': 'Measurement', 'height': 147, 'width'... \n", "\n", " physicalDescription ... 
isPartOf seeAlso \\\n", "45 A brown textured hardback album with gold colo... ... NaN NaN \n", "118 A receipt handwritten on a piece of grey paper... ... NaN NaN \n", "155 A hand-painted ceramic toy tea set with a blue... ... NaN NaN \n", "173 A handmade wolf figurine robed in a yellow dre... ... NaN NaN \n", "564 A sepia photograph showing a young woman posin... ... NaN NaN \n", "\n", " description hasVersion temporal \\\n", "45 NaN NaN [{'type': 'Event', 'title': '1935', 'startDate... \n", "118 NaN NaN [{'type': 'Event', 'title': '1878', 'startDate... \n", "155 NaN NaN [{'type': 'Event', 'title': '1925 - 1935', 'st... \n", "173 NaN NaN [{'type': 'Event', 'title': '1925 - 1935', 'st... \n", "564 NaN NaN NaN \n", "\n", " relation hasPart location \\\n", "45 NaN NaN NaN \n", "118 NaN NaN NaN \n", "155 NaN NaN NaN \n", "173 NaN NaN NaN \n", "564 NaN NaN NaN \n", "\n", " acknowledgement educationalSignificance \n", "45 NaN NaN \n", "118 NaN NaN \n", "155 Donated through the Australian Government’s Cu... NaN \n", "173 NaN NaN \n", "564 NaN NaN \n", "\n", "[5 rows x 25 columns]" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[df['additionalType'].str.len() > 1].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `additionalType` field contains a nested list of values. Using `json_normalize()` or `explode()` we can explode these lists, creating a row for each separate value." ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id title additionalType\n", "1 251390 Pair of woven shoes made from feathers and hair Shoes\n", "3 21507 Grinding stone Grinding stones\n", "4 142308 'time CHange' [sic] Compact discs\n", "5 20174 Ten Days To Live - A supposed sorcery painting. Bark paintings\n", "6 144359 'The Dance of Life (1898-1902)' by Diana Boyer... Booklets" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Use json_normalize to expand 'additionalType' into separate rows, adding the id and title from the parent record\n", "# df_types = json_normalize(df.loc[df['additionalType'].notnull()].to_dict('records'), record_path='additionalType', meta=['id', 'title'], errors='ignore').rename({0: 'additionalType'}, axis=1)\n", "\n", "# In pandas v0.25 and above you can just use explode -- this produces the same result as above\n", "df_types = df.loc[df['additionalType'].notnull()][['id', 'title', 'additionalType']].explode('additionalType')\n", "\n", "df_types.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we've exploded the type values, we can aggregate them in different ways. Let's look at the 25 most common object types!" 
] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Mineral samples 6000\n", "Photographs 4747\n", "Stone artefacts 4364\n", "Photographic postcards 4250\n", "Drawings 3759\n", "Postcards 3697\n", "Zoological specimens 2168\n", "Bark paintings 2110\n", "Geological specimens 1993\n", "Cartoons 1535\n", "Engravings 1495\n", "Negatives 1124\n", "Boomerangs 1025\n", "Spears 1012\n", "Percussion and abrading stones 982\n", "Paintings 840\n", "Clubs 747\n", "Mounts 745\n", "Cards 709\n", "Armbands 649\n", "Shells 563\n", "Letters 542\n", "Documents 517\n", "Geophysical survey equipment 509\n", "Posters 495\n", "Name: additionalType, dtype: int64" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_types['additionalType'].value_counts()[:25]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How many object types only appear once?" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "639" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type_counts = df_types['additionalType'].value_counts().to_frame().reset_index().rename({'index': 'type', 'additionalType': 'count'}, axis=1)\n", "unique_types = type_counts.loc[type_counts['count'] == 1]\n", "unique_types.shape[0]" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " type count\n", "1852 Genealogical charts 1\n", "1853 Skivvies 1\n", "1854 Shopping bags 1\n", "1855 Jam spoons 1\n", "1856 Architectural models 1" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "unique_types.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's save the complete list of types as a CSV file." ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/html": [ "nma_object_type_counts.csv
" ], "text/plain": [ "/Volumes/Workspace/mycode/glam-workbench/national-museum-australia/notebooks/nma_object_type_counts.csv" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "type_counts.to_csv('nma_object_type_counts.csv', index=False)\n", "display(FileLink('nma_object_type_counts.csv'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Browsing the CSV, I noticed one item with the type `Vegetables`. Let's find out more about it." ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id type title \\\n", "63775 256742 object Wooden toy toad stalk \n", "\n", " _meta additionalType \\\n", "63775 {'modified': '2019-04-24', 'issued': '2018-10-... [Toys, Vegetables] \n", "\n", " collection identifier \\\n", "63775 {'id': '6773', 'type': 'Collection', 'title': ... 2013.0038.0540 \n", "\n", " medium \\\n", "63775 [{'type': 'Material', 'title': 'Paint - non sp... \n", "\n", " extent \\\n", "63775 {'type': 'Measurement', 'height': 65, 'diamete... \n", "\n", " physicalDescription ... isPartOf \\\n", "63775 A painted wooden toy toad stalk with a red cap... ... NaN \n", "\n", " seeAlso description hasVersion \\\n", "63775 NaN NaN NaN \n", "\n", " temporal relation hasPart \\\n", "63775 [{'type': 'Event', 'title': '1925 - 1935', 'st... NaN NaN \n", "\n", " location acknowledgement educationalSignificance \n", "63775 NaN NaN NaN \n", "\n", "[1 rows x 25 columns]" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Find in the complete data set\n", "mask = df.loc[df['additionalType'].notnull()]['additionalType'].apply(lambda x: 'Vegetables' in x)\n", "veggie = df.loc[df['additionalType'].notnull()][mask]\n", "veggie" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can create a link into the NMA Collections Explorer using the object `id`." ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/html": [ "Wooden toy toad stalk" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display(HTML('{}'.format(veggie.iloc[0]['id'], veggie.iloc[0]['title'])))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Does a toadstool count as a vegetable?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The `extent` field\n", "\n", "The `extent` field is a nested object, so once again we'll use `json_normalize()` to expand it out into separate columns." 
] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
indexidtypetitle_metaadditionalTypecollectionidentifiermediumextent...educationalSignificanceextent_typeextent_lengthextent_widthextent_depthextent_unitTextextent_heightextent_diameterextent_weightextent_unitTextWeight
01251390objectPair of woven shoes made from feathers and hair{'modified': '2019-01-17', 'issued': '2018-04-...[Shoes]{'id': '5244', 'type': 'Collection', 'title': ...2000.0014.0495[{'type': 'Material', 'title': 'Feather'}, {'t...{'type': 'Measurement', 'length': 260, 'width'......NaNMeasurement260.0120.040.0mmNaNNaNNaNNaN
12124081objectPair of ceremonial shoes{'modified': '2018-12-04', 'issued': '2006-10-...NaN{'id': '1892', 'type': 'Collection', 'title': ...1992.0089.0165[{'type': 'Material', 'title': 'Feather'}]{'type': 'Measurement', 'length': 246, 'width'......NaNMeasurement246.0190.045.0mmNaNNaNNaNNaN
2520174objectTen Days To Live - A supposed sorcery painting.{'modified': '2019-04-21', 'issued': '2013-06-...[Bark paintings]{'id': '2202', 'type': 'Collection', 'title': ...1985.0246.0077[{'type': 'Material', 'title': 'Bark'}, {'type...{'type': 'Measurement', 'length': 574, 'width'......NaNMeasurement574.0185.0NaNmmNaNNaNNaNNaN
36144359object'The Dance of Life (1898-1902)' by Diana Boyer...{'modified': '2018-06-18', 'issued': '2012-06-...[Booklets]{'id': '3893', 'type': 'Collection', 'title': ...2008.0043.0022.001[{'type': 'Material', 'title': 'Paper'}, {'typ...{'type': 'Measurement', 'height': 214, 'width'......NaNMeasurementNaN150.05.0mm214.0NaNNaNNaN
4842084objectChild's drawing by Lester Moran, Cabbage Tree ...{'modified': '2019-04-07', 'issued': '2016-10-...[Drawings]{'id': '2261', 'type': 'Collection', 'title': ...1991.0024.0027[{'type': 'Material', 'title': 'Paint - non sp...{'type': 'Measurement', 'length': 560, 'width'......NaNMeasurement560.0380.00.5mmNaNNaNNaNNaN
\n", "

5 rows × 35 columns

\n", "
" ], "text/plain": [ " index id type title \\\n", "0 1 251390 object Pair of woven shoes made from feathers and hair \n", "1 2 124081 object Pair of ceremonial shoes \n", "2 5 20174 object Ten Days To Live - A supposed sorcery painting. \n", "3 6 144359 object 'The Dance of Life (1898-1902)' by Diana Boyer... \n", "4 8 42084 object Child's drawing by Lester Moran, Cabbage Tree ... \n", "\n", " _meta additionalType \\\n", "0 {'modified': '2019-01-17', 'issued': '2018-04-... [Shoes] \n", "1 {'modified': '2018-12-04', 'issued': '2006-10-... NaN \n", "2 {'modified': '2019-04-21', 'issued': '2013-06-... [Bark paintings] \n", "3 {'modified': '2018-06-18', 'issued': '2012-06-... [Booklets] \n", "4 {'modified': '2019-04-07', 'issued': '2016-10-... [Drawings] \n", "\n", " collection identifier \\\n", "0 {'id': '5244', 'type': 'Collection', 'title': ... 2000.0014.0495 \n", "1 {'id': '1892', 'type': 'Collection', 'title': ... 1992.0089.0165 \n", "2 {'id': '2202', 'type': 'Collection', 'title': ... 1985.0246.0077 \n", "3 {'id': '3893', 'type': 'Collection', 'title': ... 2008.0043.0022.001 \n", "4 {'id': '2261', 'type': 'Collection', 'title': ... 1991.0024.0027 \n", "\n", " medium \\\n", "0 [{'type': 'Material', 'title': 'Feather'}, {'t... \n", "1 [{'type': 'Material', 'title': 'Feather'}] \n", "2 [{'type': 'Material', 'title': 'Bark'}, {'type... \n", "3 [{'type': 'Material', 'title': 'Paper'}, {'typ... \n", "4 [{'type': 'Material', 'title': 'Paint - non sp... \n", "\n", " extent ... \\\n", "0 {'type': 'Measurement', 'length': 260, 'width'... ... \n", "1 {'type': 'Measurement', 'length': 246, 'width'... ... \n", "2 {'type': 'Measurement', 'length': 574, 'width'... ... \n", "3 {'type': 'Measurement', 'height': 214, 'width'... ... \n", "4 {'type': 'Measurement', 'length': 560, 'width'... ... 
\n", "\n", " educationalSignificance extent_type extent_length extent_width \\\n", "0 NaN Measurement 260.0 120.0 \n", "1 NaN Measurement 246.0 190.0 \n", "2 NaN Measurement 574.0 185.0 \n", "3 NaN Measurement NaN 150.0 \n", "4 NaN Measurement 560.0 380.0 \n", "\n", " extent_depth extent_unitText extent_height extent_diameter extent_weight \\\n", "0 40.0 mm NaN NaN NaN \n", "1 45.0 mm NaN NaN NaN \n", "2 NaN mm NaN NaN NaN \n", "3 5.0 mm 214.0 NaN NaN \n", "4 0.5 mm NaN NaN NaN \n", "\n", " extent_unitTextWeight \n", "0 NaN \n", "1 NaN \n", "2 NaN \n", "3 NaN \n", "4 NaN \n", "\n", "[5 rows x 35 columns]" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Without reset_index() the rows are misaligned\n", "df_extent = df.loc[df['extent'].notnull()].reset_index().join(json_normalize(df.loc[df['extent'].notnull()]['extent'].tolist()).add_prefix(\"extent_\"))\n", "df_extent.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's check to see what types of things are in the `extent` field." ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Measurement 64077\n", "Name: extent_type, dtype: int64" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_extent['extent_type'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So they're all measurements. Let's have a look at the units being used." 
] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "mm 63382\n", "MM 10\n", "cm 9\n", "m 5\n", "Name: extent_unitText, dtype: int64" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_extent['extent_unitText'].value_counts()" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "g 1473\n", "kg 209\n", "lb 5\n", "oz 4\n", "tonne 1\n", "Name: extent_unitTextWeight, dtype: int64" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_extent['extent_unitTextWeight'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hmmm, are those measurements really in metres, or might they be meant to be 'mm'? Let's have a look at them." ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idtitleextent_lengthextent_widthextent_unitText
16781202783The Percival Project, Gull Twelve, in a manill...NaN230.0m
18291214193Extension tube55.0000NaNm
41612123962Gunter's chain20.1168NaNm
47232171768Fair BreezeNaN138.0m
56789257184Fishing line inside envelope137.0000110.0m
\n", "
" ], "text/plain": [ " id title \\\n", "16781 202783 The Percival Project, Gull Twelve, in a manill... \n", "18291 214193 Extension tube \n", "41612 123962 Gunter's chain \n", "47232 171768 Fair Breeze \n", "56789 257184 Fishing line inside envelope \n", "\n", " extent_length extent_width extent_unitText \n", "16781 NaN 230.0 m \n", "18291 55.0000 NaN m \n", "41612 20.1168 NaN m \n", "47232 NaN 138.0 m \n", "56789 137.0000 110.0 m " ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_extent.loc[df_extent['extent_unitText'] == 'm'][['id', 'title', 'extent_length', 'extent_width', 'extent_unitText']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Other than 'Gunter's chain', it looks like the unit should indeed be 'mm'. We'll need to take that into account in calculations.\n", "\n", "Now let's convert all the measurements into a single unit – millimetres for lengths, and grams for weights." ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [], "source": [ "def conversion_factor(unit):\n", "    '''\n", "    Get the factor required to convert the current unit to either mm or g.\n", "    '''\n", "    factors = {\n", "        'mm': 1,\n", "        'cm': 10,\n", "        'm': 1, # Most 'm' values should in fact be mm (see above)\n", "        'g': 1,\n", "        'kg': 1000,\n", "        'tonne': 1000000,\n", "        'oz': 28.35,\n", "        'lb': 453.592\n", "    }\n", "    try:\n", "        factor = factors[unit.lower()]\n", "    except KeyError:\n", "        # Unknown or missing unit\n", "        factor = 0\n", "    return factor\n", "\n", "def normalise_measurements(row):\n", "    '''\n", "    Convert measurements to standard units.\n", "    '''\n", "    l_factor = conversion_factor(str(row['extent_unitText']))\n", "    length = row['extent_length'] * l_factor\n", "    width = row['extent_width'] * l_factor\n", "    depth = row['extent_depth'] * l_factor\n", "    height = row['extent_height'] * l_factor\n", "    diameter = row['extent_diameter'] * l_factor\n", "    w_factor = conversion_factor(str(row['extent_unitTextWeight']))\n", "    weight = row['extent_weight'] * w_factor\n", "    return pd.Series([length, width, depth, height, diameter, weight])\n", "\n", "# Add normalised measurements to the dataframe\n", "df_extent[['length_mm', 'width_mm', 'depth_mm', 'height_mm', 'diameter_mm', 'weight_g']] = df_extent.apply(normalise_measurements, axis=1)" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
indexidtypetitle_metaadditionalTypecollectionidentifiermediumextent...extent_heightextent_diameterextent_weightextent_unitTextWeightlength_mmwidth_mmdepth_mmheight_mmdiameter_mmweight_g
01251390objectPair of woven shoes made from feathers and hair{'modified': '2019-01-17', 'issued': '2018-04-...[Shoes]{'id': '5244', 'type': 'Collection', 'title': ...2000.0014.0495[{'type': 'Material', 'title': 'Feather'}, {'t...{'type': 'Measurement', 'length': 260, 'width'......NaNNaNNaNNaN260.0120.040.0NaNNaNNaN
12124081objectPair of ceremonial shoes{'modified': '2018-12-04', 'issued': '2006-10-...NaN{'id': '1892', 'type': 'Collection', 'title': ...1992.0089.0165[{'type': 'Material', 'title': 'Feather'}]{'type': 'Measurement', 'length': 246, 'width'......NaNNaNNaNNaN246.0190.045.0NaNNaNNaN
2520174objectTen Days To Live - A supposed sorcery painting.{'modified': '2019-04-21', 'issued': '2013-06-...[Bark paintings]{'id': '2202', 'type': 'Collection', 'title': ...1985.0246.0077[{'type': 'Material', 'title': 'Bark'}, {'type...{'type': 'Measurement', 'length': 574, 'width'......NaNNaNNaNNaN574.0185.0NaNNaNNaNNaN
36144359object'The Dance of Life (1898-1902)' by Diana Boyer...{'modified': '2018-06-18', 'issued': '2012-06-...[Booklets]{'id': '3893', 'type': 'Collection', 'title': ...2008.0043.0022.001[{'type': 'Material', 'title': 'Paper'}, {'typ...{'type': 'Measurement', 'height': 214, 'width'......214.0NaNNaNNaNNaN150.05.0214.0NaNNaN
4842084objectChild's drawing by Lester Moran, Cabbage Tree ...{'modified': '2019-04-07', 'issued': '2016-10-...[Drawings]{'id': '2261', 'type': 'Collection', 'title': ...1991.0024.0027[{'type': 'Material', 'title': 'Paint - non sp...{'type': 'Measurement', 'length': 560, 'width'......NaNNaNNaNNaN560.0380.00.5NaNNaNNaN
\n", "

5 rows × 41 columns

\n", "
" ], "text/plain": [ " index id type title \\\n", "0 1 251390 object Pair of woven shoes made from feathers and hair \n", "1 2 124081 object Pair of ceremonial shoes \n", "2 5 20174 object Ten Days To Live - A supposed sorcery painting. \n", "3 6 144359 object 'The Dance of Life (1898-1902)' by Diana Boyer... \n", "4 8 42084 object Child's drawing by Lester Moran, Cabbage Tree ... \n", "\n", " _meta additionalType \\\n", "0 {'modified': '2019-01-17', 'issued': '2018-04-... [Shoes] \n", "1 {'modified': '2018-12-04', 'issued': '2006-10-... NaN \n", "2 {'modified': '2019-04-21', 'issued': '2013-06-... [Bark paintings] \n", "3 {'modified': '2018-06-18', 'issued': '2012-06-... [Booklets] \n", "4 {'modified': '2019-04-07', 'issued': '2016-10-... [Drawings] \n", "\n", " collection identifier \\\n", "0 {'id': '5244', 'type': 'Collection', 'title': ... 2000.0014.0495 \n", "1 {'id': '1892', 'type': 'Collection', 'title': ... 1992.0089.0165 \n", "2 {'id': '2202', 'type': 'Collection', 'title': ... 1985.0246.0077 \n", "3 {'id': '3893', 'type': 'Collection', 'title': ... 2008.0043.0022.001 \n", "4 {'id': '2261', 'type': 'Collection', 'title': ... 1991.0024.0027 \n", "\n", " medium \\\n", "0 [{'type': 'Material', 'title': 'Feather'}, {'t... \n", "1 [{'type': 'Material', 'title': 'Feather'}] \n", "2 [{'type': 'Material', 'title': 'Bark'}, {'type... \n", "3 [{'type': 'Material', 'title': 'Paper'}, {'typ... \n", "4 [{'type': 'Material', 'title': 'Paint - non sp... \n", "\n", " extent ... extent_height \\\n", "0 {'type': 'Measurement', 'length': 260, 'width'... ... NaN \n", "1 {'type': 'Measurement', 'length': 246, 'width'... ... NaN \n", "2 {'type': 'Measurement', 'length': 574, 'width'... ... NaN \n", "3 {'type': 'Measurement', 'height': 214, 'width'... ... 214.0 \n", "4 {'type': 'Measurement', 'length': 560, 'width'... ... 
NaN \n", "\n", " extent_diameter extent_weight extent_unitTextWeight length_mm width_mm \\\n", "0 NaN NaN NaN 260.0 120.0 \n", "1 NaN NaN NaN 246.0 190.0 \n", "2 NaN NaN NaN 574.0 185.0 \n", "3 NaN NaN NaN NaN 150.0 \n", "4 NaN NaN NaN 560.0 380.0 \n", "\n", " depth_mm height_mm diameter_mm weight_g \n", "0 40.0 NaN NaN NaN \n", "1 45.0 NaN NaN NaN \n", "2 NaN NaN NaN NaN \n", "3 5.0 214.0 NaN NaN \n", "4 0.5 NaN NaN NaN \n", "\n", "[5 rows x 41 columns]" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_extent.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How big is the collection?" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [], "source": [ "def calculate_volume(row):\n", "    '''\n", "    Look for 3 linear dimensions and multiply them to get a volume.\n", "    '''\n", "    # Create a list of valid linear measurements from the available fields\n", "    dimensions = [d for d in [row['length_mm'], row['width_mm'], row['depth_mm'], row['height_mm'], row['diameter_mm']] if not math.isnan(d)]\n", "\n", "    # If there are only 2 dimensions...\n", "    if len(dimensions) == 2:\n", "        # Set a default height of 1 for items with only 2 dimensions\n", "        dimensions.append(1)\n", "\n", "    # If there are 3 or more dimensions, multiply the first 3 together\n", "    if len(dimensions) >= 3:\n", "        volume = dimensions[0] * dimensions[1] * dimensions[2]\n", "    else:\n", "        volume = 0\n", "    return volume\n", "\n", "df_extent['volume'] = df_extent.apply(calculate_volume, axis=1)" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total length of objects is 15.36 km\n" ] } ], "source": [ "print('Total length of objects is {:.2f} km'.format(df_extent['length_mm'].sum() / 1000 / 1000))" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total 
weight of objects is 194.30 tonnes\n" ] } ], "source": [ "print('Total weight of objects is {:.2f} tonnes'.format(df_extent['weight_g'].sum() / 1000000))" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total volume of objects is 2873.14 m³\n" ] } ], "source": [ "print('Total volume of objects is {:.2f} m\\N{SUPERSCRIPT THREE}'.format(df_extent['volume'].sum() / 1000000000))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The biggest object?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What's the biggest thing?" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "data": { "text/html": [ "Percival Proctor Mk 1 monoplane VH-FEP" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Get the object with the largest volume\n", "biggest = df_extent.loc[df_extent['volume'].idxmax()]\n", "\n", "# Create a link to Collection Explorer\n", "display(HTML('{}'.format(biggest['id'], biggest['title'])))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "\n", "Created by [Tim Sherratt](https://timsherratt.org/) for the [GLAM Workbench](https://glam-workbench.github.io/).\n", "\n", "Work on this notebook was supported by the [Humanities, Arts and Social Sciences (HASS) Data Enhanced Virtual Lab](https://tinker.edu.au/)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 4 }