{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# DOWNLOAD ALL THE ANTECHINUSES!\n", "\n", "In [another notebook](museumvic-find-specimens-of-each-species.ipynb) we discovered there are 2,883 Antechinus specimens in the Museum of Victoria. Let's see how many pictures we can find of them.\n", "\n", "![Antechinus total results](images/antechinus-totals.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Import what we need" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import requests\n", "from slugify import slugify\n", "from pathlib import Path\n", "import os\n", "from tqdm.auto import tqdm" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "SEARCH_URL = 'https://collections.museumsvictoria.com.au/api/search'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Define some functions" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "def get_totals(params):\n", "    '''\n", "    Get the total number of results and pages returned by a search.\n", "    '''\n", "    response = requests.get(SEARCH_URL, params=params, headers={'User-Agent': 'Mozilla/5.0'})\n", "    # The total results and pages values are in the API response's headers!\n", "    total_results = int(response.headers['Total-Results'])\n", "    total_pages = int(response.headers['Total-Pages'])\n", "    return (total_results, total_pages)\n", "\n", "def download_images(species, size='small'):\n", "    '''\n", "    Download all the available images for the given species.\n", "    '''\n", "    # Create a directory to save the images in based on the species name\n", "    image_path = Path('images', slugify(species))\n", "    image_path.mkdir(exist_ok=True, parents=True)\n", "    # Note the hasimages parameter to only get records with images\n", "    params = {\n", "        'recordtype': 'specimen',\n", "        'taxon': species,\n", "        'perpage': 100,\n", "        'hasimages': 'yes',\n",
"        'sort': 'date'\n", "    }\n", "    _, total_pages = get_totals(params)\n", "    # Loop through the pages\n", "    for page in tqdm(range(1, total_pages + 1)):\n", "        # Request the current page of results (otherwise every request returns page 1)\n", "        params['page'] = page\n", "        response = requests.get(SEARCH_URL, params=params, headers={'User-Agent': 'Mozilla/5.0'})\n", "        data = response.json()\n", "        # Loop through the records\n", "        for record in data:\n", "            # Loop through the attached media (may be more than one image in a record)\n", "            for media in record['media']:\n", "                # Media items can be videos too\n", "                if media['type'] == 'image':\n", "                    url = media[size]['uri']\n", "                    # Get the current image name\n", "                    image_name = os.path.basename(url)\n", "                    # Add the specimen id to the image name\n", "                    image_file = Path(image_path, f'{slugify(record[\"id\"])}-{image_name}')\n", "                    # Use a separate name so the search response isn't overwritten\n", "                    image_response = requests.get(url)\n", "                    # Save the image\n", "                    image_file.write_bytes(image_response.content)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## DOWNLOAD THE ANTECHINUSES!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "download_images('Antechinus agilis')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Results\n", "\n", "The code above creates a directory under `images` for each downloaded species, named with a slugified version of the species name (for example, `images/antechinus-agilis`). Each image file name is prefixed with the id of the specimen record it came from." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "While only one of the 2,883 antechinus records had images attached, there were multiple images in that record.\n", "\n", "![Antechinus thumbnails](images/antechinus-images.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What next?\n", "\n", "Obviously this little image download example could be modified to download things other than Antechinuses! Just pass a different species name to `download_images()`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "\n", "Created by [Tim Sherratt](https://timsherratt.org/) for the [GLAM Workbench](https://glam-workbench.github.io/). Support me by becoming a [GitHub sponsor](https://github.com/sponsors/wragge)!" 
] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }