# Find how many specimens of each species are in the Museums Victoria collection

In [another notebook](museumvic-get-a-list-of-species.ipynb) we harvested a list of species from Museums Victoria using their collection API and saved the results as a CSV file.

Here we'll search for specimens matching each of the species and save the total number of records.

We'll use these search parameters:

* `recordtype` which we'll set to 'specimen'
* `taxon` which we'll set to the species' taxon name

## Import what we need

In [1]:
import requests
from tqdm.auto import tqdm
import pandas as pd

In [2]:
SEARCH_URL = 'https://collections.museumsvictoria.com.au/api/search'
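
To make the two search parameters concrete, here is a minimal sketch of the URL that a search for one species would request. It uses only the standard library and does not contact the API. The helper name `build_search_url` is hypothetical (it is not part of this notebook), and the taxon name is taken from the species list.

```python
from urllib.parse import urlencode

SEARCH_URL = 'https://collections.museumsvictoria.com.au/api/search'

def build_search_url(taxon_name):
    '''
    Hypothetical helper: combine the search URL with the two parameters
    used in this notebook, without sending a request.
    '''
    params = {'recordtype': 'specimen', 'taxon': taxon_name}
    # urlencode escapes the space in the taxon name as '+'
    return f'{SEARCH_URL}?{urlencode(params)}'

example_url = build_search_url('Dromaius novaehollandiae')
print(example_url)
```

Passing the same dictionary to `requests.get()` through its `params` argument, as the functions in this notebook do, should produce an equivalent query string.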

Load the CSV file containing the list of species.

In [3]:
df_species = pd.read_csv('museum-victoria-species.csv')
df_species.head()

Unnamed: 0,id,taxon_name,common_name
0,species/8583,Melangyna viridiceps,Common Hover Fly
1,species/8307,Tetractenos glaber,Smooth Toadfish
2,species/8815,Salticidae,Jumping Spider
3,species/8456,Hydromys chrysogaster,Common Water Rat
4,species/12377,Dromaius novaehollandiae,Emu


## Define some functions

In [4]:
def get_totals(params):
    '''
    Get the total number of results and pages returned by a search.
    '''
    response = requests.get(SEARCH_URL, params=params, headers={'User-Agent': 'Mozilla/5.0'})
    # Stop with a clear error if the request failed, rather than a confusing KeyError below
    response.raise_for_status()
    # The total results and pages values are in the API response's headers!
    total_results = int(response.headers['Total-Results'])
    total_pages = int(response.headers['Total-Pages'])
    return (total_results, total_pages)

def get_specimen_totals(species):
    '''
    Find the number of specimens matching each species.
    '''
    params = {
        'recordtype': 'specimen'
    }
    total_specimens = []
    for s in tqdm(species):
        params['taxon'] = s['taxon_name']
        total_results, _ = get_totals(params)
        s['total_specimens'] = total_results
        total_specimens.append(s)
    return total_specimens
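
Because the totals come from response headers rather than the response body, a long harvest will stop with an error if a single response lacks them. The following is a minimal sketch of a more forgiving way to read the two headers. The function name `parse_totals` is hypothetical, the choice to return `None` for a missing value is an assumption about what you would want, and the header values in the example are made up for illustration.

```python
def parse_totals(headers):
    '''
    Hypothetical helper: read the Total-Results and Total-Pages headers
    from any mapping with a .get() method (such as response.headers).
    Returns None for a value that is missing or not a whole number.
    '''
    def to_int(name):
        try:
            return int(headers.get(name))
        except (TypeError, ValueError):
            return None
    return (to_int('Total-Results'), to_int('Total-Pages'))

# Made-up header values, for illustration only
complete = parse_totals({'Total-Results': '42', 'Total-Pages': '2'})
missing = parse_totals({})
print(complete, missing)
```

Returning `None` keeps a missing value distinct from a genuine total of zero, so incomplete rows can be found and retried later.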

## Download the data!

In [None]:
specimens = get_specimen_totals(df_species.to_dict('records'))

## Convert to a dataframe

In [6]:
df_specimens = pd.DataFrame(specimens)

Show the top twenty species by number of specimens!

In [7]:
# Sort the dataframe by total_specimens (largest first) then show a slice of the first 20 records
df_specimens.sort_values(by='total_specimens', ascending=False)[:20]

Unnamed: 0,id,taxon_name,common_name,total_specimens
211,species/8463,Amphipoda,Amphipod,20655
1184,species/8483,Leptoceridae,Caddisfly,16639
1072,species/8494,Leptoceridae,Caddisfly larva,16639
1103,species/15127,Chrysomelidae,Eucalyptus Leaf Beetle,11534
204,species/8532,Castiarina,Jewel Beetle,9626
208,species/8480,Hydropsychidae,Caddisfly,8340
1079,species/8492,Hydropsychidae,Caddisfly larva,8340
459,species/15892,Ophiurida,Brittle Star,8318
226,species/8360,Litoria ewingii,Brown Tree Frog,6040
1196,species/8468,Ostracoda,Seed Shrimp,5925
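
Some taxon names appear more than once in the table above (for example, Leptoceridae is listed as both 'Caddisfly' and 'Caddisfly larva' with the same total), because the search is by taxon name rather than species id. One possible refinement, sketched here under that assumption, is to remember the total for each taxon name so it is only requested once. The names `get_specimen_totals_cached` and `fake_fetch_total` are hypothetical, and the stand-in fetch function returns totals copied from the table above instead of calling the API.

```python
def get_specimen_totals_cached(species, fetch_total):
    '''
    Hypothetical variant of get_specimen_totals: look up each distinct
    taxon name once and reuse the total for repeated names.
    fetch_total is any function that takes a taxon name and returns a number.
    '''
    cache = {}
    results = []
    for s in species:
        name = s['taxon_name']
        if name not in cache:
            cache[name] = fetch_total(name)
        # Build a new record rather than changing the one passed in
        results.append({**s, 'total_specimens': cache[name]})
    return results

# Stand-in for the API, using totals copied from the table above
SAMPLE_TOTALS = {'Leptoceridae': 16639, 'Hydropsychidae': 8340}
calls = []

def fake_fetch_total(taxon_name):
    calls.append(taxon_name)
    return SAMPLE_TOTALS[taxon_name]

species = [
    {'taxon_name': 'Leptoceridae', 'common_name': 'Caddisfly'},
    {'taxon_name': 'Leptoceridae', 'common_name': 'Caddisfly larva'},
    {'taxon_name': 'Hydropsychidae', 'common_name': 'Caddisfly'},
]
results = get_specimen_totals_cached(species, fake_fetch_total)
print(len(results), len(calls))
```

With the real API, `fetch_total` could be a small function that calls `get_totals()` with the taxon name and returns the first value.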


## What next?

* How might you visualise these results?
* Could we include other taxonomic data to group the species?
* How could we get an image of each species (selected at random from matching specimens)? 
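
As one possible starting point for the first question, here is a minimal text-only sketch that scales each total to a row of `#` characters. The function name `text_bars` is hypothetical and the figures are copied from the results table above. A plotting library would give a better chart; this only shows the scaling idea.

```python
def text_bars(totals, width=40):
    '''
    Hypothetical helper: turn (name, total) pairs into lines of text,
    with the largest total drawn as a bar of `width` characters.
    '''
    if not totals:
        return []
    # Fall back to 1 so a list of all-zero totals does not divide by zero
    largest = max(total for _, total in totals) or 1
    lines = []
    for name, total in totals:
        bar = '#' * round(total / largest * width)
        lines.append(f'{name:<15} {bar} {total}')
    return lines

# Figures copied from the results table above
top_taxa = [
    ('Amphipoda', 20655),
    ('Leptoceridae', 16639),
    ('Chrysomelidae', 11534),
    ('Castiarina', 9626),
]
chart = text_bars(top_taxa)
print('\n'.join(chart))
```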

----

Created by [Tim Sherratt](https://timsherratt.org/) for the [GLAM Workbench](https://glam-workbench.github.io/).  Support me by becoming a [GitHub sponsor](https://github.com/sponsors/wragge)!