# Analyzing Bird Audio

- toc: false
- badges: true
- comments: true
- category: datascience
- description: An analysis of bird audio from Xeno Canto to determine if it was a song or a call.
- image: images/copied_from_nb/birds.png

![Analyzing Bird Audio Cover Photo](birds.png)

**Authors**: [Adithya Balaji](https://www.linkedin.com/in/adithyabsk/),
[Malika Khurana](https://www.linkedin.com/in/malikakhurana/)

> This report details our process for analyzing bird audio, with some snippets of code. You can find the full project
> repo [on GitHub](https://github.com/adithyabsk/bird_audio).

In [None]:
import IPython.display as ipd
import requests

In [None]:
# hide_input
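def display_code_block(url, start_line=1, end_line=None):
    # Fetch the raw source file and keep lines start_line..end_line (1-indexed, inclusive).
    # end_line=None keeps everything through the last line (a default of -1 would drop it).
    resp = requests.get(url)
    text = "\n".join(resp.content.decode().split("\n")[start_line - 1 : end_line])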
    ipd.display(
        ipd.Markdown(
            f"""
```python
{text}
```
"""
        )
    )

## Introduction

We aim to accurately classify bird sounds as songs or calls. We used 3 different approaches and models based on
recording metadata, the audio data itself, and spectrogram images of the recording to perform this classification task.

<div class="d-flex flex-justify-center">
<div style="width:80%">
<img src="https://raw.githubusercontent.com/adithyabsk/bird_audio/main/notebooks/assets/pipeline.png" alt="Pipeline Overview"/>
</div>
</div>

### Motivation

The primary motivation to address this problem is to make it easier for scientists to collect data on bird populations
and verify community-sourced labels.

The other motivation is more open-ended: to understand the "hidden" insights in bird sounds. Bird calls reveal regional
dialects, a sense of humor, information about predators in the area, and indicators of ecosystem health, and inevitably
also the threat that human activity poses to their ecosystems. By exploring bird call audio data, we hope to build
towards a better understanding of the impact of human-made sounds and to become better listeners.

### Songs vs Calls

Bird sounds vary along many dimensions, but one of the first levels of categorization is classifying them as a song or
a call, since each has distinct functions and reveals different aspects of the birds’
ecology ([1](https://www.audubon.org/news/a-beginners-guide-common-bird-sounds-and-what-they-mean),
[2](https://www.youtube.com/watch?v=4_1zIwEENt8)).

#### Songs
Songs tend to be longer, more melodic, and used for marking territory and attracting mates. Birds' song repertoire and
song rate can indicate their health and the quality of their habitat, including pollutant levels and plant diversity
([3](https://en.wikipedia.org/wiki/Bird_vocalization#Function), [4](https://www.jstor.org/stable/20062442),
[5](https://www.fs.usda.gov/treesearch/pubs/46856)).


In [None]:
# song sparrow - song
ipd.Audio(
    requests.get(
        "https://github.com/adithyabsk/bird_audio/blob/main/notebooks/assets/574080.mp3?raw=true"
    ).content
)

#### Calls
Calls are shorter than songs, and perform a wider range of functions like signalling food, maintaining social cohesion
and contact, coordinating flight, resolving conflicts, and sounding alarms (distress, mobbing, hawk alarms) ([6](https://doi.org/10.1196/annals.1298.034)).
Bird alarm calls can be understood and passed along across species, and have been found to encode information about the
size and threat of a potential predator, so birds can respond accordingly, for example with more intense mobbing for a
higher threat ([7](https://www.nationalgeographic.com/animals/article/nuthatches-chickadees-communication-danger),
[8](https://doi.org/10.1126/science.1108841)). Alarm calls can also give scientists an estimate of the number of
predators in an area.

In [None]:
# song sparrow - call
ipd.Audio(
    requests.get(
        "https://github.com/adithyabsk/bird_audio/blob/main/notebooks/assets/585148.mp3?raw=true"
    ).content
)

## Related Work
**Allometry of Alarm Calls: Black-Capped Chickadees Encode Information About Predator Size** ([8](https://doi.org/10.1126/science.1108841))
The number of D-notes in chickadee mobbing alarm calls varies inversely with the size of the predator.

**Gender identification using acoustic analysis in birds without external sexual dimorphism** ([9](https://doi.org/10.1186/s40657-015-0033-y))
Bird sounds were analyzed to classify gender. Important acoustic features were: fundamental frequency (mean, max,
count), note duration, syllable count and spacing, and amplitude modulation.

**Regional dialects have been discovered among many bird species, and the Yellowhammer is a great example** ([10](http://www.yellowhammers.net/about), [11](https://doi.org/10.1093/beheco/arz114))
Yellowhammer bird sounds in the Czech Republic and UK were studied to identify regional dialects, which differed in
frequency and length of final syllables.

## DVC

[Data Version Control (DVC)](https://dvc.org/) is a useful tool for data science projects. You can think of it as Git
for data. We built our pipeline first in Jupyter notebooks and then in DVC, which makes it easy to change parameters
and run the full pipeline from one place.

<div class="flash">
<b>Note:</b> Due to the size of the datasets, we chose not to include inline Jupyter snippets that process real data.
Instead, we present only the outputs of the DVC scripts, which are Python files rather than notebooks.
</div>

## Collecting Data

For our analysis, we used audio files and metadata from [xeno-canto.org](https://www.xeno-canto.org/). Xeno-canto (XC)
is a website for collecting and sharing audio recordings of birds. Recordings and identifications on XC are sourced
from the community (anyone can join).

<div class="d-flex flex-justify-center">
<div style="width:80%">
<img src="https://raw.githubusercontent.com/adithyabsk/bird_audio/main/notebooks/assets/xenocanto.png" alt="Xeno Canto API Page"/>
</div>
</div>

XC has a [straightforward API](https://www.xeno-canto.org/explore/api) that allows us to make RESTful queries, and
specify a number of [filter parameters](https://www.xeno-canto.org/help/search) including country, species, recording
quality, and duration. We used the XC API to get metadata and IDs for all recordings in the United States, and saved the
JSON payload as a dataframe and a CSV file. The main snippet of code from the DVC step that parallelizes data
collection from XC is shown below.

In [None]:
# hide_input
display_code_block(
    "https://raw.githubusercontent.com/adithyabsk/bird_audio/main/pracds_final/data/build_meta.py",
    45,
    80,
)


```python
def search_recordings(**query_params) -> List[JSON]:
    """Search for recordings using the Xeno Canto search API

    The keys in the return dictionaries are specified in the API docs

    https://www.xeno-canto.org/explore/api


    Args:
        **query_params: A dictionary with query and/or page as keys with values as specified in the
            API docs.

    Returns:
        A list of recording information (list of dictionaries)

    """
    url = (BASE_URL / "recordings").with_query(query_params)
    resp = requests.get(str(url))
    if resp.status_code == 200:
        resp_json = resp.json()
        num_pages = resp_json["numPages"]
        recordings = resp_json["recordings"]
        if num_pages > 1:
            page_urls = [url.update_query(page=p) for p in range(2, num_pages + 1)]
            with ProcessPoolExecutor(max_workers=10) as ppe:
                recordings.extend(
                    itertools.chain(
                        *tqdm(
                            ppe.map(get_page_recordings, page_urls), total=num_pages - 1
                        )
                    )
                )
    else:
        raise Exception(f"Request failed with status code: {resp.status_code}")

    return recordings
```
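
The paging pattern in the snippet above (read the page count from the first response, then fetch the remaining pages concurrently and flatten the results) can be illustrated without network access. The sketch below is not the project's code: `collect_recordings` and `fake_fetch_page` are hypothetical names, the fetcher returns made-up records instead of calling the XC API, and a thread pool is swapped in for the process pool because the work is I/O-bound.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Recording = Dict[str, str]


def collect_recordings(
    fetch_page: Callable[[int], List[Recording]],
    num_pages: int,
    max_workers: int = 10,
) -> List[Recording]:
    """Fetch pages 1..num_pages concurrently and flatten them in page order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so page order is preserved.
        pages = pool.map(fetch_page, range(1, num_pages + 1))
        return list(itertools.chain.from_iterable(pages))


def fake_fetch_page(page: int) -> List[Recording]:
    """Hypothetical stand-in for an HTTP request; returns two fake records per page."""
    return [{"id": f"{page}-{i}", "page": str(page)} for i in range(2)]


all_recordings = collect_recordings(fake_fetch_page, num_pages=3)
print(len(all_recordings))
print([rec["id"] for rec in all_recordings])
```

Passing the fetcher in as an argument keeps the fan-out logic testable; in a real run it would be replaced by a function that requests one page from the API and returns its `recordings` list.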


## Filtering & Labeling
Through our DVC pipeline, we further filtered the data to the top 220 unique species, recordings under 20 seconds,
recording quality A or B, and recordings with spectrograms available on XC. This reduced our dataset from ~60,000
recordings to a dataframe of 5,800. We created labels (1 for call, 0 for song) by parsing the `type` column of the
dataframe.

The following scripts handle that process:

1. [build_filter.py](https://github.com/adithyabsk/bird_audio/blob/main/pracds_final/data/build_filter.py)
1. [build_song_vs_call.py](https://github.com/adithyabsk/bird_audio/blob/main/pracds_final/data/build_song_vs_call.py)
1. [proc_svc_meta.py](https://github.com/adithyabsk/bird_audio/blob/main/pracds_final/features/proc_svc_meta.py)
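
As a rough illustration of the duration, quality, and labeling rules described above, here is a minimal pure-Python sketch. It is not taken from the linked scripts: the function names are hypothetical, the substring matching on `type` tags is an assumption, and so is the choice to return `None` when a recording is tagged as both a song and a call (or as neither).

```python
from typing import Dict, Optional


def length_to_seconds(length: str) -> int:
    """Convert an XC length string such as '0:17' or '1:02:03' to seconds."""
    seconds = 0
    for part in length.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def keep_recording(rec: Dict[str, str], max_seconds: int = 20) -> bool:
    """Keep recordings with quality A or B that are shorter than max_seconds."""
    return rec["q"] in {"A", "B"} and length_to_seconds(rec["length"]) < max_seconds


def label_from_type(type_field: str) -> Optional[int]:
    """Return 1 for call, 0 for song, None if the tags are ambiguous (assumption)."""
    tags = [tag.strip().lower() for tag in type_field.split(",")]
    is_call = any("call" in tag for tag in tags)
    is_song = any("song" in tag for tag in tags)
    if is_call == is_song:
        return None  # tagged as both, or as neither
    return 1 if is_call else 0


print(keep_recording({"q": "A", "length": "0:17"}))
print(label_from_type("call, flight call"))
print(label_from_type("song"))
```

The `q`, `length`, and `type` keys match the column names in the XC metadata; everything else in the sketch is illustrative.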

## Exploring & Visualizing Data

With our dataset assembled, we began exploring it visually. A distribution of recordings by genus, with song-call
splits, shows that the genus most represented in the dataset is the warblers (*Setophaga*), with many more song than
call recordings. We can also see that, as expected, woodpeckers (*Melanerpes*) and jays, magpies, and crows
(*Cyanocitta*, *Corvus*) have almost no song recordings in the dataset.

<div class="d-flex flex-justify-center">
<div style="width:75%">
<img src="https://raw.githubusercontent.com/adithyabsk/bird_audio/main/notebooks/assets/svc_count_vs_genus.png" alt="Count vs Genus for the Top 20 Largest Genera"/>
</div>
</div>

A map of recording density shows the regions most represented in the dataset, which are, unsurprisingly, bird-watching
hot spots.

<div class="d-flex flex-justify-center">
<div style="width:75%">
<img src="https://raw.githubusercontent.com/adithyabsk/bird_audio/main/notebooks/assets/svc_sample_density_usa.png" alt="Observation Count KDE Plot"/>
</div>
</div>

Given our domain knowledge that songs serve an important function in mating, we expected to see a higher proportion of
songs in the spring, which is confirmed by the data.

<div class="d-flex flex-justify-center">
<div style="width:75%">
<img src="https://raw.githubusercontent.com/adithyabsk/bird_audio/main/notebooks/assets/svc_vs_month.png" alt="Song and Call Percent vs Month"/>
</div>
</div>
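
The monthly comparison above boils down to grouping recordings by month and computing the fraction labeled as songs. A minimal sketch of that calculation follows, assuming rows of `(month, label)` pairs with the same 1 = call, 0 = song convention; the function name and the toy rows are made up for illustration and are not values from the dataset.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def song_share_by_month(rows: Iterable[Tuple[int, int]]) -> Dict[int, float]:
    """Return the fraction of song recordings per month (label 1 = call, 0 = song)."""
    totals: Counter = Counter()
    songs: Counter = Counter()
    for month, label in rows:
        totals[month] += 1
        if label == 0:
            songs[month] += 1
    return {month: songs[month] / totals[month] for month in sorted(totals)}


# Made-up toy rows, not values from the dataset.
toy_rows = [(1, 1), (1, 1), (1, 0), (5, 0), (5, 0), (5, 1), (5, 0)]
shares = song_share_by_month(toy_rows)
print(shares)
```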

## Metadata Classification Model

In our first model, we used the tabular metadata from XC entries to train a Gradient Boosted Decision Tree (GBDT) model
using [XGBoost](https://xgboost.readthedocs.io/en/latest/). XGBoost is a GBDT implementation with a Python interface
that is designed to work on large amounts of data.

<div class="d-flex flex-justify-center">
<a href="https://xgboost.readthedocs.io/en/latest/" style="width:50%">
<img src="https://raw.githubusercontent.com/dmlc/dmlc.github.io/master/img/logo-m/xgboost.png" alt="XGBoost Logo"/>
</a>
</div>

We used the genus, species, English name, location (latitude and longitude), and recording date and time from the XC
metadata. Using sklearn transformers, the categorical features were one-hot encoded, while latitude, longitude, and the
time features were imputed and then scaled with standard or min-max scaling; the time features were also passed through
a sine function before scaling. We can see 10 rows of unprocessed data in the HTML table below.

In [None]:
# hide_input
ipd.display(
    ipd.HTML(
        "<table border=\"1\" class=\"dataframe sample table table-striped\"><thead><tr style=\"text-align: right;\"><th></th><th>df_index</th><th>id</th><th>gen</th><th>sp</th><th>ssp</th><th>en</th><th>rec</th><th>cnt</th><th>loc</th><th>lat</th><th>lng</th><th>alt</th><th>type</th><th>url</th><th>file</th><th>file-name</th><th>sono</th><th>lic</th><th>q</th><th>length</th><th>time</th><th>date</th><th>uploaded</th><th>also</th><th>rmk</th><th>bird-seen</th><th>playback-used</th><th>pred</th><th>gender</th><th>age</th><th>month</th><th>day</th><th>hour</th><th>minute</th></tr></thead><tbody><tr><th>0</th><td>96</td><td>454911</td><td>Branta</td><td>canadensis</td><td>NaN</td><td>Canada Goose</td><td>Bruce Lagerquist</td><td>United States</td><td>Sedro-Woolley, Skagit County, Washington</td><td>48.5237</td><td>-122.0185</td><td>30</td><td>call</td><td>//www.xeno-canto.org/454911</td><td>//www.xeno-canto.org/454911/download</td><td>XC454911-190202_02 Canadian Geese.mp3</td><td>{'small': '//www.xeno-canto.org/sounds/uploaded/JHFICMRVUX/ffts/XC454911-small.png', 'med': '//www.xeno-canto.org/sounds/uploaded/JHFICMRVUX/ffts/XC454911-med.png', 'large': '//www.xeno-canto.org/sounds/uploaded/JHFICMRVUX/ffts/XC454911-large.png', 'full': '//www.xeno-canto.org/sounds/uploaded/JHFICMRVUX/ffts/XC454911-full.png'}</td><td>//creativecommons.org/licenses/by-nc-sa/4.0/</td><td>A</td><td>0:17</td><td>1900-01-01 11:30:00</td><td>2019-02-02</td><td>2019-02-04</td><td>['Cygnus buccinator']</td><td>Mixed flock of Trumpeter Swans and Canada Geese feeding in an agricultural field. 
Recording of Swan's here XC454910</td><td>yes</td><td>no</td><td>1</td><td>NaN</td><td>NaN</td><td>2.0</td><td>2.0</td><td>11.0</td><td>30.0</td></tr><tr><th>1</th><td>97</td><td>418340</td><td>Branta</td><td>canadensis</td><td>NaN</td><td>Canada Goose</td><td>Sue Riffe</td><td>United States</td><td>Au Sable SF - Big Creek Rd, Michigan</td><td>44.0185</td><td>-83.7560</td><td>180</td><td>song</td><td>//www.xeno-canto.org/418340</td><td>//www.xeno-canto.org/418340/download</td><td>XC418340-Canada Goose on 5.11.18 at Au Sable SF MI at 11.20 for .14 _0908 .mp3</td><td>{'small': '//www.xeno-canto.org/sounds/uploaded/PVQOLRXXWL/ffts/XC418340-small.png', 'med': '//www.xeno-canto.org/sounds/uploaded/PVQOLRXXWL/ffts/XC418340-med.png', 'large': '//www.xeno-canto.org/sounds/uploaded/PVQOLRXXWL/ffts/XC418340-large.png', 'full': '//www.xeno-canto.org/sounds/uploaded/PVQOLRXXWL/ffts/XC418340-full.png'}</td><td>//creativecommons.org/licenses/by-nc-sa/4.0/</td><td>A</td><td>0:14</td><td>1900-01-01 11:20:00</td><td>2018-05-11</td><td>2018-06-03</td><td>['Agelaius phoeniceus']</td><td>Natural vocalization</td><td>yes</td><td>no</td><td>0</td><td>NaN</td><td>NaN</td><td>5.0</td><td>11.0</td><td>11.0</td><td>20.0</td></tr><tr><th>2</th><td>107</td><td>291051</td><td>Branta</td><td>canadensis</td><td>NaN</td><td>Canada Goose</td><td>Eric Hough</td><td>United States</td><td>San Juan River, Cottonwood Day-Use Area, Navajo Lake State Park, San Juan County, New Mexico</td><td>36.8068</td><td>-107.6789</td><td>1800</td><td>call</td><td>//www.xeno-canto.org/291051</td><td>//www.xeno-canto.org/291051/download</td><td>XC291051-CANG_11515_1730_SanJuanRiver-NavajoDam.mp3</td><td>{'small': '//www.xeno-canto.org/sounds/uploaded/BCFUZDOSJZ/ffts/XC291051-small.png', 'med': '//www.xeno-canto.org/sounds/uploaded/BCFUZDOSJZ/ffts/XC291051-med.png', 'large': '//www.xeno-canto.org/sounds/uploaded/BCFUZDOSJZ/ffts/XC291051-large.png', 'full': 
'//www.xeno-canto.org/sounds/uploaded/BCFUZDOSJZ/ffts/XC291051-full.png'}</td><td>//creativecommons.org/licenses/by-nc-sa/4.0/</td><td>A</td><td>0:15</td><td>1900-01-01 17:30:00</td><td>2015-11-15</td><td>2015-11-18</td><td>['']</td><td>Flock calling while flying over at dusk. Amplification, low and high pass filters used in Audacity.</td><td>yes</td><td>no</td><td>1</td><td>NaN</td><td>NaN</td><td>11.0</td><td>15.0</td><td>17.0</td><td>30.0</td></tr><tr><th>3</th><td>108</td><td>283618</td><td>Branta</td><td>canadensis</td><td>NaN</td><td>Canada Goose</td><td>Garrett MacDonald</td><td>United States</td><td>Beluga--North Bog, Kenai Peninsula Borough, Alaska</td><td>61.2089</td><td>-151.0103</td><td>40</td><td>call, flight call</td><td>//www.xeno-canto.org/283618</td><td>//www.xeno-canto.org/283618/download</td><td>XC283618-LS100466.mp3</td><td>{'small': '//www.xeno-canto.org/sounds/uploaded/CDHIAMGTRT/ffts/XC283618-small.png', 'med': '//www.xeno-canto.org/sounds/uploaded/CDHIAMGTRT/ffts/XC283618-med.png', 'large': '//www.xeno-canto.org/sounds/uploaded/CDHIAMGTRT/ffts/XC283618-large.png', 'full': '//www.xeno-canto.org/sounds/uploaded/CDHIAMGTRT/ffts/XC283618-full.png'}</td><td>//creativecommons.org/licenses/by-nc-sa/4.0/</td><td>A</td><td>0:10</td><td>1900-01-01 11:00:00</td><td>2015-05-20</td><td>2015-10-03</td><td>['']</td><td>Natural vocalizations from a pair of birds in flight. 
Recording not modified.</td><td>yes</td><td>no</td><td>1</td><td>NaN</td><td>NaN</td><td>5.0</td><td>20.0</td><td>11.0</td><td>0.0</td></tr><tr><th>4</th><td>110</td><td>209702</td><td>Branta</td><td>canadensis</td><td>NaN</td><td>Canada Goose</td><td>Albert @ Max lastukhin</td><td>United States</td><td>Oyster Bay (near Lattingtown), Nassau, New York</td><td>40.8881</td><td>-73.5851</td><td>10</td><td>call</td><td>//www.xeno-canto.org/209702</td><td>//www.xeno-canto.org/209702/download</td><td>XC209702-Poecile atricapillus Dec_27,_2014,_4_05_PM,C1.mp3</td><td>{'small': '//www.xeno-canto.org/sounds/uploaded/LELYWQKUZX/ffts/XC209702-small.png', 'med': '//www.xeno-canto.org/sounds/uploaded/LELYWQKUZX/ffts/XC209702-med.png', 'large': '//www.xeno-canto.org/sounds/uploaded/LELYWQKUZX/ffts/XC209702-large.png', 'full': '//www.xeno-canto.org/sounds/uploaded/LELYWQKUZX/ffts/XC209702-full.png'}</td><td>//creativecommons.org/licenses/by-nc-sa/4.0/</td><td>A</td><td>0:11</td><td>1900-01-01 16:00:00</td><td>2014-12-27</td><td>2015-01-09</td><td>['Poecile atricapillus']</td><td>NaN</td><td>yes</td><td>no</td><td>1</td><td>NaN</td><td>NaN</td><td>12.0</td><td>27.0</td><td>16.0</td><td>0.0</td></tr><tr><th>5</th><td>118</td><td>165398</td><td>Branta</td><td>canadensis</td><td>parvipes</td><td>Canada Goose</td><td>Ted Floyd</td><td>United States</td><td>Boulder, Colorado</td><td>40.0160</td><td>-105.2765</td><td>1600</td><td>call</td><td>//www.xeno-canto.org/165398</td><td>//www.xeno-canto.org/165398/download</td><td>XC165398-CanG for Xeno-Canto.mp3</td><td>{'small': '//www.xeno-canto.org/sounds/uploaded/KADPGEQPZI/ffts/XC165398-small.png', 'med': '//www.xeno-canto.org/sounds/uploaded/KADPGEQPZI/ffts/XC165398-med.png', 'large': '//www.xeno-canto.org/sounds/uploaded/KADPGEQPZI/ffts/XC165398-large.png', 'full': '//www.xeno-canto.org/sounds/uploaded/KADPGEQPZI/ffts/XC165398-full.png'}</td><td>//creativecommons.org/licenses/by-nc-sa/3.0/</td><td>A</td><td>0:19</td><td>1900-01-01 
09:30:00</td><td>2014-01-24</td><td>2014-01-25</td><td>['']</td><td>A large flock of Canada Geese taking off. I believe most of the birds in this flock were parvipes (\"Lesser\") Canada Geese, but there were also larger (subspecies moffitti?) Canada Geese and a few Cackling Geese (several of the subspecies hutchinsii and possibly one of the subspecies minima) in the general vicinity. \r\n\r\nIn the old days this would have been an \"obvious\" or \"easy\" flock of \"Canada Geese.\" Now we're dealing with perhaps two species and probably two or three subspecies in the recording. Again, I believe most of the birds audible here are parvipes (\"Lesser\") Canada Geese.</td><td>yes</td><td>no</td><td>1</td><td>NaN</td><td>NaN</td><td>1.0</td><td>24.0</td><td>9.0</td><td>30.0</td></tr><tr><th>6</th><td>129</td><td>1136</td><td>Branta</td><td>canadensis</td><td>NaN</td><td>Canada Goose</td><td>Don Jones</td><td>United States</td><td>Brace Road, Southampton, NJ</td><td>39.9337</td><td>-74.7170</td><td>?</td><td>song</td><td>//www.xeno-canto.org/1136</td><td>//www.xeno-canto.org/1136/download</td><td>bird034.mp3</td><td>{'small': '//www.xeno-canto.org/sounds/uploaded/BCWZQTGMSO/ffts/XC1136-small.png', 'med': '//www.xeno-canto.org/sounds/uploaded/BCWZQTGMSO/ffts/XC1136-med.png', 'large': '//www.xeno-canto.org/sounds/uploaded/BCWZQTGMSO/ffts/XC1136-large.png', 'full': '//www.xeno-canto.org/sounds/uploaded/BCWZQTGMSO/ffts/XC1136-full.png'}</td><td>//creativecommons.org/licenses/by-nc-nd/2.5/</td><td>A</td><td>0:10</td><td>NaT</td><td>1997-10-17</td><td>2008-11-20</td><td>['']</td><td>NaN</td><td>unknown</td><td>unknown</td><td>0</td><td>NaN</td><td>NaN</td><td>10.0</td><td>17.0</td><td>NaN</td><td>NaN</td></tr><tr><th>7</th><td>132</td><td>536877</td><td>Branta</td><td>canadensis</td><td>NaN</td><td>Canada Goose</td><td>Sue Riffe</td><td>United States</td><td>S Cape May Meadows, Cape May Cty, New Jersey</td><td>38.9381</td><td>-74.9446</td><td>0</td><td>adult, call, sex 
uncertain</td><td>//www.xeno-canto.org/536877</td><td>//www.xeno-canto.org/536877/download</td><td>XC536877-Canada Goose on 10.18.19 at S Cape May Meadows NJ at 18.52 for .19.mp3</td><td>{'small': '//www.xeno-canto.org/sounds/uploaded/PVQOLRXXWL/ffts/XC536877-small.png', 'med': '//www.xeno-canto.org/sounds/uploaded/PVQOLRXXWL/ffts/XC536877-med.png', 'large': '//www.xeno-canto.org/sounds/uploaded/PVQOLRXXWL/ffts/XC536877-large.png', 'full': '//www.xeno-canto.org/sounds/uploaded/PVQOLRXXWL/ffts/XC536877-full.png'}</td><td>//creativecommons.org/licenses/by-nc-sa/4.0/</td><td>B</td><td>0:19</td><td>1900-01-01 18:52:00</td><td>2019-10-18</td><td>2020-03-21</td><td>['Charadrius vociferus']</td><td>Natural vocalization of a flock of geese landing on the water near sunset. Windy</td><td>yes</td><td>no</td><td>1</td><td>NaN</td><td>adult</td><td>10.0</td><td>18.0</td><td>18.0</td><td>52.0</td></tr><tr><th>8</th><td>133</td><td>511453</td><td>Branta</td><td>canadensis</td><td>NaN</td><td>Canada Goose</td><td>Phoenix Birder</td><td>United States</td><td>Gilbert, Maricopa County, Arizona</td><td>33.3634</td><td>-111.7341</td><td>380</td><td>adult, call, female, male</td><td>//www.xeno-canto.org/511453</td><td>//www.xeno-canto.org/511453/download</td><td>XC511453-CaGo.2019.12.10.AZ.Maricopa.RiparianPreserve.mp3</td><td>{'small': '//www.xeno-canto.org/sounds/uploaded/UKNISVRBBF/ffts/XC511453-small.png', 'med': '//www.xeno-canto.org/sounds/uploaded/UKNISVRBBF/ffts/XC511453-med.png', 'large': '//www.xeno-canto.org/sounds/uploaded/UKNISVRBBF/ffts/XC511453-large.png', 'full': '//www.xeno-canto.org/sounds/uploaded/UKNISVRBBF/ffts/XC511453-full.png'}</td><td>//creativecommons.org/licenses/by-nc-sa/4.0/</td><td>B</td><td>0:19</td><td>1900-01-01 08:52:00</td><td>2019-12-10</td><td>2019-12-10</td><td>['Toxostoma curvirostre']</td><td>Sound Devices MixPre-3 Wildtronics Stereo Model #WTPMMSA 22” Parabolic Reflector, 
phoenixbirder@gmail.com</td><td>yes</td><td>no</td><td>1</td><td>male</td><td>adult</td><td>12.0</td><td>10.0</td><td>8.0</td><td>52.0</td></tr><tr><th>9</th><td>134</td><td>504983</td><td>Branta</td><td>canadensis</td><td>canadensis</td><td>Canada Goose</td><td>nick talbot</td><td>United States</td><td>Central Park, New York city,USA</td><td>40.7740</td><td>-73.9710</td><td>20</td><td>call</td><td>//www.xeno-canto.org/504983</td><td>//www.xeno-canto.org/504983/download</td><td>XC504983-2019_10_21 Branta canadensis2.mp3</td><td>{'small': '//www.xeno-canto.org/sounds/uploaded/CCUCXWCPSW/ffts/XC504983-small.png', 'med': '//www.xeno-canto.org/sounds/uploaded/CCUCXWCPSW/ffts/XC504983-med.png', 'large': '//www.xeno-canto.org/sounds/uploaded/CCUCXWCPSW/ffts/XC504983-large.png', 'full': '//www.xeno-canto.org/sounds/uploaded/CCUCXWCPSW/ffts/XC504983-full.png'}</td><td>//creativecommons.org/licenses/by-nc-sa/4.0/</td><td>B</td><td>0:13</td><td>1900-01-01 13:00:00</td><td>2019-10-21</td><td>2019-10-30</td><td>['']</td><td>A pair of birds calling from a lake</td><td>yes</td><td>no</td><td>1</td><td>NaN</td><td>NaN</td><td>10.0</td><td>21.0</td><td>13.0</td><td>0.0</td></tr></tbody></table>"
    )
)

Unnamed: 0,df_index,id,gen,sp,ssp,en,rec,cnt,loc,lat,lng,alt,type,url,file,file-name,sono,lic,q,length,time,date,uploaded,also,rmk,bird-seen,playback-used,pred,gender,age,month,day,hour,minute
0,96,454911,Branta,canadensis,,Canada Goose,Bruce Lagerquist,United States,"Sedro-Woolley, Skagit County, Washington",48.5237,-122.0185,30,call,//www.xeno-canto.org/454911,//www.xeno-canto.org/454911/download,XC454911-190202_02 Canadian Geese.mp3,"{'small': '//www.xeno-canto.org/sounds/uploaded/JHFICMRVUX/ffts/XC454911-small.png', 'med': '//www.xeno-canto.org/sounds/uploaded/JHFICMRVUX/ffts/XC454911-med.png', 'large': '//www.xeno-canto.org/sounds/uploaded/JHFICMRVUX/ffts/XC454911-large.png', 'full': '//www.xeno-canto.org/sounds/uploaded/JHFICMRVUX/ffts/XC454911-full.png'}",//creativecommons.org/licenses/by-nc-sa/4.0/,A,0:17,1900-01-01 11:30:00,2019-02-02,2019-02-04,['Cygnus buccinator'],Mixed flock of Trumpeter Swans and Canada Geese feeding in an agricultural field. Recording of Swan's here XC454910,yes,no,1,,,2.0,2.0,11.0,30.0
1,97,418340,Branta,canadensis,,Canada Goose,Sue Riffe,United States,"Au Sable SF - Big Creek Rd, Michigan",44.0185,-83.756,180,song,//www.xeno-canto.org/418340,//www.xeno-canto.org/418340/download,XC418340-Canada Goose on 5.11.18 at Au Sable SF MI at 11.20 for .14 _0908 .mp3,"{'small': '//www.xeno-canto.org/sounds/uploaded/PVQOLRXXWL/ffts/XC418340-small.png', 'med': '//www.xeno-canto.org/sounds/uploaded/PVQOLRXXWL/ffts/XC418340-med.png', 'large': '//www.xeno-canto.org/sounds/uploaded/PVQOLRXXWL/ffts/XC418340-large.png', 'full': '//www.xeno-canto.org/sounds/uploaded/PVQOLRXXWL/ffts/XC418340-full.png'}",//creativecommons.org/licenses/by-nc-sa/4.0/,A,0:14,1900-01-01 11:20:00,2018-05-11,2018-06-03,['Agelaius phoeniceus'],Natural vocalization,yes,no,0,,,5.0,11.0,11.0,20.0
2,107,291051,Branta,canadensis,,Canada Goose,Eric Hough,United States,"San Juan River, Cottonwood Day-Use Area, Navajo Lake State Park, San Juan County, New Mexico",36.8068,-107.6789,1800,call,//www.xeno-canto.org/291051,//www.xeno-canto.org/291051/download,XC291051-CANG_11515_1730_SanJuanRiver-NavajoDam.mp3,"{'small': '//www.xeno-canto.org/sounds/uploaded/BCFUZDOSJZ/ffts/XC291051-small.png', 'med': '//www.xeno-canto.org/sounds/uploaded/BCFUZDOSJZ/ffts/XC291051-med.png', 'large': '//www.xeno-canto.org/sounds/uploaded/BCFUZDOSJZ/ffts/XC291051-large.png', 'full': '//www.xeno-canto.org/sounds/uploaded/BCFUZDOSJZ/ffts/XC291051-full.png'}",//creativecommons.org/licenses/by-nc-sa/4.0/,A,0:15,1900-01-01 17:30:00,2015-11-15,2015-11-18,[''],"Flock calling while flying over at dusk. Amplification, low and high pass filters used in Audacity.",yes,no,1,,,11.0,15.0,17.0,30.0
3,108,283618,Branta,canadensis,,Canada Goose,Garrett MacDonald,United States,"Beluga--North Bog, Kenai Peninsula Borough, Alaska",61.2089,-151.0103,40,"call, flight call",//www.xeno-canto.org/283618,//www.xeno-canto.org/283618/download,XC283618-LS100466.mp3,"{'small': '//www.xeno-canto.org/sounds/uploaded/CDHIAMGTRT/ffts/XC283618-small.png', 'med': '//www.xeno-canto.org/sounds/uploaded/CDHIAMGTRT/ffts/XC283618-med.png', 'large': '//www.xeno-canto.org/sounds/uploaded/CDHIAMGTRT/ffts/XC283618-large.png', 'full': '//www.xeno-canto.org/sounds/uploaded/CDHIAMGTRT/ffts/XC283618-full.png'}",//creativecommons.org/licenses/by-nc-sa/4.0/,A,0:10,1900-01-01 11:00:00,2015-05-20,2015-10-03,[''],Natural vocalizations from a pair of birds in flight. Recording not modified.,yes,no,1,,,5.0,20.0,11.0,0.0
4,110,209702,Branta,canadensis,,Canada Goose,Albert @ Max lastukhin,United States,"Oyster Bay (near Lattingtown), Nassau, New York",40.8881,-73.5851,10,call,//www.xeno-canto.org/209702,//www.xeno-canto.org/209702/download,"XC209702-Poecile atricapillus Dec_27,_2014,_4_05_PM,C1.mp3","{'small': '//www.xeno-canto.org/sounds/uploaded/LELYWQKUZX/ffts/XC209702-small.png', 'med': '//www.xeno-canto.org/sounds/uploaded/LELYWQKUZX/ffts/XC209702-med.png', 'large': '//www.xeno-canto.org/sounds/uploaded/LELYWQKUZX/ffts/XC209702-large.png', 'full': '//www.xeno-canto.org/sounds/uploaded/LELYWQKUZX/ffts/XC209702-full.png'}",//creativecommons.org/licenses/by-nc-sa/4.0/,A,0:11,1900-01-01 16:00:00,2014-12-27,2015-01-09,['Poecile atricapillus'],,yes,no,1,,,12.0,27.0,16.0,0.0
5,118,165398,Branta,canadensis,parvipes,Canada Goose,Ted Floyd,United States,"Boulder, Colorado",40.016,-105.2765,1600,call,//www.xeno-canto.org/165398,//www.xeno-canto.org/165398/download,XC165398-CanG for Xeno-Canto.mp3,"{'small': '//www.xeno-canto.org/sounds/uploaded/KADPGEQPZI/ffts/XC165398-small.png', 'med': '//www.xeno-canto.org/sounds/uploaded/KADPGEQPZI/ffts/XC165398-med.png', 'large': '//www.xeno-canto.org/sounds/uploaded/KADPGEQPZI/ffts/XC165398-large.png', 'full': '//www.xeno-canto.org/sounds/uploaded/KADPGEQPZI/ffts/XC165398-full.png'}",//creativecommons.org/licenses/by-nc-sa/3.0/,A,0:19,1900-01-01 09:30:00,2014-01-24,2014-01-25,[''],"A large flock of Canada Geese taking off. I believe most of the birds in this flock were parvipes (""Lesser"") Canada Geese, but there were also larger (subspecies moffitti?) Canada Geese and a few Cackling Geese (several of the subspecies hutchinsii and possibly one of the subspecies minima) in the general vicinity. In the old days this would have been an ""obvious"" or ""easy"" flock of ""Canada Geese."" Now we're dealing with perhaps two species and probably two or three subspecies in the recording. Again, I believe most of the birds audible here are parvipes (""Lesser"") Canada Geese.",yes,no,1,,,1.0,24.0,9.0,30.0
6,129,1136,Branta,canadensis,,Canada Goose,Don Jones,United States,"Brace Road, Southampton, NJ",39.9337,-74.717,?,song,//www.xeno-canto.org/1136,//www.xeno-canto.org/1136/download,bird034.mp3,"{'small': '//www.xeno-canto.org/sounds/uploaded/BCWZQTGMSO/ffts/XC1136-small.png', 'med': '//www.xeno-canto.org/sounds/uploaded/BCWZQTGMSO/ffts/XC1136-med.png', 'large': '//www.xeno-canto.org/sounds/uploaded/BCWZQTGMSO/ffts/XC1136-large.png', 'full': '//www.xeno-canto.org/sounds/uploaded/BCWZQTGMSO/ffts/XC1136-full.png'}",//creativecommons.org/licenses/by-nc-nd/2.5/,A,0:10,NaT,1997-10-17,2008-11-20,[''],,unknown,unknown,0,,,10.0,17.0,,
7,132,536877,Branta,canadensis,,Canada Goose,Sue Riffe,United States,"S Cape May Meadows, Cape May Cty, New Jersey",38.9381,-74.9446,0,"adult, call, sex uncertain",//www.xeno-canto.org/536877,//www.xeno-canto.org/536877/download,XC536877-Canada Goose on 10.18.19 at S Cape May Meadows NJ at 18.52 for .19.mp3,"{'small': '//www.xeno-canto.org/sounds/uploaded/PVQOLRXXWL/ffts/XC536877-small.png', 'med': '//www.xeno-canto.org/sounds/uploaded/PVQOLRXXWL/ffts/XC536877-med.png', 'large': '//www.xeno-canto.org/sounds/uploaded/PVQOLRXXWL/ffts/XC536877-large.png', 'full': '//www.xeno-canto.org/sounds/uploaded/PVQOLRXXWL/ffts/XC536877-full.png'}",//creativecommons.org/licenses/by-nc-sa/4.0/,B,0:19,1900-01-01 18:52:00,2019-10-18,2020-03-21,['Charadrius vociferus'],Natural vocalization of a flock of geese landing on the water near sunset. Windy,yes,no,1,,adult,10.0,18.0,18.0,52.0
8,133,511453,Branta,canadensis,,Canada Goose,Phoenix Birder,United States,"Gilbert, Maricopa County, Arizona",33.3634,-111.7341,380,"adult, call, female, male",//www.xeno-canto.org/511453,//www.xeno-canto.org/511453/download,XC511453-CaGo.2019.12.10.AZ.Maricopa.RiparianPreserve.mp3,"{'small': '//www.xeno-canto.org/sounds/uploaded/UKNISVRBBF/ffts/XC511453-small.png', 'med': '//www.xeno-canto.org/sounds/uploaded/UKNISVRBBF/ffts/XC511453-med.png', 'large': '//www.xeno-canto.org/sounds/uploaded/UKNISVRBBF/ffts/XC511453-large.png', 'full': '//www.xeno-canto.org/sounds/uploaded/UKNISVRBBF/ffts/XC511453-full.png'}",//creativecommons.org/licenses/by-nc-sa/4.0/,B,0:19,1900-01-01 08:52:00,2019-12-10,2019-12-10,['Toxostoma curvirostre'],"Sound Devices MixPre-3 Wildtronics Stereo Model #WTPMMSA 22” Parabolic Reflector, phoenixbirder@gmail.com",yes,no,1,male,adult,12.0,10.0,8.0,52.0
9,134,504983,Branta,canadensis,canadensis,Canada Goose,nick talbot,United States,"Central Park, New York city,USA",40.774,-73.971,20,call,//www.xeno-canto.org/504983,//www.xeno-canto.org/504983/download,XC504983-2019_10_21 Branta canadensis2.mp3,"{'small': '//www.xeno-canto.org/sounds/uploaded/CCUCXWCPSW/ffts/XC504983-small.png', 'med': '//www.xeno-canto.org/sounds/uploaded/CCUCXWCPSW/ffts/XC504983-med.png', 'large': '//www.xeno-canto.org/sounds/uploaded/CCUCXWCPSW/ffts/XC504983-large.png', 'full': '//www.xeno-canto.org/sounds/uploaded/CCUCXWCPSW/ffts/XC504983-full.png'}",//creativecommons.org/licenses/by-nc-sa/4.0/,B,0:13,1900-01-01 13:00:00,2019-10-21,2019-10-30,[''],A pair of birds calling from a lake,yes,no,1,,,10.0,21.0,13.0,0.0
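
To make the sine transform of the time features concrete, the sketch below reproduces the month formula used in the feature mapper, `sin((month - 1) * 2 * pi / 12)`, with plain `math` calls. The cosine companion is an assumption added for comparison and is not part of the original pipeline; it is a commonly used addition because sine alone maps some distinct months (for example January and July) to the same value.

```python
import math


def month_to_sine(month: float) -> float:
    """Month transform from the pipeline: sin((month - 1) * 2 * pi / 12)."""
    return math.sin((month - 1) * 2 * math.pi / 12)


def month_to_cosine(month: float) -> float:
    """Companion cosine feature (an assumption; not in the original pipeline)."""
    return math.cos((month - 1) * 2 * math.pi / 12)


for month in (1, 4, 7, 10):
    # Print both encodings; the (sine, cosine) pair is unique for each month.
    print(month, f"{month_to_sine(month):+.3f}", f"{month_to_cosine(month):+.3f}")
```

The same pattern applies to day, hour, and minute by swapping the period (31, 24, and 60 respectively).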


A snippet of the data transformation pipeline and model training code follows; the full code is in
[this Jupyter notebook](https://github.com/adithyabsk/bird_audio/blob/main/notebooks/4.0-ab-metadata-model.ipynb).

In [None]:
# hide_input
metadata_notebook = requests.get(
    "https://raw.githubusercontent.com/adithyabsk/bird_audio/main/notebooks/4.0-ab-metadata-model.ipynb"
).json()
mapping_snippet = "".join(metadata_notebook["cells"][7]["source"][30:])
traintest_snippet = "".join(metadata_notebook["cells"][8]["source"])
xgb_snippet = "".join(metadata_notebook["cells"][10]["source"])
ipd.display(
    ipd.Markdown(
        f"""
```python
{mapping_snippet}

{traintest_snippet}

{xgb_snippet}
```
"""
    )
)


```python
feature_mapper = DataFrameMapper(
    [
        ("id", None),
        (["gen"], OneHotEncoder(drop_invariant=True, use_cat_names=True)),
        (["sp"], OneHotEncoder(drop_invariant=True, use_cat_names=True)),
        (["en"], OneHotEncoder(drop_invariant=True, use_cat_names=True)),
        (["lat"], [SimpleImputer(), StandardScaler()]),  # gaussian
        (["lng"], [SimpleImputer(), MinMaxScaler()]),  # bi-modal --> MinMaxScaler
        # TODO: maybe later look into converting month / day into days since start of year
        (
            ["month"],
            [
                SimpleImputer(),
                FunctionTransformer(lambda X: np.sin((X - 1) * 2 * np.pi / 12)),
                StandardScaler(),  # gaussian
            ],
        ),
        (
            ["day"],
            [
                SimpleImputer(),
                FunctionTransformer(lambda X: np.sin(X * 2 * np.pi / 31)),
                MinMaxScaler(),  # uniform
            ],
        ),
        # TODO: maybe later look into converting hour / minute into seconds since start of day
        (
            ["hour"],
            [
                SimpleImputer(),
                FunctionTransformer(lambda X: np.sin(X * 2 * np.pi / 24)),
                StandardScaler(),  # gaussian
            ],
        ),
        (
            ["minute"],
            [
                SimpleImputer(),
                FunctionTransformer(lambda X: np.sin(X * 2 * np.pi / 60)),
                MinMaxScaler(),  # uniform
            ],
        ),
    ],
    df_out=True,
)

X_feat_df = feature_mapper.fit_transform(X_df, y_df["pred"])
X_train, X_test = (
    X_feat_df[X_feat_df.id.isin(train_ids)].drop(columns=["id"]),
    X_feat_df[X_feat_df.id.isin(test_ids)].drop(columns=["id"]),
)
y_train, y_test = (
    y_df[y_df.id.isin(train_ids)].drop(columns=["id"]).squeeze(),
    y_df[y_df.id.isin(test_ids)].drop(columns=["id"]).squeeze(),
)

xgb_clf = xgb.XGBClassifier()
eval_set = [(X_train, y_train), (X_test, y_test)]
xgb_clf.fit(
    X_train, y_train, eval_metric=["error", "logloss"], eval_set=eval_set, verbose=False
)

print(xgb_clf.score(X_test, y_test))
```
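
The mapper above encodes each time field with a single sine. A single sine value cannot tell apart two points that sit at mirrored positions on the cycle: with the month transform above, January and July both land at roughly 0. A common alternative pairs each sine with a cosine so that every point on the cycle gets a unique pair. This is a minimal sketch of that alternative, not the notebook's method, and the helper name `encode_cyclical` is hypothetical.

```python
import math


def encode_cyclical(value, period, offset=0):
    """Map a periodic value onto the unit circle as a (sin, cos) pair."""
    angle = (value - offset) * 2 * math.pi / period
    return math.sin(angle), math.cos(angle)


# Sine alone, as in the month transform above: January and July collide near 0.
jan_sin = math.sin((1 - 1) * 2 * math.pi / 12)
jul_sin = math.sin((7 - 1) * 2 * math.pi / 12)
print(math.isclose(jan_sin, jul_sin, abs_tol=1e-9))

# With the cosine added, the two months end up on opposite sides of the circle.
jan = encode_cyclical(1, period=12, offset=1)
jul = encode_cyclical(7, period=12, offset=1)
print(jan, jul)
```

Tree-based models such as XGBoost can often work around the ambiguity using other features, so whether the extra column helps here would need to be checked empirically.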


## Audio Classification Model

This model uses the bird audio recordings themselves (mp3 and wav files). We converted each recording into a time
series array using [librosa](https://librosa.org/), extracted features with
[tsfresh](https://tsfresh.readthedocs.io/en/latest/index.html), and trained a Gradient Boosted Tree model on them.

<div class="d-flex flex-justify-center">
<a href="https://github.com/blue-yonder/tsfresh" style="width:35%">
<img src="https://i.imgur.com/cYryjIn.png" alt="ts-fresh Logo"/>
</a>
</div>

### Building Audio Features

We ran audio data through a high-pass [Butterworth filter](https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.butter.html)
to take out background noise. We tested different parameters for Butterworth and Firwin filters, then examined resulting
spectrograms and audio to determine which best reduced background noise without clipping bird sound frequencies.

<div class="d-flex flex-justify-center">
<div style="width:75%">
<img src="https://raw.githubusercontent.com/adithyabsk/bird_audio/main/notebooks/assets/audio_comparing_filters.png" alt="Filter Comparisons"/>
</div>
</div>
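
For readers unfamiliar with the filtering step, here is a minimal sketch of a high-pass Butterworth filter built with `scipy.signal`. It is an illustration under stated assumptions, not the project's `highpass_filter` implementation: the function body, the filter order of 5, and the 1000 Hz cutoff are all hypothetical choices made for the demo.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def highpass_filter(timeseries, sr, filter_order, cutoff_freq):
    """Attenuate content below cutoff_freq (Hz) in a signal sampled at sr (Hz)."""
    # Second-order sections are numerically more stable than (b, a) coefficients.
    sos = butter(filter_order, cutoff_freq, btype="highpass", fs=sr, output="sos")
    # Forward-backward filtering avoids shifting the signal in time.
    return sosfiltfilt(sos, timeseries)


sr = 22050  # librosa.load's default sample rate
t = np.arange(sr) / sr  # one second of samples
hum = np.sin(2 * np.pi * 60 * t)  # synthetic low-frequency background noise
chirp = 0.5 * np.sin(2 * np.pi * 3000 * t)  # synthetic stand-in for a bird sound

# Hypothetical parameters: order 5, 1000 Hz cutoff
filtered = highpass_filter(hum + chirp, sr, filter_order=5, cutoff_freq=1000)

# Away from the edges, the filtered signal should closely match the chirp alone.
residual = filtered[2000:-2000] - chirp[2000:-2000]
print(np.sqrt(np.mean(residual**2)))
```

On real recordings the cutoff has to be chosen with care, since a value that is too high would clip the lower bird frequencies mentioned above.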

The code snippet below loads the audio file (`.mp3`, falling back to `.wav`), applies the filtering described above,
and returns the result as a `pd.DataFrame`, which is the format ts-fresh expects.

In [None]:
# hide_input
display_code_block(
    "https://raw.githubusercontent.com/adithyabsk/bird_audio/main/pracds_final/features/proc_audio.py",
    18,
    46,
)


```python
def unpack_audio(recordings_path: Path, id, filter_order, cutoff_freq):
    """Load mp3 or wav into a floating point time series, then run a high-pass filter.

    Args:
        recordings_path: The path to the recordings dir
        id: The id for the audio recording to unpack
        filter_order: The order for the Butter filter
        cutoff_freq: The critical frequency for the Butter filter (below this is filtered out)

    Returns:
        A df of filtered time series data for the given id, with 'id', 'time', and 'val' columns
    """
    try:
        audio_path = FILE_PATH / ("data/raw/recordings/" + str(id) + ".mp3")
        # load mp3 as audio timeseries arr
        timeseries, sr = librosa.load(audio_path)
    except FileNotFoundError:
        audio_path = FILE_PATH / ("data/raw/recordings/" + str(id) + ".wav")
        timeseries, sr = librosa.load(audio_path)

    # high-pass filter on audio timeseries
    timeseries_filt = highpass_filter(timeseries, sr, filter_order, cutoff_freq)

    df = pd.DataFrame(timeseries_filt, columns=["val"])
    df.reset_index(inplace=True)
    df["id"] = id  # fill col with id
    df = df.reindex(columns=["id", "index", "val"])
    df.columns = ["id", "time", "val"]
    return df
```


#### Feature Selection & Extraction

To avoid running out of memory, we featurized each audio array with ts-fresh right after unpacking and filtering it,
one recording at a time. ts-fresh takes in dataframes with an id column, a time column, and a value column.

<div class="d-flex flex-justify-center">
<div style="width:50%">
<img src="https://raw.githubusercontent.com/adithyabsk/bird_audio/main/notebooks/assets/timeseriesdf_singleid.png" alt="Time Series Input DF for a Single ID"/>
</div>
</div>

ts-fresh provides feature calculator presets, but these, combined with `librosa.load`, were too slow for our dataset
(13+ hours for 5% of the recordings). Instead, we manually specified a small set of features, listed in the snippet
below, based on our domain understanding of bird audio analysis.

Lastly, we passed the resulting "static" time series feature dataframe into an XGBoost model similar to the one used
for the metadata model to predict the output class.

In [None]:
# hide_input
display_code_block(
    "https://raw.githubusercontent.com/adithyabsk/bird_audio/main/pracds_final/features/proc_audio.py",
    71,
    102,
)


```python
manual_fc_params = {
    "abs_energy": None,
    "fft_aggregated": [{"aggtype": "centroid"}, {"aggtype": "kurtosis"}],
    "root_mean_square": None,
    "spkt_welch_density": [{"coeff": 2}, {"coeff": 5}, {"coeff": 8}],
}

# select features to calculate
# features can be found here: https://tsfresh.readthedocs.io/en/latest/api/tsfresh.feature_extraction.html#tsfresh.feature_extraction.feature_calculators.fft_aggregated
def featurize_audio(id, fc_params):
    return extract_features(
        unpack_audio(id),
        column_id="id",
        column_sort="time",
        default_fc_parameters=fc_params,
        disable_progressbar=True,
        # impute replaces NaN and infinite feature values with finite ones
        impute_function=impute,
        # turn off parallelization
        n_jobs=0,
    )


# featurize dataset
# returns df of all combined
def featurize_set(ids, fc_params=None):
    if fc_params is None:
        fc_params = EfficientFCParameters()
    X_df = pd.DataFrame()
    for id in tqdm(ids):
        X_df = pd.concat([X_df, featurize_audio(id, fc_params)])
    return X_df
```


<div class="d-flex flex-justify-center">
<div style="width:85%">
<img src="https://raw.githubusercontent.com/adithyabsk/bird_audio/main/notebooks/assets/audio_Xdf.png" alt="Feat Output for all IDs"/>
</div>
</div>
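
Two of the features in `manual_fc_params` have simple closed forms, which makes it easier to see what the model receives. The plain-Python functions below are an illustrative re-implementation of those definitions (sum of squares, and square root of the mean square), not tsfresh's own code, and the toy signal is made up.

```python
import math


def abs_energy(values):
    """Sum of the squared sample values."""
    return sum(v * v for v in values)


def root_mean_square(values):
    """Square root of the mean of the squared sample values."""
    return math.sqrt(sum(v * v for v in values) / len(values))


signal = [3.0, -4.0, 0.0, 0.0]  # toy signal, not real audio
print(abs_energy(signal), root_mean_square(signal))
```

Note that absolute energy grows with the length of a recording while RMS does not, so the two features carry different information for clips of different durations.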

## Spectrogram Classification Model

[Training Notebook Link](https://github.com/adithyabsk/bird_audio/blob/main/notebooks/5.0-ab-sonogram-model.ipynb)

We took a computer vision approach to analyzing spectrograms, using a pre-trained model from [fast.ai](https://docs.fast.ai/). Specifically, we used an [`xresnet18`](https://github.com/fastai/fastai/blob/d7779196359c8e497a80e2f7f85c327318777c1a/fastai/vision/models/xresnet.py#L64) architecture pre-trained on ImageNet.

<div class="d-flex flex-justify-center">
<a href="https://github.com/fastai/fastai" style="width:50%">
<img src="https://fastpages.fast.ai/images/logo.png" alt="fast.ai Logo"/>
</a>
</div>

We load the data using fast.ai's `ImageDataLoaders`. The model is then cut at the pooling layer, the weights of the
earlier layers are frozen, and only the last layers are trained, so that transfer learning carries over to our
spectrogram images. A diagram of the architecture pulled directly from the original resnet paper is included below.

<div class="d-flex flex-justify-center">
<div style="width:65%">
<img src="https://yann-leguilly.gitlab.io/img/bagstricks/bag_tricks_figure_1.webp" alt="ResNet50 Architecture"/>
</div>
</div>

The model itself was trained on a Tesla K80 using [Google Colab](https://colab.research.google.com/signup) to speed up
the training process. Additionally, we used [Weights and Biases](https://wandb.ai/site) to track the training and
improve the model tuning. We've listed the main snippets of code below that handle the training process.

In [None]:
# hide_input
sonogram_notebook = requests.get(
    "https://raw.githubusercontent.com/adithyabsk/bird_audio/main/notebooks/5.0-ab-sonogram-model.ipynb"
).json()
datasetup_snippet = "".join(sonogram_notebook["cells"][5]["source"])
smodel_train_snippet = "".join(sonogram_notebook["cells"][6]["source"])
ipd.display(
    ipd.Markdown(
        f"""
```python
{datasetup_snippet}

{smodel_train_snippet}
```
"""
    )
)


```python
bs = 128  # Batch size
kwargs = {}
if IS_COLAB:
    kwargs["num_workers"] = 0
data = (
    # convert_mode is passed on internally to the relevant function that will handle converting the images;
    # 'L' results in one color channel
    ImageDataLoaders.from_df(
        image_df,
        folder=ROOT_PATH / "data/raw/sonograms",
        valid_col="is_valid",
        bs=bs,
        # num_workers needs to be set to 0 for local evaluation to turn off multiprocessing
        **kwargs,
    )
)
learn = cnn_learner(data, xresnet.xresnet18, pretrained=True)

# Make sure this path exists on colab
fname = "sono_model.pth"
model_path = (ROOT_PATH / f"models/{fname}").resolve().absolute()
if IS_COLAB and TRAIN:
    # Fine tune model
    wandb.init(project="sono-model")
    learn.fit_one_cycle(1, cbs=WandbCallback())
    # GDrive fails when you try to use mkdir
    # so we manually call `save_model`
    save_path = f"/home/{fname}"
    save_model(save_path, learn.model, getattr(learn, "opt", None))
    %ls -al /home
    from google.colab import files

    files.download(save_path)
else:
    load_model(model_path, learn.model, learn.opt)
```


## Results

Across our three models, test accuracy ranged from 64% to 77%. All three beat the baseline of roughly 55% (predicting
the mean of the labels), and we believe that with more time to tune and ensemble the models, one could achieve an even
more accurate classifier. The metadata model is clearly the most accurate of the three, which leaves considerable room
for improvement in both the time series and sonogram-based models.

<center>

| Model | Train Log Loss | Test Log Loss | Train Accuracy | Test Accuracy |
|-|-|-|-|-|
| Metadata Model | 0.331 | 0.507 | 0.879 | 0.773 |
| Audio Model | 0.255 | 0.694 | 0.957 | 0.639 |
| Spectrogram Model | 0.661 | 0.675 | 0.682 | 0.682 |
| Baseline | 0.69 | 0.55 | 0.55 | 0.54 |

</center>
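
As a sanity check on the baseline row, the log loss of a constant prediction can be worked out by hand. The sketch below assumes the "mean of labels" baseline assigns every recording the same probability, equal to a positive-label rate of 0.55 as quoted above; the function name `binary_log_loss` is hypothetical.

```python
import math


def binary_log_loss(p_positive, positive_rate):
    """Expected log loss when every sample is assigned probability p_positive."""
    return -(
        positive_rate * math.log(p_positive)
        + (1 - positive_rate) * math.log(1 - p_positive)
    )


positive_rate = 0.55  # assumed label mean, taken from the text above
baseline_loss = binary_log_loss(positive_rate, positive_rate)
# A constant prediction is right exactly as often as the majority class occurs.
baseline_accuracy = max(positive_rate, 1 - positive_rate)
print(round(baseline_loss, 3), baseline_accuracy)
```

Under this assumption the loss comes out at roughly 0.69, consistent with the baseline train log loss reported in the table.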

### Plots

#### Metadata Model

The XGBoost validation accuracy plateaus during training, which suggests that adding early stopping could yield
further improvements.

<div class="d-flex flex-justify-center">
<div style="width:75%">
<img src="https://raw.githubusercontent.com/adithyabsk/bird_audio/main/notebooks/assets/svc_meta_xgb_loss.png" alt="XGBoost Log Loss"/>
</div>
</div>
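
The early stopping idea mentioned above can be sketched in a few lines of plain Python: keep track of the best validation loss seen so far and stop once a fixed number of rounds pass without improvement. XGBoost exposes this idea through its `early_stopping_rounds` option. The helper `best_round`, the patience value, and the loss curve below are all made up for illustration.

```python
def best_round(val_losses, patience=10):
    """Return the index of the best round, giving up after `patience` rounds without improvement."""
    best_idx, best_loss = 0, float("inf")
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            best_idx, best_loss = i, loss
        elif i - best_idx >= patience:
            break  # no improvement for `patience` rounds
    return best_idx


# Made-up validation curve: improves, plateaus, then drifts upward
val_losses = [0.69, 0.60, 0.55, 0.52, 0.51, 0.51, 0.52, 0.53, 0.54, 0.55]
stop_at = best_round(val_losses, patience=3)
print(stop_at, val_losses[stop_at])
```

Training would then keep the model as it was at round `stop_at` rather than at the final round.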

Additionally, because the model is based on decision trees, we can compute feature importances. The most important
features include several genera. This is not surprising when we recall our genus-count distribution: the genera that
appear here are mostly those whose recordings are almost entirely songs or almost entirely calls. The other important
feature is month. Again, recall that the ratio of songs to calls goes up in the spring, so time of year is a "good"
feature.

<div class="d-flex flex-justify-center">
<div style="width:75%">
<img src="https://raw.githubusercontent.com/adithyabsk/bird_audio/main/notebooks/assets/svc_meta_xgb_imp.png" alt="XGBoost Feature Importance"/>
</div>
</div>

#### Time Series Model

The test loss increases as training proceeds, which indicates over-fitting. The gap between training accuracy (0.957)
and test accuracy (0.639) points the same way. This is a potential area of improvement in further research.

<div class="d-flex flex-justify-center">
<div style="width:75%">
<img src="https://raw.githubusercontent.com/adithyabsk/bird_audio/main/notebooks/assets/svc_audiotime_xgb_loss.png" alt="XGBoost Log Loss (Audio Model)"/>
</div>
</div>

#### Spectrogram Model

This is the direct output from WandB which depicts the training process for the fine-tuned xresnet model. It is
important to note that the X axis is steps and not epochs as this model was only trained for a single epoch (to save
time and memory).

<div class="d-flex flex-justify-center">
<div style="width:75%">
<img src="https://raw.githubusercontent.com/adithyabsk/bird_audio/main/notebooks/assets/cnn_train_loss.png" alt="Wandb Train Output"/>
</div>
</div>

## Future Work

There are a few immediate next steps that could substantially improve model performance:

- Ensembling the 3 models using a [`VotingClassifier`](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.VotingClassifier.html)
- More training time for the Spectrogram model (only 30 minutes was provided for fine-tuning)
  - Additional epochs (only 1 epoch was provided)
- Filtering features in the audio classification model (ts-fresh likely generates more features than are needed)

Long term: integrate the model with Xeno Canto to provide tag suggestions based on the audio clip.
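
One practical wrinkle with the ensembling idea: scikit-learn's `VotingClassifier` fits all of its estimators on one shared feature matrix, while the three models here consume different inputs (metadata, time series features, and images). A simple alternative is to average the predicted probabilities per recording by hand, which is what soft voting does. The sketch below assumes each model can output a probability of "song" for every recording; the function `soft_vote` and all numbers are hypothetical.

```python
def soft_vote(prob_by_model, weights=None):
    """Average per-model P(song) values per recording and threshold at 0.5."""
    names = list(prob_by_model)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    n = len(prob_by_model[names[0]])
    avg = [
        sum(weights[name] * prob_by_model[name][i] for name in names) / total
        for i in range(n)
    ]
    labels = [int(p >= 0.5) for p in avg]
    return avg, labels


# Made-up probabilities for three recordings
probs = {
    "metadata": [0.9, 0.4, 0.2],
    "audio": [0.6, 0.7, 0.1],
    "spectrogram": [0.8, 0.3, 0.6],
}
avg, labels = soft_vote(probs)
print(avg, labels)
```

The optional `weights` argument would allow the more accurate metadata model to count for more than the other two.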

## Conclusion

The classification of song vs call is the first distinction one can make in bird audio data across species, and on its
own can give insights into the number of predators in an ecosystem, the timing of mating season, and other behaviors.
It could also be valuable as part of a larger system of models. This report presents a promising start on this problem,
using three separate machine learning models that achieve reasonable accuracy. These models could also prove useful in
downstream classification tasks that look to find species, gender, location, and other parameters from the bird audio
sample.

## References
1. "A Beginner’s Guide to Common Bird Sounds and What They Mean." [*Audubon.org.*](https://www.audubon.org/news/a-beginners-guide-common-bird-sounds-and-what-they-mean)
2. "Two Types of Communication Between Birds: Understanding Bird Language Songs And Calls." [*Youtube.*](https://www.youtube.com/watch?v=4_1zIwEENt8)
3. "Bird Vocalization." [*Wikipedia.*](https://en.wikipedia.org/wiki/Bird_vocalization#Function)
4. Gorissen, Leen, et al. “Heavy Metal Pollution Affects Dawn Singing Behaviour in a Small Passerine Bird.” *Oecologia*, vol. 145, no. 3, 2005, pp. 504–509. [JSTOR](https://www.jstor.org/stable/20062442)
5. Ortega, Yvette K.; Benson, Aubree; Greene, Erick. 2014. Invasive plant erodes local song diversity in a migratory passerine. *Ecology.* 95(2): 458-465. [Ecological Society of America](https://www.fs.usda.gov/treesearch/pubs/46856)
6. Marler, P. (2004), Bird Calls: Their Potential for Behavioral Neurobiology. Annals of the New York Academy of Sciences, 1016: 31-44. [https://doi.org/10.1196/annals.1298.034](https://doi.org/10.1196/annals.1298.034)
7. "These birds 'retweet' alarm calls—but are careful about spreading rumors." [*National Geographic.*](https://www.nationalgeographic.com/animals/article/nuthatches-chickadees-communication-danger)
8. Templeton, Christopher N., et al. “Allometry of Alarm Calls: Black-Capped Chickadees Encode Information About Predator Size.” Science, vol. 308, no. 5730, American Association for the Advancement of Science, 2005, pp. 1934–37, [doi:10.1126/science.1108841](https://doi.org/10.1126/science.1108841).
9. Volodin, I.A., Volodina, E.V., Klenova, A.V. et al. Gender identification using acoustic analysis in birds without external sexual dimorphism. Avian Res 6, 20 (2015). [https://doi.org/10.1186/s40657-015-0033-y](https://doi.org/10.1186/s40657-015-0033-y)
10. "About yellowhammers." [*Yellowhammer Dialects.*](http://www.yellowhammers.net/about)
11. Harry R Harding, Timothy A C Gordon, Emma Eastcott, Stephen D Simpson, Andrew N Radford, Causes and consequences of intraspecific variation in animal responses to anthropogenic noise, Behavioral Ecology, Volume 30, Issue 6, November/December 2019, Pages 1501–1511, [https://doi.org/10.1093/beheco/arz114](https://doi.org/10.1093/beheco/arz114)
12. "Open-source Version Control System for Machine Learning Projects." [*DVC.*](https://dvc.org/)
13. [*xeno-canto.*](https://www.xeno-canto.org/explore/api)
14. [*scikit-learn.*](https://scikit-learn.org/stable/index.html)
15. [*xgboost.*](https://xgboost.readthedocs.io/en/latest/)
16. [*fast.ai.*](https://docs.fast.ai/)

### Metrics

#### Word Count

1753 words

#### Code Line count

We used [CLOC](https://github.com/AlDanial/cloc) to generate the code line counts.

| Language | Files | Code |
|-|-|-|
| Jupyter Notebook | 9 | 1195 |
| Python | 8 | 397 |
| Sum | **17** | **1592** |