Machine Learning Fail
Introduction
A little while ago, I created a machine learning app to recognize Australian urban birds. I took the Australian Museum's list of the 30 most common small urban birds, and used the fastai methodology and libraries to train a recognizer for each bird species.
I had a unique and refreshing approach to input data quality: I had none at all!
I used the common name of the bird, did a Google image search, and used the top 200 thumbnail images returned, with no vetting at all. This was part of an experiment to see how far you could get with the lazy person's approach to machine learning.
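The data-collection plan can be sketched as follows. This is illustrative only (the folder layout and helper name are my invention, not the actual code); the key point is that each common name is used, verbatim, as both the search query and the class label, with no vetting step anywhere.

```python
from pathlib import Path

# Illustrative sketch of the "lazy" data-collection plan. Each common name
# becomes, verbatim, an image-search query and a label folder; the top-N
# thumbnail results go straight in with no vetting at all.
def collection_plan(common_names, n_images=200, root="bird_images"):
    return [
        {
            "query": name,                          # query is just the common name
            "folder": Path(root) / name.replace(" ", "_"),
            "limit": n_images,                      # keep the top-N results, unvetted
        }
        for name in common_names
    ]

# A small subset of the 30 species for illustration
plan = collection_plan(["Rainbow Lorikeet", "Pied Currawong", "Grey Butcherbird"])
```

Note that the bare common name is the entire query; that shortcut turns out to matter.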
I was close to astounded at how well it worked.
Live Test Images
Then just the other day, my brother gave me a wildlife camera to position next to my birdbath (which is very popular with the local wildlife). I was able to take a number of close-ups of the local bird species. So I thought I would run these images through my recognizer.
Success - (Pride before the Fall)
The first few were OK.
I was a little surprised that the Rainbow Lorikeet was not identified with more certainty.
It is hard to mistake a Pied Currawong up close.
Fail
The last one was crushing.
Whiskey Tango Foxtrot! My recognizer got the Grey Butcherbird completely wrong! The shame!
The Reason Why (I think)
I think I know what happened.
It turns out that there are two common butcherbird species (and both are common where I live).
There is the Grey Butcherbird (Cracticus torquatus),
and the Pied Butcherbird (Cracticus nigrogularis).
I suspect that my naive search for "Butcherbird" returned images of both species. When I repeated the search just now, it certainly did. So the training images labelled "Grey Butcherbird" were a mixture of two different-looking birds. My recognizer really just matches on textures, and these two species have completely different plumage patterns, so the class it learned was a muddle of both. It is no wonder that recognition failed.
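The effect of that label contamination can be shown with a toy model. This is not the real fastai network, just a nearest-centroid "texture matcher" over made-up one-dimensional texture scores: when half of the images labelled "Grey Butcherbird" are actually Pied Butcherbirds, the class centroid lands between the two true clusters, and a genuine Grey Butcherbird photo no longer matches its own class well.

```python
from statistics import fmean

# Hypothetical 1-D "texture scores": grey plumage clusters near 0.2,
# black-and-white pied plumage clusters near 0.8.
grey_textures = [0.18, 0.20, 0.22, 0.21]
pied_textures = [0.78, 0.80, 0.82, 0.79]

# Clean class: only genuine Grey Butcherbird images.
clean_centroid = fmean(grey_textures)

# Contaminated class: the naive "Butcherbird" search mixed in Pied images.
contaminated_centroid = fmean(grey_textures + pied_textures)

def distance(score, centroid):
    """How far a photo's texture score sits from a class centroid."""
    return abs(score - centroid)

# A real Grey Butcherbird photo from the birdbath camera:
photo = 0.20
```

With the clean data the photo sits almost exactly on the class centroid; with the contaminated data the centroid has drifted halfway towards the Pied cluster and the match is far worse.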
Conclusions
So the moral of the story is: Data Quality Matters! Ignoring input data quality might look like the easy way to start, but it will come back to bite you later, in production.
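One concrete fix is suggested by the species names above: key the image search on the scientific name, which is unambiguous, rather than the common name. A hypothetical sketch (the mapping here covers only the two butcherbirds; a real version would cover all 30 species):

```python
# Hypothetical fix: search by scientific name instead of common name.
# "Butcherbird" is ambiguous between two local species; the scientific
# names are not.
SPECIES = {
    "Grey Butcherbird": "Cracticus torquatus",
    "Pied Butcherbird": "Cracticus nigrogularis",
}

def search_query(common_name):
    """Prefer the unambiguous scientific name; fall back to the common name."""
    return SPECIES.get(common_name, common_name)
```

For example, `search_query("Grey Butcherbird")` yields "Cracticus torquatus", so the downloaded images for that label should contain only one species. (The downloads would still deserve a quick vetting pass.)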