My name is Jason Parham and I’m the senior Artificial Intelligence (AI) researcher at Wild Me. Actually, my full title is “Senior Computer Vision Research Engineer”, but that seems a bit too formal and does only a slightly better job of describing what I do every day…
As a quick note: be careful not to confuse me with Jason Holmberg (or JH), who is the director of our non-profit. We know, we get it, it gets confusing sometimes. You may see me sign emails, or be referred to online, as Jason P. (or JP for short).
Before we begin, let’s go over some of the basics of AI. In fact, what are “AI”, “Computer Vision”, “Machine Learning”, and “Deep Learning”? What do all of these really mean and how can they be used for wildlife conservation?
“AI” is quite a big umbrella for a lot of different software and a bit of a misnomer among programmers. For example, the word “car” can be used to describe everything from 1,000 lb. two-person Hondas to 10,000 lb. six-person heavy-duty Ford F-350 trucks to ambulances to race cars. When talking to a car mechanic, simply saying your “car” broke down is certainly accurate, but the word covers so many possible types of vehicle that it doesn’t really mean much. The mechanic needs to know what kind of car you are talking about. Just like the word “car” to a mechanic, the phrase “Artificial Intelligence” to a programmer is a label that can be applied to a wide range of applications, from self-driving cars (no pun intended) to financial market predictions to rainfall estimates to your personal YouTube video recommendations to your phone’s voice assistant (e.g. “Hey Google” or “Hey Siri”). It is all “Artificial Intelligence”.
“Machine Learning” and “Deep Learning” are labels that provide us with a bit more detail. As the words literally suggest, we are now talking about an area within AI that requires some type of learning to be involved. So, how do we teach a computer to learn? While there are many different ways a computer can learn, let’s focus on one type called Supervised Learning. Supervised learning is the process of providing a computer with 1) a large bucket of images or audio samples or a history of temperature readings, etc. AND 2) the correct decision as decided by a human or sensor. An example is to decide if a picture is of a beach, or to transcribe the sound of your voice to written words, or to predict tomorrow’s highs and lows. The goal of supervised learning is to have the computer learn how to convert the input image, sound, or sequence into a useful prediction. When the system makes a prediction that is the same as the correct decision, it is rewarded. When the system makes a prediction that is different from the correct decision, it is punished. In this way, we are acting as the boss or “supervisor” for the computer by teaching it right from wrong and the computer will learn to get better and better. “Deep Learning” is a specialized way to do “Machine Learning”, but uses roughly the same process we have described.
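For readers who like to see ideas in code, here is a minimal sketch of that supervise-and-correct loop in Python. It is a toy, not how real systems (or Wild Me's software) are built: the temperature readings, the "hot"/"cold" labels, and the function names are all invented for illustration, and the "learning" is just nudging a single threshold whenever the computer's prediction disagrees with the correct decision.

```python
# Toy supervised learning: learn to tell "hot" days from "cold" days.
# Each example pairs an input (a temperature reading in Celsius) with the
# correct decision supplied by a human (1 = hot, 0 = cold).
# All values are invented for illustration.
examples = [(5.0, 0), (10.0, 0), (15.0, 0), (25.0, 1), (30.0, 1), (35.0, 1)]


def predict(threshold, temperature):
    """Predict 1 (hot) if the reading is above the threshold, else 0 (cold)."""
    return 1 if temperature > threshold else 0


def train(examples, step=1.0, max_rounds=100):
    """Adjust the threshold every time a prediction is wrong."""
    threshold = 0.0  # the computer starts out knowing nothing useful
    for _ in range(max_rounds):
        mistakes = 0
        for temperature, correct in examples:
            error = correct - predict(threshold, temperature)
            if error != 0:
                # The "penalty": said hot but was cold -> raise the threshold;
                # said cold but was hot -> lower it.
                threshold -= step * error
                mistakes += 1
        if mistakes == 0:
            break  # every prediction matched the correct decision
    return threshold


threshold = train(examples)
print("learned threshold:", threshold)
print("is 40 degrees hot?", predict(threshold, 40.0) == 1)
```

Real systems adjust millions of numbers instead of one threshold, but the loop has the same shape: predict, compare against the correct decision, adjust, repeat.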
Lastly, we have “Computer Vision”, which is simply “Machine Learning” but with a special focus on images and video (and not on audio or financial market trends). For example, we could teach a computer to solve the Computer Vision task of recognizing cats vs. dogs. To do this we will give it a big bucket of pictures where half are of cats and the other half of dogs. We then provide the correct decision (remember, Supervised Learning) of which images are cats and which are dogs. As the computer learns to solve the task, we will give it a pat on the head when it decides correctly.
Whew, that was quite the walk. Let’s review.
We went from a very general label of “AI” all the way to “Computer Vision”. To summarize, Computer Vision often uses Deep Learning to teach a computer to solve a Supervised Learning task on images or video. Supervised Learning is a special type of Machine Learning that teaches a computer by showing it an example and giving it praise when it predicts correctly and a penalty when it doesn’t. All of this is covered by the big label of AI, but the phrase “Computer Vision” is a better fit because it more accurately describes the specific kind of work we do at Wild Me. So, how does Wild Me use Computer Vision for wildlife conservation?
We have now arrived at the purpose of my job at Wild Me. The projects we work on bring the power of “AI” (haha!) to ecologists working in conservation. One of the ways we can ease the burden of ecologists is to automate some of the tedious tasks they find themselves doing to curate their sightings of animals. My job is to prepare the images for a supervised learning task and teach the computer how to automatically decide things about the images. This includes automatically deciding what species are in an image, how many animals are in an image (counting them), trying to decide which side of the animal we saw, trying to decide what parts of the image we can ignore (e.g. grass, trees, rocks, buildings), and preparing the researcher’s sightings for ID.
Woah… wait a second… what is ID?
I’m glad you asked! The Computer Vision we do can be roughly broken into two phases: Detection and Identification (or ID). The Detection phase is anything we do to prepare a sighting of an animal for ID. This includes all of the tasks we mentioned above, where we may want to put a box on an animal, crop it out, figure out if it is the correct species the researcher wants, check if we see the correct side of the animal (left vs right, for example), and remove as much of the background grass as possible. Why is this so important? Well, we want to give the ID phase the best possible chance at finding the animal in a previous sighting. The ID phase is quite different from Detection: Detection is taking images and turning them into high-quality sightings, whereas ID is taking high-quality sightings from Detection and comparing them against each other. This is not easy when you have 10,000 sightings of animals and you don’t know how many you saw in the first place. Did you see 1,000 animals (an average of 10 sightings per animal) or did you see 5,000 animals or only 100 individuals?
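To make the counting problem concrete, here is a small Python sketch of the bookkeeping side of ID. It assumes the hard part is already done: something has compared the sightings and produced a list of pairs judged to be the same animal. That list, the function name `count_individuals`, and the numbers are all made up for illustration; this is not a description of Wild Me's actual matching software. Given those pairs, the code groups sightings that are linked to each other and counts the groups.

```python
def count_individuals(num_sightings, matched_pairs):
    """Count distinct animals, given pairs of sightings judged to match.

    Sightings are numbered 0 .. num_sightings - 1. If sighting A matches B
    and B matches C, all three are treated as the same individual.
    """
    # Each sighting starts out as its own group (a "union-find" structure).
    parent = list(range(num_sightings))

    def find(i):
        # Follow links up to the representative sighting of i's group.
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # shorten the chain as we go
            i = parent[i]
        return i

    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    return len({find(i) for i in range(num_sightings)})


# Six sightings and three invented "same animal" decisions.
matched_pairs = [(0, 1), (1, 2), (3, 4)]
individuals = count_individuals(6, matched_pairs)
print("individuals:", individuals)
print("average sightings per individual:", 6 / individuals)
```

With no matches at all, every sighting counts as a new animal; the more matches ID can confirm, the smaller (and more realistic) the count becomes.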
Wild Me’s focus is to navigate the Computer Vision challenges of working with wildlife images. Our goal is to achieve a high level of accuracy and automation so an ecologist can bring new images of animal sightings taken in the field and very quickly get a sense of how many unique animals were seen. Over time, this allows wildlife conservationists to build a history and a census of the animal population.