Batch classification from camera trap data

mtobler · June 9, 2020, 5:52am

Putting this here to start a conversation; this will need some discussion and refinement. The question is “How do we most efficiently and accurately identify all individuals in a survey?”. This should apply to most camera-trap-based projects where we have specific surveys and data are batch-uploaded periodically, but it might also be useful to other projects (e.g. zebra). I will take jaguars as an example. Let’s say we upload a new dataset from a 3-month camera trap survey with 100 survey locations and paired cameras at each location. We will have several hundred events with 1-10 images per event (some showing both sides of the animal, some only one side). Based on my experience doing this manually, here is what I think would be a good workflow:

Upload new dataset with batch upload.
Match all images against all other images including previous images from the same region (as a batch process, no user interaction required), calculate total score for each event. Make sure to exclude scores from images in the same event.
User looks at all events where at least one of the top 2 (or threshold) matches is a known individual. We want to first eliminate all events that are of already known individuals.
User looks at images from events with the highest total score AND with left and right images available, and assigns a new individual. We assume these are the best-quality images with the most matches in the overall dataset, so we want to ID those individuals first. We want to make sure we only classify an event as a new individual if we can be sure that it is a new individual.
User continues classifying events going from highest overall score to lowest. Some of the events we will not be able to match to an individual, some events will require manual matching. Those should be the last ones the user looks at.
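
To make the scoring and ordering steps concrete, here is a rough sketch in Python. It is not how Wildbook does it; the `Image` class, `summarize_events`, `review_order`, and the shape of `pair_scores` are all made up for illustration, and I am assuming the matcher returns one numeric score per ordered image pair.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Image:
    image_id: str
    event_id: str
    side: str                         # "left" or "right"
    individual: Optional[str] = None  # set only for previously identified images


def summarize_events(images, pair_scores, top_n=2):
    """Aggregate pairwise image match scores to the event level.

    pair_scores maps (query_image_id, candidate_image_id) -> score
    (hypothetical format). Pairs within the same event are ignored.
    """
    by_id = {img.image_id: img for img in images}
    total = defaultdict(float)
    matches = defaultdict(list)  # event_id -> [(score, candidate image)]
    for (query_id, cand_id), score in pair_scores.items():
        query, cand = by_id[query_id], by_id[cand_id]
        if query.event_id == cand.event_id:
            continue  # exclude scores from images in the same event
        total[query.event_id] += score
        matches[query.event_id].append((score, cand))

    summary = {}
    for img in images:
        if img.individual is not None:
            continue  # already identified images only serve as candidates
        entry = summary.setdefault(img.event_id, {"sides": set()})
        entry["sides"].add(img.side)
    for event_id, entry in summary.items():
        entry["total"] = total[event_id]
        top = sorted(matches[event_id], key=lambda m: m[0], reverse=True)[:top_n]
        # Known individuals among the top matches, if any
        entry["known"] = [c.individual for _, c in top if c.individual is not None]
    return summary


def review_order(summary):
    """Events with a known top match first, then two-sided events, by score."""
    def key(item):
        event_id, entry = item
        has_known = bool(entry["known"])
        both_sides = {"left", "right"} <= entry["sides"]
        return (not has_known, not both_sides, -entry["total"], event_id)
    return [event_id for event_id, _ in sorted(summary.items(), key=key)]
```

The sort key just encodes the order described above: likely known individuals first, then high-scoring events with both sides, then everything else from highest to lowest total score.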

Pre-calculating all the matches will greatly reduce the time the user spends sitting at the computer waiting for matches to run. It will also allow for efficient scheduling of tasks (run them when the CPU/GPU is idle). Overall, this workflow should speed up classification of large new datasets and reduce possible identification errors or splitting that can happen when the user looks at images in a random order. I am aware that implementing this would require quite a bit of work, but I would like to put it out there for input from others.

tanyastere · June 10, 2020, 9:39pm

Hey there, thanks for posting!
I want to make sure I understand what you’re proposing before we dig into refinement and whether it would be a strong addition to the platform (although if my current understanding is correct, I can definitely see the value!).

My understanding of what you’re saying:

A user uploads a large number of encounters all at once
Wildbook groups encounters into sightings based on time and location
Wildbook performs matching against existing data, limited by region if applicable
Wildbook notifies the user that matching is complete
Wildbook presents a way to look through sightings so a user can determine if any sightings can be associated with a known individual and removes them from the pool
Wildbook presents a way to look through sightings ranked based on certain parameters (you’ve said high quality images from multiple angles) so a user can most easily determine what is a unique, new individual
Wildbook indicates when a user has gone through all sightings.

Is that accurate?
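
For the grouping step, this is roughly what I have in mind, as a sketch only. The function name `group_into_sightings`, the tuple layout, and the 30-minute gap are assumptions for illustration and not a description of what Wildbook currently does.

```python
from datetime import datetime, timedelta
from itertools import groupby


def group_into_sightings(encounters, max_gap=timedelta(minutes=30)):
    """Group (encounter_id, station_id, timestamp) tuples into sightings.

    Encounters at the same station are chained into one sighting as long
    as each one follows the previous one within max_gap (assumed value).
    """
    ordered = sorted(encounters, key=lambda e: (e[1], e[2]))
    sightings = []
    for _, group in groupby(ordered, key=lambda e: e[1]):
        current, last_time = [], None
        for enc_id, _, ts in group:
            if last_time is not None and ts - last_time > max_gap:
                sightings.append(current)  # gap too long: close the sighting
                current = []
            current.append(enc_id)
            last_time = ts
        sightings.append(current)
    return sightings
```

With paired cameras, both cameras would share one station ID so that left and right images of the same pass end up in the same sighting.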

mtobler · June 18, 2020, 11:25pm

Hi Tanya, yes, that is accurate. We might need to standardize vocabulary for event, encounter and sighting. The main points are 1) batch process all possible matches to speed things up for user, 2) combine information from all images in an event/sighting to more accurately identify the individual (use sighting instead of image as the main unit for ID), 3) a way to prioritize high-quality events/sightings (with multiple images and from both sides) when naming new individuals.
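
As a sketch of point 2, one simple way to combine images would be to take the best score per candidate individual for each image and sum those across the sighting. The function `rank_individuals` and its input format are hypothetical, and summing is only one possible way to combine scores.

```python
from collections import defaultdict


def rank_individuals(image_matches):
    """Rank candidate individuals for one sighting.

    image_matches maps each image in the sighting to a list of
    (candidate_individual, score) pairs from per-image matching
    (hypothetical format).
    """
    combined = defaultdict(float)
    for candidates in image_matches.values():
        best = {}  # best score per individual for this image
        for individual, score in candidates:
            if individual not in best or score > best[individual]:
                best[individual] = score
        for individual, score in best.items():
            combined[individual] += score
    # Highest combined score first; ties broken by name for stable output
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
```

An individual that matches both the left and the right image then ranks above one that matches only a single image with a similar score.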

tanyastere · June 19, 2020, 1:10am

That’s so great to hear because we actually have functionality like this in alpha testing right now!

Once we get through this first round of testing and customer feedback, we’d like to expand to a couple additional platforms and get feedback from the wider community. We’d love for your involvement at that stage!

Meanwhile, I’m going to mark this as accepted.

mtobler · June 19, 2020, 5:18am

That’s great news. I currently have a jaguar dataset with about 1000 images (~300 events) and double cameras that would be perfect for testing this. Another one with a similar number of images will be ready soon. So please let me know when you are ready and I am happy to test.

ACWadmin1 · October 14, 2020, 11:31pm

@mtobler @tanyastere
I think ACW users would also be interested in this functionality. I’m sure we could drum up some good camera trap datasets to test this out on as well. Please keep us in the loop. Thanks, @mtobler, for suggesting this! @PaulK, fyi.

mtobler · October 20, 2020, 8:37pm

@tanyastere Is there any update on this? The current system just does not work for the large datasets we have. I would love to help test the new workflow.

PaulK · April 7, 2021, 4:26pm

We would also be interested in testing this new workflow in ACW.

We are onboarding 2 very large Camera Trap surveys and have come up with an approach using Sighting fields to group data, but it is all relatively manual.

Please keep us posted.

Thanks
Paul

CC @ACWadmin1 @tanyastere