Adding Wobbegong Sharks to a Wildbook

Hi there,

We’re starting a wobbegong shark photo-ID study at Byron Bay in Australia. There are three species present, all with complex patterns.

Some spotted wobbegong (Orectolobus maculatus) examples:



Ornate wobbegong (O. ornatus):

Banded wobbegong (O. halei):

I guess my first question is whether any of the algorithms that have been integrated into Wildbooks would potentially be suitable for distinguishing individual wobbegong sharks with minimal user input? There are a LOT of spotted wobbegongs here 🙂

Second question: if so, would integration with any of the existing ‘sharky’ Wildbook instances be feasible? The leopard shark Wildbook would make the most sense to me, if that would work, as it’ll be many of the same people working at the Cape Byron study site.

Any suggestions or pointers would be greatly appreciated!

Very best,
Simon.

Hi @simonjpierce,

This looks like a great species for Hotspotter and PIE v2 for ID algorithms.

We would first need to train up a detector to crop to the body. Any chance the community has about 2000 photos like your examples? If not, could we start building that?

> Second question: if so, would integration with any of the existing ‘sharky’ Wildbook instances be feasible?

Yes, actually we have an interest in combining them. Whaleshark.org, leopardshark, etc. could all handle this species…and I’d even love them all to be in one. As another example, we have all the AI trained up to do basking sharks and white shark fin ID…just haven’t had the time or user to add it to one of the Wildbooks. A combined Wildbook for sharks (like Flukebook) would provide this.

What are your thoughts?

Thanks,
Jason

Thanks @jason!

We can certainly start building that dataset (fun project for me) for spotted wobbies. I take it full-body photos, similar to those included above, are preferred?

I’m definitely open to the ‘Sharkbook’ concept.

Best,
Simon.

Yes, as close to 2000 diverse photos as possible from all desired angles, lighting conditions, reefs, etc. Successive frame photos taken rapidly are the opposite of what we want. We need the ML to see enough of what isn’t the shark to help it identify what is the shark.
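As a rough illustration of the "no rapid successive frames" advice, here is a minimal sketch that thins out burst sequences by capture time. The 30-second gap, the file names, and the `drop_burst_frames` helper are all assumptions made for this example, not part of any Wildbook tooling.

```python
from datetime import datetime, timedelta


def drop_burst_frames(photos, min_gap=timedelta(seconds=30)):
    """Keep a photo only if it was taken at least min_gap after the last kept one.

    photos is a list of (file_name, capture_time) tuples.
    The 30-second default is an arbitrary, illustrative threshold.
    """
    kept = []
    for name, taken_at in sorted(photos, key=lambda p: p[1]):
        if not kept or taken_at - kept[-1][1] >= min_gap:
            kept.append((name, taken_at))
    return kept


# Hypothetical file names and timestamps: three burst frames, then a later shot.
photos = [
    ("dive1_001.jpg", datetime(2021, 3, 1, 9, 0, 0)),
    ("dive1_002.jpg", datetime(2021, 3, 1, 9, 0, 1)),
    ("dive1_003.jpg", datetime(2021, 3, 1, 9, 0, 2)),
    ("dive1_004.jpg", datetime(2021, 3, 1, 9, 5, 0)),
]
selected = drop_burst_frames(photos)
```

A time gap is only a crude proxy for diversity; varied angles, lighting, and reefs still need to be checked by eye.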

We’re going through this same exercise for spotted eagle rays and just hit critical mass. We’re now annotating the data. You can see that process here:

This (beautiful!) species will likely be the same.

Thanks,
Jason

P.S. Has anyone made a Big Lebowski pun and named their paper “That carpet shark really tied the reef together?”

That’s super interesting @jason. Thanks for including the video! Very useful to see the intended use.

I’m raiding iNaturalist for images along with social contacts. For this exercise, am I just looking for ~2,000 variable wobbegong images to train the ML (which is fast, as I can screen-shot), or am I actually building the matching database? I was planning to focus on area-specific high-quality images for the latter, so it’d take a lot longer.

No Big Lebowski puns in my Paperpile database, but well played sir. The Dude abides, etc.

Thanks,
Simon.

That comes later, but it is important to start working toward. The basic process is:

  1. Gather 2000-ish diverse photos to build the detector (ID not necessary but do track IDs if you can).
  2. Manually label species, viewpoint, orientation, and quality of all animals in the 2000+ images.
  3. Build ML detector from labeled data and deploy to Wildbook.
  4. Use Hotspotter in Wildbook to match individuals. Hotspotter is a visual texture matcher and does not need a priori knowledge of individuals.
  5. Work to build a well-curated ID catalog of about 200 individuals. This is the longest step and may take a year or two.
  6. Train Pose Invariant Embeddings (PIE v2) to learn how to extract features and match individuals. Generally PIE is going to outperform Hotspotter significantly.
  7. Ideally, we would also release the ID catalog as a COCO-format dataset (a machine learning data format) on a platform like https://lila.science (whale shark and zebra data already there) to promote even better ML technique development in academia for the future.
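
To make steps 2 and 7 a little more concrete, here is a minimal sketch of how manual labels could be arranged in a COCO-style layout. The top-level `images`, `annotations`, and `categories` lists and the `[x, y, width, height]` bounding box follow the general COCO convention; the `viewpoint`, `quality`, and `individual_id` fields, the file names, and the `build_coco` helper are assumptions for illustration and are not taken from the Wildbook pipeline.

```python
import json


def build_coco(labels):
    """Arrange per-animal labels into a COCO-style dictionary.

    Each label describes one animal: file_name, species, bbox, plus the
    illustrative extra fields viewpoint, quality, and individual_id.
    """
    image_ids, category_ids = {}, {}
    annotations = []
    for ann_id, label in enumerate(labels, start=1):
        # Reuse the id if this image or species was already seen.
        image_id = image_ids.setdefault(label["file_name"], len(image_ids) + 1)
        category_id = category_ids.setdefault(label["species"], len(category_ids) + 1)
        annotations.append({
            "id": ann_id,
            "image_id": image_id,
            "category_id": category_id,
            "bbox": label["bbox"],  # [x, y, width, height] in pixels
            "viewpoint": label["viewpoint"],  # assumed extra field
            "quality": label["quality"],  # assumed extra field
            "individual_id": label.get("individual_id"),  # None if unknown
        })
    return {
        "images": [{"id": i, "file_name": f} for f, i in image_ids.items()],
        "annotations": annotations,
        "categories": [{"id": i, "name": n} for n, i in category_ids.items()],
    }


# Hypothetical labels: two animals in one photo, one animal in another.
labels = [
    {"file_name": "reef_a_01.jpg", "species": "Orectolobus maculatus",
     "bbox": [40, 60, 800, 300], "viewpoint": "top", "quality": "good",
     "individual_id": "W-001"},
    {"file_name": "reef_a_01.jpg", "species": "Orectolobus ornatus",
     "bbox": [900, 100, 500, 200], "viewpoint": "left", "quality": "ok"},
    {"file_name": "reef_b_07.jpg", "species": "Orectolobus maculatus",
     "bbox": [10, 20, 1000, 400], "viewpoint": "top", "quality": "good"},
]
dataset = build_coco(labels)
coco_json = json.dumps(dataset, indent=2)
```

Tracking `individual_id` from the start, even when it is often empty, matches the advice in step 1 to record IDs where possible. An orientation field could be added in the same way.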

Just for some background on our ML pipeline and its steps:

@simonjpierce

This is a FANTASTIC roadmap. Thanks @jason! Much appreciated.