Appearance of unassigned duplicates of old encounters

In which Wildbook did the issue occur?
IoT

What operating system were you using? (eg. MacOS 10.15.3)
MacOS 13.4

What web browser were you using? (eg. Chrome 79)
Chrome 114.0.5735.133, Safari 16.5

What is your role on the site? (admin, researcher, etc)
OrgAdmin

What happened?
An encounter search of approved encounters created in 2023 revealed the presence of 500+ duplicates (cloneWithoutAnnotation) of old encounters.

Search parameters were set as follows:
Query Details

Search filter:
Assigned to one of the following usernames: ORP
Location ID is one of the following: Maldives, Addu, Ari, Baa, Dhaalu, Faafu, Fuamulaku, Gaaf, Haa, Laamu, Lhaviyani, Meemu, Noonu, North Male, Raa, South Male, Shaviyani, Thaa, Vaavu
State is one of the following: approved
genus and species are “Chelonia mydas”.
Encounter creation dates between: 2023-01-01 and 2023-06-27

JDOQL portion of the query
SELECT FROM org.ecocean.Encounter WHERE catalogNumber != null && ( submitterID == "ORP" ) && ( locationID == "Maldives" || locationID == "Addu" || locationID == "Ari" || locationID == "Baa" || locationID == "Dhaalu" || locationID == "Faafu" || locationID == "Fuamulaku" || locationID == "Gaaf" || locationID == "Haa" || locationID == "Laamu" || locationID == "Lhaviyani" || locationID == "Meemu" || locationID == "Noonu" || locationID == "North Male" || locationID == "Raa" || locationID == "South Male" || locationID == "Shaviyani" || locationID == "Thaa" || locationID == "Vaavu" ) && ( state == "approved" ) && genus == 'Chelonia' && specificEpithet == 'mydas' && ((dwcDateAddedLong >= 1672531200000) && (dwcDateAddedLong <= 1687824000000))
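
For reference, the two dwcDateAddedLong bounds in the query above appear to be the search dates expressed as milliseconds since the Unix epoch at midnight UTC. A minimal Python sketch that reproduces both values (the helper name date_to_epoch_ms is hypothetical and not part of Wildbook):

```python
from datetime import datetime, timezone


def date_to_epoch_ms(date_str: str) -> int:
    """Convert a YYYY-MM-DD date to milliseconds since the Unix epoch at 00:00 UTC.

    Hypothetical helper for illustration only; not a Wildbook function.
    """
    dt = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)


# Bounds taken from the "Encounter creation dates between" filter above.
start_ms = date_to_epoch_ms("2023-01-01")
end_ms = date_to_epoch_ms("2023-06-27")
print(start_ms, end_ms)
```

Since the upper bound is midnight at the start of 2023-06-27, the JDOQL as shown would not match encounters added later on that day.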

Results
Matching encounters : 1451
871 identified and unique
543 unidentified
37 daily duplicates

The 543 unidentified encounters are duplicates of existing, already processed encounters that were uploaded weeks to years ago, see for example:

It seems the duplicates were created yesterday (26.06.2023), while the original encounters have not been altered since their initial upload.

What did you expect to happen?
The search to turn up just the intended encounters from this year, not any duplicates.

What are some steps we could take to reproduce the issue?
Is it possible to figure out where the duplicates came from?
Is there a possibility to bulk delete the 543 incorrect encounters and to prevent this from happening again?

Hope this is clear enough!

Best wishes,
Steph

Hi @ORP-StephanieK

We did some cleanup of incorrect IA class assignments in the IoT database yesterday. We also found 10-20k media assets that either never went through detection or were stuck, so those were all sent through detection.

Since the cloned encounters you’ve mentioned were created yesterday, I suspect this is related.

I’ll follow up with you as soon as I have an update about those 543 new encounters.

Hi @ORP-StephanieK

These new cloned encounters should be deleted now (I found 545). We’ve paused pushing through the remaining media assets until we understand the source of the cloning. I suspect old images were allowed to be matchable in their entirety (i.e. they were already precropped and allowed to skip our detector), and we mistook those for images that had not properly run through our detector. We’ll check for this condition before proceeding any further.

Thanks,
Jason

Hi @Anastasia and @jason !

Thanks so much for looking into this so quickly. Glad to see the cloned encounters could be deleted so easily! There were about 10 additional stray ones, which I just removed manually.

We also have a similar issue with encounters from what I assume was a bulk upload in 2020. I discussed it in a call with Tanya last year, but it did not get resolved in the follow-up. The following search finds these encounters:

Query Details

Search filter:
Assigned to one of the following usernames: ORP
Location ID is one of the following: North Male
State is one of the following: unapproved
Encounter creation dates between: 2020-05-24 and 2020-06-05

JDOQL portion of the query
SELECT FROM org.ecocean.Encounter WHERE catalogNumber != null && ( submitterID == "ORP" ) && ( locationID == "North Male" ) && ( state == "unapproved" ) && ((dwcDateAddedLong >= 1590278400000) && (dwcDateAddedLong <= 1591315200000))
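
To show how the search filter maps onto the JDOQL string above, here is a rough Python sketch that assembles the same query text from the filter values. The function names any_of and build_encounter_query are hypothetical, this is not how Wildbook itself builds the query, and straight quotes are assumed for the string literals:

```python
def any_of(field, values):
    """OR together equality checks on one field, wrapped in spaced parentheses."""
    return "( " + " || ".join(f'{field} == "{v}"' for v in values) + " )"


def build_encounter_query(submitters, locations, states, start_ms, end_ms):
    """Assemble a JDOQL string in the shape shown in this thread (illustration only)."""
    clauses = [
        "catalogNumber != null",
        any_of("submitterID", submitters),
        any_of("locationID", locations),
        any_of("state", states),
        f"((dwcDateAddedLong >= {start_ms}) && (dwcDateAddedLong <= {end_ms}))",
    ]
    return "SELECT FROM org.ecocean.Encounter WHERE " + " && ".join(clauses)


# Filter values and epoch-millisecond bounds copied from the search above.
query = build_encounter_query(
    ["ORP"], ["North Male"], ["unapproved"], 1590278400000, 1591315200000
)
print(query)
```

Passing several location IDs produces the longer "locationID == ... || locationID == ..." group seen in the first search.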

It brings up Matching encounters : 4149
0 identified and unique
4149 unidentified
0 daily duplicates

All of these unidentified encounters are duplicates of correctly processed encounters, even though the audit trail does not say so. I have been removing them on the side whenever I was waiting for matches of new encounters to process and have gotten rid of about 2500 already. A bulk delete would make life a lot easier though, if that could be done?

Best,
Steph

Hi @ORP-StephanieK

I’m sorry about the long wait. I haven’t forgotten about this and hope to have an update for you this week.

The 4149 duplicate encounters have been deleted. Thanks for your patience!

Brilliant, thank you very much!
