Questions about the influence of "Location Id" and "Viewpoint" on the matching process within a project

paul_lallement · August 13, 2021, 5:28am

Flukebook / Mac Os High Sierra 10.13.6 / Firefox, Chrome and Opera / Admin

We have some questions about how the matching process works within a project.

The “Location ID” affects the number of candidates considered by the matching process. This is quite confusing, as within a project we would expect matching to be done irrespective of the location ID. How can we make sure that encounters are systematically matched against all encounters within a project (even if they have different location IDs)? Could you please explain precisely how the matching process works within a project?

Since the “viewpoint” also has an influence on the number of candidates, we would like to check which viewpoint has been given to each annotation. In our case, it doesn’t show up for fluke annotations, although it is displayed for dorsal fins (https://community.wildme.org/uploads/default/original/1X/54bb904003a9b04d89314f4f997751cec21bebc4.jpeg). Could that be fixed? Or does it mean that flukes don’t have a viewpoint?

Thank you in advance for your answers, which will help clarify the process, as we want to make sure there is no bias when conducting matching within the Indocet project.

Best regards,

Violaine and Paul

jason · August 14, 2021, 12:03am

Hi @paul_lallement

We are reviewing this.

Thank you,
Jason

jason · August 16, 2021, 5:47pm

Flukes do not use viewpoint. Confirming that here.

Dorsal fins do use viewpoint (left/right), and those should appear when you roll over the annotation.

jason · August 16, 2021, 6:03pm

Hi @paul_lallement

Just looked at the code. Summary:

If you kick off a match for an Encounter from the Project page, matching will occur within the project but locationID will NOT be considered. I verified this by looking at the query itself in the database.

Under the covers, the database is queried like this:

SELECT FROM org.ecocean.Annotation WHERE matchAgainst && project.id == '63715b78-4334-43b3-993e-7b27f07fb68c' && project.encounters.contains(enc) && iaClass.equals('whale_sperm+fluke') && acmId != null && enc.catalogNumber != '01442db1-7b2d-4408-88f0-e79cf08b955e' && enc.annotations.contains(this) && enc.specificEpithet == 'macrocephalus' VARIABLES org.ecocean.Encounter enc; org.ecocean.Project project

If you kick off a match from the Encounter page, you have the option of filtering by location ID and project will NOT be considered.

In all cases, filtering occurs by species and iaClass (the type of annotation, so that we don’t compare fins to flukes), such as “whale_fluke”. While you can set a viewpoint for a fluke, it is explicitly left out for humpbacks.
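To make the two modes concrete, here is a rough Python sketch of the filtering described above. It is not the actual Wildbook code: the `Annotation` record, its field names, and `select_candidates` are all assumptions made for illustration, loosely mirroring the fields that appear in the query.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    # Hypothetical, simplified record; field names loosely mirror the query above.
    acm_id: Optional[str]
    ia_class: str
    match_against: bool
    encounter_id: str
    species: str
    location_id: Optional[str]
    project_ids: frozenset = frozenset()


def select_candidates(annotations, query, project_id=None, location_ids=None):
    """Return the annotations that `query` would be matched against.

    project_id set    -> match started from the Project page (location ignored).
    location_ids set  -> match started from the Encounter page (project ignored).
    """
    candidates = []
    for ann in annotations:
        # Only annotations flagged for matching and registered with an acmId.
        if not ann.match_against or ann.acm_id is None:
            continue
        # Never match against annotations from the same Encounter.
        if ann.encounter_id == query.encounter_id:
            continue
        # Species and iaClass are filtered in all cases.
        if ann.ia_class != query.ia_class or ann.species != query.species:
            continue
        if project_id is not None:
            if project_id not in ann.project_ids:
                continue
        elif location_ids is not None:
            if ann.location_id not in location_ids:
                continue
        candidates.append(ann)
    return candidates
```

In this sketch, passing `project_id` makes the location filter unreachable, which mirrors the statement that a match started from the Project page does not consider location ID.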

Thanks,
Jason

paul_lallement · August 23, 2021, 10:18am

Jason,

Thank you for taking the time to respond and for checking the code. We now have a better understanding of how the matching process works within a project.

However, a final grey area remains regarding the variation in the number of candidates across matches within a project (examples: https://www.flukebook.org/iaResults.jsp?taskId=a4adcf59-a3fa-4e1c-84a2-91d0fdb592df&projectIdPrefix=Indocet-Mn-##### (3425 candidates); https://www.flukebook.org/iaResults.jsp?taskId=85a45735-a6d3-44cf-a7e8-d243140d1212&projectIdPrefix=Indocet-Mn-##### (4450 candidates)). For encounters that share the same characteristics (i.e. species and iaClass) and belong to the same project, the number of candidates considered by the matching process differs. Thus, three questions arise:

What exactly does the number of “candidates” refer to?
Why does the number of candidates change depending on the image being matched?
How can we be sure that all encounters (i.e. all encounters/images included in a project) are considered for each matching process?

Thank you in advance for your answers which will help to clarify these last grey areas.

Best regards,

Paul and Violaine

jason · August 26, 2021, 3:05am

Really good question.

Candidates are the number of annotations (literally the green bounding boxes in the Encounter images) that we tried to match against. These are filtered - as previously discussed - by species, location ID, feature type (e.g., whale_fluke), and even project depending on user selection.

If I’m given two random match results that have significantly different candidate numbers, as you included, it is safe to assume they asked different questions, such as one filtering by project and the other not.

If I ask the same question of two different flukes from Reunion from these Encounters:

https://www.flukebook.org/encounters/encounter.jsp?number=057bb834-b5f8-42f8-8155-607be1bba686

https://www.flukebook.org/encounters/encounter.jsp?number=0880fcac-e0ec-476d-9610-15bb01282d43

then the candidate match results are going to be similar:

https://www.flukebook.org/iaResults.jsp?taskId=3bb68101-49d6-4a5e-b0f3-2e331b8d7e98

https://www.flukebook.org/iaResults.jsp?taskId=bae521fd-2383-4c4d-a5f5-25d6b63587c3

One has 3878 candidates and the other has 3879, showing good consistency in what we tried to match against. However, they are off by 1. Why? Because one of the Encounters has two annotations, and we don’t try to match against the same Encounter (therefore there were only 3878 whale_fluke candidates to try to match against in Reunion). So asking the same question can lead to slightly different numbers.
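The off-by-one can be reproduced with a small Python sketch. The pool below is made up: its size (3880 annotations) and the encounter labels are assumptions chosen only so that the arithmetic lines up with the two counts above.

```python
from collections import Counter


def candidate_counts(annotation_encounters):
    """Given one encounter id per annotation in the filtered pool, return the
    number of candidates each encounter would be matched against.

    An encounter's own annotations are excluded from its candidate set.
    """
    per_encounter = Counter(annotation_encounters)
    total = len(annotation_encounters)
    return {enc: total - own for enc, own in per_encounter.items()}


# Hypothetical pool: encounter "A" has one fluke annotation, "B" has two,
# and 3877 other encounters contribute one annotation each (3880 in total).
pool = ["A"] + ["B", "B"] + [f"other-{i}" for i in range(3877)]
counts = candidate_counts(pool)
print(counts["A"], counts["B"])  # 3879 3878
```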

Thanks,
Jason

paul_lallement · August 26, 2021, 7:59am

Jason,

Thank you for your responsiveness and for your valuable answers.

Although the number of candidates is consistent when encounters are filtered by Location ID, as shown in your example with La Réunion, this does not seem to hold when matching is conducted within a project (rather than matching random individuals from the encounter page).

When matching encounters (e.g. 1st and 2nd links below) within the same project (Indocet-Mn project, cf. 3rd link below), the numbers of candidates are 3425 and 4450 respectively, whereas they should be matched against the same number of candidates. Do you have any idea why the number of candidates differs in this specific case (i.e. within a project)?

Thank you in advance for your helpful answers.

Best regards,

Paul and Violaine

Links :

jason · August 27, 2021, 3:32am

Hi @paul_lallement

That first link is from October 2020 (see timestamp in the results), almost one year apart from the second link (2021-07-15). Match results are snapshots of what matched at the time. Re-running that fluke yesterday produced yet a different result:

https://www.flukebook.org/iaResults.jsp?taskId=8857dbb8-b8d1-43c9-8ccb-24057c8d6144

Between those two dates, data may have been added, removed, and modified. It’s not a good comparison.

If I run two new matches from the project:

https://www.flukebook.org/iaResults.jsp?taskId=f955902d-faa4-48eb-a64e-6f4f8a0f2f6e&projectIdPrefix=Indocet-Mn-#####

https://www.flukebook.org/iaResults.jsp?taskId=4c961ea4-2976-4544-be12-4cd7513e9ba9&projectIdPrefix=Indocet-Mn-#####

The numbers are consistent (4900 and 4901).

paul_lallement · September 6, 2021, 11:57am

Jason,

Thank you for your response.

Following these explanations, we re-ran the matching process on several encounters on 2021-09-06 within the same project (cf. Indocet’s Mn project at this URL: Flukebook | Login). Here is how we proceeded (as indicated in WildMe Docs): after clicking on “Start match” (the button turned red), we clicked on “Match results”.
Following these steps, when clicking on “view” latest results, the match results were not updated and the old timestamp was still displayed. Also, the identification status (initially “pending”) was not updated (cf. screenshot below).

Is the identification status supposed to change to “complete” on this page when the matching process is done? Do you know why the matching process gets stuck at this stage? We also tried with a VPN, but it didn’t work any better. Further, could you please explain what the different “identification status” values mean (some encounters have been matched and have a project ID, but still have an “undefined” identification status)?

In addition, as mentioned in a previous post (c.f. Automatic detection error - #3 by paul_lallement), the matching time increases considerably after 2 matches and the message “attempted to match” is displayed either for some algorithm results or for all of them (c.f. https://www.flukebook.org/iaResults.jsp?taskId=d30ec6a7-4184-4fb1-af73-34914a3a398f&projectIdPrefix=Indocet-Mn-#####). Any idea of where it might come from?

Thank you in advance for your valuable answers.

Best regards,

Paul and Violaine

jason · September 12, 2021, 4:32pm

Hi Paul,

Every time you try to match a fluke, you can choose multiple algorithms, so for a single match with four algorithms selected, the system will need to get through four in a row before going on to the next Encounter and its matching. You can speed up matching by selecting fewer algorithms per Encounter. Otherwise, you will notice a slowdown in successive Encounters as the system processes multiple algorithms for each Encounter. Other users may also be requesting matching jobs, and this may impact your response time.
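As a rough illustration of why later Encounters wait longer, here is a Python sketch that models the jobs as a simple first-in, first-out queue. The real scheduler may well behave differently; the encounter labels and algorithm names below are placeholders, not actual Flukebook identifiers.

```python
from collections import deque


def build_job_queue(encounters, algorithms):
    """Return a FIFO queue holding one job per (encounter, algorithm) pair."""
    return deque((enc, alg) for enc in encounters for alg in algorithms)


def jobs_ahead_of(queue, encounter):
    """Count the jobs that must finish before the first job of `encounter` starts."""
    for position, (enc, _alg) in enumerate(queue):
        if enc == encounter:
            return position
    raise ValueError(f"no job queued for {encounter!r}")


encounters = ["enc1", "enc2", "enc3"]  # placeholder encounter labels

# Four algorithms per Encounter: the third Encounter waits behind 8 jobs.
four = build_job_queue(encounters, ["alg_a", "alg_b", "alg_c", "alg_d"])
print(jobs_ahead_of(four, "enc3"))  # 8

# One algorithm per Encounter: the third Encounter waits behind only 2 jobs.
one = build_job_queue(encounters, ["alg_a"])
print(jobs_ahead_of(one, "enc3"))  # 2
```

Jobs submitted by other users would sit in the same queue and add to the wait, which this sketch does not model.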

When watching a match results page, it will attempt to periodically refresh to see if matching results are done. After multiple attempts, it will time out. You can always refresh the page to restart the checking. If you have kicked off multiple Encounter matching jobs, it may take a while before match results appear.
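The refresh-then-time-out behaviour is a standard polling pattern. A minimal Python sketch follows, assuming a hypothetical `fetch_results` callable that returns `None` while the job is still running; the attempt count and interval are illustrative defaults, not the values the results page actually uses.

```python
import time


def wait_for_results(fetch_results, max_attempts=10, interval_seconds=5.0,
                     sleep=time.sleep):
    """Poll `fetch_results` until it returns something other than None.

    Gives up and returns None after `max_attempts` checks, which corresponds
    to the page timing out. Calling the function again restarts the checking,
    like refreshing the page.
    """
    for attempt in range(max_attempts):
        results = fetch_results()
        if results is not None:
            return results
        # No need to wait after the final unsuccessful check.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    return None
```

The `sleep` parameter is injectable only so the loop can be exercised without real delays.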

Thanks,
Jason

jason · September 12, 2021, 4:54pm

Hi @paul_lallement

I have filed ticket WB-1785 to make some improvements here. Two things would help:

A status button letting you know the “Start Match” button kicked off the new matching job. I confirmed it is working, but it is not providing any feedback, which it should.
Connecting the new “Start Match” matching process to the “Match Results” button. I confirmed that “Start Match” is working, but because it is an asynchronous process, the “Match Results” button takes some time to get linked to the new set of match results. I’ll see if this can be linked better. Until then, refreshing the page will eventually show the new task matched to the “Match Results” button. Alternatively, kicking off the match from the Encounter page itself may provide better feedback.

Thanks,
Jason

jason · November 22, 2021, 9:58pm

A post was split to a new topic: Setting new project ID

jason · November 22, 2021, 9:59pm

WB-1785 was deployed some time ago, and I am now marking this as resolved.