Hi @BTran
Status update on yesterday's run: the bulk import detection and ID that you kicked off completed without issue.
https://www.grouperspotter.org/imports.jsp?taskId=3dc20ed3-77a0-4ca1-a9cf-e6cbe69e558f
Clicking through the import’s encounters, I am seeing clean detections and ID results. This supports the idea that we should run detection and ID on only one bulk import at a time.
Looking at your examples of missing results:
Example 1:
Failed result:
https://www.grouperspotter.org/iaResults.jsp?taskId=dd45e340-2435-4a80-a7c8-9487b63b0693
Successful re-run result via “Start another match”:
https://www.grouperspotter.org/iaResults.jsp?taskId=7c1b1275-6975-4518-97e1-a6d6bc2073ff
Example 2:
Failed result:
https://www.grouperspotter.org/iaResults.jsp?taskId=1d29fca1-1af7-43aa-881d-826c0db7b0f8
Successful re-run result via “Start another match”:
https://www.grouperspotter.org/iaResults.jsp?taskId=8f87cd5b-ed9c-4711-91fa-8df7f80ed186
Example 3:
This one actually did return a result:
https://www.grouperspotter.org/iaResults.jsp?taskId=ac346fc7-f60e-4b4b-aeb9-c5ec1b1fc4a9
So overall, matching is working as designed.
With the missing matches, I believe we are seeing the after-effects of the system having been overloaded by multiple simultaneous bulk imports sent to detection and ID. As we discussed yesterday, only one bulk import should go through detection and ID at a time, partly because it bears on the question: what are we matching against? If multiple bulk imports are at interim stages of the ML pipeline, the set of annotations to match against keeps changing (e.g., new annotations are added every minute), so we can’t be sure what any given match was actually compared against.
So we have two workarounds we can pursue:
- If the number of missing results is low, they can simply be re-run manually using the menu option “Start another match”.
- If the number of missing results is high, we can reset and re-run each bulk import, one at a time, in the order they were updated.
What would you prefer?