Large backlogs in machine learning queue in Flukebook

What Wildbook are you working in?
Flukebook
What is the entire URL out of the browser, exactly where the error occurred?
Flukebook (This is one image that has been waiting in the queue for 3-4 hours)
Can you describe what the issue is you’re experiencing?
The machine learning queue is frequently backed up with 200+ images, so importing a single image takes hours/days. I either need to bulk import hundreds of my own images so that I’m not waiting for days for a single image to go through the queue, or there needs to be a limit to how many images can be bulk imported at once so that the queue doesn’t get such a large backlog. I’m not sure what the best solution is.
Can you provide steps on how to reproduce what you’re experiencing?
Start match on any image that hasn’t gone through the machine learning queue yet (ex. 13-0366-LSI-M), and you will be met with “249 images in machine learning queue. Time to completion is averaging 186 minutes. Your time may be faster or slower.”
If this is a bulk import report, send the spreadsheet to services@wildme.org with the email subject line matching your bug report
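
To make the trade-off described above concrete, here is a minimal sketch of one alternative to a hard cap on bulk imports: round-robin scheduling across submitters, so a single image never waits behind someone else's entire batch. This is only an illustration, not how the Flukebook queue is implemented; `FairQueue`, `submit`, `next_job`, and the user names are hypothetical.

```python
from collections import OrderedDict, deque


class FairQueue:
    """Round-robin queue: one job per submitter per turn.

    Hypothetical illustration only; not Wildbook/Flukebook code.
    """

    def __init__(self):
        # Maps submitter -> deque of that submitter's pending jobs.
        # Insertion order doubles as the turn order.
        self._per_user = OrderedDict()

    def submit(self, user, jobs):
        """Append jobs to the submitter's own pending list."""
        self._per_user.setdefault(user, deque()).extend(jobs)

    def next_job(self):
        """Return (user, job) for the submitter whose turn it is, or None."""
        if not self._per_user:
            return None
        user, pending = next(iter(self._per_user.items()))
        job = pending.popleft()
        # Remove the submitter, then re-add at the back if work remains.
        del self._per_user[user]
        if pending:
            self._per_user[user] = pending
        return user, job


queue = FairQueue()
# Assumed scenario: a 249-image bulk import arrives just before one image.
queue.submit("bulk_importer", [f"bulk-{i}" for i in range(249)])
queue.submit("single_user", ["13-0366-LSI-M"])

order = [queue.next_job() for _ in range(3)]
print(order)
```

With this ordering the single image is processed second rather than 250th. A hard per-import cap, as suggested in the report, would be simpler to add but would still leave late arrivals waiting behind each full batch.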

Hi @achantal14

The backlog has been cleared, and the queue is currently empty. I see that the maximum turnaround time on a single job was 6 hours, with most completing in 2 to 4 hours because of the heavy load, which has now been resolved. There was a lot of traffic moving through the machine learning queue, but all of it completed successfully. We are looking to add more scalability through load balancing later this year or early next.

Thanks,
Jason

@jason Hi Jason! It looks like there’s another backlog. The images I sent to analysis yesterday morning still haven’t gone through. The results page has said that there are 8 images in the queue for the past ~30 hours. Would you possibly mind taking a look into this? Thank you!!

Update 8/24: The machine learning queue is now empty, but images I sent for analysis 3 hours ago still haven’t finished (page says “Waiting for results. The machine learning queue is working”).

Hi @achantal14

We put a little more horsepower behind the queue and restarted one job that was stuck, and it has now cleared up. The queue is empty, and all jobs should be done. Please let me know if it looks that way on your end too.

Thanks,
Jason

Hi @jason I sent an image (14-0758-D-LSI-M) to analysis last night around 6:30PM and I’m still getting the “Waiting for results. The machine learning queue is working” message.

Hi @achantal14

OK, I see the issue. There are 621 jobs that have not moved to the machine learning server. I am investigating.

Thanks,
Jason

Hi @achantal14

I restarted our queue server, and I see jobs now flowing to the machine learning server. However, I will need to resend those 621 jobs that were stalled. Were these largely yours? Are they part of a large bulk import that I can reissue to ID?

Thanks,
Jason

Hi @jason. I only send about 20 images (at maximum) to analysis at a time, so I don’t think most of those are mine. Thank you for your help in resolving this!