Historical Data issue

clsims · October 18, 2023, 5:07pm

Good morning. I’m hoping to get back into testing the algorithm with the 2019 S09 data in Full Frame (non-cropped). However, we still have to resolve the issue of the original ‘historical’ cropped and rotated data that is no longer in Flukebook.
Shall I just re-import the cropped and rotated image files and xls?
Remember, I tried importing just the xls, but no image files were found that matched the file names I used.
It is possible they are there on the server, but I need the correct file names to link them. Otherwise, I can just import again.
thanks,
Christy
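
A quick way to see which file names in a spreadsheet have no matching image is sketched below. This is only an illustration: the column name `Encounter.mediaAsset0` and the sample file names are assumptions, and the comparison is an exact, case-sensitive match, which may be stricter or looser than what the importer actually does.

```python
import csv
import io

def missing_images(rows, image_names, column="Encounter.mediaAsset0"):
    """Return spreadsheet file names that have no exact match among the image files.

    `column` is an assumed header name; change it to match your spreadsheet.
    """
    available = {name.strip() for name in image_names}
    missing = []
    for row in rows:
        name = (row.get(column) or "").strip()
        if name and name not in available:
            missing.append(name)
    return missing

# Tiny in-memory example with made-up file names. A real run would read the
# spreadsheet exported as CSV and list the actual image folder instead.
sheet = io.StringIO(
    "Encounter.mediaAsset0\n"
    "whale_001.jpg\n"
    "whale_002.jpg\n"
)
rows = list(csv.DictReader(sheet))
print(missing_images(rows, ["whale_001.jpg", "Whale_003.jpg"]))
```

Any name printed here appears in the spreadsheet but not among the image files, so it is a candidate for a renamed or missing file.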

Anastasia · October 18, 2023, 7:08pm

I’m meeting with Jason later today to figure out the next steps on this. Thanks for your continued patience!

Anastasia · October 18, 2023, 11:16pm

If you decide to re-import it, it’s important to NOT run detection on the cropped and rotated images. As we discussed in this thread, the new beluga detector was trained on real-world data, so when it sees perfectly cropped and rotated images, it doesn’t know what to make of them and can’t place annotations correctly.

I checked with Jason whether it’s possible to overwrite a bulk import and it doesn’t seem likely. One of two scenarios likely occurred:

1. If this was the same import we discussed during last week’s call, I’m almost certain we confirmed it could be deleted so that it could be re-imported with the re-named spreadsheet. Otherwise, it may have been deleted prior to then. Jason was able to confirm it was originally uploaded Sept 26, 2023. I’ll send you the spreadsheet from that first upload (2019 S09WildbookStandardFormat) so you can compare it with the most recent one you shared last week (WildbookStandardFormat_2019_S09_Historical).
2. When it was originally imported, it was never committed from the review page. This makes the spreadsheet visible to us on the backend, but it wouldn’t show up in your import history if the results weren’t committed.

That said, since there’s no record of this 2019 import in your account, it’s safe to import the data you have and treat it as a fresh start.

clsims · October 24, 2023, 7:45pm

@Anastasia @jwaite
Hi there, we imported the remaining historical data (cropped and rotated images) and I’m not sure of the next steps to integrate it into our dataset.
We don’t need to run the images through detection, because they are already cropped and rotated. But when I click on the data from the bulk import table, I’m not sure what to do next. There is no option to “send to matching”.
We did pull in the ID #s, but I see that a “.0” has been added to the end of each ID #. Is there still a way to sync this with the current data? We would then like to run some matching now that all of the historical data is uploaded and included.

https://www.flukebook.org/import.jsp?taskId=97163c61-4fa3-4224-9ce4-1ca60f2d3767

Anastasia · October 24, 2023, 10:04pm

It can’t match without annotations and annotations are added when detection is run. We’re not running it through detection because the new detector will misplace the automated annotations on these specific images for the reasons mentioned above. You can manually annotate these if you want, but if they’re already known individuals, the imported ID should connect the images to the right individual pages.

The “.0” being added to numbers sometimes happens to me, too. I’ve used the guidance from Microsoft Support and it’s helped each time I’ve run into it. Type an apostrophe (') before you enter a number, such as '123456789 or '1/23. The apostrophe isn’t displayed in the cell after you press Enter.

Even though it says it’s to fix dates, it works for the randomly included decimals, too.

To fix the IDs, I’d delete this bulk import and re-import with the updated spreadsheet. Double-check in the bulk import preview screen that it’s not still adding decimals to the Marked Individual IDs before you commit the results. If it’s imported correctly, it will associate the new Encounters with the correct individuals.
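
If the spreadsheet is prepared with a script rather than by hand, the same cleanup could be sketched in Python as below. This is only an illustration under the assumption that the IDs are whole numbers that picked up a trailing “.0”; `normalize_id` is a hypothetical helper, not part of Flukebook or Wildbook.

```python
def normalize_id(value):
    """Turn spreadsheet-style values such as 1234.0 into the text '1234'.

    Values that are not a whole number followed by '.0' are returned
    unchanged (apart from surrounding whitespace being trimmed).
    """
    text = str(value).strip()
    if text.endswith(".0") and text[:-2].isdigit():
        return text[:-2]
    return text

# Made-up sample values for illustration only.
for raw in [1234.0, "5678.0", "S09-17", "12.5"]:
    print(repr(raw), "->", repr(normalize_id(raw)))
```

The helper leaves anything that is not clearly a whole number alone, so IDs that legitimately contain letters or other decimals are not altered.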

jwaite · October 25, 2023, 5:35pm

Hi Anastasia,

What if one of these new photos doesn’t have a “match” among the already known individuals, e.g., if it is a new individual? Then it won’t get connected to anyone, but it also won’t get annotated. Would we somehow have to figure out which ones those are and then manually annotate them? @clsims

Anastasia · October 25, 2023, 5:57pm

An alternative could be to add the original un-cropped, un-rotated photo to the encounters with unknown individuals (so that both versions of the photo are on the same encounter) but review the match candidates from the un-cropped, un-rotated image’s annotation. How practical this is depends on how many unknown individuals there are in this set.

Another option would be to re-do the import so that both the cropped and rotated images are uploaded along with the un-cropped, un-rotated ones in your spreadsheet and run detection and ID normally (info on how to add multiple images to an Encounter via bulk import here). Then you’d review match results on only the un-cropped, un-rotated images and ignore reviewing matches from the cropped set since the annotations will likely be misplaced.

jwaite · October 25, 2023, 5:57pm

For the ID #s: since we have 3,000-plus entries and I couldn’t figure out how to easily add an apostrophe to the whole column, I used the “TEXT” formula to change the numbers to text. It doesn’t show the apostrophe in the cell, but it does say the cell is text. Do you think this will be sufficient for Flukebook?
@clsims

Anastasia · October 25, 2023, 6:00pm

It should! Make sure to check the table on the Bulk Import Review page before you commit your results. If it sneakily adds in a decimal, you’ll see it there before committing the import.
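
As an extra check before uploading, the ID column could also be scanned for leftover decimals with a short script. This is a rough sketch that assumes the spreadsheet has been saved as CSV and that the ID column is named `MarkedIndividual.individualID`. Both are assumptions, and it does not replace looking at the review page.

```python
import csv
import io
import re

# Matches values that are all digits followed by ".0", ".00", etc.
DECIMAL_ID = re.compile(r"^\d+\.0+$")

def ids_with_decimals(csv_text, column="MarkedIndividual.individualID"):
    """Return ID values that still look like numbers with a trailing decimal.

    `column` is an assumed header name; change it to match your spreadsheet.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    flagged = []
    for row in reader:
        value = (row.get(column) or "").strip()
        if DECIMAL_ID.match(value):
            flagged.append(value)
    return flagged

# Made-up sample data for illustration only.
sample = "MarkedIndividual.individualID\n101\n102.0\nS09-3\n"
print(ids_with_decimals(sample))
```

An empty list means no ID in that column looks like a number with a trailing decimal.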

jwaite · October 25, 2023, 6:01pm

Just to be clear, they are not unknown to us (they have IDs) but might be new ones to Flukebook.

Anastasia · October 25, 2023, 6:10pm

That clarification helps; thank you! If you’re including IDs for your Encounters and that individual is new to Flukebook, it will get assigned to the correct Marked Individual based on the ID you provide.

Adding this edit in case it’s not clear: Since all of the whales you’re importing will have IDs, even if they’re brand new to Flukebook, they’ll still be recognized as Marked Individuals. So you can disregard my previous suggestion (quoted below) about adding the unedited images to the encounters. I suggested that based on the assumption that these were entirely unknown whales that weren’t identified at all.

Anastasia:

An alternative could be to add the original un-cropped, un-rotated photo to the encounters with unknown individuals (so that both versions of the photo are on the same encounter) but review the match candidates from the un-cropped, un-rotated image’s annotation. How practical this is depends on how many unknown individuals there are in this set.

Another option would be to re-do the import so that both the cropped and rotated images are uploaded along with the un-cropped, un-rotated ones in your spreadsheet and run detection and ID normally (info on how to add multiple images to an Encounter via bulk import here). Then you’d review match results on only the un-cropped, un-rotated images and ignore reviewing matches from the cropped set since the annotations will likely be misplaced.

jwaite · October 25, 2023, 7:47pm

OK - glad to hear that!