Process to handle scan programs #362


Closed
1 of 13 tasks
craig-willis opened this issue Sep 28, 2017 · 9 comments

@craig-willis
Contributor

craig-willis commented Sep 28, 2017

Per discussion with @smarshall-bmr, there are potentially multiple scans and scan programs on a given day. We need a process to capture the scan program information, update the cleaned metadata with the final program name/tag and update downstream extractors to respect this information (e.g., fullfield needs to run on a day for files associated with a program).

We need a way to determine if we have all files for that program in the raw data for the rulechecker.

@smarshall-bmr mentioned that program names may change over time, so we need to map to a standard name.
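Mapping changing program names onto a standard name could look roughly like this sketch (the alias table, canonical names, and function are hypothetical illustrations, not an existing part of the pipeline):

```python
import re

# Hypothetical alias table mapping historical script names to one canonical
# program name; the real table would be curated with @smarshall-bmr.
PROGRAM_ALIASES = {
    "SWIR_VNIR_Day1": "swir_vnir_day",
    "SWIR_VNIR_Day2": "swir_vnir_day",
    "VNIR_Stereo_Full_Field_0.02m": "vnir_stereo_full_field",
}

def normalize_program(script_filename):
    """Strip the trailing UUID and .cs extension, then map to a standard name."""
    stem = re.sub(
        r"_[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\.cs$",
        "",
        script_filename,
    )
    return PROGRAM_ALIASES.get(stem, stem)

print(normalize_program("SWIR_VNIR_Day1_7e5b6f22-f990-43fc-92d4-fee2db330d4a.cs"))
# -> swir_vnir_day
```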

There is another issue with multiple scans on the same day with the same program. It's unclear whether this is something we need to address.

Completion criteria:

  • Complete set of programs/scripts in Github with documentation
    • what actually exists according to grep
    • access to all of the script copies on the ftp server
  • protocols describing each script (@smarshall-bmr)
  • Information about which programs are experimental vs. standard (i.e., should use in fullfield stitching) (@smarshall-bmr) and what the uses / intent of experimental runs are
  • Update the monitoring process to accumulate counts based on program
  • Update metadata cleaning process to add program information to dataset metadata and standardize any names
  • Update rulechecker to include the program in the unique key. This will result in potentially multiple fullfield images per day.
  • Define process for handling new scan programs going forward
    • need to discuss context, what Stuart's workflow is, and how to integrate checking and pipeline updates
    • process for documenting and preparing for new scripts (require PR before using a new scan - requires approval)
    • anything unrecognized will be categorized as experimental and unprocessed (at least beyond level 1) until recognized
  • Review output
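The experimental-vs-standard gatekeeping in the criteria above could be sketched like this (KNOWN_PROGRAMS and the tier names are hypothetical; the real list would come from @smarshall-bmr's annotations):

```python
# Hypothetical set of approved, documented scan programs; anything
# unrecognized is treated as experimental and not processed beyond level 1.
KNOWN_PROGRAMS = {"vnir_stereo_full_field", "swir_vnir_day"}

def classify_program(program_name):
    """Return the processing tier for a standardized program name."""
    if program_name in KNOWN_PROGRAMS:
        return "standard"       # eligible for fullfield stitching
    return "experimental"       # held at level 1 until documented and approved

assert classify_program("vnir_stereo_full_field") == "standard"
assert classify_program("test") == "experimental"
```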
@dlebauer
Member

Craig compiled a list of unique scripts that @smarshall-bmr can annotate:

In addition, the metadata field appears to provide a link to a copy of the script version run on a particular day, on an FTP server (ftp://10.160.21.2//gantry_data/LemnaTec/ScriptBackup/), but this is not resolving.

the metadata looks like

    "gantry_system_variable_metadata": {
      "...":,
      "Script path on local disk": "C:\\LemnaTec\\StoredScripts\\SWIR_VNIR_Day1.cs",
      "Script copy path on FTP server": "ftp://10.160.21.2//gantry_data/LemnaTec/ScriptBackup/SWIR_VNIR_Day1_7e5b6f22-f990-43fc-92d4-fee2db330d4a.cs",
      "..."
    }
  • @smarshall-bmr where is this ftp server?
  • @jdmaloney do you know where this ftp server is, and could you put these in an appropriate place under raw_data (like ua-mac/raw_data/gantry_data/LemnaTec/ScriptBackup/)?
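For reference, pulling these fields out of a dataset's metadata might look like this sketch (field names are copied from the sample above; `ntpath` parses the Windows-style path regardless of the host OS):

```python
import ntpath

def script_info(metadata):
    """Extract the script name and its FTP backup copy from dataset metadata."""
    gsv = metadata["gantry_system_variable_metadata"]
    local = gsv["Script path on local disk"]  # Windows path, e.g. C:\LemnaTec\StoredScripts\...
    return {
        "script_name": ntpath.basename(local),
        "ftp_copy": gsv.get("Script copy path on FTP server"),
    }

md = {
    "gantry_system_variable_metadata": {
        "Script path on local disk": "C:\\LemnaTec\\StoredScripts\\SWIR_VNIR_Day1.cs",
        "Script copy path on FTP server": "ftp://10.160.21.2//gantry_data/LemnaTec/ScriptBackup/SWIR_VNIR_Day1_7e5b6f22-f990-43fc-92d4-fee2db330d4a.cs",
    }
}
print(script_info(md)["script_name"])  # -> SWIR_VNIR_Day1.cs
```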

@max-zilla
Contributor

we have the /gantry_data directory accessible on the cache server, but the ScriptBackup is not currently whitelisted for the transfer pipeline. there are 1,687 scripts listed there, e.g.:

SWIR_VNIR_stereoVIS_IR_Sensors_whiteTarget_lightsOFF_a7b474cb-234a-4724-833d-4847cc83d2d4.cs
test_006a98f5-746f-4bbf-9a30-27fcb32cbebf.cs
test_2a0c5379-837c-4889-be6b-38b554158af3.cs
test_6f6f542d-1f1b-411a-a801-df489838e1cd.cs
test_9a0bdead-54fd-4f17-8cbd-fefdbfbf3c4f.cs
Tutorial_53a5cc92-6340-41bb-bd14-e80c960433fd.cs
Tutorial_7600ad45-a660-4e77-97a8-6a09217b000d.cs
Tutorial_ed3471a2-2912-40bf-b15e-bd94ad19be68.cs
VNIR_Stereo_Full_Field_0.02m_47e46752-088b-43dc-9f4a-81b5a77304ba.cs
VNIR_Stereo_Full_Field_0.02m_49925286-0e49-4eeb-9667-0d3d5347f3e0.cs
VNIR_Stereo_Full_Field_0.02m_4e732583-6f17-4b55-b2eb-1fcee83d7fa7.cs
VNIR_Stereo_Full_Field_0.02m_61dd2093-c69c-43c5-b645-184eff39d513.cs
VNIR_Stereo_Full_Field_0.02m_7591c364-f646-4a1a-98c1-cb2570403cf7.cs
VNIR_Stereo_Full_Field_0.02m_9ec4f9b3-f52c-43f9-918a-0bdd9f695f3d.cs
VNIR_Stereo_Full_Field_0.02m_a246d47d-33fa-4ffd-b8d3-803c289ecc69.cs
VNIR_Stereo_Full_Field_0.02m_a372086f-31c4-4fcd-a450-91523ea651e4.cs
VNIR_Stereo_Full_Field_0.02m_c7e5954b-42b7-4dd4-95f4-576fefa16324.cs
VNIR_Stereo_Full_Field_0.02m_e302dfdc-51a0-4cb3-94aa-fe37c1bf92da.cs
VNIR_Stereo_Full_Field_0.02m_e752aad9-44a1-440c-a393-bd2567ec2a47.cs
VNIR_Stereo_Full_Field_0.02m_ea2e8db1-6fde-4855-9d7e-c9dedaf1febd.cs
VNIR_Stereo_Full_Field_0.02m_f2219962-2103-4b05-bc5a-a99d41391974.cs
VNIRtest_4m_cb2741e5-2d62-4998-80aa-295583e50b63.cs

@craig-willis this probably obviates your scan unless we have scripts in your results that don't appear here.

Here is a list I generated.
script_list.txt
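Collapsing the 1,687 backup filenames to unique program names (for comparison against the shorter annotated list) could be done by stripping the trailing UUID, e.g. this sketch, assuming one filename per line:

```python
import re
from collections import Counter

UUID_SUFFIX = re.compile(
    r"_[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\.cs$"
)

def unique_programs(filenames):
    """Count script-backup copies per program name (UUID suffix removed)."""
    return Counter(UUID_SUFFIX.sub("", name.strip()) for name in filenames)

# Sample entries from the listing above.
sample = [
    "test_006a98f5-746f-4bbf-9a30-27fcb32cbebf.cs",
    "test_2a0c5379-837c-4889-be6b-38b554158af3.cs",
    "VNIRtest_4m_cb2741e5-2d62-4998-80aa-295583e50b63.cs",
]
print(unique_programs(sample))  # -> Counter({'test': 2, 'VNIRtest_4m': 1})
```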

@craig-willis
Contributor Author

The list I generated has 80 entries.

https://docs.google.com/spreadsheets/d/1tMkPT2jtMgTficfSDx80-RpB6PXuzry2gLEZG_ajvl8/edit

I gather what you've got is specific versions on specific dates? That would be better for true traceability, if we can reference the specific script from the dataset metadata.

@craig-willis
Contributor Author

@dlebauer Along with the program descriptions, should we consider linking to the fieldbook spreadsheet (or something that references the fieldbook) from the dataset metadata?

https://docs.google.com/spreadsheets/d/1eQSeVMPfrWS9Li4XlJf3qs2F8txmddbwZhjOfMGAvt8/edit#gid=665425213

@dlebauer
Member

@craig-willis that would seem reasonable. I'm not sure of the best way to do this. Much of this could be inserted into the BETYdb managements table (and some of it already is). The only issue is that the record keeping hasn't been consistent over the years.

@smarshall-bmr what are your thoughts?

@max-zilla
Contributor

max-zilla commented Oct 2, 2017

@craig-willis @robkooper here's an exam question for ya.

right now we trigger the full field extractor by:

  • checking every incoming geotiff file and adding it to the list for that day+sensor
  • if our # of geotiffs for that day+sensor == the number of raw datasets used to generate the geotiffs for that day, trigger
  • (e.g. trigger RGB GeoTIFF full field when RGB Geotiff count == stereoTop raw_data count for that day)

I have added code to account for multiple scans per day in our full field unique key (day+sensor+scan). BUT that breaks our count check. I need a new method to know when to trigger full field without triggering when only half the geotiffs are generated.

Ideas....

  • create a custom rulechecker query so I can get the count of rules for that day+sensor across all scans, then trigger all scans at once if sum(all scans) == total raw datasets. this is the most straightforward option but requires a little coding.

  • bin2tif extractor creates some record in PSQL of scan counts per day that our field mosaic checks against. I don't like this solution much; it requires a lot of centralization (which I guess we've already done for rulechecker) and customization.

  • for this year we have some one-off script create a special file with scan counts per day that rulechecker checks, since our reprocessing is all historical. this seems like the worst solution to me.

  • Rob's idea that others have mentioned - use coverage sum of polygons to cover the full field
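A rough sketch of the first option (the query result is represented as a plain dict here; the actual rulechecker query and schema are stand-ins, not shown in this thread):

```python
def ready_to_trigger(counts_by_scan, total_raw_datasets):
    """Trigger full field for a day+sensor only once every geotiff is in.

    counts_by_scan: {scan_name: geotiff_count}, as would come from a
    rulechecker query across all scans for that day+sensor.
    """
    return sum(counts_by_scan.values()) == total_raw_datasets

# Half the geotiffs done -> don't trigger yet.
assert not ready_to_trigger({"scan1": 50, "scan2": 25}, 150)
# All scans complete -> trigger every scan's fullfield at once.
assert ready_to_trigger({"scan1": 100, "scan2": 50}, 150)
```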

@craig-willis
Contributor Author

@max-zilla As discussed, your first option seems practical for November.

@smarshall-bmr
Collaborator

@max-zilla
Contributor

we are accounting for this now.

4 participants