Add hyperspectral data extractor to pipeline #81
Comments
@yanliu-chn what is the status of this? I know the extractor isn't finished, but can it be added as an extractor and tested out on data so that we have some examples to show? |
Is this related to #86 ? |
@yanliu-chn yes, this issue follows #86: that one is to write the converter, and this issue is to add it to the pipeline. |
@yanliu-chn - what is the status on this? |
Now that #86 is in good shape, our team will work with the Clowder team and Charlie/Jerome to develop the extractor at the dataset level. The extractor will watch for the readiness of a couple of files in a dataset, call Charlie's script to create netCDF files, and write them back to the dataset. |
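As a rough illustration of that readiness check, here is a hypothetical Python sketch (not the actual extractor code); the list of required suffixes is an assumption based on the file names that appear in the test log later in this thread:

# Hypothetical sketch of a dataset-level readiness check; not the real extractor.
REQUIRED_SUFFIXES = ['_raw', '_raw.hdr', '_frameIndex.txt', '_metadata.json']

def dataset_is_ready(filenames):
    # True once every required input file is present in the dataset file list.
    return all(any(name.endswith(suffix) for name in filenames)
               for suffix in REQUIRED_SUFFIXES)

# Example: only invoke the conversion once the dataset is complete.
if dataset_is_ready(['x_raw', 'x_raw.hdr', 'x_frameIndex.txt', 'x_metadata.json']):
    print('all required inputs present; safe to call the conversion script')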
@yanliu-chn - please have your team update this issue. |
@rachelshekar Thanks for your reminder; I'm assigning this task to Xingchen @Zodiase |
@yanliu-chn I did not realize this was already assigned to you when I spoke with @czender yesterday and offered to look into this. I have committed an initial draft of an extractor for the terraref.sh script, but it doesn't currently work. I haven't really tried to dig into Charlie's script, but I have the extractor downloading a _raw and raw.hdr file and invoking the script, which produces this error:
It looks like maybe a hard-coded path is causing problems? In any case, I will stop working on this branch if @Zodiase is going to look into it. Edit: again, this is a dataset extractor (trigger it by uploading a _raw and a raw.hdr file, like from SWIR or VNIR, to a dataset), so you need to install my pyclowder branch: |
This is a $PATH issue. terraref.sh does this on roger: |
@czender I'm a bit confused by the two parallel threads between #86 and this. I'd like to sort this out: are the steps:
mentioned in the OP still required, given
from @yanliu-chn in #86? |
No. The OP shows how to build/install NCO yourself. #86 shows that @yanliu-chn did this on roger, and packaged it appropriately. terraref.sh needs NCO (and thus ncks) on its path. It doesn't matter from where. "module add nco" is one option (on roger). |
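Since terraref.sh only assumes ncks is reachable, an extractor could fail fast when NCO is missing. A hypothetical pre-flight check (works on Python 2.7 as well):

# Hypothetical pre-flight check: terraref.sh needs ncks (NCO) somewhere on $PATH,
# e.g. via "module add nco" on ROGER or a local NCO install as in the OP.
from distutils.spawn import find_executable

if find_executable('ncks') is None:
    raise RuntimeError('ncks not found on PATH; load or install NCO before running terraref.sh')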
@FlyingWithJerome does JsonDealer still produce false positives when looking for required input files? |
@max-zilla @Zodiase @robkooper @czender I have successfully created the Docker container for the hyperspectral image conversion. @Zodiase, please use this Dockerfile as your development environment for extractor development. Let me know if you have any questions. I created a branch to add the Docker files: |
@czender I'm debugging it. I tested it locally and it's all fine, and it will not warn on ROGER if all the files are there. I'm debugging it on ROGER, and a possible reason would be the Python os module, since the file checker largely depends on it. I will figure it out. |
@FlyingWithJerome that makes sense to me. do you suspect the python used in the extractor is earlier than version 2.7? or something else? |
@czender In my expectation, any version of Python should be fine. Since 2.7 is the most common one and numpy is not compatible with the 3.x version, 2.7 would be the best choice. |
@yanliu-chn Could you rebase your Dockerfile on |
There is no need to. I have included all the stuff the python-base one did in the current Dockerfile: https://opensource.ncsa.illinois.edu/bitbucket/projects/BD/repos/dockerfiles/browse/clowder/python-base/Dockerfile You can modify my Dockerfile to include the rabbitmq config; other than that, it's already in good shape. For the rabbitmq part, please refer to the plantcv extractor https://opensource.ncsa.illinois.edu/bitbucket/projects/BD/repos/dockerfiles/browse/clowder/plantcv/Dockerfile |
please use my Dockerfile for extractor development, as it's been tested to work with terraref.sh. |
@yanliu-chn So there are a few differences that I'm not sure how to handle.
|
|
For 2., I meant: what do I do if the line
For 3., I meant: is there actually a |
In other words, and in short, the issue is that your Dockerfile seems to miss things that weren't written by me; I don't know what they do, and I'm not sure how to modify your Dockerfile without either missing something important or conflicting with your logic. |
For 2., you don't need to care about the entrypoint; we don't need that for the extractor. For 3., the
To understand the Dockerfile, https://docs.docker.com/engine/reference/builder/ is a good reference. All you need to do now is:
|
Were you able to start my Docker container in your Docker environment and test the terraref.sh sample run? |
The Dockerfile and extractor code temporarily stored in my notes were updated, and now the extractor can finish running:
INFO : pyclowder.extractors - Waiting for messages. To exit press CTRL+C
INFO : pyclowder.extractors - Registering extractor...
ERROR : pyclowder.extractors - Error in registering extractor: [Errno 2] No such file or directory: '/home/ubuntu/extractor_info.json'
INFO : pyclowder.extractors - Starting a New Thread for Process Dataset
{'files': [], 'download_bypassed': True, 'channel': <pika.adapters.blocking_connection.BlockingChannel object at 0x7f06af08e110>, 'filelist': [{u'filename': u'0596c17f-2e4c-4d43-9d77-cde8ffbde663_frameIndex.txt', u'date-created': u'Wed Aug 17 22:00:37 UTC 2016', u'contentType': u'text/plain', u'id': u'57b4de85e4b0049e7845295c', u'size': u'10603'}, {u'filename': u'0596c17f-2e4c-4d43-9d77-cde8ffbde663_image.jpg', u'date-created': u'Wed Aug 17 22:00:38 UTC 2016', u'contentType': u'image/jpeg', u'id': u'57b4de86e4b0049e78452963', u'size': u'40117'}, {u'filename': u'0596c17f-2e4c-4d43-9d77-cde8ffbde663_metadata.json', u'date-created': u'Wed Aug 17 22:00:44 UTC 2016', u'contentType': u'application/json', u'id': u'57b4de8ce4b0049e7845296a', u'size': u'1533'}, {u'filename': u'0596c17f-2e4c-4d43-9d77-cde8ffbde663_raw.hdr', u'date-created': u'Wed Aug 17 22:00:52 UTC 2016', u'contentType': u'application/octet-stream', u'id': u'57b4de94e4b0049e78452971', u'size': u'10197'}, {u'filename': u'0596c17f-2e4c-4d43-9d77-cde8ffbde663_settings.txt', u'date-created': u'Wed Aug 17 22:00:54 UTC 2016', u'contentType': u'text/plain', u'id': u'57b4de96e4b0049e78452978', u'size': u'817'}, {u'filename': u'0596c17f-2e4c-4d43-9d77-cde8ffbde663_raw', u'date-created': u'Wed Aug 17 22:01:14 UTC 2016', u'contentType': u'application/octet-stream', u'id': u'57b4deaae4b0049e7845297f', u'size': u'1430208000'}, {u'filename': u'foo.txt', u'date-created': u'Wed Aug 24 21:30:30 UTC 2016', u'contentType': u'text/plain', u'id': u'57be11f6e4b0049e15fe81ef', u'size': u'9'}], 'method': <Basic.Deliver(['consumer_tag=ctag1.58348bf89a444ad3a9a23e78f2fe58e2', 'delivery_tag=1', 'exchange=clowder', 'redelivered=False', 'routing_key=clowder.dataset.file.added'])>, u'secretKey': u'r1ek3rs', 'header': <BasicProperties(['content_type=application\\json', 'correlation_id=b23ae144-f1d1-46d0-a308-bf0b0011a816', 'reply_to=amq.gen-9SGIHuqFCaYqtFhQk89VCg'])>, u'host': u'http://10.211.55.9:9000', u'flags': u'', u'fileSize': u'9', u'intermediateId': u'57be11f6e4b0049e15fe81ef', 'datasetInfo': {u'description': u'', u'created': u'Wed Jul 13 18:03:45 UTC 2016', u'id': u'57868281e4b0049e260eb382', u'authorId': u'57866660292acbb6539f5e85', u'thumbnail': u'None', u'name': u'Hello World'}, 'filename': u'foo.txt', u'id': u'57be11f6e4b0049e15fe81ef', u'datasetId': u'57868281e4b0049e260eb382', 'fileid': u'57be11f6e4b0049e15fe81ef'}
found _frameIndex.txt file: /home/ubuntu/input/0596c17f-2e4c-4d43-9d77-cde8ffbde663_frameIndex.txt
found _image.jpg file: /home/ubuntu/input/0596c17f-2e4c-4d43-9d77-cde8ffbde663_image.jpg
found _raw.hdr file: /home/ubuntu/input/0596c17f-2e4c-4d43-9d77-cde8ffbde663_raw.hdr
found _raw file: /home/ubuntu/input/0596c17f-2e4c-4d43-9d77-cde8ffbde663_raw
found _metadata.json file: /home/ubuntu/input/0596c17f-2e4c-4d43-9d77-cde8ffbde663_metadata.json
found _settings.txt file: /home/ubuntu/input/0596c17f-2e4c-4d43-9d77-cde8ffbde663_settings.txt
invoking terraref.sh to create: /home/ubuntu/output/0596c17f-2e4c-4d43-9d77-cde8ffbde663.nc4
Terraref hyperspectral data workflow invoked with:
terraref.sh -d 1 -I /home/ubuntu/input -O /home/ubuntu/output
Hyperspectral workflow scripts in directory /home/ubuntu/computing-pipeline/scripts/hyperspectral
NCO version "4.6.1" from directory /srv/sw/nco-4.6.1/bin
Intermediate/temporary files written to directory /tmp
Final output stored in directory /home/ubuntu/output
Input #00: /home/ubuntu/input/0596c17f-2e4c-4d43-9d77-cde8ffbde663_raw
trn(in) : /home/ubuntu/input/0596c17f-2e4c-4d43-9d77-cde8ffbde663_raw
trn(out) : /tmp/terraref_tmp_trn.nc.pid156.fl00.tmp
ncks -O --trr_wxy=955,1600,468 --trr typ_in=NC_USHORT --trr typ_out=NC_USHORT --trr ntl_in=bil --trr ntl_out=bsq --trr_in=/home/ubuntu/input/0596c17f-2e4c-4d43-9d77-cde8ffbde663_raw /home/ubuntu/computing-pipeline/scripts/hyperspectral/dummy.nc /tmp/terraref_tmp_trn.nc.pid156.fl00.tmp
att(in) : /tmp/terraref_tmp_trn.nc.pid156.fl00.tmp
att(out) : /tmp/terraref_tmp_att.nc.pid156.fl00.tmp
ncatted -O --gaa terraref_script=terraref.sh --gaa terraref_hostname=eb4936de98e3 --gaa terraref_version="4.6.1" -a "Conventions,global,o,c,CF-1.5" -a "Project,global,o,c,TERRAREF" --gaa history="Wed Aug 24 21:31:06 UTC 2016: terraref.sh -d 1 -I /home/ubuntu/input -O /home/ubuntu/output" /tmp/terraref_tmp_trn.nc.pid156.fl00.tmp /tmp/terraref_tmp_att.nc.pid156.fl00.tmp
jsn(in) : /home/ubuntu/input/0596c17f-2e4c-4d43-9d77-cde8ffbde663_raw
jsn(out) : /tmp/terraref_tmp_jsn.nc.pid156
python /home/ubuntu/computing-pipeline/scripts/hyperspectral/JsonDealer.py /home/ubuntu/input/0596c17f-2e4c-4d43-9d77-cde8ffbde663_raw /tmp/terraref_tmp_jsn.nc.pid156.fl00.tmp
Processing ...
Done.
mrg(in) : /tmp/terraref_tmp_jsn.nc.pid156.fl00.tmp
mrg(out) : /tmp/terraref_tmp_att.nc.pid156.fl00.tmp
ncks -A /tmp/terraref_tmp_jsn.nc.pid156.fl00.tmp /tmp/terraref_tmp_att.nc.pid156.fl00.tmp
clb(in) : /tmp/terraref_tmp_att.nc.pid156.fl00.tmp
clb(out) : /tmp/terraref_tmp_clb.nc.pid156.fl00.tmp
ncap2 -A -S /home/ubuntu/computing-pipeline/scripts/hyperspectral/terraref.nco /tmp/terraref_tmp_att.nc.pid156.fl00.tmp /tmp/terraref_tmp_att.nc.pid156.fl00.tmp;/bin/mv -f /tmp/terraref_tmp_att.nc.pid156.fl00.tmp /tmp/terraref_tmp_clb.nc.pid156.fl00.tmp
ncap2: ERROR nco_malloc() unable to allocate 5720832000 B = 5586750 kB = 5455 MB = 5 GB
ncap2: INFO NCO has reported a malloc() failure. malloc() failures usually indicate that your machine does not have enough free memory (RAM+swap) to perform the requested operation. As such, malloc() failures result from the physical limitations imposed by your hardware. Read http://nco.sf.net/nco.html#mmr for a description of NCO memory usage. The likiest case is that this problem is caused by inadequate RAM on your system, and is not an NCO bug. If so, there are two potential workarounds: First is to process your data in smaller chunks, e.g., smaller or more hyperslabs. The second is to use a machine with more free memory, so that malloc() succeeds.
Large tasks may uncover memory leaks in NCO. This is likeliest to occur with ncap2. ncap2 scripts are completely dynamic and may be of arbitrary length and complexity. A script that contains many thousands of operations may uncover a slow memory leak even though each single operation consumes little additional memory. Memory leaks are usually identifiable by their memory usage signature. Leaks cause peak memory usage to increase monotonically with time regardless of script complexity. Slow leaks are very difficult to find. Sometimes a malloc() failure is the only noticeable clue to their existence. If you have good reasons to believe that your malloc() failure is ultimately due to an NCO memory leak (rather than inadequate RAM on your system), then we would like to receive a detailed bug report.
rip(in) : /tmp/terraref_tmp_clb.nc.pid156.fl00.tmp
rip(out) : /home/ubuntu/output/0596c17f-2e4c-4d43-9d77-cde8ffbde663.nc
/bin/mv -f /tmp/terraref_tmp_clb.nc.pid156.fl00.tmp /home/ubuntu/output/0596c17f-2e4c-4d43-9d77-cde8ffbde663.nc
Cleaning-up intermediate files...
Quick views of last processed data file and its original image (if any):
ncview /home/ubuntu/output/0596c17f-2e4c-4d43-9d77-cde8ffbde663.nc &
panoply /home/ubuntu/output/0596c17f-2e4c-4d43-9d77-cde8ffbde663.nc &
open /home/ubuntu/input/0596c17f-2e4c-4d43-9d77-cde8ffbde663_image.jpg
Elapsed time 1m13s
|
I think this memory issue raises a good question: exactly how much memory is needed for a given input size? Say, for example, given a
The test |
@Zodiase @czender |
@Zodiase @FlyingWithJerome @yanliu-chn @dlebauer I reduced the memory required by ncap2 from 5x sizeof(raw) to 4x sizeof(raw). Best to allot 4.1x so as not to cut it too close. The footprint cannot be further reduced without significantly slowing the processing. Half the memory holds the original image (promoted from short to float) and the other half holds the computed reflectance (float), so 4x the raw image is the smallest natural size of the computation without using loops. Note that the required memory will increase if the final form of the hyperspectral calibration includes new (wavelength,x,y) arrays. |
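To put numbers on that guideline, a quick back-of-the-envelope estimate using the raw file size shown in the log above (1430208000 bytes) and the suggested 4.1x allotment:

# Rough ncap2 memory estimate using the 4.1x-sizeof(raw) guideline above.
raw_bytes = 1430208000          # size of the _raw file in the test log above
needed_bytes = 4.1 * raw_bytes
print('allot roughly %.1f GB of RAM+swap' % (needed_bytes / 1e9))  # ~5.9 GB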
I think I've finished the extractor; see the above PR. However, I did find something strange: the file uploaded by the extractor cannot be deleted in the Clowder web UI. It says no permission to delete. |
What is the use case for manually deleting something generated by the extractor? |
@dlebauer Perhaps someone would like to delete an incorrect result (for example, one resulting from a malloc failure) and rerun the extractor for that dataset? |
Write access to directories used by extractors is very restricted. Only the
How would the error be caught? If such an error occurs, can the extractor
|
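One way the extractor could catch such a failure is to check the exit status of terraref.sh and only upload the netCDF output when the conversion succeeded. This is a hypothetical sketch, not the current extractor code; it assumes terraref.sh returns a non-zero status when a step like ncap2 fails, and the paths and flags mirror the invocation shown in the log above:

# Hypothetical error handling around the conversion step; not the current extractor.
import subprocess

def run_conversion(input_dir, output_dir):
    # Returns True only if terraref.sh exits cleanly; on failure, skip the upload
    # so a bad .nc file (e.g. after a malloc failure) never lands in the dataset.
    status = subprocess.call(['terraref.sh', '-d', '1', '-I', input_dir, '-O', output_dir])
    return status == 0

if run_conversion('/home/ubuntu/input', '/home/ubuntu/output'):
    print('conversion succeeded; upload the .nc output to Clowder')
else:
    print('conversion failed; leave the dataset untouched so it can be rerun')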
@Zodiase @dlebauer The reason for this is that the extractors currently upload to Clowder using the API key (not a user/password), which is associated with an Anonymous User. @robkooper mentioned that Luigi is working on user-specific keys so we can have the extractor upload files as, for instance, the Maricopa Site user (like the raw data), and that user would be able to delete and rerun if necessary. |
The steps for converting the sensor binary output to a netCDF data product are documented (https://github.com/terraref/documentation/blob/master/hyperspectral_data_pipeline.md).
Next steps:
git clone git@github.com:/nco/nco.git
cd nco
configure --prefix=/usr/local
make install