Proposed format for hyperspectral and other imaging data #14


Closed
dlebauer opened this issue Oct 7, 2015 · 27 comments

@dlebauer

dlebauer commented Oct 7, 2015

This is a proposal for spectral and imaging data to be provided as HDF5 / netCDF-4 data cubes for computing and downloading by end users.

Following CF naming conventions [1], these would be in a netCDF-4 compatible, well-behaved HDF5 format. Also see [2] for example formats from NOAA.

Questions to address:

  • what is the scope of data products that can be produced in this format?
  • what metadata are required?
  • what tools are available for converting to and from this format?
  • what other options exist, and what are their advantages / disadvantages?

see also PecanProject/pecan#665

Radiance data

Variables

| variable name | units | dim 1 | dim 2 | dim 3 | dim 4 | dim 5 |
| --- | --- | --- | --- | --- | --- | --- |
| surface_bidirectional_reflectance | 0-1 | lat | lon | time | radiation_wavelength | |
| bandwidth | 0-1 | lat | lon | time | radiation_wavelength | |
| upwelling_spectral_radiance_in_air | W m-2 m-1 sr-1 | lat | lon | time | radiation_wavelength | zenith_angle |

note: upwelling_spectral_radiance_in_air may only be an intermediate product (and perhaps isn't exported from some sensors?) so the focus is really on the reflectance as a Level 2 product.

Dimensions

| dimension | units | notes |
| --- | --- | --- |
| time | hours since 2016-01-01 | first dimension |
| latitude | degrees_north | (or alt. projection_y_coordinate) |
| longitude | degrees_east | (or alt. projection_x_coordinate below) |
| projection_x_coordinate | m | can be mapped to lat/lon with grid_mapping attribute |
| projection_y_coordinate | m | can be mapped to lat/lon with grid_mapping attribute |
| radiation_wavelength | m | |
| zenith_angle | degrees | |
| sensor_zenith_angle | degrees | optional |
| platform_zenith_angle | degrees | optional |
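To make the proposal concrete, the header of such a file might look like the following CDL sketch. The dimension sizes and the exact set of coordinate variables shown here are illustrative only, not part of the proposal:

```
netcdf hyperspectral_example {
dimensions:
    time = UNLIMITED ;            // first dimension, per the table above
    latitude = 1024 ;
    longitude = 1024 ;
    radiation_wavelength = 272 ;
variables:
    double time(time) ;
        time:units = "hours since 2016-01-01" ;
    float latitude(latitude) ;
        latitude:units = "degrees_north" ;
    float longitude(longitude) ;
        longitude:units = "degrees_east" ;
    float radiation_wavelength(radiation_wavelength) ;
        radiation_wavelength:units = "m" ;
    float surface_bidirectional_reflectance(time, latitude, longitude,
                                            radiation_wavelength) ;
        surface_bidirectional_reflectance:standard_name = "surface_bidirectional_reflectance" ;
        surface_bidirectional_reflectance:valid_min = 0.f ;
        surface_bidirectional_reflectance:valid_max = 1.f ;
}
```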

[1] http://cfconventions.org/Data/cf-standard-names/29/build/cf-standard-name-table.html
[2] http://www.nodc.noaa.gov/data/formats/netcdf/v1.1/

@dlebauer dlebauer added this to the Data Standards Version 0.1 milestone Oct 7, 2015
@czender

czender commented Oct 7, 2015

The CF metadata look good. In the absence of strong reasons to the contrary, I recommend the dimension order time, lat, lon, wavelength, angle. Keep time as the first dimension if possible.

@dlebauer dlebauer changed the title Proposed format for hyperspectral data Proposed format for hyperspectral and other imaging data Oct 20, 2015
@tedhabermann

Great discussion of decisions about formats and metadata. While several aspects of the CF conventions are really useful and helpful, there are some fairly important caveats. These include lack of support for groups in the files (typically a requirement these days) and a sometimes difficult process of agreeing on names. These are related to the existing CF community tools and to the focus of the CF community on climate and forecast names.

You might take a look at the CSDMS standard names as an example of an approach that is different than CF (http://csdms.colorado.edu/wiki/CSDMS_Standard_Names#.C2.A0_CSDMS_Standard_Names). It has a standard set of rules for creating and interpreting names. This is more flexible than the community approval mechanism used by CF.

I would suggest that you want a metadata model that supports names from multiple communities rather than just one (ISO does that). It also has the advantage that we have a (proposed) standard approach for adding ISO-compliant metadata to HDF files. That can help ameliorate differences between collection and granule metadata.

@czender

czender commented Nov 16, 2015

Just noticed that the draft units have slight typos: "degrees north" should be "degrees_north" etc. for UDUnits compatibility.

@ghost ghost removed this from the Data Standards Version 0.1 milestone Nov 16, 2015
@dlebauer

@tedhabermann thank you for pointing out CSDMS - is this what you are referring to when you say "we have a (proposed) standard approach for adding ISO-compliant metadata to HDF files"?

Given the diversity of disciplines, it is clear that we will have to support multiple naming conventions. Any advice on how to support multiple vocabularies would be appreciated - I proposed amending our database with a simple 'thesaurus' lookup table (#18 (comment)), and I would appreciate feedback on whether that solution is sufficient and robust, and on whether an existing framework for supporting multiple vocabularies already exists.
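As a sketch of what that thesaurus lookup could look like, here is the many-to-many shape in SQL via Python's sqlite3. The schema, table, and column names below are hypothetical illustrations, not BETYdb's actual schema:

```python
import sqlite3

# Hypothetical sketch of the proposed 'thesaurus' lookup: a many-to-many
# mapping between project variables and terms in external vocabularies.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE variables (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE vocabulary_terms (
        id INTEGER PRIMARY KEY,
        vocabulary TEXT,   -- e.g. 'CF', 'CSDMS', 'ICASA'
        term TEXT);
    CREATE TABLE thesaurus (  -- the many-to-many join table
        variable_id INTEGER REFERENCES variables(id),
        term_id INTEGER REFERENCES vocabulary_terms(id));
""")
conn.execute("INSERT INTO variables VALUES (1, 'reflectance')")
conn.executemany("INSERT INTO vocabulary_terms VALUES (?, ?, ?)", [
    (1, "CF", "surface_bidirectional_reflectance"),
    (2, "OtherVocab", "example_reflectance_term"),  # placeholder term
])
conn.executemany("INSERT INTO thesaurus VALUES (?, ?)", [(1, 1), (1, 2)])

# look up every vocabulary term mapped to one local variable name
rows = conn.execute("""
    SELECT t.vocabulary, t.term
    FROM variables v
    JOIN thesaurus th ON th.variable_id = v.id
    JOIN vocabulary_terms t ON t.id = th.term_id
    WHERE v.name = 'reflectance'
""").fetchall()
```

A real implementation would live inside the existing database with proper constraints; this only illustrates that one variable can map to many terms and vice versa.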

To be clear, proposing new variables for CF is not nearly as much of a priority as coming up with a solution for our project that is clearly defined and can be translated later if necessary. To this end, I have liberally begun making up names following the CF guidelines.

I'll take a look at your webinar "Metadata Recommendations, Dialects, Evaluation & Improvement" from last month, and hopefully find some answers as well as a better understanding of these issues.

@dlebauer

@czender fixed - thanks

@tedhabermann

@dlebauer - using standard names is separate from including ISO metadata in the HDF files. See slides on ISO in HDF (http://www.slideshare.net/tedhabermann/granules-and-iso-metadata).

@dlebauer

dlebauer commented Dec 9, 2015

per discussions with @serbinsh and @ashiklom, we will need to provide (at a minimum) either the sensitivity or the full width at half maximum for each band as metadata for downstream analyses, including binning and assimilation.

@ashiklom

Following up on @dlebauer -- I've already written a few functions for spectral convolution for a bunch of common remote sensing platforms like Landsat and MODIS in the PEcAn RTM package Shawn, Toni, and I are developing. From that link, if you go to data/sensor.rsr.RData, you can see how I've been storing this information. The code for generating those data is in R/generate-rsr.R. The way I've put things together by no means has to be canon for this project, but I think it might be a good place to start.

@czender

czender commented Jan 26, 2016

I looked at the CSDMS conventions for names. I do not see any compelling reason to propose a bunch of names to either them or CF right now; it would slow down the prototyping. Once the workflow is established we can think about sustainable naming strategies. For now, choosing CF'ish names seems good enough.

@dlebauer

(from #2)

> My inclination is to parse all this JSON metadata into an attribute tree in the netCDF4/HDF file.
> The file level-0 (root) group would contain a level-1 group called "lemnatec_measurement_metadata", which would contain six level-2 groups "user_given_data"..."measurement_additional_data", and each of those groups would contain group attributes for the fields listed above. We will use the appropriate atomic data type for each of the values encountered, e.g., String for most text, float for 32-bit reals, unsigned byte for boolean,... Some of the "gantry variable data" (like x,y,z location) will need to be variables, not (or as well as) attributes, so that their time-varying values can be easily manipulated by data processing tools. They may become record variables with time as the unlimited dimension.

I think you have the right idea of parsing this to attributes, but I will note that the .json files are not designed to meet a standard metadata convention. But presumably a CF-compliant file will? Ultimately, we will want them to be compliant or interoperable with an FGDC-endorsed ISO standard (https://www.fgdc.gov/metadata/geospatial-metadata-standards). Does that sound reasonable?

Regarding gantry variable data like x,y,z location and time, I think it would be useful to store this as a metadata attribute in addition to either dimensions or variables. When you say 'variables', do you mean to store the single value of x,y,z in the metadata as a set of variables? Ultimately these will be used to define the coordinates of each pixel. This is something that I don't understand well and don't know if there is an easy answer. As I understand it, we could transform the images to a flat xy plane that would allow gridded dimensions, but if we map to xyz then they would be treated as variables. I'd appreciate your thoughts on this, and if you want to chat offline let me know.
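The group layout described above could be sketched in CDL roughly as follows. Only two of the six level-2 groups are shown, and the attribute, group, and variable names here are placeholders, not actual LemnaTec keys:

```
netcdf gantry_metadata_example {
group: lemnatec_measurement_metadata {
  group: user_given_data {
    // JSON fields stored as typed group attributes (placeholder name)
    :experiment_label = "example" ;
  } // group user_given_data

  group: gantry_variable_data {
    dimensions:
      time = UNLIMITED ;
    variables:
      // time-varying gantry position stored as record variables,
      // in addition to (or instead of) static attributes
      float position_x(time) ;
        position_x:units = "m" ;
      float position_y(time) ;
        position_y:units = "m" ;
      float position_z(time) ;
        position_z:units = "m" ;
  } // group gantry_variable_data

  // ... remaining level-2 groups through "measurement_additional_data"
}
}
```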

@tedhabermann

Charlie uses the phrase "propose a bunch of names" in the context of CSDMS and CF, but that is not correct. The approaches to naming in CSDMS and CF are fundamentally different. You propose names to CF and there is typically a long and arduous review process, particularly for names that are outside the context of "Climate and Forecast". In CSDMS, you create names that are consistent with a set of rules. There is no review process and no associated delay. That is why I suggest you think about using the CSDMS approach as the base approach. It gives you control over the names instead of a committee of climate/forecast people.

@tedhabermann

On JSON vs. XML - Both of these representations are important in different contexts. XML is much more prevalent in the metadata world than JSON. If you think only in JSON then you are losing a lot of standard capability.

At HDF we are working with both. What is really critical is the naming of the elements. For sensor metadata, an approach based on soft-typing is usually more flexible. Instead of standardizing a set of parameter names, standardize a method of describing parameters, then use that everywhere. This approach has been used quite a bit in SensorML with a fair amount of success. Hard-typed names are almost always a problem in the long run.

The tools I am working on take ISO compliant XML and transform it into NcML which can be imported into an HDF file. This works with all (AFAIK) ISO metadata standards. Might even work with SensorML (have not looked at that). It gives you a standard set of names and paths for ISO content in your files. That is the important piece here - tools need to know standard paths for any metadata content.

@czender

czender commented Jan 26, 2016

The CSDMS approach looks fine to me, especially for quantities that are not likely to be covered by CF. And much of the data generated by this project is not typically thought of as covered by CF, though CF is trying to expand its domain. In any case, the "standard" names we are talking about are much too long to be useful to humans. They are generally stored as a "standard_name" or similar attribute of the field they describe. And the field has a much shorter and easier primary name, e.g., T for temperature. It is too early to worry about what the longer standard names will be until we have a workflow and have looked more carefully at what others call similar quantities.
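In CDL, the pattern Charlie describes (a short primary variable name, with the long standard name stored as an attribute) looks like this; the variable shown is only an illustration:

```
variables:
    float T(time, latitude, longitude) ;
        T:standard_name = "air_temperature" ;  // long standard name as attribute
        T:long_name = "air temperature" ;
        T:units = "K" ;
```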

@tedhabermann

Soapbox warning! I appreciate Charlie's point of view on this, but, at the same time, wonder about how many decisions have been made with the logic "it will slow us down" or "let's do this now, then fix it later". I suspect that there have been many such decisions and that many of them have never been fixed as we move on to the next short-term event. These decisions ultimately lead to inoperable datasets that users have to deal with. If we are interested in evolving the culture, we need to do it at the beginning. Then the team gets used to meaningful parameter names instead of abbreviations and we move on from there.

@dlebauer

@czender @tedhabermann

The soapbox here is always open! I really appreciate this discussion and agree with both of you.

Our first task is to put it somewhere safe and make it usable. Our fairly rigid timeline for data product development still gives time for iterative development: alpha release in 2016, beta in 2017, stable in 2018. Our goal is to allow for multiple rounds of feedback. After each release there will be hands-on workshops aimed at getting feedback on what's useful and what's not, and what we should change or create moving forward.

> Instead of standardizing a set of parameter names standardize a method of describing parameters then use that everywhere.

While I think CF may cover most of the variables in the sensor data products, I have followed the Guidelines for Construction of CF Standard Names to develop (i.e., make up, without intention of requesting approval) a list of CF-style names for variables related to ecosystem and plant physiology that aren't covered. Here is the list of proposed names. Changes / comments welcome. It's not perfect, but hopefully it will give us what we need for now.

We will certainly need to support many vocabularies using a thesaurus (a many-to-many lookup joined with the primary variables table in betydb.org). CSDMS sounds like a good candidate. For data interoperability we will also support ICASA from the USDA and AgMIP (the agricultural modeling community), but that doesn't cover sensors.

@rjstrand

rjstrand commented Feb 2, 2016

Greetings from the Gantry...

Attached is a first crack by Markus Radermacher at the metadata framework for imagery.

  • Bob

2016 02Feb 02 Metadata Example.json.txt

@ashiklom

ashiklom commented Feb 3, 2016

How do we deal with the relative spectral response (RSR) curves of different sensors? Embedding that information directly in the JSON might be cumbersome, since the data are fairly large matrices. A good approach might be to store the RSR for each sensor in a file that's linked or otherwise specified in the JSON.

@dlebauer

dlebauer commented Feb 3, 2016

> A good approach might be to store the RSR for each sensor in a file that's linked or otherwise specified in the JSON.

This is what @robkooper originally suggested: making the calibration docs accessible via a web-based API. However, @czender suggested including it in each file to make the file completely self-documenting, with the rationale that while this is a lot of information, it is much smaller than the data. This also seems sensible.
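If the RSR is embedded in each file, one possible shape for it is a pair of 2-D variables with one row per band; the names and dimension sizes below are hypothetical:

```
dimensions:
    band = 272 ;
    rsr_sample = 21 ;
variables:
    float rsr_wavelength(band, rsr_sample) ;
        rsr_wavelength:units = "m" ;
    float rsr(band, rsr_sample) ;
        rsr:long_name = "relative spectral response, normalized to peak 1" ;
```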

@rjstrand do you have the spectral response and other calibration data for all of the sensors?

@czender

czender commented Feb 3, 2016

Is the RSR expected to be constant in time, i.e., the same RSR per sensor forever? And how many numbers characterize the RSR at each wavelength?

@ashiklom

ashiklom commented Feb 4, 2016

Ideally, yeah, it would be constant, but sensors could drift with age, which might be nice to capture. @serbinsh might have more insight about that.

As to how many numbers, that depends on the precision of calibration. Imaging spectrometers like AVIRIS sometimes only report the full width at half maximum, which is a single number per wavelength. But then you have to assume some kind of distribution, which won't be quite as precise. I think a decent rule of thumb might be about twice the reported bandwidth: so, for instance, 20 values per band for a 10 nm instrument like AVIRIS (though each value is actually a wavelength-response pair, so two numbers).
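Under the Gaussian assumption described above, converting a reported FWHM into a sampled RSR could be sketched as follows. The function name and sampling choices are illustrative, not from any package mentioned in this thread:

```python
import numpy as np

def gaussian_rsr(center_nm, fwhm_nm, n_samples=21):
    """Approximate a band's relative spectral response (RSR) as a Gaussian
    when only the full width at half maximum (FWHM) is reported.

    Returns wavelengths (nm) and the response normalized to peak 1,
    sampled over +/- one FWHM around the band center.
    """
    # convert FWHM to the Gaussian standard deviation
    sigma = fwhm_nm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    wl = np.linspace(center_nm - fwhm_nm, center_nm + fwhm_nm, n_samples)
    rsr = np.exp(-0.5 * ((wl - center_nm) / sigma) ** 2)
    return wl, rsr

# e.g. a 10 nm band centered at 550 nm -> 21 wavelength/response pairs
wl, rsr = gaussian_rsr(550.0, 10.0)
```

By construction the response equals 0.5 at center +/- FWHM/2, which is what makes the FWHM-only specification recoverable under this assumption.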


@dlebauer

dlebauer commented Feb 4, 2016

Here are the spec sheets for the spectrometers for reference:

SWIR.pdf (https://github.com/terraref/reference-data/files/116816/SWIR.pdf)
VNIR 2015.pdf (https://github.com/terraref/reference-data/files/116817/VNIR.2015.pdf)

And the sensor suite (also adding a sonic anemometer):

image (https://cloud.githubusercontent.com/assets/464871/12804349/1ee19a8c-cab8-11e5-9d6e-b6607f60a94f.png)
image (https://cloud.githubusercontent.com/assets/464871/12804350/1effa7ca-cab8-11e5-95fc-666705047964.png)
image

@ashiklom

ashiklom commented Feb 4, 2016

Is that all the information they provide? If so, it may be worth contacting the manufacturer for more precise specs. They don't report spectral resolution (i.e., wavelength nm per band) for the hyperspectral imagers, but that's a critical piece of information. It may be reported as 1 nm, but that's probably after interpolation, since even the $100K ASD instruments @serbinsh has have a true resolution of somewhere between 1 and 10 nm depending on the wavelength, which unfortunately gets interpolated to 1 nm by their internal spectrometer firmware. There's a good chance we'll be stuck just assuming it's 1 nm, which is OK since that's as good as our RTM models can predict, but I think it's at least worth looking into, since our Bayesian approaches can leverage those observation uncertainties.


@solmazhajmohammadi

Here are datasheets for the Hyperspec imaging sensors:

Headwall VNIR imaging spectrometer 380-1000nm.pdf
Headwal SWIR imaging spectrometer 900-2500nm.pdf

@dlebauer

dlebauer commented Feb 4, 2016

@solmazhajmohammadi those are the same files attached to my message above (in addition to the images of the full sensor suite).

But do you know of, or can you get, the files from the sensor calibration done by the manufacturer (I think this was done in December), and from any subsequent testing?

@solmazhajmohammadi

@dlebauer Not so far. I'll update you once I get further information.

@rjstrand

rjstrand commented Feb 4, 2016

@ALL The calibration information for the hyperspec devices was never provided in a digital format. We are in contact with Headwall. Please be patient.

@dlebauer

dlebauer commented Apr 3, 2016
