Proposed format for hyperspectral and other imaging data #14


Closed
dlebauer opened this issue Oct 7, 2015 · 27 comments

@dlebauer

dlebauer commented Oct 7, 2015

This is a proposal for spectral and imaging data to be provided as HDF5 / netCDF-4 data cubes for computing and downloading by end users.

Following CF naming conventions [1], these would be in a netCDF-4 compatible, well-behaved HDF5 format. Also see [2] for example formats from NOAA.

Questions to address:

  • what is the scope of data products that can be produced in this format?
  • what metadata are required?
  • what tools are available for converting to and from this format?
  • what other options exist, and what are their advantages / disadvantages?

see also PecanProject/pecan#665

Radiance data

Variables

| variable name | units | dim 1 | dim 2 | dim 3 | dim 4 | dim 5 |
| --- | --- | --- | --- | --- | --- | --- |
| surface_bidirectional_reflectance | 0-1 | lat | lon | time | radiation_wavelength | |
| bandwidth | 0-1 | lat | lon | time | radiation_wavelength | |
| upwelling_spectral_radiance_in_air | W m-2 m-1 sr-1 | lat | lon | time | radiation_wavelength | zenith_angle |

note: upwelling_spectral_radiance_in_air may only be an intermediate product (and perhaps isn't exported from some sensors?) so the focus is really on the reflectance as a Level 2 product.

Dimensions

| dimension | units | notes |
| --- | --- | --- |
| time | hours since 2016-01-01 | first dimension |
| latitude | degrees_north | (or alt. projection_y_coordinate) |
| longitude | degrees_east | (or alt. projection_x_coordinate below) |
| projection_x_coordinate | m | can be mapped to lat/lon with grid_mapping attribute |
| projection_y_coordinate | m | can be mapped to lat/lon with grid_mapping attribute |
| radiation_wavelength | m | |
| zenith_angle | degrees | |
| sensor_zenith_angle | degrees | optional |
| platform_zenith_angle | degrees | optional |
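To make the proposal concrete, the header of such a file might look like the following CDL sketch. The dimension sizes and the exact set of coordinate variables shown here are illustrative only, not part of the proposal:

```
netcdf hyperspectral_example {
dimensions:
    time = UNLIMITED ;            // first dimension, per the table above
    latitude = 1024 ;
    longitude = 1024 ;
    radiation_wavelength = 272 ;
variables:
    double time(time) ;
        time:units = "hours since 2016-01-01" ;
    float latitude(latitude) ;
        latitude:units = "degrees_north" ;
    float longitude(longitude) ;
        longitude:units = "degrees_east" ;
    float radiation_wavelength(radiation_wavelength) ;
        radiation_wavelength:units = "m" ;
    float surface_bidirectional_reflectance(time, latitude, longitude,
                                            radiation_wavelength) ;
        surface_bidirectional_reflectance:standard_name = "surface_bidirectional_reflectance" ;
        surface_bidirectional_reflectance:valid_min = 0.f ;
        surface_bidirectional_reflectance:valid_max = 1.f ;
}
```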

[1] http://cfconventions.org/Data/cf-standard-names/29/build/cf-standard-name-table.html
[2] http://www.nodc.noaa.gov/data/formats/netcdf/v1.1/

@dlebauer dlebauer added this to the Data Standards Version 0.1 milestone Oct 7, 2015
@czender

czender commented Oct 7, 2015

The CF metadata look good. In the absence of strong reasons to the contrary, I recommend the dimension order time, lat, lon, wavelength, angle. Keep time as the first dimension if possible.

@dlebauer dlebauer changed the title Proposed format for hyperspectral data Proposed format for hyperspectral and other imaging data Oct 20, 2015
@tedhabermann

Great discussion of decisions about formats and metadata. While several aspects of the CF conventions are really useful and helpful, there are some fairly important caveats. These include lack of support for groups in the files (typically a requirement these days) and a sometimes difficult process of agreeing on names. These are related to the existing CF community tools and to the focus of the CF community on climate and forecast names.

You might take a look at the CSDMS standard names as an example of an approach that is different than CF (http://csdms.colorado.edu/wiki/CSDMS_Standard_Names#.C2.A0_CSDMS_Standard_Names). It has a standard set of rules for creating and interpreting names. This is more flexible than the community approval mechanism used by CF.

I would suggest that you want a metadata model that supports names from multiple communities rather than just one (ISO does that). It also has the advantage that we have a (proposed) standard approach for adding ISO-compliant metadata to HDF files. That can help ameliorate differences between collection and granule metadata.

@czender

czender commented Nov 16, 2015

Just noticed that the draft units have slight typos: "degrees north" should be "degrees_north" etc. for UDUnits compatibility.

@ghost ghost removed this from the Data Standards Version 0.1 milestone Nov 16, 2015
@dlebauer

@tedhabermann thank you for pointing out CSDMS - is this what you are referring to when you say "we have a (proposed) standard approach for adding ISO-compliant metadata to HDF files"?

Given the diversity of disciplines, it is clear that we will have to support multiple naming conventions. Any advice on how to support multiple vocabularies would be appreciated - I proposed amending our database with a simple 'thesaurus' lookup table (#18 (comment)), and I would appreciate feedback on whether that solution is sufficient and robust, and on whether an existing framework for supporting multiple vocabularies already exists.
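As a sketch of what that thesaurus lookup could look like, here is the many-to-many shape in SQL via Python's sqlite3. The schema, table, and column names below are hypothetical illustrations, not BETYdb's actual schema:

```python
import sqlite3

# Hypothetical sketch of the proposed 'thesaurus' lookup: a many-to-many
# mapping between project variables and terms in external vocabularies.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE variables (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE vocabulary_terms (
        id INTEGER PRIMARY KEY,
        vocabulary TEXT,   -- e.g. 'CF', 'CSDMS', 'ICASA'
        term TEXT);
    CREATE TABLE thesaurus (  -- the many-to-many join table
        variable_id INTEGER REFERENCES variables(id),
        term_id INTEGER REFERENCES vocabulary_terms(id));
""")
conn.execute("INSERT INTO variables VALUES (1, 'reflectance')")
conn.executemany("INSERT INTO vocabulary_terms VALUES (?, ?, ?)", [
    (1, "CF", "surface_bidirectional_reflectance"),
    (2, "OtherVocab", "example_reflectance_term"),  # placeholder term
])
conn.executemany("INSERT INTO thesaurus VALUES (?, ?)", [(1, 1), (1, 2)])

# look up every vocabulary term mapped to one local variable name
rows = conn.execute("""
    SELECT t.vocabulary, t.term
    FROM variables v
    JOIN thesaurus th ON th.variable_id = v.id
    JOIN vocabulary_terms t ON t.id = th.term_id
    WHERE v.name = 'reflectance'
""").fetchall()
```

A real implementation would live inside the existing database with proper constraints; this only illustrates that one variable can map to many terms and vice versa.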

To be clear, proposing new variables for CF is not nearly as much of a priority as coming up with a solution for our project that is clearly defined and can be translated later if necessary. To this end, I have liberally begun making up names following the CF guidelines.

I'll take a look at your webinar "Metadata Recommendations, Dialects, Evaluation & Improvement" from last month, and hopefully find some answers as well as a better understanding of these issues.

@dlebauer

@czender fixed - thanks

@tedhabermann

@dlebauer - using standard names is separate from including ISO metadata in the HDF files. See slides on ISO in HDF (http://www.slideshare.net/tedhabermann/granules-and-iso-metadata).

@dlebauer

dlebauer commented Dec 9, 2015

per discussions with @serbinsh and @ashiklom, we will need to provide (at a minimum) either the sensitivity or the full width at half maximum for each band as metadata for downstream analyses, including binning and assimilation.

@ashiklom

Following up on @dlebauer -- I've already written a few functions for spectral convolution for a bunch of common remote sensing platforms like Landsat and MODIS in the PEcAn RTM package Shawn, Toni, and I are developing. From that link, if you go to data/sensor.rsr.RData, you can see how I've been storing this information. The code for generating those data is in R/generate-rsr.R. The way I've put things together by no means has to be canon for this project, but I think it might be a good place to start.

@czender

czender commented Jan 26, 2016

I looked at the CSDMS conventions for names. I do not see any compelling reason to propose a bunch of names to either them or CF right now; it would slow down the prototyping. Once the workflow is established we can think about sustainable naming strategies. For now, choosing CF'ish names seems good enough.

@dlebauer

(from #2)

> My inclination is to parse all this JSON metadata into an attribute tree in the netCDF4/HDF file.
> The file level-0 (root) group would contain a level-1 group called "lemnatec_measurement_metadata", which would contain six level-2 groups "user_given_data"..."measurement_additional_data", and each of those groups would contain group attributes for the fields listed above. We will use the appropriate atomic data type for each of the values encountered, e.g., String for most text, float for 32-bit reals, unsigned byte for boolean,... Some of the "gantry variable data" (like x,y,z location) will need to be variables, not (or as well as) attributes, so that their time-varying values can be easily manipulated by data processing tools. They may become record variables with time as the unlimited dimension.

I think you have the right idea of parsing this to attributes, but I will note that the .json files are not designed to meet a standard metadata convention. But presumably a CF-compliant file will? Ultimately, we will want them to be compliant or interoperable with an FGDC-endorsed ISO standard (https://www.fgdc.gov/metadata/geospatial-metadata-standards). Does that sound reasonable?

Regarding gantry variable data like x,y,z location and time, I think it would be useful to store this as a metadata attribute in addition to either dimensions or variables. When you say 'variables', do you mean to store the single value of x,y,z in the metadata as a set of variables? Ultimately these will be used to define the coordinates of each pixel. This is something that I don't understand well and don't know if there is an easy answer. As I understand it, we could transform the images to a flat xy plane that would allow gridded dimensions, but if we map to xyz then they would be treated as variables. I'd appreciate your thoughts on this, and if you want to chat offline let me know.
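The group layout described above could be sketched in CDL roughly as follows. Only two of the six level-2 groups are shown, and the attribute, group, and variable names here are placeholders, not actual LemnaTec keys:

```
netcdf gantry_metadata_example {
group: lemnatec_measurement_metadata {
  group: user_given_data {
    // JSON fields stored as typed group attributes (placeholder name)
    :experiment_label = "example" ;
  } // group user_given_data

  group: gantry_variable_data {
    dimensions:
      time = UNLIMITED ;
    variables:
      // time-varying gantry position stored as record variables,
      // in addition to (or instead of) static attributes
      float position_x(time) ;
        position_x:units = "m" ;
      float position_y(time) ;
        position_y:units = "m" ;
      float position_z(time) ;
        position_z:units = "m" ;
  } // group gantry_variable_data

  // ... remaining level-2 groups through "measurement_additional_data"
}
}
```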

@tedhabermann

Charlie uses the phrase "propose a bunch of names" in the context of CSDMS and CF, but that is not correct. The approaches to naming in CSDMS and CF are fundamentally different. You propose names to CF and there is typically a long and arduous review process, particularly for names that are outside the context of "Climate and Forecast". In CSDMS, you create names that are consistent with a set of rules. There is no review process and no associated delay. That is why I suggest you think about using the CSDMS approach as the base approach. It gives you control over the names instead of a committee of climate/forecast people.

@tedhabermann

On JSON vs. XML - Both of these representations are important in different contexts. XML is much more prevalent in the metadata world than JSON. If you think only in JSON then you are losing a lot of standard capability.

At HDF we are working with both. What is really critical is the naming of the elements. For sensor metadata, an approach based on soft-typing is usually more flexible. Instead of standardizing a set of parameter names, standardize a method of describing parameters, then use that everywhere. This approach has been used quite a bit in SensorML with a fair amount of success. Hard-typed names are almost always a problem in the long run.

The tools I am working on take ISO compliant XML and transform it into NcML which can be imported into an HDF file. This works with all (AFAIK) ISO metadata standards. Might even work with SensorML (have not looked at that). It gives you a standard set of names and paths for ISO content in your files. That is the important piece here - tools need to know standard paths for any metadata content.

@czender

czender commented Jan 26, 2016

The CSDMS approach looks fine to me, especially for quantities that are not likely to be covered by CF. And much of the data generated by this project is not typically thought of as covered by CF, though CF is trying to expand its domain. In any case, the "standard" names we are talking about are much too long to be useful to humans. They are generally stored as a "standard_name" or similar attribute of the field they describe. And the field has a much shorter and easier primary name, e.g., T for temperature. It is too early to worry about what the longer standard names will be until we have a workflow and have looked more carefully at what others call similar quantities.
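In CDL, the pattern Charlie describes (a short primary variable name, with the long standard name stored as an attribute) looks like this; the variable shown is only an illustration:

```
variables:
    float T(time, latitude, longitude) ;
        T:standard_name = "air_temperature" ;  // long standard name as attribute
        T:long_name = "air temperature" ;
        T:units = "K" ;
```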

@tedhabermann

Soapbox warning! I appreciate Charlie's point of view on this, but, at the same time, wonder about how many decisions have been made with the logic "it will slow us down" or "let's do this now, then fix it later". I suspect that there have been many such decisions and that many of them have never been fixed as we move on to the next short-term event. These decisions ultimately lead to inoperable datasets that users have to deal with. If we are interested in evolving the culture, we need to do it at the beginning. Then the team gets used to meaningful parameter names instead of abbreviations and we move on from there.

@dlebauer

@czender @tedhabermann

The soapbox here is always open! I really appreciate this discussion and agree with both of you.

Our first task is to put it somewhere safe and make it usable. Our fairly rigid timeline for data product development still gives time for iterative development: alpha release in 2016, beta in 2017, stable in 2018. Our goal is to allow for multiple rounds of feedback. After each release there will be hands-on workshops aimed at getting feedback on what's useful and what's not, and what we should change or create moving forward.

> Instead of standardizing a set of parameter names standardize a method of describing parameters then use that everywhere.

While I think CF may cover most of the variables in the sensor data products, I have followed the Guidelines for Construction of CF Standard Names to develop (i.e., make up, without intention of requesting approval) a list of CF-style names for variables related to ecosystem and plant physiology that aren't covered. Here is the list of proposed names. Changes / comments welcome. It's not perfect, but hopefully it will give us what we need for now.

We will certainly need to support many vocabularies using a thesaurus (a many-to-many lookup joined with the primary variables table in betydb.org). CSDMS sounds like a good candidate. For data interoperability we will also support ICASA from the USDA and AgMIP (the agricultural modeling community), but that doesn't cover sensors.

@rjstrand

rjstrand commented Feb 2, 2016

Greetings from the Gantry...

Attached is a first crack by Markus Radermacher at the metadata framework for imagery.

  • Bob

2016 02Feb 02 Metadata Example.json.txt

@ashiklom

ashiklom commented Feb 3, 2016

How do we deal with the relative spectral response (RSR) curves of different sensors? Embedding that information directly in the JSON might be cumbersome, since the data are fairly large matrices. A good approach might be to store the RSR for each sensor in a file that's linked or otherwise specified in the JSON.

@dlebauer

dlebauer commented Feb 3, 2016

> A good approach might be to store the RSR for each sensor in a file that's linked or otherwise specified in the JSON.

This is what @robkooper originally suggested: making the calibration docs accessible via a web-based API. However, @czender suggested including it in each file to make the file completely self-documenting, with the rationale that while this is a lot of information, it is much smaller than the data. This also seems sensible.
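If the RSR is embedded in each file, one possible shape for it is a pair of 2-D variables with one row per band; the names and dimension sizes below are hypothetical:

```
dimensions:
    band = 272 ;
    rsr_sample = 21 ;
variables:
    float rsr_wavelength(band, rsr_sample) ;
        rsr_wavelength:units = "m" ;
    float rsr(band, rsr_sample) ;
        rsr:long_name = "relative spectral response, normalized to peak 1" ;
```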

@rjstrand do you have the spectral response and other calibration data for all of the sensors?

@czender

czender commented Feb 3, 2016

Is the RSR expected to be constant in time, i.e., the same RSR per sensor forever? And how many numbers characterize the RSR at each wavelength?

@ashiklom

ashiklom commented Feb 4, 2016

Ideally, yeah, it would be constant, but sensors could drift with age, which might be nice to capture. @serbinsh might have more insight about that.

As to how many numbers, that depends on the precision of calibration. Imaging spectrometers like AVIRIS sometimes only report the full width at half maximum, which is a single number per wavelength. But then you have to assume some kind of distribution, which won't be quite as precise. I think a decent rule of thumb might be about twice the reported bandwidth: so, for instance, 20 values per band for a 10 nm instrument like AVIRIS (though each value is actually a wavelength-response pair, so two numbers).
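Under the Gaussian assumption described above, converting a reported FWHM into a sampled RSR could be sketched as follows. The function name and sampling choices are illustrative, not from any package mentioned in this thread:

```python
import numpy as np

def gaussian_rsr(center_nm, fwhm_nm, n_samples=21):
    """Approximate a band's relative spectral response (RSR) as a Gaussian
    when only the full width at half maximum (FWHM) is reported.

    Returns wavelengths (nm) and the response normalized to peak 1,
    sampled over +/- one FWHM around the band center.
    """
    # convert FWHM to the Gaussian standard deviation
    sigma = fwhm_nm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    wl = np.linspace(center_nm - fwhm_nm, center_nm + fwhm_nm, n_samples)
    rsr = np.exp(-0.5 * ((wl - center_nm) / sigma) ** 2)
    return wl, rsr

# e.g. a 10 nm band centered at 550 nm -> 21 wavelength/response pairs
wl, rsr = gaussian_rsr(550.0, 10.0)
```

By construction the response equals 0.5 at center +/- FWHM/2, which is what makes the FWHM-only specification recoverable under this assumption.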


@dlebauer

dlebauer commented Feb 4, 2016

Here are the spec sheets for the spectrometers for reference:

SWIR.pdf (https://github.com/terraref/reference-data/files/116816/SWIR.pdf)
VNIR 2015.pdf (https://github.com/terraref/reference-data/files/116817/VNIR.2015.pdf)

And the sensor suite (also adding a sonic anemometer):

image (https://cloud.githubusercontent.com/assets/464871/12804349/1ee19a8c-cab8-11e5-9d6e-b6607f60a94f.png)
image (https://cloud.githubusercontent.com/assets/464871/12804350/1effa7ca-cab8-11e5-95fc-666705047964.png)
image

@ashiklom

ashiklom commented Feb 4, 2016

Is that all the information they provide? If so, it may be worth contacting the manufacturer for more precise specs. They don't report spectral resolution (i.e., wavelength nm per band) for the hyperspectral imagers, but that's a critical piece of information. It may be reported as 1 nm, but that's probably after interpolation, since even the $100K ASD instruments @serbinsh has have a true resolution of somewhere between 1 and 10 nm depending on the wavelength, which unfortunately gets interpolated to 1 nm by their internal spectrometer firmware. There's a good chance we'll be stuck just assuming it's 1 nm, which is OK since that's as good as our RTM models can predict, but I think it's at least worth looking into, since our Bayesian approaches can leverage those observation uncertainties.


@solmazhajmohammadi

Here are datasheets for the Hyperspec imaging sensors:

Headwall VNIR imaging spectrometer 380-1000nm.pdf
Headwal SWIR imaging spectrometer 900-2500nm.pdf

@dlebauer

dlebauer commented Feb 4, 2016

@solmazhajmohammadi those are the same files attached to my message above (in addition to the images of the full sensor suite).

But do you know of, or can you get, the files from the sensor calibration done by the manufacturer (I think this was done in December), and from any subsequent testing?

@solmazhajmohammadi

@dlebauer Not so far. I'll update you once I get further information.

@rjstrand

rjstrand commented Feb 4, 2016

@ALL The calibration information for the hyperspec devices was never provided in a digital format. We are in contact with Headwall. Please be patient.

@dlebauer

dlebauer commented Apr 3, 2016
