Standardize geotiff / image metadata to be consistent w/ netcdf CF approach #268

Open
3 of 4 tasks
dlebauer opened this issue Mar 2, 2017 · 9 comments
@dlebauer
Member

dlebauer commented Mar 2, 2017

GeoTIFF files should have useful metadata that is consistent with the CF approach used for met and hyperspectral data; they should also comply w/ existing OGC standards.

Completion Criteria

  • make sure all level 1 image data are in geoTIFF format
  • define standard format for geoTIFF / image files
  • added to extractor
  • old geotiff files updated
@dlebauer
Member Author

@yanliu-chn could you please work on defining the geotiff standard format?

@max-zilla
Contributor

The code used in terrautils will enforce a standard method for generating geotiffs:
#308

Will need help from others to enforce CF standards however.

@ghost ghost added the help wanted label May 17, 2017
@dlebauer
Member Author

Who takes the lead on this, and when can it be finished? (Please add a milestone for May or June or ...)

@max-zilla
Contributor

Based on other discussions I think it would make sense for @craig-willis to take the lead on this, but I will talk more about this/terrautils at the meeting today.

@ghost ghost assigned craig-willis and unassigned yanliu-chn May 18, 2017
@ghost ghost removed the help wanted label May 18, 2017
@ghost ghost added this to the June 2017 milestone Jun 1, 2017
@ghost ghost added the sensor/metadata label Jun 1, 2017
@craig-willis
Contributor

@dlebauer Is there anything specific you're looking for in terms of metadata? Looking at the EnvironmentLogger and hyperspectral nc files, aside from variables I see primarily sensor information.

@dlebauer
Member Author

See also the related issue for the point cloud data, #257. My comment there was "Goal is for (raster, point cloud) files to differ where it is useful, but have similar interfaces where applicable."

Here are some examples:

  • are files OGC compliant?
  • where applicable, and to the extent practical, is the following information available in a consistent format?
    • coordinate systems, variable names, units
    • time consistent across data products
    • provenance (which algorithms / versions were used; which sensor(s) generated the dataset)
    • qa/qc: assumptions / limitations / tests that have been performed
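
The checklist above could be turned into an automated check. The following is only a minimal sketch: the category names and every key name (`crs`, `time_utc`, `algorithm`, `qaqc_notes`, and so on) are hypothetical placeholders, not part of any agreed standard in this thread.

```python
# Hypothetical required keys, grouped by the categories in the checklist.
# None of these key names come from an agreed standard; they are placeholders.
REQUIRED_FIELDS = {
    "coordinates": ["crs", "variable_name", "units"],
    "time": ["time_utc"],
    "provenance": ["algorithm", "algorithm_version", "sensor"],
    "qaqc": ["qaqc_notes"],
}


def missing_fields(metadata):
    """Return {category: [keys]} for required keys that are absent or empty."""
    report = {}
    for category, keys in REQUIRED_FIELDS.items():
        missing = [key for key in keys if not metadata.get(key)]
        if missing:
            report[category] = missing
    return report


# Example metadata for a single file; values are illustrative only.
example = {
    "crs": "EPSG:4326",
    "variable_name": "temperature",
    "units": "C",
    "time_utc": "2017-08-23T17:04:58+00:00",
    "sensor": "FLIR",
}

print(missing_fields(example))
```

Running the same check over raster and point cloud metadata would be one way to see where the two products already share an interface and where they differ.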

@craig-willis
Contributor

Thanks, @dlebauer.

  • Looking at the various OGC tests, I don't see one specifically for GeoTIFF. I've created an account on TeamEngine, which appears to focus on OGC web services. There is a test for GeoPackage, and it is possible to translate GeoTIFF to GeoPackage via gdal_translate, but I think we'd be testing GDAL's compliance, not ours. Aside from simply using GDAL 2.x, is there a specific OGC compliance test that I should be looking at?
  • The GeoTIFF contains the coordinate system information (see below)
  • The time, provenance, and qa/qc information is currently not included and I'm looking into this now.

I've been looking at the CF conventions for time (and I believe you had feedback on the time_utc variable we're currently using). CF conventions define a "time coordinate" (http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/cf-conventions.html#time-coordinate), but not a timestamp in the way we've defined. Is it sufficient to use the UTC timestamp with offset ISO-8601 subset? Is the field name "time_utc" problematic?

$ gdalinfo file.tif:
...
Coordinate System is:
GEOGCS["WGS 84",
    DATUM["WGS_1984",
        SPHEROID["WGS 84",6378137,298.257223563,
            AUTHORITY["EPSG","7030"]],
        AUTHORITY["EPSG","6326"]],
    PRIMEM["Greenwich",0],
    UNIT["degree",0.0174532925199433],
    AUTHORITY["EPSG","4326"]]
...
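
As a rough illustration of checking the coordinate system programmatically, the sketch below pulls the top-level EPSG code out of the WKT shown above using only a regular expression. It assumes the WKT1 layout in which a node's AUTHORITY is its last child; the function name `root_epsg_code` is made up for this example, and a real check would more likely rely on GDAL's own spatial reference handling.

```python
import re

# WKT copied from the gdalinfo output above.
WKT = '''GEOGCS["WGS 84",
    DATUM["WGS_1984",
        SPHEROID["WGS 84",6378137,298.257223563,
            AUTHORITY["EPSG","7030"]],
        AUTHORITY["EPSG","6326"]],
    PRIMEM["Greenwich",0],
    UNIT["degree",0.0174532925199433],
    AUTHORITY["EPSG","4326"]]'''


def root_epsg_code(wkt):
    """Return the EPSG code of the outermost WKT node, or None.

    Assumes WKT1 layout, where a node's AUTHORITY is its last child, so an
    AUTHORITY entry that closes the whole string belongs to the root node.
    """
    codes = re.findall(r'AUTHORITY\["EPSG","(\d+)"\]', wkt)
    # Trust the last match only if the string ends with it plus the root's "]".
    if codes and re.search(r'AUTHORITY\["EPSG","\d+"\]\]\s*$', wkt):
        return int(codes[-1])
    return None


print(root_epsg_code(WKT))
```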

@dlebauer
Member Author

dlebauer commented Aug 23, 2017

time_utc

I've always found the CF convention of time (units of <interval> since <reference date>) to be cumbersome, so I have no issue with using a timestamp. The only issue I see with time_utc is that it is more difficult for users to interpret than the local time, which can be represented in ISO-8601 format as YYYY-MM-DDTHH:MM-HH:MM, like 2007-04-05T12:30-02:00. My understanding is that this is how we are storing data in the start_time and end_time fields in geostreams.
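
To make the three representations concrete, here is a small sketch using only the Python standard library. The reference date of 1970-01-01 and the helper names `cf_days_to_datetime` and `to_local_iso` are assumptions for illustration; an actual CF file declares its own reference date in the `units` attribute.

```python
from datetime import datetime, timedelta, timezone

# CF-style time is a number plus units of "<interval> since <reference date>".
# The reference date below is an assumption made for this example.
CF_REFERENCE = datetime(1970, 1, 1, tzinfo=timezone.utc)


def cf_days_to_datetime(days):
    """Convert 'days since 1970-01-01 00:00 UTC' to an aware UTC datetime."""
    return CF_REFERENCE + timedelta(days=days)


def to_local_iso(dt, utc_offset_hours):
    """Format an aware datetime as ISO-8601 local time with a UTC offset."""
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    return dt.astimezone(local_zone).isoformat(timespec="minutes")


# The local timestamp from the comment above, parsed with its offset.
local_stamp = datetime.fromisoformat("2007-04-05T12:30-02:00")

# The same instant expressed in UTC, as a time_utc-style field might store it.
utc_stamp = local_stamp.astimezone(timezone.utc)
print(utc_stamp.isoformat(timespec="minutes"))  # 2007-04-05T14:30+00:00

# Converting back recovers the local form, so no information is lost.
print(to_local_iso(utc_stamp, -2))  # 2007-04-05T12:30-02:00
```

Because an offset-bearing timestamp identifies the same instant as its UTC form, storing either one lets the other be derived as long as the offset is recorded somewhere.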

My original vision was that using gdal_translate from .tiff to .nc or .nc to .tiff would generate files with similar structure. So if this were from the FLIR camera, there would be a field with information about the variable represented by the raster layer in the image - name = temperature, units = C, dimensions = lat,lon etc.
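
One possible shape for that per-layer information is sketched below. The helper `band_metadata` and its key names are hypothetical and only mirror the fields mentioned above. GDAL metadata items are string key/value pairs, so a flat dict of strings like this should be something that could be passed to `SetMetadata` on a raster band in the GDAL Python bindings, although that step is not shown or tested here.

```python
def band_metadata(name, units, dimensions, long_name=None):
    """Build a flat, string-only dict describing one raster layer.

    Key names follow the fields mentioned above (name, units, dimensions);
    they are illustrative, not an agreed convention.
    """
    items = {
        "name": name,
        "units": units,
        # Join dimension names so every value stays a plain string.
        "dimensions": ",".join(dimensions),
    }
    if long_name:
        items["long_name"] = long_name
    return items


# The FLIR example from the comment above.
flir = band_metadata("temperature", "C", ["lat", "lon"])
print(flir)
```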

But for now, the key will be to have an OGC-compliant file with the required information in external metadata.

What I had in mind for standards compliance was something like what is described in Annex A ("Annex A lists the conformance tests which shall be exercised on any software artifact claiming to implement GMLCOV for GeoTIFF") of the OGC GeoTIFF standards document 12-100r1_OGC_GML_Application_Schema_-Coverages-_GeoTIFF_Coverage_Encoding_Profile.pdf.

But we should also focus on what is useful / necessary to meet the end-user needs (which I think can be met with well structured file-associated metadata in Clowder and geostreams).


For reference, here is an overview of the information in a MODIS hdf5 dataset. Like the netcdf, it also contains information about each layer in the file, the bounding box, the processing provenance, quality control &c. https://ladsweb.modaps.eosdis.nasa.gov/api/v1/filespec/collection=6&product=MOD13Q1.

When I ask MODIS for geotiff data, these fields do not appear to propagate into metadata that a program like ArcGIS can read (or exif for that matter), so I am not sure whether they are dropped. For example, see
GTiff.tar.gz from https://modis.ornl.gov/subsetdata/23Aug2017_17:04:58_019465197L35.958767L-84.287433S25L25_MOD13Q1/

(here is a dataset that covers the field scanner: https://modis.ornl.gov/subsetdata/23Aug2017_17:34:58_983339455L33.07558L-111.97489S9L9_MYD13Q1/)

@craig-willis
Contributor

craig-willis commented Aug 29, 2017

@dlebauer Thanks for the details. A few comments/questions:

  • The FLIR example is helpful. This means that the extractors will need to each provide a set of variables to the create_geotiff method. In conversations with Max, the georeferencing information can be calculated at a higher level (i.e., put in the metadata) -- although we have several examples where it's still calculated local to the extractor (e.g., hyperspectral).
  • The OGC "Annex A" conformance tests seem to be largely related to the GML coverage format. I may be misunderstanding, but the relevant test seems to be A.1.4: "Follow GeoTIFF specification". (Most of the OGC compliance tests seem to revolve around WCS services.) I'm still at the point where I think using GDAL means we comply with OGC requirements for GeoTIFF.
  • The MODIS examples are also very helpful, thank you.

@dlebauer dlebauer removed this from the June 2017 milestone Apr 18, 2018
4 participants