
Mass conversion of raw data to plot-level tiles #265


Closed
max-zilla opened this issue Feb 27, 2017 · 62 comments

Comments

@max-zilla
Contributor

In a brief discussion between @dlebauer and me last Friday, we talked through the pros & cons of just splitting all our imagery data into plot tiles, i.e. for every sensor we would:

  1. transfer raw data to Roger
  2. level 1 extractors convert raw data into GeoTIFF
  3. run a process that scans a day of GeoTIFFs from a sensor and creates plot-level tiles
  4. level 2+ extractors operate on plot GeoTIFFs
  5. later on, remove the original raw files and only keep the plot-level tiles

The idea is that in step 3 we use a more generalized version of the stereoTop full-field mosaic extractor that operates on GeoTIFFs. Thus we funnel all extractors -> GeoTIFF format (we can still make JPGs for quick previewing). To kick things off:

PROS

  • scientists are likely to care about plot-level analysis in most cases; plot tiles are easier to work with, and it's easier to understand what you're seeing
  • easier to compare the same plot location across time/sensors with aligned imagery
  • reduces the stitching/overlap handling required on the data scientist's side (since we perform this clip for them), or reduces the computational overhead we would otherwise incur dynamically querying plots from netCDF files (for example)
  • we could consistently organize data by sensor x time x plot across the entire project structure (e.g. in Clowder and in filesystem directories), not just in the geostreams/BETY GIS databases

CONS

  • muddles data organization somewhat; the dataset organization on Roger disk and in Clowder would no longer apply. Instead, each plot could include 2+ distinct timestamps that were stitched together to form the plot image you're seeing, so we'd have:
raw_data/
  stereoTop/
    2016-02-02/
      2016-02-02_12-32-32-123 Range 23 Pass 6/
        files

Now, maybe if we apply some givens about the gantry we could assume we won't have more than 1 capture per plot, per sensor, per minute, so we drop the milliseconds from the timestamp or something. Or we sort by plot:

raw_data/
  stereoTop/
    2016-02-02/
      Range 23 Pass 6/
        2016-02-02_12-32-32-123 
          files
  • impact on storage plans. Effectively, the plot tiles would become the de facto "lowest level raw data" included in the data release, so the actual raw imagery could be archived and deleted. This would be a complex process, with all the issues JD and I face elsewhere in terms of making sure a full day of data is present before performing the tiling.
  • less flexibility if, theoretically, one wanted to query some arbitrary ground footprint later on that is not a specific plot boundary - if we don't implement mechanisms to dynamically query netCDF, for example. If we DON'T precompute the tiles, perhaps we can offer a dynamic way to chop things on the fly.

I think one way or another we need to provide imagery at the plot level; either pre-clipped, or some means to quickly clip on demand (e.g. for plot-level extractors). So maybe a better question is: what's easier, a bunch of complexity to achieve the clipping above, or something like THREDDS to juggle big netCDFs on the fly? As I write this out, pre-clipping seems like a tall order, but maybe it saves us headaches later on... decided to just toss it on the table.

tagging @jdmaloney @jterstriep @yanliu-chn as FYI also.

@max-zilla
Contributor Author

Note that the script I linked does NOT create plot-level tiles. The basic logic for the mosaic extractor is:

  • convert stereoTop .BIN files to GeoTIFF (this shouldn't be necessary since demosaic already does that - we should update the extractor to check for existing GeoTIFFs first)
  • get a list of all GeoTIFFs inside the field bounds for one day, and create a VRT (virtual TIFF) that combines them all
  • feed VRT to gdal2tiles_parallel.py which prepares for Google Maps global tiles

The theoretical plot clipping extractor would chop the VRT into new TIFs based on each polygon in our plot definition.
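
For illustration, a rough sketch of the VRT and tiling steps using the GDAL Python bindings; the paths here are hypothetical, options are omitted, and the real pipeline drives gdal2tiles_parallel.py rather than plain gdal2tiles:

```python
# Sketch only: mosaic one day of GeoTIFFs into a VRT and tile it.
import glob
import subprocess
from osgeo import gdal

day_dir = "/sites/ua-mac/Level_1/demosaic/2017-02-07"            # hypothetical layout
tifs = sorted(glob.glob(day_dir + "/**/*.tif", recursive=True))

# Build a virtual mosaic ("virtual TIFF") referencing every GeoTIFF from the day
vrt = gdal.BuildVRT(day_dir + "/fullfield.vrt", tifs)
vrt = None  # close the dataset so the .vrt is flushed to disk

# Hand the VRT to the tiler (the extractor uses gdal2tiles_parallel.py with its own flags)
subprocess.check_call(["gdal2tiles.py", day_dir + "/fullfield.vrt", day_dir + "/tiles"])
```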

@max-zilla
Contributor Author

max-zilla commented Mar 15, 2017

@robkooper @jdmaloney @yanliu-chn @dlebauer @czender @craig-willis

Just spoke with Rob about moving this forward. I basically suggested:

  • we have extractors that convert raw sensor images into standard geocoded GeoTIFF format
  • we have a work script we run directly on the Roger filesystem outside Clowder, probably on a worker with some hefty memory.
    • loop across each sensor x day directory of GeoTIFFs
    • stitch the day's images together into one full field image (using VRT, or streaming or something)
    • clip the full field image into images by plot (see the sketch after this list)
    • create aggregate metadata for the new plot image (e.g. timestamp down to the minute it was captured, even if multiple captures are blended into one plot)
    • organize data in Clowder by plot & date; once you get to a single day's collection you see a nice list with one plot-level image per sensor for that plot
    • we no longer store raw data/metadata in Clowder - the plot-level GeoTIFFs are "base level" data products for the outside world
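
A minimal sketch of that clip step, assuming the plot polygons have been exported from PostGIS to a GeoJSON file with a plot_name property (the file, property name, and paths below are all hypothetical):

```python
# Sketch only: clip one day's full-field mosaic into per-plot GeoTIFFs.
import json
from osgeo import gdal

fullfield = "/sites/ua-mac/Level_1/fullfield/2017-02-07/fullfield.vrt"   # hypothetical
plots_geojson = "sorghum_season2_plots.geojson"                          # hypothetical PostGIS export

with open(plots_geojson) as f:
    features = json.load(f)["features"]

for feat in features:
    plot_name = feat["properties"]["plot_name"]                          # hypothetical property
    out_tif = "plots/2017-02-07/%s.tif" % plot_name.replace(" ", "_")
    # Cut the mosaic with the plot polygon and crop the raster extent to it
    gdal.Warp(out_tif, fullfield,
              cutlineDSName=plots_geojson,
              cutlineWhere="plot_name = '%s'" % plot_name,
              cropToCutline=True,
              dstNodata=0)
```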

Main questions/concerns:

  • What does this mean for data usage projections, i.e. our 10 PB estimate? We need to get a basic sample of the code working on one of the datasets to see how big the plot images are - the existing stereoTop mosaic code isn't a perfect comparison
  • How quickly can we be confident we have all the images from a day to be able to stitch & chop? 48 hours? 36?
  • Do we need a buffer around plot boundaries, and is there concern about the non-overlap between polygons in the shapefile? This could be both a boon in terms of overall file size for a day, and a negative if there is anything of interest in between:
    [screenshot]
    Eyeballing it, it feels like 50% of the pixels for the width of the field go away if we just keep the plots.

@czender
Contributor

czender commented Mar 15, 2017

Mentioning @hmb1 since he is looking into efficient means of clipping netCDF to shapefiles...

@max-zilla
Contributor Author

@czender thanks. Forgot to mention that I tagged you because I imagine the hyperspectral data might do this slightly differently than the others... e.g. maybe we bypass the GeoTIFF and modify your script to generate the .nc files per plot instead?

@czender
Contributor

czender commented Mar 15, 2017

That's one possibility. If you have an ESRI shapefile or geoTIFF for a plot for which there's a corresponding hyperspectral image, please send it our way for experimentation...

@dlebauer
Member

@max-zilla To answer some questions

What does this mean for data usage projections i.e. our 10 PB estimate?

we are well below the 10 PB estimate at this point, and if we do get closer we can eliminate some of the intermediate data products; if we still need to save space after that, we can use lossy compression / downscaling

How quickly can we be confident we have all the images from a day to be able to stitch & chop? 48 hours? 36?

Isn't this information available as the difference between collection and transfer dates? (or maybe the more important time range is the beginning to end of transfer of one day's data)

(ignoring outliers caused by transfer downtime)

do we need a buffer around plot boundaries/is there concern about the non-overlap between polys in the shapefile? this could be both a boon in terms of overall file size for a day, and a negative if there is anything of interest in between:

You are correct that ~50% of the area goes away if you just keep the plots. This is because, of the four rows, only the two in the middle are kept; the outside rows are 'border rows'. While the border rows do provide useful information prior to canopy closure, we have decided to start with these plot boundaries (#187). Although the discussion of plot definitions is ongoing (terraref/reference-data#60 (comment)), I suggest we move forward with these plot definitions and exclude the alleys and border rows.

I am still confused about the use of a shapefile. I know we have discussed this before, but I was under the impression that our workflows would be based on OGC-compliant data formats, and that we would import the plot boundaries into the PostGIS database where they can be queried in GeoJSON or WKT format.

@dlebauer
Member

@czender @hmb1 are these files sufficiently aligned to the x,y grid that ncks -d x,xmin,xmax -d y,ymin,ymax in.nc out.nc would be sufficient and efficient?

@NewcombMaria

@dlebauer and @max-zilla the 4-row plot arrangement and the issue of the 2 outer rows of each 4-row plot being 'border rows' only apply to Sorghum-season2. Other seasons (Sorghum-season1, Wheat-season3, and the upcoming Sorghum-season4) are all planted as 2-row plots with both rows considered data-collection rows. The big difference between the 2-row and 4-row plot arrangements is why Roman suggested making single rows the smallest units and building 'plots' by combining the single rows of interest.

@max-zilla
Contributor Author

@dlebauer we have the plot boundaries in PostGIS and can build on that; I just used the shapefile as representative of the geometry we're working with so I could include that little picture.

@NewcombMaria that is helpful, thanks.

@czender
Contributor

czender commented Mar 15, 2017

@dlebauer @hmb1 the rectangles above look rectangular to me, so ncks will be efficient and fast on them. I am thinking of how best to use shapefiles at the "plant scale", e.g., how to mask the irregular regions of soil (or leaves) within a plot, as this seems like a useful capability for manipulating the hyperspectral data. We do not yet have a specific request like "can we get a netCDF file with valid values within this shapefile and _FillValues elsewhere", though it seems only a matter of time and we want to stay ahead of the curve...

@dlebauer
Member

@czender: @remotesensinglab is working on per-pixel classifications of sun/shade and plant/soil. That should be easier than irregular polygons (unlike plots, these irregular shapes will change in each image).

@NewcombMaria good point. We should re-visit Roman's proposal moving forward. For now we will keep these plot definitions for Sorghum Season 2.

@czender
Contributor

czender commented Mar 15, 2017

OK. Then the ncks -d x,xmin,xmax -d y,ymin,ymax in.nc out.nc method should suffice for HS imagery once @FlyingWithJerome commits the branch with the correct downsampled coordinates...
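
For concreteness, a small sketch of driving that hyperslab subset from a script, assuming the plot's x/y bounds are already known in the same units as the file's coordinate variables (filenames and bounds below are made up):

```python
# Sketch only: subset a hyperspectral netCDF to one plot's bounding box with ncks.
import subprocess

in_nc = "2017-02-07_hyperspectral.nc"    # hypothetical input
out_nc = "Range_10_Pass_2.nc"            # hypothetical per-plot output
xmin, xmax = 204.0, 211.0                # hypothetical bounds in the file's x/y units
ymin, ymax = 2.0, 6.0

# Decimal values make ncks interpret the -d limits as coordinate values, not indices
subprocess.check_call([
    "ncks", "-O",
    "-d", "x,%.6f,%.6f" % (xmin, xmax),
    "-d", "y,%.6f,%.6f" % (ymin, ymax),
    in_nc, out_nc,
])
```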

@dlebauer
Member

PS what I envision is that the classification algorithm could add new boolean variables like 'sun_leaf', 'shade_soil', etc. to the reflectance index data product. @remotesensinglab does that make sense?

@remotesensinglab

Hi David,
Yes, that is what I have in mind as well. Thanks,

Wasit

@max-zilla
Contributor Author

max-zilla commented Mar 16, 2017

@czender that min/max arg looks perfect. FWIW you can grab the shapefile from my image above here:
https://github.com/terraref/extractors-metadata/tree/master/sensorposition/shp/sorghumexpfall2016v5

...we can (and soon should) use the geostreams API to get the boundaries of each plot like so:

GET https://terraref.ncsa.illinois.edu/clowder/api/geostreams/sensors?sensor_name=Range 10 Pass 2

RESPONSE
[
  {
    "id": 288,
    "name": "Range 10 Pass 2",
    "created": "2017-02-02T18:04:52Z",
    "type": "Feature",
    "properties": {
      "type": {
        "id": "MAC Field Scanner",
        "title": "MAC Field Scanner",
        "sensorType": 4
      },
      "name": "Range 10 Pass 2",
      "popupContent": "Range 10 Pass 2",
      "region": "Maricopa"
    },
    "geometry": {
      "type": "Point",
      "coordinates": [
        -111.974990376,
        33.0748710865,
        0
      ]
    },
    "min_start_time": "2016-06-30T18:11:40Z",
    "max_end_time": "2016-12-31T21:58:08Z",
    "parameters": [
      "centroid",
      "file_ids",
      "fov",
      "sources"
    ]
  }
]

Right now it just has the centroid point in geometry but I have a script to run soon that'll store the polygon of each plot in these. Range 10 Pass 2 is the plot in this case.
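
Once the polygons are stored, fetching a plot boundary programmatically could look roughly like this (a sketch against the sensors endpoint shown above; it assumes geometry has been upgraded from the centroid point to a polygon):

```python
# Sketch only: look up a plot's geometry from the Clowder geostreams API.
import requests

BASE = "https://terraref.ncsa.illinois.edu/clowder/api/geostreams"

resp = requests.get(BASE + "/sensors", params={"sensor_name": "Range 10 Pass 2"})
resp.raise_for_status()
sensors = resp.json()

if sensors:
    plot = sensors[0]
    # Today geometry is a Point centroid; after the polygon script runs it would be a Polygon
    print(plot["name"], plot["geometry"]["type"], plot["geometry"]["coordinates"])
```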

@max-zilla
Contributor Author

max-zilla commented Mar 16, 2017

As a first target of this effort, I would propose:

@yanliu-chn do you have suggestions for who could write the script in item 3? Also tagging @ZongyangLi @solmazhajmohammadi as they haven't been mentioned in this thread yet

@ghost

ghost commented Mar 16, 2017

@solmazhajmohammadi, @remotesensinglab, @Paheding, @pless, @ZongyangLi - what is the standard way of handling data for overlapping images? Is the data averaged?

@max-zilla
Contributor Author

@yanliu-chn suggests stitching together the raw data as a non-georeferenced image.

@yanliu-chn

@ZongyangLi Yes.

@yanliu-chn

@ZongyangLi let me explain. The gantry->MAC transform gets you the coordinate in meters; then you convert that coordinate to lat/lon (UTM Zone 12 -> EPSG:4326); then use the MAC->USDA formula to get the correct lat/lon.

You can directly adjust the 2.5 m in UTM Zone 12 along the x and y directions if you know how many meters 0.000015258894 degrees of latitude and 0.000020308287 degrees of longitude are.
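
As a rough illustration of the UTM step only (the gantry->MAC and MAC->USDA offsets are not shown), here is a pyproj sketch; the sample easting/northing values are made up:

```python
# Sketch only: convert a coordinate from UTM Zone 12N (meters) to EPSG:4326 lat/lon.
from pyproj import Transformer

# EPSG:32612 = WGS 84 / UTM zone 12N, the zone covering the Maricopa site
utm12_to_wgs84 = Transformer.from_crs("EPSG:32612", "EPSG:4326", always_xy=True)

easting, northing = 409000.0, 3660000.0   # hypothetical coordinate in meters
lon, lat = utm12_to_wgs84.transform(easting, northing)
print(lat, lon)
```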

@ZongyangLi
Contributor

@max-zilla I have just uploaded a bounding box method with the formula to the repo.

https://github.com/terraref/extractors-stereo-rgb/tree/master/demosaic

@max-zilla
Contributor Author

@ZongyangLi this looks close to correct based on the formula I applied above (original is the lower image, fixed is upper):
[screenshot]
Again, for:

gantry_system_variable_metadata:
position x [m]: 207
position y [m]: 4.501
position z [m]: 0.631

location in camera box x [m]: 0.877
location in camera box y [m]: 2.276
location in camera box z [m]: 0.578

This seems to agree with our basic plot model:

# GANTRY GEOM (GANTRY CRS) ############
#			      N(x)             
#			      ^                   
#			      |                   
#			      |                   
#			      |                  
#      W(y)<------SE                  
#                                     
# NW: (207.3,	22.135,	5.5)      
# SE: (3.8,	0.0,	0.0)              
#######################################

North corner X is 207.3 and our gantry position is 207; since in this CRS X is along the north/south axis and increases as we move north, we expect to be just below the northern boundary.

Similarly, East Y is 0 (aka the rightmost boundary) and West Y is 22.135; ours is 4.5 so we expect to be roughly 1/5 of the way across the field.

So this looks reasonable, although I notice the left/right images look identical - does that surprise you?

I think the next step is to trigger this new version of the extractor on a whole day of datasets and bring them all into a map to see how it looks, then start to work on stitching. I'm also going to look at the metadata work @craig-willis has done to see if we can pull the parameters in this script from a standard location instead of having them hardcoded.

@ghost ghost removed the help wanted label Mar 29, 2017
@ghost ghost added this to the March 2017 milestone Mar 29, 2017
@max-zilla
Contributor Author

@ZongyangLi @yanliu-chn I am generating a full day of demosaic TIFFs here:

/sites/ua-mac/Level_1/demosaic/2017-02-07/

there are ~4,200 of these to generate; they might be completed by the time of the TERRA meeting, but I'm not sure.

@max-zilla
Contributor Author

max-zilla commented Mar 30, 2017

@pless and @ZongyangLi can focus on the stitching component of this: getting one big GeoTIFF for Feb 7.

The field of view of each sensor is really important. @yanliu-chn's formula does not take the FOV into account - we need to solve this problem. We need someone like @smarshall-bmr to determine the true field of view for each sensor; the stereo camera FOV parameter is not accurate, for example. @craig-willis indicates that Stuart has provided updated metadata for some sensors but not all.

@jterstriep has volunteered to handle the clipping to individual plot boundaries from that big GeoTIFF.

@ghost ghost removed the help wanted label Mar 30, 2017
@ghost

ghost commented Mar 30, 2017

FOV issue is #126

@max-zilla
Contributor Author

@ZongyangLi @yanliu-chn @jterstriep @smarshall-bmr looks better w/ new calculations, but we can see where we need the FOV correction:
[screenshot]

Four images of the same basic part of the field with different images turned on/off:

Image A:
[screenshot]

Image A+B (see split in left tire tread):
[screenshot]

Image A+B+C (see another split in center dirt):
[screenshot]

Image A+C (gap between images despite areas that should overlap):
[screenshot]

In a phone discussion with Zongyang, the thinking was that the FOV correction would fix this.

@dlebauer
Member

dlebauer commented Apr 6, 2017

What is a robust, simple approach that we can use at this point for merging images?
Should we differentiate between images before and after canopy closure?

  • when plants are small and lots of ground is visible, adjust the plot boundary for scanner height so that the ground footprint is consistent (see the footprint sketch below)
  • when the canopy is fully closed, use the plot boundary at canopy height?

Does it help if we stitch using both the image features and geospatial (field of view) information?
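
To make the scanner-height point concrete, the ground footprint of a nadir-looking camera scales with its height above the target; a toy calculation (the FOV angle below is a placeholder, not a measured value - see #126):

```python
# Sketch only: ground footprint width of a nadir camera vs. distance to the target.
import math

def footprint_width(distance_m, fov_deg):
    """Width of the strip seen by a camera with the given full field-of-view angle,
    at the given distance below the camera."""
    return 2.0 * distance_m * math.tan(math.radians(fov_deg) / 2.0)

fov = 24.0                           # placeholder FOV in degrees; true values are issue #126
print(footprint_width(2.0, fov))     # footprint on bare ground, camera ~2 m above it
print(footprint_width(1.0, fov))     # footprint at canopy height once plants reach ~1 m
```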

@solmazhajmohammadi could you please update this issue with a summary of your findings to date?

@solmazhajmohammadi

Feature-based registration without considering the z height works for registration over a small area. When the image area gets bigger, the number of features increases and it becomes hard to register the images after that (due to false matches).
I did a bit of research on this: the ASIFT algorithm seems to be a solution for this problem. It is a great open-source work that forces the algorithm to only consider affine transformations, so we don't get arbitrary false matches.
http://www.ipol.im/pub/art/2011/my-asift/article_lr.pdf

@Paheding

Paheding commented Apr 6, 2017

@solmazhajmohammadi SIFT-based algorithms are very robust to image translation, rotation, and scaling; however, SIFT might still be patented. Alternatively, we can use the SURF method (article:
http://www.vision.ee.ethz.ch/~surf/eccv06.pdf). As for mismatched features, the RANSAC method (http://www.cs.columbia.edu/~belhumeur/courses/compPhoto/ransac.pdf) can be applied to the putative matches to eliminate outliers, so that the false matches can be resolved.
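
For anyone who wants to experiment, a bare-bones sketch of that feature-match-plus-RANSAC idea in OpenCV (ORB is used below because SURF lives in the non-free contrib module; filenames and parameters are arbitrary):

```python
# Sketch only: estimate a homography between two overlapping images with ORB + RANSAC.
import cv2
import numpy as np

img1 = cv2.imread("tile_a.tif", cv2.IMREAD_GRAYSCALE)   # hypothetical inputs
img2 = cv2.imread("tile_b.tif", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=5000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

# RANSAC rejects the false correspondences that raw descriptor matching lets through
H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, ransacReprojThreshold=5.0)
print("inlier ratio:", float(inliers.mean()) if inliers is not None else 0.0)
```

If only affine motion should be allowed, as suggested above, cv2.estimateAffinePartial2D with the RANSAC flag is a drop-in alternative to the homography fit.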

@solmazhajmohammadi

@Paheding take a look at figure 26 in the paper I posted. I have not tried ASIFT on the big plot-size images yet; I think this approach guarantees an affine transformation between the features. When the number of registered images got higher, RANSAC did not help with the false matches.

@Paheding

Paheding commented Apr 6, 2017

@solmazhajmohammadi Correct, there are some drawbacks to RANSAC. Here is an optimized RANSAC algorithm https://pdfs.semanticscholar.org/6dc8/6d312ca1c5b18e53ecb98fa6b5fc1053e023.pdf, which tends to be better than the classic RANSAC. Source code: http://www.cb.uu.se/~aht/code.html. A combination of the optimized RANSAC and ASIFT would be more robust.
Besides, another quick solution could be to set a threshold that restricts the number of registered images.

@dlebauer
Member

Is this ready to be closed? #306 covers some remaining issues.

@max-zilla
Contributor Author

Yes, I believe so - the current plan is for @jterstriep to develop a service to query plots on demand instead of mass conversion ahead of time. There are other issues where this is being discussed.

@dlebauer
Member

@jterstriep could you link to or create the issue that covers the on-demand plot subsetting?

@ghost ghost removed the help wanted label May 18, 2017