Mass conversion of raw data to plot-level tiles #265
Note that the script I linked does NOT create plot-level tiles. Basic logic for the mosaic extractor is:
The theoretical plot clipping extractor would chop the VRT into new TIFs based on each polygon in our plot definition.
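For concreteness, here's a minimal sketch of what that clipping step could look like with rasterio, assuming the plot polygons are available locally (`plots.shp` and `fullfield.vrt` are placeholder paths; the real geometry source may be PostGIS or the geostreams API instead):

```python
# Minimal sketch of the "theoretical plot clipping extractor" described above.
# Assumes rasterio/fiona are available; plots.shp and fullfield.vrt are
# hypothetical paths.
import fiona
import rasterio
from rasterio.mask import mask

with fiona.open("plots.shp") as plots, rasterio.open("fullfield.vrt") as src:
    for plot in plots:
        name = plot["properties"].get("plot_name", plot["id"])
        # Crop the mosaic to the plot polygon; pixels outside it become nodata.
        clipped, transform = mask(src, [plot["geometry"]], crop=True)
        meta = src.meta.copy()
        meta.update(driver="GTiff",
                    height=clipped.shape[1],
                    width=clipped.shape[2],
                    transform=transform)
        with rasterio.open(f"plot_{name}.tif", "w", **meta) as out:
            out.write(clipped)
```

Each plot polygon produces one cropped GeoTIFF with nodata outside the boundary.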
@robkooper @jdmaloney @yanliu-chn @dlebauer @czender @craig-willis Just spoke with Rob about moving this forward. I basically suggested:
Main questions/concerns:
Mentioning @hmb1 since he is looking into efficient means of clipping netCDF to shapefiles...
@czender thanks. Forgot to mention that I tagged you because I imagine the hyperspectral data might handle this slightly differently than the others... e.g. maybe we bypass the GeoTIFF and modify your script to generate the .nc files per plot instead?
That's one possibility. If you have an ESRI shapefile or GeoTIFF for a plot for which there's a corresponding hyperspectral image, please send it our way for experimentation...
@max-zilla To answer some questions
We are well below the 10 PB estimate at this point. If we do get closer, we can eliminate some of the intermediate data products, and if we still need to save space we can use lossy compression / downscaling.
Isn't this information available as the difference between collection and transfer dates? (Or maybe the more important time range is from the beginning to the end of the transfer of one day's data, ignoring outliers caused by transfer downtime.)
You are correct that ~50% of the area goes away if you just keep the plots. This is because, of the four rows, only the two in the middle are kept; the outside rows are 'border rows'. While the border rows do provide useful information prior to canopy closure, we have decided to start with these plot boundaries #187. Although the discussion of plot definitions is ongoing (terraref/reference-data#60 (comment)), I suggest we move forward with these plot definitions and exclude the alleys and border rows. I am still confused about the use of a shapefile. I know we have discussed this before, but I was under the impression that our workflows would be based on OGC-compliant data formats, and that we would import the plot boundaries into the PostGIS database, where they can be queried in GeoJSON or WKT format.
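To illustrate the PostGIS route, a sketch of a boundary query returning GeoJSON and WKT; the table and column names (`sites`, `sitename`, `geom`) and the season filter are assumptions about the schema, not a confirmed layout:

```python
# Sketch of pulling plot boundaries from PostGIS as GeoJSON/WKT rather than a
# shapefile. Table/column names are assumptions; adjust to the actual database.
import psycopg2

conn = psycopg2.connect("dbname=bety host=localhost")
with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT sitename,
               ST_AsGeoJSON(geom) AS geojson,
               ST_AsText(geom)    AS wkt
        FROM sites
        WHERE sitename LIKE %s
    """, ("MAC Field Scanner Season 2%",))
    for sitename, geojson, wkt in cur.fetchall():
        print(sitename, geojson[:60], wkt[:60])
```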
@dlebauer and @max-zilla the 4-row plot arrangement and the issue of the 2 outer rows of each 4-row plot being 'border rows' only applies to Sorghum-season2. Other seasons (Sorghum-season1, Wheat-season3, and the upcoming Sorghum-season4) are all planted as 2-row plots with both rows considered data-collection rows. The big difference between the 2-row and 4-row plot arrangements is why Roman suggested making single rows the smallest units and building 'plots' by combining the single rows of interest.
@dlebauer we have the plot boundaries in PostGIS and can build on that, I just used the shapefile as representative of the geometry we're working with so I could include that little picture. @NewcombMaria that is helpful, thanks.
@dlebauer @hmb1 the rectangles above look rectangular to me so ncks will be efficient and fast on them. i am thinking of how best to use shapefiles at the "plant scale", e.g., how to mask the irregular regions of soil (or leaves) within a plot, as this seems like a useful capability for manipulating the hyperspectral data. we do not yet have a specific request like "can we get a netCDF file with valid values within this shapefile and _FillValues elsewhere", though it seems only a matter of time and we want to stay ahead of the curve...
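For the rectangular case, a hedged sketch of the ncks hyperslab subsetting being described; the dimension names and bounding box values below are placeholders:

```python
# For an axis-aligned plot bounding box, NCO's ncks can subset the netCDF
# directly with coordinate hyperslabs (-d dim,min,max). Dimension names
# (latitude/longitude) and the example bounding box are assumptions.
import subprocess

def clip_plot(nc_in, nc_out, lat_min, lat_max, lon_min, lon_max):
    subprocess.check_call([
        "ncks", "-O",
        "-d", f"latitude,{lat_min},{lat_max}",
        "-d", f"longitude,{lon_min},{lon_max}",
        nc_in, nc_out,
    ])

clip_plot("scan_reflectance.nc", "plot_r10p2.nc",
          33.0745, 33.0748, -111.9751, -111.9748)
```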
@czender: @remotesensinglab is working on per-pixel classifications of sun/shade and plant/soil. That should be easier than irregular polygons (unlike plots, these irregular shapes will change in each image). @NewcombMaria good point. We should re-visit Roman's proposal moving forward. For now we will keep these plot definitions for Sorghum Season 2.
OK. then the
PS: what I envision is that the classification algorithm could add new boolean variables like 'sun_leaf', 'shade_soil', etc. to the reflectance index data product. @remotesensinglab does that make sense?
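A rough sketch of what those flag variables could look like in the netCDF product (variable and dimension names here are hypothetical, and the zeros are just a placeholder for the classifier output):

```python
# Boolean per-pixel classification masks stored as byte variables alongside
# the existing data product. Names and dimensions are assumptions.
import numpy as np
from netCDF4 import Dataset

with Dataset("reflectance_index.nc", "a") as nc:
    y, x = nc.dimensions["y"].size, nc.dimensions["x"].size
    for name in ("sun_leaf", "shade_soil"):
        if name not in nc.variables:
            var = nc.createVariable(name, "i1", ("y", "x"), zlib=True)
            var.long_name = f"pixel classified as {name.replace('_', ' ')}"
            var.flag_values = np.array([0, 1], dtype="i1")
            var.flag_meanings = "false true"
            # Placeholder mask; real values would come from the classifier.
            var[:] = np.zeros((y, x), dtype="i1")
```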
Hi David, Wasit
@czender that min/max arg looks perfect. FWIW you can grab the shapefile from my image above here: ...we can (and soon should) use geostreams API to get boundaries of each plot like so:
Right now it just has the centroid point in the geometry, but I have a script to run soon that'll store the polygon of each plot in these. Range 10 Pass 2 is the plot in this case.
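For reference, a hedged sketch of pulling that geometry over HTTP; the exact geostreams route, query parameter, and response fields should be treated as assumptions until checked against the Clowder instance:

```python
# Hypothetical request against the geostreams API for a plot's geometry.
import requests

base = "https://terraref.ncsa.illinois.edu/clowder/api/geostreams"
resp = requests.get(f"{base}/sensors", params={"sensor_name": "Range 10 Pass 2"})
resp.raise_for_status()
for sensor in resp.json():
    # Today this would be a centroid Point; once the polygon script runs it
    # would be the full plot boundary.
    print(sensor.get("name"), sensor.get("geometry"))
```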
As a first target of this effort, I would propose:
@yanliu-chn do you have suggestions for who could write the script in item 3? Also tagging @ZongyangLi @solmazhajmohammadi as they haven't been mentioned in this thread yet |
@solmazhajmohammadi , @remotesensinglab @Paheding @pless @ZongyangLi - what is the standard way of handling data for overlapping images? Is the data averaged? |
@yanliu-chn suggests stitching together raw data as a non-georeferenced image.
@ZongyangLi Yes.
@ZongyangLi let me explain. gantry->mac will get you the coord in meters; then you convert the coord to lat/lon (UTM12 -> EPSG:4326); then use the mac->usda formula to get the correct lat/lon. You can directly adjust the 2.5 m in UTM12 along the x and y directions if you know how many meters 0.000015258894 degrees of lat and 0.000020308287 degrees of lon are.
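To make that chain concrete, a sketch using pyproj; the UTM origin of the field and the sign of the MAC->USDA shift are assumptions, and only the degree constants come from the comment above:

```python
# Rough sketch: gantry meters -> UTM zone 12N -> EPSG:4326 lat/lon, plus a
# constant lat/lon shift for the MAC -> USDA correction.
from pyproj import Transformer

# Hypothetical UTM12N easting/northing of the field origin.
FIELD_ORIGIN_E, FIELD_ORIGIN_N = 409017.0, 3660138.0
LAT_SHIFT, LON_SHIFT = 0.000015258894, 0.000020308287

utm12_to_wgs84 = Transformer.from_crs("EPSG:32612", "EPSG:4326", always_xy=True)

def gantry_to_latlon(x_m, y_m):
    # Gantry x runs along the north/south axis, y across the field.
    easting = FIELD_ORIGIN_E - y_m
    northing = FIELD_ORIGIN_N + x_m
    lon, lat = utm12_to_wgs84.transform(easting, northing)
    # MAC -> USDA adjustment applied as a constant offset in degrees.
    return lat + LAT_SHIFT, lon + LON_SHIFT

print(gantry_to_latlon(207.0, 4.5))
```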
@max-zilla I have just uploaded a bounding box method with formula in the repo. https://github.com/terraref/extractors-stereo-rgb/tree/master/demosaic
@ZongyangLi this looks close to correct based on the formula I applied above (original is the lower image, fixed is upper):
This seems to agree with our basic plot model:
North corner X is 207.3 and our gantry position is 207; since in this CRS X is along the north/south axis and increases as we move north, we expect to be just below the northern boundary. Similarly, East Y is 0 (aka the rightmost boundary) and West Y is 22.135; ours is 4.5, so we expect to be roughly 1/5 of the way across the field. So this looks reasonable, although I notice the left/right images look identical - does that surprise you? I think the next step is to trigger this new version of the extractor on a whole day of datasets and bring them all into a map to see how it looks and start to work on stitching. I'm also going to look at the metadata work @craig-willis has done to see if we can pull the parameters in this script from a standard location instead of having them hardcoded.
@ZongyangLi @yanliu-chn I am generating a full day of demosaic TIFFs here:
There are ~4,200 of these to generate; they might be completed by the time of the TERRA meeting, but I'm not sure.
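Once those TIFFs exist, a quick way to eyeball them as one mosaic is a GDAL VRT; this is just a preview-level sketch (simple painter's order, no FOV correction or blending) and the paths are assumptions:

```python
# Build a day-level virtual mosaic from the demosaic GeoTIFFs, then optionally
# materialize it as one large GeoTIFF. Input/output paths are placeholders.
import glob
from osgeo import gdal

tifs = sorted(glob.glob("/sites/ua-mac/Level_1/demosaic/2017-02-07/**/*.tif",
                        recursive=True))
vrt = gdal.BuildVRT("fullfield_2017-02-07.vrt", tifs)
vrt = None  # flush the VRT to disk

gdal.Translate("fullfield_2017-02-07.tif", "fullfield_2017-02-07.vrt",
               creationOptions=["COMPRESS=LZW", "BIGTIFF=YES"])
```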
@pless and @ZongyangLi can focus on the stitching component of this, getting one big GeoTIFF for Feb 7. The field of view of each sensor is really important. @yanliu-chn's formula does not take the FOV into account - we need to solve this problem. We need someone like @smarshall-bmr to determine the true field of view for each sensor. The stereo camera FOV parameter is not accurate, for example. @craig-willis indicates that Stuart has provided updated metadata for some sensors but not all. @jterstriep has volunteered to handle the clipping to individual plot boundaries from that big GeoTIFF.
FOV issue is #126 |
@ZongyangLi @yanliu-chn @jterstriep @smarshall-bmr looks better w/ new calculations, but we can see where we need the FOV correction. Four images of the same basic part of the field with different images turned on/off:
- Image A+B (see split in left tire tread)
- Image A+B+C (see another split in center dirt)
- Image A+C (gap between images despite areas that should overlap)

In a phone discussion with Zongyang, the thinking was that FOV correction would correct for this.
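For intuition on why the FOV matters here: a common pinhole-camera approximation for the ground footprint of a nadir-pointing sensor is width = 2 * h * tan(fov/2), so an under-estimated FOV shrinks every image footprint and leaves exactly these gaps. The numbers below are illustrative only, not the gantry's calibrated values (see #126):

```python
# Approximate ground footprint of a nadir camera from height and FOV.
import math

def ground_footprint(height_m, fov_deg):
    return 2.0 * height_m * math.tan(math.radians(fov_deg) / 2.0)

# e.g. a camera ~2 m above the canopy with a 21 degree horizontal FOV
print(ground_footprint(2.0, 21.0))  # ~0.74 m across
```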
What is a robust, simple approach that we can use at this point for merging images?
@solmazhajmohammadi could you please update this issue with a summary of your findings to date? |
Feature-based registration without considering the z height is working for registration over a small area. When the image area gets bigger, the number of features increases and it becomes hard to register the images after that (due to false matches).
@solmazhajmohammadi SIFT-based algorithms are very robust to image translation, rotation, and scaling; however, SIFT might still be patented. Alternatively, we can use the SURF method (article:
@Paheding take a look at Figure 26 in the paper I posted. I have not tried ASIFT on the big plot-size images yet; I think this approach guarantees the affine transformation between the features. When the number of registered images gets higher, RANSAC did not help with the false matches.
@solmazhajmohammadi Correct, there are some drawbacks to RANSAC. Here is an optimized RANSAC algorithm, https://pdfs.semanticscholar.org/6dc8/6d312ca1c5b18e53ecb98fa6b5fc1053e023.pdf, which tends to be better than the classic RANSAC. Source code: http://www.cb.uu.se/~aht/code.html. A combination of the optimized RANSAC and ASIFT would be more robust.
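For anyone wanting to reproduce the baseline being compared against, a minimal OpenCV sketch of the classic pipeline (SIFT keypoints, ratio-test matching, RANSAC homography); this is not the optimized RANSAC or ASIFT variants discussed above, and the image paths are placeholders:

```python
# Classic SIFT + ratio test + RANSAC homography; requires an OpenCV build
# that ships SIFT (>= 4.4 or contrib).
import cv2
import numpy as np

img1 = cv2.imread("tile_a.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("tile_b.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Lowe's ratio test discards ambiguous matches before RANSAC sees them.
matcher = cv2.BFMatcher()
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]

src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
print(f"{int(inliers.sum())} / {len(good)} matches kept as inliers")
```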
Is this ready to be closed? #306 covers some remaining issues. |
Yes, I believe so - the current plan is for @jterstriep to develop a service to query plots on-demand instead of mass conversion ahead of time. There are other issues where this is being discussed.
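As a sketch of what such an on-demand service could look like (the route, plot lookup, and paths are hypothetical; gdal.Warp's cutline option does the actual clipping):

```python
# Hypothetical on-demand plot clipping endpoint: given a date and plot name,
# clip the full-field GeoTIFF when requested instead of pre-generating tiles.
import tempfile
import flask
from osgeo import gdal

app = flask.Flask(__name__)

def plot_geojson_path(plot_name):
    """Placeholder: return a GeoJSON file containing the plot polygon
    (e.g. looked up from PostGIS or the geostreams API)."""
    raise NotImplementedError

@app.route("/clip/<date>/<plot_name>")
def clip(date, plot_name):
    out = tempfile.NamedTemporaryFile(suffix=".tif", delete=False).name
    gdal.Warp(out, f"/data/fullfield_{date}.tif",
              cutlineDSName=plot_geojson_path(plot_name),
              cropToCutline=True, dstNodata=0)
    return flask.send_file(out, mimetype="image/tiff")
```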
@jterstriep could you link to or create the issue that covers the on-demand plot subsetting?
In a brief discussion between @dlebauer and me last Friday, we talked about the pros & cons of just splitting all our imagery data into plot tiles, i.e. for every sensor we would:
The idea being that in step 3 we use a more generalized version of the stereoTop full-field mosaic extractor that operates on GeoTIFFs. Thus we funnel all extractors -> GeoTIFF format (we can still make JPGs for quick previewing). To kick things off:
PROS
CONS
Now, maybe if we apply givens about the gantry we could assume we won't have more than 1 capture per plot, per sensor, per minute, so we drop the milliseconds from the timestamp or something. Or we sort by plot:
I think one way or another we need to provide imagery at the plot level; either pre-clipped, or some means to quickly clip on demand (e.g. for plot-level extractors). So maybe a better question is: what's easier, a bunch of complexity to achieve the clipping above, or something like THREDDS to juggle big netCDFs on the fly? As I write this out, pre-clipping seems like a tall order, but maybe it saves us headaches later on... decided to just toss it on the table.
tagging @jdmaloney @jterstriep @yanliu-chn as FYI also.