
Insert plot level height histogram into Clowder geostreams; height into BETYdb #210


Closed
2 of 3 tasks
ZongyangLi opened this issue Dec 8, 2016 · 67 comments
Labels: bety/application, sensor/laser3d

Comments

@ZongyangLi
Contributor

ZongyangLi commented Dec 8, 2016

Description

We have scripts to generate plot-level height histograms on Roger. The next step is to create a pipeline for this extractor.

Completion Criteria

  • @solmazhajmohammadi needs to provide the position of the calibration object as a parameter
    • we are currently making assumptions about where the (0,0,0) point of the field coordinate system is.
    • from @solmazhajmohammadi we know that "We have used a special object to calibrate the point clouds. (0,0,0) point in the point cloud is somewhere in middle of the field, we will provide the transformation matrix to the gantry coordinate system in metadata"
  • make sure the correlation between hand measurements and predictions is high enough
    • the correlation on 9/8, 9/15, 9/22, and 9/29 is around 90% under our assumptions
    • we found that on 8/31 almost all plots share the same highest height level (lv. 83); it seems points above lv. 83 disappeared.
  • add to pipeline
    • insert the histogram into the geostreams database as an attribute (a rough sketch follows this list)
    • needs a set of raw data from Clowder as input, like Create full field stitched mosaic #85
    • insert the estimate of canopy_height into BETYdb with an appropriate method
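
For the geostreams step, a rough sketch of what inserting a plot-level histogram as a datapoint could look like (the endpoint path, stream id, and payload fields are assumptions based on the Clowder geostreams REST API, not final extractor code):

import requests

def post_histogram_datapoint(host, key, stream_id, plot_geom, day, histogram):
    """POST one plot-level height histogram to Clowder geostreams as a datapoint."""
    payload = {
        "start_time": day + "T00:00:00Z",
        "end_time": day + "T23:59:59Z",
        "type": "Feature",
        "geometry": plot_geom,  # GeoJSON geometry of the plot
        "properties": {"height_histogram": list(histogram)},
        "stream_id": str(stream_id),
    }
    r = requests.post("%s/api/geostreams/datapoints?key=%s" % (host, key), json=payload)
    r.raise_for_status()
    return r.json()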
@dlebauer
Member

@ZongyangLi @rmgarnett @pless

For this extractor, I would suggest that we write the summary stats (histogram) into the metadata and insert a few statistics into BETYdb. For example, we have inserted a trait called '95th quantile height'.

But the key trait from the point cloud is the height estimate calibrated to field measurements. This trait will have the same name as the trait that Maria measured, i.e. 'canopy_height'. I think it would make sense for this extractor to use the calibrated model that Roman developed in #175.

@dlebauer
Member

@rmgarnett what are the (slope, intercept) parameters from the model in #175?

When estimating height at the plot level, can we also estimate uncertainty?

@rmgarnett

[hand height] = 28.2cm + 0.661 * [89th height percentile]

The RMSE/MAE gives a rough estimate of L2/L1 uncertainty. I will do a more thorough analysis in January now that all height distributions are extracted.
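
As a minimal sketch (not the final analysis), the formula and the RMSE-as-uncertainty idea could be applied like this, where p89_cm is a plot's 89th height percentile in cm (variable names are illustrative only):

def calibrated_height_cm(p89_cm, slope=0.661, intercept=28.2, rmse_cm=None):
    """[hand height] = 28.2 cm + 0.661 * [89th height percentile].

    If the RMSE of the fit is supplied, also return a crude +/- band,
    as a stand-in until the fuller uncertainty analysis is done.
    """
    est = intercept + slope * p89_cm
    if rmse_cm is None:
        return est
    return est, (est - rmse_cm, est + rmse_cm)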

@rmgarnett rmgarnett reopened this Dec 22, 2016
@dlebauer
Member

@rmgarnett I suspect RMSE scales with height?

From your plot it is hard to tell how the data are distributed because of overlapping points, but I gather they are strongly right-skewed. I wonder whether log-transforming x and y would be appropriate, since it would more evenly weight the smaller values. The small plants are important too!

(attached plot: img_2204)

@ghost added the sensor/laser3d, laser 3D, and bety/application labels Jan 3, 2017
@ZongyangLi
Contributor Author

@dlebauer
I have all of the height distribution data for Season 2 from 8/8 to 11/25, and I created 90th and 95th height percentile CSV files based on @rmgarnett's research.
90th percentile
95th percentile
Scanner3DTop data in Season 2 are much better than in Season 1, but the data from 10/13 to 11/04 are still unexpected; there are only a few points in those days' ply files.

I am wondering whether the point cloud files for those days can be fixed; if not, what is your opinion on putting them into BETYdb?

@dlebauer
Member

dlebauer commented Jan 5, 2017

@solmazhajmohammadi could you please check into whether we can recover useful data from 10/13 to 11/04?

@ZongyangLi we need to discuss with @rmgarnett about how to implement this extractor.

@dlebauer
Member

@rmgarnett have you made any progress on adding uncertainty?

@rmgarnett

rmgarnett commented Jan 10, 2017 via email

@dlebauer
Member

@ZongyangLi you can go ahead and insert the data that you have. We can create another issue for adding uncertainty to the height calculations (moving forward this should be done by default ... )

@solmazhajmohammadi

@dlebauer @ZongyangLi, for the data from 10/13 to 11/04, the png files were not collected correctly, but we can get the height information from the scans that were done at ~5 m.

@solmazhajmohammadi

@smarshall-bmr can you please scan the checkerboards to find the point cloud origin?

@ZongyangLi
Contributor Author

@solmazhajmohammadi, are you saying we should estimate the plot-level height based on the highest points in the remaining 3D data? That might differ from what we have done before, because we use all of the point cloud data to create a height histogram and compute quantiles to make the predictions.

@solmazhajmohammadi

@ZongyangLi This could be an option; otherwise, the data were collected with a wrong setting and we are not able to recover them.

@rmgarnett

I have been reinvestigating the hand measurements using @ZongyangLi's most-recent data. The final model may differ from what's written above, but it will be the same form. I presume the extractor will be easy to modify if we wish to change the model slightly?

@dlebauer
Member

dlebauer commented Jan 13, 2017 via email

@rmgarnett

Perfect.

@ZongyangLi
Contributor Author

@dlebauer
When I do that, I receive the error report below:
"mean": "31.505", "local_datetime": "2016-08-08T12:00:00", "access_level": "2", "error": "bad date specification; see error output", "species": { "scientificname": "Sorghum bicolor" },
"site": { "sitename": "MAC Field Scanner Field Plot 1603 Season 2", "error": "match not found" },
"method": { "name": "height Estimation from 3D Scanner using formula: [hand height] = 28.2cm + 0.661 * [89th height percentile]", "error": "match not found" },

@dlebauer
Member

dlebauer commented May 8, 2017

@ZongyangLi

  1. The date looks correct - what is the error output?
  2. The sitename needs to match a record in the database terraref.ncsa.illinois.edu/bety/sites
  3. The method name needs to match a name in the methods table terraref.ncsa.illinois.edu/bety/methods

@ZongyangLi
Contributor Author

@dlebauer

  1. The error is 'bad date specification'; I think it is caused by errors 2 and 3.
  2. This is the plot number definition problem I mentioned earlier in this issue (#210 (comment)); here I uploaded data for all 1728 plots, which may be causing error 2.
  3. The method name comes from your comment in Insert plot level height and / or percent cover into BETYdb #193 (comment); I failed to access the methods table.

@dlebauer
Member

dlebauer commented May 9, 2017

  1. Hmm, error 1 is the only one that says 'see error output'. @gsrohde any ideas?
  2. For Season 1, plot names for the 1728 plots are the same as for the 864 plots, but with E, W appended. We don't have the same 1728 plots specified for Season 2; I can create the 1728 plots for Season 2, or we could go with the 864 plots.
  3. Sorry, I was not clear. For the method, please create a new record here: https://terraref.ncsa.illinois.edu/bety/methods/new. Can you access that page?

@ZongyangLi
Contributor Author

@dlebauer I successfully created a new method called 'Scanner 3d ply data to height', and error 3 went away. Since the 1728 plots still do not work, I used the 864-plot definition to finish the operation.

Another error comes up:
"model_validation_errors": "{:mean=>[\"The value of mean for the height trait must be at most 120.\"]}",

By the way, @solmazhajmohammadi gave us the origin of the point cloud data last week; I am creating new height estimates using the new metadata.

Also, we are developing an algorithm that uses the stereo top images for 3D reconstruction, and we are trying to use this stereo 3D model to recover the missing height data from 2016/10/11 to 2016/11/04. Once the new data are available, I will resend them to BETYdb.

@dlebauer
Member

dlebauer commented May 9, 2017

@ZongyangLi thanks,

  • the error you observed is caused by the fact that each variable has a valid range, and in the case of 'height' the valid range is [0, 120]. The variable height has units of meters (as you can tell, these ranges are generally very generous; the tallest known tree is 115 m).
  • however, I think that we should be using the variable `canopy_height`, since this is the variable that Maria recorded, and I assume that these are the data used to fit the model (please confirm)
  • See also Merge plant_height, panicle_height, and spike_height reference-data#118 for disambiguating terms related to height.

In summary:

  • if the model is trained only on Maria's canopy_height measurements, we should call the output canopy_height.
  • if it is trained on both canopy_height and either panicle_height or its synonym spike_height, we should call the variable plant_height (plant_height = max(canopy_height, panicle_height)).
  • I am assuming that the algorithm does not differentiate plants with or without fruit.

@gsrohde

gsrohde commented May 10, 2017

@dlebauer and @ZongyangLi The message should probably say "See error list" instead. The error list contains the key-value pair date_data_errors=>[\"You can't have a local_datetime attribute on a trait-group's defaults element if no default site has been specified.\"]. But this itself is unclear except when an XML file, not a CSV file, is being uploaded. And I think the error itself is bogus in this case. I'm making an issue for this.

@dlebauer
Member

dlebauer commented May 11, 2017

  • change to plant_height

@ghost modified the milestones: May 2017, February 2017 on May 17, 2017
@ghost added the help wanted label on Jun 15, 2017
@ghost

ghost commented Jun 15, 2017

Currently this extractor generates two numpy files:

ls /projects/arpae/terraref/sites/ua-mac/Level_1/scanner3DTop_plant_height/2017-05-06/2017-05-06__05-05-58-934
scanner3DTop - 2017-05-06__05-05-58-934 highest.npy  scanner3DTop - 2017-05-06__05-05-58-934 histogram.npy

In #303 we mention pushing the max value to geostreams - this isn't implemented yet.

Additionally we should convert the .npy files to an image or GeoTIFF.
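
A rough sketch of what that conversion could look like (the bounds, EPSG code, and file handling here are placeholder assumptions, not the extractor's actual parameters):

import numpy as np
import rasterio
from rasterio.transform import from_bounds

def npy_to_geotiff(npy_path, tif_path, bounds, epsg=4326):
    """Write a 2-D numpy array (e.g. the 'highest' grid) as a single-band GeoTIFF.

    bounds = (west, south, east, north) of the scanned area in the target CRS;
    in practice these would come from the scan metadata / plot boundaries.
    """
    data = np.load(npy_path).astype("float32")
    if data.ndim == 1:
        data = data.reshape(-1, 1)  # promote a 1-D array to a single-column raster
    transform = from_bounds(*bounds, data.shape[1], data.shape[0])
    with rasterio.open(
        tif_path, "w", driver="GTiff",
        height=data.shape[0], width=data.shape[1],
        count=1, dtype="float32",
        crs="EPSG:%d" % epsg, transform=transform,
    ) as dst:
        dst.write(data, 1)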

@ghost removed the help wanted label on Jun 15, 2017
@ZongyangLi
Contributor Author

@max-zilla I have just uploaded the recent code to https://github.com/terraref/extractors-3dscanner/tree/master/plant_height
I know the code is really messy; I need to take some time to clean it up.

@max-zilla
Contributor

@ZongyangLi I ran a PLY file from February through the plant_height extractor; I'm making a pull request to show @nickheyek that writes .tif instead of .npy and pushes some values to geostreams.

The "highest" numpy array looks like this for a sample file:

>>> f = r"/Users/mburnette/globus/111fb573-a351-4bf8-9594-aedddc53a850__Top-heading-east_0.ply"                       
>>> plydata = PlyData.read(f)
>>> hist, highest = full_day_to_histogram.gen_height_histogram(plydata, False)
>>> highest
array([[ 1051.20507812],
       [ 1042.48278809],
       [ 1005.7802124 ],
       [ 1016.98535156],
       [ 1013.16113281],
       [ 1053.45422363],
       [ 1031.78356934],
       [ 1024.53540039],
       [ 1019.98809814],
       [ 1020.35986328],
       [ 1007.30383301],
       [  948.19873047],
       [  958.54162598],
       [  989.56195068],
       [    0.        ],
       [    0.        ]])
>>> len(highest)
16

What exactly do these 16 numbers represent? I see the "hist" array is also length 16, but each of its members is itself an array:

>>> hist
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       ..., 
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])
>>> len(hist)
16

If I were to summarize the "highest" numbers into geostreams, what would you recommend? Something like this is hard to interpret:

{
"highest": [[ 1051.20507812],
       [ 1042.48278809],
       [ 1005.7802124 ],
       [ 1016.98535156],
       [ 1013.16113281],
       [ 1053.45422363],
       [ 1031.78356934],
       [ 1024.53540039],
       [ 1019.98809814],
       [ 1020.35986328],
       [ 1007.30383301],
       [  948.19873047],
       [  958.54162598],
       [  989.56195068],
       [    0.        ],
       [    0.        ]]
}

Can I select the maximum number of this array as a simple "tallest plant in this plot" number?

@max-zilla self-assigned this Jun 22, 2017
@ZongyangLi
Contributor Author

  • What exactly do these 16 numbers represent?
    Every ply file is supposed to cover one row between the east and west boundaries of the field. At the very beginning, the field was divided into 16 columns; these 16 numbers represent the highest point in each column, and the unit is mm. Please use the most recently updated function named
    gen_height_histogram_for_Roman(plydata, scanDirection, out_dir, sensor_d, center_position)
    It accounts for the east- or west-side sensor, adds the newest origin offset on both sides, uses the ground plane as hist[0], and uses a 32-column definition.

With these numbers recorded, I integrate all scan results to create full-field 'highest' and 'histogram' files, using the corresponding JSON file.

  • And what about 'hist'?
    hist records the height distribution; its 16 rows have the same definition as in 'highest'. The 400 numbers in each element are bins that record how many points fall into each height level. You can treat bin 0 as the ground plane in the field, and each height level equals 1 cm. So a value of, for example, 5 in bin 400 means there are 5 points at height level 400, i.e. 4 meters in the real world.

  • Recommended way of summarizing the data (see the sketch after this list):

  1. Use 'gen_height_histogram_for_Roman' to create the hist and highest arrays, use the JSON file to determine the plot number for each bin, then convert to whatever unit fits the database.
  2. Note that each plot may be made up of several ply files, so a separate process may produce several different numbers for each plot for one day.
  3. Compute quantiles from 'hist' for each plot and use @rmgarnett's formula to create a height estimate. Reference code: https://github.com/terraref/extractors-3dscanner/blob/master/plant_height/draw_field_scanned_in_grid.py#L195
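
A condensed sketch of steps 2 and 3, under the bin convention described above (bin 0 = ground plane, 1 cm per bin); plot lookup and file handling are intentionally left out:

import numpy as np

def merge_plot_histograms(histograms):
    """Sum the 400-bin height histograms from every ply column that falls in one plot."""
    return np.sum(np.vstack(histograms), axis=0)

def estimated_height_cm(plot_hist, percentile=89, slope=0.661, intercept=28.2):
    """Approximate percentile height in cm (bin index ~ cm) fed into the calibration formula."""
    counts = np.asarray(plot_hist, dtype=float)
    if counts.sum() == 0:
        return None  # empty plot / missing scan
    cum = np.cumsum(counts) / counts.sum()
    p_cm = int(np.searchsorted(cum, percentile / 100.0))
    return intercept + slope * p_cm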

@dlebauer
Member

dlebauer commented Jun 22, 2017

Would it make sense to use the same workflow as we are using with other image data, i.e.:

  • full field stitch (optional ...)
  • plot subset
  • analysis,

so that each result represents a single plot?

@max-zilla
Contributor

max-zilla commented Jun 22, 2017

I believe we've discussed stitching point clouds into plots; I will try to track down the comments from @solmazhajmohammadi

  • if we simply merge, there will be messy stuff where two passes overlap
  • we could also just pick one if two overlap
  • or, we merge plot histograms afterwards

We need to convert the point cloud origin to the gantry coordinate system before merging; otherwise they will all be on the same spot (PDAL transformation matrix).
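
A minimal numpy sketch of that step, assuming the metadata supplies a 4x4 homogeneous transformation matrix T from point-cloud coordinates to the gantry coordinate system (the same matrix could be given to PDAL's filters.transformation when working with .las files):

import numpy as np
from plyfile import PlyData

def transform_to_gantry(ply_path, T):
    """Apply a 4x4 homogeneous transform T to the vertices of a PLY point cloud.

    Returns an (N, 3) array of XYZ coordinates in the gantry coordinate system.
    """
    vertices = PlyData.read(ply_path)["vertex"]
    xyz = np.column_stack([vertices["x"], vertices["y"], vertices["z"]])
    homogeneous = np.hstack([xyz, np.ones((xyz.shape[0], 1))])
    return (homogeneous @ np.asarray(T).T)[:, :3]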

@max-zilla modified the milestones: July 2017, May 2017 on Jun 22, 2017
@dlebauer
Member

dlebauer commented Jul 6, 2017

  • I don't think that the files overlap (after R and L ply are merged into a single las)
  • I think using the stitch then subset approach is okay.

@solmazhajmohammadi can you confirm that independent passes do not overlap each other?

@dlebauer
Member

dlebauer commented Jul 6, 2017

I think we can close this issue and open a new one with low priority that will deal with full field merge then split.

@solmazhajmohammadi

@dlebauer to confirm the independence between passes, we need to apply the transformation to all the ply files from a single day and merge them together.
The transformation is available in Issue 44.
