Convert met data in clowder to netcdf #173
@Zodiase TODO: schedule a conversation with @robkooper on the BrownDog workflow and the PEcAn standard, to learn which libraries or tools are available for outputting the netCDF.
@Zodiase - can you please update this issue?
GeoStream message hooks for triggering the extractor?
@czender Could you tell me how to save a JSON document into a netCDF file? The JSON document would look like:
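Since the sample JSON document was not preserved in this thread, here is a minimal sketch of the flattening step such a converter needs: collecting per-variable arrays along a time dimension before handing them to netcdf4-python. All field names below are invented for illustration, not taken from the actual geostream schema.

```python
import json

# Hypothetical geostream-style datapoint list; the real sample from this
# thread was not included, so these field names are assumptions.
doc = json.loads("""
[
  {"start_time": "2016-10-26T19:35:09Z",
   "properties": {"air_temperature": 295.2, "relative_humidity": 41.0}},
  {"start_time": "2016-10-26T19:40:09Z",
   "properties": {"air_temperature": 295.4, "relative_humidity": 40.5}}
]
""")

def flatten(datapoints):
    """Turn a list of datapoint dicts into parallel arrays keyed by variable,
    which is the shape netCDF wants: one array per variable along a time dim."""
    times = [dp["start_time"] for dp in datapoints]
    variables = {}
    for dp in datapoints:
        for name, value in dp["properties"].items():
            variables.setdefault(name, []).append(value)
    return times, variables

times, variables = flatten(doc)

# The actual write would use netcdf4-python (not executed in this sketch):
#   from netCDF4 import Dataset
#   nc = Dataset("met.nc", "w")
#   nc.createDimension("time", len(times))
#   for name, values in variables.items():
#       v = nc.createVariable(name, "f8", ("time",))
#       v[:] = values
#   nc.close()
print(len(times), sorted(variables))
```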
@rachelshekar It makes more sense to manually trigger the processing job, or have it triggered periodically with a cron job, for example.
@Zodiase, @robkooper had some thoughts on the geostreams database --> netcdf conversion, with respect to making this consistent with the PEcAn and BrownDog utilities.
@Zodiase There is no generic JSON->netCDF converter. We custom-wrote one for the Environmental Logger that might give you some ideas...
@dlebauer
@Zodiase @robkooper the environmental logger JSON->netCDF converter uses netcdf4-python. Many of the variables it handles are the same as met station data. This converter might be more complex than met stations would require, or it might not. Much of the "complexity" is needed to craft well-designed netCDF files. If you post a complete example of a met station JSON file, then @FlyingWithJerome could look at what it would take to adapt this code to work on those files as well.
@robkooper @Zodiase If you have a sample JSON file, we can look at it together and figure out the best solution.
@robkooper we discussed developing download.CLOWDER and met2cf.CLOWDER functions in the PEcAn.data.atmosphere package so that this will fit within the existing BrownDog / PEcAn met workflow, which has a lot of tools for unit conversion and standard formatting of netCDF files. Is that still the plan, or is there another approach?
@FlyingWithJerome The sample JSON is above here. The format is final except for some minor additional fields (such as the extractor info and a full copy of the raw data). @max-zilla When you are testing the first-stage extractor, could you get a copy of one output JSON document and post it here?
@dlebauer thanks for the reminder. @Zodiase yes please look at PEcAn code and use R to download the data and process it. To do this there are 2 parts you will need to implement:

- download.clowder, which just does a query and downloads the geostream to disk; see for example https://github.com/PecanProject/pecan/blob/master/modules/data.atmosphere/R/download.Ameriflux.R
- met2CF.clowder, which converts the downloaded data to netcdf; for example see https://github.com/PecanProject/pecan/blob/master/modules/data.atmosphere/R/met2CF.Ameriflux.R#L78
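As a rough illustration of the download half of that split, here is a sketch in Python (the thread's actual plan is R functions in PEcAn, and the endpoint path and query parameter below are assumptions, not the confirmed Clowder API): query the geostream for a sensor's datapoints and save them to disk for the met2CF step.

```python
import json
import urllib.parse
import urllib.request

def download_clowder(base_url, sensor_id, out_path, fetch=None):
    """Query a geostream for a sensor's datapoints and save them to disk,
    mirroring what a download.clowder-style function would do before met2CF
    runs. The /api/geostreams/datapoints path is an assumption; check it
    against the Clowder API docs before relying on it."""
    query = urllib.parse.urlencode({"sensor_id": sensor_id})
    url = f"{base_url}/api/geostreams/datapoints?{query}"
    if fetch is None:
        # Default fetcher hits the network; tests can inject a stub instead.
        fetch = lambda u: urllib.request.urlopen(u).read().decode("utf-8")
    datapoints = json.loads(fetch(url))
    with open(out_path, "w") as f:
        json.dump(datapoints, f)
    return datapoints

# Usage with a stub fetcher so this sketch runs without a Clowder instance:
canned = '[{"start_time": "2016-10-26T19:35:09Z", "properties": {}}]'
points = download_clowder("http://localhost:9000", 1, "met.json",
                         fetch=lambda url: canned)
print(len(points))
```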
@Zodiase can you give me write access to the extractors-meterological repo? Also, we don't have to change this now, but there's a typo in the repo name (I updated filenames in my local branch I'm going to push):
@max-zilla I've granted write access to the developer group. You should be able to push now. |
@Zodiase do I have this right? There are two extractors for meteorological data being discussed:
So it looks like you've written the first one. I don't totally understand the second one; why is it dependent on the first one? I'll try to test out the first one. It's in the same place as the geospatial metadata extractor, where we currently only have PostGIS on the dev instance but hope to have time to deploy on production soon.
@max-zilla This issue depends on the first one because it pulls the JSON data stored in the geostream and saves it to a netCDF file. The JSON data in the geostream should align with the PEcAn format standards (conversion done in the first extractor). If it's simpler to directly convert .dat files into netCDF then that's certainly better; I don't know. Wouldn't that repeat the same PEcAn standard conversion again?
@Zodiase yeah, we wouldn't want to redo the PEcAn conversion. Could we just do both at the same time? Loop over the 24 DAT files, send a datapoint for each line, while also saving to netCDF? Then at the end we'd have done both the geostreams + netCDF at once. It doesn't make for a perfectly modular extractor, but since the extractors would need to be chained together anyway (one depends on the other) I think that's OK. The PlantCV extractor is similar in that it generates metadata AND pushes metadata to BETYdb in the same process. I have not looked closely at your code yet, so maybe this is difficult for obvious reasons...
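The combined one-pass idea can be sketched roughly like this. The .dat field layout and the helper names (parse_line, post_datapoint, write_netcdf) are invented placeholders for the real extractor functions, shown only to illustrate sending each record to the geostream while accumulating it for a single netCDF write.

```python
def parse_line(line):
    # Invented layout for illustration: "timestamp,temperature"
    ts, temp = line.strip().split(",")
    return {"start_time": ts, "properties": {"air_temperature": float(temp)}}

def process(dat_lines, post_datapoint, write_netcdf):
    """One pass over the .dat lines: each parsed record is posted to the
    geostream AND kept for one netCDF write at the end of the run."""
    records = []
    for line in dat_lines:
        record = parse_line(line)
        post_datapoint(record)  # geostream upload, one datapoint per line
        records.append(record)  # retained for the single netCDF write
    write_netcdf(records)       # one file covering the whole run
    return records

# Exercise the flow with in-memory stand-ins for the two sinks:
posted, written = [], []
records = process(
    ["2016-10-26T19:35:09Z,295.2", "2016-10-26T19:40:09Z,295.4"],
    posted.append, written.extend)
print(len(posted), len(written))
```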
@max-zilla Everything could be put into one extractor. I suggested splitting the whole objective into two issues so they could be implemented in parallel. But they both ended up assigned to me so it makes no difference. |
@Zodiase let's focus on the tech issue here (only). You said .dat->netcdf needs to get JSON data from geostream in order to convert .dat to netcdf? @max-zilla meant that since the JSON data you send to geostream is already generated in the first extractor, we can use that to do the .dat->netcdf conversion. So the only difference is whether we put the 2nd extractor's logic inside of the first one, since the first one has all the data needed. Am I right? How much work is needed to do the extractor part of the coding for the .dat->netcdf conversion?
@yanliu-chn @Zodiase I think we're on the same page here. I will test the current .dat -> geostream function on clowder-dev to make sure it works, since Xingchen was getting 502s. I'll commit my branch and that can be the new 'base' version. Then, we can add .dat -> netcdf in that same extractor, building on top of the working version. I should have an update on this tomorrow morning.
@Zodiase just made a pull request; got it working on my local instance but haven't tested on clowder-dev yet. Asked you to review. It generates a LOT of datapoints :)
@Zodiase what is the status of this issue? I think we discussed the solution in Dec. |
@dlebauer Yes, and at the end of the discussion we agreed that if I could get the JSON data into R, it would take very little time for you to come up with the conversion algorithm (to netCDF). And I think my comment above shows how to get the data in JSON, and I pointed to an R library that does both the downloading and the parsing of JSON. Would it help if I showed you a code example for downloading and parsing the JSON data? I didn't include it in my comment above, thinking that you might know a better utility for doing that.
@Zodiase got it. I think this is correct. But a few things before getting started:
To get the sensor ID from a sensor name, you could use something like: Sample result for the aforementioned query:

```json
[
  {
    "id": 2,
    "name": "UAMAC",
    "created": "2016-09-27T15:04:15Z",
    "type": "Feature",
    "properties": {
      "type": {
        "id": "uamac",
        "title": "uamac",
        "sensorType": 6
      },
      "name": "UAMAC",
      "popupContent": "UAMAC",
      "region": "Arizona"
    },
    "geometry": {
      "type": "Point",
      "coordinates": [
        -112.047642,
        33.2918,
        0
      ]
    },
    "min_start_time": null,
    "max_end_time": null,
    "parameters": [
      null
    ]
  }
]
```

Similarly, stream searching could be done with:

```json
[
  {
    "id": 14,
    "name": "weather station",
    "created": "2016-10-26T19:35:09Z",
    "type": "Feature",
    "properties": {},
    "geometry": {
      "type": "Point",
      "coordinates": [
        0,
        0,
        0
      ]
    },
    "sensor_id": "1",
    "start_time": null,
    "end_time": null,
    "params": null
  }
]
```

I'm not sure if there are any data points on the dev instance for you to test with, but they should conform to the schema indicated here: #156 (comment).
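Given a result like the one above, extracting the sensor ID is a small filtering step once the JSON is parsed. A sketch using the sample values from this thread (sensor "UAMAC" with id 2); the helper name is invented for illustration:

```python
import json

# Trimmed-down version of the sensor search result shown in this thread.
sample = json.loads("""
[{"id": 2, "name": "UAMAC", "type": "Feature",
  "properties": {"name": "UAMAC", "region": "Arizona"}}]
""")

def sensor_id_by_name(results, name):
    """Return the id of the first sensor whose name matches, else None."""
    for sensor in results:
        if sensor.get("name") == name:
            return sensor["id"]
    return None

print(sensor_id_by_name(sample, "UAMAC"))
```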
Want to push data to geostreams, and have code to go from geostreams to PEcAn netcdf format. |
Here is a pull request for the download.clowder and met2cf.clowder functions into PEcAn: PecanProject/pecan#1268. Next steps:
@dlebauer - can this be closed? |
Yes, these functions have been added to the PEcAn.data.atmosphere package. https://github.com/PecanProject/pecan/tree/develop/modules/data.atmosphere/R Thanks @infotroph! |
All met and environmental data should be provided on ROGER, ideally via an opendap / thredds interface #155
This depends on #156
Should use a similar approach to BrownDog and leverage the PEcAn project met workflow.
Completion criteria: