Convert met data in clowder to netcdf #173

Closed

dlebauer opened this issue Sep 22, 2016 · 38 comments

@dlebauer
Member

dlebauer commented Sep 22, 2016

All met and environmental data should be provided on ROGER, ideally via an opendap / thredds interface #155

This depends on #156

Should use a similar approach to BrownDog and leverage the PEcAn project met workflow.

Completion criteria

  • Add functions to PEcAn
    • download.CLOWDER
    • met2CF.Clowder
@ghost ghost added this to the September 2016 milestone Sep 23, 2016
@ghost ghost assigned max-zilla Sep 23, 2016
@ghost ghost added 1 - Ready and removed 1 - Ready labels Sep 23, 2016
@ghost ghost assigned yanliu-chn and Zodiase and unassigned max-zilla Sep 29, 2016
@yanliu-chn

@Zodiase TODO: schedule a conversation with @robkooper on the BrownDog workflow and the PEcAn standard, to learn which libraries or tools are available for outputting the netCDF.

@ghost

ghost commented Oct 19, 2016

@Zodiase - can you please update this issue?

@ghost

ghost commented Oct 19, 2016

GeoStream message hooks for triggering the extractor?

@Zodiase
Contributor

Zodiase commented Oct 26, 2016

@czender Could you tell me how to save a JSON document into a netCDF file? The JSON document would look like:

{
  "stream_id":"???",
  "sensor_id":"???",
  "end_time":"2016-08-30T00:06:24",
  "created":"2016-10-13T16:55:03.295514",
  "geometry":{
    "type":"Point",
    "coordinates":[
      "???",
      "???",
      "???"
    ]
  },
  "start_time":"2016-08-30T00:06:24",
  "sensor_name":"???",
  "type":"Feature",
  "id":"???",
  "properties":{
    "precipitation_rate":0.0,
    "wind_speed":2.45,
    "surface_downwelling_shortwave_flux_in_air":0.0,
    "northward_wind":1.0354147412647137,
    "relative_humidity":27.48,
    "air_temperature":299.89,
    "eastward_wind":2.2204540782397926,
    "surface_downwelling_photosynthetic_photon_flux_in_air":0.0
  }
}
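For reference, datapoints like the one above are record-oriented (one JSON document per timestamp), while a netCDF file is variable-oriented (one array per quantity along a time dimension). A minimal stdlib sketch of that restructuring step, using the field names from the sample (everything else here is an illustrative assumption, not the project's actual converter):

```python
import json

# Hypothetical helper: pivot a list of geostreams datapoints into
# per-variable time series, the layout a netCDF writer would expect.
def datapoints_to_series(datapoints):
    times, series = [], {}
    for dp in datapoints:
        times.append(dp["start_time"])
        for name, value in dp["properties"].items():
            series.setdefault(name, []).append(value)
    return times, series

sample = json.loads("""{
  "start_time": "2016-08-30T00:06:24",
  "properties": {"air_temperature": 299.89, "wind_speed": 2.45}
}""")
times, series = datapoints_to_series([sample])
```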

@Zodiase
Contributor

Zodiase commented Oct 26, 2016

@rachelshekar It makes more sense to trigger the processing job manually, or to have it triggered periodically, for example with a cron job.

@dlebauer
Member Author

@Zodiase, @robkooper had some thoughts on the geostreams database --> netcdf conversion, with respect to making this consistent with the PEcAn and BrownDog utilities.

@czender
Contributor

czender commented Oct 27, 2016

@Zodiase There is no generic JSON->netCDF converter. We custom wrote one for the Environmental Logger that might give you some ideas...

@Zodiase
Contributor

Zodiase commented Oct 27, 2016

@dlebauer
@robkooper and I had talked, and if I remember correctly the only conclusion was to use netCDF4-python for operating on netCDF files. But I know next to nothing about the netCDF format and don't know how the source data should be stored. I just learned the terms "Dataset", "Dimension", and "Group", but I don't yet have any idea what they are used for. Could you or somebody give me an idea of how exactly the data should be saved into a netCDF file? I think it would be much more efficient for somebody who's at least familiar with netCDF to come up with the exact code to store the source data into netCDF files.

@czender
Contributor

czender commented Oct 27, 2016

@Zodiase @robkooper the environmental logger JSON->netCDF converter uses netcdf4-python:

https://github.com/terraref/extractors-environmental/blob/master/environmentlogger/environmental_logger_json2netcdf.py

Many of the variables it handles are the same as met station data. This converter might be more complex than met stations would require, or it might not. Much of the "complexity" is needed to craft well-designed netCDF files. If you post a complete example of a met station JSON file, then @FlyingWithJerome could look at what it would take to adapt this code to work on those files as well.

@FlyingWithJerome
Member

@robkooper @Zodiase
Sorry, the environmental_logger_json2netcdf.py in the computing-pipeline master branch is the most up-to-date.

If you have a sample JSON file, we can look at it together and figure out the best solution.

@dlebauer
Member Author

@robkooper we discussed developing download.CLOWDER and met2cf.CLOWDER functions in the PEcAn.data.atmosphere package so that this fits within the existing BrownDog / PEcAn met workflow, which has a lot of tools for unit conversion and standard formatting of netCDF files. Is that correct, or is the plan different?

@Zodiase
Contributor

Zodiase commented Oct 27, 2016

@FlyingWithJerome The sample JSON is above here. The format is final except some additional minor fields added (such as the extractor info and a full copy of the raw data).

@max-zilla When you are testing the first-stage extractor, could you get a copy of one output JSON document and post it here?

@robkooper
Member

@dlebauer thanks for the reminder. @Zodiase yes please look at PEcAn code and use R to download the data and process it. To do this there are 2 parts you will need to implement:

download.clowder which just does a query and downloads the geostream to disk, see for example https://github.com/PecanProject/pecan/blob/master/modules/data.atmosphere/R/download.Ameriflux.R

met2CF.clowder which converts the downloaded data to netcdf, for example see https://github.com/PecanProject/pecan/blob/master/modules/data.atmosphere/R/met2CF.Ameriflux.R#L78
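The download step described above amounts to building a geostreams query and fetching the result. A sketch in Python for illustration (the actual PEcAn functions are R); the endpoint path and parameter names are assumptions based on the Clowder geostreams API discussed in this thread:

```python
from urllib.parse import urlencode

# Hypothetical download.clowder-style query builder.
def build_datapoints_url(base, sensor_id, since, until):
    params = urlencode({"sensor_id": sensor_id, "since": since, "until": until})
    return "{}/api/geostreams/datapoints?{}".format(base.rstrip("/"), params)

url = build_datapoints_url(
    "https://terraref.ncsa.illinois.edu/clowder-dev",
    2, "2016-08-30T00:00:00", "2016-08-31T00:00:00")
```

The URL returned by this helper would then be fetched and written to disk, mirroring the structure of the download.Ameriflux.R example linked above.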

@max-zilla
Contributor

@Zodiase can you give me write access to the extractors-meterological repo?

Also, we don't have to change this now, but there's a typo in the repo name (I updated the filenames in my local branch that I'm going to push):

extractors-meterological → extractors-meteorological

@Zodiase
Contributor

Zodiase commented Oct 27, 2016

@max-zilla I've granted write access to the developer group. You should be able to push now.

@max-zilla
Contributor

@Zodiase do I have this right? There are two extractors for meteorological data being discussed:

  • read 24 .dat files and add each record to geostream
  • convert the .dat files to netcdf (?)

So it looks like you've written the first one. I don't totally understand the second one; why is it dependent on the first one?

I'll try to test out the first one - it's in the same place as the geospatial metadata extractor, where we currently only have PostGIS on the dev instance but hope to have time to deploy on production soon.

@Zodiase
Contributor

Zodiase commented Nov 1, 2016

@max-zilla This issue depends on the first one because it pulls the JSON data stored in the geostream and saves it to a netCDF file. The JSON data in the geostream should align with PEcAn format standards (conversion done in the first extractor). If it's simpler to directly convert .dat files into netCDF, then that's certainly better; I don't know. Wouldn't that repeat the same PEcAn standard conversion again?

@max-zilla
Contributor

@Zodiase yeah, we wouldn't want to redo the PEcAn conversion.

Could we just do both at the same time? Loop over the 24 DAT files, send a datapoint for each line, while also saving to netCDF? Then at the end we'd have done both the geostreams and netCDF output at once.

It doesn't make for a perfectly modular extractor, but since the extractors would need to be chained together anyway (one depends on the other) I think that's OK. The PlantCV extractor is similar in that it generates metadata AND pushes metadata to BETYdb in the same process.

I have not looked closely at your code yet so maybe this is difficult for obvious reasons...
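The combined approach described above could be sketched as a single pass that both streams each record and buffers it for one netCDF write at the end. In this sketch the column names and the post_datapoint/write_netcdf callables are hypothetical stand-ins, injected so the loop stays testable:

```python
import csv
import io

# Hypothetical combined extractor loop: one pass over a .dat file that
# posts each record to geostreams and buffers it for a single netCDF
# write when the file is done.
def process_dat(dat_text, post_datapoint, write_netcdf):
    buffered = []
    for row in csv.DictReader(io.StringIO(dat_text)):
        post_datapoint(row)   # stream each record to geostreams
        buffered.append(row)  # keep it for the netCDF conversion
    write_netcdf(buffered)    # one netCDF write covering the whole file
    return len(buffered)

dat = "TIMESTAMP,AirTC\n2016-08-30 00:06:24,26.74\n2016-08-30 00:06:25,26.75\n"
posted, written = [], []
count = process_dat(dat, posted.append, written.extend)
```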

@Zodiase
Contributor

Zodiase commented Nov 1, 2016

@max-zilla Everything could be put into one extractor. I suggested splitting the whole objective into two issues so they could be implemented in parallel. But they both ended up assigned to me so it makes no difference.

@yanliu-chn

@Zodiase let's focus on the tech issue here (only). You said .dat->netcdf needs to get JSON data from the geostream in order to convert .dat to netcdf? @max-zilla meant that since the JSON data you send to the geostream is already generated in the first extractor, we can use that to do the .dat->netcdf conversion.

So the only difference is whether we put the 2nd extractor logic inside the first one, since the first one has all the data needed. Am I right?

How much work is needed to do the extractor part of the coding for the .dat->netcdf conversion?

@max-zilla
Contributor

@yanliu-chn @Zodiase I think we're on the same page here.

I will test the current .dat -> geostream function on clowder-dev to make sure it works, since Xingchen was getting 502s. I'll commit my branch and that can be the new 'base' version.

Then we can add .dat -> netcdf in that same extractor, building on top of the working version.

I should have an update on this tomorrow morning.

@max-zilla
Contributor

max-zilla commented Nov 2, 2016

@Zodiase just made a pull request; got it working on my local instance but haven't tested on clowder-dev yet. Asked you to review.

It generates a LOT of datapoints :)

@ghost ghost added the meterological data label Jan 3, 2017
@ghost ghost modified the milestones: December 2016, January 2017 Jan 12, 2017
@dlebauer
Member Author

dlebauer commented Feb 2, 2017

@Zodiase what is the status of this issue? I think we discussed the solution in Dec.

@Zodiase
Contributor

Zodiase commented Feb 2, 2017

@dlebauer Yes, and at the end of the discussion we agreed that if I could get the JSON data into R, it would take very little time for you to come up with the conversion algorithm (to netCDF). I think my comment above shows how to get the data in JSON, and I pointed to an R library that does both downloading and parsing of JSON. Would it help if I showed you a code example for downloading and parsing the JSON data? I didn't include it in my comment above, thinking that you might know a better utility for doing that.

@dlebauer
Member Author

dlebauer commented Feb 3, 2017

@Zodiase got it. I think this is correct. But a few things before getting started:

  1. The test data / example API query you posted above no longer works (I think because the database was rebuilt).
  2. The first step will be to find the sensor_id given the sitename (sensor name?). Can you post example queries for this?

@Zodiase
Contributor

Zodiase commented Feb 3, 2017

@dlebauer

To get the sensor ID from a sensor name, you could use something like https://terraref.ncsa.illinois.edu/clowder-dev/api/geostreams/sensors?sensor_name=UAMAC; the returned JSON should always be an array/list of 0 or more matched items. Each item should have an id property and a name property. When working on the meteorological extractor I remember @max-zilla saying that the searching part isn't very important, since in production these sensors will be created manually by somebody and the IDs should be fixed.

Sample result for the aforementioned query:

[
  {
    "id": 2,
    "name": "UAMAC",
    "created": "2016-09-27T15:04:15Z",
    "type": "Feature",
    "properties": {
      "type": {
        "id": "uamac",
        "title": "uamac",
        "sensorType": 6
      },
      "name": "UAMAC",
      "popupContent": "UAMAC",
      "region": "Arizona"
    },
    "geometry": {
      "type": "Point",
      "coordinates": [
        -112.047642,
        33.2918,
        0
      ]
    },
    "min_start_time": null,
    "max_end_time": null,
    "parameters": [
      null
    ]
  }
]
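Pulling the numeric id out of a response like the one above is straightforward; a small sketch (taking the first exact name match is an assumption, since the API can return multiple matches):

```python
import json

# Hypothetical helper: extract a sensor id from a geostreams
# sensor-search response like the sample above.
def sensor_id_by_name(response_json, name):
    for sensor in json.loads(response_json):
        if sensor.get("name") == name:
            return sensor["id"]
    return None  # no exact match found

sample = '[{"id": 2, "name": "UAMAC", "type": "Feature"}]'
sid = sensor_id_by_name(sample, "UAMAC")
```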

Similarly, stream searching could be done with https://terraref.ncsa.illinois.edu/clowder-dev/api/geostreams/streams?stream_name=weather%20station, and the result for this example should look like:

[
  {
    "id": 14,
    "name": "weather station",
    "created": "2016-10-26T19:35:09Z",
    "type": "Feature",
    "properties": {},
    "geometry": {
      "type": "Point",
      "coordinates": [
        0,
        0,
        0
      ]
    },
    "sensor_id": "1",
    "start_time": null,
    "end_time": null,
    "params": null
  }
]

I'm not sure if there are any data points on the dev instance for you to test with, but they should conform to the schema indicated here: #156 (comment).

@ghost ghost modified the milestones: January 2017, February 2017 Feb 13, 2017
@dlebauer dlebauer assigned harshagrawal28 and unassigned yanliu-chn Mar 2, 2017
@ghost ghost unassigned Zodiase Mar 9, 2017
@ghost ghost added the help wanted label Mar 16, 2017
@max-zilla
Contributor

We want to push data to geostreams, and we have code to go from geostreams to the PEcAn netCDF format.

@dlebauer
Member Author

dlebauer commented Mar 16, 2017

Here is a pull request for the download.clowder and met2cf.clowder functions into PEcAn PecanProject/pecan#1268

Next steps

  1. determine how to integrate into the PEcAn met workflow (deal with this in PEcAn PR PecanProject/pecan#1268) (@robkooper and @dlebauer)
  2. make Environmental sensor and met stations consistent with this
  • datalogger --> geostreams --> netcdf (rather than datalogger --> geostreams + netcdf in one script)

@ghost

ghost commented May 17, 2017

@dlebauer - can this be closed?

@dlebauer
Member Author

Yes, these functions have been added to the PEcAn.data.atmosphere package. https://github.com/PecanProject/pecan/tree/develop/modules/data.atmosphere/R

Thanks @infotroph!
