Extractors for meteorological data #156
I'm a bit confused about the goal of this issue. From the description I see these independent threads:
(1) doesn't seem to be what an extractor should be responsible for; how would the extractor be triggered? For (2), what are the requirements for the processing? How is (3) related to extractors? I'm not sure how or where to start. Any pointers? |
For MAC weather station
The API is documented here: https://terraref.ncsa.illinois.edu/clowder/assets/docs/api/index.html#!/datasets/addMetadata But ask @max-zilla, @robkooper and @caicai89- for details. The file format and variable names / units should follow specifications for PEcAn here: https://pecan.gitbooks.io/pecan-documentation/content/developers_guide/Adding-an-Input-Converter.html |
For the schema, see #130. Roughly, the schema looks like:
|
Here is an example of a record from one time point from here: https://greatlakesmonitoring.org/clowder/api/geostreams/datapoints?geocode=40.4868888889%2C-84.4817222222%2C0&since=2008-09-22+05%3A00%3A00&until=2014-07-03+19%3A00%3A00&format=json: {
id: 1863734,
created: "2014-11-04T00:48:22Z",
start_time: "2008-09-22T10:00:00Z",
end_time: "2008-09-22T10:00:00Z",
properties: {
source: "http://www.heidelberg.edu/sites/default/files/dsmith/files/ChickasawData.xlsx",
srp-load: 0.4982,
Silica, mg/L: 9.09,
Sulfate, mg/L: 261.2,
nitrogen-load: 0.03,
Chloride, mg/L: 268.6,
phosphorus-load: 0.684,
SS, mg/L (suspended solids): 12.3,
TKN, mg/L (Total Kjeldahl nitrogen): 1.193
},
type: "Feature",
geometry: {
type: "Point",
coordinates: [-84.4817222222,
40.4868888889,
0
]
},
stream_id: "7263",
sensor_id: "899",
sensor_name: "Chickasaw"
}, |
Here is the geostreams schema |
Just keep in mind you do not have access to the database, all operations have to be done through the API. We discussed this in the past and the thinking is to have sites represented as sensors in clowder (ua-mac, ksu, etc). Then have each sensor represented as a stream (VNIR, MET, stereo) and finally have each dataset, or in this case the actual values be represented in the datapoints. |
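The site -> sensor -> stream -> datapoint hierarchy described above can be sketched as payload-building helpers. This is a minimal sketch, not a verified client: the field names follow the example records in this thread, and the idea is that these payloads would be POSTed to the Clowder geostreams API endpoints (documented at the Clowder API link above) since there is no direct database access.

```python
def sensor_payload(site_name, lon, lat, elev=0):
    """A site (e.g. 'ua-mac') represented as a geostreams sensor."""
    return {
        "name": site_name,
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat, elev]},
        "properties": {"type": {"id": "met-station", "title": site_name}},
    }

def stream_payload(stream_name, sensor_id, lon, lat, elev=0):
    """An instrument (e.g. 'MET', 'VNIR') represented as a stream under a sensor."""
    return {
        "name": stream_name,
        "type": "Feature",
        "sensor_id": str(sensor_id),
        "geometry": {"type": "Point", "coordinates": [lon, lat, elev]},
    }

def datapoint_payload(stream_id, start, end, properties, lon, lat, elev=0):
    """One observation (or aggregated chunk) as a datapoint on a stream."""
    return {
        "stream_id": str(stream_id),
        "start_time": start,
        "end_time": end,
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat, elev]},
        "properties": properties,
    }
```

The `properties.type` structure on the sensor is a guess from similar geostreams records; check it against the actual API before relying on it.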
@rachelshekar #115 just covers the environmental logger on the lemnatec gantry / scanner; this is for met data more generally. The goal is to get it into a consistent format in Clowder, then create an extractor that converts the Clowder datastream to netCDF. |
@robkooper I was mostly wanting the sql schema file to define the data model (since it is slightly different from the erd diagram above). |
The goal is to insert data into postGIS and convert to netCDF via an extractor. |
@max-zilla Do you know what the extractor should subscribe to in order to monitor new met data files? I was thinking about subscribing to any new files in any dataset (with |
@robkooper Could you explain a bit more about where (a specific dataset?) the extractor should get data from and where the output should go to? |
@Zodiase for the weather station at UA-MAC outside the scanner, the extractor should get data from the 'weather' directory (terraref/sites/ua-mac/raw_data/weather/) and insert into the geostreams API. |
@dlebauer Do you know how to use |
You will need to register with clowder and say you are interested in a specific mimetype of files. The file will be downloaded and you are given a pointer to the file on disk. You can then work with that file, write the results to any location, and notify clowder (using pyclowder) about this. This might be a good point to add some functionality to pyclowder2 to deal with geostreams and make it easier for you. |
@robkooper I know about the overall process, but I don't know exactly how to save the results back to clowder. How can I use it? I think what @dlebauer wants is to process those |
@Zodiase for the TERRA project our extractors have to be slightly more careful than others, because we want to write the output files to a specific location on Roger. However I don't think that matters here since we don't have output files, just insertion into geostreams database.
I know @caicai89- has been looking at geostreams API, and I need to update Clowder to support more complex geometries than points here - #157. I am not going to get to this until next week. |
@max-zilla this issue does not require inserting polygons ... are the other geostreams api endpoints available? (I don't see them here: https://terraref.ncsa.illinois.edu/clowder/assets/docs/api/index.html.) |
@dlebauer Could you help me understand the file format? First 7 lines of any |
The data part looks like a typical CSV file, and there appear to be 11 columns. But what are the 4 lines above the data? Which one should I use as the column header? I tried to make sense of these 4 lines, and it looks to me like the second line should be the column header and the third line the units. The first and the fourth make no sense to me. |
@Zodiase I am not positive, but I think this might help: https://www.manualslib.com/manual/538296/Campbell-Cr9000.html?page=43 It doesn't really explain the first line; I think that's just some information on the sensor/weather station that collected the data. If you look on page 42 of that link (the one before), I think it describes these: Station Name, Logger Serial Number, etc. The fourth line looks like a description of how data was collected:
|
That looks correct
|
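The header layout confirmed above (four header lines: station/logger info, column names, units, then a data-processing descriptor) can be sketched as a small parser. This is an illustrative sketch, not the extractor's actual code; the sample column names in the test come from the task list in this issue, and the exact header contents should be checked against the Campbell Scientific manual linked above.

```python
import csv
import io

def parse_dat(text):
    """Parse a Campbell Scientific .dat file with a 4-line header.

    Header lines: (1) station/logger environment info, (2) column names,
    (3) units, (4) how each value was derived from samples (e.g. 'Smp',
    'Avg'). Data rows follow as ordinary CSV.
    """
    reader = csv.reader(io.StringIO(text))
    env = next(reader)         # station name, logger model, serial number, ...
    names = next(reader)       # use these as the column headers
    units = next(reader)       # one unit string per column
    processing = next(reader)  # per-column sampling/averaging descriptor
    rows = [dict(zip(names, row)) for row in reader]
    return {
        "environment": env,
        "units": dict(zip(names, units)),
        "processing": dict(zip(names, processing)),
        "rows": rows,
    }
```

Reading the whole file into memory is fine here since the weather files are small; a streaming variant would hand rows to the caller one at a time instead.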
@dlebauer So the extractor code I've worked on so far is able to be triggered and parse the raw input files without issues. Now the next step is to compose the JSON output you want. So what I understand is that such a data row |
{
"id": 12345,
"created": "2016-08-30 00:06:24 -08:00Z",
"start_time": "2016-08-30 00:06:24 -08:00Z",
"end_time": "2016-08-30 00:06:24 -08:00Z",
"properties": {
"source": "http://terraref.ncsa.illinois.edu/clowder/datasets/xyz123abc456",
"air_temperature, K": "285.12",
"relative_humidity, %": "27.37",
"surface_downwelling_shortwave_flux_in_air, W m-2":26.74,
"surface_downwelling_photosynthetic_photon_flux_in_air, mol m-2 s-1":"0.02674",
"wind_to_direction, degrees": 65,
"wind_speed, m/s":2.45,
"precipitation_rate, mm/s":0
},
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": [-111.975071584772
33.074518691823,
353.38
]
},
"stream_id": "123",
"sensor_id": "123",
"sensor_name": "UA-MAC F13 Weather Station"
},
|
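The mapping from one raw data row to the properties shown above can be sketched as follows. The conversions follow the task list in this issue (AirTC in degrees C to Kelvin; WindDir plus WS_ms decomposed into eastward/northward components). The "wind-to" direction convention (degrees clockwise from north, the direction the wind blows toward) is an assumption here and should be verified against the station documentation.

```python
import math

def row_to_properties(row):
    """Map one raw .dat row (strings) to CF-style variable names and units.

    Assumes the raw column names from this issue's task list:
    AirTC, RH, Pyro, PAR_ref, WindDir, WS_ms, Rain_mm_Tot.
    """
    theta = math.radians(float(row["WindDir"]))  # assumed 'to' direction, clockwise from north
    speed = float(row["WS_ms"])
    return {
        "air_temperature": float(row["AirTC"]) + 273.15,  # deg C -> K
        "relative_humidity": float(row["RH"]),            # already percent
        "surface_downwelling_shortwave_flux_in_air": float(row["Pyro"]),
        "surface_downwelling_photosynthetic_photon_flux_in_air": float(row["PAR_ref"]),
        "eastward_wind": speed * math.sin(theta),
        "northward_wind": speed * math.cos(theta),
        "wind_speed": speed,
        "precipitation_rate": float(row["Rain_mm_Tot"]),
    }
```

Keeping RECORD, BattV, and PTemp_C out of the output matches the "Discard" items in the task list below.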
I've implemented the aggregation logic and the code is currently in this branch: https://github.com/terraref/extractors-meterological/tree/5-min-aggregation @max-zilla Could you test it? I only added some testing code in To change aggregation options:
The test data I played with yields such result: [
{
"geometry":{
"type":"Point",
"coordinates":[
33.0745666667,
-111.9750833333,
0
]
},
"start_time":"2016-08-30T00:06:24-07:00",
"type":"Feature",
"end_time":"2016-08-30T00:10:00-07:00",
"properties":{
"precipitation_rate":0.0,
"wind_speed":1.6207870370370374,
"surface_downwelling_shortwave_flux_in_air":0.0,
"northward_wind":0.07488770951583902,
"relative_humidity":26.18560185185185,
"air_temperature":300.17606481481516,
"eastward_wind":1.571286062845733,
"surface_downwelling_photosynthetic_photon_flux_in_air":0.0
}
},
{
"geometry":{
"type":"Point",
"coordinates":[
33.0745666667,
-111.9750833333,
0
]
},
"start_time":"2016-08-30T00:10:00-07:00",
"type":"Feature",
"end_time":"2016-08-30T00:15:00-07:00",
"properties":{
"precipitation_rate":0.0,
"wind_speed":1.4256666666666669,
"surface_downwelling_shortwave_flux_in_air":0.0,
"northward_wind":-0.05141511827670856,
"relative_humidity":24.226333333333386,
"air_temperature":300.8981666666665,
"eastward_wind":1.394382855930334,
"surface_downwelling_photosynthetic_photon_flux_in_air":0.0
}
},
{
"geometry":{
"type":"Point",
"coordinates":[
33.0745666667,
-111.9750833333,
0
]
},
"start_time":"2016-08-30T00:15:00-07:00",
"type":"Feature",
"end_time":"2016-08-30T00:20:00-07:00",
"properties":{
"precipitation_rate":0.0,
"wind_speed":1.3858783783783772,
"surface_downwelling_shortwave_flux_in_air":0.0,
"northward_wind":-0.09425296463470188,
"relative_humidity":23.29226351351351,
"air_temperature":301.213952702703,
"eastward_wind":1.348590540556527,
"surface_downwelling_photosynthetic_photon_flux_in_air":0.0
}
},
{
"geometry":{
"type":"Point",
"coordinates":[
33.0745666667,
-111.9750833333,
0
]
},
"start_time":"2016-08-30T00:20:00-07:00",
"type":"Feature",
"end_time":"2016-08-30T00:25:00-07:00",
"properties":{
"precipitation_rate":0.0,
"wind_speed":0.8310000000000005,
"surface_downwelling_shortwave_flux_in_air":0.0,
"northward_wind":-0.35657497924484793,
"relative_humidity":22.633933333333335,
"air_temperature":301.50973333333326,
"eastward_wind":0.7049300737104702,
"surface_downwelling_photosynthetic_photon_flux_in_air":0.0
}
},
{
"geometry":{
"type":"Point",
"coordinates":[
33.0745666667,
-111.9750833333,
0
]
},
"start_time":"2016-08-30T00:25:00-07:00",
"type":"Feature",
"end_time":"2016-08-30T00:30:00-07:00",
"properties":{
"precipitation_rate":0.0,
"wind_speed":0.6694000000000001,
"surface_downwelling_shortwave_flux_in_air":0.0,
"northward_wind":-0.585180649157013,
"relative_humidity":25.478600000000007,
"air_temperature":301.2232333333329,
"eastward_wind":0.30741648387327564,
"surface_downwelling_photosynthetic_photon_flux_in_air":0.0
}
},
{
"geometry":{
"type":"Point",
"coordinates":[
33.0745666667,
-111.9750833333,
0
]
},
"start_time":"2016-08-30T00:30:00-07:00",
"type":"Feature",
"end_time":"2016-08-30T00:35:00-07:00",
"properties":{
"precipitation_rate":0.0,
"wind_speed":0.6296666666666666,
"surface_downwelling_shortwave_flux_in_air":0.0,
"northward_wind":-0.42173249926348644,
"relative_humidity":26.469933333333355,
"air_temperature":300.85969999999907,
"eastward_wind":0.45458948531155813,
"surface_downwelling_photosynthetic_photon_flux_in_air":0.0
}
},
{
"geometry":{
"type":"Point",
"coordinates":[
33.0745666667,
-111.9750833333,
0
]
},
"start_time":"2016-08-30T00:35:00-07:00",
"type":"Feature",
"end_time":"2016-08-30T00:40:00-07:00",
"properties":{
"precipitation_rate":0.0,
"wind_speed":0.8663333333333328,
"surface_downwelling_shortwave_flux_in_air":0.0,
"northward_wind":-0.6006981174489593,
"relative_humidity":24.133233333333333,
"air_temperature":300.97440000000034,
"eastward_wind":0.5790642074746596,
"surface_downwelling_photosynthetic_photon_flux_in_air":0.0
}
},
{
"geometry":{
"type":"Point",
"coordinates":[
33.0745666667,
-111.9750833333,
0
]
},
"start_time":"2016-08-30T00:40:00-07:00",
"type":"Feature",
"end_time":"2016-08-30T00:45:00-07:00",
"properties":{
"precipitation_rate":0.0,
"wind_speed":1.1200666666666672,
"surface_downwelling_shortwave_flux_in_air":0.0,
"northward_wind":-1.0444193473063164,
"relative_humidity":21.460900000000024,
"air_temperature":301.59006666666653,
"eastward_wind":0.3707760504240207,
"surface_downwelling_photosynthetic_photon_flux_in_air":0.0
}
},
{
"geometry":{
"type":"Point",
"coordinates":[
33.0745666667,
-111.9750833333,
0
]
},
"start_time":"2016-08-30T00:45:00-07:00",
"type":"Feature",
"end_time":"2016-08-30T00:50:00-07:00",
"properties":{
"precipitation_rate":0.0,
"wind_speed":1.3106333333333342,
"surface_downwelling_shortwave_flux_in_air":0.0,
"northward_wind":-1.249505862534591,
"relative_humidity":21.709133333333313,
"air_temperature":301.60549999999927,
"eastward_wind":0.38198168724184367,
"surface_downwelling_photosynthetic_photon_flux_in_air":0.0
}
},
{
"geometry":{
"type":"Point",
"coordinates":[
33.0745666667,
-111.9750833333,
0
]
},
"start_time":"2016-08-30T00:50:00-07:00",
"type":"Feature",
"end_time":"2016-08-30T00:55:00-07:00",
"properties":{
"precipitation_rate":0.0,
"wind_speed":1.297633333333334,
"surface_downwelling_shortwave_flux_in_air":0.0,
"northward_wind":-1.253133336504686,
"relative_humidity":21.457600000000024,
"air_temperature":301.69336666666703,
"eastward_wind":0.324976158201803,
"surface_downwelling_photosynthetic_photon_flux_in_air":0.0
}
},
{
"geometry":{
"type":"Point",
"coordinates":[
33.0745666667,
-111.9750833333,
0
]
},
"start_time":"2016-08-30T00:55:00-07:00",
"type":"Feature",
"end_time":"2016-08-30T01:00:00-07:00",
"properties":{
"precipitation_rate":0.0,
"wind_speed":1.3804999999999998,
"surface_downwelling_shortwave_flux_in_air":0.0,
"northward_wind":-1.3556587631934873,
"relative_humidity":21.25273333333331,
"air_temperature":301.7047000000008,
"eastward_wind":0.23843479144932786,
"surface_downwelling_photosynthetic_photon_flux_in_air":0.0
}
},
{
"geometry":{
"type":"Point",
"coordinates":[
33.0745666667,
-111.9750833333,
0
]
},
"start_time":"2016-08-30T01:00:00-07:00",
"type":"Feature",
"end_time":"2016-08-30T01:05:00-07:00",
"properties":{
"precipitation_rate":0.0,
"wind_speed":1.5816666666666679,
"surface_downwelling_shortwave_flux_in_air":0.0,
"northward_wind":-1.5763178363241495,
"relative_humidity":22.110499999999984,
"air_temperature":301.4501999999997,
"eastward_wind":0.11952470035541446,
"surface_downwelling_photosynthetic_photon_flux_in_air":0.0
}
},
{
"geometry":{
"type":"Point",
"coordinates":[
33.0745666667,
-111.9750833333,
0
]
},
"start_time":"2016-08-30T01:05:00-07:00",
"type":"Feature",
"end_time":"2016-08-30T01:08:23-07:00",
"properties":{
"precipitation_rate":0.0,
"wind_speed":1.682058823529412,
"surface_downwelling_shortwave_flux_in_air":0.0,
"northward_wind":-1.6719726912984594,
"relative_humidity":23.09779411764704,
"air_temperature":301.1047058823543,
"eastward_wind":-0.14332925981518027,
"surface_downwelling_photosynthetic_photon_flux_in_air":0.0
}
}
] Notice all the data entries are in clean 5-minute chunks, except for the first one and the last one (since some data in other datasets may belong to the same 5-minute chunks). |
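The windowing behavior in the output above (fixed 5-minute wall-clock chunks, with partial chunks at the edges of a file) can be sketched like this. It is a simplified illustration under stated assumptions, not the branch's actual code: it averages each variable per window, and merging partial edge chunks across files would need state beyond this sketch.

```python
from datetime import datetime, timedelta

def bin_start(ts, minutes=5):
    """Floor a timestamp to the start of its fixed wall-clock window."""
    return ts - timedelta(minutes=ts.minute % minutes,
                          seconds=ts.second, microseconds=ts.microsecond)

def aggregate(records, minutes=5):
    """Average numeric properties over fixed windows.

    records: list of (datetime, {variable: value}) pairs. Windows at the
    start and end of a file may be partial, which is why the first and
    last chunks above span less than five minutes.
    """
    bins = {}
    for ts, props in records:
        bins.setdefault(bin_start(ts, minutes), []).append(props)
    out = []
    for start in sorted(bins):
        group = bins[start]
        avg = {k: sum(p[k] for p in group) / len(group) for k in group[0]}
        out.append({"start_time": start.isoformat(), "properties": avg})
    return out
```

Precipitation is a rate here, so a plain mean is defensible; if the raw column were a per-interval total, a sum (or rate conversion) would be needed instead.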
I'm pulling this code today while updating the extractor and will test, @Zodiase |
@max-zilla - please update |
@Zodiase I integrated your code and deployed to the extractor VM, but the hardware failure this week delayed my ability to test. Assigning this to myself so I can close once I confirm everything's good but it's 99% ready. |
Remember UIUC and Kansas; Charlie may already have code to pull from netCDF.
@max-zilla How are datasets from UIUC and Kansas different from MAC datasets? Would the current message subscription ( |
@Zodiase I have not seen the UIUC/Kansas datasets yet.... I put that comment there as a note during the meeting, but I think the netCDF note was not for this specific extractor but for other met data we might see. |
Data flow should be raw --> geostreams --> netCDF. The raw --> netCDF path developed alongside the hyperspectral extractor is a special case. The raw --> geostreams extractor may need special handling for each data source, so we should open separate issues for each additional source. Later we can write the geostreams --> netCDF extractor, and we will only need one. |
@robkooper @dlebauer this extractor raises a question given our discussions yesterday. If we want to store the geostream info by plot as discussed, we'll probably want a "plot" that covers the entire field as well somehow (or a marker to the side of the field) to indicate the met data is not specific to a plot, but instead to the entire location. I thought about adding the met datapoint to EVERY plot, but I think that would make the visualization of that metric busy and confusing. Having a synthetic "full field" plot that is not in the lookup shapefile (so things are only assigned to it if we engineer them to do so) could eventually be handy for other reasons as well. |
I agree, having a plot that is the whole site for the met data should work. |
we have a plot that is the whole site that we can use.
|
This is complete and running with geostream component & 5-minute aggregations. We can use #173 to discuss the netCDF portion. |
Description
We have one script to process the environmental logger data.
Further Suggestions / Request for Feedback
@robkooper Can we use BrownDog / PEcAn infrastructure?
How should we upload these files? They are small, so it is not necessarily worth setting up a Globus endpoint just for them if they can be downloaded via FTP.
(Task list and useful information appended below)
Tasks
- Listen for *.dataset.files.added (and only act when .dat files are present)
- "TIMESTAMP" - Convert to ISO 8601 (and use as start_time and end_time)
- "RECORD" - Discard
- "BattV" - Discard
- "PTemp_C" - Discard
- "AirTC" - Convert to Kelvin
- "RH" - Ensure unit is percent
- "Pyro" - Direct use
- "PAR_ref" - Direct use
- "WindDir" - Given WS_ms is also present, convert into eastward_wind and northward_wind
- "WS_ms" - Ensure unit is meters per second
- "Rain_mm_Tot" - Direct use
- stream_id
- sensor_id
- Assign sensor_name
- sensor, station and stream
Raw data structure (.dat files)
files)The text was updated successfully, but these errors were encountered: