Add extractor for data from EnvironmentLogger --> geostreams database #252

Closed
dlebauer opened this issue Feb 2, 2017 · 77 comments

@dlebauer
Member

dlebauer commented Feb 2, 2017

  • The data are transferred outside the core lemnatec pipeline -> JD, where are these?
  • Otherwise the conversion should be similar.
@Zodiase
Contributor

Zodiase commented Feb 2, 2017

@dlebauer Who would know where the data are and what their schema/format is?

@jdmaloney
Contributor

jdmaloney commented Feb 2, 2017

@Zodiase If you have access to Roger or the Terraref Globus Endpoint, the files are located at:
/gpfs/largeblockFS/projects/arpae/terraref/sites/ua-mac/raw_data/weather --> from Roger perspective
or
/gpfs/largeblockFS/projects/arpae/terraref/sites/ua-mac/raw_data/weather --> Globus perspective

I'm not sure of schema/format, but that is where the files are if you want to look at them.

@Zodiase
Contributor

Zodiase commented Feb 9, 2017

@dlebauer @jdmaloney I roughly scanned the data files and noticed an inconsistency in file size/data point count and in some cases the naming conventions.

For example, 2017-01-26/WeatherStation_SecData_2017_01_26_1802.dat is more than 10 times the size of 2017-01-27/WeatherStation_SecData_2017_01_27_0915.dat. And while most folders contain files named WeatherStation_SecData_*_*_*_*.dat, some folders only contain one file WeatherStation_SecData.dat, but then there is file 2016-08-09/WeatherStation_SecData.dat sitting next to 2016-08-09/WeatherStation_SecData_2016_08_09_2234.dat and some others.

Is there a pattern to follow here? The previous met extractor I wrote relies on each dataset having 24 partitioned .dat files containing non-overlapping data.

@jdmaloney
Contributor

@Zodiase I think there was a change in naming format there. 2016_08_09 was early on in the data collection and I think its naming scheme should be ignored; it applies only to those first few days (the new scheme starts at 2016_08_10 and has been consistent since then). As for the size variance, I am not sure. I'm just grabbing the files off the network storage device they have down at Maricopa; I do not know how the files are generated or what goes into them.

@dlebauer
Member Author

dlebauer commented Feb 9, 2017

@Andrade-Pedro can you comment on this? I believe this is the second met station you set up. Also, can you give approximate coordinates?

@dlebauer
Member Author

dlebauer commented Feb 9, 2017

@Andrade-Pedro I think JD answered but coordinates would be nice to have

@Andrade-Pedro

Ground weather station coordinates: 33.074457 N, 111.975163 W

@Zodiase
Contributor

Zodiase commented Feb 16, 2017

@jdmaloney Wait, I just noticed that the data files you pointed me to have already been handled by the first Met extractor in #156. The path you gave me is exactly the same as the one in this comment: #156 (comment)

@max-zilla
Contributor

@jdmaloney what data is David talking about "transferred outside core lemnatec pipeline"? We move the weather station data via pipeline. Does he mean the lightning data that we're doing the append-to-single-file stuff for?

@dlebauer
Member Author

dlebauer commented Feb 16, 2017 via email

@jdmaloney
Contributor

@Zodiase Here is the path to the other data:

/gpfs/largeblockFS/projects/arpae/terraref/sites/ua-mac/raw_data/EnvironmentLogger

@Zodiase
Contributor

Zodiase commented Feb 22, 2017

Source File Schema

Document (with example values)

{
  "environment_sensor_fixed_infos": {
    "par_sensor": {
      "manufacturer": "www.apogeeinstruments.com",
      "model": "SQ214",
      "location in gantry system": "top of gantry"
    },
    "co2_sensor": {
      "sensor manufacturer": "Vaisala",
      "model": "Carbocap CO2 Probe GMP343 A1C1B0N0N0B",
      "sensor serial number": "L3420008",
      "additional info": "SO 5530060878",
      "calbration date": "2015.08.18",
      "location in gantry system": "camera box",
      "analog digital interface": "WAGO 750-478"
    },
    "weather_station": {
      "manufacturer": "www.thiesclima.com",
      "article nr": "4.9200.00.000",
      "analog digital interface": "wago 750-478",
      "location in gantry system": "top of gantry"
    },
    "spectrometer": {
      "manufacturer": "www.oceanoptics.com",
      "model": "STS-VIS",
      "location in gantry system": "top of gantry"
    }
  },
  "environment_sensor_readings": Array.<SensorReading>
}

SensorReading

{
  "timestamp": "2017.01.10-00:32:02",
  "weather_station": {
    "sunDirection": {
      "value": "357.5829340495",
      "unit": "degrees",
      "rawValue": "9.93285927915281"
    },
    "airPressure": {
      "value": "1017.8899502548",
      "unit": "hPa",
      "rawValue": "8.38038270210883"
    },
    "brightness": {
      "value": "1.0364391003",
      "unit": "kilo Lux",
      "rawValue": "0.00183111056855983"
    },
    "relHumidity": {
      "value": "75.0328073977",
      "unit": "relHumPerCent",
      "rawValue": "7.50328073976867"
    },
    "temperature": {
      "value": "11.542710654",
      "unit": "DegCelsius",
      "rawValue": "5.15427106540117"
    },
    "windDirection": {
      "value": "315.9984130375",
      "unit": "degrees",
      "rawValue": "8.77773369548631"
    },
    "precipitation": {
      "value": "0.0406736656",
      "unit": "mm/h",
      "rawValue": "0.00396740623187964"
    },
    "windVelocity": {
      "value": "1.5326395459",
      "unit": "m/s",
      "rawValue": "0.255439924314097"
    }
  },
  "sensor par": {
    "value": "0",
    "unit": "umol/(m^2*s)",
    "rawValue": "4"
  },
  "sensor co2": {
    "value": "480.039067269",
    "unit": "ppm",
    "rawValue": "11.6806250763033"
  },
  "spectrometer": {
    "maxFixedIntensity": "16383",
    "integration time in us": "5000",
    "wavelength": Array.<Number>,
    "spectrum": Array.<Number>
  }
}
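
A minimal sketch of reading one SensorReading in Python, assuming the timestamp layout and string-encoded values shown in the example above hold for every record. The helper names (`parse_timestamp`, `numeric_value`, `flatten_reading`) are hypothetical and not part of any existing extractor.

```python
from datetime import datetime

# Timestamp layout inferred from the example reading ("2017.01.10-00:32:02").
TIMESTAMP_FORMAT = "%Y.%m.%d-%H:%M:%S"


def parse_timestamp(text):
    """Parse a logger timestamp string into a naive datetime."""
    return datetime.strptime(text, TIMESTAMP_FORMAT)


def numeric_value(field):
    """Return the calibrated 'value' of a reading field as a float.

    Values are stored as strings in the source JSON, so they must be cast.
    """
    return float(field["value"])


def flatten_reading(reading):
    """Collect the timestamp plus numeric weather_station values of one reading."""
    flat = {"timestamp": parse_timestamp(reading["timestamp"])}
    for name, field in reading["weather_station"].items():
        flat[name] = numeric_value(field)
    return flat


# Trimmed copy of the example reading above.
example = {
    "timestamp": "2017.01.10-00:32:02",
    "weather_station": {
        "temperature": {
            "value": "11.542710654",
            "unit": "DegCelsius",
            "rawValue": "5.15427106540117",
        },
    },
}
flat = flatten_reading(example)
```

The timestamp carries no time zone, so the parsed datetime is naive; any offset would have to come from elsewhere.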

@Zodiase
Contributor

Zodiase commented Feb 22, 2017

Conversion

For each in $.environment_sensor_readings:

$.timestamp
> $.start_time

$.weather_station.airPressure(hPa)
> $.properties.air_pressure(Pa)

$.weather_station.relHumidity(relHumPerCent)
> $.properties.relative_humidity(%)

$.weather_station.temperature(DegCelsius)
> $.properties.air_temperature(K)

$.weather_station.windDirection(degrees) + $.weather_station.windVelocity(m/s)
> $.properties.eastward_wind(m/s) + $.properties.northward_wind(m/s)

$.weather_station.precipitation(mm/h)
> $.properties.precipitation_flux(kg m-2 s-1)
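
The mapping above could be sketched in Python roughly as follows. The function names are hypothetical. Two assumptions are worth flagging: the wind direction is taken to be the meteorological "from" direction measured clockwise from north (the thread does not confirm how the sensor reports it), and 1 mm of rain over 1 m^2 is taken to weigh 1 kg (water density of 1000 kg m-3).

```python
import math


def hpa_to_pa(hpa):
    """Air pressure: hectopascal to pascal."""
    return hpa * 100.0


def celsius_to_kelvin(deg_c):
    """Air temperature: degrees Celsius to kelvin."""
    return deg_c + 273.15


def wind_components(speed, direction_deg):
    """Split wind speed and direction into (eastward_wind, northward_wind).

    ASSUMPTION: direction_deg is where the wind blows FROM, clockwise from
    north, so a northerly wind (0 degrees) has a negative northward component.
    """
    rad = math.radians(direction_deg)
    eastward = -speed * math.sin(rad)
    northward = -speed * math.cos(rad)
    return eastward, northward


def mm_per_hour_to_flux(mm_per_h):
    """Precipitation: mm/h to kg m-2 s-1 (1 mm over 1 m^2 taken as 1 kg)."""
    return mm_per_h / 3600.0
```

With the example reading (1.53 m/s from about 316 degrees, roughly north-west), this yields a positive eastward and a negative northward component, i.e. air moving toward the south-east.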

@Zodiase
Contributor

Zodiase commented Feb 22, 2017

@dlebauer I've listed the conversion I'm sure about. There are a couple other fields I'm not sure about. Could you provide a mapping like you did in #156 (comment)?

@dlebauer
Member Author

dlebauer commented Feb 22, 2017

@Zodiase

  • sunDirection --> zenith_angle (degree) @czender can you please confirm?
  • for wind velocity --> if you are converting to north and east vectors make sure to use appropriate trigonometry
  • sensor par --> surface_downwelling_photosynthetic_photon_flux_in_air (mol m-2 s-1)
  • sensor co2 --> mole_fraction_of_carbon_dioxide_in_air (umol/mol)
    • the units of this can be 'umol / mol' or 'ppm'

For the spectrometer fields, these will be used in the hyperspectral workflow.
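
For the two scalar sensors, the unit handling could look like the sketch below (function names are hypothetical). The PAR reading arrives in umol/(m^2*s) while the target unit is mol m-2 s-1, so the value needs a factor of 1e-6; the CO2 reading needs no numeric change because ppm and umol/mol are treated as interchangeable above.

```python
def par_umol_to_mol(par_umol):
    """PAR: umol m-2 s-1 to mol m-2 s-1."""
    return par_umol * 1e-6


def co2_ppm_to_umol_per_mol(ppm):
    """CO2: ppm to umol/mol. Numerically identical; only the unit label changes."""
    return ppm
```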

@dlebauer dlebauer changed the title from "Add extractor for data from second MAC met station --> geostreams database" to "Add extractor for data from EnvironmentLogger --> geostreams database" Feb 22, 2017
@dlebauer
Member Author

@czender and @FlyingWithJerome where did we leave off with the creation of the Environmental logger files as netcdf? Should we leave those in netcdf files? If so, is there an issue (or should I create a new one) to make those as CF standard names?

@dlebauer dlebauer added this to the February 2017 milestone Feb 22, 2017
@FlyingWithJerome
Member

@dlebauer I double-checked the Environmental Logger Script in the computing pipeline, and the answer to the naming issue is yes. There is a TODO marker on line 252 for standard names.

@Zodiase @czender
If you would like to know how to convert the units, you may check the Environmental Logger Script. There is a large global dictionary "_UNIT_DICTIONARY" for you to refer to.

@czender
Contributor

czender commented Feb 23, 2017

@Zodiase sunDirection is computed (not measured) from time/location. it is not zenith angle. i'm not sure exactly what it is yet. the sensor documentation does not describe it. and the angle takes values that exceed 90 degrees during daytime, so it is neither a zenith angle nor an altitude. please leave it as sunDirection until we understand it. @solmazhajmohammadi or @TinoDornbusch do you know how thiesclima defines sunDirection?

@czender
Contributor

czender commented Feb 23, 2017

@FlyingWithJerome please note that the environmental logger script was moved to a new repository many months ago: https://github.com/terraref/extractors-environmental
It does not make sense to maintain the version in computing-pipeline. Please work with new repository and deprecate the old.

@FlyingWithJerome
Member

@czender Yes, I have already cloned the new repo.

@ghost

ghost commented Feb 23, 2017

New functionality from PyClowder 2 will help with this. Xingchen will follow up with Max about this.

@ghost ghost removed the help wanted label Feb 23, 2017
@robkooper
Member

I would prefer to use auth with username/password. That way it gets assigned to a user instead of the system user (which is what happens when you use the key).

@Zodiase
Contributor

Zodiase commented Mar 9, 2017

I'm reading the existing environmental logger extractor code. I'm not that familiar with Python, so I might have missed something, but so far from this portion I don't think that extractor is doing any unit conversion other than renaming the units to CF standards. For example, it does not convert Celsius to Kelvin. When I was working on the meteorological extractor I took it as a requirement to convert all values to the CF variable names and units suggested in this table, using "preferred variables" as much as possible (which means breaking windVelocity and windDirection into eastward_wind and northward_wind). So far I don't see the environmental logger extractor code doing any of these. Did I miss anything?

@max-zilla
Contributor

@Zodiase I believe you are correct in both cases - the conversion is for the key only (not the value), and it needs to be both like the met extractor.

@FlyingWithJerome
Member

@max-zilla @Zodiase
You are correct. Numerical correction on these values will come with geostreaming in the next patch.

@FlyingWithJerome
Member

FlyingWithJerome commented Mar 9, 2017

@max-zilla
But that way, won't the API key be hard-coded in the driver (i.e. terra_environmentlogger.py)?

@czender
Contributor

czender commented Mar 9, 2017

@Zodiase thanks for reporting this. A 273 degree temperature bias is worth fixing. @FlyingWithJerome please make sure to convert the value, not just the units. Double-check to make sure no factors of 10, 100, or 1000 are missing from the values of any converted quantity. Please note any discrepancies between our data names/standard_names and the table Xingchen points to so we can discuss these Monday.

@FlyingWithJerome
Member

FlyingWithJerome commented Mar 10, 2017

@max-zilla
The Geostream API worked fine yesterday, but today it raises HTTPError 500 again; is it under maintenance right now?

@max-zilla
Contributor

@FlyingWithJerome the postgres service was restarted, Clowder needed to be restarted as well to reconnect. Should work now.

@FlyingWithJerome
Member

@max-zilla

I double-checked the API documentation in Clowder and cannot find the parts for Geostream. I would like to know the host address I can POST to, and whether it supports JSON-encoded data (it seems that it does not accept the output of Python's json.dumps()).

@max-zilla
Contributor

@FlyingWithJerome here are two examples of extractors that post to Geostreams:
https://github.com/terraref/extractors-meterological/blob/master/datparser/terra_met_datparser.py#L244
https://github.com/terraref/extractors-metadata/blob/master/sensorposition/terra_sensorposition.py#L162

See create_sensor, get_sensor_id, create_stream, get_stream_id also.

Basically for geostreams...

  1. "sensor" is top level - ignore the word sensor because it's confusing (from another project). For TERRA, each "sensor" is a plot in the field.
  2. "stream" is each instrument x each plot. so if Plot="Range 1 Pass 1" then a stream might be "VNIR - Range 1 Pass 1".
  3. "datapoint" is a single measurement at a single point in time, e.g. reflectance. these go into a stream.

Big idea is we have a field of plots (1) and each plot has a list of instruments (2) that each have a bunch of datapoints (3) indicating what that instrument captured in that plot.
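
The three-level hierarchy described above can be pictured with a small data model. This is only an illustrative sketch in Python 3.7+ (dataclasses), not the Clowder or PyClowder API; the class names, the `stream_for` helper, and the sample reflectance value are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Datapoint:
    """(3) A single measurement at a single point in time."""
    start_time: str
    properties: Dict[str, float]


@dataclass
class Stream:
    """(2) One instrument on one plot; holds that instrument's datapoints."""
    name: str
    datapoints: List[Datapoint] = field(default_factory=list)


@dataclass
class Sensor:
    """(1) Top level; for TERRA each "sensor" is a plot in the field."""
    name: str
    streams: Dict[str, Stream] = field(default_factory=dict)

    def stream_for(self, instrument):
        """Return the stream named '<instrument> - <plot>', creating it if needed."""
        stream_name = f"{instrument} - {self.name}"
        return self.streams.setdefault(stream_name, Stream(stream_name))


plot = Sensor("Range 1 Pass 1")
# The reflectance value below is made up purely for illustration.
plot.stream_for("VNIR").datapoints.append(
    Datapoint("2017-03-10T12:00:00", {"reflectance": 0.42})
)
```

Looking a stream up by its composed name mirrors the get-or-create pattern of get_stream_id and create_stream mentioned above.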

@FlyingWithJerome
Member

@max-zilla

Thanks, these two examples make more sense to me. I will double-check whether I have the pyclowder dependencies on my home machines and do a little research on them.

@max-zilla
Contributor

@FlyingWithJerome make sure you use the newer PyClowder 2: https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/pyclowder2/browse

python setup.py install --force

The --force will overwrite any pyclowder 1 files if you already had them installed.

@dlebauer
Member Author

@FlyingWithJerome there is some confusion on this side - have you taken on this issue? If so, that is great. Or is there anything that @Zodiase can help with?

One other thought - instead of creating one file per day, can this netcdf file just be appended in time, perhaps creating a new one each year?

@FlyingWithJerome
Member

@dlebauer
The unit conversion issue has already been fixed; right now I'm polishing the code based on Craig's feedback and adding geostream to the pipeline. Max has provided me with some useful references on the Geostream API.

@ghost

ghost commented Apr 4, 2017

@FlyingWithJerome - can you give an update on this issue? Can it be closed?

@FlyingWithJerome
Member

@max-zilla
Hi Max, I would like to know how to fill in "geom", "type", and "region" for create_sensor(), since the Environmental Logger has no region, type, or geometry. If I set all of them to None, I get a 400 Bad Request.
BTW, can "connector" be just any Connector instance?

@FlyingWithJerome
Member

@rachelshekar I'm working on this; it will be finished in the not-too-distant future.

@max-zilla
Contributor

max-zilla commented Apr 5, 2017

@FlyingWithJerome for the met extractor, we are using a location off to the side of the field (on the building nearby) to represent "full field" data:

https://github.com/terraref/extractors-meterological/blob/master/datparser/terra_met_datparser.py

def process_message(self, connector, host, secret_key, resource, parameters):
    self.sensor_name = "Full Field"
    main_coords = [-111.974304, 33.075576, 0]

    # Look up the sensor by name; create it only if it does not exist yet
    sensor_id = get_sensor_id(host, secret_key, self.sensor_name)
    if not sensor_id:
        sensor_id = create_sensor(host, secret_key, self.sensor_name, {
            "type": "Point",
            # These are a point off to the right of the field
            "coordinates": main_coords
        })

    # Same get-or-create pattern for the stream under that sensor
    stream_name = self.sensor_name + " - Weather Station"
    stream_id = get_stream_id(host, secret_key, stream_name)
    if not stream_id:
        stream_id = create_stream(host, secret_key, sensor_id, stream_name, {
            "type": "Point",
            "coordinates": main_coords
        })

I would replace "Weather Station" with EnvironmentLogger but otherwise you can use the same Full Field sensor we're using for the weather data.

The connector is passed into the process_message method for you.

@FlyingWithJerome
Member

FlyingWithJerome commented Apr 5, 2017

@max-zilla
In PyClowder2 the create_sensor() takes 7 arguments, but you only pass 4?

def create_sensor(connector, host, key, sensorname, geom, type, region):
    ...

@max-zilla
Contributor

@FlyingWithJerome ah yes, didn't realize the example function was custom before pyClowder 2 had geostreams.

body = {
    "name": name,
    "type": "point",
    "geometry": {
        "type": "Point",
        # These are a point off to the right of the field
        "coordinates": main_coords
    },
    "properties": {
        "popupContent": name,
        "type": {
            "id": "MAC Field Scanner",
            "title": "MAC Field Scanner",
            "sensorType": 4
        },
        "name": name,
        "region": "Maricopa"
    }
}

So for using PyC2:

geom = {
    "type": "Point",
    # These are a point off to the right of the field
    "coordinates": main_coords
}

type = {
    "id": "MAC Field Scanner",
    "title": "MAC Field Scanner",
    "sensorType": 4
}

region = "Maricopa"
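
To make the relationship between the three arguments and the request body explicit, here is a small sketch that assembles the body shown earlier from `geom`, `type`, and `region`. The helper `build_sensor_body` is hypothetical, and the thread does not confirm that PyClowder 2 builds its request exactly this way; the third argument is named `sensor_type` here only to avoid shadowing Python's built-in `type`.

```python
def build_sensor_body(name, geom, sensor_type, region):
    """Assemble a geostreams sensor body from its parts (illustrative only)."""
    return {
        "name": name,
        "type": "point",
        "geometry": geom,
        "properties": {
            "popupContent": name,
            "type": sensor_type,
            "name": name,
            "region": region,
        },
    }


# Values copied from the comments above.
main_coords = [-111.974304, 33.075576, 0]
body = build_sensor_body(
    "Full Field",
    {"type": "Point", "coordinates": main_coords},
    {"id": "MAC Field Scanner", "title": "MAC Field Scanner", "sensorType": 4},
    "Maricopa",
)
```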

@FlyingWithJerome
Member

@max-zilla
These are extremely helpful! Thanks!

@dlebauer
Member Author

dlebauer commented Apr 5, 2017

Just to be clear - the EnvironmentLogger and WeatherStation should be listed as distinct locations (sensors).

  • The WeatherStation coordinates are 33.074457 N, 111.975163 W (see #252 (comment)).
  • The EnvironmentLogger is on top of the gantry system so it is moving around. For the EnvironmentLogger I think it would be better to have a polygon that circumscribes the gantry than to have a point off to the side (though for mapping purposes it might be preferable to keep it off to the side).
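
A polygon geometry for the gantry could be built along these lines. This is a sketch only: `bounding_polygon` is a hypothetical helper, and the two input points are simply the coordinates already quoted in this thread, used as stand-ins because the actual gantry corner coordinates are not given here. GeoJSON positions are ordered longitude first, with west longitudes negative.

```python
def bounding_polygon(points):
    """Return an axis-aligned GeoJSON Polygon enclosing (lon, lat) points."""
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    west, east = min(lons), max(lons)
    south, north = min(lats), max(lats)
    # Exterior ring, counterclockwise, closed by repeating the first position.
    ring = [
        [west, south],
        [east, south],
        [east, north],
        [west, north],
        [west, south],
    ]
    return {"type": "Polygon", "coordinates": [ring]}


# PLACEHOLDER inputs: the weather station and "Full Field" points from this
# thread, NOT the real gantry corners.
placeholder_points = [(-111.975163, 33.074457), (-111.974304, 33.075576)]
geom = bounding_polygon(placeholder_points)
```

The resulting dictionary has the same shape as the Point geometries used elsewhere in the thread, so it could be swapped in once real corner coordinates are known.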

@max-zilla
Contributor

@dlebauer noted. we can change that location/geometry at any time fortunately, will make a note.

@FlyingWithJerome
Member

@dlebauer
I think the EL records have the weather station data but no position data (i.e., it seems that we cannot trace the EL position the way we do for the hyperspectral cameras). Is it the case that only the weather station in the EL records is in a fixed position, and every other sensor (sensor par, sensor co2, etc.) is movable?

@max-zilla
Contributor

@FlyingWithJerome i think that's correct - that's why David suggests using a polygon to represent the entire field or something else off to the side if there are overlap issues. We can experiment with that a bit later.

@FlyingWithJerome
Member

@max-zilla
Sounds like a plan. I will finish the geostream work in the next few days and send it to David and you for review.

@ghost

ghost commented Apr 26, 2017

@FlyingWithJerome can this issue be closed?

@max-zilla
Contributor

@rachelshekar yes, I merged @FlyingWithJerome's pull request.
