Apparent issue with timestamp handling & solar position calculations #1237

Closed
PeterGrant opened this issue May 20, 2021 · 4 comments · Fixed by #1285

@PeterGrant

In my project I'm examining air temperature and solar data, so my following comments apply to those values. They do not apply to other data returned by the model.

The solar position algorithm appears to ignore the timezone of timezone-aware timestamps when calculating irradiance data. Specifically, the .get_processed_data() function returns data with mixed handling of timezones. The returned dataframe has timestamps in the format "UTC+tz". The ambient air temperature data matches this timestamp, with the highest temperature around 17:00 local time and the minimum around 05:00. The solar data does not match this timestamp: irradiation begins around 07:00 UTC and ends around 19:00 UTC, which corresponds to irradiation being > 0 from midnight to noon in local time.

I'm not sure if this is a bug or if it's me not understanding how timesteps are handled in pvlib-python, but the forecast I'm getting from .get_processed_data() does not appear to be correct.

To Reproduce
You can replicate this by running the code in the docs/tutorials/forecast.ipynb notebook. Steps:

  1. Run the first two setup cells to import as needed and input the location and time information.
  2. Run the first cell in the HRRR section to initialize the HRRR model (fm = HRRR())
  3. Run the fourth cell in the HRRR section to get processed data (data = fm.get_processed_data(latitude, longitude, start, end))
  4. Run the ninth cell in the HRRR section to print the processed, sorted data (data[sorted(data.columns)]).

The output will appear as follows:

[Image: Tutorial_Output]

The data shows solar irradiation data from 07:00-07:00 to 19:00-07:00. This makes sense if the timestamp is printed in local time, and does not if the timestamp is printed in UTC. The air temperature data shows a minimum at 12:00-07:00 and a maximum at 22:00-07:00. This makes sense if the timestamp is in UTC and does not if the data timestamp is in local time.
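The comparison above can be turned into a quick check: find the clock hours at which temperature and GHI reach their daily extremes, as labeled by the index. The sketch below is only illustrative. The helper name `daily_pattern_hours` is hypothetical, and the dataframe is synthetic stand-in data with `temp_air` and `ghi` columns, not real forecast output.

```python
import numpy as np
import pandas as pd


def daily_pattern_hours(data):
    """Return the clock hours (as labeled by the index) of daily extremes."""
    # Average each column by hour of day, using the index's own timezone.
    mean_by_hour = data.groupby(data.index.hour).mean()
    return {
        'temp_min_hour': int(mean_by_hour['temp_air'].idxmin()),
        'temp_max_hour': int(mean_by_hour['temp_air'].idxmax()),
        'ghi_peak_hour': int(mean_by_hour['ghi'].idxmax()),
    }


# Synthetic data for illustration only (NOT forecast output):
# temperature peaks at 15:00 and GHI peaks at 12:00 on the labeled clock.
index = pd.date_range('2021-05-19', periods=48, freq='h', tz='America/Phoenix')
hour = index.hour.to_numpy()
data = pd.DataFrame(
    {
        'temp_air': 25 + 10 * np.cos(2 * np.pi * (hour - 15) / 24),
        'ghi': np.clip(1000 * np.cos(2 * np.pi * (hour - 12) / 24), 0, None),
    },
    index=index,
)

print(daily_pattern_hours(data))
```

If the index were labeled consistently, one would expect the GHI peak near local noon and the temperature minimum in the early morning. Extremes that only make sense under two different timezone interpretations point to the kind of mismatch described here.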

The following description, code, and solution were provided by Cliff Hansen @ Sandia:

I think there is a bug in this line:

self.time = pd.DatetimeIndex(pd.Series(times), tz=self.location.tz)

This code produces the output Peter showed, with the air temperature and GHI patterns offset.

import matplotlib.pyplot as plt
import pandas as pd
from pvlib.forecast import HRRR

latitude = 32.2
longitude = -110.9
tz = 'America/Phoenix'

start = pd.Timestamp('2021-05-19 00:00:00', tz=tz)
end = start + pd.Timedelta(days=7) # 7 days

fm = HRRR()
data = fm.get_processed_data(latitude, longitude, start, end)
data[sorted(data.columns)]

# Scale temperature by 30 so both traces are visible on the same axis.
plt.plot(data['temp_air']*30)
plt.plot(data['ghi'])
plt.show()

Replacing that line

self.time = pd.DatetimeIndex(pd.Series(times), tz=self.location.tz)

with

self.time = pd.DatetimeIndex(pd.Series(times)).tz_localize('UTC').tz_convert(self.location.tz)

produces time traces that make sense.
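The difference between the two lines can be seen with pandas alone. The sketch below assumes a few made-up naive timestamps that actually represent UTC instants. The variable names `mislabeled` and `converted` are illustrative and not taken from pvlib.

```python
import pandas as pd

tz = 'America/Phoenix'  # UTC-07:00 all year, no daylight saving time

# Hypothetical naive timestamps that really denote UTC instants.
times = pd.to_datetime(['2021-05-19 07:00', '2021-05-19 12:00', '2021-05-19 22:00'])

# Original line: the naive values are stamped as local wall-clock time.
mislabeled = pd.DatetimeIndex(pd.Series(times), tz=tz)

# Proposed line: declare the values to be UTC, then convert to local time.
converted = pd.DatetimeIndex(pd.Series(times)).tz_localize('UTC').tz_convert(tz)

print(mislabeled[0])  # 2021-05-19 07:00:00-07:00
print(converted[0])   # 2021-05-19 00:00:00-07:00

# The two indexes refer to instants that are seven hours apart.
print((mislabeled - converted)[0])  # 0 days 07:00:00
```

Passing `tz=` to the constructor keeps the clock reading and attaches the zone, whereas `tz_localize('UTC').tz_convert(...)` keeps the instant and changes the clock reading.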

@MichaelHopwood
Contributor

Thanks for the catch @PeterGrant!
@cwhanse do you add this bug fix in the next milestone?

@cwhanse
Member

cwhanse commented May 27, 2021

> Thanks for the catch @PeterGrant!
> @cwhanse do you add this bug fix in the next milestone?

Yes, but I'd like confirmation first from someone else that it is a bug. I'm not smart enough to look directly at the forecast data and verify that the problem is with how pvlib is handling the timezone.

@kandersolar
Member

I do not use pvlib.forecast myself, but I poked around the forecast netcdf and agree that it seems like a bug -- timestamps are local time but should be UTC:

import pandas as pd
from pvlib.forecast import HRRR

latitude = 32.2
longitude = -110.9
tz = 'America/Phoenix'
start = pd.Timestamp('2021-07-19 00:00:00', tz=tz)
end = start + pd.Timedelta(days=3)

fm = HRRR()
data = fm.get_data(latitude, longitude, start, end, close_netcdf_data=False)
print(fm.netcdf_data['time'].units)  # 'Hour since 2021-07-18T00:00:00Z'
print(fm.netcdf_data['time'][0][0])  # first value is 31 -> first timestamp should be 2021-07-19T07:00:00Z
print(data.index[0])  # 2021-07-19 07:00:00-07:00 (local, not utc)

Is this issue specific to HRRR? If so, the fix proposed above might not be appropriate, as I think it would affect all forecast providers, not just HRRR.
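The arithmetic in the comments above can be reproduced without downloading any forecast data. This is a minimal sketch that hard-codes the units string and first offset printed above and parses them with plain pandas. It is not how pvlib or a netCDF library decodes time.

```python
import pandas as pd

# Values copied from the printed output above.
units = 'Hour since 2021-07-18T00:00:00Z'
first_value = 31

# The reference epoch ends in 'Z', so pandas parses it as tz-aware UTC.
epoch = pd.Timestamp(units.split('since')[1].strip())

# Offsets are hours since the epoch, so the first timestamp is in UTC.
first_utc = epoch + pd.Timedelta(hours=first_value)
first_local = first_utc.tz_convert('America/Phoenix')

print(first_utc)    # 2021-07-19 07:00:00+00:00
print(first_local)  # 2021-07-19 00:00:00-07:00
```

The correctly converted first timestamp is local midnight, which matches the requested `start`, whereas the index returned by `get_data` begins at 07:00 local.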

@spaceguy152

@kanderso-nrel I think it affects other models as well. Recently I was working with GFS and found the same issue.
