In my project I'm examining air temperature and solar data, so my following comments apply to those values. They do not apply to other data returned by the model.
The solar position algorithm appears to ignore the timezone of timezone-aware timestamps when calculating irradiance data. Specifically, the .get_processed_data() function returns data with mixed handling of timezones. The dataframe returned has the timestamp in the format "UTC+tz". The ambient air temperature data matches this timestamp, with the highest temperature around 17:00 local time and the minimum temperature around 05:00 local time. The solar data does not match this timestamp: irradiation begins around 07:00 UTC and ends around 19:00 UTC, which corresponds to irradiation data being > 0 from midnight to noon in local time.
I'm not sure if this is a bug or if it's me not understanding how timesteps are handled in pvlib-python, but the forecast I'm getting from .get_processed_data() does not appear to be correct.
To Reproduce
You can replicate this by running the code in the docs/tutorials/forecast.ipynb notebook. Steps:
1. Run the first two setup cells to import as needed and input the location and time information.
2. Run the first cell in the HRRR section to initialize the HRRR model (fm = HRRR()).
3. Run the fourth cell in the HRRR section to get processed data (data = fm.get_processed_data(latitude, longitude, start, end)).
4. Run the ninth cell in the HRRR section to print the processed, sorted data (data[sorted(data.columns)]).
The output will appear as follows:
The data shows solar irradiation data from 07:00-07:00 to 19:00-07:00. This makes sense if the timestamp is printed in local time, and does not if the timestamp is printed in UTC. The air temperature data shows a minimum at 12:00-07:00 and a maximum at 22:00-07:00. This makes sense if the timestamp is in UTC and does not if the data timestamp is in local time.
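A rough way to sanity-check which convention a column follows is to compare the midpoint of the daylight window with the approximate time of solar noon. The sketch below is only an approximation: it ignores the equation of time (roughly ±16 minutes), the helper name is made up for illustration, and the longitude (-110.9) and the fixed UTC-07:00 offset for America/Phoenix are taken from the code snippets elsewhere in this thread.

```python
def approx_solar_noon_utc_hour(longitude_deg):
    """Approximate UTC hour of solar noon (hypothetical helper).

    Uses 15 degrees of longitude per hour and ignores the equation of time.
    """
    return (12.0 - longitude_deg / 15.0) % 24.0


longitude = -110.9       # assumed site longitude (from the snippets in this thread)
utc_offset_hours = -7    # America/Phoenix is UTC-07:00 all year

noon_utc = approx_solar_noon_utc_hour(longitude)
noon_local = (noon_utc + utc_offset_hours) % 24.0

# Irradiance in the returned dataframe is > 0 from labels 07:00 to 19:00.
midpoint = (7 + 19) / 2

print(f"solar noon is about {noon_utc:.2f} h UTC, {noon_local:.2f} h local")
print(f"midpoint of the irradiance window: {midpoint:.2f} h")

# The midpoint sits much closer to local solar noon than to UTC solar noon,
# so the irradiance labels behave like local clock times.
labels_look_local = abs(midpoint - noon_local) < abs(midpoint - noon_utc)
print(labels_look_local)
```

The same comparison can be applied to the temperature column using the expected time of the afternoon maximum instead of solar noon.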
Thanks for the catch @PeterGrant! @cwhanse can you add this bug fix to the next milestone?
Yes, but I'd like confirmation first from someone else that it is a bug. I'm not smart enough to look directly at the forecast data and verify that the problem is with how pvlib is handling the timezone.
I do not use pvlib.forecast myself, but I poked around the forecast netcdf and agree that it seems like a bug -- timestamps are local time but should be UTC:
import pandas as pd
from pvlib.forecast import HRRR
latitude = 32.2
longitude = -110.9
tz = 'America/Phoenix'
start = pd.Timestamp('2021-07-19 00:00:00', tz=tz)
end = start + pd.Timedelta(days=3)
fm = HRRR()
data = fm.get_data(latitude, longitude, start, end, close_netcdf_data=False)
print(fm.netcdf_data['time'].units)  # 'Hour since 2021-07-18T00:00:00Z'
print(fm.netcdf_data['time'][0][0])  # first value is 31 -> first timestamp should be 2021-07-19T07:00:00Z
print(data.index[0])  # 2021-07-19 07:00:00-07:00 (local, not utc)
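To make the difference between relabeling and converting concrete, here is a minimal pandas-only sketch that reuses the values printed above (reference time 2021-07-18T00:00:00Z, first offset of 31 hours). It does not call pvlib and is not meant as the library's actual fix; the variable names are illustrative.

```python
import pandas as pd

# Values taken from the snippet above: the netCDF time axis counts hours
# since a UTC reference, and the first value is 31.
reference = pd.Timestamp('2021-07-18T00:00:00')  # naive, but known to be UTC
naive_times = pd.DatetimeIndex([reference + pd.Timedelta(hours=31)])

tz = 'America/Phoenix'  # UTC-07:00 all year (no daylight saving time)

# Localizing directly stamps the UTC clock reading with the local offset,
# which moves the instant seven hours later than it really is.
mislabeled = naive_times.tz_localize(tz)

# Localizing as UTC first and then converting keeps the instant unchanged.
converted = naive_times.tz_localize('UTC').tz_convert(tz)

print(mislabeled[0])  # 2021-07-19 07:00:00-07:00
print(converted[0])   # 2021-07-19 00:00:00-07:00
```

The first result matches the data.index[0] value reported above, which is what one would expect if UTC times were being stamped with the local timezone rather than converted to it.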
Is this issue specific to HRRR? If so, the fix proposed above might not be appropriate, as I think it would affect all forecast providers, not just HRRR.
The following description, code, and solution were provided by Cliff Hansen @ Sandia:
I think there is a bug in this line:
pvlib-python/pvlib/forecast.py, line 418 in 7eae1fc
This code produces the output Peter showed, with the air temperature and GHI patterns offset.
import matplotlib.pyplot as plt
import pandas as pd
from pvlib.forecast import HRRR
latitude = 32.2
longitude = -110.9
tz = 'America/Phoenix'
start = pd.Timestamp('2021-05-19 00:00:00', tz=tz)
end = start + pd.Timedelta(days=7) # 7 days
fm = HRRR()
data = fm.get_processed_data(latitude, longitude, start, end)
data[sorted(data.columns)]
plt.plot(data['temp_air']*30) # air temperature, scaled by 30
plt.plot(data['ghi']) # global horizontal irradiance
Replacing that line with
self.time = pd.DatetimeIndex(pd.Series(times), tz=self.location.tz)
produces time traces that make sense.