Apparent issue with timestamp handling & solar position calculations #1237

PeterGrant · 2021-05-20T13:53:48Z

In my project I'm examining air temperature and solar data, so my following comments apply to those values. They do not apply to other data returned by the model.

The solar position algorithm appears to ignore the timezone of timezone-aware timestamps when calculating irradiance data. Specifically, the .get_processed_data() function returns data with mixed handling of timezones. The returned dataframe has its timestamps in the format "UTC+tz". The ambient air temperature data matches this timestamp, with the highest temperature around 17:00 local time and the minimum around 05:00. The solar data does not match this timestamp, with irradiation beginning around 07:00 UTC and ending around 19:00 UTC. This corresponds to irradiation data being > 0 from midnight to noon in local time.

I'm not sure if this is a bug or if it's me not understanding how timesteps are handled in pvlib-python, but the forecast I'm getting from .get_processed_data() does not appear to be correct.

To Reproduce
You can replicate this by running the code in the docs/tutorials/forecast.ipynb notebook. Steps:

1. Run the first two setup cells to import as needed and input the location and time information.
2. Run the first cell in the HRRR section to initialize the HRRR model (fm = HRRR()).
3. Run the fourth cell in the HRRR section to get processed data (data = fm.get_processed_data(latitude, longitude, start, end)).
4. Run the ninth cell in the HRRR section to print the processed, sorted data (data[sorted(data.columns)]).

The output will appear as follows:

The data shows solar irradiation data from 07:00-07:00 to 19:00-07:00. This makes sense if the timestamp is printed in local time, and does not if the timestamp is printed in UTC. The air temperature data shows a minimum at 12:00-07:00 and a maximum at 22:00-07:00. This makes sense if the timestamp is in UTC and does not if the data timestamp is in local time.
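As a quick check of that second reading, the sketch below treats the temperature extremes as UTC clock hours and converts them to local time. It uses only the standard library. The fixed UTC-7 offset (taken from the -07:00 shown in the timestamps), the date, and the helper name `utc_hour_to_local` are assumptions for illustration and are not part of pvlib.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-7 offset, matching the "-07:00" suffix in the output.
LOCAL = timezone(timedelta(hours=-7))


def utc_hour_to_local(hour):
    """Treat `hour` as a UTC clock hour on 2021-05-19 and return the local hour."""
    as_utc = datetime(2021, 5, 19, hour, tzinfo=timezone.utc)
    return as_utc.astimezone(LOCAL).hour


temp_min_local = utc_hour_to_local(12)  # 5
temp_max_local = utc_hour_to_local(22)  # 15
print(temp_min_local, temp_max_local)
```

Read this way, the temperature minimum falls at 05:00 and the maximum at 15:00 local time, which is the daily pattern one would expect.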

The following description, code, and solution were provided by Cliff Hansen @ Sandia:

I think there is a bug in this line:

pvlib-python/pvlib/forecast.py

Line 418 in 7eae1fc

self.time = pd.DatetimeIndex(pd.Series(times), tz=self.location.tz)

This code produces the output Peter showed, with the air temperature and GHI patterns offset.

import matplotlib.pyplot as plt
import pandas as pd
from pvlib.forecast import HRRR

latitude = 32.2
longitude = -110.9
tz = 'America/Phoenix'

start = pd.Timestamp('2021-05-19 00:00:00', tz=tz)
end = start + pd.Timedelta(days=7)  # 7 days

fm = HRRR()
data = fm.get_processed_data(latitude, longitude, start, end)
data[sorted(data.columns)]

# plot scaled air temperature and GHI on the same axes to compare their daily patterns
plt.plot(data['temp_air']*30)
plt.plot(data['ghi'])

Replacing that line

self.time = pd.DatetimeIndex(pd.Series(times), tz=self.location.tz)

with

self.time = pd.DatetimeIndex(pd.Series(times)).tz_localize('UTC').tz_convert(self.location.tz)

produces time traces that make sense.
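The difference between the two lines can be seen without downloading any forecast data. The following is a minimal sketch, assuming the decoded forecast times are naive timestamps whose clock values are UTC; the two sample timestamps and the variable names are made up for illustration.

```python
import pandas as pd

tz = 'America/Phoenix'

# Assumption: naive timestamps whose clock values are really UTC.
times = pd.to_datetime(['2021-05-19 07:00', '2021-05-19 19:00'])

# Original line: the local timezone is attached directly, so 07:00 UTC
# is relabeled as 07:00 local without shifting the clock value.
labeled = pd.DatetimeIndex(pd.Series(times), tz=tz)

# Proposed line: declare the values as UTC first, then convert to local time.
converted = pd.DatetimeIndex(pd.Series(times)).tz_localize('UTC').tz_convert(tz)

print(labeled[0])    # 2021-05-19 07:00:00-07:00
print(converted[0])  # 2021-05-19 00:00:00-07:00
```

With the original line every timestamp ends up seven hours later than the instant it should represent, which would be consistent with the offset between the air temperature and GHI traces.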


MichaelHopwood · 2021-05-27T19:50:43Z

Thanks for the catch @PeterGrant!
@cwhanse will you add this bug fix to the next milestone?

cwhanse · 2021-05-27T20:04:08Z

> Thanks for the catch @PeterGrant!
> @cwhanse will you add this bug fix to the next milestone?

Yes, but I'd like confirmation first from someone else that it is a bug. I'm not smart enough to look directly at the forecast data and verify that the problem is with how pvlib is handling the timezone.

kandersolar · 2021-07-21T23:24:23Z

I do not use pvlib.forecast myself, but I poked around the forecast netcdf and agree that it seems like a bug. The returned timestamps carry the local UTC offset, but their clock values are actually UTC:

import pandas as pd
from pvlib.forecast import HRRR

latitude = 32.2
longitude = -110.9
tz = 'America/Phoenix'
start = pd.Timestamp('2021-07-19 00:00:00', tz=tz)
end = start + pd.Timedelta(days=3)

fm = HRRR()
data = fm.get_data(latitude, longitude, start, end, close_netcdf_data=False)
print(fm.netcdf_data['time'].units)  # 'Hour since 2021-07-18T00:00:00Z'
print(fm.netcdf_data['time'][0][0])  # first value is 31 -> first timestamp should be 2021-07-19T07:00:00Z
print(data.index[0])  # 2021-07-19 07:00:00-07:00 (local, not utc)
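The arithmetic in those comments can be reproduced with the standard library alone. This is a sketch under stated assumptions: the epoch and the value 31 are copied from the output above, and a fixed UTC-7 offset stands in for 'America/Phoenix' (Phoenix does not observe daylight saving time). The variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# From the netCDF metadata above: 'Hour since 2021-07-18T00:00:00Z'
epoch = datetime(2021, 7, 18, 0, 0, tzinfo=timezone.utc)
first_value_hours = 31  # first entry of the netCDF time variable

# Assumption: fixed UTC-7 offset in place of the 'America/Phoenix' zone.
phoenix = timezone(timedelta(hours=-7))

first_utc = epoch + timedelta(hours=first_value_hours)
first_local = first_utc.astimezone(phoenix)

print(first_utc.isoformat())    # 2021-07-19T07:00:00+00:00
print(first_local.isoformat())  # 2021-07-19T00:00:00-07:00
```

The correctly converted first timestamp is local midnight on 2021-07-19, which matches the requested start, whereas data.index[0] reports 07:00 local.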

Is this issue specific to HRRR? If so, the fix proposed above might not be appropriate, as I think it would affect all forecast providers, not just HRRR.

spaceguy152 · 2021-08-17T14:20:52Z

@kanderso-nrel I think it affects other models as well. Recently I was working with GFS and found the same issue.

wholmgren mentioned this issue Aug 17, 2021 in "fix ForecastModel.get_data handling of timezones" #1285 (merged)

wholmgren added this to the 0.9.0 milestone Aug 17, 2021

wholmgren added the bug label Aug 17, 2021

wholmgren closed this as completed in #1285 Aug 17, 2021
