Skip to content

read_gbq() raises AttributeError for TIMESTAMP values out of bounds for pd.Timestamp #468

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
RevolutionTech opened this issue Jan 18, 2022 · 3 comments
Assignees
Labels
api: bigquery Issues related to the googleapis/python-bigquery-pandas API. priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@RevolutionTech
Copy link

It appears that pandas-gbq does not appropriate handle TIMESTAMP values that are out of the range of what pd.Timestamp can support. As a result, any query that contains a TIMESTAMP column with a value outside of the supported range will raise an AttributeError.

Environment details

  • OS type and version: Darwin Kernel Version 20.6.0: Mon Aug 30 06:12:20 PDT 2021; root:xnu-7195.141.6~3/RELEASE_ARM64_T8101
  • Python version: Python 3.7.11
  • pip version: pip 21.2.4 from /usr/local/lib/python3.7/site-packages/pip (python 3.7)
  • pandas-gbq version:
Name: pandas-gbq
Version: 0.16.0
Summary: Google BigQuery connector for pandas
Home-page: https://github.com/googleapis/python-bigquery-pandas
Author: pandas-gbq authors
Author-email: googleapis-packages@google.com
License: BSD-3-Clause
Location: /usr/local/lib/python3.7/site-packages
Requires: pandas, google-auth, setuptools, google-auth-oauthlib, pyarrow, numpy, google-cloud-bigquery, pydata-google-auth
Required-by: 

Steps to reproduce

  1. Construct a query that returns a TIMESTAMP with a date earlier than pd.Timestamp.min.
  2. Call pandas_gbq.read_gbq() with that SQL query.

Code example

import pandas_gbq

sql = 'SELECT TIMESTAMP("1677-09-21");'
pandas_gbq.read_gbq(sql)  # raises AttributeError

Stack trace

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 837, in read_gbq
    dtypes=dtypes,
  File "/usr/local/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 471, in run_query
    user_dtypes=dtypes,
  File "/usr/local/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 527, in _download_results
    df = pandas_gbq.timestamp.localize_df(df, schema_fields)
  File "/usr/local/lib/python3.7/site-packages/pandas_gbq/timestamp.py", line 41, in localize_df
    if field["type"].upper() == "TIMESTAMP" and df[column].dt.tz is None:
  File "/usr/local/lib/python3.7/site-packages/pandas/core/generic.py", line 5478, in __getattr__
    return object.__getattribute__(self, name)
  File "/usr/local/lib/python3.7/site-packages/pandas/core/accessor.py", line 181, in __get__
    accessor_obj = self._accessor(obj)
  File "/usr/local/lib/python3.7/site-packages/pandas/core/indexes/accessors.py", line 506, in __new__
    raise AttributeError("Can only use .dt accessor with datetimelike values")
AttributeError: Can only use .dt accessor with datetimelike values

Expected result

pd.read_sql() handles out-of-bounds timestamps gracefully by reverting to the object dtype:

>>> import pandas as pd
>>> import sqlalchemy as sa
>>>
>>> username = ...
>>> password = ...
>>> host = ...
>>> port = ...
>>> database = ...
>>>
>>> conn_string = sa.engine.url.URL("postgresql", username=username, password=password, host=host, port=port, database=database)
>>> engine = sa.create_engine(conn_string)
>>> conn = engine.connect()
>>>
>>> sql = "SELECT timestamp '1677-09-22';"
>>> df = pd.read_sql(con=conn, sql=sql)
>>> df.dtypes
timestamp    datetime64[ns]
dtype: object
>>>
>>> sql = "SELECT timestamp '1677-09-21';"
>>> df = pd.read_sql(con=conn, sql=sql)
>>> df.dtypes
timestamp    object
dtype: object

I would expect pandas-gbq to have the same behaviour.

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery-pandas API. label Jan 18, 2022
@yoshi-automation yoshi-automation added the triage me I really want to be triaged. label Jan 18, 2022
@tswast tswast added priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. and removed triage me I really want to be triaged. labels Jan 18, 2022
@tswast tswast self-assigned this Jan 18, 2022
@tswast
Copy link
Collaborator

tswast commented Jan 18, 2022

This should be fixed by #444, but I need to cut a release with it.

@RevolutionTech
Copy link
Author

Yep, looks like the main branch is working perfectly. Thank you @tswast!

@tswast
Copy link
Collaborator

tswast commented Jan 19, 2022

0.17.0 has been released with this fix to pypi https://pypi.org/project/pandas-gbq/0.17.0/

Conda package should be available soon as well. conda-forge/pandas-gbq-feedstock#35

@tswast tswast closed this as completed Jan 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery-pandas API. priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
Development

No branches or pull requests

3 participants