ValueError: setting an array element with a sequence. #123


Closed
MtDersvan opened this issue Feb 17, 2018 · 3 comments

Comments

@MtDersvan

INFO:
pandas_gbq.__version__ : '0.3.0', '0.3.1'
python: 3.6.2, 3.5.2

SNIPPET:

from pandas.io import gbq
df  = gbq.read_gbq(
    """
    #standardSQL
    SELECT embedding_v1
    FROM `{TABLE_ID}` LIMIT 10
    """.format(TABLE_ID='patents-public-data.google_patents_research.publications'),
    dialect='standard',
    project_id='XXXXXXX',
    configuration={'query': {'useQueryCache': True}}
    )

ERROR:

Traceback (most recent call last):
  File "<stdin>", line 9, in <module>
  File "/Users/xxxxxx/tf/lib/python3.6/site-packages/pandas/io/gbq.py", line 99, in read_gbq
    **kwargs)
  File "/Users/xxxxxx/tf/lib/python3.6/site-packages/pandas_gbq/gbq.py", line 828, in read_gbq
    final_df = _parse_data(schema, rows)
  File "/Users/xxxxxx/tf/lib/python3.6/site-packages/pandas_gbq/gbq.py", line 729, in _parse_data
    page_array[row_num][col_num] = field_value
ValueError: setting an array element with a sequence.

ISSUE:
'patents-public-data.google_patents_research.publications' - a public dataset.
'embedding_v1' - a repeated float field.
The Google BigQuery tools parse this query without a problem, but pandas-gbq raises the error above.
Maybe related to #101
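
Until this is fixed, one possible workaround (an assumption on my part, not a supported path) is to serialize the repeated field in SQL with `TO_JSON_STRING(embedding_v1)` so pandas-gbq only ever sees a STRING column, then decode it client-side. A minimal sketch of the client-side half, where the literal DataFrame stands in for `read_gbq`'s result:

```python
import json

import pandas as pd

# Stand-in for the result of a query selecting
# TO_JSON_STRING(embedding_v1) AS embedding_v1:
df = pd.DataFrame({'embedding_v1': ['[1.1, 2.2, 3.3]', '[4.4]']})

# Decode each JSON string back into a Python list of floats.
df['embedding_v1'] = df['embedding_v1'].map(json.loads)
print(df['embedding_v1'][0])  # [1.1, 2.2, 3.3]
```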

@tswast
Collaborator

tswast commented Feb 20, 2018

Interesting. Maybe a regression from #101? Or maybe those tests were insufficient?

@jasonqng
Contributor

jasonqng commented Feb 21, 2018

Confirmed: it indeed fails, and only on arrays of floats and timestamps. For example:

gbq.read_gbq("select [1.1,2.2,3.3]","project",dialect="standard")

It fails the same way for timestamp arrays:

gbq.read_gbq(
    "select [TIMESTAMP_SECONDS(1), TIMESTAMP_SECONDS(2), TIMESTAMP_SECONDS(3)]",
    "project", dialect="standard")

Error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jasonng/anaconda2/lib/python2.7/site-packages/pandas/io/gbq.py", line 99, in read_gbq
    **kwargs)
  File "/Users/jasonng/anaconda2/lib/python2.7/site-packages/pandas_gbq/gbq.py", line 828, in read_gbq
    final_df = _parse_data(schema, rows)
  File "/Users/jasonng/anaconda2/lib/python2.7/site-packages/pandas_gbq/gbq.py", line 729, in _parse_data
    page_array[row_num][col_num] = field_value
ValueError: Could not convert object to NumPy datetime
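
The float case can be reproduced with NumPy alone, independent of pandas-gbq: a scalar cell in a float-dtype array cannot hold a list, which is exactly what `page_array[row_num][col_num] = field_value` attempts for a repeated field. A minimal sketch:

```python
import numpy as np

# A cell in a float-dtype array cannot hold a sequence:
page_array = np.zeros((1, 1), dtype=np.dtype(float))
try:
    page_array[0][0] = [1.1, 2.2, 3.3]
except ValueError as e:
    print(e)  # "setting an array element with a sequence. ..."

# An object-dtype array keeps the per-cell list intact:
page_array = np.zeros((1, 1), dtype=object)
page_array[0][0] = [1.1, 2.2, 3.3]
print(page_array[0][0])  # [1.1, 2.2, 3.3]
```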

Which leads us to the custom dtype handling we're doing here in _parse_data:

    dtype_map = {'FLOAT': np.dtype(float),
                 'TIMESTAMP': 'M8[ns]'}

    fields = schema['fields']
    col_types = [field['type'] for field in fields]
    col_names = [str(field['name']) for field in fields]
    col_dtypes = [
        dtype_map.get(field['type'].upper(), object)
        for field in fields
    ]

I assume this has something to do with handling NULLs, as indicated by the link at the top of the function, but I wonder if we can strip this out safely or handle it differently?
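
Assuming the field's mode is available in the schema here (BigQuery schema fields carry a 'mode' key, which is 'REPEATED' for array fields), one option short of stripping the map out entirely is to fall back to object dtype for repeated columns. A rough sketch, with made-up field names:

```python
import numpy as np

# Same dtype map as _parse_data.
dtype_map = {'FLOAT': np.dtype(float), 'TIMESTAMP': 'M8[ns]'}

def col_dtype(field):
    # Hypothetical tweak: a REPEATED field holds a list per row,
    # which only an object-dtype column can store.
    if field.get('mode', '').upper() == 'REPEATED':
        return object
    return dtype_map.get(field['type'].upper(), object)

schema = {'fields': [
    {'name': 'score', 'type': 'FLOAT', 'mode': 'NULLABLE'},
    {'name': 'embedding_v1', 'type': 'FLOAT', 'mode': 'REPEATED'},
]}
print([col_dtype(f) for f in schema['fields']])
```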

@MtDersvan
Author

Thanks!
