ValueError: setting an array element with a sequence. #123


Closed
MtDersvan opened this issue Feb 17, 2018 · 3 comments

Comments

@MtDersvan

INFO:
pandas_gbq.__version__ : '0.3.0', '0.3.1'
python: 3.6.2, 3.5.2

SNIPPET:

from pandas.io import gbq
df  = gbq.read_gbq(
    """
    #standardSQL
    SELECT embedding_v1
    FROM `{TABLE_ID}` LIMIT 10
    """.format(TABLE_ID='patents-public-data.google_patents_research.publications'),
    dialect='standard',
    project_id='XXXXXXX',
    configuration={'query': {'useQueryCache': True}}
    )

ERROR:

Traceback (most recent call last):
  File "<stdin>", line 9, in <module>
  File "/Users/xxxxxx/tf/lib/python3.6/site-packages/pandas/io/gbq.py", line 99, in read_gbq
    **kwargs)
  File "/Users/xxxxxx/tf/lib/python3.6/site-packages/pandas_gbq/gbq.py", line 828, in read_gbq
    final_df = _parse_data(schema, rows)
  File "/Users/xxxxxx/tf/lib/python3.6/site-packages/pandas_gbq/gbq.py", line 729, in _parse_data
    page_array[row_num][col_num] = field_value
ValueError: setting an array element with a sequence.

ISSUE:
'patents-public-data.google_patents_research.publications' - a public dataset.
'embedding_v1' - a repeated float field.
The Google BigQuery tools parse this query without a problem, but pandas-gbq raises the error above.
Maybe related to #101
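
Until this is fixed, one possible workaround (an assumption on my part, not a supported path) is to serialize the repeated field in SQL with `TO_JSON_STRING(embedding_v1)` so pandas-gbq only ever sees a STRING column, then decode it client-side. A minimal sketch of the client-side half, where the literal DataFrame stands in for `read_gbq`'s result:

```python
import json

import pandas as pd

# Stand-in for the result of a query selecting
# TO_JSON_STRING(embedding_v1) AS embedding_v1:
df = pd.DataFrame({'embedding_v1': ['[1.1, 2.2, 3.3]', '[4.4]']})

# Decode each JSON string back into a Python list of floats.
df['embedding_v1'] = df['embedding_v1'].map(json.loads)
print(df['embedding_v1'][0])  # [1.1, 2.2, 3.3]
```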

@tswast
Collaborator

tswast commented Feb 20, 2018

Interesting. Maybe a regression from #101? Or maybe those tests were insufficient?

@jasonqng
Contributor

jasonqng commented Feb 21, 2018

Confirmed: it indeed fails, and only on arrays of floats and timestamps. For example:

gbq.read_gbq("select [1.1,2.2,3.3]","project",dialect="standard")

It fails the same way for timestamp arrays:

gbq.read_gbq(
    "select [TIMESTAMP_SECONDS(1), TIMESTAMP_SECONDS(2), TIMESTAMP_SECONDS(3)]",
    "project", dialect="standard")

Error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jasonng/anaconda2/lib/python2.7/site-packages/pandas/io/gbq.py", line 99, in read_gbq
    **kwargs)
  File "/Users/jasonng/anaconda2/lib/python2.7/site-packages/pandas_gbq/gbq.py", line 828, in read_gbq
    final_df = _parse_data(schema, rows)
  File "/Users/jasonng/anaconda2/lib/python2.7/site-packages/pandas_gbq/gbq.py", line 729, in _parse_data
    page_array[row_num][col_num] = field_value
ValueError: Could not convert object to NumPy datetime
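
The float case can be reproduced with NumPy alone, independent of pandas-gbq: a scalar cell in a float-dtype array cannot hold a list, which is exactly what `page_array[row_num][col_num] = field_value` attempts for a repeated field. A minimal sketch:

```python
import numpy as np

# A cell in a float-dtype array cannot hold a sequence:
page_array = np.zeros((1, 1), dtype=np.dtype(float))
try:
    page_array[0][0] = [1.1, 2.2, 3.3]
except ValueError as e:
    print(e)  # "setting an array element with a sequence. ..."

# An object-dtype array keeps the per-cell list intact:
page_array = np.zeros((1, 1), dtype=object)
page_array[0][0] = [1.1, 2.2, 3.3]
print(page_array[0][0])  # [1.1, 2.2, 3.3]
```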

Which leads us to the custom dtype handling we're doing here in _parse_data:

    dtype_map = {'FLOAT': np.dtype(float),
                 'TIMESTAMP': 'M8[ns]'}

    fields = schema['fields']
    col_types = [field['type'] for field in fields]
    col_names = [str(field['name']) for field in fields]
    col_dtypes = [
        dtype_map.get(field['type'].upper(), object)
        for field in fields
    ]

I assume this has something to do with handling NULLs, as indicated by the link at the top of the function, but I wonder if we can strip this out safely or handle it differently?
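
Assuming the field's mode is available in the schema here (BigQuery schema fields carry a 'mode' key, which is 'REPEATED' for array fields), one option short of stripping the map out entirely is to fall back to object dtype for repeated columns. A rough sketch, with made-up field names:

```python
import numpy as np

# Same dtype map as _parse_data.
dtype_map = {'FLOAT': np.dtype(float), 'TIMESTAMP': 'M8[ns]'}

def col_dtype(field):
    # Hypothetical tweak: a REPEATED field holds a list per row,
    # which only an object-dtype column can store.
    if field.get('mode', '').upper() == 'REPEATED':
        return object
    return dtype_map.get(field['type'].upper(), object)

schema = {'fields': [
    {'name': 'score', 'type': 'FLOAT', 'mode': 'NULLABLE'},
    {'name': 'embedding_v1', 'type': 'FLOAT', 'mode': 'REPEATED'},
]}
print([col_dtype(f) for f in schema['fields']])
```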

@MtDersvan
Author

Thanks!
