fail to read INT64 values on Windows #119

Closed
blose opened this issue Feb 13, 2018 · 14 comments


@blose
Contributor

blose commented Feb 13, 2018

On pandas-gbq 0.3.0, it fails to read INT64 values:
new_df = gbq.read_gbq( "SELECT 12345678901234567 as iii", project_id=project_id, dialect="standard" )

on both standard and legacy dialects. Querying a real table does the same.

Requesting query... ok.
Job ID: 82d0024d-3e31-4518-a6d5-755cf96f108c
Query running...
Query done.
Processed: 0.0 B Billed: 0.0 B
Standard price: $0.00 USD

Retrieving results...
Got 1 rows.


OverflowError                             Traceback (most recent call last)
<ipython-input-…> in <module>()
7 top10_active_users_df = gbq.read_gbq(
8 "SELECT 12345678901234567 as iii",
----> 9 project_id=project_id
10 )

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\gbq.py in read_gbq(query, project_id, index_col, col_order, reauth, verbose, private_key, dialect, **kwargs)
98 private_key=private_key,
99 dialect=dialect,
--> 100 **kwargs)
101
102

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas_gbq\gbq.py in read_gbq(query, project_id, index_col, col_order, reauth, verbose, private_key, auth_local_webserver, dialect, **kwargs)
875 field['mode'] != 'repeated':
876 final_df[field['name']] = \
--> 877 final_df[field['name']].astype(type_map[field['type'].upper()])
878
879 connector.print_elapsed_seconds(

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
89 else:
90 kwargs[new_arg_name] = new_arg_value
---> 91 return func(*args, **kwargs)
92 return wrapper
93 return _deprecate_kwarg

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py in astype(self, dtype, copy, errors, **kwargs)
3408 # else, only a single dtype is given
3409 new_data = self._data.astype(dtype=dtype, copy=copy, errors=errors,
-> 3410 **kwargs)
3411 return self._constructor(new_data).__finalize__(self)
3412

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py in astype(self, dtype, **kwargs)
3222
3223 def astype(self, dtype, **kwargs):
-> 3224 return self.apply('astype', dtype=dtype, **kwargs)
3225
3226 def convert(self, **kwargs):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py in apply(self, f, axes, filter, do_integrity_check, consolidate, **kwargs)
3089
3090 kwargs['mgr'] = self
-> 3091 applied = getattr(b, f)(**kwargs)
3092 result_blocks = _extend_blocks(applied, result_blocks)
3093

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py in astype(self, dtype, copy, errors, values, **kwargs)
469 def astype(self, dtype, copy=False, errors='raise', values=None, **kwargs):
470 return self._astype(dtype, copy=copy, errors=errors, values=values,
--> 471 **kwargs)
472
473 def _astype(self, dtype, copy=False, errors='raise', values=None,

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py in _astype(self, dtype, copy, errors, values, klass, mgr, raise_on_error, **kwargs)
519
520 # _astype_nansafe works fine with 1-d only
--> 521 values = astype_nansafe(values.ravel(), dtype, copy=True)
522 values = values.reshape(self.shape)
523

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\dtypes\cast.py in astype_nansafe(arr, dtype, copy)
623 elif arr.dtype == np.object_ and np.issubdtype(dtype.type, np.integer):
624 # work around NumPy brokenness, #1987
--> 625 return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
626
627 if dtype.name in ("datetime64", "timedelta64"):

pandas\_libs\lib.pyx in pandas._libs.lib.astype_intsafe()

pandas/_libs/src\util.pxd in util.set_value_at_unsafe()

OverflowError: Python int too large to convert to C long

@max-sixty
Contributor

I'm guessing you're on a 32-bit platform, in which case the integer is too big. The easiest solution is to convert to float in your query, or divide by a trillion.
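As a side note on the convert-to-float suggestion: a sketch below (plain Python, no BQ needed) shows why a float cast is lossy for 17-digit IDs, since a float64 carries only 53 bits of mantissa (roughly 15-16 significant decimal digits).

```python
# Why casting a 17-digit ID to float loses information: float64 has a
# 53-bit mantissa, so integers above 2**53 can't all be represented.
big_id = 12345678901234567        # the value from the failing query
as_float = float(big_id)

print(int(as_float))              # -> 12345678901234568 (rounded)
print(int(as_float) == big_id)    # -> False
```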

@blose
Contributor Author

blose commented Feb 14, 2018

It seems that I have a 64-bit Anaconda installation.

import platform
platform.architecture()

('64bit', 'WindowsPE')

Some Anaconda installation info:

(base) C:\Users\maxim.belov>conda info

     active environment : base
    active env location : C:\Users\maxim.belov\AppData\Local\Continuum\anaconda3
            shell level : 1
       user config file : C:\Users\maxim.belov\.condarc
 populated config files : C:\Users\maxim.belov\.condarc
          conda version : 4.4.4
    conda-build version : 3.0.27
         python version : 3.6.3.final.0
       base environment : C:\Users\maxim.belov\AppData\Local\Continuum\anaconda3  (writable)
           channel URLs : https://repo.continuum.io/pkgs/main/win-64
                          https://repo.continuum.io/pkgs/main/noarch
                          https://repo.continuum.io/pkgs/free/win-64
                          https://repo.continuum.io/pkgs/free/noarch
                          https://repo.continuum.io/pkgs/r/win-64
                          https://repo.continuum.io/pkgs/r/noarch
                          https://repo.continuum.io/pkgs/pro/win-64
                          https://repo.continuum.io/pkgs/pro/noarch
                          https://repo.continuum.io/pkgs/msys2/win-64
                          https://repo.continuum.io/pkgs/msys2/noarch
          package cache : C:\Users\maxim.belov\AppData\Local\Continuum\anaconda3\pkgs
                          C:\Users\maxim.belov\AppData\Local\conda\conda\pkgs
       envs directories : C:\Users\maxim.belov\AppData\Local\Continuum\anaconda3\envs
                          C:\Users\maxim.belov\AppData\Local\conda\conda\envs
                          C:\Users\maxim.belov\.conda\envs
               platform : win-64
             user-agent : conda/4.4.4 requests/2.18.4 CPython/3.6.3 Windows/10 Windows/10.0.16299
          administrator : False
             netrc file : None
           offline mode : False

I have 64-bit identifiers there, so float or division won't work: both lose digits. As a temporary workaround, I just convert them to strings inside the query, like:

CAST(long_long_id AS STRING)

But this seems like a rather ugly hack.
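For completeness, a minimal sketch of the client side of that workaround: once the value arrives as text, converting it with an explicitly named 'int64' dtype keeps all the digits (the column name long_long_id is from the query above; the DataFrame here just simulates what read_gbq would return).

```python
import pandas as pd

# Simulate the result of SELECT CAST(long_long_id AS STRING): the column
# arrives as text with full precision intact.
df = pd.DataFrame({"long_long_id": ["123456789012345678"]})

# Naming the dtype explicitly ('int64') avoids the platform C long that a
# plain astype(int) would use (32-bit on Windows).
df["long_long_id"] = df["long_long_id"].astype("int64")
print(df["long_long_id"].iloc[0])   # -> 123456789012345678
```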

@max-sixty
Contributor

What do you get if you run this?

In [1]: import sys

In [2]: sys.maxsize
Out[2]: 9223372036854775807

That said, I think this may be a Python & platform limitation - can you find instances of these numbers working elsewhere from BQ? If that's right, you may get more visibility on your question on SO, decoupled from BQ.

@blose
Contributor Author

blose commented Feb 14, 2018

Everything is okay with Python itself; 64-bit integers are handled as they should be:

import sys
print(sys.maxsize)

testint = 123456789012345678
testint *= 2
testint >>= 1
print(testint)

I got, as expected:

9223372036854775807
123456789012345678

can you find instances of these numbers working elsewhere from BQ?

I have no idea how to reproduce this outside of pandas-gbq. When I collect the data using the BigQuery web interface, export it as CSV, and import it with pandas, everything works perfectly.

In any case, any value greater than 2^31-1 fails with the same error:

SELECT CAST(POW(2,31)-1 as INT64) is okay
SELECT CAST(POW(2,31) as INT64) fails

It also doesn't matter whether the numbers are computed as above or retrieved from a regular table; only the magnitude of the integers matters.
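The boundary observed above lines up exactly with the signed 32-bit integer range: POW(2,31)-1 is the int32 maximum, which is the width of a C long on Windows. A quick check:

```python
import numpy as np

# The failure boundary matches the int32 range exactly: 2**31 - 1 is the
# largest value a 32-bit signed integer (a C long on Windows) can hold.
print(np.iinfo(np.int32).max)   # -> 2147483647, i.e. 2**31 - 1
print(np.iinfo(np.int64).max)   # -> 9223372036854775807
```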

@max-sixty
Contributor

max-sixty commented Feb 14, 2018

Thanks, v helpful response.

The weird thing is that mine works fine:

In [15]: new_df = gbq.read_gbq( "SELECT 1234567890123456789 as iii", project_id=project_id, dialect="standard" )
Requesting query... ok.
Job ID: 04292a42-7ac0-4bdb-a06a-c14c2083c335
Query running...
Query done.
Processed: 0.0 B Billed: 0.0 B
Standard price: $0.00 USD

Retrieving results...
Got 1 rows.

Total time taken 1.61 s.
Finished at 2018-02-14 10:43:58.

In [16]: new_df
Out[16]:
                   iii
0  1234567890123456789

...so I suspect it's a platform related issue.

Can you replicate it if you attempt to make a dataframe (without BQ) using a large int? The stack trace doesn't look like anything BQ-specific, and you'll get much more help if we can turn this into a generic pandas question.

@blose
Contributor Author

blose commented Feb 14, 2018

Hmm...

from io import StringIO
import pandas as pd
TESTDATA=StringIO("""col1
    123456789012345678
    """)
df = pd.read_csv(TESTDATA, sep=";")
print(df)
print(df.dtypes)

I get

                 col1
0  123456789012345678
col1    int64
dtype: object

Maybe this is an issue with type mapping?

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas_gbq\gbq.py in read_gbq(query, project_id, index_col, col_order, reauth, verbose, private_key, auth_local_webserver, dialect, **kwargs)
875 field['mode'] != 'repeated':
876 final_df[field['name']] = \
--> 877 final_df[field['name']].astype(type_map[field['type'].upper()])
878
879 connector.print_elapsed_seconds(

@max-sixty
Contributor

Right - how about df.astype(int)? I think that should do the same thing pandas-gbq is doing.

@blose
Contributor Author

blose commented Feb 14, 2018

Yes, for some reason the 64-bit values from BQ are treated as int32:

query_df = gbq.read_gbq(
    "SELECT CAST(POW(2,31)-1 as INT64) ",
    project_id=project_id,
    dialect="standard"
)
query_df.dtypes
f0_    int32
dtype: object

Do you have any ideas how it may happen, or, alternatively, how to force a type for a column before the query?

@max-sixty
Contributor

max-sixty commented Feb 14, 2018

OK, at least that's progress.

What did the line above return - df.astype(int) on the df from your previous comment?

@blose
Contributor Author

blose commented Feb 14, 2018

from io import StringIO

import pandas as pd

TESTDATA=StringIO("""col1
    123456789012345678
    """)

df = pd.read_csv(TESTDATA, sep=";")
print(df)
print("-----------------")
print(df.dtypes)
print("-----------------")
print(df.astype(int))

I get

                 col1
0  123456789012345678
-----------------
col1    int64
dtype: object
-----------------
         col1
0 -1506741426
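The -1506741426 printed above is not random: it is exactly the low 32 bits of 123456789012345678 reinterpreted as a signed 32-bit integer, which is what a silent int32 wraparound produces. A plain-Python check:

```python
# Reproduce the wrapped value by hand: keep the low 32 bits of the original
# number, then reinterpret them as a signed 32-bit integer.
big = 123456789012345678
low32 = big & 0xFFFFFFFF        # low 32 bits
if low32 >= 2**31:              # reinterpret as signed int32
    low32 -= 2**32
print(low32)                    # -> -1506741426, matching the output above
```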

@max-sixty
Contributor

OK - so it doesn't throw the same error, but it does break (i.e. it wraps). Am I reading that correctly?

@blose
Contributor Author

blose commented Feb 14, 2018

Yes. If I use pandas without GBQ:

  • int64 values are handled correctly
  • explicit conversion to int leads to silent integer overflow

If I use pandas with GBQ:

  • the INT64 SQL type is mapped to the int32 dtype for some reason
  • it throws an error if an overflow occurs
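A plausible explanation for both symptoms, offered here as an assumption rather than a confirmed diagnosis: astype(int) resolves Python's int to the platform C long, which is 32-bit on Windows even on 64-bit builds. The portable workaround is to name the dtype explicitly:

```python
import pandas as pd

# Naming the dtype as the string 'int64' gives the same 64-bit width on
# every platform, unlike astype(int), which follows the C long
# (32-bit on Windows, 64-bit on most Unix builds).
df = pd.DataFrame({"col1": [123456789012345678]})

safe = df.astype("int64")           # explicit width, portable result
print(safe["col1"].iloc[0])         # -> 123456789012345678
print(safe.dtypes["col1"])          # -> int64
```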

@max-sixty
Contributor

OK, good synopsis.

I would take the latter bug and push it to pandas directly (or maybe it's already out there). It sounds like something Windows-specific, given I don't get it on Mac.

I'm sorry not to be more helpful on your problem directly. Thanks for pushing this issue this far.

@tswast tswast changed the title fail to read INT64 values fail to read INT64 values on Windows Feb 23, 2018
@max-sixty
Contributor

Closed by #121
