fail to read INT64 values on Windows #119
I'm guessing you're on a 32-bit platform, in which case the integer is too big. The easiest solution is to convert to float in your query, or divide by a trillion |
It seems that I have a 64-bit anaconda installation:

```python
import platform
platform.architecture()
```

Some anaconda installation info:
I have 64-bit identifiers there - float or division will not work, as they lose some digits. As a temporary workaround, I just convert them to strings inside the query, like this: `CAST(long_long_id AS STRING)`. But this seems rather ugly |
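A minimal sketch (not from the thread) of why the string round-trip is lossless: the DataFrame below stands in for the result of a hypothetical `SELECT CAST(long_long_id AS STRING) AS long_long_id` query, converted back to a proper 64-bit integer client-side.

```python
import pandas as pd

# Stand-in for a query result where the ID arrived as a string
df = pd.DataFrame({"long_long_id": ["123456789012345678"]})

# Converting back client-side: int64 holds the full value, no digits lost
df["long_long_id"] = df["long_long_id"].astype("int64")
print(df["long_long_id"].iloc[0])  # 123456789012345678
```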
What do you get if you run this?
That said, I think this may be a Python & platform limitation - can you find instances of these numbers working elsewhere from BQ? If that's right, you may get more visibility on your question on Stack Overflow, decoupled from BQ |
Everything is okay with Python itself; 64-bit integers are handled as they should be:

```python
import sys
print(sys.maxsize)
testint = 123456789012345678
testint *= 2
testint >>= 1
print(testint)
```

I got the expected result.
I have no idea how to reproduce this outside of pandas-gbq. When I collect the data using the BigQuery web interface, export it as CSV, and import it with pandas, everything works perfectly. In any case, when I get a value greater than 2^32-1 it fails with the same error; it also does not depend on whether the numbers are calculated as above or retrieved from a regular table - only the magnitude of the integers matters. |
Thanks, very helpful response. The weird thing is that mine works fine:

```python
In [15]: new_df = gbq.read_gbq("SELECT 1234567890123456789 as iii", project_id=project_id, dialect="standard")
Requesting query... ok.
Job ID: 04292a42-7ac0-4bdb-a06a-c14c2083c335
Query running...
Query done.
Processed: 0.0 B Billed: 0.0 B
Standard price: $0.00 USD
Retrieving results...
Got 1 rows.
Total time taken 1.61 s.
Finished at 2018-02-14 10:43:58.

In [16]: new_df
Out[16]:
                   iii
0  1234567890123456789
```

...so I suspect it's a platform-related issue. Can you replicate if you attempt to make a dataframe (without BQ) using a large int? The stack trace doesn't look like anything BQ-specific, and you'll get much more help if we can turn this into a generic pandas question |
Hmm...

```python
from io import StringIO
import pandas as pd

TESTDATA = StringIO("""col1
123456789012345678
""")
df = pd.read_csv(TESTDATA, sep=";")
print(df)
print(df.dtypes)
```

I get:

Maybe this is an issue with type mapping?

```
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas_gbq\gbq.py in read_gbq(query, project_id, index_col, col_order, reauth, verbose, private_key, auth_local_webserver, dialect, **kwargs)
```
|
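An aside, not from the thread: for the CSV path, one way to sidestep whatever default integer the platform picks is to pass an explicit `dtype` to `read_csv` — a sketch:

```python
from io import StringIO
import pandas as pd

TESTDATA = StringIO("""col1
123456789012345678
""")
# An explicit dtype bypasses the platform-dependent default integer size
df = pd.read_csv(TESTDATA, sep=";", dtype={"col1": "int64"})
print(df["col1"].iloc[0])  # 123456789012345678
```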
Right - how about |
Yes, for some reason the 64-bit values from BQ are treated as int32:

```python
query_df = gbq.read_gbq(
    "SELECT CAST(POW(2,31)-1 as INT64) ",
    project_id=project_id,
    dialect="standard",
)
query_df.dtypes
```

Do you have any ideas how it may happen, or, alternatively, how to force a type for a column before the query? |
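A note on why `POW(2,31)-1` is the interesting probe value (my illustration, not from the thread): it is exactly the int32 maximum, so it survives a 32-bit cast, while the very next integer wraps silently.

```python
import numpy as np

# POW(2,31)-1 is the int32 maximum; one more wraps around to the minimum
boundary = 2**31 - 1
as32 = np.array([boundary, boundary + 1], dtype=np.int64).astype(np.int32)
print(as32)  # 2147483647, then -2147483648
```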
OK, at least that's progress. What did the line above return? |
```python
from io import StringIO
import pandas as pd

TESTDATA = StringIO("""col1
123456789012345678
""")
df = pd.read_csv(TESTDATA, sep=";")
print(df)
print("-----------------")
print(df.dtypes)
print("-----------------")
print(df.astype(int))
```

I get:
|
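The wrap the snippet above demonstrates can be reproduced directly (my sketch, not from the thread): forcing the value through int32 storage truncates it two's-complement style, which is what `df.astype(int)` does on platforms where `int` maps to a 32-bit dtype.

```python
import numpy as np

big = 123456789012345678
# Casting through int32 keeps only the low 32 bits, interpreted as signed
wrapped = int(np.array([big], dtype=np.int64).astype(np.int32)[0])
expected = ((big + 2**31) % 2**32) - 2**31  # two's-complement wrap formula
print(wrapped, wrapped == expected)
```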
OK - so it doesn't throw the same error, but it does break (i.e. it wraps). Am I reading that correctly? |
Yes - if I use pandas without GBQ it wraps, and if I use pandas with GBQ it throws the error. |
OK, good synopsis. I would take the latter bug and push it to pandas directly (or maybe it's already out there). It sounds like something Windows-specific, given I don't get it on Mac. I'm sorry not to be more helpful directly with your problem, though. Thanks for pushing this issue this far |
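A possible reason it is Windows-specific (my reading, not stated in the thread): NumPy's dtype for the Python `int` historically followed the C `long`, which is 32 bits on Windows but 64 bits on 64-bit Linux/macOS. A quick check:

```python
import platform

import numpy as np

# np.dtype(int) follows the platform's C `long` in NumPy of this era:
# 4 bytes on Windows, 8 bytes on 64-bit Linux/macOS
print(platform.system(), 8 * np.dtype(int).itemsize, "bit default int")
```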
Closed by #121 |
On pandas-gbq-0.3.0 it fails to get INT64 values:

```python
new_df = gbq.read_gbq(
    "SELECT 12345678901234567 as iii",
    project_id=project_id,
    dialect="standard",
)
```

both on standard and legacy dialects. Querying from the real database does the same.