BUG: Fix uploading of dataframes containing int64 and float64 columns #117
Fixes googleapis#116 and googleapis#96 by loading data in CSV chunks.
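Loading "in CSV chunks" means each piece of the DataFrame is serialized to CSV text and handed to the load job as a file-like object. Below is a minimal sketch of that serialization step. It assumes a hypothetical helper named `encode_chunk`; only the final `six.BytesIO(body)` return is visible in the diff, so the rest is illustrative, and it uses the standard-library `io.BytesIO` in place of `six`.

```python
import io

import pandas as pd


def encode_chunk(dataframe):
    """Serialize a DataFrame to UTF-8 CSV bytes in an in-memory file.

    Hypothetical sketch: no header row and no index are written, so each
    CSV row holds only the column values, in the DataFrame's column order.
    """
    csv_buffer = io.StringIO()
    dataframe.to_csv(csv_buffer, index=False, header=False)
    body = csv_buffer.getvalue().encode("utf-8")
    return io.BytesIO(body)
```

For example, `encode_chunk(pd.DataFrame({"id": [1, 2], "score": [0.5, 1.25]}))` returns a file-like object whose lines are `1,0.5` and `2,1.25`. The int64 and float64 values become plain text, so no JSON conversion of NumPy scalar types is involved.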
This looks superb!
pandas_gbq/_load.py
    return six.BytesIO(body)


def encode_chunks(dataframe, chunksize):
Why the multiple chunks, rather than using a single chunk? Is it a memory issue? A UI / status bar updating issue?
Because chunksize was previously required and I wasn't ready to make it optional. I've just added a commit to this PR that makes it optional. We'll want to update the default in pandas after we release a package with this change.
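A sketch of what a chunked encoder with an optional `chunksize` could look like. The signature mirrors `encode_chunks(dataframe, chunksize)` from the diff, but the body is an assumption: `chunksize=None` sends the whole frame as one chunk, and each chunk is yielded together with a hypothetical "rows remaining" count that a caller could use for progress reporting.

```python
import io

import pandas as pd


def encode_chunks(dataframe, chunksize=None):
    """Yield (remaining_rows, csv_file) pairs, one per chunk of rows.

    Hypothetical sketch: chunksize=None (or a chunksize at least as large
    as the frame) produces a single chunk; otherwise rows are sliced into
    groups of at most `chunksize`.
    """
    total_rows = len(dataframe)
    if chunksize is None or chunksize >= total_rows:
        chunksize = max(total_rows, 1)  # avoid a zero step for empty frames
    for start in range(0, total_rows, chunksize):
        chunk = dataframe.iloc[start:start + chunksize]
        buffer = io.StringIO()
        chunk.to_csv(buffer, index=False, header=False)
        # Rows still to be sent after this chunk (assumed progress metric).
        remaining_rows = max(total_rows - start - chunksize, 0)
        yield remaining_rows, io.BytesIO(buffer.getvalue().encode("utf-8"))
```

With five rows and `chunksize=2`, this yields three chunks with 3, 1, and 0 rows remaining. With the default it yields a single chunk, which matches the single-load behavior the reviewer asked about.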
@@ -0,0 +1,26 @@
FYI this file name currently has two underscores
Thanks, I'm aware. I'm following the convention that a test file is named test_ plus the name of the file under test, so _load.py gets test__load.py.
Codecov Report
@@            Coverage Diff             @@
##           master     #117      +/-   ##
==========================================
+ Coverage   28.25%   30.99%   +2.74%
==========================================
  Files           4        8       +4
  Lines        1561     1626      +65
==========================================
+ Hits          441      504      +63
- Misses       1120     1122       +2
Full Travis build with system tests running at https://travis-ci.org/tswast/pandas-gbq/builds/339826702
Also, fixes lint errors.
cc219bf to 64ff345
Ah, Travis uncovered a potential problem with using CSV.
I think I need to include the schema definition in the load job, since we want to be able to upload a DataFrame even if the columns are out of order or there are extra columns.
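Why the schema matters for CSV: a headerless CSV row carries no column names, so values can only be matched to fields by position. The sketch below is plain Python and pandas, not BigQuery code, and uses a hypothetical helper `csv_rows_with_names` to illustrate positional matching. Reading the rows with the field order of an existing table whose columns are ordered differently puts values in the wrong fields. Naming the positions from the DataFrame's own column order, as a schema generated from the DataFrame would, lines them up.

```python
import csv
import io

import pandas as pd


def csv_rows_with_names(dataframe, field_names):
    """Pair each headerless CSV value with the field name at its position.

    Illustrative sketch of positional matching: the reader of a CSV row
    only knows each value's index, not which column it came from.
    """
    buffer = io.StringIO()
    dataframe.to_csv(buffer, index=False, header=False)
    buffer.seek(0)
    return [dict(zip(field_names, row)) for row in csv.reader(buffer)]


df = pd.DataFrame({"age": [30], "name": ["alice"]})
# Field order from a hypothetical existing table with columns (name, age):
wrong = csv_rows_with_names(df, ["name", "age"])
# Field order from the DataFrame itself, i.e. a schema sent with the job:
right = csv_rows_with_names(df, list(df.columns))
```

Here `wrong` maps `"30"` to `name`, while `right` keeps `age` and `name` paired with their own values.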
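Yes, for sure. I think you can do that fairly easily with the following (we have this in our code, but I can't remember why we construct the SchemaField objects):

from google.cloud import bigquery
from pandas_gbq import gbq as _gbq  # assumed import for the _gbq alias

def _bq_schema(df):
    # Convert the dict from _generate_bq_schema into SchemaField objects.
    schema_dict = _gbq._generate_bq_schema(df)
    schema = [bigquery.schema.SchemaField(x['name'], x['type'])
              for x in schema_dict['fields']]
    return schema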
I imagine you had to do the object construction as a workaround for googleapis/google-cloud-python#4456.
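`_bq_schema` relies on `_generate_bq_schema` returning a dict with a `fields` list of `name`/`type` entries. Below is a sketch of what such a helper might look like. The name `generate_bq_schema` is hypothetical, and the dtype-to-type mapping is an assumption modeled loosely on that helper rather than copied from pandas-gbq. It maps NumPy dtype kinds to BigQuery type names and falls back to STRING for anything unmapped.

```python
import pandas as pd

# Assumed mapping from NumPy dtype.kind to BigQuery type names;
# kinds not listed here fall back to the default type.
_DTYPE_KIND_TO_BQ_TYPE = {
    "i": "INTEGER",    # e.g. int64
    "b": "BOOLEAN",
    "f": "FLOAT",      # e.g. float64
    "O": "STRING",
    "S": "STRING",
    "U": "STRING",
    "M": "TIMESTAMP",  # datetime64
}


def generate_bq_schema(dataframe, default_type="STRING"):
    """Return {'fields': [{'name': ..., 'type': ...}, ...]} in column order."""
    fields = []
    for column_name, dtype in dataframe.dtypes.items():
        fields.append({
            "name": str(column_name),
            "type": _DTYPE_KIND_TO_BQ_TYPE.get(dtype.kind, default_type),
        })
    return {"fields": fields}
```

Because the fields follow the DataFrame's column order, a schema built this way matches the positional layout of the CSV rows the DataFrame produces.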
Okay, I think I got it this time. Full build in progress at https://travis-ci.org/tswast/pandas-gbq/builds/340616570
e28bce8 to b8c933d
Congrats @tswast! Thanks for pushing this through!
Yeah! I plan to do a release this week to get all of these out.