Encode before uploading #108
Conversation
Codecov Report
@@ Coverage Diff @@
## master #108 +/- ##
===========================================
- Coverage 73.92% 28.26% -45.66%
===========================================
Files 4 4
Lines 1507 1560 +53
===========================================
- Hits 1114 441 -673
- Misses 393 1119 +726
Continue to review full report at Codecov.
Travis passes. Not sure what's going on with Codecov, but I think the PR should be good.
pandas_gbq/tests/test_gbq.py (outdated)

            self.destination_table + test_id),
            project_id=_get_project_id())

        assert result['num_rows'][0] == test_si
Typo: test_size
Also, perhaps a test that downloads the uploaded dataframe and verifies that the special characters are preserved would be nice?
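As a local stand-in for that suggestion (a sketch only; it exercises the newline-delimited JSON serialization path rather than an actual BigQuery round trip), one can serialize a frame containing non-ASCII text and read it back to check that the special characters survive:

```python
import pandas as pd
from io import StringIO

# Hypothetical local check: serialize a frame with accented characters to
# newline-delimited JSON (the format the loader uploads) and parse it back.
df = pd.DataFrame({'string': ['María', 'naïve']})
body = df.to_json(orient='records', lines=True)
result = pd.read_json(StringIO(body), lines=True)

# The special characters should be preserved through the round trip.
assert result['string'].tolist() == ['María', 'naïve']
```

This does not touch BigQuery itself, so it complements rather than replaces the download-and-compare test suggested above.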
Currently fails in my Python 2 env (error below) because the returned object from
Above tweak works for me, and both your tests now pass. Verified data is good in GUI and reading back via pandas-gbq.
Nice catch and fix! Error:
@jasonqng thank you very much. Even if it's a bit of a hack, I think it's good for the moment.
pandas_gbq/tests/test_gbq.py (outdated)

            project_id=_get_project_id())

        assert result['num_rows'][0] == test_size
        tm.assert_series_equal(result['string'], df['string'])
You'll want to either sort the `result` dataframe by `integer` or do an `ORDER BY` in the query to ensure that the assertion will pass (otherwise, the rows of `result` could be in a different order than your original `df`).
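A minimal sketch of that sorting fix (the frames and column names here are made up for illustration, standing in for the query result and the uploaded dataframe):

```python
import pandas as pd

# Hypothetical frames: 'result' arrives from the query in arbitrary row order,
# 'df' is the original uploaded dataframe.
df = pd.DataFrame({'integer': [1, 2], 'string': ['a', 'b']})
result = pd.DataFrame({'integer': [2, 1], 'string': ['b', 'a']})

# Sort by the shared integer column and reset the index so the
# row-by-row comparison lines up.
result = result.sort_values('integer').reset_index(drop=True)
pd.testing.assert_series_equal(result['string'], df['string'])
```

Adding an `ORDER BY integer` to the test query would achieve the same thing server-side.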
@jasonqng on second try, unfortunately that solution didn't work for Python 3. I've pushed something that's not elegant either, but it works. The one thing that doesn't work is the comparison when testing in Py2. If you look at the df it produces, it looks good, but comparing the strings is not successful. So I've skipped a subset of the test for the moment.
@@ -581,7 +581,11 @@ def load_data(self, dataframe, dataset_id, table_id, chunksize):
                 self._print("\rLoad is {0}% Complete".format(
                     ((total_rows - remaining_rows) * 100) / total_rows))

-                body = StringIO('{}\n'.format('\n'.join(rows)))
+                body = '{}\n'.format('\n'.join(rows))
If you use `u'{}\n'.format(u'\n'.join(rows))`, is the `if` statement checking for `bytes` necessary?
Unfortunately not:

    >       body = u'{}\n'.format(u'\n'.join(rows))
    E       UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 77: ordinal not in range(128)
I think the nub of the problem is that `row.to_json` comes out as either bytes or str depending on the Python version, so we need some branching somewhere, unless there's a function in Python that can deal with both (this all seems a bit inelegant).
(I also tried decoding the row first on line 576, which made Py2 pass, but then Python 3 failed, because it can't decode unicode.)
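One way to centralize that branching (a sketch only, not the code merged in this PR; the helper name is made up) is to normalize every row to text before joining, then encode the whole body to UTF-8 bytes once at the end:

```python
def rows_to_utf8_body(rows):
    # rows may be str or bytes depending on how row.to_json behaves on a
    # given Python version; normalize everything to text first so the
    # join never mixes bytes with unicode.
    text_rows = [
        r.decode('utf-8') if isinstance(r, bytes) else r for r in rows
    ]
    # Join into one newline-delimited body and encode once for upload.
    return ('\n'.join(text_rows) + '\n').encode('utf-8')

# Mixed bytes/str input comes out as a single UTF-8 byte string.
body = rows_to_utf8_body([b'{"string": "Mar\xc3\xada"}', '{"string": "b"}'])
```

This keeps the version-dependent branching in one place instead of scattering `isinstance` checks through `load_data`.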
Please add this to the changelog at https://github.com/pydata/pandas-gbq/blob/master/docs/source/changelog.rst under a new section heading.
@tswast added whatsnew
The build failed for this after merging because the test was waiting on user credentials:
I'm not entirely sure why this is happening, since this test is in the
Hmmm. I did add some tests to that class. I've removed them in this branch: https://github.com/maxim-lian/pandas-gbq/tree/test-fix (and included a fix that I didn't carry over to those in the ServiceAccount class). Could you try running that in Travis?
Confirming that this made the following (note the accented character) work again for me on Python 3.6.3:

    pd.DataFrame({'my_string': ['María']}).to_gbq('xxx.yyy', 'project-id')
@maxim-lian That looks like it should fix it, thanks. Build running at https://travis-ci.org/tswast/pandas-gbq/builds/330577200

Note: there are instructions at https://pandas-gbq.readthedocs.io/en/latest/contributing.html#running-google-bigquery-integration-tests for setting up your personal fork to build with integration tests on Travis.
Ah great! Not sure how I missed that. That's super.
Ugh. Still failing with a timeout on Travis.
It's not asking for an authorization code anymore, so I'm not sure how it's getting stuck.
I think I figured it out. These tests need a private key to be manually set. Building at https://travis-ci.org/tswast/pandas-gbq/builds/330598420
Ooof, sorry to leave you with that. #109 would solve all our problems forevermore.
Potential fix for #106
...but someone with a better bytes/str understanding needs to review.