default filters for v2 object dtype are wrong #2627

Open
d-v-b opened this issue Jan 2, 2025 · 3 comments
Labels
bug Potential issues with the zarr-python library
Milestone

Comments

@d-v-b
Contributor

d-v-b commented Jan 2, 2025

This example does not work on main:

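import numpy as np
from zarr import create

def test_x() -> None:
    # Create a Zarr v2 array with an object dtype and no explicit filters.
    array = create(
        store={},
        path='foo',
        dtype='O',
        zarr_format=2,
        shape=(3,),
    )
    # Write Python strings into the object array.
    array[:] = np.array(['a', 'b', 'c'], dtype='O')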

The problem is that zarr chooses the wrong default filter for O dtype arrays: it chooses VlenBytes when it should choose VlenUTF8. Since the structure of the default codecs is likely to change soon, the fix should probably be made in the context of #2463.
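
To illustrate why the choice of filter matters, here is a minimal pure-Python sketch of two length-prefixed variable-length encoders. The functions `encode_vlen_bytes`, `encode_vlen_utf8` and `decode_vlen_utf8` are hypothetical stand-ins, not the numcodecs implementations, and the byte layout is an assumption made for illustration. The bytes-only encoder rejects `str` items, which is the kind of failure a string-valued O array would hit when paired with a bytes filter.

```python
import struct


def _pack(items):
    # Assumed layout: 4-byte item count, then a 4-byte length before each item.
    out = [struct.pack("<I", len(items))]
    for item in items:
        out.append(struct.pack("<I", len(item)))
        out.append(item)
    return b"".join(out)


def encode_vlen_bytes(values):
    """Hypothetical bytes-only encoder: every item must already be bytes."""
    for v in values:
        if not isinstance(v, bytes):
            raise TypeError(f"expected bytes, found {v!r}")
    return _pack(list(values))


def encode_vlen_utf8(values):
    """Hypothetical text encoder: every item must be str, stored as UTF-8."""
    for v in values:
        if not isinstance(v, str):
            raise TypeError(f"expected str, found {v!r}")
    return _pack([v.encode("utf-8") for v in values])


def decode_vlen_utf8(buf):
    """Inverse of encode_vlen_utf8 under the same assumed layout."""
    (n,) = struct.unpack_from("<I", buf, 0)
    pos, out = 4, []
    for _ in range(n):
        (length,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        out.append(buf[pos:pos + length].decode("utf-8"))
        pos += length
    return out


data = ["a", "b", "c"]

# The bytes-only encoder refuses Python strings.
try:
    encode_vlen_bytes(data)
    bytes_ok = True
except TypeError:
    bytes_ok = False

# The UTF-8 encoder round-trips the same data.
roundtrip = decode_vlen_utf8(encode_vlen_utf8(data))
```

In this sketch `bytes_ok` ends up `False` while `roundtrip` equals the input list, which mirrors the mismatch described above: string data needs a text-aware filter.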

cc @dcherian

@dstansby added the bug label on Jan 2, 2025
@dstansby
Contributor

dstansby commented Jan 3, 2025

This seems pretty major, so I've added it to the 3.0 milestone. We should at least document this as a known issue!

@dstansby added this to the 3.0.0 milestone on Jan 3, 2025
@d-v-b
Contributor Author

d-v-b commented Jan 4, 2025

This is still broken on main. I can "fix" it by associating the dtype string O with the VlenUTF8 codec instead of VlenBytes, but that shouldn't work in general for O dtype arrays, since they can hold objects other than strings (and it breaks some tests).

Knowing little about how zarr encodes arbitrary Python objects, I checked how this works in v2, and it seems that zarr.create(dtype='O') would error if an object_codec was not provided. I'm not sure we want to emulate that. The original bug report came from xarray's integration tests with zarr main; I can look into exactly how those tests call the zarr APIs.
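
As a rough sketch of the two policies mentioned here, the snippet below contrasts a v2-style rule (refuse an O dtype unless a codec is named) with a rule that guesses from the data. Both helpers, `require_object_codec` and `infer_object_codec`, are hypothetical and are not zarr APIs; the strings "vlen-utf8" and "vlen-bytes" are used only as labels for the two filters discussed in this issue.

```python
def require_object_codec(dtype, object_codec=None):
    """Hypothetical v2-style policy: an O dtype needs an explicit codec."""
    if dtype != "O":
        return None  # non-object dtypes need no object codec
    if object_codec is None:
        raise ValueError("missing object_codec for object array")
    return object_codec


def infer_object_codec(sample):
    """Hypothetical alternative: guess the codec from the values themselves."""
    if all(isinstance(v, str) for v in sample):
        return "vlen-utf8"
    if all(isinstance(v, bytes) for v in sample):
        return "vlen-bytes"
    # Mixed or arbitrary objects: no single default is safe.
    raise TypeError("cannot infer a codec for these objects")


# v2-style policy: the caller must say which codec to use.
try:
    require_object_codec("O")
    explicit_needed = False
except ValueError:
    explicit_needed = True

# Data-driven policy: the same dtype maps to different codecs.
codec_for_text = infer_object_codec(["a", "b", "c"])
codec_for_bytes = infer_object_codec([b"a", b"b"])
```

The point of the sketch is that the dtype string O alone does not determine a correct filter, so any fixed default will be wrong for some object arrays.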

@jhamman
Member

jhamman commented Jan 8, 2025

I'm adding this to the known issues section in the v3 migration guide and marking it as After 3.0.0.


3 participants