-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Update assign_coords with a MultiIndex to match new Coordinates API #8039
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
On the current main you should be able to do this: mulind_coords = xr.Coordinates.from_pandas_multiindex(mulind, 'time')
da.chunk(time=50).assign_coords(mulind_coords) Directly using a pandas MultiIndex for creating or overriding Xarray coordinates is now deprecated. That said, it should still be supported in principle, so the inconsistent coordinates vs. multi-index state you get after |
Thanks @benbovy ! Sadly, your proposed solution did not work, it failed with: However, the following line: da.chunk(time=50).drop_vars('time').assign_coords(time=mulind) works with both the master and 2023.7.0. Meaning that the multi-coordinates are correctly shown in the DataArray repr and that unstacking mulind_coords = xr.Coordinates.from_pandas_multiindex(mulind, 'time')
da.chunk(time=50).drop_vars('time').assign_coords(mulind_coords) In my code, I'll fix the error by also converting It seems to me that overriding a coordinate variable should be possible with Full traceback of the error above: ValueError Traceback (most recent call last) File ~/Projets/xarray/xarray/core/common.py:620, in DataWithCoords.assign_coords(self, coords, **coords_kwargs) File ~/Projets/xarray/xarray/core/coordinates.py:467, in Coordinates.update(self, other) File ~/Projets/xarray/xarray/core/merge.py:555, in merge_coords(objects, compat, join, priority_arg, indexes, fill_value) File ~/Projets/xarray/xarray/core/alignment.py:847, in deep_align(objects, join, copy, indexes, exclude, raise_on_invalid, fill_value) File ~/Projets/xarray/xarray/core/alignment.py:783, in align(join, copy, indexes, exclude, fill_value, *objects) File ~/Projets/xarray/xarray/core/alignment.py:567, in Aligner.align(self) File ~/Projets/xarray/xarray/core/alignment.py:312, in Aligner.assert_no_index_conflict(self) ValueError: cannot re-index or align objects with conflicting indexes found for the following coordinates: 'time' (2 conflicting indexes)
|
<!--Please ensure the PR fulfills the following requirements! --> <!-- If this is your first PR, make sure to add your details to the AUTHORS.rst! --> ### Pull Request Checklist: - [x] This PR addresses an already opened issue (for bug fixes / features) - This PR fixes #1417 - [ ] Tests for the changes have been added (for bug fixes / features) - [ ] (If applicable) Documentation has been added / updated (for bug fixes / features) - [x] CHANGES.rst has been updated (with summary of main changes) - [x] Link to issue (:issue:`number`) and pull request (:pull:`number`) has been added ### What kind of change does this PR introduce? Overriding a coordinate with a pandas MultiIndex was not working properly before, but this didn't result in any errors up to a recent change in xarray's master. When calling `assign_coords` with a multiindex on a coordinate that already existed, the sub-indexes would not appear in the DataArray repr. In the most recent xarray master, that now triggers an error when trying to unstack the multiindex, but only if dask is used. Before, the unstacking would proceed as normal and the sub-indexes would magically appear on the result. Dropping the previous coordinate solves this problem. The next xarray will implement a new public `xr.Coordinates` class that should be used to assign coords, instead of a raw `pd.MultiIndex`. Nonetheless, overriding is still not possible and dropping the previous coordinate is still needed. ### Does this PR introduce a breaking change? No. ### Other information: See pydata/xarray#8039.
Agreed. We don't need full alignment but we would still need to check that dimension sizes match and whether the new assigned coordinate(s) do not break any multi-coordinate index in the original object. Maybe we could just use |
@aulemahal Since you have a |
@pydata/xarray does any one have time to fix this. I think we should consider it a release blocker. |
I can have a deeper look at it later this week. I'm not sure that it will be straightforward to fix, though, at least properly 😕. If not, I'll see if we can provide a quick and dirty fix. |
@aulemahal I've just run your MWE on the main branch (eceec5f) and it seems to work, could you confirm? >>> import pandas as pd
>>> import xarray as xr
>>> da = xr.DataArray([1] * 730, coords={"time": xr.date_range('1900-01-01', periods=730, freq='D', calendar='noleap')})
>>> mulind = pd.MultiIndex.from_arrays((da.time.dt.year.values, da.time.dt.dayofyear.values), names=('year', 'doy'))
>>> da.chunk(time=50).assign_coords(time=mulind).unstack('time')
<xarray.DataArray (year: 2, doy: 365)>
dask.array<reshape, shape=(2, 365), dtype=int64, chunksize=(1, 365), chunktype=numpy.ndarray>
Coordinates:
* year (year) int64 1900 1901
* doy (doy) int64 1 2 3 4 5 6 7 8 9 ... 358 359 360 361 362 363 364 365 There are still two issues to fix:
But if the MWE works in main at least we shouldn't consider this as a release blocker. |
The initial "regression" is fixed, it works for me! Thanks a lot! |
Great, although I have no idea on what could have fixed that initial regression since #7368 :-) |
Thanks all! Let's leave the issue open to track #8039 (comment) |
What is your issue?
A pattern we used in
xclim
(and elsewhere) seems to be broken on the master.See MWE:
Now this works ok with both the current master and the latest release. However, if we chunk
da
, the last line now fails:On the master, this gives:
ValueError: unmatched keys found in indexes and variables: {'year', 'doy'}
Full traceback:
ValueError Traceback (most recent call last)
Cell In[44], line 1
----> 1 da.chunk(time=50).assign_coords(time=mulind).unstack("time")
File ~/Projets/xarray/xarray/core/dataarray.py:2868, in DataArray.unstack(self, dim, fill_value, sparse)
2808 def unstack(
2809 self,
2810 dim: Dims = None,
2811 fill_value: Any = dtypes.NA,
2812 sparse: bool = False,
2813 ) -> DataArray:
2814 """
2815 Unstack existing dimensions corresponding to MultiIndexes into
2816 multiple new dimensions.
(...)
2866 DataArray.stack
2867 """
-> 2868 ds = self._to_temp_dataset().unstack(dim, fill_value, sparse)
2869 return self._from_temp_dataset(ds)
File ~/Projets/xarray/xarray/core/dataset.py:5481, in Dataset.unstack(self, dim, fill_value, sparse)
5479 for d in dims:
5480 if needs_full_reindex:
-> 5481 result = result._unstack_full_reindex(
5482 d, stacked_indexes[d], fill_value, sparse
5483 )
5484 else:
5485 result = result._unstack_once(d, stacked_indexes[d], fill_value, sparse)
File ~/Projets/xarray/xarray/core/dataset.py:5365, in Dataset._unstack_full_reindex(self, dim, index_and_vars, fill_value, sparse)
5362 else:
5363 # TODO: we may depreciate implicit re-indexing with a pandas.MultiIndex
5364 xr_full_idx = PandasMultiIndex(full_idx, dim)
-> 5365 indexers = Indexes(
5366 {k: xr_full_idx for k in index_vars},
5367 xr_full_idx.create_variables(index_vars),
5368 )
5369 obj = self._reindex(
5370 indexers, copy=False, fill_value=fill_value, sparse=sparse
5371 )
5373 for name, var in obj.variables.items():
File ~/Projets/xarray/xarray/core/indexes.py:1435, in Indexes.init(self, indexes, variables, index_type)
1433 unmatched_keys = set(indexes) ^ set(variables)
1434 if unmatched_keys:
-> 1435 raise ValueError(
1436 f"unmatched keys found in indexes and variables: {unmatched_keys}"
1437 )
1439 if any(not isinstance(idx, index_type) for idx in indexes.values()):
1440 index_type_str = f"{index_type.module}.{index_type.name}"
ValueError: unmatched keys found in indexes and variables: {'year', 'doy'}
This seems related to PR #7368.
The reason for the title of this issue is that in both versions, I now realize the
da.assign_coords(time=mulind)
prints as:Something's fishy, because the two "sub" indexes are not showing.
And indeed, with the current master, I can get this to work by doing (again changing the last line):
But it seems a bit odd to me that we need to reconstruct the DataArray to replace its coordinate with a "MultiIndex" one.
Thus, my questions are:
assign_coords
? If not, then this issue would become a feature request.The text was updated successfully, but these errors were encountered: