API: __getitem__[int] vs __setitem__[int] with Float64Index #33469

jbrockmendel · 2020-04-10T22:44:58Z

import pandas as pd

ser = pd.Series([1, 2, 3, 4], index=[1.1, 2.1, 3.0, 4.1])

>>> ser[5]                    # <-- we are treating 5 as a label
KeyError: 5

>>> ser[5] = 5                # <-- we are treating 5 as a position
IndexError: index 5 is out of bounds for axis 0 with size 4

>>> ser[3] = 5                # <-- we are treating 3 as a label
>>> ser
1.1    1
2.1    2
3.0    5
4.1    4
dtype: int64

The ser[5] = 5 case is an outlier because, instead of having its label-vs-positional behavior determined by ser.index._should_fallback_to_positional, it is determined by if is_integer(key) and not self.index.inferred_type == "integer":
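The three outcomes above can be reproduced with a toy model. This is a pure-Python sketch, not pandas code: ToySeries is a hypothetical class, and it assumes that __getitem__ is label-only for a float index while __setitem__ tries the label first and only then applies the integer-key check quoted above.

```python
class ToySeries:
    """Toy stand-in for a Series with a float index (hypothetical, not pandas)."""

    def __init__(self, values, index):
        self.values = list(values)
        self.index = list(index)

    def __getitem__(self, key):
        # Label-only lookup: an integer key is compared against the float labels.
        if key not in self.index:
            raise KeyError(key)
        return self.values[self.index.index(key)]

    def __setitem__(self, key, value):
        if key in self.index:
            # A label match wins (3 == 3.0 in Python, as with the float index).
            self.values[self.index.index(key)] = value
        elif isinstance(key, int) and not all(isinstance(x, int) for x in self.index):
            # Mirrors: is_integer(key) and not index.inferred_type == "integer"
            size = len(self.values)
            if not -size <= key < size:
                raise IndexError(
                    f"index {key} is out of bounds for axis 0 with size {size}"
                )
            self.values[key] = value
        else:
            # Missing non-integer label: enlarge.
            self.index.append(key)
            self.values.append(value)


ser = ToySeries([1, 2, 3, 4], index=[1.1, 2.1, 3.0, 4.1])
ser[3] = 5  # label 3.0 exists, so the assignment is label-based
```

In this model ser[5] raises KeyError, while ser[5] = 5 raises IndexError, because the two methods decide label-vs-positional in different ways.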


jorisvandenbossche · 2020-09-07T20:36:39Z

This API is certainly a mess .. ;)

The proposal is to change the setitem case where it falls back to positional to also be strictly label-based?

So the integer would then also be interpreted as a label, and thus trigger enlargement if it is not yet present?

Using your example series:

In [40]: ser
Out[40]: 
1.1    1
2.1    2
3.0    3
4.1    4
dtype: int64

In [41]: ser[2] = 100

In [42]: ser
Out[42]: 
1.1      1
2.1      2
3.0    100
4.1      4
dtype: int64

In [43]: ser[2.0] = -100

In [44]: ser
Out[44]: 
1.1      1
2.1      2
3.0    100
4.1      4
2.0   -100
dtype: int64

I first thought this would be a relatively safe change, since setitem already tries label-based lookup first (so if the integer key is present as a label, it is already treated as a label). The potential difference in behaviour only arises when the label is not present: right now that can update an existing value, but in the future it would add another value, as in the example above. That silently results in different data. We could also deprecate integer keys for setitem first, or raise an error for them?
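To make the "silently different data" concern concrete, here is a toy comparison of the two rules for a float index. Both functions are hypothetical pure-Python models, not pandas code. They work on plain lists and return copies.

```python
def setitem_current(index, values, key, value):
    """Toy model of the current rule: label first, then positional fallback."""
    index, values = list(index), list(values)
    if key in index:
        values[index.index(key)] = value
    elif isinstance(key, int):
        values[key] = value  # positional fallback for a missing integer key
    else:
        index.append(key)
        values.append(value)  # enlargement
    return index, values


def setitem_label_only(index, values, key, value):
    """Toy model of the proposal: every key is a label."""
    index, values = list(index), list(values)
    if key in index:
        values[index.index(key)] = value
    else:
        index.append(float(key))
        values.append(value)  # a missing label always enlarges
    return index, values


index, values = [1.1, 2.1, 3.0, 4.1], [1, 2, 3, 4]

# Key present as a label (3 == 3.0): both rules agree.
same_a = setitem_current(index, values, 3, 5)
same_b = setitem_label_only(index, values, 3, 5)

# Key absent as a label: one rule overwrites position 2, the other appends 2.0.
current = setitem_current(index, values, 2, 100)
proposed = setitem_label_only(index, values, 2, 100)
```

The two rules only diverge when the integer key is not already a label, which is exactly the case where a behavior change would go unnoticed.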

jbrockmendel · 2020-09-08T04:06:51Z

> We could also deprecate integer keys for setitem first or raise an error for it?

Do you mean "deprecate ever treating integer as a label for Float64Index"? Or "deprecate allowing ser[integer] with Float64Index"? The first seems reasonable.

I'd be OK with "integers are always considered labels" or "integers are always considered positional", so there is no question of fall-back behavior.

jbrockmendel · 2021-06-09T23:37:31Z

Options that come to mind for fixing this:

1. Deprecate casting ints to floats in Float64Index.get_loc, so that ser[4] is always positional when we have a Float64Index.
2. Change Float64Index._should_fallback_to_positional to return not (self == self.astype(int)).any() (ATM it always returns False).
   - Upsides: fixes the OP issue without breaking any tests; fixes the inconsistency in how we check for fallback in Series.__setitem__ vs Series.__getitem__
   - Downsides: perf (caching would help some), value-dependent behavior
3. Change Float64Index._should_fallback_to_positional to always return True.
   - Breaks 5 tests, 4 of which are checking that we raise; the last is trivial to fix.
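The value-dependent check in the second option can be sketched in pure Python. The function below is a hypothetical analogue of not (self == self.astype(int)).any() that works on a plain list of finite floats. It is not the pandas method itself.

```python
def should_fallback_to_positional(labels):
    """Pure-Python analogue of: not (self == self.astype(int)).any()

    Assumes every label is a finite float (int() would fail on NaN or inf).
    """
    return not any(label == int(label) for label in labels)


# No label is integer-valued, so an integer key can only be a position.
no_integer_like = should_fallback_to_positional([1.1, 2.1, 4.1])

# 3.0 is integer-valued, so integer keys are treated as labels.
has_integer_like = should_fallback_to_positional([1.1, 2.1, 3.0, 4.1])
```

Here no_integer_like is True and has_integer_like is False. The same integer key would therefore be read as a position for the first index and as a label for the second, which is the value-dependent behavior listed as a downside.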

jorisvandenbossche · 2021-06-10T19:47:14Z

> 1. Deprecate casting ints to floats in Float64Index.get_loc, so that ser[4] is always positional when we have a Float64Index.

Currently I think ser[4] is always strictly label-based for getitem? Since that part is actually consistent / unambiguous (if you know the semantics ..) at this moment, I would maybe leave it as is. Or if we deprecate it, have it raise an error in the future instead of being positional.

> 2. Change Float64Index._should_fallback_to_positional to return not (self == self.astype(int)).any() (ATM it always returns False).
>
> Downsides: perf (caching would help some), value-dependent behavior

For that last reason (value-dependent behavior), I wouldn't go for this. ser[2] meaning something different for pd.Series([..], index=[1.1, 2.0]) vs pd.Series([..], index=[1.0, 2.0]) seems quite confusing or surprising.

> 3. Change Float64Index._should_fallback_to_positional to always return True.

What's the behavioral consequence of this? So that if the key (cast to float) is not present, always use positional?

Is a 4th option to change Float64Index._should_fallback_to_positional to always return False? (like it is for Int64Index?)

jbrockmendel · 2021-06-10T23:12:28Z

Note: in all the options described above, we are changing this line in Series.__setitem__:

-            if is_integer(key) and self.index.inferred_type != "integer":
+            if is_integer(key) and self.index._should_fallback_to_positional():

> Change Float64Index._should_fallback_to_positional to always return True.
>
> What's the behavioral consequence of this? So that if the key (cast to float) is not present, always use positional?

If we do that, and add a clause in __getitem__ so that we try label-based before falling back, then we break 5 tests, e.g.

    def test_floating_misc(self):
        ser = Series(np.arange(5), index=np.arange(5) * 2.5, dtype=np.int64)

        with pytest.raises(KeyError, match=r"^4$"):
>           ser[4]
E           Failed: DID NOT RAISE <class 'KeyError'>

(without the extra clause in __getitem__ we break 9 tests, including a few that aren't with pytest.raises(...))
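A toy version of "try label-based, then fall back" shows why ser[4] stops raising in that test. The getitem helper and its fallback_to_positional flag are hypothetical stand-ins, written in pure Python so the sketch does not depend on any pandas version.

```python
def getitem(index, values, key, fallback_to_positional):
    """Toy __getitem__: label lookup first, optional positional fallback."""
    if key in index:
        return values[index.index(key)]
    if isinstance(key, int) and fallback_to_positional:
        return values[key]
    raise KeyError(key)


# Same shape as the test: labels 0.0, 2.5, 5.0, 7.5, 10.0 with values 0..4.
index = [i * 2.5 for i in range(5)]
values = list(range(5))

# 4 is not a label; with the fallback it is read as position 4.
with_fallback = getitem(index, values, 4, fallback_to_positional=True)

# 5 matches the label 5.0, so the label wins even with the fallback enabled.
label_wins = getitem(index, values, 5, fallback_to_positional=True)
```

With fallback_to_positional=False the first lookup raises KeyError(4), which is what the existing test expects.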

> Is a 4th option to change Float64Index._should_fallback_to_positional to always return False? (like it is for Int64Index?)

That is the status quo for _should_fallback_to_positional. If we change the line in Series.__setitem__ and leave everything else unchanged, we break 1 test (huh, thought it was more):

    def test_setitem_index_float64(self):
        val = 5
        obj = pd.Series([1, 2, 3, 4], index=[1.1, 2.1, 3.1, 4.1])
        assert obj.index.dtype == np.float64
    
        # float + int -> int
        temp = obj.copy()
        msg = "index 5 is out of bounds for axis 0 with size 4"
        with pytest.raises(IndexError, match=msg):
>           temp[5] = 5
E           Failed: DID NOT RAISE <class 'IndexError'>

jreback · 2021-06-24T13:45:04Z

IIUC, I think integers should always be label-based for Float64Index, so this means deprecating (and then changing) the setitem behavior.
getitem would be unchanged, I think.
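One way to picture the "deprecate, then change" path is a transitional setitem that keeps the positional fallback but warns about it. This is a hypothetical pure-Python sketch for a float index; the warning text is invented for illustration and the wording pandas actually uses may differ.

```python
import warnings


def setitem_with_deprecation(index, values, key, value):
    """Toy transitional rule: keep the positional fallback, but warn about it."""
    index, values = list(index), list(values)
    if key in index:
        values[index.index(key)] = value  # label-based, unchanged
    elif isinstance(key, int):
        # Hypothetical message, not copied from pandas.
        warnings.warn(
            "Treating an integer key as positional with a float index is "
            "deprecated; in the future it will be treated as a label.",
            FutureWarning,
            stacklevel=2,
        )
        values[key] = value  # old positional behavior, for now
    else:
        index.append(key)
        values.append(value)
    return index, values


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    index, values = setitem_with_deprecation(
        [1.1, 2.1, 3.0, 4.1], [1, 2, 3, 4], 2, 100
    )
```

After the deprecation period the middle branch would be dropped, so a missing integer key would enlarge the series like any other missing label.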

jbrockmendel added the Bug and Needs Triage labels Apr 10, 2020

jbrockmendel mentioned this issue Apr 10, 2020

REF: Simplify __getitem__ by doing positional-int check first #33471

Merged

jbrockmendel added the API - Consistency, API Design, and Indexing labels and removed the Bug and Needs Triage labels Apr 13, 2020

mroeschke added the Enhancement label May 7, 2020

jbrockmendel mentioned this issue May 15, 2020

DEPR: deprecate Index.__getitem__ with float key #34191

Closed

jbrockmendel mentioned this issue Sep 7, 2020

BUG?: should ComplexBlock._can_hold_element/should_store check itemsize? #32878

Closed

jbrockmendel mentioned this issue Jun 10, 2021

PERF: cache _should_fallback_to_positional #41917

Merged

jbrockmendel mentioned this issue Jun 25, 2021

DEPR: Series.__setitem__ with Float64Index falling back to positional #42215

Merged

jreback added this to the 1.4 milestone Jun 25, 2021

jreback closed this as completed in #42215 Jul 4, 2021
