Assignment of column via .loc for numpy non-ns datetimes #27928

inmoonlight · 2019-08-15T07:16:16Z

closes Assignment of column via .loc #27395
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

jreback

The most important tests are the ones that replicate the indexing issue from the OP. Put them in pandas/tests/indexing/test_loc.py (see the setting tests).

jreback · 2019-08-15T12:20:17Z

doc/source/whatsnew/v0.25.1.rst

@@ -32,7 +32,7 @@ Categorical
 Datetimelike
 ^^^^^^^^^^^^
 - Bug in :func:`to_datetime` where passing a timezone-naive :class:`DatetimeArray` or :class:`DatetimeIndex` and ``utc=True`` would incorrectly return a timezone-naive result (:issue:`27733`)
-
+- Bug in :func:`maybe_cast_to_datetime` where converting a `np.datetime64` to `datetime64[D]` raise `TypeError` (:issue: `27395`)


This is a private function; you would have to describe the user-facing behavior here instead (e.g. the assignment).

jreback · 2019-08-15T12:22:34Z

pandas/core/dtypes/cast.py

@@ -1026,7 +1026,7 @@ def maybe_cast_to_datetime(value, dtype, errors="raise"):
            )

            if is_datetime64 and not is_dtype_equal(dtype, _NS_DTYPE):
-                if dtype.name in ("datetime64", "datetime64[ns]"):
+                if dtype.name in ("datetime64", "datetime64[ns]", "datetime64[D]"):


just use dtype.kind == 'M'

Since pandas doesn't support converting datetime64[ps], I used (dtype.kind == 'M') and (dtype.name != 'datetime64[ps]')
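For readers unfamiliar with the `dtype.kind` check suggested here, the following is a minimal sketch using NumPy only. The helper names `is_datetime64_any_unit` and `is_timedelta64_any_unit` are hypothetical and are not part of pandas or of this PR.

```python
import numpy as np


def is_datetime64_any_unit(dtype) -> bool:
    """Hypothetical helper: True for datetime64 of any unit, including the generic one."""
    return np.dtype(dtype).kind == "M"


def is_timedelta64_any_unit(dtype) -> bool:
    """Hypothetical helper: True for timedelta64 of any unit, including the generic one."""
    return np.dtype(dtype).kind == "m"


# The kind code ignores the unit, so [D], [ns] and [ps] all match.
for name in ("datetime64", "datetime64[D]", "datetime64[ns]", "datetime64[ps]"):
    print(name, is_datetime64_any_unit(name))

# The unit is still visible through the dtype name.
print(np.dtype("M8[D]").name)
```

Because the kind check also matches `datetime64[ps]`, a unit that needs separate handling has to be excluded explicitly, which is what the extra `dtype.name` comparison in the reply does.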

jreback · 2019-08-15T12:22:44Z

pandas/core/dtypes/cast.py

@@ -1044,7 +1044,7 @@ def maybe_cast_to_datetime(value, dtype, errors="raise"):
                    value = [value]

            elif is_timedelta64 and not is_dtype_equal(dtype, _TD_DTYPE):
-                if dtype.name in ("timedelta64", "timedelta64[ns]"):
+                if dtype.name in ("timedelta64", "timedelta64[ns]", "timedelta64[D]"):


dtype.kind == 'm'

Since pandas doesn't support converting timedelta64[ps], I used (dtype.kind == 'm') and (dtype.name != 'timedelta64[ps]')

jreback · 2019-08-15T12:23:37Z

pandas/tests/dtypes/cast/test_infer_dtype.py

@@ -169,3 +170,14 @@ def test_cast_scalar_to_array(obj, dtype):

    arr = cast_scalar_to_array(shape, obj, dtype=dtype)
    tm.assert_numpy_array_equal(arr, exp)
+
+
+@pytest.mark.parametrize(


It's OK to test this, I guess, but you would need to parametrize on datetime64[D], datetime64[ns], datetime64, etc. and check the result.

Done. I parametrized over all possible datetime units except [ps] (as mentioned above).


TomAugspurger

Looks close. A few questions on the tests.

TomAugspurger · 2019-08-16T11:24:57Z

doc/source/whatsnew/v0.25.1.rst

@@ -85,6 +84,7 @@ Indexing
 - Bug in partial-string indexing returning a NumPy array rather than a ``Series`` when indexing with a scalar like ``.loc['2015']`` (:issue:`27516`)
 - Break reference cycle involving :class:`Index` and other index classes to allow garbage collection of index objects without running the GC. (:issue:`27585`, :issue:`27840`)
 - Fix regression in assigning values to a single column of a DataFrame with a ``MultiIndex`` columns (:issue:`27841`).
+- Fix assignment of column via `.loc` with numpy `non-ns datetime` type (:issue: `27395`)


Suggested change

- Fix assignment of column via `.loc` with numpy `non-ns datetime` type (:issue: `27395`)

- Fix assignment of column via `.loc` with numpy `non-ns datetime` type (:issue:`27395`)

I'm not sure if sphinx will complain about the space or not.

Also, I think the backticks on "non-ns datetime" aren't necessary.

TomAugspurger · 2019-08-16T11:28:28Z

pandas/tests/dtypes/cast/test_infer_dtype.py

@@ -169,3 +170,14 @@ def test_cast_scalar_to_array(obj, dtype):

    arr = cast_scalar_to_array(shape, obj, dtype=dtype)
    tm.assert_numpy_array_equal(arr, exp)
+
+
+@pytest.mark.parametrize(


I think @jreback's comment was requesting testing those other frequencies here.

And then assert the result matches too.

    result = maybe_cast_to_datetime(obj, dtype)
    expected = pd.Timestamp(...)  # right?
    assert result == expected

TomAugspurger · 2019-08-16T11:28:58Z

pandas/tests/indexing/test_loc.py

+            }
+        )
+        df.loc[:, "day"] = df.loc[:, "timestamp"].values.astype(dtype)
+


Can you add an assert about the result here? tm.assert_series_equal(df['day'], expected)

While updating the tests, I found a strange result (I'm not sure whether it is intended).

    df = pd.DataFrame({'timestamp': [np.datetime64('2017-01-01 01:02:23'),
                                     np.datetime64('2018-11-12 12:34:12')]})
    df['year'] = df.loc[:, 'timestamp'].values.astype('datetime64[Y]')
    df['month'] = df.loc[:, 'timestamp'].values.astype('datetime64[M]')
    df.loc[:, 'day'] = df.loc[:, 'timestamp'].values.astype('datetime64[D]')

result:

                timestamp        day      month       year
    0 2017-01-01 01:02:23 2017-01-01 2017-01-01 2017-01-01
    1 2018-11-12 12:34:12 2018-11-12 2018-11-01 2018-01-01

astype result is okay.

df.loc[:, 'timestamp'].values.astype('datetime64[Y]')

result:

array(['2017', '2018'], dtype='datetime64[Y]')

If this is intended, then I'll make an assert with this result and remove previous test (pandas/tests/dtypes/cast/test_infer_dtype.py)
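The "strange result" reported above can be reproduced with NumPy alone. The sketch below only illustrates how `astype` to a coarser unit truncates and how converting back to `[ns]` lands on the first instant of the period. It does not show what pandas does internally.

```python
import numpy as np

stamps = np.array(
    ["2017-01-01T01:02:23", "2018-11-12T12:34:12"], dtype="datetime64[s]"
)

# Casting to a coarser unit drops everything below that unit.
as_year = stamps.astype("datetime64[Y]")
as_month = stamps.astype("datetime64[M]")

# Casting back to nanoseconds gives the start of the year or month,
# which is what a nanosecond-resolution column would display.
year_ns = as_year.astype("datetime64[ns]")
month_ns = as_month.astype("datetime64[ns]")

print(as_year)
print(year_ns)
print(month_ns)
```

Under this reading, a `year` column showing 2018-01-01 for a 2018-11-12 timestamp is consistent with the `astype` output shown in the comment.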

Without running the test in master, it isn't obvious to the reader what is being tested. I suggest something like:

    @pytest.mark.parametrize("unit", ...)
    def test_loc_setitem_datetime64_non_ns_converts(self, unit):
        # GH#27928 a few words about how this used to break
        df = DataFrame(
            {
                "timestamp": [
                    np.datetime64("2017-02-11 12:41:29"),
                    np.datetime64("1991-11-07 04:22:37"),
                ]
            }
        )
        df[unit] = df["timestamp"].values.astype("datetime64[{unit}]".format(unit=unit))
        df.loc[:, unit] = df["timestamp"].values.astype("datetime64[{unit}]".format(unit=unit))
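A standalone version of the check requested in this thread might look like the sketch below. It is not the test that was merged. The function name `assign_truncated_via_loc` is hypothetical, and the assertion compares values rather than dtypes, on the assumption that the resulting column dtype can differ between pandas versions.

```python
import numpy as np
import pandas as pd


def assign_truncated_via_loc(unit: str) -> pd.DataFrame:
    """Hypothetical helper: add a column of unit-truncated timestamps via .loc."""
    df = pd.DataFrame(
        {
            "timestamp": [
                np.datetime64("2017-02-11 12:41:29"),
                np.datetime64("1991-11-07 04:22:37"),
            ]
        }
    )
    # Non-nanosecond NumPy array, as in the original report.
    values = df["timestamp"].values.astype(f"datetime64[{unit}]")
    # This assignment is the operation the PR is about.
    df.loc[:, unit] = values
    return df


df = assign_truncated_via_loc("D")
expected = [pd.Timestamp("2017-02-11"), pd.Timestamp("1991-11-07")]
assert list(df["D"]) == expected
```

Comparing element values keeps the sketch independent of the stored resolution. A real pandas test would more likely build an expected `Series` and use `tm.assert_series_equal`, as suggested above.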

jreback · 2019-08-16T12:16:56Z

pandas/core/dtypes/cast.py

@@ -1055,7 +1055,7 @@ def maybe_cast_to_datetime(value, dtype, errors="raise"):
                    )

            if is_scalar(value):
-                if value == iNaT or isna(value):
+                if value is iNaT or isna(value):


@jbrockmendel something to address later, we need to separate integer tests here

@inmoonlight is there a reason you changed this?

The integer iNaT is not interned, so this seems dangerous

@jreback
I found this led to the failure in pandas-dev.pandas (Linux py37_np_dev) env. If you all think it is dangerous, I'll put it back.

Restore the original I think. We've had some flaky tests in master recently that we're fixing elsewhere.

@TomAugspurger
Restored it.
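The concern about `is` versus `==` raised in this thread can be illustrated with a small sketch. It assumes the sentinel is the minimum int64 value held as a plain Python integer; the name `INAT` below is a stand-in and is not imported from pandas.

```python
import numpy as np

# Assumption: stand-in for the pandas sentinel, the minimum int64 value.
INAT = np.iinfo(np.int64).min

# The same value arriving through another path, e.g. read out of an array.
from_array = np.array([INAT], dtype="int64")[0]
as_python_int = int(from_array)

# Equality compares values, so both checks succeed.
print(from_array == INAT, as_python_int == INAT)

# Identity compares objects. A NumPy scalar is never the same object as the
# Python int, and a freshly built large int is not guaranteed to be either.
print(from_array is INAT)
```

An identity check would therefore miss values that are numerically equal to the sentinel, which matches the reviewers' decision to restore `==`.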

jreback · 2019-08-16T12:18:16Z

pandas/core/dtypes/cast.py

@@ -1044,7 +1044,7 @@ def maybe_cast_to_datetime(value, dtype, errors="raise"):
                    value = [value]

            elif is_timedelta64 and not is_dtype_equal(dtype, _TD_DTYPE):
-                if dtype.name in ("timedelta64", "timedelta64[ns]"):
+                if (dtype.kind == "m") and (dtype.name != "timedelta64[ps]"):


let's leave [ps] for another issue. I am not sure what we should do about it. we could convert it with precision loss rather than actually raising (which is what I think it would do now)

@jreback
Without this, ci check failed...

Do you remember the CI failure? It may have been a flaky test.

@TomAugspurger
Yes, the failure part was: https://github.com/pandas-dev/pandas/blob/master/pandas/tests/series/test_constructors.py#L1352
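The precision-loss option mentioned for [ps] in this thread can be sketched with NumPy alone. This only shows what converting picoseconds to nanoseconds does to a value; it is not a statement about what pandas does or should do.

```python
import numpy as np

# 1500 picoseconds after the epoch, i.e. 1.5 nanoseconds.
value_ps = np.datetime64(1500, "ps")

# Converting to nanoseconds keeps only whole nanoseconds.
as_ns = value_ps.astype("datetime64[ns]")

# Converting back does not recover the dropped 500 ps.
round_trip = as_ns.astype("datetime64[ps]")

print(as_ns.astype("int64"), round_trip.astype("int64"))
```

The round trip ends at 1000 ps rather than 1500 ps, which is the kind of silent loss that raising an error avoids.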


TomAugspurger · 2019-08-26T13:32:28Z

Looks like there are still some CI failures. Do you need help looking into them?


inmoonlight · 2019-08-29T13:02:19Z

@TomAugspurger

Desperately...

I looked into several CI issues and found they all raise the same error.

=================================== FAILURES ===================================
_ TestSeriesConstructors.test_constructor_generic_timestamp_bad_frequency[m8[ps]-cannot convert timedeltalike] _
[gw0] linux -- Python 3.7.4 /home/travis/miniconda3/envs/pandas-dev/bin/python
self = <pandas.tests.series.test_constructors.TestSeriesConstructors object at 0x7f2b6f5c2910>
dtype = 'm8[ps]', msg = 'cannot convert timedeltalike'
    @pytest.mark.parametrize(
        "dtype,msg",
        [
            ("m8[ps]", "cannot convert timedeltalike"),
            ("M8[ps]", "cannot convert datetimelike"),
        ],
    )
    def test_constructor_generic_timestamp_bad_frequency(self, dtype, msg):
        # see gh-15524, gh-15987
    
        with pytest.raises(TypeError, match=msg):
>           Series([], dtype=dtype)
E           Failed: DID NOT RAISE <class 'TypeError'>
pandas/tests/series/test_constructors.py:1363: Failed
_ TestSeriesConstructors.test_constructor_generic_timestamp_bad_frequency[M8[ps]-cannot convert datetimelike] _
[gw0] linux -- Python 3.7.4 /home/travis/miniconda3/envs/pandas-dev/bin/python
self = <pandas.tests.series.test_constructors.TestSeriesConstructors object at 0x7f2b6f5df750>
dtype = 'M8[ps]', msg = 'cannot convert datetimelike'
    @pytest.mark.parametrize(
        "dtype,msg",
        [
            ("m8[ps]", "cannot convert timedeltalike"),
            ("M8[ps]", "cannot convert datetimelike"),
        ],
    )
    def test_constructor_generic_timestamp_bad_frequency(self, dtype, msg):
        # see gh-15524, gh-15987
    
        with pytest.raises(TypeError, match=msg):
>           Series([], dtype=dtype)
E           Failed: DID NOT RAISE <class 'TypeError'>
pandas/tests/series/test_constructors.py:1363: Failed

At first glance, I thought pandas did not support the [ps] resolution, which led to 98a5061.

How can I solve it?

TomAugspurger · 2019-08-30T14:40:52Z

At first glance, I thought pandas does not support [ps] level.

That's correct. I think if the user explicitly requests [ps] we raise, and if we're given a [ps]-resolution array we convert to [ns] (is that correct @jbrockmendel?)

jbrockmendel · 2019-08-30T22:20:39Z

I think that's right. xref #23571

inmoonlight · 2019-08-31T15:14:35Z

@TomAugspurger

So... raising ValueError when allocating [ps] resolution is an issue still to be solved.
Which means pandas/tests/series/test_constructors.py:1363 can be removed?

Am I right? 🤔

TomAugspurger · 2019-09-03T16:03:25Z

Which means pandas/tests/series/test_constructors.py:1363 can be removed?

IIUC, I think that should stay. I think these are the expected behaviors

>>> pd.Series([], dtype='datetime64[ps]')                                    # raises
>>> pd.Series(np.array([], dtype='datetime64[ps]'), dtype='datetime64[ps]')  # raises
>>> pd.Series(np.array([], dtype='datetime64[ps]'))                          # OK

inmoonlight · 2019-09-04T12:55:30Z

@TomAugspurger

Thanks for the explanations!
The latest change raises an error on those inputs.
Is any further modification required?

TomAugspurger

@jbrockmendel can you take another look here?

TomAugspurger · 2019-09-11T20:22:09Z

pandas/core/dtypes/cast.py

@@ -1026,7 +1026,7 @@ def maybe_cast_to_datetime(value, dtype, errors="raise"):
            )

            if is_datetime64 and not is_dtype_equal(dtype, _NS_DTYPE):
-                if dtype.name in ("datetime64", "datetime64[ns]"):
+                if dtype <= np.dtype("M8[ns]"):


What does <= do for numpy dtypes? Check if it's a subtype?

Not clear to me either:

    >>> "timedelta64[ns]" < np.dtype("m8[ns]")
    False
    >>> "timedelta64[us]" < np.dtype("m8[ns]")
    True
    >>> "timedelta64[ps]" > np.dtype("m8[ns]")
    True
    >>> "timedelta64" < np.dtype("m8[ns]")
    True

@TomAugspurger @jbrockmendel
It checks whether the dtype's time unit is longer or shorter than [ns]. (FYI: https://docs.scipy.org/doc/numpy/reference/arrays.datetime.html#datetime-units)

can you add a comment to this effect (in both places)
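As a rough illustration of the question in this thread, the sketch below prints the `<=` comparison against `M8[ns]` next to `np.can_cast` with safe casting. My understanding is that NumPy defines dtype ordering in terms of safe castability, so the two columns should agree, but treat that as an assumption to verify against the NumPy documentation.

```python
import numpy as np

NS = np.dtype("M8[ns]")

# Units from coarse to fine, plus the generic (unitless) datetime64.
for name in ("M8", "M8[D]", "M8[us]", "M8[ns]", "M8[ps]"):
    dtype = np.dtype(name)
    # Left: the comparison used in the diff. Right: safe castability to [ns].
    print(name, dtype <= NS, np.can_cast(dtype, NS, casting="safe"))
```

Coarser units and the generic dtype compare as `<=` to `[ns]`, while `[ps]` does not, so the comparison excludes `[ps]` without naming it.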

TomAugspurger · 2019-09-11T20:23:41Z

doc/source/whatsnew/v0.25.2.rst

@@ -49,7 +49,7 @@ Interval
 Indexing
 ^^^^^^^^

-
+- Fix assignment of column via `.loc` with numpy non-ns datetime type (:issue:`27395`)


This will need to go to v1.0.0.rst now, sorry.

jreback · 2019-09-13T13:19:42Z

thanks @inmoonlight

inmoonlight added 2 commits August 15, 2019 16:14

Fix issue 27395

ed9d7f0

Fix lint error

49471e7

jreback changed the title ~~Fix issue 27395~~ Assignment of column via .loc for numpy non-ns datetimes Aug 15, 2019

jreback requested changes Aug 15, 2019

shoot numpy-dev future warning

9efb169

jreback added the Indexing Related to indexing on series/frames, not to indexes themselves label Aug 15, 2019

inmoonlight added 3 commits August 15, 2019 22:26

Reflect reviews

41df89d

- update doc
- update dtype matching function
- update test

Fix ci error

6df540c

Fix dtype match except [ps]

98a5061

TomAugspurger reviewed Aug 16, 2019

jreback requested changes Aug 16, 2019

TomAugspurger and others added 3 commits August 21, 2019 08:29

Merge remote-tracking branch 'upstream/master' into inmoonlight-hotfi…

299a5b1

…x/27395/loc-assignment

Reflect reviews

cf18fb9

- update docs
- remove previous tests
- update tests
- revert to the changes that were made to pass the CI tests

Fix conflicts

09ab203

Fix CI error

93c2f85

inmoonlight added 3 commits August 31, 2019 22:07

Fix conficts

ad771f5

Fix CI error

35009b7

Fix CI error

6a48ca6

TomAugspurger reviewed Sep 11, 2019

inmoonlight added 4 commits September 12, 2019 21:37

Merge branch 'master' into hotfix/27395/loc-assignment

063cc37

Update whatsnew

d64a2d8

Add comments

86fd1c1

Merge branch 'master' into hotfix/27395/loc-assignment

2ddbc01

inmoonlight added 2 commits September 13, 2019 11:32

Fix CI error

48e454a

Fix CI error

c8eb91f

jreback added this to the 1.0 milestone Sep 13, 2019

jreback approved these changes Sep 13, 2019

jreback merged commit 810fa77 into pandas-dev:master Sep 13, 2019

proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019

Assignment of column via .loc for numpy non-ns datetimes (pandas-dev#…

afb8682

…27928)

proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019

Assignment of column via .loc for numpy non-ns datetimes (pandas-dev#…

ba1242b

…27928)
