-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
Assignment of column via .loc for numpy non-ns datetimes #27928
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Assignment of column via .loc for numpy non-ns datetimes #27928
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the most important types of tests are replicating the OP indexing issue. put in pandas/tests/indexing/test_loc.py (see the setting tests)
doc/source/whatsnew/v0.25.1.rst
Outdated
@@ -32,7 +32,7 @@ Categorical | |||
Datetimelike | |||
^^^^^^^^^^^^ | |||
- Bug in :func:`to_datetime` where passing a timezone-naive :class:`DatetimeArray` or :class:`DatetimeIndex` and ``utc=True`` would incorrectly return a timezone-naive result (:issue:`27733`) | |||
- | |||
- Bug in :func:`maybe_cast_to_datetime` where converting a `np.datetime64` to `datetime64[D]` raise `TypeError` (:issue: `27395`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a private function, you would have to indicate the user facing experience here instead (e.g. the assignment)
pandas/core/dtypes/cast.py
Outdated
@@ -1026,7 +1026,7 @@ def maybe_cast_to_datetime(value, dtype, errors="raise"): | |||
) | |||
|
|||
if is_datetime64 and not is_dtype_equal(dtype, _NS_DTYPE): | |||
if dtype.name in ("datetime64", "datetime64[ns]"): | |||
if dtype.name in ("datetime64", "datetime64[ns]", "datetime64[D]"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just use dtype.kind == 'M'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since pandas
doesn't support to convert datetime[ps]
, I used (dtype.kind == 'M') and (dtype.name != 'datetime64[ps]')
pandas/core/dtypes/cast.py
Outdated
@@ -1044,7 +1044,7 @@ def maybe_cast_to_datetime(value, dtype, errors="raise"): | |||
value = [value] | |||
|
|||
elif is_timedelta64 and not is_dtype_equal(dtype, _TD_DTYPE): | |||
if dtype.name in ("timedelta64", "timedelta64[ns]"): | |||
if dtype.name in ("timedelta64", "timedelta64[ns]", "timedelta64[D]"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dtype.kind == 'm'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since pandas
doesn't support to convert timedelta[ps]
, I used (dtype.kind == 'm') and (dtype.name != 'timedelta64[ps]')
@@ -169,3 +170,14 @@ def test_cast_scalar_to_array(obj, dtype): | |||
|
|||
arr = cast_scalar_to_array(shape, obj, dtype=dtype) | |||
tm.assert_numpy_array_equal(arr, exp) | |||
|
|||
|
|||
@pytest.mark.parametrize( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its ok to test this i guess, but would need to parameterize on datetime64[D], ns, datetime64 etc and check the result
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. I made all possible datetime formats except [ps] (as mentioned above)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @jreback's comment was requesting testing those other frequencies here.
And then assert the result matches too.
result = maybe_cast_to_datetime(obj, dtype)
expected = pd.Timestamp(...) # right?
assert result == expected
- update doc - update dtype matching fucntion - update test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks close. A few questions on the tests.
doc/source/whatsnew/v0.25.1.rst
Outdated
@@ -85,6 +84,7 @@ Indexing | |||
- Bug in partial-string indexing returning a NumPy array rather than a ``Series`` when indexing with a scalar like ``.loc['2015']`` (:issue:`27516`) | |||
- Break reference cycle involving :class:`Index` and other index classes to allow garbage collection of index objects without running the GC. (:issue:`27585`, :issue:`27840`) | |||
- Fix regression in assigning values to a single column of a DataFrame with a ``MultiIndex`` columns (:issue:`27841`). | |||
- Fix assignment of column via `.loc` with numpy `non-ns datetime` type (:issue: `27395`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Fix assignment of column via `.loc` with numpy `non-ns datetime` type (:issue: `27395`) | |
- Fix assignment of column via `.loc` with numpy `non-ns datetime` type (:issue:`27395`) |
I'm not sure if sphinx will complain about the space or not.
Also, I think the backticks on "non-ns datetime" aren't necessary.
@@ -169,3 +170,14 @@ def test_cast_scalar_to_array(obj, dtype): | |||
|
|||
arr = cast_scalar_to_array(shape, obj, dtype=dtype) | |||
tm.assert_numpy_array_equal(arr, exp) | |||
|
|||
|
|||
@pytest.mark.parametrize( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @jreback's comment was requesting testing those other frequencies here.
And then assert the result matches too.
result = maybe_cast_to_datetime(obj, dtype)
expected = pd.Timestamp(...) # right?
assert result == expected
} | ||
) | ||
df.loc[:, "day"] = df.loc[:, "timestamp"].values.astype(dtype) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add an assert about the result here? tm.assert_series_equal(df['day'], expected)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I update the tests, I found a strange result. (Not sure this is intended.)
df = pd.DataFrame({'timestamp': [np.datetime64('2017-01-01 01:02:23'), np.datetime64('2018-11-12 12:34:12')]})
df['year'] = df.loc[:, 'timestamp'].values.astype('datetime64[Y]')
df['month'] = df.loc[:, 'timestamp'].values.astype('datetime64[M]')
df.loc[:, 'day'] = df.loc[:, 'timestamp'].values.astype('datetime64[D]')
result:
timestamp day month year
0 2017-01-01 01:02:23 2017-01-01 2017-01-01 2017-01-01
1 2018-11-12 12:34:12 2018-11-12 2018-11-01 2018-01-01
astype
result is okay.
df.loc[:, 'timestamp'].values.astype('datetime64[Y]')
result:
array(['2017', '2018'], dtype='datetime64[Y]')
If this is intended, then I'll make an assert with this result and remove previous test (pandas/tests/dtypes/cast/test_infer_dtype.py)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without running the test in master, it isn't obvious to the reader what is being tested. I suggest something like:
@pytest.mark.parametrize("unit", ...)
def test_loc_setitem_datetime64_non_ns_converts(self, unit):
# GH#27928 a few words about how this used to break
df = DataFrame(
{
"timestamp": [
np.datetime64("2017-02-11 12:41:29"),
np.datetime64("1991-11-07 04:22:37"),
]
}
)
df[unit] = df["timestamp"].values.astype("datetime64[{unit}]".format(unit=unit))
df.loc[:, unit] = df["timestamp"].values.astype("datetime64[{unit}]".format(unit=unit))
pandas/core/dtypes/cast.py
Outdated
@@ -1055,7 +1055,7 @@ def maybe_cast_to_datetime(value, dtype, errors="raise"): | |||
) | |||
|
|||
if is_scalar(value): | |||
if value == iNaT or isna(value): | |||
if value is iNaT or isna(value): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel something to address later, we need to separate integer tests here
@inmoonlight is there a reason you changed this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The integer iNaT is not interned, so this seems dangerous
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback
I found this led to the failure in pandas-dev.pandas (Linux py37_np_dev)
env. If you all think it is dangerous, I'll put it back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Restore the original I think. We've had some flaky tests in master recently that we're fixing elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TomAugspurger
Restored it.
pandas/core/dtypes/cast.py
Outdated
@@ -1044,7 +1044,7 @@ def maybe_cast_to_datetime(value, dtype, errors="raise"): | |||
value = [value] | |||
|
|||
elif is_timedelta64 and not is_dtype_equal(dtype, _TD_DTYPE): | |||
if dtype.name in ("timedelta64", "timedelta64[ns]"): | |||
if (dtype.kind == "m") and (dtype.name != "timedelta64[ps]"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's leave [ps] for another issue. I am not sure what we should do about it. we could convert it with precision loss rather than actually raising (which is what I think it would do now)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback
Without this, ci check failed...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you remember the CI failure? It may have been a flaky test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…x/27395/loc-assignment
- update docs - remove previous tests - update tests - revert to the changes that made to pass the ci tests
Looks like there are still some CI failures. Do you need help looking into
them?
…On Wed, Aug 21, 2019 at 10:25 AM Jihyung Moon ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In pandas/core/dtypes/cast.py
<#27928 (comment)>:
> @@ -1055,7 +1055,7 @@ def maybe_cast_to_datetime(value, dtype, errors="raise"):
)
if is_scalar(value):
- if value == iNaT or isna(value):
+ if value is iNaT or isna(value):
@TomAugspurger <https://github.com/TomAugspurger>
Restored it.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#27928?email_source=notifications&email_token=AAKAOITQBERVCNCF3RMLKI3QFVM7XA5CNFSM4IL3XFI2YY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCCIBNBQ#discussion_r316249730>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAKAOISVG63DGWO27U3CKWTQFVM7XANCNFSM4IL3XFIQ>
.
|
Desperately... I looked into several CI issues and found they all raise the same error.
At first glance, I thought pandas does not support [ps] level. This led to 98a5061 . How could I solve it? |
That's correct. I think if the user explicitly requests |
I think that's right. xref #23571 |
So,,, raising ValueError when allocating [ps] resolution is an issue to be solved... Am I right? 🤔 |
IIUC, I think that should stay. I think these are the expected behaviors >>> pd.Series([], dtype='datetime64[ps]') # raises
>>> pd.Series(np.array([], dtype='datetime64[ps]'), dtype='datetime64[ps]') # raises
>>> pd.Series(np.array([], dtype='datetime64[ps]')) # OK |
Thanks for the explanations! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel can you take another look here?
@@ -1026,7 +1026,7 @@ def maybe_cast_to_datetime(value, dtype, errors="raise"): | |||
) | |||
|
|||
if is_datetime64 and not is_dtype_equal(dtype, _NS_DTYPE): | |||
if dtype.name in ("datetime64", "datetime64[ns]"): | |||
if dtype <= np.dtype("M8[ns]"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does <=
do for numpy dtypes? Check if it's a subtype?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not clear to me either:
>>> "timedelta64[ns]" < np.dtype("m8[ns]")
False
>>> "timedelta64[us]" < np.dtype("m8[ns]")
True
>>> "timedelta64[ps]" > np.dtype("m8[ns]")
True
>>> "timedelta64" < np.dtype("m8[ns]")
True
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TomAugspurger @jbrockmendel
It checks whether the dtype timespan is longer than [ns]
or shorter. (FYI, https://docs.scipy.org/doc/numpy/reference/arrays.datetime.html#datetime-units)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment to this effect (in both places)
doc/source/whatsnew/v0.25.2.rst
Outdated
@@ -49,7 +49,7 @@ Interval | |||
Indexing | |||
^^^^^^^^ | |||
|
|||
- | |||
- Fix assignment of column via `.loc` with numpy non-ns datetime type (:issue:`27395`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will need to go to v1.0.0.rst now, sorry.
thanks @inmoonlight |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff