TYP: rename FrameOrSeries to NDFrameT #43752
pandas/core/computation/align.py
Outdated
```diff
@@ -49,7 +49,7 @@ def _align_core_single_unary_op(


 def _zip_axes_from_type(
-    typ: type[FrameOrSeries], new_axes: Sequence[Index]
+    typ: type[NDFrameT], new_axes: Sequence[Index]
```
This is an unbound typevar? pyright is not picking this up. Can we just use `Type[DataFrame | Series]` here?
I think that pyright has some issues with nested `TypeVar(bound=...)`. I will create an issue.
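To make the concern about a TypeVar that is used only once more concrete, here is a minimal sketch. It uses stand-in classes rather than pandas itself: `FakeNDFrame`, `FakeSeries`, `FakeFrame`, `axis_names`, and both function names are hypothetical and are not taken from the PR. When a TypeVar occurs only once in a signature it links no two types together, so a plain `Type[...]` of a union (or of the base class) expresses the same contract.

```python
from typing import Dict, List, Sequence, Type, TypeVar, Union


class FakeNDFrame:
    # Hypothetical stand-in for a common base class.
    axis_names: List[str] = []


class FakeSeries(FakeNDFrame):
    axis_names = ["index"]


class FakeFrame(FakeNDFrame):
    axis_names = ["index", "columns"]


FakeNDFrameT = TypeVar("FakeNDFrameT", bound=FakeNDFrame)


def zip_axes_typevar(
    typ: Type[FakeNDFrameT], new_axes: Sequence[str]
) -> Dict[str, str]:
    # FakeNDFrameT appears only once, so it ties no argument to the return type.
    return dict(zip(typ.axis_names, new_axes))


def zip_axes_union(
    typ: Type[Union[FakeFrame, FakeSeries]], new_axes: Sequence[str]
) -> Dict[str, str]:
    # Same runtime behaviour, expressed without a TypeVar.
    return dict(zip(typ.axis_names, new_axes))


print(zip_axes_typevar(FakeFrame, ["rows", "cols"]))
print(zip_axes_union(FakeSeries, ["rows"]))
```

Both functions accept the same class objects; the TypeVar version only pays off once the same TypeVar also appears in another parameter or in the return type.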
I replaced it with `Type[NDFrame]`. Similar to #42367, I can try to replace `NDFrame` with `DataFrame | Series` in private functions.
`List`, `Tuple`, ... are invariant (`Type` is not) which creates an interesting usecase for `TypeVar(bound=...)` to still allow sub-classes, see this example microsoft/pyright#2344 (comment).
> I replaced it with `Type[NDFrame]`. Similar to #42367, I can try to replace `NDFrame` with `DataFrame | Series` in private functions.

Isn't it: replace `NDFrame` with `DataFrame | Series` in public functions? Although we could also do this for private functions too and cast at the call site if needed.

> `List`, `Tuple`, ... are invariant (`Type` is not) which creates an interesting usecase for `TypeVar(bound=...)` to still allow sub-classes, see this example microsoft/pyright#2344 (comment).

I'm not sure what the pyright error is, but mypy gives
/home/simon/t1.py:25: error: Argument 1 to "func1" has incompatible type "List[Bear]"; expected "List[Animal]" [arg-type]
/home/simon/t1.py:25: note: "List" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
/home/simon/t1.py:25: note: Consider using "Sequence" instead, which is covariant
The suggested fix is in line with https://github.com/python/typeshed/blob/master/CONTRIBUTING.md#conventions:

> avoid invariant collection types (list, dict) in argument positions, in favor of covariant types like Mapping or Sequence

I'm not sure how many functions we have that mutate the passed parameters where we can't do this.
We also have cases where we have, or will need, overloads for tuple in indexing functions, so it may become necessary to use invariant collection types in argument positions in the overload.
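The mypy messages above can be reproduced in miniature. The original `t1.py` is not shown in this thread, so the sketch below is only a guess at its shape: the names `Animal`, `Bear`, and `func1` come from the error text, while the method bodies and `func2` are assumptions added for illustration.

```python
from typing import List, Sequence


class Animal:
    def name(self) -> str:
        return "animal"


class Bear(Animal):
    def name(self) -> str:
        return "bear"


def func1(a: List[Animal]) -> List[str]:
    # List is invariant: a checker rejects List[Bear] here, because this
    # function would be allowed to append a non-Bear Animal to the list.
    return [x.name() for x in a]


def func2(a: Sequence[Animal]) -> List[str]:
    # Sequence is read-only and covariant, so Sequence[Bear] is accepted.
    return [x.name() for x in a]


bears: List[Bear] = [Bear(), Bear()]

# Accepted by type checkers.
print(func2(bears))

# func1(bears) also runs at runtime, but mypy reports it as an
# [arg-type] error like the one quoted above.
```

This only helps when the function does not need to mutate its argument, which is the open question raised above.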
> Isn't it: replace `NDFrame` with `DataFrame | Series` in public functions? Although we could also do this for private functions too and cast at the call site if needed.

Yes, public functions that are not meant to take any NDFrame. I will do that later today.
I would be fine with using TypeVar in an invariant container (List) instead of using the covariant container (Sequence) with NDFrame. In both cases, it would probably be good to have a brief comment.
> I'm not sure what the pyright error is, but mypy gives

error: Argument of type "list[Bear]" cannot be assigned to parameter "a" of type "List[Animal]" in function "func1"
TypeVar "_T@list" is invariant
"Bear" is incompatible with "Animal" (reportGeneralTypeIssues)
reportGeneralTypeIssues is currently disabled for pandas.
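As a rough illustration of the "TypeVar in an invariant container" option mentioned above, the following sketch shows a function that mutates its list argument, so `Sequence` is not available, yet still accepts a list of sub-class instances. `AnimalT` and `rotate` are hypothetical names, and `Animal`/`Bear` are reused from the error messages rather than from any pandas code.

```python
from typing import List, TypeVar


class Animal:
    pass


class Bear(Animal):
    pass


# Bound TypeVar: solved separately for each call site.
AnimalT = TypeVar("AnimalT", bound=Animal)


def rotate(items: List[AnimalT]) -> List[AnimalT]:
    # Mutates the list in place, so a read-only Sequence would not do.
    # With AnimalT solved to Bear, List[Bear] matches List[AnimalT] exactly,
    # which sidesteps the invariance error.
    if items:
        items.append(items.pop(0))
    return items


first, second = Bear(), Bear()
bears: List[Bear] = [first, second]
rotated = rotate(bears)  # a checker infers List[Bear]
print(rotated[0] is second, rotated[1] is first)
```

A short comment next to such a signature, as suggested above, would explain why a TypeVar is used instead of `List[Animal]`.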
Thanks @twoertwein, LGTM except for the places you identified where a TypeVar is not appropriate.
```diff
@@ -91,7 +91,7 @@ def describe_ndframe(
         )

     result = describer.describe(percentiles=percentiles)
-    return cast(FrameOrSeries, result)
+    return cast(NDFrameT, result)
```
There is either a bug in the code or the `NDFrameT` typevar is not appropriate here...
```python
import pandas as pd

print(pd.__version__)
print(type(pd.DataFrame(columns=["a"]).describe()))
print(type(pd.Series(dtype="object").describe()))

class DataFrameSubClass(pd.DataFrame):
    pass

print(type(DataFrameSubClass(columns=["a"]).describe()))

class SeriesSubClass(pd.Series):
    pass

print(type(SeriesSubClass(dtype="object").describe()))
```
```
1.4.0.dev0+768.g7dd22546bc
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.series.Series'>
```
Although this could be a use case for the other TypeVar constructor (value constraints instead of `bound=...`), where the types are not preserved other than the types listed in the TypeVar constructor:
```python
import pandas as pd
from typing import TypeVar

# Bound TypeVar: sub-classes are preserved in the inferred type.
NDFrameT = TypeVar("NDFrameT", bound=pd.core.generic.NDFrame)
# Constrained TypeVar: always resolves to one of the listed types.
FrameOrSeriesT = TypeVar("FrameOrSeriesT", pd.Series, pd.DataFrame)

class DataFrameSubClass(pd.DataFrame):
    pass

class SeriesSubClass(pd.Series):
    pass

def describe(obj: NDFrameT) -> NDFrameT:
    pass

# reveal_type is evaluated by the type checker, not at runtime.
reveal_type(describe(pd.DataFrame()))
reveal_type(describe(pd.Series()))
reveal_type(describe(DataFrameSubClass()))
reveal_type(describe(SeriesSubClass()))

def describe_alt(obj: FrameOrSeriesT) -> FrameOrSeriesT:
    pass

reveal_type(describe_alt(pd.DataFrame()))
reveal_type(describe_alt(pd.Series()))
reveal_type(describe_alt(DataFrameSubClass()))
reveal_type(describe_alt(SeriesSubClass()))
```
```
/home/simon/t1.py:17: note: Revealed type is "pandas.core.frame.DataFrame*"
/home/simon/t1.py:18: note: Revealed type is "pandas.core.series.Series*"
/home/simon/t1.py:19: note: Revealed type is "t1.DataFrameSubClass*"
/home/simon/t1.py:20: note: Revealed type is "t1.SeriesSubClass*"
/home/simon/t1.py:25: note: Revealed type is "pandas.core.frame.DataFrame*"
/home/simon/t1.py:26: note: Revealed type is "pandas.core.series.Series*"
/home/simon/t1.py:27: note: Revealed type is "pandas.core.frame.DataFrame*"
/home/simon/t1.py:28: note: Revealed type is "pandas.core.series.Series*"
```
The body of `describe_ndframe` makes it clear that it does not work with an arbitrary NDFrame: Series-like is treated as a Series and DataFrame-like is treated as a DataFrame.
The TypeVar is "correct" - replacing print(type()) in your first example with reveal_type:
bar.py:3: note: Revealed type is "pandas.core.frame.DataFrame*"
bar.py:5: note: Revealed type is "pandas.core.series.Series*"
bar.py:12: note: Revealed type is "bar.DataFrameSubClass*"
bar.py:19: note: Revealed type is "bar.SeriesSubClass*"
I think that `NDFrameT` is probably still the best annotation here.
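The gap between the static and the runtime view in this exchange can be sketched with stand-in classes. `FakeFrame`, `FakeFrameSubClass`, `describe`, and `describe_preserving` below are hypothetical and do not reproduce the pandas implementation. The point is only that `cast()` changes what the type checker believes, not what object is returned.

```python
from typing import Iterable, List, TypeVar, cast


class FakeFrame:
    # Hypothetical stand-in for a frame-like container.
    def __init__(self, values: Iterable[int]) -> None:
        self.values: List[int] = list(values)


class FakeFrameSubClass(FakeFrame):
    pass


FakeFrameT = TypeVar("FakeFrameT", bound=FakeFrame)


def describe(obj: FakeFrameT) -> FakeFrameT:
    # Always builds the base class, whatever sub-class was passed in.
    result = FakeFrame([len(obj.values)])
    # cast() is a no-op at runtime; it only tells the checker that
    # the result has the caller's type.
    return cast(FakeFrameT, result)


def describe_preserving(obj: FakeFrameT) -> FakeFrameT:
    # Constructing through type(obj) keeps the runtime type in line
    # with the annotation, so no cast is needed.
    return type(obj)([len(obj.values)])


sub = FakeFrameSubClass([1, 2, 3])
print(type(describe(sub)).__name__)             # FakeFrame
print(type(describe_preserving(sub)).__name__)  # FakeFrameSubClass
```

With the cast, a checker reports the sub-class type for both calls, while only the second function actually returns it.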
`FrameOrSeriesT` would be better, until describe is generic enough (if that is a goal).
It wouldn't make much of a difference: TypeVar is covariant by default, it would suggest that sub-classes are also handled correctly (which they are not in the case of describe). Could make an invariant TypeVar or leave as-is?
thanks @twoertwein
`FrameOrSeries = TypeVar("FrameOrSeries", bound="NDFrame")` but the current name implies `pd.DataFrame | pd.Series`.