TYP: rename FrameOrSeries to NDFrameT #43752

Merged
merged 7 commits into from
Sep 28, 2021

Conversation

twoertwein
Member

FrameOrSeries is defined as TypeVar("FrameOrSeries", bound="NDFrame"), so it matches any NDFrame sub-class, but the current name implies pd.DataFrame | pd.Series.
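To illustrate the naming point, here is a minimal sketch of how a bound TypeVar behaves. The classes are stand-ins rather than the real pandas classes, and `CustomFrame` and `same_kind` are hypothetical names introduced only for this example.

```python
from typing import TypeVar

# Bound TypeVar: matches NDFrame and every sub-class of it.
NDFrameT = TypeVar("NDFrameT", bound="NDFrame")


# Stand-ins for the pandas classes (assumption, for illustration only).
class NDFrame:
    def copy(self: NDFrameT) -> NDFrameT:
        # Build a new object of the caller's own class.
        return type(self)()


class DataFrame(NDFrame):
    pass


class Series(NDFrame):
    pass


class CustomFrame(DataFrame):
    pass


def same_kind(obj: NDFrameT) -> NDFrameT:
    # A type checker infers the return type from the argument type,
    # so sub-classes are preserved, not only DataFrame and Series.
    return obj.copy()


print(type(same_kind(CustomFrame())).__name__)  # CustomFrame
```

Because the TypeVar is not limited to two concrete classes, a name built around the bound (NDFrame) describes it more accurately than one built around DataFrame and Series.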

@jreback jreback added the Typing type annotations, mypy/pyright type checking label Sep 26, 2021
@jreback jreback added this to the 1.4 milestone Sep 26, 2021
@@ -49,7 +49,7 @@ def _align_core_single_unary_op(


 def _zip_axes_from_type(
-    typ: type[FrameOrSeries], new_axes: Sequence[Index]
+    typ: type[NDFrameT], new_axes: Sequence[Index]
Member

Isn't this an unbound TypeVar? pyright is not picking this up. Could we just use Type[DataFrame | Series] here?

Member Author

@twoertwein twoertwein Sep 26, 2021

I think that pyright has some issues with nested TypeVar(bound=...). I will create an issue.

microsoft/pyright#2344

Member Author

I replaced it with Type[NDFrame]. Similar to #42367, I can try to replace NDFrame with DataFrame | Series in private functions.

List, Dict, ... are invariant (Type is not), which creates an interesting use case for TypeVar(bound=...) to still allow sub-classes; see this example microsoft/pyright#2344 (comment).
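A small sketch of why Type[NDFrame] is sufficient when the type appears only once in a signature: Type is covariant, so sub-classes are accepted without a TypeVar. The classes and the `axis_count` helper below are hypothetical stand-ins, not the body of `_zip_axes_from_type`.

```python
from typing import List, Type


# Stand-ins for the pandas classes (assumption, for illustration only).
class NDFrame:
    axis_names: List[str] = ["index"]


class DataFrame(NDFrame):
    axis_names = ["index", "columns"]


class Series(NDFrame):
    pass


def axis_count(typ: Type[NDFrame]) -> int:
    # Type[...] is covariant: Type[DataFrame] and Type[Series] are both
    # valid arguments here, so no TypeVar is needed.
    return len(typ.axis_names)


print(axis_count(DataFrame), axis_count(Series))  # 2 1
```

A TypeVar earns its place only when it links two positions in a signature, for example an argument type and the return type.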

Member

I replaced it with Type[NDFrame]. Similar to #42367, I can try to replace NDFrame with DataFrame | Series in private functions.

Shouldn't that be: replace NDFrame with DataFrame | Series in public functions? Although we could also do this for private functions and cast at the call site if needed.

List, Dict, ... are invariant (Type is not), which creates an interesting use case for TypeVar(bound=...) to still allow sub-classes; see this example microsoft/pyright#2344 (comment).

I'm not sure what the pyright error is, but mypy gives

/home/simon/t1.py:25: error: Argument 1 to "func1" has incompatible type "List[Bear]"; expected "List[Animal]"  [arg-type]
/home/simon/t1.py:25: note: "List" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
/home/simon/t1.py:25: note: Consider using "Sequence" instead, which is covariant

The suggested fix is in line with https://github.com/python/typeshed/blob/master/CONTRIBUTING.md#conventions:

avoid invariant collection types (list, dict) in argument positions, in favor of covariant types like Mapping or Sequence

I'm not sure how many functions we have that mutate the passed parameters where we can't do this.

We also have cases where we have, or will need, overloads for tuple in indexing functions, so it may become necessary to use invariant collection types in argument positions in the overload.
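For readers without the linked example at hand, the following is a rough reconstruction of the situation the mypy message describes. The names `Animal`, `Bear` and `func1` come from the error output; the function bodies and `func2` are assumptions made for this sketch.

```python
from typing import List, Sequence


class Animal:
    def name(self) -> str:
        return "animal"


class Bear(Animal):
    def name(self) -> str:
        return "bear"


def func1(a: List[Animal]) -> List[str]:
    # List is invariant: a checker rejects List[Bear] here, because the
    # function would be allowed to append an Animal that is not a Bear.
    return [x.name() for x in a]


def func2(a: Sequence[Animal]) -> List[str]:
    # Sequence is read-only and covariant: List[Bear] is accepted.
    return [x.name() for x in a]


bears: List[Bear] = [Bear(), Bear()]
print(func2(bears))  # accepted by the type checker
print(func1(bears))  # runs fine, but is flagged statically (arg-type)
```

Both calls behave identically at runtime; the difference is purely in what the static checker allows.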

Member Author

Shouldn't that be: replace NDFrame with DataFrame | Series in public functions? Although we could also do this for private functions and cast at the call site if needed.

Yes, public functions that are not meant to take any NDFrame. I will do that later today.
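A minimal sketch of the "cast at the call site" idea, assuming stand-in classes and two hypothetical functions (`_n_dims` and `public_entry`) that do not exist in pandas:

```python
from typing import Union, cast


# Stand-ins for the pandas classes (assumption, for illustration only).
class NDFrame:
    pass


class DataFrame(NDFrame):
    pass


class Series(NDFrame):
    pass


def _n_dims(obj: Union[DataFrame, Series]) -> int:
    # Private helper annotated with the narrower union.
    return 2 if isinstance(obj, DataFrame) else 1


def public_entry(obj: NDFrame) -> int:
    # The caller knows obj is a DataFrame or a Series; cast() narrows the
    # static type only and performs no check or conversion at runtime.
    return _n_dims(cast(Union[DataFrame, Series], obj))


print(public_entry(DataFrame()), public_entry(Series()))  # 2 1
```

Since cast() is unchecked, it moves the responsibility for correctness to the call site, which is why it is best kept to places where the narrower type is known for certain.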

I would be fine with using TypeVar in an invariant container (List) instead of using the covariant container (Sequence) with NDFrame. In both cases, it would probably be good to have a brief comment.
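As a sketch of the first option, a bound TypeVar inside an invariant container keeps sub-classes usable even when the function mutates its argument. `AnimalT` and `func3` are hypothetical names that extend the Animal/Bear example; they do not appear in the linked issue.

```python
from typing import List, TypeVar


class Animal:
    pass


class Bear(Animal):
    pass


AnimalT = TypeVar("AnimalT", bound=Animal)


def func3(a: List[AnimalT], extra: AnimalT) -> List[AnimalT]:
    # TypeVar in an invariant container: List[Bear] is accepted, and the
    # checker ensures that only a Bear can be appended to it.
    a.append(extra)
    return a


bears: List[Bear] = [Bear()]
result = func3(bears, Bear())
print(len(result), all(isinstance(x, Bear) for x in result))  # 2 True
```

Sequence[Animal] would not work here because Sequence has no append; that is the kind of case where a short comment explaining the TypeVar is useful.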

I'm not sure what the pyright error is, but mypy gives

pyright reports:

error: Argument of type "list[Bear]" cannot be assigned to parameter "a" of type "List[Animal]" in function "func1"
    TypeVar "_T@list" is invariant
      "Bear" is incompatible with "Animal" (reportGeneralTypeIssues)

reportGeneralTypeIssues is currently disabled for pandas.

Member

@simonjayhawkins simonjayhawkins left a comment

Thanks @twoertwein, LGTM except for the places you identified where a TypeVar is not appropriate.

@@ -91,7 +91,7 @@ def describe_ndframe(
     )

     result = describer.describe(percentiles=percentiles)
-    return cast(FrameOrSeries, result)
+    return cast(NDFrameT, result)
Member

There is either a bug in the code or the NDFrameT typevar is not appropriate here...

import pandas as pd
print(pd.__version__)

print(type(pd.DataFrame(columns=["a"]).describe()))

print(type(pd.Series(dtype="object").describe()))

class DataFrameSubClass(pd.DataFrame):
    pass

print(type(DataFrameSubClass(columns=["a"]).describe()))

class SeriesSubClass(pd.Series):
    pass

print(type(SeriesSubClass(dtype="object").describe()))

1.4.0.dev0+768.g7dd22546bc
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.series.Series'>

Although this could be a use case for the other TypeVar constructor (constrained, without bound=...), where sub-class types are not preserved: the result resolves to one of the types listed in the TypeVar constructor...

import pandas as pd
from typing import TypeVar

NDFrameT = TypeVar("NDFrameT", bound=pd.core.generic.NDFrame)

FrameOrSeriesT = TypeVar("FrameOrSeriesT", pd.Series, pd.DataFrame)

class DataFrameSubClass(pd.DataFrame):
    pass

class SeriesSubClass(pd.Series):
    pass

def describe(obj: NDFrameT) -> NDFrameT:
    pass

reveal_type(describe(pd.DataFrame()))
reveal_type(describe(pd.Series()))
reveal_type(describe(DataFrameSubClass()))
reveal_type(describe(SeriesSubClass()))

def describe_alt(obj: FrameOrSeriesT) -> FrameOrSeriesT:
    pass

reveal_type(describe_alt(pd.DataFrame()))
reveal_type(describe_alt(pd.Series()))
reveal_type(describe_alt(DataFrameSubClass()))
reveal_type(describe_alt(SeriesSubClass()))

/home/simon/t1.py:17: note: Revealed type is "pandas.core.frame.DataFrame*"
/home/simon/t1.py:18: note: Revealed type is "pandas.core.series.Series*"
/home/simon/t1.py:19: note: Revealed type is "t1.DataFrameSubClass*"
/home/simon/t1.py:20: note: Revealed type is "t1.SeriesSubClass*"
/home/simon/t1.py:25: note: Revealed type is "pandas.core.frame.DataFrame*"
/home/simon/t1.py:26: note: Revealed type is "pandas.core.series.Series*"
/home/simon/t1.py:27: note: Revealed type is "pandas.core.frame.DataFrame*"
/home/simon/t1.py:28: note: Revealed type is "pandas.core.series.Series*"

Member Author

The body of describe_ndframe makes it clear that it does not work with any NDFrame: Series-like is treated as a Series and DataFrame-like is treated as a DataFrame.

The TypeVar is "correct" - replacing print(type()) in your first example with reveal_type:
bar.py:3: note: Revealed type is "pandas.core.frame.DataFrame*"
bar.py:5: note: Revealed type is "pandas.core.series.Series*"
bar.py:12: note: Revealed type is "bar.DataFrameSubClass*"
bar.py:19: note: Revealed type is "bar.SeriesSubClass*"

I think that NDFrameT is probably still the best annotation here.
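The gap between the two sets of output in this thread (the runtime `type()` results and the `reveal_type` results) can be reproduced with a small sketch. It uses stand-in classes and a hypothetical `describe_like` function; it is not the pandas implementation of `describe_ndframe`.

```python
from typing import TypeVar, cast


# Stand-ins for the pandas classes (assumption, for illustration only).
class NDFrame:
    ndim = 0


class DataFrame(NDFrame):
    ndim = 2


class Series(NDFrame):
    ndim = 1


class DataFrameSubClass(DataFrame):
    pass


NDFrameT = TypeVar("NDFrameT", bound=NDFrame)


def describe_like(obj: NDFrameT) -> NDFrameT:
    # Series-like input yields a Series, anything else a DataFrame;
    # the sub-class of the input is not propagated to the result.
    result = Series() if obj.ndim == 1 else DataFrame()
    # cast() only informs the type checker; nothing is converted.
    return cast(NDFrameT, result)


out = describe_like(DataFrameSubClass())
print(type(out).__name__)  # DataFrame
```

A checker reports the static type of `out` as DataFrameSubClass, while the object at runtime is a plain DataFrame, which is the mismatch under discussion.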

Member Author

FrameOrSeriesT would be better, until describe is generic enough (if that is a goal)

Member Author

It wouldn't make much of a difference: a TypeVar accepts sub-classes by default, so it would suggest that sub-classes are also handled correctly (which they are not in the case of describe). Could make an invariant TypeVar or leave as-is?

@jreback jreback merged commit 4988d6e into pandas-dev:master Sep 28, 2021
Contributor

jreback commented Sep 28, 2021

thanks @twoertwein

@twoertwein twoertwein deleted the NDFrameT branch June 8, 2022 19:26