DRAFT: Support online operations in windowing operations #50

Closed
mroeschke opened this issue May 17, 2021 · 2 comments

mroeschke commented May 17, 2021

Problem

Currently, calculating a windowing aggregation over several pandas objects requires concatenating them and returning the entire result:

pd.concat([df1, df2, df3, ...], axis=0).rolling(10).mean()

By holding the latest state of a windowing aggregation, it is possible to calculate the above more efficiently (and flexibly) in an online fashion, e.g.

roll = df1.rolling(10)
roll.mean()
roll.mean(update=df2)
roll.mean(update=df3)
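
To make the idea of "holding the latest state" concrete, here is a minimal pure-Python sketch of a rolling mean that keeps its window between calls. The class name `OnlineRollingMean` and its `update` method are hypothetical and are not part of pandas or of this proposal; the sketch also assumes the inputs contain no NaN values.

```python
from collections import deque
import math


class OnlineRollingMean:
    """Hypothetical helper: a rolling mean that keeps its window between calls."""

    def __init__(self, window):
        self.window = window
        self.buffer = deque(maxlen=window)  # the last `window` observations
        self.total = 0.0                    # running sum of the buffer

    def update(self, values):
        """Consume new values and return one rolling mean per new value."""
        out = []
        for x in values:
            if len(self.buffer) == self.window:
                self.total -= self.buffer[0]  # value about to be evicted
            self.buffer.append(x)
            self.total += x
            # Like rolling(window).mean(), emit NaN until the window is full
            full = len(self.buffer) == self.window
            out.append(self.total / self.window if full else math.nan)
        return out


roll = OnlineRollingMean(window=3)
first = roll.update([1.0, 2.0, 3.0])  # plays the role of roll.mean()
second = roll.update([4.0, 5.0])      # plays the role of roll.mean(update=df2)
```

Each call only touches the new values plus the retained window, so the cost of an update does not grow with the amount of data already seen.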

Aiming to eventually target:

rolling/expanding/ewm
groupby.rolling/groupby.expanding/groupby.ewm

Because windowing operations are adopting numba, this would be a numba-only feature.

Important API features

  1. Independent state sharing (e.g. an update provided to rolling mean should not affect the state of rolling variance)
  2. Independent numba engine configuration (e.g. a user should be able to have rolling mean call nogil=True while also having rolling variance call nogil=False)
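
One way to picture the two requirements above is a container that stores a separate state slot and a separate engine configuration per aggregation. This is only a sketch under stated assumptions: `OnlineWindow`, `configure`, and the expanding-style `mean`/`var` methods are hypothetical names, and the stored engine options are recorded but never passed to numba here.

```python
import math


class OnlineWindow:
    """Hypothetical container: one state slot and one config per aggregation."""

    def __init__(self):
        self._state = {}          # aggregation name -> running state
        self._engine_kwargs = {}  # aggregation name -> engine options

    def configure(self, name, **engine_kwargs):
        # Stored only; this sketch never calls numba
        self._engine_kwargs[name] = dict(engine_kwargs)

    def mean(self, update):
        count, total = self._state.get("mean", (0, 0.0))
        for x in update:
            count += 1
            total += x
        self._state["mean"] = (count, total)
        return total / count if count else math.nan

    def var(self, update):
        # Welford's algorithm for the sample variance
        n, avg, m2 = self._state.get("var", (0, 0.0, 0.0))
        for x in update:
            n += 1
            delta = x - avg
            avg += delta / n
            m2 += delta * (x - avg)
        self._state["var"] = (n, avg, m2)
        return m2 / (n - 1) if n > 1 else math.nan


w = OnlineWindow()
w.configure("mean", nogil=True)
w.configure("var", nogil=False)
m1 = w.mean([1.0, 2.0, 3.0])
v1 = w.var([10.0, 20.0])  # does not touch the state used by mean
m2 = w.mean([4.0])        # continues from [1, 2, 3] only
```

Keying both dictionaries by aggregation name is what keeps an update to one aggregation from leaking into another.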

Proposed API options (example with ExponentialMovingWindow)

  1. ExponentialMovingWindow automatically holds the state of an aggregation call, and an update keyword on the aggregation methods accepts the new data
In [1]: s = pd.Series([0, 1, 2, np.nan, 4])

In [2]: ewm = s.head(1).ewm(com=0.5)

# Prime the first resulting state
In [3]: ewm.mean(engine="numba")
Out[3]:
0    0.0
dtype: float64

In [4]: ewm.mean(engine="numba", update=s.iloc[1:4])
Out[4]:
1    0.750000
2    1.615385
3    1.615385
dtype: float64

In [5]: ewm.mean(engine="numba", update=s.tail(1))
Out[5]:
4    3.670213
dtype: float64

# Clear the latest ewm state
In [6]: ewm.reset_update()

# Re-prime the first resulting state
In [7]: ewm.mean(engine="numba")
Out[7]:
0    0.0
dtype: float64

In [8]: ewm.mean(engine="numba", update=s.tail(4))
Out[8]:
1    0.750000
2    1.615385
3    1.615385
4    3.670213
dtype: float64
  2. A new online method on ewm that creates a new OnlineExponentialMovingWindow object. The aggregation methods on this object then support an update keyword.
In [6]: s = pd.Series([0, 1, 2, np.nan, 4])

# This will raise if numba is not installed
In [7]: ewm_online = s.head(1).ewm(com=0.5).online()

In [8]: ewm_online
Out[8]: OnlineExponentialMovingWindow [com=0.5,min_periods=1,adjust=True,ignore_na=False,axis=0]

# Prime the first resulting state
In [10]: ewm_online.mean()
Out[10]:
0    0.0
dtype: float64

In [11]: ewm_online.mean(update=s.iloc[1:4])
Out[11]:
1    0.750000
2    1.615385
3    1.615385
dtype: float64

# Clear the latest ewm state
In [12]: ewm_online.reset()

# Re-prime the first resulting state
In [13]: ewm_online.mean()
Out[13]:
0    0.0
dtype: float64

In [14]: ewm_online.mean(update=s.tail(4))
Out[14]:
1    0.750000
2    1.615385
3    1.615385
4    3.670213
dtype: float64
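
The numbers in both sessions can be reproduced with a small amount of retained state. The following is a pure-Python sketch, not the proposed implementation: `OnlineEWMMean` is a hypothetical name, and the sketch assumes adjust=True, ignore_na=False, and min_periods of at most 1, matching the defaults shown in the repr above. It keeps a weighted numerator and denominator, decays both on every step (including NaN steps), and adds the new observation only when it is not NaN.

```python
import math


class OnlineEWMMean:
    """Hypothetical sketch of an online EWM mean (adjust=True, ignore_na=False)."""

    def __init__(self, com):
        self.alpha = 1.0 / (1.0 + com)
        self.reset()

    def reset(self):
        # Clear the retained state, as reset() does in the session above
        self.numer = 0.0  # weighted sum of observations
        self.denom = 0.0  # sum of weights

    def update(self, values):
        decay = 1.0 - self.alpha
        out = []
        for x in values:
            # Old weights decay on every step, even when x is NaN
            self.numer *= decay
            self.denom *= decay
            if not math.isnan(x):
                self.numer += x
                self.denom += 1.0
            out.append(self.numer / self.denom if self.denom > 0 else math.nan)
        return out


s = [0.0, 1.0, 2.0, math.nan, 4.0]
ewm = OnlineEWMMean(com=0.5)
primed = ewm.update(s[:1])   # prime with the first value
chunk = ewm.update(s[1:4])   # update with the next three values
last = ewm.update(s[4:])     # update with the final value

ewm.reset()
ewm.update(s[:1])
again = ewm.update(s[1:])    # same values in a single update after a reset
```

Splitting the data into several updates or passing it in one update after a reset gives the same values, which is the property the sessions above demonstrate.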

Considered API Options

  1. Create a new object per windowing method + aggregation, but this would lead to too many individual classes
  2. Use streamz as an optional dependency for online calculations. However, streamz has a full API workflow adopted for this operation which may be difficult to reuse. Also, we would ideally want this to start as a numba accelerated operation

jreback commented May 17, 2021

ewm_online = s.head(1).ewm(com=0.5).online(name='foo_mean')

ewm_mean_online = s.head(1).ewm(com=0.5).online()

@mroeschke

closed by pandas-dev#41888
