BUG: PyMC 5.7.2 OOM - memory leak #6852

Closed
danjenson opened this issue Aug 11, 2023 · 3 comments

danjenson commented Aug 11, 2023

Describe the issue:

Process memory grows steadily until it consumes all available memory (and swap). Reproduced on Linux and on an M1 Mac. Note that with the default 'fork' start method for multiprocessing on Linux, the run fails immediately with Errno 12 (OOM), before sampling even begins.

PYMC version: 5.7.2

Linux system:

  • Void Linux
  • Kernel 6.3.12_1
  • 64 GB DDR5 RAM
  • 24 GB RTX 4090 GPU
  • AMD Ryzen 9 7950X 16 core, 32 threads

Mac System:

  • 16 GB memory
  • 8 Cores

Dataset: ~161 MB total.

Reproducible code example:

#!/usr/bin/env python3
import numpy as np
import pandas as pd
import pymc as pm


def pymc_bayes(df: pd.DataFrame):
    a, b, c, i = df.a.values, df.b.values, df.c.values, df.i.values
    n_i = int(i.max() + 1)
    with pm.Model() as m:
        alpha = pm.Normal("alpha", 0, 1, shape=[n_i])
        beta_b = pm.HalfNormal("beta_b", 1)
        beta_c = pm.HalfNormal("beta_c", 1)
        beta_int = pm.Normal("beta_int", 0, 1)
        mu = pm.Deterministic(
            "mu", alpha[i] + beta_b * b + beta_c * c + beta_int * b * c
        )
        sigma = pm.Exponential("sigma", 1)
        a_hat = pm.Normal("a_hat", mu, sigma, observed=a)
        idata = pm.sample(mp_ctx="spawn")  # fork fails immediately with OOM
        idata.to_netcdf("pymc_bayes.nc")
    print("finished!")


if __name__ == "__main__":
    n, n_int = 2618018, 17  # to match the real dataset I care about
    df = pd.DataFrame(np.random.randn(n, 3), columns=['a', 'b', 'c'])
    df['i'] = np.random.randint(0, n_int, size=n)
    pymc_bayes(df)

Error message:

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [alpha, beta_b, beta_c, beta_int, sigma]
Process worker_chain_2:███████████████████████---------------| 76.14% [6091/8000 18:42<05:51 Sampling 4 chains, 0 divergences]s]
Process worker_chain_3:
Process worker_chain_0:
Process worker_chain_1:
Traceback (most recent call last):
Traceback (most recent call last):
  File "/home/danj/.local/lib/python3.11/site-packages/pymc/sampling/parallel.py", line 122, in run
    self._start_loop()
  File "/home/danj/.local/lib/python3.11/site-packages/pymc/sampling/parallel.py", line 181, in _start_loop
    msg = self._recv_msg()
          ^^^^^^^^^^^^^^^^
  File "/home/danj/.local/lib/python3.11/site-packages/pymc/sampling/parallel.py", line 153, in _recv_msg
    return self._msg_pipe.recv()
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 249, in recv
    buf = self._recv_bytes()
          ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 413, in _recv_bytes
    buf = self._recv(4)
          ^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 378, in _recv
    chunk = read(handle, remaining)
            ^^^^^^^^^^^^^^^^^^^^^^^
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/danj/.local/lib/python3.11/site-packages/pymc/sampling/parallel.py", line 122, in run
    self._start_loop()
  File "/home/danj/.local/lib/python3.11/site-packages/pymc/sampling/parallel.py", line 181, in _start_loop
    msg = self._recv_msg()
          ^^^^^^^^^^^^^^^^
  File "/home/danj/.local/lib/python3.11/site-packages/pymc/sampling/parallel.py", line 153, in _recv_msg
    return self._msg_pipe.recv()
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 249, in recv
    buf = self._recv_bytes()
          ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 413, in _recv_bytes
    buf = self._recv(4)
          ^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 378, in _recv
    chunk = read(handle, remaining)
            ^^^^^^^^^^^^^^^^^^^^^^^
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
  File "/home/danj/.local/lib/python3.11/site-packages/pymc/sampling/parallel.py", line 122, in run
    self._start_loop()
  File "/home/danj/.local/lib/python3.11/site-packages/pymc/sampling/parallel.py", line 181, in _start_loop
    msg = self._recv_msg()
          ^^^^^^^^^^^^^^^^
  File "/home/danj/.local/lib/python3.11/site-packages/pymc/sampling/parallel.py", line 153, in _recv_msg
    return self._msg_pipe.recv()
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 249, in recv
    buf = self._recv_bytes()
          ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 413, in _recv_bytes
    buf = self._recv(4)
          ^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 378, in _recv
    chunk = read(handle, remaining)
            ^^^^^^^^^^^^^^^^^^^^^^^
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/danj/.local/lib/python3.11/site-packages/pymc/sampling/parallel.py", line 122, in run
    self._start_loop()
  File "/home/danj/.local/lib/python3.11/site-packages/pymc/sampling/parallel.py", line 181, in _start_loop
    msg = self._recv_msg()
          ^^^^^^^^^^^^^^^^
  File "/home/danj/.local/lib/python3.11/site-packages/pymc/sampling/parallel.py", line 153, in _recv_msg
    return self._msg_pipe.recv()
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 249, in recv
    buf = self._recv_bytes()
          ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 413, in _recv_bytes
    buf = self._recv(4)
          ^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 378, in _recv
    chunk = read(handle, remaining)
            ^^^^^^^^^^^^^^^^^^^^^^^
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/danj/.local/lib/python3.11/site-packages/pymc/sampling/parallel.py", line 194, in _run_process
    _Process(*args).run()
  File "/home/danj/.local/lib/python3.11/site-packages/pymc/sampling/parallel.py", line 129, in run
    self._msg_pipe.send(("error", e))
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 205, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 410, in _send_bytes
    self._send(header + buf)
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 367, in _send
    n = write(self._handle, buf)
        ^^^^^^^^^^^^^^^^^^^^^^^^
BrokenPipeError: [Errno 32] Broken pipe
  File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/danj/.local/lib/python3.11/site-packages/pymc/sampling/parallel.py", line 194, in _run_process
    _Process(*args).run()
  File "/home/danj/.local/lib/python3.11/site-packages/pymc/sampling/parallel.py", line 129, in run
    self._msg_pipe.send(("error", e))
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 205, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 410, in _send_bytes
    self._send(header + buf)
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 367, in _send
    n = write(self._handle, buf)
        ^^^^^^^^^^^^^^^^^^^^^^^^
BrokenPipeError: [Errno 32] Broken pipe
  File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/danj/.local/lib/python3.11/site-packages/pymc/sampling/parallel.py", line 194, in _run_process
    _Process(*args).run()
  File "/home/danj/.local/lib/python3.11/site-packages/pymc/sampling/parallel.py", line 129, in run
    self._msg_pipe.send(("error", e))
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 205, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 410, in _send_bytes
    self._send(header + buf)
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 367, in _send
    n = write(self._handle, buf)
        ^^^^^^^^^^^^^^^^^^^^^^^^
BrokenPipeError: [Errno 32] Broken pipe
  File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/danj/.local/lib/python3.11/site-packages/pymc/sampling/parallel.py", line 194, in _run_process
    _Process(*args).run()
  File "/home/danj/.local/lib/python3.11/site-packages/pymc/sampling/parallel.py", line 129, in run
    self._msg_pipe.send(("error", e))
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 205, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 410, in _send_bytes
    self._send(header + buf)
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 367, in _send
    n = write(self._handle, buf)
        ^^^^^^^^^^^^^^^^^^^^^^^^
BrokenPipeError: [Errno 32] Broken pipe

PyMC version information:

PYMC 5.7.2
Aesara 2.9.1
PyTensor 2.14.2

uname -a: Linux ghost 6.3.13_1 #1 SMP PREEMPT_DYNAMIC Tue Jul 25 00:19:40 UTC 2023 x86_64 GNU/Linux

Context for the issue:

This is a simple linear model with an interaction term, although I couldn't get it to work without OOM even with two covariates.

danjenson added the bug label Aug 11, 2023

ricardoV94 (Member) commented Aug 11, 2023

Hi

That's a very big Deterministic to store on every iteration (it is as big as your dataset). Does the problem go away if you remove it?

You can still use the same expression; just don't wrap it in pm.Deterministic.

Also, can you try with a single chain so we can see the traceback of the error?

danjenson (Author) commented

It was the Deterministic -- I didn't realize this stores every value for every iteration (multiplied by the number of processes). Thank you!
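
As a rough back-of-the-envelope check of why this runs out of memory, the sketch below estimates the size of the stored `mu` values. It assumes each value is a float64 (8 bytes) and that every one of the 8000 iterations shown in the progress bar (4 chains of 2000 iterations each) keeps one full copy of `mu`; these are assumptions for illustration, not measurements, and the helper name is hypothetical.

```python
def deterministic_trace_bytes(
    n_values: int,
    iterations_per_chain: int,
    chains: int,
    bytes_per_value: int = 8,  # assumption: float64 storage
) -> int:
    """Bytes needed to keep one array of n_values for every iteration of every chain."""
    return n_values * bytes_per_value * iterations_per_chain * chains


n_obs = 2_618_018  # rows in the reproducer, so also the length of mu

per_draw = deterministic_trace_bytes(n_obs, iterations_per_chain=1, chains=1)
total = deterministic_trace_bytes(n_obs, iterations_per_chain=2000, chains=4)

print(f"{per_draw / 1e6:.1f} MB per stored draw")
print(f"{total / 1e9:.1f} GB across all iterations and chains")
```

Under those assumptions a single copy of `mu` is about 21 MB and the full run would need roughly 168 GB, well beyond the 64 GB and 16 GB machines described in the report.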
