ConnectionResetError from multiprocess sampling #7354

Open
fonnesbeck opened this issue Jun 12, 2024 · 0 comments

fonnesbeck commented Jun 12, 2024

Describe the issue:

This has come up in the past (#6852, #4167) and has now started cropping up again. Multiprocess sampling intermittently fails partway through a run with a ConnectionResetError. Most recently, it has been happening to me on Linux (Fedora).

A workaround is to change the sampler's random seed; with a different seed, sampling usually completes.
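
The seed-change workaround can be automated. Below is a minimal sketch, not part of PyMC: `sample_with_reseed` and its parameters are hypothetical names, and it assumes the failure surfaces in the parent process as a `ConnectionResetError`, as in the traceback in this report. It retries the sampling call with a freshly derived seed and reports which seed finally worked.

```python
import random
from typing import Callable, TypeVar

T = TypeVar("T")


def sample_with_reseed(
    sample_fn: Callable[[int], T],
    seed: int,
    max_attempts: int = 3,
) -> tuple[T, int]:
    """Call sample_fn(seed); on ConnectionResetError retry with a new seed.

    Hypothetical helper, not a PyMC API. Returns the result together with
    the seed that produced it, so the successful run can be repeated.
    """
    rng = random.Random(seed)  # derive follow-up seeds deterministically
    current = seed
    for attempt in range(1, max_attempts + 1):
        try:
            return sample_fn(current), current
        except ConnectionResetError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            current = rng.randrange(2**32)
    raise ValueError("max_attempts must be at least 1")
```

With the model from this report it might be called as `sample_with_reseed(lambda s: pm.sample(100, chains=6, cores=4, random_seed=s, model=ad_spend_model), seed=random_seed)`. This only hides the symptom; it does not address whatever makes the connection to the worker process reset.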

Details below.

Reproducible code example:

Seems to be stochastic, so hard to reproduce.

Error message:

---------------------------------------------------------------------------
ConnectionResetError                      Traceback (most recent call last)
Cell In[28], line 2
      1 with ad_spend_model:
----> 2     ptrace = pm.sample(100, chains=6, cores=4, random_seed=random_seed)

File ~/repos/pymc/pymc/sampling/mcmc.py:841, in sample(draws, tune, chains, cores, random_seed, progressbar, progressbar_theme, step, var_names, nuts_sampler, initvals, init, jitter_max_retries, n_init, trace, discard_tuned_samples, compute_convergence_checks, keep_warning_stat, return_inferencedata, idata_kwargs, nuts_sampler_kwargs, callback, mp_ctx, blas_cores, model, **kwargs)
    839 _print_step_hierarchy(step)
    840 try:
--> 841     _mp_sample(**sample_args, **parallel_args)
    842 except pickle.PickleError:
    843     _log.warning("Could not pickle model, sampling singlethreaded.")

File ~/repos/pymc/pymc/sampling/mcmc.py:1254, in _mp_sample(draws, tune, step, chains, cores, random_seed, start, progressbar, progressbar_theme, traces, model, callback, blas_cores, mp_ctx, **kwargs)
   1252 try:
   1253     with sampler:
-> 1254         for draw in sampler:
   1255             strace = traces[draw.chain]
   1256             strace.record(draw.point, draw.stats)

File ~/repos/pymc/pymc/sampling/parallel.py:471, in ParallelSampler.__iter__(self)
    464 task = progress.add_task(
    465     self._desc.format(self),
    466     completed=self._completed_draws,
    467     total=self._total_draws,
    468 )
    470 while self._active:
--> 471     draw = ProcessAdapter.recv_draw(self._active)
    472     proc, is_last, draw, tuning, stats = draw
    473     self._completed_draws += 1

File ~/repos/pymc/pymc/sampling/parallel.py:328, in ProcessAdapter.recv_draw(processes, timeout)
    326 idxs = {id(proc._msg_pipe): proc for proc in processes}
    327 proc = idxs[id(ready[0])]
--> 328 msg = ready[0].recv()
    330 if msg[0] == "error":
    331     old_error = msg[1]

File ~/miniforge3/envs/pymc_course/lib/python3.12/multiprocessing/connection.py:250, in _ConnectionBase.recv(self)
    248 self._check_closed()
    249 self._check_readable()
--> 250 buf = self._recv_bytes()
    251 return _ForkingPickler.loads(buf.getbuffer())

File ~/miniforge3/envs/pymc_course/lib/python3.12/multiprocessing/connection.py:430, in Connection._recv_bytes(self, maxsize)
    429 def _recv_bytes(self, maxsize=None):
--> 430     buf = self._recv(4)
    431     size, = struct.unpack("!i", buf.getvalue())
    432     if size == -1:

File ~/miniforge3/envs/pymc_course/lib/python3.12/multiprocessing/connection.py:395, in Connection._recv(self, size, read)
    393 remaining = size
    394 while remaining > 0:
--> 395     chunk = read(handle, remaining)
    396     n = len(chunk)
    397     if n == 0:

ConnectionResetError: [Errno 104] Connection reset by peer
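
A note on reading the last frames: `Connection._recv_bytes` first reads a 4-byte header (`self._recv(4)`) and unpacks it with `struct.unpack("!i", ...)` to learn the size of the incoming message. The reset was raised during that header read, before any message body arrived. That suggests, but does not prove, that the worker's end of the pipe went away between draws rather than in the middle of sending one. The sketch below illustrates this length-prefixed framing on in-memory bytes only. `frame` and `read_frame` are hypothetical names, not the stdlib implementation, and the `-1` extended-header case visible at line 432 is deliberately left out.

```python
import struct


def frame(payload: bytes) -> bytes:
    """Prefix payload with a 4-byte big-endian signed length (format "!i")."""
    return struct.pack("!i", len(payload)) + payload


def read_frame(buf: bytes) -> bytes:
    """Parse one frame from buf; fail if the header or body is cut short."""
    if len(buf) < 4:
        # Analogous to the failing read in the traceback: the stream
        # stopped before a complete length header was available.
        raise EOFError("stream ended inside the 4-byte length header")
    (size,) = struct.unpack("!i", buf[:4])
    if size < 0:
        raise ValueError("extended headers are not handled in this sketch")
    body = buf[4 : 4 + size]
    if len(body) < size:
        raise EOFError("stream ended inside the message body")
    return body
```

For example, `read_frame(frame(b"draw"))` returns `b"draw"`, while `read_frame(b"\x00\x00")` raises `EOFError` because the header is incomplete.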


### PyMC version information:

Python version       : 3.12.3
pymc      : 5.15.1+17.g508a1341f.dirty
pytensor  : 2.22.1

### Context for the issue:

_No response_
fonnesbeck added the bug label Jun 12, 2024