Attempt to free memory that is still in use by an ongoing MPI communication #3268
Update: this triggered again today.
I always observe this with the allgather2 test (though it's not 100% reproducible). Given when it first appeared and the commit history, I think this may be related to #3159.
These MTT runs were done with vanilla Open MPI: no UCX, MXM, etc.
From web-ex 04/11 - Not 100% reproducible, so let's keep this issue open a bit longer.
still reproducible:
Artem -- don't forget that you can click on the "Absolute date range" link on the right in MTT and get a short link: https://mtt.open-mpi.org/index.php?do_redir=2412
Thanks, I didn't know that: https://mtt.open-mpi.org/index.php?do_redir=2412
@artpol84 Where are we on this issue?
@jsquyres I don't see this error anymore in our MTT.
Sweet! I'll close.
I am getting exactly this error with OpenMPI 3.0.0 and using RMA (multithreaded) in my code. This is reproducible and it does not happen with MPICH. Any ideas?
@icebaman Can you open a new issue with a small reproducer code?
Hi,
I'm afraid not. Have you tried Open MPI v4.x?
Hi Jeff, no but I also got that advice on another ticket for a different issue, so will try that shortly. Thanks!
Yeah, sorry, 1.10.x is ancient and not really supported any more.
No worries; the only reason folks are running 1.10.X is that 2 and 3 are having scalability problems in our environment. Suspect it has something to do with startup mechanism, Slurm config, and our network, but hope 4 works out of the box.
We observed this error only once in our MTT. We are running v2.x with SLURM/pmix there. It is possible that it is somehow related to this configuration, though I doubt that.
Here is the error message: