osc/pt2pt hangs in MPI_Win_flush #6819

Closed
devreal opened this issue Jul 15, 2019 · 1 comment
devreal commented Jul 15, 2019

Running the example from #6552 on master (020a591) with the UCX OSC component disabled causes the process performing the RMA operations to hang in MPI_Win_flush:

Process 1:

#14 main (argc=3, argv=0x7fffffffb928) at test_mpi_rget_fetch_op.c:36 (at 0x0000000000400d84)
#13 PMPI_Win_flush () from openmpi-git-ucx/lib/libmpi.so.0 (at 0x00002aaaaab674a1)
#12 ompi_osc_pt2pt_flush () from openmpi-git-ucx/lib/openmpi/mca_osc_pt2pt.so (at 0x00002aaaad842d3d)
#11 ompi_osc_pt2pt_flush_lock () from openmpi-git-ucx/lib/openmpi/mca_osc_pt2pt.so (at 0x00002aaaad840b15)
#10 opal_progress () from openmpi-git-ucx/lib/libopen-pal.so.0 (at 0x00002aaaab2e4d1c)
#9 mca_pml_ucx_progress () from openmpi-git-ucx/lib/openmpi/mca_pml_ucx.so (at 0x00002aaaac8d07d7)
#8 ucp_worker_progress (worker=0x2aaaad879010) at openucx/ucx-1.5.2/build/src/ucp/../../../src/ucp/core/ucp_worker.c:1426 (at 0x00002aaaac170ee2)
#7 uct_worker_progress (worker=<optimized out>) at openucx/ucx-1.5.2/build/../src/uct/api/uct.h:1677 (at 0x00002aaaac170ee2)
#6 ucs_callbackq_dispatch (cbq=<optimized out>) at openucx/ucx-1.5.2/build/../src/ucs/datastruct/callbackq.h:209 (at 0x00002aaaac170ee2)
#5 uct_rc_verbs_iface_progress (arg=0x4f6770) at openucx/ucx-1.5.2/build/src/uct/../../../src/uct/ib/rc/verbs/rc_verbs_iface.c:116 (at 0x00002aaaac206347)
#4 uct_rc_verbs_iface_poll_tx (iface=0x4f6770) at openucx/ucx-1.5.2/build/src/uct/../../../src/uct/ib/rc/verbs/rc_verbs_iface.c:83 (at 0x00002aaaac206347)
#3 uct_ib_poll_cq (wcs=0x7fffffffb310, count=<synthetic pointer>, cq=<optimized out>) at openucx/ucx-1.5.2/build/../src/uct/ib/base/ib_device.h:289 (at 0x00002aaaac206347)
#2 ibv_poll_cq (wc=0x7fffffffb310, num_entries=<optimized out>, cq=<optimized out>) at /usr/include/infiniband/verbs.h:1458 (at 0x00002aaaac206347)
#1 mlx5_poll_cq () from /usr/lib64/libmlx5.so.1 (at 0x00002aaaac4f0ea7)
#0 pthread_spin_lock () from /usr/lib64/libpthread.so.0 (at 0x00002aaaaacdb483)

Process 0 waits in MPI_Barrier:

#13 main (argc=3, argv=0x7fffffffb928) at test/test_mpi_rget_fetch_op.c:44 (at 0x0000000000400dab)
#12 PMPI_Barrier () from openmpi-git-ucx/lib/libmpi.so.0 (at 0x00002aaaaab39e45)
#11 ompi_coll_base_barrier_intra_two_procs () from openmpi-git-ucx/lib/libmpi.so.0 (at 0x00002aaaaab84a32)
#10 ompi_request_default_wait () from openmpi-git-ucx/lib/libmpi.so.0 (at 0x00002aaaaab21255)
#9 opal_progress () from openmpi-git-ucx/lib/libopen-pal.so.0 (at 0x00002aaaab2e4d1c)
#8 mca_pml_ucx_progress () from openmpi-git-ucx/lib/openmpi/mca_pml_ucx.so (at 0x00002aaaac8d07d7)
#7 ucp_worker_progress (worker=0x2aaaad879010) at openucx/ucx-1.5.2/build/src/ucp/../../../src/ucp/core/ucp_worker.c:1426 (at 0x00002aaaac170ee2)
#6 uct_worker_progress (worker=<optimized out>) at openucx/ucx-1.5.2/build/../src/uct/api/uct.h:1677 (at 0x00002aaaac170ee2)
#5 ucs_callbackq_dispatch (cbq=<optimized out>) at openucx/ucx-1.5.2/build/../src/ucs/datastruct/callbackq.h:209 (at 0x00002aaaac170ee2)
#4 uct_rc_verbs_iface_progress (arg=0x4f6770) at openucx/ucx-1.5.2/build/src/uct/../../../src/uct/ib/rc/verbs/rc_verbs_iface.c:111 (at 0x00002aaaac2061ff)
#3 uct_rc_verbs_iface_poll_rx_common (iface=0x4f6770) at openucx/ucx-1.5.2/build/src/uct/../../../src/uct/ib/rc/verbs/rc_verbs_common.h:182 (at 0x00002aaaac2061ff)
#2 uct_ib_poll_cq (wcs=0x7fffffffb2b0, count=<synthetic pointer>, cq=<optimized out>) at openucx/ucx-1.5.2/build/../src/uct/ib/base/ib_device.h:289 (at 0x00002aaaac2061ff)
#1 ibv_poll_cq (wc=0x7fffffffb2b0, num_entries=<optimized out>, cq=<optimized out>) at /usr/include/infiniband/verbs.h:1458 (at 0x00002aaaac2061ff)
#0 mlx5_poll_cq () from /usr/lib64/libmlx5.so.1 (at 0x00002aaaac4f0e49)

The example code is:

#include <mpi.h>
#include <stdio.h>
#include <stdint.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Win win;
    char *base;

    MPI_Win_allocate(
        sizeof(uint64_t),
        1,
        MPI_INFO_NULL,
        MPI_COMM_WORLD,
        &base,
        &win);

    int target = 0;

    if (size == 2) {
      if (rank != target) {
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, target, 0, win);
        uint64_t res;
        uint64_t val;
        MPI_Request req;
        MPI_Rget(&val, 1, MPI_UINT64_T, target, 0, 1, MPI_UINT64_T, win, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        // SEGFAULTs
        MPI_Fetch_and_op(&val, &res, MPI_UINT64_T, target, 0, MPI_SUM, win);
        MPI_Win_flush(target, win);

        MPI_Win_unlock(target, win);
      }
    } else {
      printf("Skipping exclusive lock test for more than 2 ranks!\n");
    }

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Win_free(&win);

    MPI_Finalize();

    return 0;
}

The hang is reproducible when running with --mca osc ^ucx, while runs succeed with --mca osc ^ucx --mca btl_uct_memory_domains ib/mlx5_0. I am able to reproduce the problem with the 4.0.1 release as well.
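For reference, a minimal sketch of how to build and run the reproducer with the two component selections above. The source file name is taken from the backtrace (test_mpi_rget_fetch_op.c); two ranks are assumed, matching the size == 2 guard in the example:

```shell
# Build the reproducer with the Open MPI wrapper compiler
mpicc -o test_mpi_rget_fetch_op test_mpi_rget_fetch_op.c

# Exclude the UCX OSC component so osc/pt2pt is selected -- this run hangs
mpirun -np 2 --mca osc ^ucx ./test_mpi_rget_fetch_op

# Same exclusion plus the btl/uct memory-domain setting -- this run succeeds
mpirun -np 2 --mca osc ^ucx --mca btl_uct_memory_domains ib/mlx5_0 ./test_mpi_rget_fetch_op
```

Depending on the shell, the ^ in the component exclusion may need quoting, e.g. --mca osc '^ucx'.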

Open MPI was configured with:

$ ../configure CC=gcc CXX=g++ FC=gfortran --with-ucx=$HOME/opt/ucx-1.6.x-gnu/ --without-verbs --prefix=$HOME/opt/openmpi-4.0.1-ucx

Originally reported in #6816

janjust assigned and then unassigned janjust on Jul 16, 2019

devreal commented Mar 11, 2021

osc/pt2pt is no longer supported. Closing.

@devreal devreal closed this as completed Mar 11, 2021