
attribute functions lead to application segfault #10339

Closed
jeffhammond opened this issue Apr 29, 2022 · 25 comments · Fixed by #10344

@jeffhammond
Contributor

jeffhammond commented Apr 29, 2022

Your attributes implementation is broken by GCC 11.

I speculate the code that GCC 11 has optimized into badness is not valid C, but I'm not enough of a language lawyer to know for sure.

The Bug

$ /tmp/install-ompi-main/bin/mpicc -g3 MCVE.c
$ /tmp/install-ompi-main/bin/mpirun -n 1 gdb ./a.out -ex "set width 1000" -ex "thread apply all bt" -ex run -ex bt -ex "set confirm off" -ex quit
No protocol specified
...
Reading symbols from ./a.out...
Starting program: /tmp/armci-mpi-ompi-main/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7596700 (LWP 759692)]

Thread 1 "a.out" received signal SIGSEGV, Segmentation fault.
0x00005555555552ed in main () at MCVE.c:19
19	      if ( (*attr_val)==MPI_WIN_SEPARATE ) {
#0  0x00005555555552ed in main () at MCVE.c:19

MCVE

#include <stdio.h>
#include <mpi.h>

int main(void)
{
  int world_me;
  MPI_Win win;
  void * base;
  void * attr_ptr;
  int    attr_flag;
  MPI_Init(NULL,NULL);
  MPI_Comm_rank(MPI_COMM_WORLD, &world_me);
  MPI_Win_allocate( 1, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &base, &win);
  /* this function will always return flag=false in MPI-2 */
  MPI_Win_get_attr(win, MPI_WIN_MODEL, &attr_ptr, &attr_flag);
  if (attr_flag) {
    int * attr_val = (int*)attr_ptr;
    if (world_me==0) {
      if ( (*attr_val)==MPI_WIN_SEPARATE ) {
        printf("MPI_WIN_MODEL = MPI_WIN_SEPARATE \n" );
      } else if ( (*attr_val)==MPI_WIN_UNIFIED ) {
        printf("MPI_WIN_MODEL = MPI_WIN_UNIFIED \n" );
      } else {
        printf("MPI_WIN_MODEL = %d (not UNIFIED or SEPARATE) \n", *attr_val );
      }
    }
  } else {
    if (world_me==0) {
      printf("MPI_WIN_MODEL attribute missing \n");
    }
  }
  MPI_Win_free(&win);
  MPI_Finalize();
  return 0;
}

ompi_info

jhammond@nuclear:/tmp/armci-mpi-ompi-main$ /tmp/install-ompi-main/bin/ompi_info
                 Package: Open MPI jhammond@nuclear Distribution
                Open MPI: 5.1.0a1
  Open MPI repo revision: v2.x-dev-9819-g02d91b56b8
   Open MPI release date: Unreleased developer copy
                 MPI API: 3.1.0
            Ident string: 5.1.0a1
                  Prefix: /tmp/install-ompi-main
 Configured architecture: x86_64-pc-linux-gnu
           Configured by: jhammond
           Configured on: Fri Apr 29 07:49:20 UTC 2022
          Configure host: nuclear
  Configure command line: '--prefix=/tmp/install-ompi-main' '--without-psm2'
                          '--without-cuda' '--without-ofi'
                          '--without-libfabric'
                Built by: jhammond
                Built on: Fri 29 Apr 2022 07:52:53 AM UTC
              Built host: nuclear
              C bindings: yes
             Fort mpif.h: yes (all)
            Fort use mpi: yes (full: ignore TKR)
       Fort use mpi size: deprecated-ompi-info-value
        Fort use mpi_f08: yes
 Fort mpi_f08 compliance: The mpi_f08 module is available, but due to
                          limitations in the gfortran compiler and/or Open
                          MPI, does not support the following: array
                          subsections, direct passthru (where possible) to
                          underlying Open MPI's C functionality
  Fort mpi_f08 subarrays: no
           Java bindings: no
  Wrapper compiler rpath: runpath
              C compiler: gcc
     C compiler absolute: /bin/gcc
  C compiler family name: GNU
      C compiler version: 11.1.0
            C++ compiler: g++
   C++ compiler absolute: /bin/g++
           Fort compiler: gfortran
       Fort compiler abs: /bin/gfortran
         Fort ignore TKR: yes (!GCC$ ATTRIBUTES NO_ARG_CHECK ::)
   Fort 08 assumed shape: yes
      Fort optional args: yes
          Fort INTERFACE: yes
    Fort ISO_FORTRAN_ENV: yes
       Fort STORAGE_SIZE: yes
      Fort BIND(C) (all): yes
      Fort ISO_C_BINDING: yes
 Fort SUBROUTINE BIND(C): yes
       Fort TYPE,BIND(C): yes
 Fort T,BIND(C,name="a"): yes
            Fort PRIVATE: yes
           Fort ABSTRACT: yes
       Fort ASYNCHRONOUS: yes
          Fort PROCEDURE: yes
         Fort USE...ONLY: yes
           Fort C_FUNLOC: yes
 Fort f08 using wrappers: yes
         Fort MPI_SIZEOF: yes
             C profiling: yes
   Fort mpif.h profiling: yes
  Fort use mpi profiling: yes
   Fort use mpi_f08 prof: yes
          Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes,
                          OMPI progress: no, Event lib: yes)
           Sparse Groups: no
  Internal debug support: no
  MPI interface warnings: yes
     MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
              dl support: yes
   Heterogeneous support: no
       MPI_WTIME support: native
     Symbol vis. support: yes
   Host topology support: yes
            IPv6 support: no
          MPI extensions: affinity, cuda, ftmpi
 Fault Tolerance support: yes
          FT MPI support: yes
  MPI_MAX_PROCESSOR_NAME: 256
    MPI_MAX_ERROR_STRING: 256
     MPI_MAX_OBJECT_NAME: 64
        MPI_MAX_INFO_KEY: 36
        MPI_MAX_INFO_VAL: 256
       MPI_MAX_PORT_NAME: 1024
  MPI_MAX_DATAREP_STRING: 128
           MCA allocator: basic (MCA v2.1.0, API v2.0.0, Component v5.1.0)
           MCA allocator: bucket (MCA v2.1.0, API v2.0.0, Component v5.1.0)
           MCA backtrace: execinfo (MCA v2.1.0, API v2.0.0, Component v5.1.0)
                 MCA btl: self (MCA v2.1.0, API v3.3.0, Component v5.1.0)
                 MCA btl: sm (MCA v2.1.0, API v3.3.0, Component v5.1.0)
                 MCA btl: tcp (MCA v2.1.0, API v3.3.0, Component v5.1.0)
                  MCA dl: dlopen (MCA v2.1.0, API v1.0.0, Component v5.1.0)
                  MCA if: linux_ipv6 (MCA v2.1.0, API v2.0.0, Component
                          v5.1.0)
                  MCA if: posix_ipv4 (MCA v2.1.0, API v2.0.0, Component
                          v5.1.0)
         MCA installdirs: env (MCA v2.1.0, API v2.0.0, Component v5.1.0)
         MCA installdirs: config (MCA v2.1.0, API v2.0.0, Component v5.1.0)
              MCA memory: patcher (MCA v2.1.0, API v2.0.0, Component v5.1.0)
               MCA mpool: hugepage (MCA v2.1.0, API v3.1.0, Component v5.1.0)
             MCA patcher: overwrite (MCA v2.1.0, API v1.0.0, Component
                          v5.1.0)
              MCA rcache: grdma (MCA v2.1.0, API v3.3.0, Component v5.1.0)
           MCA reachable: weighted (MCA v2.1.0, API v2.0.0, Component v5.1.0)
               MCA shmem: mmap (MCA v2.1.0, API v2.0.0, Component v5.1.0)
               MCA shmem: posix (MCA v2.1.0, API v2.0.0, Component v5.1.0)
               MCA shmem: sysv (MCA v2.1.0, API v2.0.0, Component v5.1.0)
                MCA smsc: cma (MCA v2.1.0, API v1.0.0, Component v5.1.0)
             MCA threads: pthreads (MCA v2.1.0, API v1.0.0, Component v5.1.0)
               MCA timer: linux (MCA v2.1.0, API v2.0.0, Component v5.1.0)
                 MCA bml: r2 (MCA v2.1.0, API v2.1.0, Component v5.1.0)
                MCA coll: adapt (MCA v2.1.0, API v2.4.0, Component v5.1.0)
                MCA coll: basic (MCA v2.1.0, API v2.4.0, Component v5.1.0)
                MCA coll: han (MCA v2.1.0, API v2.4.0, Component v5.1.0)
                MCA coll: inter (MCA v2.1.0, API v2.4.0, Component v5.1.0)
                MCA coll: libnbc (MCA v2.1.0, API v2.4.0, Component v5.1.0)
                MCA coll: self (MCA v2.1.0, API v2.4.0, Component v5.1.0)
                MCA coll: sm (MCA v2.1.0, API v2.4.0, Component v5.1.0)
                MCA coll: sync (MCA v2.1.0, API v2.4.0, Component v5.1.0)
                MCA coll: tuned (MCA v2.1.0, API v2.4.0, Component v5.1.0)
                MCA coll: ftagree (MCA v2.1.0, API v2.4.0, Component v5.1.0)
                MCA fbtl: posix (MCA v2.1.0, API v2.0.0, Component v5.1.0)
               MCA fcoll: dynamic (MCA v2.1.0, API v2.0.0, Component v5.1.0)
               MCA fcoll: dynamic_gen2 (MCA v2.1.0, API v2.0.0, Component
                          v5.1.0)
               MCA fcoll: individual (MCA v2.1.0, API v2.0.0, Component
                          v5.1.0)
               MCA fcoll: vulcan (MCA v2.1.0, API v2.0.0, Component v5.1.0)
                  MCA fs: ufs (MCA v2.1.0, API v2.0.0, Component v5.1.0)
                MCA hook: comm_method (MCA v2.1.0, API v1.0.0, Component
                          v5.1.0)
                  MCA io: ompio (MCA v2.1.0, API v2.0.0, Component v5.1.0)
                 MCA osc: sm (MCA v2.1.0, API v3.0.0, Component v5.1.0)
                 MCA osc: rdma (MCA v2.1.0, API v3.0.0, Component v5.1.0)
                MCA part: persist (MCA v2.1.0, API v4.0.0, Component v5.1.0)
                 MCA pml: cm (MCA v2.1.0, API v2.1.0, Component v5.1.0)
                 MCA pml: ob1 (MCA v2.1.0, API v2.1.0, Component v5.1.0)
                 MCA pml: v (MCA v2.1.0, API v2.1.0, Component v5.1.0)
            MCA sharedfp: individual (MCA v2.1.0, API v2.0.0, Component
                          v5.1.0)
            MCA sharedfp: lockedfile (MCA v2.1.0, API v2.0.0, Component
                          v5.1.0)
            MCA sharedfp: sm (MCA v2.1.0, API v2.0.0, Component v5.1.0)
                MCA topo: basic (MCA v2.1.0, API v2.2.0, Component v5.1.0)
                MCA topo: treematch (MCA v2.1.0, API v2.2.0, Component
                          v5.1.0)
           MCA vprotocol: pessimist (MCA v2.1.0, API v2.0.0, Component
                          v5.1.0)

It works with Open-MPI 3...

This code has worked for approximately 8 years with every other MPI implementation I've tried, including this one.

$ mpicc -g MCVE.c
$ mpirun -n 1 ./a.out
MPI_WIN_MODEL = MPI_WIN_UNIFIED
$ ompi_info
                 Package: Open MPI qa@sky1 Distribution
                Open MPI: 3.1.5
  Open MPI repo revision: v3.1.5
   Open MPI release date: Nov 15, 2019
                Open RTE: 3.1.5
  Open RTE repo revision: v3.1.5
   Open RTE release date: Nov 15, 2019
                    OPAL: 3.1.5
      OPAL repo revision: v3.1.5
       OPAL release date: Nov 15, 2019
                 MPI API: 3.1.0
            Ident string: 3.1.5
                  Prefix: /opt/nvidia/hpc_sdk/Linux_x86_64/22.2/comm_libs/openmpi/openmpi-3.1.5
 Configured architecture: x86_64-unknown-linux-gnu
          Configure host: sky1
           Configured by: qa
           Configured on: Thu Jan 13 10:48:36 PST 2022
          Configure host: sky1
  Configure command line: '--prefix=/proj/nv/libraries/Linux_x86_64/22.2/openmpi/209518-rel-1'
                          '--enable-shared' '--enable-static' '--without-tm'
                          '--enable-mpi-cxx' '--disable-wrapper-runpath'
                          '--without-ucx' '--without-libnl'
                          '--with-wrapper-ldflags=-Wl,-rpath
                          -Wl,$ORIGIN:$ORIGIN/../../lib:$ORIGIN/../../../lib:$ORIGIN/../../../compilers/lib:$ORIGIN/../../../../compilers/lib:$ORIGIN/../../../../../compilers/lib'
                          '--enable-mpirun-prefix-by-default'
                          '--with-libevent=internal' '--with-slurm'
                          '--without-libnl'
                          '--with-cuda=/proj/cuda/10.0/Linux_x86_64'
                Built by: qa
                Built on: Thu Jan 13 11:05:27 PST 2022
              Built host: sky1
              C bindings: yes
            C++ bindings: yes
             Fort mpif.h: yes (all)
            Fort use mpi: yes (full: ignore TKR)
       Fort use mpi size: deprecated-ompi-info-value
        Fort use mpi_f08: yes
 Fort mpi_f08 compliance: The mpi_f08 module is available, but due to
                          limitations in the nvfortran -nomp compiler and/or
                          Open MPI, does not support the following: array
                          subsections, direct passthru (where possible) to
                          underlying Open MPI's C functionality
  Fort mpi_f08 subarrays: no
           Java bindings: no
  Wrapper compiler rpath: rpath
              C compiler: nvc -nomp
     C compiler absolute: /proj/nv/Linux_x86_64/209518-rel/compilers/bin/nvc
  C compiler family name: PGI
      C compiler version: 22.2-0
            C++ compiler: nvc++ -nomp
   C++ compiler absolute: /proj/nv/Linux_x86_64/209518-rel/compilers/bin/nvc++
           Fort compiler: nvfortran -nomp
       Fort compiler abs: /proj/nv/Linux_x86_64/209518-rel/compilers/bin/nvfortran
         Fort ignore TKR: yes (!DIR$ IGNORE_TKR)
   Fort 08 assumed shape: no
      Fort optional args: yes
          Fort INTERFACE: yes
    Fort ISO_FORTRAN_ENV: yes
       Fort STORAGE_SIZE: yes
      Fort BIND(C) (all): yes
      Fort ISO_C_BINDING: yes
 Fort SUBROUTINE BIND(C): yes
       Fort TYPE,BIND(C): yes
 Fort T,BIND(C,name="a"): yes
            Fort PRIVATE: yes
          Fort PROTECTED: yes
           Fort ABSTRACT: yes
       Fort ASYNCHRONOUS: yes
          Fort PROCEDURE: yes
         Fort USE...ONLY: yes
           Fort C_FUNLOC: yes
 Fort f08 using wrappers: yes
         Fort MPI_SIZEOF: yes
             C profiling: yes
           C++ profiling: yes
   Fort mpif.h profiling: yes
  Fort use mpi profiling: yes
   Fort use mpi_f08 prof: yes
          C++ exceptions: no
          Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes,
                          OMPI progress: no, ORTE progress: yes, Event lib:
                          yes)
           Sparse Groups: no
  Internal debug support: no
  MPI interface warnings: yes
     MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
              dl support: yes
   Heterogeneous support: no
 mpirun default --prefix: yes
       MPI_WTIME support: native
     Symbol vis. support: yes
   Host topology support: yes
          MPI extensions: affinity, cuda
   FT Checkpoint support: no (checkpoint thread: no)
   C/R Enabled Debugging: no
  MPI_MAX_PROCESSOR_NAME: 256
    MPI_MAX_ERROR_STRING: 256
     MPI_MAX_OBJECT_NAME: 64
        MPI_MAX_INFO_KEY: 36
        MPI_MAX_INFO_VAL: 256
       MPI_MAX_PORT_NAME: 1024
  MPI_MAX_DATAREP_STRING: 128
           MCA allocator: basic (MCA v2.1.0, API v2.0.0, Component v3.1.5)
           MCA allocator: bucket (MCA v2.1.0, API v2.0.0, Component v3.1.5)
           MCA backtrace: execinfo (MCA v2.1.0, API v2.0.0, Component v3.1.5)
                 MCA btl: self (MCA v2.1.0, API v3.0.0, Component v3.1.5)
                 MCA btl: openib (MCA v2.1.0, API v3.0.0, Component v3.1.5)
                 MCA btl: smcuda (MCA v2.1.0, API v3.0.0, Component v3.1.5)
                 MCA btl: tcp (MCA v2.1.0, API v3.0.0, Component v3.1.5)
                 MCA btl: vader (MCA v2.1.0, API v3.0.0, Component v3.1.5)
            MCA compress: bzip (MCA v2.1.0, API v2.0.0, Component v3.1.5)
            MCA compress: gzip (MCA v2.1.0, API v2.0.0, Component v3.1.5)
                 MCA crs: none (MCA v2.1.0, API v2.0.0, Component v3.1.5)
                  MCA dl: dlopen (MCA v2.1.0, API v1.0.0, Component v3.1.5)
               MCA event: libevent2022 (MCA v2.1.0, API v2.0.0, Component
                          v3.1.5)
               MCA hwloc: hwloc1117 (MCA v2.1.0, API v2.0.0, Component
                          v3.1.5)
                  MCA if: linux_ipv6 (MCA v2.1.0, API v2.0.0, Component
                          v3.1.5)
                  MCA if: posix_ipv4 (MCA v2.1.0, API v2.0.0, Component
                          v3.1.5)
         MCA installdirs: env (MCA v2.1.0, API v2.0.0, Component v3.1.5)
         MCA installdirs: config (MCA v2.1.0, API v2.0.0, Component v3.1.5)
              MCA memory: patcher (MCA v2.1.0, API v2.0.0, Component v3.1.5)
               MCA mpool: hugepage (MCA v2.1.0, API v3.0.0, Component v3.1.5)
             MCA patcher: overwrite (MCA v2.1.0, API v1.0.0, Component
                          v3.1.5)
                MCA pmix: isolated (MCA v2.1.0, API v2.0.0, Component v3.1.5)
                MCA pmix: pmix2x (MCA v2.1.0, API v2.0.0, Component v3.1.5)
               MCA pstat: linux (MCA v2.1.0, API v2.0.0, Component v3.1.5)
              MCA rcache: grdma (MCA v2.1.0, API v3.3.0, Component v3.1.5)
              MCA rcache: gpusm (MCA v2.1.0, API v3.3.0, Component v3.1.5)
              MCA rcache: rgpusm (MCA v2.1.0, API v3.3.0, Component v3.1.5)
           MCA reachable: weighted (MCA v2.1.0, API v2.0.0, Component v3.1.5)
               MCA shmem: mmap (MCA v2.1.0, API v2.0.0, Component v3.1.5)
               MCA shmem: posix (MCA v2.1.0, API v2.0.0, Component v3.1.5)
               MCA shmem: sysv (MCA v2.1.0, API v2.0.0, Component v3.1.5)
               MCA timer: linux (MCA v2.1.0, API v2.0.0, Component v3.1.5)
                 MCA dfs: app (MCA v2.1.0, API v1.0.0, Component v3.1.5)
                 MCA dfs: orted (MCA v2.1.0, API v1.0.0, Component v3.1.5)
                 MCA dfs: test (MCA v2.1.0, API v1.0.0, Component v3.1.5)
              MCA errmgr: default_app (MCA v2.1.0, API v3.0.0, Component
                          v3.1.5)
              MCA errmgr: default_hnp (MCA v2.1.0, API v3.0.0, Component
                          v3.1.5)
              MCA errmgr: default_orted (MCA v2.1.0, API v3.0.0, Component
                          v3.1.5)
              MCA errmgr: default_tool (MCA v2.1.0, API v3.0.0, Component
                          v3.1.5)
              MCA errmgr: dvm (MCA v2.1.0, API v3.0.0, Component v3.1.5)
                 MCA ess: env (MCA v2.1.0, API v3.0.0, Component v3.1.5)
                 MCA ess: hnp (MCA v2.1.0, API v3.0.0, Component v3.1.5)
                 MCA ess: pmi (MCA v2.1.0, API v3.0.0, Component v3.1.5)
                 MCA ess: singleton (MCA v2.1.0, API v3.0.0, Component
                          v3.1.5)
                 MCA ess: tool (MCA v2.1.0, API v3.0.0, Component v3.1.5)
                 MCA ess: slurm (MCA v2.1.0, API v3.0.0, Component v3.1.5)
               MCA filem: raw (MCA v2.1.0, API v2.0.0, Component v3.1.5)
             MCA grpcomm: direct (MCA v2.1.0, API v3.0.0, Component v3.1.5)
                 MCA iof: hnp (MCA v2.1.0, API v2.0.0, Component v3.1.5)
                 MCA iof: orted (MCA v2.1.0, API v2.0.0, Component v3.1.5)
                 MCA iof: tool (MCA v2.1.0, API v2.0.0, Component v3.1.5)
            MCA notifier: syslog (MCA v2.1.0, API v1.0.0, Component v3.1.5)
                MCA odls: default (MCA v2.1.0, API v2.0.0, Component v3.1.5)
                 MCA oob: tcp (MCA v2.1.0, API v2.0.0, Component v3.1.5)
                 MCA oob: ud (MCA v2.1.0, API v2.0.0, Component v3.1.5)
                 MCA plm: isolated (MCA v2.1.0, API v2.0.0, Component v3.1.5)
                 MCA plm: rsh (MCA v2.1.0, API v2.0.0, Component v3.1.5)
                 MCA plm: slurm (MCA v2.1.0, API v2.0.0, Component v3.1.5)
                 MCA ras: simulator (MCA v2.1.0, API v2.0.0, Component
                          v3.1.5)
                 MCA ras: slurm (MCA v2.1.0, API v2.0.0, Component v3.1.5)
                MCA regx: fwd (MCA v2.1.0, API v1.0.0, Component v3.1.5)
                MCA regx: naive (MCA v2.1.0, API v1.0.0, Component v3.1.5)
                MCA regx: reverse (MCA v2.1.0, API v1.0.0, Component v3.1.5)
               MCA rmaps: mindist (MCA v2.1.0, API v2.0.0, Component v3.1.5)
               MCA rmaps: ppr (MCA v2.1.0, API v2.0.0, Component v3.1.5)
               MCA rmaps: rank_file (MCA v2.1.0, API v2.0.0, Component
                          v3.1.5)
               MCA rmaps: resilient (MCA v2.1.0, API v2.0.0, Component
                          v3.1.5)
               MCA rmaps: round_robin (MCA v2.1.0, API v2.0.0, Component
                          v3.1.5)
               MCA rmaps: seq (MCA v2.1.0, API v2.0.0, Component v3.1.5)
                 MCA rml: oob (MCA v2.1.0, API v3.0.0, Component v3.1.5)
              MCA routed: binomial (MCA v2.1.0, API v3.0.0, Component v3.1.5)
              MCA routed: direct (MCA v2.1.0, API v3.0.0, Component v3.1.5)
              MCA routed: radix (MCA v2.1.0, API v3.0.0, Component v3.1.5)
                 MCA rtc: hwloc (MCA v2.1.0, API v1.0.0, Component v3.1.5)
              MCA schizo: flux (MCA v2.1.0, API v1.0.0, Component v3.1.5)
              MCA schizo: ompi (MCA v2.1.0, API v1.0.0, Component v3.1.5)
              MCA schizo: orte (MCA v2.1.0, API v1.0.0, Component v3.1.5)
              MCA schizo: slurm (MCA v2.1.0, API v1.0.0, Component v3.1.5)
               MCA state: app (MCA v2.1.0, API v1.0.0, Component v3.1.5)
               MCA state: dvm (MCA v2.1.0, API v1.0.0, Component v3.1.5)
               MCA state: hnp (MCA v2.1.0, API v1.0.0, Component v3.1.5)
               MCA state: novm (MCA v2.1.0, API v1.0.0, Component v3.1.5)
               MCA state: orted (MCA v2.1.0, API v1.0.0, Component v3.1.5)
               MCA state: tool (MCA v2.1.0, API v1.0.0, Component v3.1.5)
                 MCA bml: r2 (MCA v2.1.0, API v2.0.0, Component v3.1.5)
                MCA coll: basic (MCA v2.1.0, API v2.0.0, Component v3.1.5)
                MCA coll: inter (MCA v2.1.0, API v2.0.0, Component v3.1.5)
                MCA coll: libnbc (MCA v2.1.0, API v2.0.0, Component v3.1.5)
                MCA coll: self (MCA v2.1.0, API v2.0.0, Component v3.1.5)
                MCA coll: sm (MCA v2.1.0, API v2.0.0, Component v3.1.5)
                MCA coll: spacc (MCA v2.1.0, API v2.0.0, Component v3.1.5)
                MCA coll: sync (MCA v2.1.0, API v2.0.0, Component v3.1.5)
                MCA coll: tuned (MCA v2.1.0, API v2.0.0, Component v3.1.5)
                MCA coll: cuda (MCA v2.1.0, API v2.0.0, Component v3.1.5)
                MCA fbtl: posix (MCA v2.1.0, API v2.0.0, Component v3.1.5)
               MCA fcoll: dynamic (MCA v2.1.0, API v2.0.0, Component v3.1.5)
               MCA fcoll: dynamic_gen2 (MCA v2.1.0, API v2.0.0, Component
                          v3.1.5)
               MCA fcoll: individual (MCA v2.1.0, API v2.0.0, Component
                          v3.1.5)
               MCA fcoll: static (MCA v2.1.0, API v2.0.0, Component v3.1.5)
               MCA fcoll: two_phase (MCA v2.1.0, API v2.0.0, Component
                          v3.1.5)
                  MCA fs: ufs (MCA v2.1.0, API v2.0.0, Component v3.1.5)
                  MCA io: ompio (MCA v2.1.0, API v2.0.0, Component v3.1.5)
                  MCA io: romio314 (MCA v2.1.0, API v2.0.0, Component v3.1.5)
                 MCA osc: sm (MCA v2.1.0, API v3.0.0, Component v3.1.5)
                 MCA osc: pt2pt (MCA v2.1.0, API v3.0.0, Component v3.1.5)
                 MCA osc: rdma (MCA v2.1.0, API v3.0.0, Component v3.1.5)
                 MCA pml: cm (MCA v2.1.0, API v2.0.0, Component v3.1.5)
                 MCA pml: ob1 (MCA v2.1.0, API v2.0.0, Component v3.1.5)
                 MCA pml: v (MCA v2.1.0, API v2.0.0, Component v3.1.5)
                 MCA rte: orte (MCA v2.1.0, API v2.0.0, Component v3.1.5)
            MCA sharedfp: individual (MCA v2.1.0, API v2.0.0, Component
                          v3.1.5)
            MCA sharedfp: lockedfile (MCA v2.1.0, API v2.0.0, Component
                          v3.1.5)
            MCA sharedfp: sm (MCA v2.1.0, API v2.0.0, Component v3.1.5)
                MCA topo: basic (MCA v2.1.0, API v2.2.0, Component v3.1.5)
                MCA topo: treematch (MCA v2.1.0, API v2.2.0, Component
                          v3.1.5)
           MCA vprotocol: pessimist (MCA v2.1.0, API v2.0.0, Component
                          v3.1.5)
@jeffhammond
Contributor Author

jeffhammond commented Apr 29, 2022

4.1.4v1 works, so the bug was introduced in the 5.0 branch.

v5.0.0rc1 and v5.0.0rc2 work.

v5.0.0rc6, v5.0.0rc5, v5.0.0rc4 and v5.0.0rc3 are broken.

@jeffhammond
Contributor Author

I speculate that #10070 is the culprit, based on the timing. That commit is from March 5, and the error first appears in the tag created on March 8.

@bwbarrett can you take a look at this?

Generally, can the developers of Open-MPI please add more tests so this doesn't happen again? It seems rather shocking that the Open-MPI test suite does not contain basic unit tests of the object attribute functions. There are numerous tests in the MPICH test suite that you can use, or you can just run the ARMCI-MPI test suite so that I, as the maintainer of a dependent project, don't have to do it manually; there is overwhelming evidence that you all have a problem when it comes to RMA QA.

% git grep MPI_Win_get_attr
f08/attr/attrlangc.c:    MPI_Win_get_attr(win, key, &attrval, &flag);
f08/attr/attrlangc.c:    MPI_Win_get_attr(win, *fkey, &attrval, &flag);
f08/rma/baseattrwinf08.f90:      call MPI_Win_get_attr( win, MPI_WIN_BASE, valout, flag, ierr )
f08/rma/baseattrwinf08.f90:      call MPI_Win_get_attr( win, MPI_WIN_SIZE, valout, flag, ierr )
f08/rma/baseattrwinf08.f90:      call MPI_Win_get_attr( win, MPI_WIN_DISP_UNIT, valout, flag, ierr)
f77/rma/baseattrwinf.f:      call MPI_Win_get_attr( win, MPI_WIN_BASE, valout, flag, ierr )
f77/rma/baseattrwinf.f:      call MPI_Win_get_attr( win, MPI_WIN_SIZE, valout, flag, ierr )
f77/rma/baseattrwinf.f:      call MPI_Win_get_attr( win, MPI_WIN_DISP_UNIT, valout, flag, ierr)
f90/attr/attrlangc.c:    MPI_Win_get_attr(win, key, &attrval, &flag);
f90/attr/attrlangc.c:    MPI_Win_get_attr(win, *fkey, &attrval, &flag);
rma/attrorderwin.c:        MPI_Win_get_attr(win, key[i], &val_p, &flag);
rma/attrorderwin.c:        MPI_Win_get_attr(win, key[i], &val_p, &flag);
rma/baseattrwin.c:    MPI_Win_get_attr(win, MPI_WIN_BASE, &v, &flag);
rma/baseattrwin.c:    MPI_Win_get_attr(win, MPI_WIN_SIZE, &v, &flag);
rma/baseattrwin.c:    MPI_Win_get_attr(win, MPI_WIN_DISP_UNIT, &v, &flag);
rma/win_flavors.c:    MPI_Win_get_attr(window, MPI_WIN_CREATE_FLAVOR, &flavor, &flag);
rma/win_flavors.c:    MPI_Win_get_attr(window, MPI_WIN_MODEL, &model, &flag);
rma/win_flavors.c:    MPI_Win_get_attr(window, MPI_WIN_CREATE_FLAVOR, &flavor, &flag);
rma/win_flavors.c:    MPI_Win_get_attr(window, MPI_WIN_MODEL, &model, &flag);
rma/win_flavors.c:    MPI_Win_get_attr(window, MPI_WIN_CREATE_FLAVOR, &flavor, &flag);
rma/win_flavors.c:    MPI_Win_get_attr(window, MPI_WIN_MODEL, &model, &flag);
util/mtest.c:    merr = MPI_Win_get_attr(*win, MPI_WIN_BASE, &addr, &flag);
util/mtest.c:        merr = MPI_Win_get_attr(*win, mem_keyval, &val, &flag);

@jeffhammond
Contributor Author

1bcc6b1 is broken. Now to check the commit before it...
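
A minimal sketch of how this range could also be bisected automatically, assuming the reproducer is saved as MCVE.c next to the Open MPI checkout and that each step does a clean build into a throwaway prefix; the prefix path and the bare ./configure invocation are illustrative, not the exact commands used above:

$ git bisect start v5.0.0rc3 v5.0.0rc2      # <first bad tag> <last good tag>
$ git bisect run sh -c '
    ./autogen.pl > /dev/null &&
    ./configure --prefix=/tmp/bisect-ompi > /dev/null &&
    make -j install > /dev/null || exit 125;  # build failure: skip this commit
    /tmp/bisect-ompi/bin/mpicc -O2 MCVE.c -o /tmp/mcve &&
    /tmp/bisect-ompi/bin/mpirun -n 1 /tmp/mcve || exit 1'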

@ggouaillardet
Contributor

I was able to run the test program successfully with both the main branch ( @02d91b56b8eb98e705fbae0f75f31f2a2af55f3d ) and v5.0.x ( @ea99285f02ade6c6d0930944b5c29fc633ecdf2f ).

FWIW, I configured with --enable-debug, and valgrind did not report any memory errors.

@jeffhammond can you please copy/paste your configure command line?

@jeffhammond
Contributor Author

@jeffhammond
Contributor Author

Also, you can see the configure command line in the ompi_info output above; that's why I didn't include it separately.

@ggouaillardet
Contributor

I tried again without --enable-debug, and the test still passes.

Which compiler are you using?

BTW, the latest updates are in the main branch (it was renamed from master a few weeks ago). Can you please confirm you did not use the master branch? (I do not know whether it has been removed or is simply outdated.)

@jeffhammond
Contributor Author

I used main, but as you can see, I also hit the failure with multiple 5.0.x release candidates.

@jeffhammond
Contributor Author

jeffhammond commented Apr 29, 2022

See ompi_info above for compiler information:

C compiler absolute: /bin/gcc
C compiler family name: GNU
C compiler version: 11.1.0

@jeffhammond
Contributor Author

jeffhammond commented Apr 29, 2022

To rule out other causes, I've removed all MPI-related packages from Apt and built Open-MPI statically. The static build seems broken in that I have to manually remove the *.la files from the output of mpicc -show, but the result demonstrates that I'm not picking up the wrong shared library and that v5.0.0rc5 is still broken.

I guess I'll try to rule out GCC 11 as the cause, but I'm really skeptical that this is a C compiler bug.

jhammond@nuclear:/tmp/armci-mpi-ompi-v5.0.0rc5$ /tmp/install-ompi-v5.0.0rc5/bin/mpirun -n 1 gdb tests/mpi/test_win_model -ex 'set width 1000' -ex 'thread apply all bt' -ex run -ex bt -ex 'set confirm off' -ex quit
GNU gdb (Ubuntu 10.2-0ubuntu1~20.04~1) 10.2
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from tests/mpi/test_win_model...
Starting program: /tmp/armci-mpi-ompi-v5.0.0rc5/tests/mpi/test_win_model
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7c0e700 (LWP 3153463)]
Starting MPI window attribute test with 1 processes

Thread 1 "test_win_model" received signal SIGSEGV, Segmentation fault.
0x0000555555589614 in main (argc=1, argv=0x7fffffffdd98) at tests/mpi/test_win_model.c:36
36	  if (attr_flag && (*attr_val)!=MPI_WIN_UNIFIED && rank==0)
#0  0x0000555555589614 in main (argc=1, argv=0x7fffffffdd98) at tests/mpi/test_win_model.c:36
jhammond@nuclear:/tmp/armci-mpi-ompi-v5.0.0rc5$ ldd tests/mpi/test_win_model
	linux-vdso.so.1 (0x00007ffe3ddef000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f4893aaf000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f4893a8c000)
	libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f4893a70000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f4893921000)
	libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f489391c000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f4893914000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f4893722000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f4893fc3000)
jhammond@nuclear:/tmp/armci-mpi-ompi-v5.0.0rc5$ /tmp/install-ompi-v5.0.0rc5/bin/ompi_info
                 Package: Open MPI jhammond@nuclear Distribution
                Open MPI: 5.0.0rc5
  Open MPI repo revision: v5.0.0rc5
   Open MPI release date: Unreleased developer copy
                 MPI API: 3.1.0
            Ident string: 5.0.0rc5
                  Prefix: /tmp/install-ompi-v5.0.0rc5
 Configured architecture: x86_64-pc-linux-gnu
           Configured by: jhammond
           Configured on: Fri Apr 29 12:03:46 UTC 2022
          Configure host: nuclear
  Configure command line: 'CC=gcc' '--prefix=/tmp/install-ompi-v5.0.0rc5'
                          '--without-psm2' '--without-libfabric'
                          '--without-ofi' '--without-cuda'
                          '--enable-mpi-fortran=none' '--enable-static'
                          '--disable-shared'
                Built by: jhammond
                Built on: Fri 29 Apr 2022 12:07:13 PM UTC
              Built host: nuclear
              C bindings: yes
             Fort mpif.h: no
            Fort use mpi: no
       Fort use mpi size: deprecated-ompi-info-value
        Fort use mpi_f08: no
 Fort mpi_f08 compliance: The mpi_f08 module was not built
  Fort mpi_f08 subarrays: no
           Java bindings: no
  Wrapper compiler rpath: runpath
              C compiler: gcc
     C compiler absolute: /bin/gcc
  C compiler family name: GNU
      C compiler version: 11.1.0
            C++ compiler: g++
   C++ compiler absolute: /bin/g++
           Fort compiler: gfortran
       Fort compiler abs: /bin/gfortran
         Fort ignore TKR: no
   Fort 08 assumed shape: no
      Fort optional args: no
          Fort INTERFACE: no
    Fort ISO_FORTRAN_ENV: no
       Fort STORAGE_SIZE: no
      Fort BIND(C) (all): no
      Fort ISO_C_BINDING: no
 Fort SUBROUTINE BIND(C): no
       Fort TYPE,BIND(C): no
 Fort T,BIND(C,name="a"): no
            Fort PRIVATE: no
           Fort ABSTRACT: no
       Fort ASYNCHRONOUS: no
          Fort PROCEDURE: no
         Fort USE...ONLY: no
           Fort C_FUNLOC: no
 Fort f08 using wrappers: no
         Fort MPI_SIZEOF: no
             C profiling: yes
   Fort mpif.h profiling: no
  Fort use mpi profiling: no
   Fort use mpi_f08 prof: no
          Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes,
                          OMPI progress: no, Event lib: yes)
           Sparse Groups: no
  Internal debug support: no
  MPI interface warnings: yes
     MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
              dl support: yes
   Heterogeneous support: no
       MPI_WTIME support: native
     Symbol vis. support: yes
   Host topology support: yes
            IPv6 support: no
          MPI extensions: affinity, cuda, ftmpi
 Fault Tolerance support: yes
          FT MPI support: yes
  MPI_MAX_PROCESSOR_NAME: 256
    MPI_MAX_ERROR_STRING: 256
     MPI_MAX_OBJECT_NAME: 64
        MPI_MAX_INFO_KEY: 36
        MPI_MAX_INFO_VAL: 256
       MPI_MAX_PORT_NAME: 1024
  MPI_MAX_DATAREP_STRING: 128
           MCA allocator: basic (MCA v2.1.0, API v2.0.0, Component v5.0.0)
           MCA allocator: bucket (MCA v2.1.0, API v2.0.0, Component v5.0.0)
           MCA backtrace: execinfo (MCA v2.1.0, API v2.0.0, Component v5.0.0)
                 MCA btl: self (MCA v2.1.0, API v3.3.0, Component v5.0.0)
                 MCA btl: sm (MCA v2.1.0, API v3.3.0, Component v5.0.0)
                 MCA btl: tcp (MCA v2.1.0, API v3.3.0, Component v5.0.0)
                  MCA dl: dlopen (MCA v2.1.0, API v1.0.0, Component v5.0.0)
                  MCA if: linux_ipv6 (MCA v2.1.0, API v2.0.0, Component
                          v5.0.0)
                  MCA if: posix_ipv4 (MCA v2.1.0, API v2.0.0, Component
                          v5.0.0)
         MCA installdirs: env (MCA v2.1.0, API v2.0.0, Component v5.0.0)
         MCA installdirs: config (MCA v2.1.0, API v2.0.0, Component v5.0.0)
              MCA memory: patcher (MCA v2.1.0, API v2.0.0, Component v5.0.0)
               MCA mpool: hugepage (MCA v2.1.0, API v3.1.0, Component v5.0.0)
             MCA patcher: overwrite (MCA v2.1.0, API v1.0.0, Component
                          v5.0.0)
              MCA rcache: grdma (MCA v2.1.0, API v3.3.0, Component v5.0.0)
           MCA reachable: weighted (MCA v2.1.0, API v2.0.0, Component v5.0.0)
               MCA shmem: mmap (MCA v2.1.0, API v2.0.0, Component v5.0.0)
               MCA shmem: posix (MCA v2.1.0, API v2.0.0, Component v5.0.0)
               MCA shmem: sysv (MCA v2.1.0, API v2.0.0, Component v5.0.0)
                MCA smsc: cma (MCA v2.1.0, API v1.0.0, Component v5.0.0)
             MCA threads: pthreads (MCA v2.1.0, API v1.0.0, Component v5.0.0)
               MCA timer: linux (MCA v2.1.0, API v2.0.0, Component v5.0.0)
                 MCA bml: r2 (MCA v2.1.0, API v2.1.0, Component v5.0.0)
                MCA coll: adapt (MCA v2.1.0, API v2.4.0, Component v5.0.0)
                MCA coll: basic (MCA v2.1.0, API v2.4.0, Component v5.0.0)
                MCA coll: han (MCA v2.1.0, API v2.4.0, Component v5.0.0)
                MCA coll: inter (MCA v2.1.0, API v2.4.0, Component v5.0.0)
                MCA coll: libnbc (MCA v2.1.0, API v2.4.0, Component v5.0.0)
                MCA coll: self (MCA v2.1.0, API v2.4.0, Component v5.0.0)
                MCA coll: sm (MCA v2.1.0, API v2.4.0, Component v5.0.0)
                MCA coll: sync (MCA v2.1.0, API v2.4.0, Component v5.0.0)
                MCA coll: tuned (MCA v2.1.0, API v2.4.0, Component v5.0.0)
                MCA coll: ftagree (MCA v2.1.0, API v2.4.0, Component v5.0.0)
                MCA fbtl: posix (MCA v2.1.0, API v2.0.0, Component v5.0.0)
               MCA fcoll: dynamic (MCA v2.1.0, API v2.0.0, Component v5.0.0)
               MCA fcoll: dynamic_gen2 (MCA v2.1.0, API v2.0.0, Component
                          v5.0.0)
               MCA fcoll: individual (MCA v2.1.0, API v2.0.0, Component
                          v5.0.0)
               MCA fcoll: vulcan (MCA v2.1.0, API v2.0.0, Component v5.0.0)
                  MCA fs: ufs (MCA v2.1.0, API v2.0.0, Component v5.0.0)
                MCA hook: comm_method (MCA v2.1.0, API v1.0.0, Component
                          v5.0.0)
                  MCA io: ompio (MCA v2.1.0, API v2.0.0, Component v5.0.0)
                 MCA osc: sm (MCA v2.1.0, API v3.0.0, Component v5.0.0)
                 MCA osc: rdma (MCA v2.1.0, API v3.0.0, Component v5.0.0)
                MCA part: persist (MCA v2.1.0, API v4.0.0, Component v5.0.0)
                 MCA pml: cm (MCA v2.1.0, API v2.1.0, Component v5.0.0)
                 MCA pml: ob1 (MCA v2.1.0, API v2.1.0, Component v5.0.0)
                 MCA pml: v (MCA v2.1.0, API v2.1.0, Component v5.0.0)
            MCA sharedfp: individual (MCA v2.1.0, API v2.0.0, Component
                          v5.0.0)
            MCA sharedfp: lockedfile (MCA v2.1.0, API v2.0.0, Component
                          v5.0.0)
            MCA sharedfp: sm (MCA v2.1.0, API v2.0.0, Component v5.0.0)
                MCA topo: basic (MCA v2.1.0, API v2.2.0, Component v5.0.0)
                MCA topo: treematch (MCA v2.1.0, API v2.2.0, Component
                          v5.0.0)
           MCA vprotocol: pessimist (MCA v2.1.0, API v2.0.0, Component
                          v5.0.0)

@jeffhammond
Contributor Author

jeffhammond commented Apr 29, 2022

Okay, this is some weird ****. GCC 7 is fine.

I'll figure out which GCC versions are bad, but it's entirely possible this is a UB situation, in which case it's still an Open-MPI problem.

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/11/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 11.1.0-1ubuntu1~20.04' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-11 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --disable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-11-2V7zgg/gcc-11-11.1.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-2V7zgg/gcc-11-11.1.0/debian/tmp-gcn/usr --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.1.0 (Ubuntu 11.1.0-1ubuntu1~20.04)

But the fact remains, when GCC 11 is used (and not GCC 10 or older), the following is true, which implies that some change in Open MPI between v5.0.0rc2 and v5.0.0rc3 breaks a trivial MPI utility function.

  • 4.1.4v1 works, so the bug was introduced in the 5.0 branch.
  • v5.0.0rc1 and v5.0.0rc2 work.
  • v5.0.0rc6, v5.0.0rc5, v5.0.0rc4 and v5.0.0rc3 are broken.

@dalcinl
Contributor

dalcinl commented Apr 29, 2022

@jeffhammond I ran the mpi4py test suite on the commit from your original post under GitHub Actions: https://github.com/mpi4py/mpi4py-testing/actions/workflows/openmpi.yml

Full logs here: https://github.com/mpi4py/mpi4py-testing/runs/6227884365?check_suite_focus=true
There is something definitely fishy going on:

python: win/win.c:463: ompi_win_destruct: Assertion `OPAL_OBJ_MAGIC_ID == ((opal_object_t *) (win->error_handler))->obj_magic_id' failed.
[fv-az457-406:164337] *** Process received signal ***
[fv-az457-406:164337] Signal: Aborted (6)
[fv-az457-406:164337] Signal code:  (-6)

@ggouaillardet
Contributor

I rebuilt with the same command and GCC 11.1.0 on RHEL 7 and was unable to reproduce the issue :-(

Which Ubuntu flavor are you running on?

@dalcinl
Contributor

dalcinl commented Apr 29, 2022

Which Ubuntu flavor are you running on?

Well, it is the ubuntu-20.04 image from the GitHub Actions runners. From the logs:

@dalcinl
Contributor

dalcinl commented Apr 29, 2022

ompi_win_destruct: Assertion `OPAL_OBJ_MAGIC_ID == ((opal_object_t *) (win->error_handler))->obj_magic_id' failed.

@ggouaillardet This looks like memory corruption. Have you run Jeff's reproducer under valgrind?
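
For reference, one way to do that under mpirun (a sketch only; the extra valgrind flags are suggestions, and some PMIx/OPAL-related noise would likely still need suppressions):

$ /tmp/install-ompi-main/bin/mpicc -g MCVE.c
$ /tmp/install-ompi-main/bin/mpirun -n 1 valgrind --track-origins=yes --error-exitcode=1 ./a.out

Note that if the root cause is a strict-aliasing miscompilation rather than a heap error, valgrind may only report the final invalid read at the crash site, not the underlying bug.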

@dalcinl
Contributor

dalcinl commented Apr 29, 2022

@ggouaillardet I ran my CI over ompi/main: https://github.com/mpi4py/mpi4py-testing/runs/6229692910?check_suite_focus=true
Same failure. This is definitely a regression introduced during the past week; things had been green for about a month: https://github.com/mpi4py/mpi4py-testing/actions/workflows/openmpi.yml

PS: Perhaps I should run the mpi4py tests daily rather than weekly. But it would be even better if you copied over my CI and set up your own GitHub Actions workflow to run the mpi4py test suite, or at least joined this repo (note: it is not the main mpi4py repo, but one used exclusively for running tests manually or on a schedule) https://github.com/mpi4py/mpi4py-testing to get notifications when the scheduled tests fail.

@jeffhammond
Contributor Author

jeffhammond commented Apr 29, 2022

I see the same issue with GCC 11 - but not with GCC 10 - on AArch64.

$ /tmp/install-ompi-v5.0.0rc5/bin/mpirun -n 1 gdb /tmp/armci-mpi-ompi-v5.0.0rc5/tests/mpi/test_win_model
GNU gdb (Ubuntu 10.2-0ubuntu1~20.04~1) 10.2
Reading symbols from /tmp/armci-mpi-ompi-v5.0.0rc5/tests/mpi/test_win_model...
run
(gdb) Starting program: /tmp/armci-mpi-ompi-v5.0.0rc5/tests/mpi/test_win_model
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
[New Thread 0xfffff76191b0 (LWP 2398059)]
Starting MPI window attribute test with 1 processes

Thread 1 "test_win_model" received signal SIGSEGV, Segmentation fault.
0x0000aaaaaaaaad74 in main (argc=1, argv=0xffffffffe068) at tests/mpi/test_win_model.c:36
36	  if (attr_flag && (*attr_val)!=MPI_WIN_UNIFIED && rank==0)

@jeffhammond
Contributor Author

Ubuntu 20.04 on all my machines...

@jeffhammond
Contributor Author

jeffhammond commented Apr 29, 2022

I reproduced on the Ubuntu 18.04 AArch64 machine I have, with GCC 11:

jhammond@xavier-agx:/tmp/armci-mpi-ompi-v5.0.0rc5$     /tmp/install-ompi-$VER/bin/mpirun -n 1 ./tests/mpi/test_win_model
Starting MPI window attribute test with 1 processes
[xavier-agx:25155] *** Process received signal ***
[xavier-agx:25155] Signal: Segmentation fault (11)
[xavier-agx:25155] Signal code: Address not mapped (1)
[xavier-agx:25155] Failing at address: (nil)
[xavier-agx:25155] [ 0] linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0x7fae9ac6c0]
[xavier-agx:25155] [ 1] ./tests/mpi/test_win_model[0x400d50]
[xavier-agx:25155] [ 2] /lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0xe0)[0x7fae52b7a0]
[xavier-agx:25155] [ 3] ./tests/mpi/test_win_model[0x400b7c]
[xavier-agx:25155] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node xavier-agx exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
jhammond@xavier-agx:/tmp/armci-mpi-ompi-v5.0.0rc5$ /tmp/install-ompi-$VER/bin/mpirun -n 1 gdb ./tests/mpi/test_win_model
GNU gdb (Ubuntu 8.1.1-0ubuntu1) 8.1.1
...
(gdb) Starting program: /tmp/armci-mpi-ompi-v5.0.0rc5/tests/mpi/test_win_model
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fb76371b0 (LWP 25171)]
Starting MPI window attribute test with 1 processes

Thread 1 "test_win_model" received signal SIGSEGV, Segmentation fault.
0x0000000000400d50 in main (argc=1, argv=0x7fffffe248) at tests/mpi/test_win_model.c:36
36	  if (attr_flag && (*attr_val)!=MPI_WIN_UNIFIED && rank==0)
bt
(gdb) #0  0x0000000000400d50 in main (argc=1, argv=0x7fffffe248) at tests/mpi/test_win_model.c:36

@ggouaillardet what version of glibc do you have on RHEL 7? Maybe binutils and ld too, just to be thorough.

@devreal
Contributor

devreal commented Apr 29, 2022

I can reproduce it on Linux Mint with GCC 11.2.0 and Open MPI debugging disabled. Will take a closer look.

@awlauria
Contributor

awlauria commented Apr 29, 2022

I can't reproduce on my 18.04 laptop with gcc 7.5 (the latest/default).

RHEL 8.4 with GCC 8.4.1: also no dice.

@jeffhammond
Contributor Author

Please use GCC 11. I've tried 7-11 and only 11 triggers this.

@devreal
Contributor

devreal commented Apr 29, 2022

This does not affect only RMA window attributes; all attributes appear to be broken with GCC 11. I put up #10343, but it's more a band-aid than a fix for code that looks fishy to me. I hope someone who remembers the rationale behind it can comment.
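
To make the kind of cleanup being discussed concrete, here is a minimal, self-contained sketch of an alias-safe way to hand an int-valued attribute back through a void** out-parameter. This is not Open MPI's actual attribute code, only an illustration of the two safe options: return the address of a real int object, or copy bytes with memcpy.

#include <stdio.h>
#include <string.h>

static int model_value = 1;              /* stand-in for MPI_WIN_UNIFIED */

/* Alias-safe: hand back the address of a real int object, so the caller's
 * *(int *)attr_ptr dereference reads an object whose effective type is int. */
static void get_model_attr(void **attr_ptr, int *flag)
{
    *attr_ptr = &model_value;
    *flag = 1;
}

int main(void)
{
    void *p;
    int flag, v;
    get_model_attr(&p, &flag);
    v = *(int *)p;               /* fine: p really does point at an int      */
    memcpy(&v, p, sizeof v);     /* equally fine: byte copy, no type punning */
    printf("attribute value = %d (flag = %d)\n", v, flag);
    return 0;
}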

@jeffhammond changed the title from "MPI_Win_get_attr(MPI_WIN_MODEL) leads to application segfault" to "attribute functions lead to application segfault" on Apr 30, 2022
@ggouaillardet
Contributor

FWIW, I reported this behavior to the GCC folks at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105449

The issue occurs starting at -O2. I ran a few tests among the GCC versions I have:

  • GCC 9.2.0 works as expected
  • GCC 10.1.0 exhibits the suspicious behavior we see with GCC 11.1.0
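
One purely diagnostic check that follows from the -O2 sensitivity (a suggestion only; it was not actually run in this thread, and the prefix and flags are illustrative): if type-based alias analysis is what triggers the bad code, rebuilding Open MPI with strict aliasing disabled should make the symptom disappear without any source change.

$ ./configure CFLAGS='-O2 -g -fno-strict-aliasing' --prefix=/tmp/install-ompi-nsa
$ make -j install
$ /tmp/install-ompi-nsa/bin/mpicc -g MCVE.c && /tmp/install-ompi-nsa/bin/mpirun -n 1 ./a.out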

@apinski-cavium

And the code is just violating C aliasing rules. Pointer types cannot alias int, so when accessing via an int you cannot access something that was stored as a void*. (I made a mistake in the GCC bug report because I missed that the array was int* and not int**, but that was a minor mistake that does not change the aliasing issue.)
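
To illustrate the rule being cited with a standalone example (again a sketch, not the actual Open MPI code): under C's effective-type rules, a store through a pointer-typed lvalue and a load through an int lvalue are assumed not to touch the same object, so at -O2 GCC may reorder or cache across them.

#include <stdio.h>

/* The optimizer may assume the store through *pp cannot change anything
 * read through *ip, because pointer types and int do not alias.  If both
 * actually name the same storage, the load below may legally be hoisted
 * above the store and the caller sees a stale value. */
static int read_after_pointer_store(int *ip, void **pp, void *val)
{
    *pp = val;      /* store through a pointer-typed lvalue   */
    return *ip;     /* load through int*: assumed independent */
}

int main(void)
{
    void *storage = NULL;   /* one object, declared as void* */
    int r = read_after_pointer_store((int *)&storage, &storage, &storage);
    /* Undefined behaviour: at -O2 the result need not reflect the store
     * just made; -fno-strict-aliasing tells GCC to honour it anyway. */
    printf("%d\n", r);
    return 0;
}

If that is indeed the mechanism here, the durable fix on the Open MPI side is presumably to route such conversions through memcpy or a properly typed object (as in the sketch further up), rather than to rely on compiler flags.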
