-
Notifications
You must be signed in to change notification settings - Fork 899
Attributes: replace custom void* union with C union #10344
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you tested this on a big endian cpu with 8 bytes Fortran INTEGER
?
I suspect that would not work when "converting" between (4 bytes) int
and (8 bytes) INTEGER
.
This is the reason I suggested we keep the *_pos
logic but initialize at compile time based on pointer sizes and endianness.
FWIW, an quick (and likely thread safe) workaround is to declare
(and keep |
I don't see where there would be an issue in this implementation. We would read the 4B int (
Sure, that's a bandaid that will work until it doesn't because some compiler will figure out that aliasing the storage of a
Never mind that last comment, got the cast precedence wrong... |
I agree with @devreal . The existing code is not valid C and the compiler is now showing that. The correct fix is to use a union. Putting a bandaid on bad code is just going to lead to issues again in the future. |
ompi/attribute/attribute.c
Outdated
@@ -393,25 +391,30 @@ do { \ | |||
* Cases for attribute values | |||
*/ | |||
typedef enum ompi_attribute_translate_t { | |||
OMPI_ATTRIBUTE_INVALID, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this break internal ABI? Maybe do OMPI_ATTRIBUTE_INVALID = -1
to avoid shifting the existing values by one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or move it at the end as OMPI_ATTRIBUTE_MAX_VALID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no internal ABI, it's all in the same file. Changed it to -1 nevertheless.
ompi/attribute/attribute.c
Outdated
@@ -393,25 +391,30 @@ do { \ | |||
* Cases for attribute values | |||
*/ | |||
typedef enum ompi_attribute_translate_t { | |||
OMPI_ATTRIBUTE_INVALID, | |||
OMPI_ATTRIBUTE_C, | |||
OMPI_ATTRIBUTE_INT, | |||
OMPI_ATTRIBUTE_FINT, | |||
OMPI_ATTRIBUTE_AINT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit. While you are editing this file can you add a trailing comma?
@devreal I reran mpi4py's testsuite against your clone and branch, but it is still failing. I though that the issue was related to attributes, as the original report came from using Win attributes, but looks like something else broke during the last week or so. full log: https://github.com/mpi4py/mpi4py-testing/runs/6256108177?check_suite_focus=true
|
@dalcinl Thanks, I will look into it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a union is a much better approach.
@devreal and I chatted in Slack about this. I pushed a 2nd commit to this PR with a few minor comment updates to ompi/attribute/attribute.c. |
FWIW, the "test all 9 cases" attribute mini-test suite that we have (currently in the private test suite, but since I wrote 100% of the code in it, it would be straightforward to move this test suite to ompi-tests-public) would probably be useful here. That being said:
I had hoped that our "simple" attribute test suite would show the same error as was reported in #10339, but it's not. |
I made a change to the I will squash the changes I pushed once approved, just wanted an easy path to restore these functions if need be. |
@dalcinl I was unable to reproduce the crash with mpi4py. I tried running with valgrind and found some spurious invalid reads like the one below but nothing that could corrupt the magic ID and no double free.
|
@devreal I found a trivial reproducer. I'm using the branch from this PR. However, at this point I'm not sure the issue is related to attributes. from mpi4py import MPI
for i in range(14):
MPI.INFO_ENV.Dup().Free() Iterating 14 times (or more) as in the snippet above triggers the assertion, but using 13 or less, all is good and valgrind is clean.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Let's squash.
The current implementation uses a void* to store different types of attribute value integers and attempts to figure out proper offsets for storing smaller integers in that pointer. The required pointer aliasing is UB and causes issues with GCC 11. The new implementation replaces the self-built pointer-based union with a C union and selects the (pointer to the) right field based on the av_set_from value. This patch also fixes a bug where copied attributes always had the set_from field set to C pointer, which worked but is technically not correct. Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Correct a few minor mistakes in the large comment at the top of ompi/attribute/attribute.c from when 72cfbb6 added several new cases to attribute handling. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Squashed, will merge once tests have passed. |
IIRC things became tricky when Fortran sets a 8 bytes INTEGER and C reads
it, expecting an int*
On a big ending cpu, you'd have to return a pointer to the middle of the
Fortran INTEGER.
But this might be some old MPI1 only bozo cases that have been removed from
the standard.
…On Mon, May 2, 2022, 09:41 Joseph Schuchart ***@***.***> wrote:
I suspect that would not work when "converting" between (4 bytes) int and
(8 bytes) INTEGER.
I don't see where there would be an issue in this implementation. We would
read the 4B int (av_int) and cast it to MPI_Fint (translate_to_fint)
before passing it to the duplicate/delete callback (COPY_ATTR_CALLBACKS/
DELETE_ATTR_CALLBACKS) or return it to the caller (ompi_attr_get_fint).
Where would that fail? AFAICS, that is what the current implementation
does, except that it crams everything into a void* and tries to be smart
about the storage space. Obviously, GCC has become smarter, which is why
we're spending hours on this. Really, that's what a C union is for.
void * volatile bogus = (void*) 1;
Sure, that's a bandaid that will work until it doesn't because some
compiler will figure out that aliasing the storage of a void* as two int
is still UB even though you slap volatile on it. Plus, I don't see the
benefit of the current implementation over my proposal.
Looking at the original code, I think the pointer arithmetic was broken
too:
item->av_int_pointer = (int *)&item->av_value + int_pos;
Taking the address of item->av_value (void**) and adding int_pos would
point av_int_pointer past av_value in the structure in any case where int_pos
!= 0. Incidentally, with int_pos == 1 it would point av_int_pointer to
itself. I have no idea how that could have ever worked 🤷♂️
—
Reply to this email directly, view it on GitHub
<#10344 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABXF526NESRSMJTVOTYQZITVH4QERANCNFSM5U2ENWGA>
.
You are receiving this because your review was requested.Message ID:
***@***.***>
|
@ggouaillardet A large number of things - including all parameters of root and count - in MPI implementations break when users force Fortran Nothing in the Fortran or MPI standard says this should work, and I don't think it's good for anyone for implementers to try. |
The current implementation uses a void* to store different types of attribute value integers and attempts to figure out proper offsets for storing smaller integers in that pointer. The required pointer aliasing is UB and causes issues with GCC 11.
The new implementation replaces the self-built pointer-based union with a C union and selects the (pointer to the) right field based on the av_set_from value.
This patch also fixes a bug where copied attributes always had the set_from field set to C pointer, which worked but is technically not correct.
Supersedes #10343
Fixes #10339
Signed-off-by: Joseph Schuchart schuchart@icl.utk.edu