Skip to content

Switch from extern "C" to extern "C-unwind" #718

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Feb 25, 2025
Merged

Conversation

DavisVaughan
Copy link
Contributor

@DavisVaughan DavisVaughan commented Feb 21, 2025

Closes #678
Reverts #683

Joint work with @lionel-


Consider the following

top_level_exec(|| {
     let msg = CString::new("ouch").unwrap();
     unsafe { Rf_error(msg.as_ptr()) };
})?;

This effectively gets called as something similar to this (but I'm oversimplifying a bit):

extern "C" fn callback(_args: *mut c_void)
{
     let msg = CString::new("ouch").unwrap();
     unsafe { Rf_error(msg.as_ptr()) };
}

unsafe { R_ToplevelExec(Some(callback), std::ptr::null_mut()) };

We can also write this in terms of frames

[ C: longjmp inside Rf_error ] --------------->
   |                                          |
[ Rust: Rf_error inside callback ]            | Jump over this middle `callback` frame
   |                                          |
[ C: callback inside R_ToplevelExec ]  <------|
   |
[ Rust: R_ToplevelExec inside top_level_exec ]

In other words, when a Rf_error() causes a longjmp, we are forced to jump over the rest of the Rust callback() frame. This is undefined behavior, even though we immediately catch that longjmp in R_ToplevelExec() in the next C frame. The fact that we have jumped over the rest of the Rust callback() is simplify undefined behavior, full stop.

The exact result of this undefined behavior seems to be platform dependent. It seems to mostly still work on Unix, as seen by our Mac and Linux builds still working. But on Windows it crashes ark hard with Rust 1.84, and we think it is due to this new extern "C" feature where Drop methods try to run now rust-lang/rust#129582. In the above example, we think the Drop method for msg is expected to run, but we've longjmped past callback() so it can't properly run, and this blows something up.


The best description of this is this table:
https://rust-lang.github.io/rfcs/2945-c-unwind-abi.html#abi-boundaries-and-unforced-unwinding.

Screenshot 2025-02-24 at 10 59 26 AM

We are currently in the combination of panic=unwind, "C"-like, causing Unforced foreign unwind to result in UB.

The goal of this PR is to move us to "C-unwind", causing Unforced foreign unwind to result in unwind. This was introduced in Rust 1.71 https://blog.rust-lang.org/2023/07/13/Rust-1.71.0.html#c-unwind-abi


With "C-unwind, we still end up jumping over the rest of callback(), but Rust now expects that this is possible, so I'm guessing it allows for this internally. So now msg is simply "leaked" because its Drop method doesn't run (This Drop behavior is also platform dependent, notably the Drop does seem to still run on Windows, but not on Unix). Before this PR we knew it would be leaked, but now we think we have made the leak "defined behavior" to Rust.

So the moral of the story here is to still be extremely careful with the closure supplied to try_catch() and top_level_exec(). If the closure longjmps, then you cannot expect any destructors to run, so the closures should only create PODs if possible. But we now believe that jumping over the small callback() frame is now well defined.

@lionel- lionel- mentioned this pull request Feb 24, 2025
Copy link
Contributor

@lionel- lionel- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These C-unwind tags make it official that we expect longjumps to traverse our Rust frames (cf https://blog.rust-lang.org/2023/07/13/Rust-1.71.0.html#c-unwind-abi).

I don't feel like this solution is necessarily wrong? The situation hasn't changed since last time we discussed this and determined it would be prohibitive to have a 100% safe cross language boundary? We're generally careful in catching longjumps when they are expected (though I'm sure it happens).

@DavisVaughan DavisVaughan changed the title Minimal but almost certainly wrong Windows unwind solution Switch from extern "C" to extern "C-unwind" Feb 24, 2025
@DavisVaughan DavisVaughan requested a review from lionel- February 24, 2025 16:09
@lionel- lionel- merged commit 839eb0b into main Feb 25, 2025
6 checks passed
@lionel- lionel- deleted the test/windows-c-unwind branch February 25, 2025 10:44
@lionel-
Copy link
Contributor

lionel- commented Feb 25, 2025

Merging to get these sweet green check marks again!
Thanks for getting at the bottom of this!

@github-actions github-actions bot locked and limited conversation to collaborators Feb 25, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Test failures on Windows
2 participants