Skip to content

gh-109793: Allow Switching Interpreters During Finalization #109794

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

ericsnowcurrently
Copy link
Member

@ericsnowcurrently ericsnowcurrently commented Sep 23, 2023

Essentially, we should check the thread ID rather than the thread state pointer.

@ericsnowcurrently ericsnowcurrently marked this pull request as ready for review September 25, 2023 17:19
@@ -46,6 +46,10 @@ typedef struct _Py_atomic_address {
atomic_uintptr_t _value;
} _Py_atomic_address;

typedef struct _Py_atomic_ulong {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use either _Py_atomic_load_uint32 or _Py_atomic_load_uint64 instead of defining your own.
Since thread id is an unsigned long, you'll need to use a macro, but it seems better to use the pre-existing functions.

#if SIZEOF_LONG == 8
#define _Py_atomic_load_ulong _Py_atomic_load_uint64
#elif SIZEOF_LONG == 4
#define _Py_atomic_load_ulong _Py_atomic_load_uint32
#else
#error "long must be 4 or 8 bytes in size"
#endif

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@ericsnowcurrently ericsnowcurrently merged commit 32466c9 into python:main Sep 27, 2023
@miss-islington
Copy link
Contributor

Thanks @ericsnowcurrently for the PR 🌮🎉.. I'm working now to backport this PR to: 3.12.
🐍🍒⛏🤖

@miss-islington
Copy link
Contributor

Sorry, @ericsnowcurrently, I could not cleanly backport this to 3.12 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker 32466c97c06ee5812923d695195394c736eeb707 3.12

@ericsnowcurrently ericsnowcurrently deleted the fix-finalization-other-tstate branch September 27, 2023 19:49
csm10495 pushed a commit to csm10495/cpython that referenced this pull request Sep 28, 2023
…thongh-109794)

Essentially, we should check the thread ID rather than the thread state pointer.
@vstinner
Copy link
Member

Sorry, @ericsnowcurrently, I could not cleanly backport this to 3.12 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker 32466c9 3.12

Is it really reasonable to backport this major change in a stable branch? IMO it's too late to change that, no?


This change introduced a regression: Python started to crash again randomly at exit when they are daemon threads: see issue gh-110052. I confirmed (by manual testing) that the commit 32466c9 introduced the regression:

  • Before this commit, I cannot reproduce the issue in 5 minutes.
  • At this commit, I easily reproduce the issue in less than a minute.

@ericsnowcurrently: Please have a look. If you don't have the bandwidth to repair the regression, I will revert soon the change.

This code is very fragile :-( I fixed multiple bugs to try to handle all cases:

Note: By the way, recently, I fixed a race condition in _thread.start_new_thread(), but it doesn't seem to be related: commit 517cd82.


Before the commit, the test was fine:

vstinner@mona$ ./python -m test test_threading -m test_4_daemon_threads -j50 -F --fail-env-changed  
(...)
0:07:34 load avg: 91.03 [1640] test_threading passed
0:07:34 load avg: 91.03 [1641] test_threading passed
^C

A system load of 91.03 is high knowning that my laptop CPU has 6 cores / 12 threads.


I'm not comfortable with this change which has comments like:

        // XXX This isn't completely safe from daemon thraeds,
        // since tstate might be a dangling pointer.

Well, if it's not safe: don't do it.

@vstinner
Copy link
Member

Note: _PyRuntimeState_GetFinalizing() uses the "old" pycore_atomic.h for API, whereas _PyRuntimeState_GetFinalizingID() uses the "new" pyatomic.h API.

static inline PyThreadState*
_PyRuntimeState_GetFinalizing(_PyRuntimeState *runtime) {
    // pycore_atomic.h                                                    
    return (PyThreadState*)_Py_atomic_load_relaxed(&runtime->_finalizing);     
}

static inline unsigned long
_PyRuntimeState_GetFinalizingID(_PyRuntimeState *runtime) {        
    // pyatomic.h
    return _Py_atomic_load_ulong_relaxed(&runtime->_finalizing_id);         
}                  

@ericsnowcurrently
Copy link
Member Author

I'm looking into the crashes right now.

@ericsnowcurrently
Copy link
Member Author

At this commit, I easily reproduce the issue in less than a minute.

I'm going to follow up on gh-110052.

I'm not comfortable with this change which has comments like:

   // XXX This isn't completely safe from daemon thraeds,
  // since tstate might be a dangling pointer.

Well, if it's not safe: don't do it.

That's a comment I added about existing code. 😄

@vstinner
Copy link
Member

That's a comment I added about existing code. 😄

In Python 3.11, _PyThreadState_MustExit() does not deference tstate because it can be a dangling pointer: it's explained in the comment.

// Check if a Python thread must exit immediately, rather than taking the GIL
// if Py_Finalize() has been called.
//              
// When this function is called by a daemon thread after Py_Finalize() has been
// called, the GIL does no longer exist.
//              
// tstate can be a dangling pointer (point to freed memory): only tstate value
// is used, the pointer is not deferenced.
//              
// tstate must be non-NULL.
int         
_PyThreadState_MustExit(PyThreadState *tstate)
{           
    /* bpo-39877: Access _PyRuntime directly rather than using
       tstate->interp->runtime to support calls from Python daemon threads.
       After Py_Finalize() has been called, tstate can be a dangling pointer:
       point to PyThreadState freed memory. */
    PyThreadState *finalizing = _PyRuntimeState_GetFinalizing(&_PyRuntime);
    return (finalizing != NULL && finalizing != tstate);
}

By the way, in the main branch, the function still has this comment:

// tstate can be a dangling pointer (point to freed memory): only tstate value
// is used, the pointer is not deferenced.

@ericsnowcurrently
Copy link
Member Author

Is it really reasonable to backport this major change in a stable branch? IMO it's too late to change that, no?

The main motivation is to address the situation where a subinterpreter gets cleaned up while the runtime is finalizing. Without this fix, the process immediately exits with an exitcode of 0, even if it would have exited with 1 (and the rest of runtime finalization is skipped). That seems like it makes a backport worth it.

@ericsnowcurrently
Copy link
Member Author

In Python 3.11, _PyThreadState_MustExit() does not deference tstate because it can be a dangling pointer

I added the _Py_IsInterpreterFinalizing(tstate->interp) check in 26baa74 (May 2023). A dangling pointer with that check isn't an issue during runtime finalization since we just verified that the runtime isn't finalizing yet.

The cases where it could be a problem are strictly where a subinterpreter has daemon threads. That is unlikely since subinterpreters have daemon threads disabled by default.

ericsnowcurrently added a commit that referenced this pull request Sep 29, 2023
@ericsnowcurrently
Copy link
Member Author

FYI, any backport will need to include the fix in gh-110053 (6364873). Likewise, any revert will need to revert that change.

ericsnowcurrently added a commit to ericsnowcurrently/cpython that referenced this pull request Oct 11, 2023
…thongh-109794)

Essentially, we should check the thread ID rather than the thread state pointer.
@bedevere-app
Copy link

bedevere-app bot commented Oct 11, 2023

GH-110705 is a backport of this pull request to the 3.12 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.12 only security fixes label Oct 11, 2023
ericsnowcurrently added a commit that referenced this pull request Nov 28, 2023
…h-109794) (gh-110705)

Essentially, we should check the thread ID rather than the thread state pointer.
Glyphack pushed a commit to Glyphack/cpython that referenced this pull request Sep 2, 2024
…thongh-109794)

Essentially, we should check the thread ID rather than the thread state pointer.
Glyphack pushed a commit to Glyphack/cpython that referenced this pull request Sep 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants