(S) RFR: 8159461: bigapps/Kitchensink/stressExitCode hits assert: Must be VMThread or JavaThread

Volker Simonis volker.simonis at gmail.com
Thu Aug 4 15:48:13 UTC 2016


Hi David,

thanks for doing this change on all platforms.
The fix looks good. Maybe you can just extend the following comment with
something like:

 //  Note that the SR_lock plays no role in this suspend/resume protocol.
 //  It is only used in SR_handler as a thread termination indicator if
NULL.

Regards,
Volker

On Wed, Aug 3, 2016 at 3:13 AM, David Holmes <david.holmes at oracle.com>
wrote:

> webrev: http://cr.openjdk.java.net/~dholmes/8159461/webrev/
>
> bug: https://bugs.openjdk.java.net/browse/JDK-8159461
>
> The suspend/resume signal (SR_signum) is never sent to a thread once it
> has started to terminate. On one platform (SuSE 12) we have seen what
> appears to be a "stuck" signal, which is only delivered when the
> terminating thread restores its original signal mask (as if pthread_sigmask
> makes the system realize there is a pending signal - we already check the
> signal was not blocked). At this point in the thread termination we have
> freed the osthread, so the the SR_handler would access deallocated memory.
> In debug builds we first hit an assertion that the current thread is a
> JavaThread or the VMThread - that assertion fails, even though it is a
> JavaThread, because we have already executed the ~JavaThread destructor and
> inside the ~Thread destructor we are a plain Thread not a JavaThread.
>
> The fix was to make a small adjustment to the thread termination process
> so that we delete the SR_lock before calling os::free_thread(). In the
> SR_handler() we can then use a NULL check of SR_lock() to indicate the
> thread has terminated and we return.
>
> While only seen on Linux I took the opportunity to apply the fix on all
> platforms and also cleaned up the code where we were using
> Thread::current() unsafely in a signal-handling context.
>
> Testing: regular tier 1 (JPRT)
>          Kitchensink (in progress)
>
> As we can't readily reproduce the problem I tested this by having a
> terminating thread raise SR_signum directly from within the ~Thread
> destructor.
>
> Thanks,
> David
>


More information about the hotspot-runtime-dev mailing list