(S) RFR: 8159461: bigapps/Kitchensink/stressExitCode hits assert: Must be VMThread or JavaThread

Volker Simonis volker.simonis at gmail.com
Mon Aug 8 13:35:41 UTC 2016


Hi David,

looks good now.

Thanks,
Volker


On Fri, Aug 5, 2016 at 4:28 AM, David Holmes <david.holmes at oracle.com> wrote:
> Hi Volker,
>
> Thanks for looking at this.
>
> On 5/08/2016 1:48 AM, Volker Simonis wrote:
>>
>> Hi David,
>>
>> thanks for doing this change on all platforms.
>> The fix looks good. Maybe you can just extend the following comment with
>> something like:
>>
>>  //  Note that the SR_lock plays no role in this suspend/resume protocol.
>>  //  It is only used in SR_handler as a thread termination indicator if
>> NULL.
>
>
> Darn this code is confusing - too many "SR"'s :( I have added
>
> //  Note that the SR_lock plays no role in this suspend/resume protocol,
> //  but is checked for NULL in SR_handler as a thread termination indicator.
>
> Updated webrev:
>
> http://cr.openjdk.java.net/~dholmes/8159461/webrev.v2/
>
> This also reminded me to follow up on why the Solaris SR_handler is
> different and I found it is not actually installed as a direct signal
> handler, but is called from the real signal handler if dealing with a
> JavaThread or the VMThread. Consequently the Solaris version of the
> SR_handler can not encounter this specific bug and so I have reverted the
> changes to os_solaris.cpp
>
> Thanks,
> David
>
>
>> Regards,
>> Volker
>>
>> On Wed, Aug 3, 2016 at 3:13 AM, David Holmes <david.holmes at oracle.com
>> <mailto:david.holmes at oracle.com>> wrote:
>>
>>     webrev: http://cr.openjdk.java.net/~dholmes/8159461/webrev/
>>     <http://cr.openjdk.java.net/~dholmes/8159461/webrev/>
>>
>>     bug: https://bugs.openjdk.java.net/browse/JDK-8159461
>>     <https://bugs.openjdk.java.net/browse/JDK-8159461>
>>
>>     The suspend/resume signal (SR_signum) is never sent to a thread once
>>     it has started to terminate. On one platform (SuSE 12) we have seen
>>     what appears to be a "stuck" signal, which is only delivered when
>>     the terminating thread restores its original signal mask (as if
>>     pthread_sigmask makes the system realize there is a pending signal -
>>     we already check the signal was not blocked). At this point in the
>>     thread termination we have freed the osthread, so the the SR_handler
>>     would access deallocated memory. In debug builds we first hit an
>>     assertion that the current thread is a JavaThread or the VMThread -
>>     that assertion fails, even though it is a JavaThread, because we
>>     have already executed the ~JavaThread destructor and inside the
>>     ~Thread destructor we are a plain Thread not a JavaThread.
>>
>>     The fix was to make a small adjustment to the thread termination
>>     process so that we delete the SR_lock before calling
>>     os::free_thread(). In the SR_handler() we can then use a NULL check
>>     of SR_lock() to indicate the thread has terminated and we return.
>>
>>     While only seen on Linux I took the opportunity to apply the fix on
>>     all platforms and also cleaned up the code where we were using
>>     Thread::current() unsafely in a signal-handling context.
>>
>>     Testing: regular tier 1 (JPRT)
>>              Kitchensink (in progress)
>>
>>     As we can't readily reproduce the problem I tested this by having a
>>     terminating thread raise SR_signum directly from within the ~Thread
>>     destructor.
>>
>>     Thanks,
>>     David
>>
>>
>


More information about the hotspot-runtime-dev mailing list