RFR 8069048: (process) Suspend finishing threads when process exits [win]

Daniel D. Daugherty daniel.daugherty at oracle.com
Fri Jan 16 00:01:08 UTC 2015


On 1/15/15 5:09 AM, Ivan Gerasimov wrote:
> Hello everyone!
>
> This is yet another iteration in the attempt to solve the 'wrong exit 
> code' bug on Windows [1].
> The wrong exit code has been observed once again with one of the 
> regression tests.
>
> The suspicion is that this might be due to the critical section having
> been destroyed before all the child threads were terminated.
> In that case, one of the threads had been unblocked and called 
> _endthreadex(), which resulted in a race.
>
> To address this scenario, it is proposed to make the thread that is 
> about to exit the process raise a flag.
> If the flag is raised, any threads wishing to exit should instead 
> suspend themselves.
>
> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8069048
> WEBREV: http://cr.openjdk.java.net/~igerasim/8069048/0/webrev/

src/os/windows/vm/os_windows.cpp
     line 3895: // don't let the current thread to proceed to _endthreadex()
         Typo: 'let the current thread to proceed to'
            -> 'let the current thread proceed to'

     Just making sure that I understand the revised algorithm:

     - before the EPT_PROCESS thread gets here, EPT_THREAD threads
       will work as before and call line 3909 _endthreadex()

     - after the EPT_PROCESS thread gets here and sets the flag
       on line 3886: OrderAccess::release_store(&process_exiting, 1);

       - an EPT_THREAD thread may have made it past the flag check on line
         3802: } else if (OrderAccess::load_acquire(&process_exiting) == 0) {
         but it will be blocked on line 3803: EnterCriticalSection(&crit_sect);

       - an EPT_THREAD thread that sees the flag set on line 3802 will
         drop into the self-suspend block on lines 3892-3900

     - after the EPT_PROCESS thread exits the critical section, then
       any EPT_THREAD threads that were blocked trying to acquire
       the critical section will now see the flag set on line 3805:
       if (what == EPT_THREAD && OrderAccess::load_acquire(&process_exiting) == 0) {
       and drop into the self-suspend block on lines 3892-3900

     Short version: any EPT_THREAD threads that arrive after the
     EPT_PROCESS thread owns the critical section will never call
     line 3909 _endthreadex() because they self-suspend.
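
The flow described above can be sketched in portable C++. This is only a
rough illustration under stated assumptions, not the webrev code: std::mutex
stands in for the Windows critical section, std::atomic stands in for
OrderAccess, and the names Ept, Action and exit_decision() are hypothetical.
Instead of really calling _endthreadex(), exit() or suspending itself, the
function just reports which of the three paths a caller would take.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical stand-ins for EPT_THREAD / EPT_PROCESS.
enum class Ept { Thread, Process };

// What the caller would do next in the real code.
enum class Action { EndThread, SelfSuspend, ExitProcess };

static std::atomic<int> process_exiting{0};  // the flag raised by the exiting thread
static std::mutex crit_sect;                 // stands in for the critical section

Action exit_decision(Ept what) {
    // First flag check, made before trying to enter the critical section.
    if (what == Ept::Thread &&
        process_exiting.load(std::memory_order_acquire) != 0) {
        return Action::SelfSuspend;
    }

    // A thread that passed the first check may block here while the
    // process-exiting thread owns the lock.
    std::lock_guard<std::mutex> guard(crit_sect);

    if (what == Ept::Thread) {
        // Second flag check, made after acquiring the lock: a thread that
        // was blocked behind the process-exiting thread sees the flag set.
        if (process_exiting.load(std::memory_order_acquire) != 0) {
            return Action::SelfSuspend;
        }
        return Action::EndThread;  // the real code would go on to _endthreadex()
    }

    // Process-exiting thread: raise the flag while holding the lock.
    process_exiting.store(1, std::memory_order_release);
    return Action::ExitProcess;
}
```

Called from a single thread, exit_decision(Ept::Thread) reports EndThread
until exit_decision(Ept::Process) has run, and SelfSuspend afterwards. The
sketch does not model a thread that is already stalled inside
_endthreadex() when the flag is raised.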

     OK, I concur that this new algorithm looks correct and will reduce
     the number of threads racing through line 3909 _endthreadex() while
     the EPT_PROCESS thread is trying to call exit().

     One possible hole remains that we've discussed before. If an
     EPT_THREAD thread calls _endthreadex() before the EPT_PROCESS
     thread gets here, and if the EPT_THREAD thread stalls in
     _endthreadex(), then it's still possible for that EPT_THREAD
     thread to mess up the exit code if it unblocks after the
     EPT_PROCESS thread has set the exit code. We've discussed this
     before and there's nothing we can do about it other than try to
     reduce the probability by reducing the number of EPT_THREAD
     threads that are calling _endthreadex().

     Thumbs up!


Side note: A new failure of this mechanism was filed recently:

JDK-8069068 VM warning: WaitForMultipleObjects timed out (0) ...
https://bugs.openjdk.java.net/browse/JDK-8069068

     The bug was filed against JDK9-B45 so it has the most recent
     fix (https://bugs.openjdk.java.net/browse/JDK-8066863). The
     weird part is that WaitForMultipleObjects() timed out without
     an error code being set. Don't know if that means anything in
     particular in the Win* APIs...

     This fix (8069048) may also reduce the probability of this
     failure mode because we'll be queueing fewer threads on the
     handle list...

Dan


>
> [1] https://bugs.openjdk.java.net/browse/JDK-6573254
>
> Sincerely yours,
> Ivan


