RFR 8069048: (process) Suspend finishing threads when process exits [win]
Daniel D. Daugherty
daniel.daugherty at oracle.com
Fri Jan 16 00:01:08 UTC 2015
On 1/15/15 5:09 AM, Ivan Gerasimov wrote:
> Hello everyone!
>
> This is yet another iteration in the attempt to solve the 'wrong exit
> code' bug on Windows [1].
> The wrong exit code has been observed once again with one of the
> regression tests.
>
> The suspicion is that this might happen because the critical section
> had been destroyed before all the child threads were terminated.
> In that case, one of the threads had been unblocked and called
> _endthreadex(), which resulted in a race.
>
> To address this scenario, it is proposed to make the thread that is
> about to exit the process raise a flag.
> If the flag is raised, any threads wishing to exit should instead
> suspend themselves.
>
> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8069048
> WEBREV: http://cr.openjdk.java.net/~igerasim/8069048/0/webrev/
src/os/windows/vm/os_windows.cpp
line 3895: // don't let the current thread to proceed to _endthreadex()
Typo: 'let the current thread to proceed to'
-> 'let the current thread proceed to'
Just making sure that I understand the revised algorithm:
- before the EPT_PROCESS thread gets here, EPT_THREAD threads
will work as before and call line 3909 _endthreadex()
- after the EPT_PROCESS thread gets here and sets the flag
on line 3886: OrderAccess::release_store(&process_exiting, 1);
- an EPT_THREAD thread may have made it past flag check on line
3802: } else if (OrderAccess::load_acquire(&process_exiting) == 0) {
but it will be blocked on line 3803:
EnterCriticalSection(&crit_sect);
- an EPT_THREAD thread that sees the flag set on line 3802 will
drop into the self-suspend block on lines 3892-3900
- after the EPT_PROCESS thread exits the critical section, then
any EPT_THREAD threads that were blocked trying to acquire
the critical section will now see the flag set on line 3805:
if (what == EPT_THREAD &&
OrderAccess::load_acquire(&process_exiting) == 0) {
and drop into the self-suspend block on lines 3892-3900
Short version: any EPT_THREAD threads that arrive after the
EPT_PROCESS thread owns the critical section will never call
line 3909 _endthreadex() because they self-suspend.
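For anyone following along without the webrev open, here is a minimal,
portable sketch of the control flow described above. It is not the code
from os_windows.cpp: std::mutex stands in for the Win32 critical section,
std::atomic stands in for the OrderAccess calls, and the function returns
the decision a thread would make instead of actually calling
_endthreadex(), SuspendThread() or exit(). The names exit_path, Outcome
and its enumerators are hypothetical, and the handling of a second
EPT_PROCESS caller is my assumption.

```cpp
#include <atomic>
#include <mutex>

// Which kind of exit the calling thread requested (names from the review).
enum ExitKind { EPT_THREAD, EPT_PROCESS };

// Hypothetical: what the caller would do next in the real code.
enum Outcome { END_THREAD, SELF_SUSPEND, EXIT_PROCESS };

static std::atomic<int> process_exiting{0};  // the "process is exiting" flag
static std::mutex crit_sect;                 // stand-in for the critical section

Outcome exit_path(ExitKind what) {
  // First flag check, made before trying to take the lock.
  if (process_exiting.load(std::memory_order_acquire) == 0) {
    std::lock_guard<std::mutex> guard(crit_sect);

    // Second flag check, made while holding the lock. A thread that was
    // blocked on the lock while the exiting thread held it fails this check.
    if (what == EPT_THREAD &&
        process_exiting.load(std::memory_order_acquire) == 0) {
      // The real code would do its thread-handle bookkeeping here.
      return END_THREAD;  // would go on to call _endthreadex()
    }

    if (what == EPT_PROCESS) {
      // Raise the flag so that later EPT_THREAD arrivals self-suspend.
      process_exiting.store(1, std::memory_order_release);
      // The real code would wait here for threads that are already exiting.
      return EXIT_PROCESS;  // would go on to call exit()
    }
  }

  // The flag was seen set, either before or after acquiring the lock.
  // Assumption: only EPT_THREAD callers suspend themselves.
  return what == EPT_THREAD ? SELF_SUSPEND : EXIT_PROCESS;
}
```

Called in sequence, an EPT_THREAD request made before any EPT_PROCESS
request yields END_THREAD, and every EPT_THREAD request made after the
EPT_PROCESS request yields SELF_SUSPEND.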
OK, I concur that this new algorithm looks correct and will reduce
the number of threads racing through line 3909 _endthreadex() while
the EPT_PROCESS thread is trying to call exit().
One possible hole remains that we've discussed before. If an
EPT_THREAD thread calls _endthreadex() before the EPT_PROCESS
thread gets here, and if the EPT_THREAD thread stalls in
_endthreadex(), then it's still possible for that EPT_THREAD
thread to mess up the exit code if it unblocks after the
EPT_PROCESS thread has set the exit code. We've discussed this
before and there's nothing we can do about it other than try to
reduce the probability by reducing the number of EPT_THREAD
threads that are calling _endthreadex().
Thumbs up!
Side note: A new failure of this mechanism was filed recently:
JDK-8069068 VM warning: WaitForMultipleObjects timed out (0) ...
https://bugs.openjdk.java.net/browse/JDK-8069068
The bug was filed against JDK9-B45 so it has the most recent
fix (https://bugs.openjdk.java.net/browse/JDK-8066863). The
weird part is that WaitForMultipleObjects() timed out without
an error code being set. Don't know if that means anything in
particular in the Win* APIs...
This fix (8069048) may also reduce the probability of this
failure mode because we'll be queueing fewer threads on the
handle list...
Dan
>
> [1] https://bugs.openjdk.java.net/browse/JDK-6573254
>
> Sincerely yours,
> Ivan