RFR 8160892: VM warning: WaitForMultipleObjects timed out
Ivan Gerasimov
ivan.gerasimov at oracle.com
Thu Jul 14 18:46:19 UTC 2016
Thank you David for looking into this!
Here's the webrev updated in accordance with your and Daniel's suggestions:
http://cr.openjdk.java.net/~igerasim/8160892/01/webrev/
Please see my answers inline
> Nit: can we change "registered_itself" to just "registered" please.
Done.
>
> Can you explain under what conditions a thread will now reach the
> self-suspension code. Is that only if an error occurred such that we
> were unable to register our handle for the process-exiting thread to
> wait on? If so some commentary on that block seems appropriate -
> perhaps more appropriate there than back up at the place where it
> failed to get the handle (as Dan requested).
>
There are three kinds of threads that can be caught in that
self-suspension loop:
1) All threads that want to end (by calling _endthreadex()) *after* some
process-exiting thread has raised the flag `process_exiting`.
The rationale is that we know the whole process is going to be
terminated soon, so we do not allow any thread to call
_endthreadex(), which appears to be affected by the concurrency bug.
2) Any thread that wants to end the whole process after some other
thread has raised the flag `process_exiting`.
If more than one thread wants to end the process, only the thread that
managed to raise the flag `process_exiting` is allowed to do so. All other
such threads have to suspend themselves.
3) (Unlikely to happen in practice) Any thread that wants to end by
calling _endthreadex() but failed to register itself because
DuplicateHandle() failed.
In this case a race is still possible, which can result in the process
exiting with a wrong exit code.
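The three cases above boil down to one decision each exiting thread makes. Below is a minimal, portable C++ sketch of that decision under stated assumptions; it is not the code in the webrev. The helper must_self_suspend(), the ExitKind enum and the handle_dup_ok parameter (standing in for whether DuplicateHandle() succeeded) are hypothetical, and the critical section and the array of duplicated handles that the process-exiting thread waits on are left out. Only the flag name `process_exiting` comes from the discussion. The model also assumes a thread registers only while `process_exiting` is still 0.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical, portable model of the exit protocol discussed above.
// Not HotSpot code: the critical section and the array of duplicated
// thread handles are omitted; only the flag name comes from the thread.

// Id of the thread that is ending the process; 0 means "nobody yet".
std::atomic<std::uint32_t> process_exiting{0};

enum class ExitKind { kEndThread, kEndProcess };

// Returns true if the caller should suspend itself forever instead of
// calling _endthreadex() (kEndThread) or exit() (kEndProcess).
// handle_dup_ok stands in for "DuplicateHandle() succeeded".
bool must_self_suspend(ExitKind kind, std::uint32_t self_id, bool handle_dup_ok) {
  assert(self_id != 0);  // 0 is reserved for "no exiting thread"
  bool registered = false;
  if (kind == ExitKind::kEndProcess) {
    // Only the first thread to raise the flag may end the process (case 2).
    std::uint32_t expected = 0;
    process_exiting.compare_exchange_strong(expected, self_id);
  } else if (process_exiting.load() == 0 && handle_dup_ok) {
    // Register so that the process-exiting thread will wait for us.
    registered = true;
  }
  const std::uint32_t owner = process_exiting.load();
  // Cases 1 and 2: another thread owns the exit and we are not registered.
  // Case 3: if DuplicateHandle() failed but no owner exists yet, we fall
  // through and proceed, which is the race that remains.
  return !registered && owner != 0 && owner != self_id;
}
```

In this model a second thread trying to end the process gets true (case 2), as does a thread ending after the flag was raised (case 1), while a thread whose DuplicateHandle() failed before any exit started gets false and proceeds, which is the remaining race of case 3.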
> Given we keep missing conditions I'm only cautiously optimistic about
> this.
> And I'd like to understand how we still sometimes end up exiting with
> an "error code" that seems to be the value of an exception! :(
The last time the sentinel exit code 115 was reported was almost a year ago.
After that, the fix for JDK-8145127 went in, and I haven't seen any
reports of wrong exit codes since then.
In particular, that fix worked around the situation where more than one
thread calls System.exit() concurrently, which could have caused a race.
With kind regards,
Ivan
>
> Thanks,
> David
>
>> With kind regards,
>> Ivan
>>
>