RFR 8160892: VM warning: WaitForMultipleObjects timed out

Ivan Gerasimov ivan.gerasimov at oracle.com
Thu Jul 14 18:46:19 UTC 2016


Thank you David for looking into this!

Here's the webrev updated in accordance with your and Daniel's suggestions:
http://cr.openjdk.java.net/~igerasim/8160892/01/webrev/

Please see my answers inline


> Nit: can we change 'registered_itself" to just "registered" please.
Done.

>
> Can you explain under what conditions a thread will now reach the 
> self-suspension code. Is that only if an error occurred such that we 
> were unable to register our handle for the process-exiting thread to 
> wait on? If so some commentary on that block seems appropriate - 
> perhaps more appropriate there than back up at the place where it 
> failed to get the handle (as Dan requested).
>
There are three kinds of threads, which can be caught in that 
self-suspension loop:
1) All threads that want to end (by calling _endthreadex()) *after* some 
process-exiting thread raised the flag `process_exiting`.
The rationale here is that we know that the whole process is going to be 
terminated quite soon, so we do not allow any thread to call 
_endthreadex(), which seems to have the concurrency bug.
2) Any thread that wants to end the whole process, after some other 
thread raised the flag `process_exiting`.
If more than one threads want to end the process, we let to do it only 
the thread that could raise the flag `process_exiting`.  All other such 
threads will have to suspend themselves.
3) (Unlikely to happen in practice) Any thread that wants to end by 
calling _endthreadex(), but which failed to register itself due to 
failure of DuplicateHandle().
Here we still have a race, which can result in a wrong exit code of the 
process.

> Given we keep missing conditions I'm only cautiously optimistic about 
> this.
> And I'd like to understand how we still sometimes end up exiting with 
> an "error code" that seems to be the value of an exception! :(

The last time the sentinel exit code =20115 was reported almost a year ago.
After that the fix for JDK-8145127 had gone in, and I didn't see any 
more reports about wrong exit codes since then.
In particular, that fix worked around the situation when more than one 
threads concurrently call System.exit(), which might have caused a race.

With kind regards,
Ivan

>
> Thanks,
> David
>
>> With kind regards,
>> Ivan
>>
>



More information about the hotspot-runtime-dev mailing list