RFR 8145127: VM warning: WaitForMultipleObjects timed out (0) ...

Ivan Gerasimov ivan.gerasimov at oracle.com
Tue Dec 15 12:00:57 UTC 2015



On 15.12.2015 9:31, David Holmes wrote:
> I really wish we had some view inside windows to see exactly what is 
> going wrong here. :(
>

Yes, so do I!
It would be really helpful to know exactly how this race bug is 
"implemented".
Or, at least, how to deal with it to avoid the return code replacement 
for sure.

> On 15/12/2015 12:09 AM, Ivan Gerasimov wrote:
>> Hello!
>>
>> There was a timeout observed in os_windows.cpp at the line
>> 3945           res = WaitForMultipleObjects(MAXIMUM_WAIT_OBJECTS,
>> handles, FALSE, EXIT_TIMEOUT);
>>
>> This means there were more than 64 threads exiting at the same time and
>> none of the first 64 threads could make any progress during 5 minutes.
>> Sigh.
>>
>> To address this issue I suggest two things:
>> 1)
>> Increase the size of the queue of exiting threads.
>> We'll still have to wait for only the first 64 threads, if the queue is
>> exhausted.
>> But the chances we hit this condition are greatly reduced.
>>
>> 2)
>> Raise process_exiting flag earlier, i.e. before trying to enter the
>> critical section.
>> This should decrease the number of threads, contending for a slot in the
>> 'handles' array during the process exit.
>>
>> Additionally, it is proposed to suspend all the threads, but the one
>> which raised the flag 'process_exiting'.
>> It would be important in a case, when two threads are about to call
>> exit() concurrently.
>> Otherwise, a race could be faced, if the first thread is waiting for all
>> the registered handles, while the second one skips the critical section
>> altogether and calls ::exit().
>>
>> Build went fine on all platforms.  The JTREG tests from 'hotspot' subset
>> all pass.
>>
>> Would you please help review the proposed fix?
>>
>> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8145127
>> WEBREV: http://cr.openjdk.java.net/~igerasim/8145127/00/webrev/
>
> Can't quite get my head round the whole strategy, but in this loop:
>
> 4033       while (pr_ex != curr_id) {
>
> pr_ex is never updated!
>
It's intentional.
It's not meant the current thread will ever leave this loop, once enters.
The first line of the loop body will effectively suspend the current 
thread, so it will never call _endthreadex(), exit() or _exit() itself.
Just for the extra protection, the SuspendThread() call is wrapped into 
an infinite loop.

Sincerely yours,
Ivan



More information about the serviceability-dev mailing list