RFR 8030847: java/lang/management/ThreadMXBean/ThreadBlockedCount.java fails intermittently again

Wed Jan 8 01:51:05 PST 2014

On 8.1.2014 10:26, David Holmes wrote:
> On 8/01/2014 6:40 PM, Jaroslav Bachorik wrote:
>> Hi David,
>>
>> On 7.1.2014 03:27, David Holmes wrote:
>>> Hi Jaroslav,
>>>
>>> On 23/12/2013 10:42 PM, Jaroslav Bachorik wrote:
>>>> Please, review the following test fix:
>>>>
>>>> Issue : https://bugs.openjdk.java.net/browse/JDK-8030847
>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8030847/webrev.00/
>>>>
>>>> The root cause of the intermittent failures of this test is that
>>>> there are many hidden places in JDK classes where the checked thread
>>>> can get BLOCKED, which distorts the blocked count and makes the test
>>>> fail. The ones identified in this case were:
>>>>
>>>> - ThreadMXBean.getThreadInfo()
>>>> - System.out.println()
>>>> - Phaser.arriveAndAwaitAdvance()
>>>>
>>>> Whether the thread gets blocked or not depends on many variables and
>>>> this makes the failure very intermittent.
>>>>
>>>> The fix consists of:
>>>> - not using ThreadMXBean.getThreadInfo() from within the tested thread
>>>> - not using System.out.println() (or any other kind of output) in the
>>>> tested thread
>>>> - not using Phaser to synchronize the tested thread and the control
>>>> thread
>>>>
>>>> The toughest part is replacing Phaser for synchronization purposes
>>>> with a similar construct that does not block the thread while it waits
>>>> for the other party. CyclicBarrier didn't work either, and probably
>>>> neither would any other solution based on java.util.concurrent locks.
>>>>
>>>> The TwoPartySynchronizer provides block-free synchronization and is
>>>> based on atomics and Thread.wait(). It is not a general-purpose
>>>> replacement for Phaser or CyclicBarrier, but it works very well for
>>>> exactly two parties that need progress synchronization without
>>>> blocking either party.
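For illustration only: the webrev's TwoPartySynchronizer is not reproduced here, but the idea of combining an atomic counter with Thread.sleep() polling (the primitive clarified just below) can be sketched for exactly two parties as follows. The class name SleepingTwoPartyBarrier, its method name and the polling interval are hypothetical, not taken from the patch.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical two-party barrier (not the webrev's TwoPartySynchronizer).
 * A waiting party polls an atomic counter and sleeps between polls, so it
 * sits in TIMED_WAITING (sleeping) instead of waiting on a monitor or a
 * java.util.concurrent lock. Only correct for exactly two parties.
 */
final class SleepingTwoPartyBarrier {
    private final AtomicLong arrivals = new AtomicLong();
    private final long pollMillis;

    SleepingTwoPartyBarrier(long pollMillis) {
        if (pollMillis < 0) {
            throw new IllegalArgumentException("pollMillis must be >= 0");
        }
        this.pollMillis = pollMillis;
    }

    /** Arrives at the current phase and sleeps until the other party has arrived too. */
    void arriveAndAwait() throws InterruptedException {
        long n = arrivals.incrementAndGet();
        // Phase k is complete once 2k arrivals have been counted,
        // so round this arrival's number up to the next even value.
        long target = (n + 1) & ~1L;
        while (arrivals.get() < target) {
            Thread.sleep(pollMillis);
        }
    }
}
```

Because a party's k-th call can only return after 2k arrivals have been counted, neither party can get more than one phase ahead of the other. With a third party the rounding no longer matches the phases, which is why, like the TwoPartySynchronizer, this only works for exactly two parties. The polling interval trades wake-up latency for never waiting on a monitor.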
>>>
>>> I see you actually meant Thread.sleep, which does of course block the
>>
>> Yes, Thread.sleep().
>>
>>> thread but doesn't put it into the problematic "blocked" state that
>>
>> Exactly - it puts the thread to "sleeping" state.
>>
>>> affects the blocked-count. That said, I don't understand why this problem
>>> exists, given that by definition the use of any of the synchronizers
>>> (based on park()) should not affect the blocked count:
>>>
>>> getBlockedCount(): Returns the total number of times that the thread
>>> associated with this ThreadInfo blocked to enter or reenter a monitor.
>>>
>>> None of CyclicBarrier/Phaser etc. are monitors, so the BlockedCount
>>> should not be updated. If it is, then that is a spec or
>>> implementation bug in itself :(
>>
>> Indeed, it seems so. I've run the test with JFR enabled and one can
>> distinctly see that the test fails when the thread is parked, as parking
>> puts the thread into the "blocked" state in the end. I've also patched
>> the JVM (the attached monitor-contention.patch; applies to the hotspot
>> repository) to report the thread going into the "blocked" state and print
>> the actual stack trace at that moment, and it also shows that the thread
>> goes into the "blocked" state somewhere from the "park()" code.
>>
>> I am attaching the JFR recordings - one for the failing test and one for
>> the successful test.
>>
>> IMO, it seems that ThreadInfo was not updated to reflect the
>> introduction of the park()/unpark() methods. In its current state it
>> mistakenly reports parking a thread as blocking, but if the
>> implementation is updated to count only blocking on monitor entry (to
>> correspond to the API docs), we will lose the information about the
>> thread being parked (during which the thread also does not execute any
>> user code). This would most probably call for an update of the
>> ThreadInfo API.
>
> park() puts you in Thread.State WAITING, which is exposed via
> ThreadInfo.getWaitedCount, so I don't see any issue there. If parking is
> causing a change to the blocked count, then that is a major bug in the
> underlying MXBean implementation.
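To make the distinction above concrete, here is a minimal sketch using only the public ThreadMXBean/ThreadInfo API. The class BlockedVersusParked and its method are hypothetical and not part of ThreadBlockedCount.java. A worker first contends for a monitor held by the caller, which should make it BLOCKED and raise getBlockedCount(), and then calls LockSupport.park(), which should leave it in Thread.State WAITING. It assumes a JVM whose counters follow the API docs quoted above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

// Hypothetical demo class, not part of the test under review.
final class BlockedVersusParked {

    /**
     * Makes a worker contend for a monitor held by the caller, then park,
     * and returns a ThreadInfo snapshot taken while the worker is parked.
     */
    static ThreadInfo snapshotWhileParked() throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Object lock = new Object();
        AtomicBoolean acquired = new AtomicBoolean();
        AtomicBoolean release = new AtomicBoolean();

        Thread worker = new Thread(() -> {
            synchronized (lock) {      // contended monitor entry: BLOCKED, counted by getBlockedCount()
                acquired.set(true);
            }
            while (!release.get()) {
                LockSupport.park();    // WAITING; the loop absorbs spurious returns from park()
            }
        }, "worker");

        synchronized (lock) {
            worker.start();
            // Keep holding the monitor until the worker is seen BLOCKED on it.
            while (worker.getState() != Thread.State.BLOCKED) {
                Thread.sleep(1);
            }
        }

        // Once 'acquired' is set, the contended entry has completed and been counted.
        while (!acquired.get()) {
            Thread.sleep(1);
        }
        // Take snapshots until one shows the worker parked.
        ThreadInfo info = mx.getThreadInfo(worker.getId());
        while (info.getThreadState() != Thread.State.WAITING) {
            Thread.sleep(1);
            info = mx.getThreadInfo(worker.getId());
        }

        release.set(true);
        LockSupport.unpark(worker);
        worker.join();
        return info;
    }
}
```

The blocked and waited counts are always collected; only the corresponding times require thread contention monitoring to be enabled, so the sketch does not touch that setting. Per the quoted getBlockedCount() documentation, the park() loop on its own is not expected to add to the blocked count.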

Ok, so there must be something else. According to the debug output I added
to the contended_enter_begin(thread) method in
share/vm/services/threadService.hpp, I can see the thread being blocked here:

1. Might this be related to class loading? The code being called at the
reported line is "LockSupport.unpark(t)".
***
Blocking on object [I
=============================================================
[Contended thread] BlockedThread
	at java.util.concurrent.Phaser.releaseWaiters(Phaser.java:982)
	at java.util.concurrent.Phaser.arriveAndAwaitAdvance(Phaser.java:705)
	at threads.ThreadBlockedCount1$BlockedThread.run(ThreadBlockedCount1.java:99)
[Blocked count] 1
***

2. This report is missing information about the lock and the contended
thread. I was not able to figure out how to easily print that information
when the current thread is not the contended thread.
***
	at java.util.concurrent.Phaser.internalAwaitAdvance(Phaser.java:1067)
	at java.util.concurrent.Phaser.arriveAndAwaitAdvance(Phaser.java:690)
	at threads.ThreadBlockedCount1$BlockedThread.run(ThreadBlockedCount1.java:104)
	- locked <0x00000000d6e108e0> (a java.lang.Object)
[Blocked count] 1
***

One of those reports can be seen in the debug output when the test fails.

-JB-

>
> David
> -----
>
>> -JB-
>>
>>>
>>> David
>>