jmx-dev RFR: 8020875 java/lang/management/ThreadMXBean/ResetPeakThreadCount.java fails intermittently

Jaroslav Bachorik jaroslav.bachorik at oracle.com
Thu Jul 25 05:28:02 PDT 2013


On 07/25/2013 07:07 AM, David Holmes wrote:
> On 25/07/2013 12:08 AM, Jaroslav Bachorik wrote:
>> On 07/24/2013 03:17 PM, Chris Hegarty wrote:
>>> On 24/07/2013 13:49, Jaroslav Bachorik wrote:
>>>> On 07/24/2013 02:32 PM, Chris Hegarty wrote:
>>>>> On 24/07/2013 12:21, David Holmes wrote:
>>>>>> On 24/07/2013 7:31 PM, Mandy Chung wrote:
>>>>>>>
>>>>>>> On 7/24/2013 4:50 PM, shanliang wrote:
>>>>>>>> So we have 2 kinds of issues here:
>>>>>>>> 1) test-related issues, like Thread state checking, which we can
>>>>>>>> fix in the test
>>>>>>>> 2) the MBean.getThreadCount() issue: we can create a bug to track it
>>>>>>>> (add your test case to the bug), and add a workaround (sleep or
>>>>>>>> call it twice) in the test to make the test pass. Mandy is the
>>>>>>>> expert, so it is better to get her opinion.
>>>>>>>
>>>>>>> It's probably a race in the VM implementation in determining the
>>>>>>> thread count. You will need to diagnose the VM implementation and
>>>>>>> compare the thread list and the implementation of getting the
>>>>>>> thread count (check
>>>>>>> hotspot/src/share/vm/services/threadService.cpp)
>>>>>>
>>>>>> There is a considerable code path between the point where a
>>>>>> terminating thread causes Thread.join() to be allowed to return, and
>>>>>> the point where the live thread count gets decremented. So using
>>>>>> join() does not help here. Arguably JVMTI should have based its
>>>>>> counts around the lifecycle of the Java thread, not the underlying
>>>>>> native thread.
>>>>>
>>>>> It appears, from my reading of the code, that this situation (a
>>>>> thread exiting) should be handled. Or maybe I'm looking at the wrong
>>>>> interface.
>>>>>
>>>>> JavaThread::exit(...) {
>>>>>     ...
>>>>>     ThreadService::current_thread_exiting(this);
>>>>>     ...
>>>>>     ensure_join(..)
>>>>>     ...
>>>>> }
>>>>>
>>>>> So the exiting thread should be removed from the live thread count
>>>>> before Thread.join returns.
>>>>
>>>> Unfortunately, ensure_join(...) is called on line 1860 but
>>>> Threads::remove(this), which does the actual cleanup of the live
>>>> threads counter, is called only on line 1919, leaving a window of at
>>>> least a few ns when the thread is reported as terminated in Java but
>>>> the counters haven't been updated yet.
>>>
>>> Again, maybe I'm missing something but,
>>>
>>> static jlong get_live_thread_count() {
>>>   return _live_threads_count->get_value() - _exiting_threads_count;
>>> }
>>>
>>>   ... and current_thread_exiting(..) increments
>>> _exiting_threads_count, no?
>>
>> Well, apparently it does.
> 
> Yes. Thanks Chris I completely missed the use of the
> _exiting_threads_count to address this very issue.
> 
>> I am a complete stranger to the concurrency issues in the hotspot -
>> would it be possible that in ThreadService::remove_thread(..) the
>> _exiting_threads_count is decremented but _live_threads_count hasn't
>> been updated yet when someone calls the get_live_thread_count() function?
> 
> Yes. Updates are guarded by acquiring the Threads_lock, but reads are
> not. So it is indeed possible to request the live count between the
> decrement of the exiting count and the decrement of the live count
> itself. Mind you that is an extremely small window of opportunity in
> terms of this bug manifesting as often as it does.
> 
> Because get_live_thread_count returns the difference of two variables it
> has to use the same synchronization as is used to update those variables
> to ensure it returns a valid value. We can't grab the Threads_lock
> directly in get_live_thread_count as it is already called from code that
> holds the lock. So we would have to push this out to management.cpp's
> get_long_attribute.

I have filed a separate issue for hotspot/svc (JDK-8021335).
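
To make the window David describes concrete, here is a simplified Java
model of the two counters. This is only a sketch and not the HotSpot code:
the class, field and method names are hypothetical stand-ins, and the
locking follows David's description above rather than a reading of the
actual sources. The point is that a reader which does not take the same
lock as the writers can observe the state between the two decrements.

```java
import java.util.concurrent.locks.ReentrantLock;

// Simplified model only; this is not HotSpot code and every name is hypothetical.
class LiveCountModel {
    private final ReentrantLock threadsLock = new ReentrantLock(); // stands in for Threads_lock
    private volatile long liveThreads;    // stands in for _live_threads_count
    private volatile long exitingThreads; // stands in for _exiting_threads_count

    public void threadStarted() {
        threadsLock.lock();
        try {
            liveThreads++;
        } finally {
            threadsLock.unlock();
        }
    }

    // Called early in thread exit, before join() is allowed to return.
    public void currentThreadExiting() {
        threadsLock.lock();
        try {
            exitingThreads++;
        } finally {
            threadsLock.unlock();
        }
    }

    // Called late in thread exit; the two decrements are not atomic for unguarded readers.
    public void removeThread() {
        threadsLock.lock();
        try {
            exitingThreads--;
            // An unguarded reader arriving here sees one thread too many.
            liveThreads--;
        } finally {
            threadsLock.unlock();
        }
    }

    // Models the racy read: no lock, so it can run between the two decrements.
    public long unguardedLiveCount() {
        return liveThreads - exitingThreads;
    }

    // Models the proposed fix: read under the same lock the writers use.
    public long guardedLiveCount() {
        threadsLock.lock();
        try {
            return liveThreads - exitingThreads;
        } finally {
            threadsLock.unlock();
        }
    }

    public static void main(String[] args) {
        LiveCountModel m = new LiveCountModel();
        m.threadStarted();
        m.threadStarted();
        m.currentThreadExiting();
        System.out.println(m.guardedLiveCount()); // one thread is exiting
        m.removeThread();
        System.out.println(m.guardedLiveCount()); // same value after removal
    }
}
```

In this model the guarded read gives the same answer before and after
removeThread(), which is the property the test relies on.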

For the time being I propose modifying the test to be less race-prone in
Java and adding a timeout of 500 ms after terminating a number of threads.
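
As an illustration of the kind of wait I have in mind, here is a minimal
sketch that polls the thread count until it reaches the expected value or
a 500 ms bound expires. It is not necessarily what the webrev does; the
class and the waitForCount helper are hypothetical, and only
ManagementFactory.getThreadMXBean() and ThreadMXBean.getThreadCount() are
real API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.function.IntSupplier;

// Hypothetical helper for illustration; not taken from the test or the webrev.
class ThreadCountWait {
    // Polls 'actual' until it equals 'expected' or 'timeoutMs' elapses.
    // Returns true if the expected value was observed in time.
    public static boolean waitForCount(IntSupplier actual, int expected, long timeoutMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (actual.getAsInt() != expected) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and give up
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ThreadMXBean mbean = ManagementFactory.getThreadMXBean();
        int before = mbean.getThreadCount();
        Thread t = new Thread(() -> { });
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // join() has returned, but the count may lag; wait up to 500 ms for it to settle.
        boolean settled = waitForCount(mbean::getThreadCount, before, 500);
        System.out.println("thread count settled: " + settled);
    }
}
```

Polling with an upper bound returns as soon as the count is correct, so it
costs less than an unconditional 500 ms sleep in the common case.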

The test modifications are at
http://cr.openjdk.java.net/~jbachorik/8020875/webrev.02

Thanks,

-JB-

> 
> David
> -----
> 
>> -JB-
>>
>>>
>>> -Chris.
>>>
>>>>
>>>> -JB-
>>>>
>>>>>
>>>>> -Chris.
>>>>>
>>>>>>
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>> Mandy
>>>>
>>


