RFR(S) 8021335: Missing synchronization when reading counters for live threads and peak thread count

dean.long at oracle.com dean.long at oracle.com
Wed Oct 17 21:10:48 UTC 2018


On 10/16/18 7:33 PM, David Holmes wrote:
> Hi Dean,
>
> Thanks for tackling this.
>
> I'm still struggling to fully grasp why we need both the PerfCounters 
> and the regular counters. I get that we have to decrement the live 
> counts before ensure_join() has allowed Thread.join() to return, to 
> ensure that if we then check the number of threads it has dropped by 
> one. But I don't understand why that means we need to manage the 
> thread count in two parts. Particularly as now you don't use the 
> PerfCounter to return the live count, so it makes me wonder what role 
> the PerfCounter is playing as it is temporarily inconsistent with the 
> reported live count? Is it simply that we can't atomically decrement 
> the PerfCounters, and we can't require the Threads_lock when 
> decrementing?
>

Good questions.  I didn't know the history, so I tried not to change too 
much.  The PerfCounters appear to be there to support StatSampler.  I 
think it's OK for them to be a little out of sync. If we wanted to make 
them more in sync with the atomic counters, I think we could do any of 
the following:
- Grab Threads_lock before SR_lock() where we call 
current_thread_exiting, and update perf counts at that time
- instead of decrementing in remove_thread, do this in 
decrement_thread_counts
   _live_threads_count->set_value(_atomic_threads_count);
   _daemon_threads_count->set_value(_atomic_daemon_threads_count);
( I see that Mandy has suggested the same thing.)
- DO update the perf counts atomically
However, I consider the PerfCounters inconsistency a separate issue from 
this bug.

> On the code itself one thing I find irksome is that we already have a 
> daemon flag in remove_thread but decrement_thread_counts doesn't get 
> passed that and so always re-examines the thread to see if it is a 
> daemon. I know only one of the code paths has the daemon flag, so 
> perhaps we should hoist the daemon check up into JavaThread::exit and 
> then pass a daemon parameter to decrement_thread_counts.
>

OK, I've fixed this.

> There's also some ambiguity (existing) in relation to the existence of 
> the jt->threadObj() and its affect on whether the thread is considered 
> a daemon or not. Your check has to mirror the existing checks - the 
> threadObj may be NULL at removal if it was a failed attach, but the 
> addition considers a JNI-attached thread to be non-daemon. Overall 
> this seems buggy to me but as long as your check follows the same 
> rules that is okay. In that regard JavaThread::exit should never 
> encounter a NULL threadObj().
>

I refactored is_daemon checks into a single helper function.

> Minor nit: I suggest moving the two occurrences of the comment
>
> // Do not count VM internal or JVMTI agent threads
>
> inside is_hidden_thread so that we don't have to remember to update 
> the comments if the definition changes.
>

OK.  New webrev:

http://cr.openjdk.java.net/~dlong/8021335/webrev.5/

dl

> Thanks,
> David
> -----
>
> On 17/10/2018 9:43 AM, dean.long at oracle.com wrote:
>> New webrev:
>>
>> http://cr.openjdk.java.net/~dlong/8021335/webrev.4/
>>
>> dl
>>
>> On 10/16/18 11:59 AM, dean.long at oracle.com wrote:
>>> On 10/16/18 10:28 AM, JC Beyler wrote:
>>>> Hi Dean,
>>>>
>>>> I noticed a typo here :
>>>> "are atomically" is in the comment but should no longer be there I 
>>>> believe; the sentence probably should be:
>>>> // These 2 counters are like the above thread counts, but are
>>>>
>>>
>>> Fixed!
>>>
>>>> Any way we could move this test into a helper method and both 
>>>> add_thread/current_thread_exiting could use the same "ignore" 
>>>> thread code so that we ensure the two never get out of sync?
>>>>
>>>> +  if (jt->is_hidden_from_external_view() ||
>>>> +      jt->is_jvmti_agent_thread()) {
>>>> +    return;
>>>> +  }
>>>>
>>>
>>> Good idea.  Fixed.
>>>
>>>> (Also by curiosity, add_thread has an assert on the Threads_lock 
>>>> but not thread_exiting?)
>>>>
>>>
>>> Right, I followed the existing pattern where 
>>> current_thread_exiting() only uses the atomic counters, so it 
>>> doesn't need Threads_lock.
>>>
>>> dl
>>>
>>>> Thanks,
>>>> Jc
>>>>
>>>>
>>>> On Tue, Oct 16, 2018 at 9:52 AM <dean.long at oracle.com 
>>>> <mailto:dean.long at oracle.com>> wrote:
>>>>
>>>>     https://bugs.openjdk.java.net/browse/JDK-8021335
>>>>     http://cr.openjdk.java.net/~dlong/8021335/webrev.3/
>>>> <http://cr.openjdk.java.net/%7Edlong/8021335/webrev.3/>
>>>>
>>>>     This change attempts to fix an old and intermittent problem with
>>>>     ThreadMXBean thread counts.
>>>>     Changes have 3 main parts:
>>>>     1. Make sure hidden threads don't affect counts (even when 
>>>> exiting)
>>>>     2. Fix race reading counts using atomic counters
>>>>     3. Remove some workarounds in test (because they hide bugs)
>>>>
>>>>     dl
>>>>
>>>>
>>>>
>>>> -- 
>>>>
>>>> Thanks,
>>>> Jc
>>>
>>



More information about the serviceability-dev mailing list