RFR(S) 8021335: Missing synchronization when reading counters for live threads and peak thread count
dean.long at oracle.com
dean.long at oracle.com
Wed Oct 17 21:10:48 UTC 2018
On 10/16/18 7:33 PM, David Holmes wrote:
> Hi Dean,
>
> Thanks for tackling this.
>
> I'm still struggling to fully grasp why we need both the PerfCounters
> and the regular counters. I get that we have to decrement the live
> counts before ensure_join() has allowed Thread.join() to return, to
> ensure that if we then check the number of threads it has dropped by
> one. But I don't understand why that means we need to manage the
> thread count in two parts. Particularly as now you don't use the
> PerfCounter to return the live count, so it makes me wonder what role
> the PerfCounter is playing as it is temporarily inconsistent with the
> reported live count? Is it simply that we can't atomically decrement
> the PerfCounters, and we can't require the Threads_lock when
> decrementing?
>
Good questions. I didn't know the history, so I tried not to change too
much. The PerfCounters appear to be there to support StatSampler. I
think it's OK for them to be a little out of sync. If we wanted to make
them more in sync with the atomic counters, I think we could do any of
the following:
- Grab Threads_lock before SR_lock() where we call
current_thread_exiting, and update perf counts at that time
- instead of decrementing in remove_thread, do this in
decrement_thread_counts
_live_threads_count->set_value(_atomic_threads_count);
_daemon_threads_count->set_value(_atomic_daemon_threads_count);
( I see that Mandy has suggested the same thing.)
- DO update the perf counts atomically
However, I consider the PerfCounters inconsistency a separate issue from
this bug.
> On the code itself one thing I find irksome is that we already have a
> daemon flag in remove_thread but decrement_thread_counts doesn't get
> passed that and so always re-examines the thread to see if it is a
> daemon. I know only one of the code paths has the daemon flag, so
> perhaps we should hoist the daemon check up into JavaThread::exit and
> then pass a daemon parameter to decrement_thread_counts.
>
OK, I've fixed this.
> There's also some ambiguity (existing) in relation to the existence of
> the jt->threadObj() and its affect on whether the thread is considered
> a daemon or not. Your check has to mirror the existing checks - the
> threadObj may be NULL at removal if it was a failed attach, but the
> addition considers a JNI-attached thread to be non-daemon. Overall
> this seems buggy to me but as long as your check follows the same
> rules that is okay. In that regard JavaThread::exit should never
> encounter a NULL threadObj().
>
I refactored is_daemon checks into a single helper function.
> Minor nit: I suggest moving the two occurrences of the comment
>
> // Do not count VM internal or JVMTI agent threads
>
> inside is_hidden_thread so that we don't have to remember to update
> the comments if the definition changes.
>
OK. New webrev:
http://cr.openjdk.java.net/~dlong/8021335/webrev.5/
dl
> Thanks,
> David
> -----
>
> On 17/10/2018 9:43 AM, dean.long at oracle.com wrote:
>> New webrev:
>>
>> http://cr.openjdk.java.net/~dlong/8021335/webrev.4/
>>
>> dl
>>
>> On 10/16/18 11:59 AM, dean.long at oracle.com wrote:
>>> On 10/16/18 10:28 AM, JC Beyler wrote:
>>>> Hi Dean,
>>>>
>>>> I noticed a typo here :
>>>> "are atomically" is in the comment but should no longer be there I
>>>> believe; the sentence probably should be:
>>>> // These 2 counters are like the above thread counts, but are
>>>>
>>>
>>> Fixed!
>>>
>>>> Any way we could move this test into a helper method and both
>>>> add_thread/current_thread_exiting could use the same "ignore"
>>>> thread code so that we ensure the two never get out of sync?
>>>>
>>>> + if (jt->is_hidden_from_external_view() ||
>>>> + jt->is_jvmti_agent_thread()) {
>>>> + return;
>>>> + }
>>>>
>>>
>>> Good idea. Fixed.
>>>
>>>> (Also by curiosity, add_thread has an assert on the Threads_lock
>>>> but not thread_exiting?)
>>>>
>>>
>>> Right, I followed the existing pattern where
>>> current_thread_exiting() only uses the atomic counters, so it
>>> doesn't need Threads_lock.
>>>
>>> dl
>>>
>>>> Thanks,
>>>> Jc
>>>>
>>>>
>>>> On Tue, Oct 16, 2018 at 9:52 AM <dean.long at oracle.com
>>>> <mailto:dean.long at oracle.com>> wrote:
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8021335
>>>> http://cr.openjdk.java.net/~dlong/8021335/webrev.3/
>>>> <http://cr.openjdk.java.net/%7Edlong/8021335/webrev.3/>
>>>>
>>>> This change attempts to fix an old and intermittent problem with
>>>> ThreadMXBean thread counts.
>>>> Changes have 3 main parts:
>>>> 1. Make sure hidden threads don't affect counts (even when
>>>> exiting)
>>>> 2. Fix race reading counts using atomic counters
>>>> 3. Remove some workarounds in test (because they hide bugs)
>>>>
>>>> dl
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Thanks,
>>>> Jc
>>>
>>
More information about the serviceability-dev
mailing list