RFR: 8221102: Allow GC threads to participate in threads claiming protocol
Roman Kennke
rkennke at redhat.com
Tue Mar 26 10:31:57 UTC 2019
Hi Kim,
>>> We could give less exposure to the claiming mechanism by providing a
>>> second function similar to the existing possibly_parallel_threads_do
>>> that does hit all the threads. The only problem I have with that is
>>> naming. Maybe the old function should be renamed to
>>> possibly_parallel_java_threads_and_vm_thread_do, for consistency with
>>> the serial function.
>>
>> Zhengyu found a problem with my approach to include all workers in the
>> protocol: workers that would be created at safepoints would start out
>> with 0 claim token, and the final verification would fail because it
>> wants all threads to be at global thread_parity.
>
> Yes, that could be a problem. For the current cases where we do
> iterate over all threads, we get away with it because we don't do the
> final thread count verification; that verification is only associated
> with the subset iteration of possibly_parallel_oops_do. Interesting.
> The couple of places currently doing all-thread iteration look like
> they should be fine with missing some that are under construction.
> But that kind of brings us back around to the NJT initialization race
> question from the other thread. I'll continue this thought over there.
>
>> It could be fixed
>> fairly easily by letting all threads start with current global parity.
>
> I suspect that introduces a race to get the current global parity.
>
>> But as you say, this might just be one case of assumptions and there
>> might be more lurking around. So let's go with your approach for now.
>>
>> Roman
>
> OK. So does that count as a "review"? And I still need a second.
No, not yet.
I've tested it against my failing stress test (together with your
njt_lock patch) and it seems to fix it.
I'm also testing it against Shenandoah test suite, and so far looks good.
I don't quite understand the resetting logic though. It first increments
the global counter. If it turns 0, it resets all thread tokens to 0,
using the claim_oops_do/cmpxchg protocol, but that is now expecting 0
too? Why not simply assign 0 instead? Maybe that code just isn't obvious
enough...?
Also, the resetting logic warrants a test, it is so rare, it will
otherwise not get executed, but if it fails, it would be catastrophic.
Roman
More information about the hotspot-dev
mailing list