RFR: 8221102: Allow GC threads to participate in threads claiming protocol

Tue Mar 26 10:31:57 UTC 2019

Hi Kim,

>>> We could give less exposure to the claiming mechanism by providing a
>>> second function similar to the existing possibly_parallel_threads_do
>>> that does hit all the threads.  The only problem I have with that is
>>> naming.  Maybe the old function should be renamed to
>>> possibly_parallel_java_threads_and_vm_thread_do, for consistency with
>>> the serial function.
>>
>> Zhengyu found a problem with my approach to include all workers in the
>> protocol: workers that would be created at safepoints would start out
>> with 0 claim token, and the final verification would fail because it
>> wants all threads to be at global thread_parity.
> 
> Yes, that could be a problem.  For the current cases where we do
> iterate over all threads, we get away with it because we don't do the
> final thread count verification; that verification is only associated
> with the subset iteration of possibly_parallel_oops_do.  Interesting.
> The couple of places currently doing all-thread iteration look like
> they should be fine with missing some that are under construction.
> But that kind of brings us back around to the NJT initialization race
> question from the other thread.  I'll continue this thought over there.
> 
>>   It could be fixed
>> fairly easily by letting all threads start with current global parity.
> 
> I suspect that introduces a race to get the current global parity.
> 
>> But as you say, this might just be one case of assumptions and there
>> might be more lurking around. So let's go with your approach for now.
>>
>> Roman
> 
> OK.  So does that count as a "review"?  And I still need a second.

No, not yet.

I've tested it against my failing stress test (together with your 
njt_lock patch) and it seems to fix it.

I'm also testing it against Shenandoah test suite, and so far looks good.

I don't quite understand the resetting logic though. It first increments 
the global counter. If it turns 0, it resets all thread tokens to 0, 
using the claim_oops_do/cmpxchg protocol, but that is now expecting 0 
too? Why not simply assign 0 instead? Maybe that code just isn't obvious 
enough...?

Also, the resetting logic warrants a test, it is so rare, it will 
otherwise not get executed, but if it fails, it would be catastrophic.

Roman