RFR: JDK-8220671: Initialization race for non-JavaThread PtrQueues

Roman Kennke rkennke at redhat.com
Wed Mar 20 08:09:51 UTC 2019



On 20.03.19 at 09:06, Roman Kennke wrote:
>>>> My current idea goes roughly like this (includes some Shenandoah mess
>>>> that will not be there in final webrev):
>>>>
>>>> http://cr.openjdk.java.net/~rkennke/JDK-8221102.patch
>>>>
>>>> However, this *still* doesn't solve my crashing testcase. Digging even
>>>> deeper...
>>>>
>>>> Roman
>>>
>>> I have a different idea for this new problem.  I’ll post something 
>>> more tomorrow.
>>>
>>> Let me know what you find with your test case.  Actually, can you 
>>> describe how to reproduce?
>>
>> Something like this:
>> for i in {1..20}; do CONF=fastdebug LANG=C LOG=info make run-test \
>>   TEST=gc/shenandoah/TestStringDedupStress.java; done
>>
>> should make it fail somewhat reliably. The attached patch
>> baddertest.patch should make it more likely (it launches only
>> aggressive-mode test runs). Also, it seems more likely when running on a
>> larger machine (with more cores).
>>
>> The test started failing somewhere between jdk-13+9 and jdk-13+11, and I
>> bisected it down to the NJT PtrQueues change. It also seemed like the most
>> likely candidate in that range. It only ever seems to crash with
>> +UseStringDeduplication, and since the strdedup thread does SATB, it seems
>> plausible that the change affects this.
>>
>> Any help would be greatly appreciated.
> 
> I have added asserts that verify that, after the final flushing of the
> thread-local SATB queues, *all* threads' SATB queues are empty. They
> do not trigger, and yet I still see crashes.
> 
> This tells me that it is failing to enqueue some oops to begin with. Our 
> ShBS::enqueue() not only checks the thread-local SATB-active flag, but 
> also the global one. Do you think there might be a race accessing this? 
> I.e. NJT possibly seeing a stale value because it does not synchronize 
> on the same stuff as Java threads do when safepointing?

E.g., PtrQueueSet::_all_active is not volatile and is not accessed using 
any OrderAccess either... ?
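
To illustrate the concern, here is a minimal standalone sketch. It is not HotSpot code: SketchQueueSet, set_active(), is_active() and run_sketch() are hypothetical names, and std::atomic stands in for whatever HotSpot's own Atomic/OrderAccess primitives would be used in a real fix. It only shows the general pattern of publishing an "active" flag with a release store and reading it with an acquire load, so that a thread which never passes through a safepoint still reliably observes the flag and the writes made before it.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for a queue set with a global "active" flag.
// Assumption: with a plain (non-atomic, non-volatile) bool, the reader
// below would have a data race and could legally spin forever or see
// stale data; the atomic with acquire/release ordering rules that out.
struct SketchQueueSet {
  std::atomic<bool> all_active{false};

  // Writer side: publish the flag after all preceding writes.
  void set_active(bool v) { all_active.store(v, std::memory_order_release); }

  // Reader side: observe the flag and everything written before it.
  bool is_active() const { return all_active.load(std::memory_order_acquire); }
};

// Starts a worker (think: a non-Java thread) that waits for the flag and
// then reads a payload written by the main thread before activation.
// Returns the payload value the worker observed.
int run_sketch(int n) {
  SketchQueueSet qs;
  int payload = 0;
  int seen = -1;

  std::thread worker([&] {
    while (!qs.is_active()) {
      std::this_thread::yield();
    }
    // The acquire load above synchronizes with the release store,
    // so the write to payload is visible here.
    seen = payload;
  });

  payload = n;          // ordinary write, ordered before the release store
  qs.set_active(true);  // publish
  worker.join();        // join makes the worker's write to seen visible
  return seen;
}
```

Calling run_sketch(42) returns 42 in this sketch. Whether the real _all_active needs this treatment depends on which synchronization the NJT already goes through, which is exactly the open question.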

Roman



