RFR: JDK-8220671: Initialization race for non-JavaThread PtrQueues
Roman Kennke
rkennke at redhat.com
Wed Mar 20 08:06:10 UTC 2019
>>> My current idea goes roughly like this (includes some Shenandoah mess
>>> that will not be there in final webrev):
>>>
>>> http://cr.openjdk.java.net/~rkennke/JDK-8221102.patch
>>>
>>> However, this *still* doesn't solve my crashing testcase. Digging even
>>> deeper...
>>>
>>> Roman
>>
>> I have a different idea for this new problem. I’ll post something more tomorrow.
>>
>> Let me know what you find with your test case. Actually, can you describe how to reproduce?
>
> Something like this:
> for i in {1..20}; do CONF=fastdebug LANG=C LOG=info make run-test
> TEST=gc/shenandoah/TestStringDedupStress.java; done
>
> should make it fail somewhat reliably. The attached patch
> baddertest.patch should make it more likely (it launches only
> aggressive-mode test runs). Also, it seems more likely when running on a
> larger machine (with more cores).
>
> The test started failing somewhere between jdk-13+9 and jdk-13+11, and I
> bisected it down to the NJT PtrQueues change. It also seemed like the most
> likely candidate in that range. It only ever seems to crash with
> +UseStringDeduplication, and since the strdedup thread does SATB, it seems
> plausible that the change affects this.
>
> Any help would be greatly appreciated.
I have added asserts that verify that, after the final flush of the
thread-local SATB queues, *all* threads' SATB queues are empty. The
asserts do not trigger, and yet I still see crashes.
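To illustrate the kind of check meant here, a minimal sketch follows. It is
not the actual HotSpot or Shenandoah code: SketchThread, flush_all() and
all_queues_empty() are hypothetical names made up for illustration, and the
"queue" is just a vector of pointers.

```cpp
#include <vector>

// Hypothetical stand-in for a thread that owns a thread-local SATB buffer.
struct SketchThread {
  std::vector<const void*> satb_buffer;
};

// Final flush: move every thread-local entry into the shared completed list
// and leave the thread-local buffer empty.
void flush_all(std::vector<SketchThread>& threads,
               std::vector<const void*>& completed) {
  for (SketchThread& t : threads) {
    completed.insert(completed.end(),
                     t.satb_buffer.begin(), t.satb_buffer.end());
    t.satb_buffer.clear();
  }
}

// The property the asserts verify: no thread still holds unflushed entries.
bool all_queues_empty(const std::vector<SketchThread>& threads) {
  for (const SketchThread& t : threads) {
    if (!t.satb_buffer.empty()) {
      return false;
    }
  }
  return true;
}
```

A check like this can only catch entries that were enqueued but not flushed.
It says nothing about oops that were never enqueued.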
This tells me that some oops are never enqueued in the first place. Our
ShBS::enqueue() checks not only the thread-local SATB-active flag, but
also the global one. Do you think there might be a race in accessing the
global flag? I.e., an NJT might see a stale value because it does not
synchronize the same way Java threads do when safepointing?
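A minimal sketch of the suspected failure mode, under stated assumptions:
SketchQueue and enqueue_if_active() are hypothetical names, not the real
ShBS::enqueue(), and the global flag is modelled as a std::atomic<bool>
passed in by reference. It does not reproduce the race itself. It only shows
that a thread whose local flag is already active, but which still observes
the global flag as inactive, silently drops the oop.

```cpp
#include <atomic>
#include <vector>

// Hypothetical thread-local SATB queue with its own "active" flag.
struct SketchQueue {
  bool local_active = false;
  std::vector<const void*> buffer;
};

// Enqueue only if both the thread-local and the global flag say "active".
// Returns false when the oop is dropped.
bool enqueue_if_active(SketchQueue& q,
                       const std::atomic<bool>& global_active,
                       const void* oop) {
  if (!q.local_active) {
    return false;  // thread-local flag not set
  }
  // Acquire load: pairs with a release store by whoever activates marking.
  // If the thread observes an old value here, the oop is lost.
  if (!global_active.load(std::memory_order_acquire)) {
    return false;
  }
  q.buffer.push_back(oop);
  return true;
}
```

If the real code reads the global flag without ordering that an NJT takes
part in, the second check is where an oop could go missing without any queue
ever looking non-empty.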
Roman