RFR: JDK-8221086: Shenandoah-specific workaround for JDK-8220671
Kim Barrett
kim.barrett at oracle.com
Tue Mar 19 19:37:03 UTC 2019
> On Mar 19, 2019, at 3:18 PM, Roman Kennke <rkennke at redhat.com> wrote:
>
>>> JDK-8220671: Initialization race for non-JavaThread PtrQueue is making
>>> troubles in Shenandoah's testing. While we're working out a generally
>>> acceptable solution, we need a usable workaround in Shenandoah.
>>>
>>> Bug:
>>> https://bugs.openjdk.java.net/browse/JDK-8221086
>>> Webrev:
>>> http://cr.openjdk.java.net/~rkennke/JDK-8221086/webrev.01/
>>>
>>> The proposed fix places a SuspendibleThreadJoiner around the
>>> on_thread_attach() and on_thread_detach(), thereby preventing the race.
>>>
>>> As soon as we figured out a solution for JDK-8220671 (which may well be
>>> this 'workaround') we shall revert/overwrite this change.
>>>
>>> Testing: many rounds of hotspot_gc_shenandoah and even more rounds of
>>> the offending TestStringDedupStress.
>>>
>>> Ok?
>>>
>>> Roman
>> I was expecting there to be some shared code changes here, and not a pure shenandoah
>> change.
>
> That's why I called it 'workaround'. It's not an all-inclusive solution. This allows others to make progress with their own patches. It's not helpful to have tests failing randomly for several days. Otherwise I'd have to ask you to back out the original JDK-8219613 ;-) This seems overkill. Also, having this run through our CI would give us some more confidence that the change is good (or not).
>
>> Doesn’t this have the same problem as your “nope, needs coffee” proposal in the
>> main email thread from yesterday? The attach/list-add pair needs to be atomic wrto the
>> SATB state change, and this doesn’t cover the list-add.
>
> It does prevent the critical sections:
> 1. Global SATB flag change
> 2. New-thread SATB flag cache pickup
>
> to prevent from racing with each other, because it causes attaching threads to line up at safepoints if anything like that happens. The actual list addition doesn't seem to be problematic. Or is it? If you think it is, pls explain why? Also, probably take it back to the main thread then ;-)
>
> In any case, the testcase that used to fail once in ~5 runs has survived many dozens of runs now. Even if the fix is not complete and only covers up a deeper problem, it helps us to make progress with other stuff. Ok?
>
> Roman
If it reduces noise in your CI enough to be useful to you, then I’ll withdraw from this (JDK-8221086) discussion.
I’ll take further discussion of the approach over to the main thread.
More information about the hotspot-gc-dev
mailing list