RFR: JDK-8220671: Initialization race for non-JavaThread PtrQueues
Kim Barrett
kim.barrett at oracle.com
Thu Mar 21 23:54:13 UTC 2019
> On Mar 21, 2019, at 4:05 AM, Roman Kennke <rkennke at redhat.com> wrote:
>> I think my patch to expand the claim counter is a good solution to 8221102 though, so unless you object
>> I’m going to claim that bug and send out that patch.
>
> Sure.
Done. RFR email sent, and I see you’ve responded.
>> Still trying to debug the crash though. I haven’t delved into the string dedup code much before, and
>> Shenandoah’s looks pretty different from G1’s. I doubt the race we’ve been discussing for 8220671 is the
>> source of the problem though. I will continue investigating.
>
> Hint: The problem mostly appears when many such processes run concurrently. That probably helps by enlarging the race window. So what I do is keep one terminal window open where I'm running the test in a loop (which keeps spawning processes), and in another terminal I manually run one process at a time (with the same arguments jtreg would use), possibly even in a debugger. It works well and fails almost every run.
>
> I'm now thinking that looking only at the actual SATB queues might be misleading. If we somehow really failed to mark the object because of dropped SATB entries (for example), the entry should have been cleared from the StringDedupQueue in the final-mark pause. However, we see a stale reference there that points to a reclaimed region. Something else must be going wrong.
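Roman's reproduction setup can be scripted roughly as follows. This is a Python sketch under stated assumptions: `cmd` stands in for the jtreg-derived command line (not given in the thread), and `n_background` and `max_runs` are arbitrary, hypothetical knobs. It keeps several copies of the test running back to back to load the machine, runs one more copy in the foreground (the one you would otherwise start by hand, possibly under a debugger), and then stops the background loops.

```python
import subprocess
import sys
import threading


def spawn_loop(cmd, stop, max_runs=None):
    """Re-run cmd back to back until stop is set (or max_runs is reached).
    Returns the exit codes so that background failures are not lost."""
    codes = []
    while not stop.is_set() and (max_runs is None or len(codes) < max_runs):
        codes.append(subprocess.run(cmd).returncode)
    return codes


def run_under_load(cmd, n_background=4, max_runs=None):
    """Start n_background loops of cmd, run one foreground copy, then stop.
    Returns (foreground exit code, list of background exit-code lists)."""
    stop = threading.Event()
    results = [None] * n_background

    def loop(i):
        results[i] = spawn_loop(cmd, stop, max_runs)

    threads = [threading.Thread(target=loop, args=(i,)) for i in range(n_background)]
    for t in threads:
        t.start()
    fg = subprocess.run(cmd).returncode  # the run you would watch or debug
    stop.set()
    for t in threads:
        t.join()
    return fg, results


if __name__ == "__main__":
    # Placeholder command; substitute the real java/jtreg invocation.
    print(run_under_load([sys.executable, "-c", "pass"], n_background=2, max_runs=3))
```

The background load only raises the odds of hitting the window; it does not change what the foreground process does.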
I see you've found the problem (JDK-8221278), and it wasn't either of
the issues we've looked at so far in this thread. But I'll comment on
the things you mention here anyway, for completeness.
> What the previous code did was basically synchronize the StringDedupThread with the VMThread (and any other non-Java thread) on SATB enqueue, via the shared lock. That is gone with your change. I wonder whether that opens up a race somewhere in the string dedup code, i.e. the lock might have covered up a bug that only appears now?
>
> This scoping here looks fishy (although not outright wrong):
> https://paste.fedoraproject.org/paste/o26tktNzT30VEO-rRj-8nQ
>
> Strictly speaking, the oop's scope extends across the safepoint, even though the oop is not used across it. But I can't tell for 100% sure: could the compiler reorder accesses there?
I think there are sufficient barriers in operations inside yield() to
prevent that.
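To make the point about the lost serialization concrete, here is a toy Python model (not HotSpot code; all class and function names are hypothetical). It assumes, following Roman's description, that the old code sent every non-Java-thread SATB enqueue through one lock-protected queue, and that the new code gives each thread its own buffer. Both variants record every entry; the difference is that in the second, nothing orders one thread's enqueues relative to another's, so code that was silently relying on the lock for that ordering would lose it.

```python
import threading


class SharedLockedQueue:
    """Old scheme (as described in the thread): all non-Java threads enqueue
    into one shared queue under one lock, so their enqueues serialize."""

    def __init__(self):
        self._lock = threading.Lock()
        self.entries = []

    def enqueue(self, who, obj):
        with self._lock:  # every non-Java thread contends on this lock
            self.entries.append((who, obj))


class PerThreadQueues:
    """New scheme (assumed): each thread appends to its own buffer with no
    shared lock. The registry lock is taken only once per thread, to record
    the buffer so that a later drain can find it."""

    def __init__(self):
        self._local = threading.local()
        self._registry_lock = threading.Lock()
        self._buffers = []

    def _buffer(self):
        buf = getattr(self._local, "buf", None)
        if buf is None:  # first enqueue on this thread: create and register
            buf = []
            with self._registry_lock:
                self._buffers.append(buf)
            self._local.buf = buf
        return buf

    def enqueue(self, who, obj):
        self._buffer().append((who, obj))  # no cross-thread synchronization

    def drain(self):
        # Only meaningful when nobody is enqueuing (a pause, in GC terms).
        with self._registry_lock:
            out = [e for buf in self._buffers for e in buf]
            for buf in self._buffers:
                buf.clear()
        return out


def hammer(queue, n_threads=4, per_thread=1000):
    """Enqueue per_thread items from each of n_threads threads."""
    def worker(i):
        for k in range(per_thread):
            queue.enqueue(f"t{i}", k)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

In both variants each thread's own entries stay in order; only the cross-thread ordering imposed by the shared lock disappears.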
> Also, there is a call like this:
> deduplicate_shared_strings(&total_stat);
>
> at the very beginning of StringDedupThread::do_deduplication(), outside of the STS. It can modify the heap and cause SATB enqueues too, and therefore should be within the STS, right?
This is "just" pre-seeding the deduptable from the CDS archive's
shared stringtable. There won't be any duplicates in the shared
stringtable.
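For readers less familiar with the STS, the protocol behind Roman's question can be modeled in a few lines. This is a Python sketch loosely following the shape of HotSpot's SuspendibleThreadSet (join, leave, should_yield, yield, synchronize, desynchronize); the real class is C++ and differs in detail, and `Joiner` and `demo` are hypothetical helpers. The property it shows: while a thread is joined, a safepoint cannot start until that thread parks in yield or leaves, so heap mutations made while joined never overlap the pause. Work done outside the set gets no such guarantee.

```python
import threading
import time


class SuspendibleThreadSet:
    """Toy model, not HotSpot code. Assumes a single synchronizing thread."""

    def __init__(self):
        self._cv = threading.Condition()
        self._joined = 0
        self._yielded = 0
        self._suspend_requested = False

    def join(self):
        with self._cv:
            while self._suspend_requested:  # cannot join during a safepoint
                self._cv.wait()
            self._joined += 1

    def leave(self):
        with self._cv:
            self._joined -= 1
            self._cv.notify_all()  # a waiting synchronize() may now proceed

    def should_yield(self):
        with self._cv:
            return self._suspend_requested

    def yield_(self):
        with self._cv:
            if not self._suspend_requested:
                return
            self._yielded += 1
            self._cv.notify_all()
            while self._suspend_requested:  # park until the safepoint ends
                self._cv.wait()
            self._yielded -= 1

    def synchronize(self):
        """Begin a 'safepoint': wait until every joined thread has yielded."""
        with self._cv:
            self._suspend_requested = True
            while self._yielded < self._joined:
                self._cv.wait()

    def desynchronize(self):
        with self._cv:
            self._suspend_requested = False
            self._cv.notify_all()


class Joiner:
    """RAII-style scope, similar in spirit to SuspendibleThreadSetJoiner."""

    def __init__(self, sts):
        self._sts = sts

    def __enter__(self):
        self._sts.join()
        return self._sts

    def __exit__(self, *exc):
        self._sts.leave()
        return False


def demo():
    """Returns (count at safepoint start, count at safepoint end, final count)."""
    sts = SuspendibleThreadSet()
    counter = [0]
    stop = threading.Event()

    def worker():
        with Joiner(sts) as s:
            while not stop.is_set():
                counter[0] += 1  # stands in for a heap mutation / SATB enqueue
                s.yield_()       # parks here while a safepoint is in progress

    t = threading.Thread(target=worker)
    t.start()
    time.sleep(0.05)
    sts.synchronize()
    at_start = counter[0]
    time.sleep(0.05)
    at_end = counter[0]
    sts.desynchronize()
    time.sleep(0.05)
    stop.set()
    t.join()
    return at_start, at_end, counter[0]
```

In `demo`, the counter does not move between `synchronize()` and `desynchronize()` because the only mutator is joined and parked.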
I'm worried about the hash-code update in there though. I'm having
trouble convincing myself that won't ever be applied to a shared
string in the archive, and that seems bad. Maybe there's something
I'm not aware of that makes that okay? There *is* a test using
StringDeduplicationRehashALot, which suggests we might be okay.
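One way to state the invariant Kim is worried about: the dedup code may compute a string's hash, but it should never write that hash back into a string that lives in the read-only archive. The sketch below is a Python model with hypothetical names (`JString`, `archived`, `dedup_hash`, `store_hash`). Whether the real deduplicate_shared_strings path needs such a check, or is already safe for some other reason, is exactly the open question here; the hash function itself follows the documented java.lang.String.hashCode formula.

```python
from dataclasses import dataclass


class ReadOnlyArchiveError(Exception):
    pass


@dataclass
class JString:
    value: str
    hash: int = 0           # 0 means "not computed yet"
    archived: bool = False  # hypothetical: object lives in the read-only archive


def java_string_hash(s: str) -> int:
    """s[0]*31^(n-1) + ... + s[n-1] as a signed 32-bit int.
    Matches String.hashCode for strings of BMP characters only."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def store_hash(js: JString, h: int) -> None:
    # Models the archive being mapped read-only: any write is an error.
    if js.archived:
        raise ReadOnlyArchiveError(js.value)
    js.hash = h


def dedup_hash(js: JString) -> int:
    """Hash used for the dedup table lookup; cached in the object only
    when the object is writable."""
    h = js.hash
    if h == 0:
        h = java_string_hash(js.value)
        if not js.archived:  # the guard in question
            store_hash(js, h)
    return h
```

With the guard, an archived string still gets a correct table hash; it is just recomputed on each call instead of cached.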
More information about the hotspot-gc-dev
mailing list