RFR: JDK-8220671: Initialization race for non-JavaThread PtrQueues
Kim Barrett
kim.barrett at oracle.com
Mon Mar 18 22:35:21 UTC 2019
> On Mar 18, 2019, at 7:39 AM, Roman Kennke <rkennke at redhat.com> wrote:
>
>>>> I think the proposed fix for _just_ StringDedupThread is insufficient.
>>>> I think there are other threads that could have similar races between
>>>> on_thread_attach and being added to the NJT thread list. And it
>>>> doesn't matter whether those threads touch oops or not. Even if they
>>>> don't, the state verification done when the SATB transition is made
>>>> can fail. (See SATBMarkQueueSet::verify_active_states.)
>>>
>>> Yeah I agree that this warrants a more generic solution. I am thinking about it. It is a tricky little bugger...
>>>
>>>> It might be sufficient to add a STS joiner around the on_thread_attach
>>>> / add_to_the_list sequence in NJT::pre_run, with the joiner only
>>>> active if invoked when not at a safepoint. That way we can still
>>>> create threads during a safepoint that are needed in that safepoint
>>>> (e.g. lazy allocation of worker threads). There are some possibly
>>>> complicated cases to think about though, or maybe disallow somehow.
>>>
>>>
>>> Checking for being at safepoint itself is racy if the thread is not participating in the safepointing protocol. On the other hand it might not be bad per se to block when at safepoint. As long as it doesn't hold back other threads that would not leave the safepoint as a result...
>>>
>>> I'll look a bit deeper into this and come back... on Monday ;-)
>>
>> Yeah, this does look tricky. I agree the activated STJ joiner on !safepoint is
>> racy and might not work. I will also think about it; I have one idea that seems
>> promising, but I want to think about it some more, given that none of my
>> previous ones seem to be panning out.
>
> Ok, I think I have a fix candidate:
> http://cr.openjdk.java.net/~rkennke/JDK-8220671/webrev.05/
>
> I added a method BS::needs_sts_sync_on_attach_detach() so that GCs can
> avoid using this additional synchronization when they are not
> implementing attach/detach or not using STS.
>
> This passes tier1 and the offending Shenandoah testcase locally. I will
> push it through jdk/submit.
>
> What do you think? Can I get reviews on this? Or do you have something
> better in mind?
>
> Thanks,
> Roman
I think ZGC has the same race with the thread-local cache of the bad
address mask.
The needs_std_sync_on_attach_detach function seems to be just an
optimization, that we should consider (or not) once we're confident we
have a working solution.
I don't think any GC presently needs anything done on the detach side.
Whether it even makes sense to change anything there depends on the
chosen solution.
Using the type of the thread was one idea I was exploring. I was
thinking about using is_ConcurrentGC_thread() though. Note that some
worker threads are ConcurrentGC threads.
The problem we're trying to solve is that (some implementations of)
on_thread_attach needs to be atomic with the associated list addition
wrto safepoints, or more precisely, with an operation performed within
some safepoints. on_thread_attach is caching global state in the
thread local data, while the operation is updating the global state
and needs to also update the thread local caches. For G1 and
Shenandoah, it's the SATB activation state; for ZGC it's the bad bit
mask.
Possibilities I've been looking at are:
(1) Conditional STS join around on_thread_attach / add_to_the_list. Still
thinking about what the right conditional might be though.
(2) Move on_thread_attach inside the NJTList_lock in add_to_the_list,
and make the relevant GC operation also lock the NJTList_lock. This
would mean on_thread_attach is conditionally called with JNTList_lock
held, depending on whether the thread is a Java thread or not; such
conditional lock contexts can be difficult.
(3) I think the reason on_thread_attach preceeds adding to the list is
to prevent SATB state verification from failing. (Except that doesn't
quite work because of this new-found race.) So another approach is to
reverse the order and address the verification failure directly. Add
another "unitialized" initial state. SATB activation verification
treats "uninitialized" as okay, and the activation state change
unconditionally updates the thread state. For on_thread_attach,
conditionally set the thread state when the old state is
"uninitialized", using cmpxchg to address the potential race between a
global change and the initialization. I think something similar could
be done for ZGC. But this is pretty tightly coupled to properties of
the state being cached.
More information about the hotspot-gc-dev
mailing list