RFR: JDK-8220671: Initialization race for non-JavaThread PtrQueues

Tue Mar 19 07:28:28 UTC 2019

Am 19. März 2019 08:16:49 MEZ schrieb Kim Barrett <kim.barrett at oracle.com>:
>> On Mar 19, 2019, at 2:48 AM, Roman Kennke <rkennke at redhat.com> wrote:
>> 
>> 
>> 
>> 
>>>> Ok, I think I have a fix candidate:
>>>> http://cr.openjdk.java.net/~rkennke/JDK-8220671/webrev.05/
>>>> 
>>>> I added a method BS::needs_sts_sync_on_attach_detach() so that GCs
>>> can
>>>> avoid using this additional synchronization when they are not
>>>> implementing attach/detach or not using STS.
>>>> 
>>>> This passes tier1 and the offending Shenandoah testcase locally. I
>>> will
>>>> push it through jdk/submit.
>>>> 
>>>> What do you think? Can I get reviews on this? Or do you have
>>> something
>>>> better in mind?
>>>> 
>>>> Thanks,
>>>> Roman
>>> 
>>> I think ZGC has the same race with the thread-local cache of the bad
>>> address mask.
>> 
>> :-S
>
>But Per thinks the non-Java threads still don’t use that value (they
>weren’t
>before, and that hasn’t changed, though might in the future?)
>
>> 
>>> The needs_std_sync_on_attach_detach function seems to be just an
>>> optimization, that we should consider (or not) once we're confident
>we
>>> have a working solution.
>>> 
>>> I don't think any GC presently needs anything done on the detach
>side. 
>> 
>> Not sure. We might miss the final flush() or rather end up flushing
>on the wrong side of a safepoint.
>
>Note that very few NJTs ever exit at all.  And those which do are
>during process
>teardown, so not really relevant for global SATB active state change
>(for example).
>
>>> Using the type of the thread was one idea I was exploring.  I was
>>> thinking about using is_ConcurrentGC_thread() though.  Note that
>some
>>> worker threads are ConcurrentGC threads.
>> 
>> OK. Which concgc threads are worker threads too?
>
>ZGC uses workgang threads configured as “concurrent”.  There’s an
>argument for
>workgang construction indicating whether the associated threads are
>concurrent
>or not.  I *think* there are others, but not in a position to search
>for them right now.
>
>> 
>>> The problem we're trying to solve is that (some implementations of)
>>> on_thread_attach needs to be atomic with the associated list
>addition
>>> wrto safepoints, or more precisely, with an operation performed
>within
>>> some safepoints.  on_thread_attach is caching global state in the
>>> thread local data, while the operation is updating the global state
>>> and needs to also update the thread local caches.  For G1 and
>>> Shenandoah, it's the SATB activation state; for ZGC it's the bad bit
>>> mask.
>>> 
>>> Possibilities I've been looking at are:
>>> 
>>> (1) Conditional STS join around on_thread_attach / add_to_the_list. 
>>> Still
>>> thinking about what the right conditional might be though.
>> 
>> This seems the only one that works for me so far.
>> 
>>> (2) Move on_thread_attach inside the NJTList_lock in
>add_to_the_list,
>>> and make the relevant GC operation also lock the NJTList_lock.  This
>>> would mean on_thread_attach is conditionally called with
>JNTList_lock
>>> held, depending on whether the thread is a Java thread or not; such
>>> conditional lock contexts can be difficult.
>> 
>> I don't think this is sufficient. NJTList_lock does not coordinate
>with safe points like Threads_lock does, but we need that, otherwise
>the whole sequence races over safe points.
>
>It’s not safepoints that matter, it’s the operation being done in a
>safepoint, like
>changing the SATB active state.  That’s the “relevant GC operation”
>that also
>has to lock the NJTList_lock.
>
>> 
>>> (3) I think the reason on_thread_attach preceeds adding to the list
>is
>>> to prevent SATB state verification from failing.  (Except that
>doesn't
>>> quite work because of this new-found race.)  So another approach is
>to
>>> reverse the order and address the verification failure directly. 
>Add
>>> another "unitialized" initial state.  SATB activation verification
>>> treats "uninitialized" as okay, and the activation state change
>>> unconditionally updates the thread state.  For on_thread_attach,
>>> conditionally set the thread state when the old state is
>>> "uninitialized", using cmpxchg to address the potential race between
>a
>>> global change and the initialization.  I think something similar
>could
>>> be done for ZGC.  But this is pretty tightly coupled to properties
>of
>>> the state being cached.
>> 
>> This approach hasn't worked for me. It still races across safe
>points.
>
>Again, don’t care about safepoints per se, it’s the interaction between
>on_thread_attach and changing the global SATB active state that
>matters.

Right! Zhengyu suggested to use GlobalCounters multi-read/single-write protection around the relevant operations. I might try that today. What do you think?

Roman

-- 
Diese Nachricht wurde von meinem Android-Gerät mit K-9 Mail gesendet.