RFR: JDK-8220671: Initialization race for non-JavaThread PtrQueues
Roman Kennke
rkennke at redhat.com
Tue Mar 19 06:48:11 UTC 2019
>> Ok, I think I have a fix candidate:
>> http://cr.openjdk.java.net/~rkennke/JDK-8220671/webrev.05/
>>
>> I added a method BS::needs_sts_sync_on_attach_detach() so that GCs
>can
>> avoid using this additional synchronization when they are not
>> implementing attach/detach or not using STS.
>>
>> This passes tier1 and the offending Shenandoah testcase locally. I
>will
>> push it through jdk/submit.
>>
>> What do you think? Can I get reviews on this? Or do you have
>something
>> better in mind?
>>
>> Thanks,
>> Roman
>
>I think ZGC has the same race with the thread-local cache of the bad
>address mask.
:-S
>The needs_std_sync_on_attach_detach function seems to be just an
>optimization, that we should consider (or not) once we're confident we
>have a working solution.
>
>I don't think any GC presently needs anything done on the detach side.
Not sure. We might miss the final flush() or rather end up flushing on the wrong side of a safepoint.
>Whether it even makes sense to change anything there depends on the
>chosen solution.
OK
>Using the type of the thread was one idea I was exploring. I was
>thinking about using is_ConcurrentGC_thread() though. Note that some
>worker threads are ConcurrentGC threads.
OK. Which concgc threads are worker threads too?
>The problem we're trying to solve is that (some implementations of)
>on_thread_attach needs to be atomic with the associated list addition
>wrto safepoints, or more precisely, with an operation performed within
>some safepoints. on_thread_attach is caching global state in the
>thread local data, while the operation is updating the global state
>and needs to also update the thread local caches. For G1 and
>Shenandoah, it's the SATB activation state; for ZGC it's the bad bit
>mask.
>
>Possibilities I've been looking at are:
>
>(1) Conditional STS join around on_thread_attach / add_to_the_list.
>Still
>thinking about what the right conditional might be though.
This seems the only one that works for me so far.
>(2) Move on_thread_attach inside the NJTList_lock in add_to_the_list,
>and make the relevant GC operation also lock the NJTList_lock. This
>would mean on_thread_attach is conditionally called with JNTList_lock
>held, depending on whether the thread is a Java thread or not; such
>conditional lock contexts can be difficult.
I don't think this is sufficient. NJTList_lock does not coordinate with safe points like Threads_lock does, but we need that, otherwise the whole sequence races over safe points.
>(3) I think the reason on_thread_attach preceeds adding to the list is
>to prevent SATB state verification from failing. (Except that doesn't
>quite work because of this new-found race.) So another approach is to
>reverse the order and address the verification failure directly. Add
>another "unitialized" initial state. SATB activation verification
>treats "uninitialized" as okay, and the activation state change
>unconditionally updates the thread state. For on_thread_attach,
>conditionally set the thread state when the old state is
>"uninitialized", using cmpxchg to address the potential race between a
>global change and the initialization. I think something similar could
>be done for ZGC. But this is pretty tightly coupled to properties of
>the state being cached.
This approach hasn't worked for me. It still races across safe points.
Approaches I also shortly considered:
- coordinate with Threads_lock. That would have been more generic than STS, but didn't seem to work on non-Java threads. I didn't go any further than this.
- transition the thread to in_vm. This is also JavaThread only.
So far the webrev.05 seems to be the only working solution for me. I'm still stress-testing it.
Roman
--
Diese Nachricht wurde von meinem Android-Gerät mit K-9 Mail gesendet.
More information about the hotspot-gc-dev
mailing list