RFR: JDK-8220671: Initialization race for non-JavaThread PtrQueues

Tue Mar 19 21:51:08 UTC 2019

See:
https://bugs.openjdk.java.net/browse/JDK-8221102

I believe G1 is also affected by this problem, as it seems to flush SATB 
queues in pretty much the same way as Shenandoah:

http://hg.openjdk.java.net/jdk/jdk/file/ddfb658c8ce3/src/hotspot/share/gc/g1/g1ConcurrentMark.cpp#l1776

Please check it, and update as appropriate. I'm working on a fix.

Roman

> Uuh. I have been wrong all along. Or, at the very least, the race that
> we discovered is not the only cause of my problems.
> 
> Some further testing reveals that: we used the thread claiming protocol
> to do the final flush of all threads' SATB buffers. The
> StringDedupThread does not consistently participate in this. We're
> simply never claiming that thread, and thus skip flushing it.
> 
> This is a separate bug, related to JDK-8219613. I believe it is possible
> that this affects G1 with string-dedup too.
> 
> For this one, do we want to go forward with:
> http://cr.openjdk.java.net/~rkennke/JDK-8220671/webrev.07/
> 
> ?
> 
> 
> ROman
> 
>>> On Mar 19, 2019, at 2:59 PM, Kim Barrett <kim.barrett at oracle.com> wrote:
>>>
>>>> On Mar 19, 2019, at 8:01 AM, Roman Kennke <rkennke at redhat.com> wrote:
>>>>
>>>>
>>>>>>> The problem we're trying to solve is that (some implementations of)
>>>>>>> on_thread_attach needs to be atomic with the associated list addition
>>>>>>> wrto safepoints, or more precisely, with an operation performed within
>>>>>>> some safepoints.  on_thread_attach is caching global state in the
>>>>>>> thread local data, while the operation is updating the global state
>>>>>>> and needs to also update the thread local caches.  For G1 and
>>>>>>> Shenandoah, it's the SATB activation state; for ZGC it's the bad bit
>>>>>>> mask.
>>>>
>>>> Yes, it is not strictly the safepoint that is the issue. But as you say, there's state that we're reading concurrently that is otherwise written at safepoints (the SATB active flag), and not protected otherwise (e.g. by some lock that we could take when reading). There's also global flags and state updated at safepoints, and I am not sure we're good with that either if we don't let the non-Java-thread synchronize with safepoints on attach.
>>>>
>>>> Therefore I think that the way to go is to let such threads synchronize with safepoints on on_thread_attach(), and via STS. A Shenandoah-only solution that seems to do it:
>>>>
>>>> http://cr.openjdk.java.net/~rkennke/JDK-8220671/webrev.06/
>>>>
>>>> Compared to my other proposals it has the advantage that it's only changing GC specific code, and not with any global/shared code. It will not introduce STS into paths that don't synchronize this anyway (e.g. CMS).
>>>>
>>>> Unless we come up with something even better, this would be the way to go for me. If you agree, I can do the G1 change in the same fashion.
>>>>
>>>> Roman
>>>
>>> As I said over in the JDK-8221086 thread, I don’t think that works.
>>> It is making on_thread_attach atomic wrto safepoints, but doesn’t
>>> include the NJT list modifications.
>>
>> The problem race, in pseudo-code:
>>
>> // thread creation
>> STS.join
>>    attach(this_thread)
>>      this_thread.SATB.active = old_state
>> STS.leave
>> // Asynchronous (wrto this_thread) safepoint might not update
>> // this_thread's SATB.active state because this_thread is not
>> // yet visible to the threads_do() performing the update.
>> lock NJTList
>>    this_thread._next = NJTList._head
>>    release_store(NJTList._head, this_thread)
>> unlock NJTList
>>
>>
>