RFR: JDK-8220671: Initialization race for non-JavaThread PtrQueues

Wed Mar 27 00:38:39 UTC 2019

> On Mar 26, 2019, at 7:24 AM, Per Liden <per.liden at oracle.com> wrote:
> 
> Hi,
> 
> On 3/26/19 11:43 AM, Roman Kennke wrote:
>>>> This thread went a bit off. May I propose this for review:
>>>> 
>>>> http://cr.openjdk.java.net/~rkennke/JDK-8220671/webrev.07/
>>>> 
>>>> it passes tier1 tests locally, and I submitted it to jdk/submit but that seems to have other hiccups.
>>>> 
>>>> WDYT?
>>> 
>>> I'm still not convinced the activation predicates for the STS joiners
>>> are correct here.  And I'm not what the right predicate might be or if
>>> it even exists.
>>> 
>>> Also, STS is presently a "GC" thing, and this would insinuate it into
>>> non-GC code, and affect non-GC threads (such as the WatcherThread and
>>> the JfrThreadSampler).
>>> 
>>> I still prefer using the NonJavaThreadsList_lock in NJT::pre/post_run
>>> and around the the relevant state changes, such as setting the SATB
>>> qset's _all_active member.
>> Ok, let's go with your patch:
>> http://cr.openjdk.java.net/~kbarrett/8220671/use_njtlock.00/
> 
> Looks good to me. I also think this is the better approach, as I would also feel uncomfortable with exposing the STS outside of GC.

Thanks, but see below.  I’m thinking about a bit of rework.

>> I've tested it against my stress test and Shenandoah test suite, and all seems fine.
>> Is the extra locking around assignment of global state needed? I think the locking in threads_do() is sufficient?
> 
> The locking over "_all_active = active;" looks sane to me. This patch has no locking in threads_do(), or am I missing something?
> 
> cheers,
> Per
> 
>> Roman

Right, there's no locking in (non_java_)threads_do, by design.

But looking at this again, the expanded exposure of the
NonJavaThreadsList_lock is making me regret that it's also being used
to protect the synchronize() in remove_from_the_list. If an iterator
were to try to lock that mutex while some NonJavaThread was on its way
out, we'd deadlock. The one new use of the mutex being added by this
change doesn't introduce such a problem, but still... I think it's
easy to avoid that problem by adding a new mutex just to protect the
synchronize call.