RFR 8240918: [REDO] Allow direct handshakes without VMThread intervention

Doerr, Martin martin.doerr at sap.com
Mon Apr 6 10:15:51 UTC 2020


Hi Patricio,

I'm very sorry for the late review. I couldn't make it last week.

Thanks a lot for taking care of the ordering semantics. I'm a big fan of having it explicit, precise, clear and well documented.

VM_HandshakeOneThread and VM_HandshakeAllThreads work without the barrier because HandshakeState::try_process with _processing_sem.signal() is always called?
I think it would be better to have the explicit acquire barrier there, too. Do you agree?

check_state() is pointless in your new version.

I've also looked at your test. Shouldn't we set -XX:+UseBiasedLocking explicitly so the test still works as it should after BiasedLocking deprecation?

Best regards,
Martin


> -----Original Message-----
> From: David Holmes <david.holmes at oracle.com>
> Sent: Montag, 6. April 2020 00:40
> To: Patricio Chilano <patricio.chilano.mateo at oracle.com>; Doerr, Martin
> <martin.doerr at sap.com>; Robbin Ehn <robbin.ehn at oracle.com>; hotspot-
> runtime-dev at openjdk.java.net
> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes without
> VMThread intervention
> 
> On 4/04/2020 12:57 am, Patricio Chilano wrote:
> >
> > On 4/3/20 11:40 AM, Doerr, Martin wrote:
> >> Hi Patricio,
> >>
> >> thanks for the quick reply.
> >>
> >> So should we also use precise barriers in
> >> add_target_count(int count) { Atomic::add(&_pending_threads, count,
> >> memory_order_relaxed); }
> >> ?
> >> Relaxed would fit to your explanation.
> > Exactly. I was about to send you the diff with those two changes before
> > answering David and Dan about the acquire fence.   : )
> >
> >
> > diff --git a/src/hotspot/share/runtime/handshake.cpp
> > b/src/hotspot/share/runtime/handshake.cpp
> > --- a/src/hotspot/share/runtime/handshake.cpp
> > +++ b/src/hotspot/share/runtime/handshake.cpp
> > @@ -56,7 +56,7 @@
> >       assert(val >= 0, "_pending_threads=%d cannot be negative", val);
> >       return val == 0;
> >     }
> > -  void add_target_count(int count) { Atomic::add(&_pending_threads,
> > count); }
> > +  void add_target_count(int count) { Atomic::add(&_pending_threads,
> > count, memory_order_relaxed); }
> >     bool executed() const { return _executed; }
> >     const char* name() { return _handshake_cl->name(); }
> >
> > @@ -229,9 +229,8 @@
> >     // When this is executed by the Handshakee we need a release store
> >     // here to make sure memory operations executed in the handshake
> >     // closure are visible to the Handshaker after it reads that the
> > -  // operation has completed. Atomic::dec() already provides a full
> > -  // memory fence.
> > -  Atomic::dec(&_pending_threads);
> > +  // operation has completed.
> > +  Atomic::dec(&_pending_threads, memory_order_release);
> >
> >     // It is no longer safe to refer to 'this' as the
> > VMThread/Handshaker may have destroyed this operation
> >   }
> 
> The above seems okay. We don't generally try to optimise the barriers
> associated with these atomic rmw ops due to the complexity of the
> reasoning around the different types of barriers. But in this case the
> reasoning seems quite straight-forward. The increment does not indicate
> anything to any other thread, it simply sets the right value in the
> counter (which may already have been decremented). The decrement
> indicates a completion token and so needs at least a release-barrier.
> The final check that watches for the count to be zero, needs the
> matching acquire barrier.
> 
> Thanks,
> David
> -----
> 
> > Thanks Martin!
> >
> > Patricio
> >> Best regards,
> >> Martin
> >>
> >>
> >>> -----Original Message-----
> >>> From: Patricio Chilano <patricio.chilano.mateo at oracle.com>
> >>> Sent: Freitag, 3. April 2020 16:33
> >>> To: Doerr, Martin <martin.doerr at sap.com>; David Holmes
> >>> <david.holmes at oracle.com>; Robbin Ehn <robbin.ehn at oracle.com>;
> >>> hotspot-runtime-dev at openjdk.java.net
> >>> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes without
> >>> VMThread intervention
> >>>
> >>> Hi Martin,
> >>>
> >>> On 4/3/20 11:20 AM, Doerr, Martin wrote:
> >>>> Hi David,
> >>>>
> >>>> the acquire belongs to the load of _pending_threads in is_completed()
> >>> (handshake.cpp:270).
> >>>> It's moved below the loop because we don't need to repeat the
> barrier.
> >>>>
> >>>> The release barrier is integrated in Atomic::dec(&_pending_threads)
> >>> (handshake.cpp:234).
> >>>> Should we better use Atomic::dec(&_pending_threads,
> >>> memory_order_release) to make it precise and explicit?
> >>> As you stated in your other email too we can specify explicit memory
> >>> ordering to Atomic::dec() which I forgot. I agree with explicitly
> >>> changing it to Atomic::dec(&_pending_threads,
> memory_order_release).
> >>>
> >>>> Release / acquire semantics are exactly what I was missing in the
> >>>> earlier
> >>> version.
> >>>>
> >>>> I don't understand why "add_target_count" is called after the
> >>>> handshakes
> >>> were activated (handshake.cpp:171).
> >>>> Couldn't it happen that the first handshake completes before that
> >>>> (it would
> >>> set _pending_threads to 0)?
> >>> Yes, _pending_threads can even go negative for the
> >>> VM_HandshakeAllThreads case. But we only check _pending_threads in
> the
> >>> VMThread so for the Handshakees it doesn't matter when they
> decrement
> >>> the counter what's the actual value.
> >>>
> >>> Thanks,
> >>> Patricio
> >>>> Best regards,
> >>>> Martin
> >>>>
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: David Holmes <david.holmes at oracle.com>
> >>>>> Sent: Freitag, 3. April 2020 15:22
> >>>>> To: Patricio Chilano <patricio.chilano.mateo at oracle.com>; Doerr,
> >>>>> Martin
> >>>>> <martin.doerr at sap.com>; Robbin Ehn <robbin.ehn at oracle.com>;
> >>> hotspot-
> >>>>> runtime-dev at openjdk.java.net
> >>>>> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes without
> >>>>> VMThread intervention
> >>>>>
> >>>>> Hi Patricio,
> >>>>>
> >>>>> On 3/04/2020 6:00 pm, Patricio Chilano wrote:
> >>>>>> Hi Martin,
> >>>>>>
> >>>>>> On 4/2/20 5:24 PM, Doerr, Martin wrote:
> >>>>>>> Hi Patricio,
> >>>>>>>
> >>>>>>> right, you got my point. Sorry for not explaining it in greater
> >>>>>>> detail
> >>>>>>> at the beginning. I was busy with other things.
> >>>>>>>
> >>>>>>> There should be a memory barrier ordering read of
> _pending_threads
> >>>>>>> (which is observed as 0) and whatever gets done when returning
> from
> >>>>>>> execute_direct.
> >>>>>>> I don't like relying on barriers which are hidden somewhere and
> >>>>>>> may be
> >>>>>>> subject to change.
> >>>>>> Yes, I agree with that. I added an explicit acquire fence at the
> >>>>>> end of
> >>>>>> Handshake::execute_direct() with a comment. Thanks for noticing it.
> >>>>> This is unclear to me:
> >>>>>
> >>>>> 281   // Prevent future loads from floating above the load of
> >>>>> _pending_threads
> >>>>>     282   // in is_completed(). This prevents reading stale data
> >>>>> modified
> >>>>> in the
> >>>>>     283   // handshake closure by the Handshakee.
> >>>>>     284   OrderAccess::acquire();
> >>>>>     285
> >>>>>     286   return op.executed();
> >>>>>
> >>>>> An acquire() by itself has no meaning, it has to be associated with a
> >>>>> release() and both have to be associated with the load, respectively,
> >>>>> store, of the same variable. From the comment it sounds like you just
> >>>>> want a loadload() barrier here.
> >>>>>
> >>>>> Other updates seem fine.
> >>>>>
> >>>>> Thanks,
> >>>>> David
> >>>>> -----
> >>>>>
> >>>>>>> I'd also prefer explicit barriers on the writers side over the
> >>>>>>> implicit ones from Atomic::add, because explicit barriers with
> >>>>>>> comments help a lot understanding the code.
> >>>>>>> (Not sure if you want to change that. Just my opinion.)
> >>>>>> I added a comment on top of Atomic::dec() on why we need a
> memory
> >>>>>> barrier there and the fact that we are already using a full memory
> >>>>>> fence.
> >>>>>> If you still think I should add an explicit barrier for code
> >>>>>> readability
> >>>>>> I can add it though.
> >>>>>>
> >>>>>>
> >>>>>> Here is v3 which also addresses Dan's comments.
> >>>>>>
> >>>>>> Full:
> >>>>>> http://cr.openjdk.java.net/~pchilanomate/8240918/v3/webrev/
> >>>>>> Inc:
> >>>>>> http://cr.openjdk.java.net/~pchilanomate/8240918/v3/inc/webrev/
> >>>>>>
> >>>>>>
> >>>>>> Thanks!
> >>>>>> Patricio
> >>>>>>> Best regards,
> >>>>>>> Martin
> >>>>>>>
> >>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: Patricio Chilano <patricio.chilano.mateo at oracle.com>
> >>>>>>>> Sent: Donnerstag, 2. April 2020 19:56
> >>>>>>>> To: Robbin Ehn <robbin.ehn at oracle.com>; Doerr, Martin
> >>>>>>>> <martin.doerr at sap.com>; hotspot-runtime-
> dev at openjdk.java.net
> >>>>>>>> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes
> without
> >>>>>>>> VMThread intervention
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 4/2/20 1:16 PM, Patricio Chilano wrote:
> >>>>>>>>> Hi Martin,
> >>>>>>>>>
> >>>>>>>>> Thanks for looking at this.
> >>>>>>>>>
> >>>>>>>>> On 4/2/20 10:15 AM, Robbin Ehn wrote:
> >>>>>>>>>> Hi Martin,
> >>>>>>>>>>
> >>>>>>>>>> On 2020-04-02 14:55, Doerr, Martin wrote:
> >>>>>>>>>>> Hi all,
> >>>>>>>>>>>
> >>>>>>>>>>> first of all, thanks, Patricio, for doing this.
> >>>>>>>>>>>
> >>>>>>>>>>> One question:
> >>>>>>>>>>> _pending_threads is used for synchronizing threads.
> >>>>>>>>>>> Don't we need release / acquire semantics?
> >>>>>>>>>>> I guess otherwise handshaker can read stale data from
> >>> handshakee
> >>>>>>>>>>> after observing is_completed.
> >>>>>>>>>>> Or did I miss anything?
> >>>>>>>>>> Reading stale should be fine in this case, since the counter goes
> >>>>>>>>>> from n->0, thus you can only get false negatives from "bool
> >>>>>>>>>> is_completed()" ?
> >>>>>>>>> Right, as Robbin points out at most we will just have to loop
> >>>>>>>>> again.
> >>>>>>>>> Also, when the handshakee processes the handshake, right
> after
> >>>>>>>>> updating _pending_threads, it will either signal the
> >>>>>>>>> _processing_sem
> >>>>>>>>> if there are no direct operations, or it will signal the
> >>>>>>>>> _handshake_turn_sem, so that will provide the release already.
> The
> >>>>>>>>> next read in the loop should read the updated value. Are you
> okay
> >>> with
> >>>>>>>>> that?
> >>>>>>>> I realize you might be thinking of a different case now. You
> >>>>>>>> mean the
> >>>>>>>> handshaker could execute loads from code after
> >>>>>>>> Handshake::execute_direct() while it hasn't yet read
> >>> _pending_threads
> >>>>>>>> from is_completed()? Although there is a load_acquire() in
> >>>>>>>> ~ThreadsListHandle() that will prevent those issues you might
> argue
> >>> that
> >>>>>>>> we can't rely on code which might change.
> >>>>>>>> I think (not counting that load_acquire from ~ThreadsListHandle)
> >>>>>>>> the
> >>>>>>>> issue could appear for example if you created a ThreadsListHandle
> >>> first,
> >>>>>>>> then called Handshake::execute_direct(), and when returning you
> try
> >>> to
> >>>>>>>> read some field from the handshakee (you might want to assert
> >>>>> something
> >>>>>>>> for example). If there wasn't any ThreadsListHandle created
> before
> >>>>>>>> calling execute_direct() I think you are not allowed to access
> >>>>>>>> other
> >>>>>>>> JavaThread's data anyway since the thread might have exited
> >>>>>>>> already,
> >>>>>>>> unless you create a ThreadsListHandle which will already provide
> >>>>>>>> the
> >>>>>>>> acquire. So I think you are right and we might need a fence before
> >>>>>>>> returning from execute_direct() or add a comment that we are
> >>>>>>>> relying
> >>>>> on
> >>>>>>>> the load_acquire from ~ThreadsListHandle(). Is that what you
> were
> >>>>>>>> thinking?
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Patricio
> >>>>>>>>> Thanks!
> >>>>>>>>>
> >>>>>>>>> Patricio
> >>>>>>>>>> Thanks, Robbin
> >>>>>>>>>>
> >>>>>>>>>>> Best regards,
> >>>>>>>>>>> Martin
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>>> From: hotspot-runtime-dev <hotspot-runtime-dev-
> >>>>>>>>>>>> bounces at openjdk.java.net> On Behalf Of Robbin Ehn
> >>>>>>>>>>>> Sent: Donnerstag, 2. April 2020 13:36
> >>>>>>>>>>>> To: Patricio Chilano <patricio.chilano.mateo at oracle.com>;
> >>>>>>>>>>>> hotspot-runtime-
> >>>>>>>>>>>> dev at openjdk.java.net
> >>>>>>>>>>>> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes
> >>> without
> >>>>>>>>>>>> VMThread intervention
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hi Patricio,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks for fixing, looks good.
> >>>>>>>>>>>>
> >>>>>>>>>>>> /Robbin
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 2020-04-01 19:45, Patricio Chilano wrote:
> >>>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Please review this redo for 8230594. Since the reason
> behind
> >>> the
> >>>>>>>>>>>>> Windows
> >>>>>>>>>>>>> timeouts turned out to be 8240902, this patch is almost
> >>>>>>>>>>>>> identical
> >>>>>>>>>>>>> to the
> >>>>>>>>>>>>> reverse of the backout modulo the rebase.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> There is only one small fix for an issue not identified in the
> >>>>>>>>>>>>> original
> >>>>>>>>>>>>> patch which I explained in the comments. To avoid this
> >>>>>>>>>>>>> possible
> >>>>>>>>>>>>> issue of
> >>>>>>>>>>>>> deleting a JavaThread that called Thread::destroy_vm()
> while
> >>>>> there
> >>>>>>>>>>>>> are
> >>>>>>>>>>>>> handshakers blocked in the handshake_turn_sem
> semaphore,
> >>> I
> >>>>>>>> added a
> >>>>>>>>>>>> check
> >>>>>>>>>>>>> to verify the JavaThread in question is not protected by
> some
> >>>>>>>>>>>>> ThreadsList reference before attempting the delete it. In
> case
> >>>>>>>>>>>>> it is
> >>>>>>>>>>>>> protected, we need to at least clear the value for
> _thread_key
> >>> to
> >>>>>>>>>>>>> avoid
> >>>>>>>>>>>>> possible infinite loops at thread exit (this can happen in
> >>>>>>>>>>>>> Solaris).
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The only other change is that I removed the check for local
> >>>>>>>>>>>>> polls in
> >>>>>>>>>>>>> Handshake::execute_direct() since after the backout,
> thread-
> >>> local
> >>>>>>>>>>>>> handshakes was implemented for arm32 which was the
> only
> >>>>>>>> remaining
> >>>>>>>>>>>> platform.
> >>>>>>>>>>>>> I tested it several times in mach5 tiers 1-6 and once in t7
> >>>>>>>>>>>>> and
> >>>>>>>>>>>>> saw no
> >>>>>>>>>>>>> failures.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Bug:
> >>>>>>>>>>>>>         https://bugs.openjdk.java.net/browse/JDK-8240918
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Webrev:
> >>>>>>>>>>>>>
> >>> http://cr.openjdk.java.net/~pchilanomate/8240918/v1/webrev/
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>> Patricio
> >


More information about the hotspot-runtime-dev mailing list