RFR 8240918: [REDO] Allow direct handshakes without VMThread intervention

Doerr, Martin martin.doerr at sap.com
Mon Apr 6 15:51:29 UTC 2020


Hi Patricio,

looks very good.

Thanks again and best regards,
Martin


> -----Original Message-----
> From: Patricio Chilano <patricio.chilano.mateo at oracle.com>
> Sent: Monday, 6. April 2020 16:58
> To: Doerr, Martin <martin.doerr at sap.com>; David Holmes
> <david.holmes at oracle.com>; Robbin Ehn <robbin.ehn at oracle.com>;
> hotspot-runtime-dev at openjdk.java.net
> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes without
> VMThread intervention
> 
> Hi Martin,
> 
> On 4/6/20 7:15 AM, Doerr, Martin wrote:
> > Hi Patricio,
> >
> > I'm very sorry for the late review. I couldn't make it last week.
> >
> > Thanks a lot for taking care of the ordering semantics. I'm a big fan of
> having it explicit, precise, clear and well documented.
> Yes, I also think it's clearer now with the explicit orderings. : )
> 
> > VM_HandshakeOneThread and VM_HandshakeAllThreads work without
> the barrier because HandshakeState::try_process with
> _processing_sem.signal() is always called?
> > I think it would be better to have the explicit acquire barrier there, too. Do
> you agree?
> I've been thinking about the VM_HandshakeOneThread and
> VM_HandshakeAllThreads cases since the other day, and although I don't
> think there are issues in the use cases we have today, it's probably a
> good idea to add the fence there too in case that changes. For the
> VMThread it is safe without the barrier because, upon returning from
> the do_it() method, it will just notify the Handshaker that the
> operation has been processed and then proceed to the main loop of
> VMThread::loop() to process the next operation. There could be an
> issue, however, if we have one of these non-direct handshakes inside an
> operation that allows nested operations. When the VMThread returns from
> the handshake, it will go back to the first operation, where it could
> try to read some data changed during the handshake. If the handshake
> was actually processed by the handshakee, then we could have this issue
> of reading stale data.
> 
> > check_state() is pointless in your new version.
> Removed.
> 
> > I've also looked at your test. Shouldn't we set -XX:+UseBiasedLocking
> explicitly so the test still works as it should after BiasedLocking deprecation?
> Yes, thanks for noticing it. Since I'm planning to push this first I
> will change it now so I don't have to fix it later.
> 
> 
> Here is v4, which also includes the memory ordering changes I sent before:
> 
> Full:
> http://cr.openjdk.java.net/~pchilanomate/8240918/v4/webrev/
> Inc:
> http://cr.openjdk.java.net/~pchilanomate/8240918/v4/inc/webrev/
> 
> 
> Thanks!
> 
> Patricio
> > Best regards,
> > Martin
> >
> >
> >> -----Original Message-----
> >> From: David Holmes <david.holmes at oracle.com>
> >> Sent: Monday, 6. April 2020 00:40
> >> To: Patricio Chilano <patricio.chilano.mateo at oracle.com>; Doerr, Martin
> >> <martin.doerr at sap.com>; Robbin Ehn <robbin.ehn at oracle.com>;
> >> hotspot-runtime-dev at openjdk.java.net
> >> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes without
> >> VMThread intervention
> >>
> >> On 4/04/2020 12:57 am, Patricio Chilano wrote:
> >>> On 4/3/20 11:40 AM, Doerr, Martin wrote:
> >>>> Hi Patricio,
> >>>>
> >>>> thanks for the quick reply.
> >>>>
> >>>> So should we also use precise barriers in
> >>>> add_target_count(int count) { Atomic::add(&_pending_threads, count,
> >>>> memory_order_relaxed); }
> >>>> ?
> >>>> Relaxed would fit to your explanation.
> >>> Exactly. I was about to send you the diff with those two changes before
> >>> answering David and Dan about the acquire fence.   : )
> >>>
> >>>
> >>> diff --git a/src/hotspot/share/runtime/handshake.cpp
> >>> b/src/hotspot/share/runtime/handshake.cpp
> >>> --- a/src/hotspot/share/runtime/handshake.cpp
> >>> +++ b/src/hotspot/share/runtime/handshake.cpp
> >>> @@ -56,7 +56,7 @@
> >>>        assert(val >= 0, "_pending_threads=%d cannot be negative", val);
> >>>        return val == 0;
> >>>      }
> >>> -  void add_target_count(int count) { Atomic::add(&_pending_threads,
> >>> count); }
> >>> +  void add_target_count(int count) { Atomic::add(&_pending_threads,
> >>> count, memory_order_relaxed); }
> >>>      bool executed() const { return _executed; }
> >>>      const char* name() { return _handshake_cl->name(); }
> >>>
> >>> @@ -229,9 +229,8 @@
> >>>      // When this is executed by the Handshakee we need a release store
> >>>      // here to make sure memory operations executed in the handshake
> >>>      // closure are visible to the Handshaker after it reads that the
> >>> -  // operation has completed. Atomic::dec() already provides a full
> >>> -  // memory fence.
> >>> -  Atomic::dec(&_pending_threads);
> >>> +  // operation has completed.
> >>> +  Atomic::dec(&_pending_threads, memory_order_release);
> >>>
> >>>      // It is no longer safe to refer to 'this' as the
> >>> VMThread/Handshaker may have destroyed this operation
> >>>    }
> >> The above seems okay. We don't generally try to optimise the barriers
> >> associated with these atomic RMW ops due to the complexity of the
> >> reasoning around the different types of barriers. But in this case the
> >> reasoning seems quite straightforward. The increment does not indicate
> >> anything to any other thread; it simply sets the right value in the
> >> counter (which may already have been decremented). The decrement
> >> indicates a completion token and so needs at least a release barrier.
> >> The final check that watches for the count to reach zero needs the
> >> matching acquire barrier.
> >>
> >> Thanks,
> >> David
> >> -----
> >>
> >>> Thanks Martin!
> >>>
> >>> Patricio
> >>>> Best regards,
> >>>> Martin
> >>>>
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Patricio Chilano <patricio.chilano.mateo at oracle.com>
> >>>>> Sent: Friday, 3. April 2020 16:33
> >>>>> To: Doerr, Martin <martin.doerr at sap.com>; David Holmes
> >>>>> <david.holmes at oracle.com>; Robbin Ehn <robbin.ehn at oracle.com>;
> >>>>> hotspot-runtime-dev at openjdk.java.net
> >>>>> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes without
> >>>>> VMThread intervention
> >>>>>
> >>>>> Hi Martin,
> >>>>>
> >>>>> On 4/3/20 11:20 AM, Doerr, Martin wrote:
> >>>>>> Hi David,
> >>>>>>
> >>>>>> the acquire belongs to the load of _pending_threads in
> >>>>>> is_completed() (handshake.cpp:270).
> >>>>>> It's moved below the loop because we don't need to repeat the
> >>>>>> barrier.
> >>>>>> The release barrier is integrated in Atomic::dec(&_pending_threads)
> >>>>>> (handshake.cpp:234).
> >>>>>> Shouldn't we rather use Atomic::dec(&_pending_threads,
> >>>>>> memory_order_release) to make it precise and explicit?
> >>>>> As you stated in your other email too, we can specify explicit
> >>>>> memory ordering for Atomic::dec(), which I had forgotten. I agree
> >>>>> with explicitly changing it to Atomic::dec(&_pending_threads,
> >>>>> memory_order_release).
> >>>>>> Release / acquire semantics are exactly what I was missing in the
> >>>>>> earlier version.
> >>>>>> I don't understand why "add_target_count" is called after the
> >>>>>> handshakes were activated (handshake.cpp:171).
> >>>>>> Couldn't it happen that the first handshake completes before that
> >>>>>> (it would set _pending_threads to 0)?
> >>>>> Yes, _pending_threads can even go negative in the
> >>>>> VM_HandshakeAllThreads case. But we only check _pending_threads in
> >>>>> the VMThread, so for the handshakees the actual value at the moment
> >>>>> they decrement the counter doesn't matter.
> >>>>>
> >>>>> Thanks,
> >>>>> Patricio
> >>>>>> Best regards,
> >>>>>> Martin
> >>>>>>
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: David Holmes <david.holmes at oracle.com>
> >>>>>>> Sent: Friday, 3. April 2020 15:22
> >>>>>>> To: Patricio Chilano <patricio.chilano.mateo at oracle.com>; Doerr,
> >>>>>>> Martin
> >>>>>>> <martin.doerr at sap.com>; Robbin Ehn <robbin.ehn at oracle.com>;
> >>>>>>> hotspot-runtime-dev at openjdk.java.net
> >>>>>>> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes without
> >>>>>>> VMThread intervention
> >>>>>>>
> >>>>>>> Hi Patricio,
> >>>>>>>
> >>>>>>> On 3/04/2020 6:00 pm, Patricio Chilano wrote:
> >>>>>>>> Hi Martin,
> >>>>>>>>
> >>>>>>>> On 4/2/20 5:24 PM, Doerr, Martin wrote:
> >>>>>>>>> Hi Patricio,
> >>>>>>>>>
> >>>>>>>>> right, you got my point. Sorry for not explaining it in greater
> >>>>>>>>> detail
> >>>>>>>>> at the beginning. I was busy with other things.
> >>>>>>>>>
> >>>>>>>>> There should be a memory barrier ordering the read of
> >>>>>>>>> _pending_threads (which is observed as 0) and whatever gets done
> >>>>>>>>> when returning from execute_direct().
> >>>>>>>>> I don't like relying on barriers which are hidden somewhere and
> >>>>>>>>> may be subject to change.
> >>>>>>>> Yes, I agree with that. I added an explicit acquire fence at the
> >>>>>>>> end of Handshake::execute_direct() with a comment. Thanks for
> >>>>>>>> noticing it.
> >>>>>>> This is unclear to me:
> >>>>>>>
> >>>>>>> 281   // Prevent future loads from floating above the load of
> >>>>>>> 282   // _pending_threads in is_completed(). This prevents reading stale
> >>>>>>> 283   // data modified in the handshake closure by the Handshakee.
> >>>>>>> 284   OrderAccess::acquire();
> >>>>>>> 285
> >>>>>>> 286   return op.executed();
> >>>>>>>
> >>>>>>> An acquire() by itself has no meaning; it has to be associated
> >>>>>>> with a release(), and both have to be associated with the load,
> >>>>>>> respectively store, of the same variable. From the comment it
> >>>>>>> sounds like you just want a loadload() barrier here.
> >>>>>>>
> >>>>>>> Other updates seem fine.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> David
> >>>>>>> -----
> >>>>>>>
> >>>>>>>>> I'd also prefer explicit barriers on the writer's side over the
> >>>>>>>>> implicit ones from Atomic::add, because explicit barriers with
> >>>>>>>>> comments help a lot in understanding the code.
> >>>>>>>>> (Not sure if you want to change that. Just my opinion.)
> >>>>>>>> I added a comment on top of Atomic::dec() on why we need a
> >>>>>>>> memory barrier there and the fact that we are already using a
> >>>>>>>> full memory fence.
> >>>>>>>> If you still think I should add an explicit barrier for code
> >>>>>>>> readability
> >>>>>>>> I can add it though.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Here is v3 which also addresses Dan's comments.
> >>>>>>>>
> >>>>>>>> Full:
> >>>>>>>> http://cr.openjdk.java.net/~pchilanomate/8240918/v3/webrev/
> >>>>>>>> Inc:
> >>>>>>>> http://cr.openjdk.java.net/~pchilanomate/8240918/v3/inc/webrev/
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Thanks!
> >>>>>>>> Patricio
> >>>>>>>>> Best regards,
> >>>>>>>>> Martin
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> -----Original Message-----
> >>>>>>>>>> From: Patricio Chilano <patricio.chilano.mateo at oracle.com>
> >>>>>>>>>> Sent: Thursday, 2. April 2020 19:56
> >>>>>>>>>> To: Robbin Ehn <robbin.ehn at oracle.com>; Doerr, Martin
> >>>>>>>>>> <martin.doerr at sap.com>; hotspot-runtime-dev at openjdk.java.net
> >>>>>>>>>> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes
> >>>>>>>>>> without VMThread intervention
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On 4/2/20 1:16 PM, Patricio Chilano wrote:
> >>>>>>>>>>> Hi Martin,
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks for looking at this.
> >>>>>>>>>>>
> >>>>>>>>>>> On 4/2/20 10:15 AM, Robbin Ehn wrote:
> >>>>>>>>>>>> Hi Martin,
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 2020-04-02 14:55, Doerr, Martin wrote:
> >>>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> first of all, thanks, Patricio, for doing this.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> One question:
> >>>>>>>>>>>>> _pending_threads is used for synchronizing threads.
> >>>>>>>>>>>>> Don't we need release / acquire semantics?
> >>>>>>>>>>>>> I guess otherwise the handshaker can read stale data from the
> >>>>>>>>>>>>> handshakee after observing is_completed.
> >>>>>>>>>>>>> Or did I miss anything?
> >>>>>>>>>>>> Reading stale data should be fine in this case, since the
> >>>>>>>>>>>> counter goes from n->0; thus you can only get false negatives
> >>>>>>>>>>>> from "bool is_completed()"?
> >>>>>>>>>>> Right, as Robbin points out, at most we will just have to loop
> >>>>>>>>>>> again. Also, when the handshakee processes the handshake, right
> >>>>>>>>>>> after updating _pending_threads it will either signal the
> >>>>>>>>>>> _processing_sem if there are no direct operations, or it will
> >>>>>>>>>>> signal the _handshake_turn_sem, so that will provide the
> >>>>>>>>>>> release already. The next read in the loop should read the
> >>>>>>>>>>> updated value. Are you okay with that?
> >>>>>>>>>> I realize you might be thinking of a different case now. You
> >>>>>>>>>> mean the handshaker could execute loads from code after
> >>>>>>>>>> Handshake::execute_direct() while it hasn't yet read
> >>>>>>>>>> _pending_threads from is_completed()? Although there is a
> >>>>>>>>>> load_acquire() in ~ThreadsListHandle() that will prevent those
> >>>>>>>>>> issues, you might argue that we can't rely on code which might
> >>>>>>>>>> change.
> >>>>>>>>>> I think (not counting that load_acquire from
> >>>>>>>>>> ~ThreadsListHandle) the issue could appear, for example, if you
> >>>>>>>>>> created a ThreadsListHandle first, then called
> >>>>>>>>>> Handshake::execute_direct(), and when returning you try to read
> >>>>>>>>>> some field from the handshakee (you might want to assert
> >>>>>>>>>> something, for example). If there wasn't any ThreadsListHandle
> >>>>>>>>>> created before calling execute_direct(), I think you are not
> >>>>>>>>>> allowed to access another JavaThread's data anyway since the
> >>>>>>>>>> thread might have exited already, unless you create a
> >>>>>>>>>> ThreadsListHandle, which will already provide the acquire. So I
> >>>>>>>>>> think you are right and we might need a fence before returning
> >>>>>>>>>> from execute_direct(), or add a comment that we are relying on
> >>>>>>>>>> the load_acquire from ~ThreadsListHandle(). Is that what you
> >>>>>>>>>> were thinking?
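A hedged sketch of the fix under discussion (simplified; the real code uses HotSpot's OrderAccess API, and the function names here are illustrative): an explicit acquire fence at the end of execute_direct() makes the ordering independent of the load_acquire() that happens to run in ~ThreadsListHandle(). In portable C++ terms, an acquire load of an unrelated variable would not by itself guarantee this ordering, which is another argument for the explicit fence.

```cpp
#include <atomic>

// Sketch, not the real HotSpot code: the direct-handshake caller ends
// with an explicit acquire fence, so later reads of the handshakee's
// state cannot float above the load of pending_threads.
static std::atomic<int> pending_threads{0};
static int handshakee_field = 0;

bool execute_direct_sketch() {
  // ... installing and processing of the operation elided ...
  while (pending_threads.load(std::memory_order_relaxed) != 0) { }
  // Prevent later loads (e.g. asserts on the handshakee's fields) from
  // floating above the load of pending_threads.
  std::atomic_thread_fence(std::memory_order_acquire);
  return true;  // stands in for op.executed()
}

int read_field_after_handshake() {
  execute_direct_sketch();
  return handshakee_field;  // ordered after the handshake's writes
}
```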
> >>>>>>>>>>
> >>>>>>>>>> Thanks,
> >>>>>>>>>> Patricio
> >>>>>>>>>>> Thanks!
> >>>>>>>>>>>
> >>>>>>>>>>> Patricio
> >>>>>>>>>>>> Thanks, Robbin
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>> Martin
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>>>>> From: hotspot-runtime-dev <hotspot-runtime-dev-
> >>>>>>>>>>>>>> bounces at openjdk.java.net> On Behalf Of Robbin Ehn
> >>>>>>>>>>>>>> Sent: Thursday, 2. April 2020 13:36
> >>>>>>>>>>>>>> To: Patricio Chilano <patricio.chilano.mateo at oracle.com>;
> >>>>>>>>>>>>>> hotspot-runtime-
> >>>>>>>>>>>>>> dev at openjdk.java.net
> >>>>>>>>>>>>>> Subject: Re: RFR 8240918: [REDO] Allow direct handshakes
> >>>>>>>>>>>>>> without VMThread intervention
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi Patricio,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks for fixing, looks good.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> /Robbin
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 2020-04-01 19:45, Patricio Chilano wrote:
> >>>>>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Please review this redo for 8230594. Since the reason behind
> >>>>>>>>>>>>>>> the Windows timeouts turned out to be 8240902, this patch is
> >>>>>>>>>>>>>>> almost identical to the reverse of the backout modulo the
> >>>>>>>>>>>>>>> rebase.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> There is only one small fix for an issue not identified in
> >>>>>>>>>>>>>>> the original patch, which I explained in the comments. To
> >>>>>>>>>>>>>>> avoid the possible issue of deleting a JavaThread that called
> >>>>>>>>>>>>>>> Thread::destroy_vm() while there are handshakers blocked in
> >>>>>>>>>>>>>>> the handshake_turn_sem semaphore, I added a check to verify
> >>>>>>>>>>>>>>> the JavaThread in question is not protected by some
> >>>>>>>>>>>>>>> ThreadsList reference before attempting to delete it. In case
> >>>>>>>>>>>>>>> it is protected, we need to at least clear the value for
> >>>>>>>>>>>>>>> _thread_key to avoid possible infinite loops at thread exit
> >>>>>>>>>>>>>>> (this can happen on Solaris).
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> The only other change is that I removed the check for local
> >>>>>>>>>>>>>>> polls in Handshake::execute_direct(), since after the backout
> >>>>>>>>>>>>>>> thread-local handshakes were implemented for arm32, which was
> >>>>>>>>>>>>>>> the only remaining platform.
> >>>>>>>>>>>>>>> I tested it several times in mach5 tiers 1-6 and once in t7
> >>>>>>>>>>>>>>> and saw no failures.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Bug:
> >>>>>>>>>>>>>>>          https://bugs.openjdk.java.net/browse/JDK-8240918
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Webrev:
> >>>>>>>>>>>>>>> http://cr.openjdk.java.net/~pchilanomate/8240918/v1/webrev/
> >>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>> Patricio



More information about the hotspot-runtime-dev mailing list