RFR(S) Contended Locking fast exit bucket (8073165)
David Holmes
david.holmes at oracle.com
Fri Mar 13 04:50:09 UTC 2015
Hi Dan,
Well that was fun ... :)
On 10/03/2015 3:11 AM, Daniel D. Daugherty wrote:
> Greetings,
>
> I have the Contended Locking fast exit bucket ready for review.
>
> The code changes in this bucket are primarily assembly language
> tweaks that come into play when an inflated ObjectMonitor is in
> use. There are tweaks that apply to the default code path and
> there are tweaks that only apply when an experimental flag is
> set. This code code review is focused on the default code path
> or the "out of the box" configuration.
>
> This work is being tracked by the following bug ID:
>
> JDK-8073165 Contended Locking fast exit bucket
> https://bugs.openjdk.java.net/browse/JDK-8073165
>
> Here is the webrev URL:
>
> http://cr.openjdk.java.net/~dcubed/8073165-webrev/0-jdk9-hs-rt/
>
> Here is the JEP link:
>
> https://bugs.openjdk.java.net/browse/JDK-8046133
>
> Testing:
>
> - Aurora Adhoc RT/SVC baseline batch
> - JPRT test jobs
> - MonitorExitStresser micro-benchmark (in process)
> - MonitorEnterExitStresser micro-benchmark (in process)
> - CallTimerGrid stress testing (in process)
> - Aurora performance testing:
> - out of the box for the "promotion" and 32-bit server configs
> - heavy weight monitors for the "promotion" and 32-bit server configs
> (-XX:-UseBiasedLocking -XX:+UseHeavyMonitors)
> (in process)
>
>
> 8073165 summary of changes:
>
> macroAssembler_sparc.cpp: MacroAssembler::compiler_unlock_object()
>
> - default (OOTB) EmitSync value == 0
> - an _owner field optimization is in the non-default path under
> a (EmitSync & 1024) check
>
> - classic lock release without 1-0 support is in the non-default
> path under a (EmitSync & 512) check
>
> - an EntryList, cxq and _succ field optimization is in the
> non-default path under a (EmitSync & 16384) check
>
> - the default path is now the 1-0 form that avoids CAS and membar
> in the common case; see the "1-0 exit" section in objectMonitor.cpp
I don't pretend that I could follow all these changes (I don't read
sparc very well) - the main thing of interest is the EmitSync==0 path -
everything else is just experimental (did you actually test to any great
extent with all these different possible EmitSync values?).
So looking at the default path ... I could not figure out what logic
this (pre-existing) code is effecting:
3071 ld_ptr(Address(Rmark,
OM_OFFSET_NO_MONITOR_VALUE_TAG(recursions)), Rbox);
3072 orcc(Rbox, G0, G0);
3073 brx(Assembler::notZero, false, Assembler::pn, done);
It seems to be a check for:
if (recursions != 0) return;
which would be fine except where is recursions decremented ??
But that aside the description seems reasonable.
> sharedRuntime_sparc.cpp:
>
> - add JavaThread* param for
> SharedRuntime::complete_monitor_unlocking_C call
Ok.
> macroAssembler_x86.cpp: MacroAssembler::fast_unlock()
>
> - with all the new/modified comments and all the optional
> code path updates, it is hard to see that the only real
> changes to the default code path are 1) the _owner==Self
> check has been made optional and 2) the optional succ
> field check for the slow path is now on the default path
>
> - default (OOTB) EmitSync value == 0
> - remove optional (EmitSync & 8) code
> - refactor code in MacroAssembler::fast_unlock
> - remove old mfence() support
So I had expected to see this change conditional on the runtime
processor type. But then I checked Dave's blog and saw this was
determined 5+ years ago (!!!) so those older processors are no longer a
consideration. I do hope the performance tests have been re-run since
then though ;-)
> - an _owner field optimization is in the non-default path under
> a (EmitSync & 1024) check
Doesn't the comment:
2163 if (EmitSync & 1024) {
2164 // Don't bother to ratify that m_Owner==Self
apply to else part?
> - an _owner field optimization is in the non-default path under
> a (EmitSync & 16) check
> - a barrier optimization is in the non-default path under a
> (EmitSync & 32) check
> - remove old AMD optional work around
2229 // box is really RAX -- the following CMPXCHG depends on that
binding
2230 // cmpxchg R,[M] is equivalent to rax = CAS(M,rax,R)
2231
2232 movptr(boxReg, (int32_t)NULL_WORD); // box is really RAX
You've repeated the comment about box.
> sharedRuntime_x86_32.cpp:
> sharedRuntime_x86_64.cpp:
>
> - add thread param for SharedRuntime::complete_monitor_unlocking_C
> call
Ok.
> macro.[ch]pp: PhaseMacroExpand::make_slow_call()
>
> - add param2 and update callers
Ok.
> runtime.[ch]pp: OptoRuntime::complete_monitor_exit_Type()
>
> - add JavaThread* parameter
Ok
> sharedRuntime.[ch]pp: SharedRuntime::complete_monitor_unlocking_C()
>
> - add JavaThread* parameter
Ok. So all that passing through of the Thread was just to avoid a call
to JavaThread::current()? I thought that was already optimized to just
be a register load (at least on some platforms) ?
> synchronizer.cpp: ObjectSynchronizer::omRelease()
>
> - cleanup in use list traversal
> - add assert that ObjectMonitor being released was
> actually found and extracted
>
> synchronizer.[ch]pp:
>
> - add some comments in various places
> - rename walk_monitor_list() -> deflate_monitor_list()
> and update callers
> - minor logic refactoring
Generally okay.
Minor nit: you changed inuse to "in use" but for the "inuse list" it
should really be "in-use list" (and not in-use-list as appears in a
couple of existing places)
1507 // if deflate_monitor succeeded,/moribund
Not sure what that comment is supposed to mean :)
> thread.hpp:
>
> - add _TSelf field for mfence() replacement
Hmmmm, so we add a new pointer field to the Thread structure purely for
use on x86 if running with a particular experimental option set ? That
doesn't seem like it fits with our normal considerations for space/time
trade-offs. Maybe the xchg T,T->TSelf stuff should be under a
compile-flag instead?
>
> Thanks, in advance, for any comments, questions or suggestions.
Cheers,
David
> Dan
More information about the hotspot-runtime-dev
mailing list