RFR(S) Contended Locking fast exit bucket (8073165)
Daniel D. Daugherty
daniel.daugherty at oracle.com
Fri Mar 13 13:53:08 UTC 2015
David,
Thanks for the thorough review! It will take me some time to
read through and address your comments so please look for
another reply early next week.
Dan
On 3/12/15 10:50 PM, David Holmes wrote:
> Hi Dan,
>
> Well that was fun ... :)
>
> On 10/03/2015 3:11 AM, Daniel D. Daugherty wrote:
>> Greetings,
>>
>> I have the Contended Locking fast exit bucket ready for review.
>>
>> The code changes in this bucket are primarily assembly language
>> tweaks that come into play when an inflated ObjectMonitor is in
>> use. There are tweaks that apply to the default code path and
>> there are tweaks that only apply when an experimental flag is
>> set. This code code review is focused on the default code path
>> or the "out of the box" configuration.
>>
>> This work is being tracked by the following bug ID:
>>
>> JDK-8073165 Contended Locking fast exit bucket
>> https://bugs.openjdk.java.net/browse/JDK-8073165
>>
>> Here is the webrev URL:
>>
>> http://cr.openjdk.java.net/~dcubed/8073165-webrev/0-jdk9-hs-rt/
>>
>> Here is the JEP link:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8046133
>>
>> Testing:
>>
>> - Aurora Adhoc RT/SVC baseline batch
>> - JPRT test jobs
>> - MonitorExitStresser micro-benchmark (in process)
>> - MonitorEnterExitStresser micro-benchmark (in process)
>> - CallTimerGrid stress testing (in process)
>> - Aurora performance testing:
>> - out of the box for the "promotion" and 32-bit server configs
>> - heavy weight monitors for the "promotion" and 32-bit server configs
>> (-XX:-UseBiasedLocking -XX:+UseHeavyMonitors)
>> (in process)
>>
>>
>> 8073165 summary of changes:
>>
>> macroAssembler_sparc.cpp: MacroAssembler::compiler_unlock_object()
>>
>> - default (OOTB) EmitSync value == 0
>> - an _owner field optimization is in the non-default path under
>> a (EmitSync & 1024) check
>>
>> - classic lock release without 1-0 support is in the non-default
>> path under a (EmitSync & 512) check
>>
>> - an EntryList, cxq and _succ field optimization is in the
>> non-default path under a (EmitSync & 16384) check
>>
>> - the default path is now the 1-0 form that avoids CAS and membar
>> in the common case; see the "1-0 exit" section in objectMonitor.cpp
>
> I don't pretend that I could follow all these changes (I don't read
> sparc very well) - the main thing of interest is the EmitSync==0 path
> - everything else is just experimental (did you actually test to any
> great extent with all these different possible EmitSync values?).
>
> So looking at the default path ... I could not figure out what logic
> this (pre-existing) code is effecting:
>
> 3071 ld_ptr(Address(Rmark,
> OM_OFFSET_NO_MONITOR_VALUE_TAG(recursions)), Rbox);
> 3072 orcc(Rbox, G0, G0);
> 3073 brx(Assembler::notZero, false, Assembler::pn, done);
>
> It seems to be a check for:
>
> if (recursions != 0) return;
>
> which would be fine except where is recursions decremented ??
>
> But that aside the description seems reasonable.
>
>> sharedRuntime_sparc.cpp:
>>
>> - add JavaThread* param for
>> SharedRuntime::complete_monitor_unlocking_C call
>
> Ok.
>
>> macroAssembler_x86.cpp: MacroAssembler::fast_unlock()
>>
>> - with all the new/modified comments and all the optional
>> code path updates, it is hard to see that the only real
>> changes to the default code path are 1) the _owner==Self
>> check has been made optional and 2) the optional succ
>> field check for the slow path is now on the default path
>>
>> - default (OOTB) EmitSync value == 0
>> - remove optional (EmitSync & 8) code
>> - refactor code in MacroAssembler::fast_unlock
>> - remove old mfence() support
>
> So I had expected to see this change conditional on the runtime
> processor type. But then I checked Dave's blog and saw this was
> determined 5+ years ago (!!!) so those older processors are no longer
> a consideration. I do hope the performance tests have been re-run
> since then though ;-)
>
>> - an _owner field optimization is in the non-default path under
>> a (EmitSync & 1024) check
>
> Doesn't the comment:
>
> 2163 if (EmitSync & 1024) {
> 2164 // Don't bother to ratify that m_Owner==Self
>
> apply to else part?
>
>> - an _owner field optimization is in the non-default path under
>> a (EmitSync & 16) check
>> - a barrier optimization is in the non-default path under a
>> (EmitSync & 32) check
>> - remove old AMD optional work around
>
> 2229 // box is really RAX -- the following CMPXCHG depends on
> that binding
> 2230 // cmpxchg R,[M] is equivalent to rax = CAS(M,rax,R)
> 2231
> 2232 movptr(boxReg, (int32_t)NULL_WORD); // box is really RAX
>
> You've repeated the comment about box.
>
>> sharedRuntime_x86_32.cpp:
>> sharedRuntime_x86_64.cpp:
>>
>> - add thread param for SharedRuntime::complete_monitor_unlocking_C
>> call
>
> Ok.
>
>> macro.[ch]pp: PhaseMacroExpand::make_slow_call()
>>
>> - add param2 and update callers
>
> Ok.
>
>> runtime.[ch]pp: OptoRuntime::complete_monitor_exit_Type()
>>
>> - add JavaThread* parameter
>
> Ok
>
>> sharedRuntime.[ch]pp: SharedRuntime::complete_monitor_unlocking_C()
>>
>> - add JavaThread* parameter
>
> Ok. So all that passing through of the Thread was just to avoid a call
> to JavaThread::current()? I thought that was already optimized to just
> be a register load (at least on some platforms) ?
>
>> synchronizer.cpp: ObjectSynchronizer::omRelease()
>>
>> - cleanup in use list traversal
>> - add assert that ObjectMonitor being released was
>> actually found and extracted
>>
>> synchronizer.[ch]pp:
>>
>> - add some comments in various places
>> - rename walk_monitor_list() -> deflate_monitor_list()
>> and update callers
>> - minor logic refactoring
>
> Generally okay.
>
> Minor nit: you changed inuse to "in use" but for the "inuse list" it
> should really be "in-use list" (and not in-use-list as appears in a
> couple of existing places)
>
> 1507 // if deflate_monitor succeeded,/moribund
>
> Not sure what that comment is supposed to mean :)
>
>> thread.hpp:
>>
>> - add _TSelf field for mfence() replacement
>
> Hmmmm, so we add a new pointer field to the Thread structure purely
> for use on x86 if running with a particular experimental option set ?
> That doesn't seem like it fits with our normal considerations for
> space/time trade-offs. Maybe the xchg T,T->TSelf stuff should be under
> a compile-flag instead?
>
>>
>> Thanks, in advance, for any comments, questions or suggestions.
>
> Cheers,
> David
>
>> Dan
More information about the hotspot-runtime-dev
mailing list