RFR(S) Contended Locking fast exit bucket (8073165)

Daniel D. Daugherty daniel.daugherty at oracle.com
Fri Mar 13 13:53:08 UTC 2015


David,

Thanks for the thorough review! It will take me some time to
read through and address your comments so please look for
another reply early next week.

Dan


On 3/12/15 10:50 PM, David Holmes wrote:
> Hi Dan,
>
> Well that was fun ... :)
>
> On 10/03/2015 3:11 AM, Daniel D. Daugherty wrote:
>> Greetings,
>>
>> I have the Contended Locking fast exit bucket ready for review.
>>
>> The code changes in this bucket are primarily assembly language
>> tweaks that come into play when an inflated ObjectMonitor is in
>> use. There are tweaks that apply to the default code path and
>> there are tweaks that only apply when an experimental flag is
>> set. This code code review is focused on the default code path
>> or the "out of the box" configuration.
>>
>> This work is being tracked by the following bug ID:
>>
>>      JDK-8073165 Contended Locking fast exit bucket
>>      https://bugs.openjdk.java.net/browse/JDK-8073165
>>
>> Here is the webrev URL:
>>
>> http://cr.openjdk.java.net/~dcubed/8073165-webrev/0-jdk9-hs-rt/
>>
>> Here is the JEP link:
>>
>>      https://bugs.openjdk.java.net/browse/JDK-8046133
>>
>> Testing:
>>
>> - Aurora Adhoc RT/SVC baseline batch
>> - JPRT test jobs
>> - MonitorExitStresser micro-benchmark (in process)
>> - MonitorEnterExitStresser micro-benchmark (in process)
>> - CallTimerGrid stress testing (in process)
>> - Aurora performance testing:
>>    - out of the box for the "promotion" and 32-bit server configs
>>    - heavy weight monitors for the "promotion" and 32-bit server configs
>>      (-XX:-UseBiasedLocking -XX:+UseHeavyMonitors)
>>      (in process)
>>
>>
>> 8073165 summary of changes:
>>
>> macroAssembler_sparc.cpp: MacroAssembler::compiler_unlock_object()
>>
>> - default (OOTB) EmitSync value == 0
>> - an _owner field optimization is in the non-default path under
>>    a (EmitSync & 1024) check
>>
>> - classic lock release without 1-0 support is in the non-default
>>    path under a (EmitSync & 512) check
>>
>> - an EntryList, cxq and _succ field optimization is in the
>>    non-default path under a (EmitSync & 16384) check
>>
>> - the default path is now the 1-0 form that avoids CAS and membar
>>    in the common case; see the "1-0 exit" section in objectMonitor.cpp
>
> I don't pretend that I could follow all these changes (I don't read 
> sparc very well) - the main thing of interest is the EmitSync==0 path 
> - everything else is just experimental (did you actually test to any 
> great extent with all these different possible EmitSync values?).
>
> So looking at the default path ... I could not figure out what logic 
> this (pre-existing) code is effecting:
>
> 3071      ld_ptr(Address(Rmark,
> OM_OFFSET_NO_MONITOR_VALUE_TAG(recursions)), Rbox);
> 3072      orcc(Rbox, G0, G0);
> 3073      brx(Assembler::notZero, false, Assembler::pn, done);
>
> It seems to be a check for:
>
> if (recursions != 0) return;
>
> which would be fine except where is recursions decremented ??
>
> But that aside the description seems reasonable.
>
>> sharedRuntime_sparc.cpp:
>>
>> - add JavaThread* param for
>>    SharedRuntime::complete_monitor_unlocking_C call
>
> Ok.
>
>> macroAssembler_x86.cpp: MacroAssembler::fast_unlock()
>>
>> - with all the new/modified comments and all the optional
>>    code path updates, it is hard to see that the only real
>>    changes to the default code path are 1) the _owner==Self
>>    check has been made optional and 2) the optional succ
>>    field check for the slow path is now on the default path
>>
>> - default (OOTB) EmitSync value == 0
>> - remove optional (EmitSync & 8) code
>> - refactor code in MacroAssembler::fast_unlock
>> - remove old mfence() support
>
> So I had expected to see this change conditional on the runtime 
> processor type. But then I checked Dave's blog and saw this was 
> determined 5+ years ago (!!!) so those older processors are no longer 
> a consideration. I do hope the performance tests have been re-run 
> since then though ;-)
>
>> - an _owner field optimization is in the non-default path under
>>    a (EmitSync & 1024) check
>
> Doesn't the comment:
>
> 2163     if (EmitSync & 1024) {
> 2164       // Don't bother to ratify that m_Owner==Self
>
> apply to else part?
>
>> - an _owner field optimization is in the non-default path under
>>    a (EmitSync & 16) check
>> - a barrier optimization is in the non-default path under a
>>    (EmitSync & 32) check
>> - remove old AMD optional work around
>
> 2229       // box is really RAX -- the following CMPXCHG depends on 
> that binding
> 2230       // cmpxchg R,[M] is equivalent to rax = CAS(M,rax,R)
> 2231
> 2232       movptr(boxReg, (int32_t)NULL_WORD);     // box is really RAX
>
> You've repeated the comment about box.
>
>> sharedRuntime_x86_32.cpp:
>> sharedRuntime_x86_64.cpp:
>>
>> - add thread param for SharedRuntime::complete_monitor_unlocking_C
>>    call
>
> Ok.
>
>> macro.[ch]pp: PhaseMacroExpand::make_slow_call()
>>
>> - add param2 and update callers
>
> Ok.
>
>> runtime.[ch]pp: OptoRuntime::complete_monitor_exit_Type()
>>
>> - add JavaThread* parameter
>
> Ok
>
>> sharedRuntime.[ch]pp: SharedRuntime::complete_monitor_unlocking_C()
>>
>> - add JavaThread* parameter
>
> Ok. So all that passing through of the Thread was just to avoid a call 
> to JavaThread::current()? I thought that was already optimized to just 
> be a register load (at least on some platforms) ?
>
>> synchronizer.cpp: ObjectSynchronizer::omRelease()
>>
>> - cleanup in use list traversal
>> - add assert that ObjectMonitor being released was
>>    actually found and extracted
>>
>> synchronizer.[ch]pp:
>>
>> - add some comments in various places
>> - rename walk_monitor_list() -> deflate_monitor_list()
>>    and update callers
>> - minor logic refactoring
>
> Generally okay.
>
> Minor nit: you changed inuse to "in use" but for the "inuse list" it 
> should really be "in-use list" (and not in-use-list as appears in a 
> couple of existing places)
>
> 1507       // if deflate_monitor succeeded,/moribund
>
> Not sure what that comment is supposed to mean :)
>
>> thread.hpp:
>>
>> - add _TSelf field for mfence() replacement
>
> Hmmmm, so we add a new pointer field to the Thread structure purely 
> for use on x86 if running with a particular experimental option set ? 
> That doesn't seem like it fits with our normal considerations for 
> space/time trade-offs. Maybe the xchg T,T->TSelf stuff should be under 
> a compile-flag instead?
>
>>
>> Thanks, in advance, for any comments, questions or suggestions.
>
> Cheers,
> David
>
>> Dan



More information about the hotspot-runtime-dev mailing list