RFR(S) Contended Locking fast exit bucket (8073165)

Coleen Phillimore coleen.phillimore at oracle.com
Thu Mar 26 00:56:27 UTC 2015


Dan,

I've finally gotten around to seeing this.  What is this?

+   if (EmitSync & 1024) {

and

+   if (EmitSync & 512) {


I thought this sort of unintelligible code was going to get cleaned up? 
   Do these constants mean anything?  Other than being powers of two?

+     if (EmitSync & 16384) {


This one isn't a power of two.

In 
http://cr.openjdk.java.net/~dcubed/8073165-webrev/0-jdk9-hs-rt/src/share/vm/runtime/synchronizer.cpp.udiff.html

+  assert(inusetally == Self->omInUseCount, "in use count off");


is this in_use_tally ?   is  curmidinuse

cur_mid_in_use?

I wonder what gOmInUseList is.

global object monitor in use list?

http://cr.openjdk.java.net/~dcubed/8073165-webrev/0-jdk9-hs-rt/src/share/vm/runtime/synchronizer.hpp.udiff.html

deflate_monitor_list appears to be  more descriptive.

Sorry I can only comment on surface things.  It's a shame this code has 
to be in macro assembler.   It appears to be called from 
generate_native_wrapper so affects performance of synchronized native 
methods?


Thanks,
Coleen


On 3/13/15, 12:50 AM, David Holmes wrote:
> Hi Dan,
>
> Well that was fun ... :)
>
> On 10/03/2015 3:11 AM, Daniel D. Daugherty wrote:
>> Greetings,
>>
>> I have the Contended Locking fast exit bucket ready for review.
>>
>> The code changes in this bucket are primarily assembly language
>> tweaks that come into play when an inflated ObjectMonitor is in
>> use. There are tweaks that apply to the default code path and
>> there are tweaks that only apply when an experimental flag is
>> set. This code code review is focused on the default code path
>> or the "out of the box" configuration.
>>
>> This work is being tracked by the following bug ID:
>>
>>      JDK-8073165 Contended Locking fast exit bucket
>>      https://bugs.openjdk.java.net/browse/JDK-8073165
>>
>> Here is the webrev URL:
>>
>> http://cr.openjdk.java.net/~dcubed/8073165-webrev/0-jdk9-hs-rt/
>>
>> Here is the JEP link:
>>
>>      https://bugs.openjdk.java.net/browse/JDK-8046133
>>
>> Testing:
>>
>> - Aurora Adhoc RT/SVC baseline batch
>> - JPRT test jobs
>> - MonitorExitStresser micro-benchmark (in process)
>> - MonitorEnterExitStresser micro-benchmark (in process)
>> - CallTimerGrid stress testing (in process)
>> - Aurora performance testing:
>>    - out of the box for the "promotion" and 32-bit server configs
>>    - heavy weight monitors for the "promotion" and 32-bit server configs
>>      (-XX:-UseBiasedLocking -XX:+UseHeavyMonitors)
>>      (in process)
>>
>>
>> 8073165 summary of changes:
>>
>> macroAssembler_sparc.cpp: MacroAssembler::compiler_unlock_object()
>>
>> - default (OOTB) EmitSync value == 0
>> - an _owner field optimization is in the non-default path under
>>    a (EmitSync & 1024) check
>>
>> - classic lock release without 1-0 support is in the non-default
>>    path under a (EmitSync & 512) check
>>
>> - an EntryList, cxq and _succ field optimization is in the
>>    non-default path under a (EmitSync & 16384) check
>>
>> - the default path is now the 1-0 form that avoids CAS and membar
>>    in the common case; see the "1-0 exit" section in objectMonitor.cpp
>
> I don't pretend that I could follow all these changes (I don't read 
> sparc very well) - the main thing of interest is the EmitSync==0 path 
> - everything else is just experimental (did you actually test to any 
> great extent with all these different possible EmitSync values?).
>
> So looking at the default path ... I could not figure out what logic 
> this (pre-existing) code is effecting:
>
> 3071      ld_ptr(Address(Rmark,
> OM_OFFSET_NO_MONITOR_VALUE_TAG(recursions)), Rbox);
> 3072      orcc(Rbox, G0, G0);
> 3073      brx(Assembler::notZero, false, Assembler::pn, done);
>
> It seems to be a check for:
>
> if (recursions != 0) return;
>
> which would be fine except where is recursions decremented ??
>
> But that aside the description seems reasonable.
>
>> sharedRuntime_sparc.cpp:
>>
>> - add JavaThread* param for
>>    SharedRuntime::complete_monitor_unlocking_C call
>
> Ok.
>
>> macroAssembler_x86.cpp: MacroAssembler::fast_unlock()
>>
>> - with all the new/modified comments and all the optional
>>    code path updates, it is hard to see that the only real
>>    changes to the default code path are 1) the _owner==Self
>>    check has been made optional and 2) the optional succ
>>    field check for the slow path is now on the default path
>>
>> - default (OOTB) EmitSync value == 0
>> - remove optional (EmitSync & 8) code
>> - refactor code in MacroAssembler::fast_unlock
>> - remove old mfence() support
>
> So I had expected to see this change conditional on the runtime 
> processor type. But then I checked Dave's blog and saw this was 
> determined 5+ years ago (!!!) so those older processors are no longer 
> a consideration. I do hope the performance tests have been re-run 
> since then though ;-)
>
>> - an _owner field optimization is in the non-default path under
>>    a (EmitSync & 1024) check
>
> Doesn't the comment:
>
> 2163     if (EmitSync & 1024) {
> 2164       // Don't bother to ratify that m_Owner==Self
>
> apply to else part?
>
>> - an _owner field optimization is in the non-default path under
>>    a (EmitSync & 16) check
>> - a barrier optimization is in the non-default path under a
>>    (EmitSync & 32) check
>> - remove old AMD optional work around
>
> 2229       // box is really RAX -- the following CMPXCHG depends on 
> that binding
> 2230       // cmpxchg R,[M] is equivalent to rax = CAS(M,rax,R)
> 2231
> 2232       movptr(boxReg, (int32_t)NULL_WORD);     // box is really RAX
>
> You've repeated the comment about box.
>
>> sharedRuntime_x86_32.cpp:
>> sharedRuntime_x86_64.cpp:
>>
>> - add thread param for SharedRuntime::complete_monitor_unlocking_C
>>    call
>
> Ok.
>
>> macro.[ch]pp: PhaseMacroExpand::make_slow_call()
>>
>> - add param2 and update callers
>
> Ok.
>
>> runtime.[ch]pp: OptoRuntime::complete_monitor_exit_Type()
>>
>> - add JavaThread* parameter
>
> Ok
>
>> sharedRuntime.[ch]pp: SharedRuntime::complete_monitor_unlocking_C()
>>
>> - add JavaThread* parameter
>
> Ok. So all that passing through of the Thread was just to avoid a call 
> to JavaThread::current()? I thought that was already optimized to just 
> be a register load (at least on some platforms) ?
>
>> synchronizer.cpp: ObjectSynchronizer::omRelease()
>>
>> - cleanup in use list traversal
>> - add assert that ObjectMonitor being released was
>>    actually found and extracted
>>
>> synchronizer.[ch]pp:
>>
>> - add some comments in various places
>> - rename walk_monitor_list() -> deflate_monitor_list()
>>    and update callers
>> - minor logic refactoring
>
> Generally okay.
>
> Minor nit: you changed inuse to "in use" but for the "inuse list" it 
> should really be "in-use list" (and not in-use-list as appears in a 
> couple of existing places)
>
> 1507       // if deflate_monitor succeeded,/moribund
>
> Not sure what that comment is supposed to mean :)
>
>> thread.hpp:
>>
>> - add _TSelf field for mfence() replacement
>
> Hmmmm, so we add a new pointer field to the Thread structure purely 
> for use on x86 if running with a particular experimental option set ? 
> That doesn't seem like it fits with our normal considerations for 
> space/time trade-offs. Maybe the xchg T,T->TSelf stuff should be under 
> a compile-flag instead?
>
>>
>> Thanks, in advance, for any comments, questions or suggestions.
>
> Cheers,
> David
>
>> Dan



More information about the hotspot-runtime-dev mailing list