RFR(S) Contended Locking fast enter bucket (8061553)

Tue Nov 4 18:26:02 UTC 2014

The cleanup is turning into a bigger change than the fast enter
bucket itself so I'm spinning the cleanup into a new bug:

     JDK-8062851 cleanup ObjectMonitor offset adjustments
     https://bugs.openjdk.java.net/browse/JDK-8062851

Yes, this means that the Contended Locking cleanup bucket has reopened
for yet another change...

We'll get back to "fast enter" after the dust has settled...

Dan

On 11/3/14 6:59 PM, Daniel D. Daugherty wrote:
> David,
>
> Thanks for the review! As usual, replies are embedded below...
>
>
> On 11/2/14 9:44 PM, David Holmes wrote:
>> Hi Dan,
>>
>> Looks good.
>
> Thanks!
>
>
>> Couple of nits and one semantic query below ...
>>
>> src/cpu/sparc/vm/macroAssembler_sparc.cpp
>>
>> Formatting changes were a bit of a distraction.
>
> Yes, I have no idea what got into me. Normally I do formatting
> changes separately so the noise does not distract...
>
> It turns out there is a constant defined that should be used
> instead of all these literal '2's:
>
> src/share/vm/oops/markOop.hpp:         monitor_value            = 2
>
> Typically used as follows:
>
> src/cpu/x86/vm/macroAssembler_x86.cpp:  int owner_offset = 
> ObjectMonitor::owner_offset_in_bytes() - markOopDesc::monitor_value;
>
> I will clean this up just for the files that I'm touching as
> part of this fix.
>
>
>>
>> ---
>>
>> src/cpu/x86/vm/macroAssembler_x86.cpp
>>
>> Formatting changes were a bit of a distraction.
>
> Same reply as for macroAssembler_sparc.cpp.
>
>
>> 1929     // unconditionally set stackBox->_displaced_header = 3
>> 1930     movptr(Address(boxReg, 0), 
>> (int32_t)intptr_t(markOopDesc::unused_mark()));
>>
>> At 1870 we refer to box rather than stackBox. Also it takes some 
>> sleuthing to realize that "3" here is somehow a pseudonym for 
>> unused_mark(). Back up at 1808 we have a to-do:
>>
>> 1808     //   use markOop::unused_mark() instead of "3".
>>
>> so the current change seems to be implementing that, even though 
>> other uses of "3" are left untouched.
>
> I'll take a look at cleaning those up also...
>
> In some cases markOopDesc::marked_value will work for the literal '3',
> but in other cases we'll use markOop::unused_mark():
>
>   static markOop unused_mark() {
>     return (markOop) marked_value;
>   }
>
> to save us the noise of the (markOop) cast.
>
>
>> ---
>>
>> src/share/vm/runtime/sharedRuntime.cpp
>>
>> 1794 JRT_BLOCK_ENTRY(void, 
>> SharedRuntime::complete_monitor_locking_C(oopDesc* _obj, BasicLock* 
>> lock, JavaThread* thread))
>> 1795   if (!SafepointSynchronize::is_synchronizing()) {
>> 1796     if (ObjectSynchronizer::quick_enter(_obj, thread, lock)) 
>> return;
>>
>> Is it necessary to check is_synchronizing? If we are executing this 
>> code we are not at a safepoint and the quick_enter wont change that, 
>> so I'm not sure what we are guarding against.
>
> So this first state checker:
>
> src/share/vm/runtime/safepoint.hpp:
> inline static bool is_synchronizing()  { return _state == 
> _synchronizing;  }
>
> means that we want to go to a safepoint and:
>
> inline static bool is_at_safepoint()   { return _state == 
> _synchronized;  }
>
> means that we are at a safepoint. Dice's optimization bails out if
> we want to go to a safepoint and ObjectSynchronizer::quick_enter()
> has a "No_Safepoint_Verifier nsv" in it so we're expecting that
> code to be quick (and not go to a safepoint). I'm not seeing
> anything obvious....
>
> Sometimes we have to be careful with JavaThread suspend requests and
> monitor acquisition, but I don't think that's a problem here... In
> order for the "suspend requesting" thread to be surprised, the suspend
> API, e.g., JVM/TI SuspendThread() has to return to the caller and then
> the suspend target has do something unexpected like acquire a monitor
> that it was previously blocked upon when it was suspended. We've had
> bugs like that in the past... In this optimization case, our target
> thread is not blocked on a contended monitor...
>
> In this particular case, the "suspend requesting" thread will set the
> suspend request state on the target thread, but the target thread is
> busy trying to enter this uncontended monitor (quickly). So the
> "suspend requesting" thread, will request a no-op safepoint, but it
> won't return from the suspend API until that safepoint completes.
> The safepoint won't complete until the target thread is done acquiring
> the previously uncontended monitor... so the target thread will be
> suspended while holding the previous uncontended monitor and the
> "suspend requesting" thread will return from the suspend API all
> happy...
>
> Well, I don't see the reason either so I'll have to ping Dave Dice
> and Karen Kinnear to see if either of them can fill in the history
> here. This could be an abundance of caution case.
>
>
>> ---
>>
>> src/share/vm/runtime/synchronizer.cpp
>>
>> Minor nit: line 153 the usual acronym is NPE (for 
>> NullPointerException) not NPX
>
> I'll do a search for uses of NPX and other uses of 'X' in exception
> acronyms...
>
>
>>
>> Nit:  159     Thread * const ox
>>
>> Please change ox to owner.
>
> Will do.
>
> Thanks again for the review!
>
> Dan
>
>
>>
>> ---
>>
>> Thanks,
>> David
>>
>>
>>
>> On 31/10/2014 8:50 AM, Daniel D. Daugherty wrote:
>>> Greetings,
>>>
>>> I have the Contended Locking fast enter bucket ready for review.
>>>
>>> The code changes in this bucket are primarily a quick_enter()
>>> function that works on inflated but uncontended Java monitors.
>>> This quick_enter() function is used on the "slow path" for Java
>>> Monitor enter operations when the built-in "fast path" (read
>>> assembly code) doesn't work.
>>>
>>> This work is being tracked by the following bug ID:
>>>
>>>      JDK-8061553 Contended Locking fast enter bucket
>>>      https://bugs.openjdk.java.net/browse/JDK-8061553
>>>
>>> Here is the webrev URL:
>>>
>>> http://cr.openjdk.java.net/~dcubed/8061553-webrev/0-jdk9-hs-rt/
>>>
>>> Here is the JEP link:
>>>
>>>      https://bugs.openjdk.java.net/browse/JDK-8046133
>>>
>>> 8061553 summary of changes:
>>>
>>> macroAssembler_sparc.cpp: MacroAssembler::compiler_lock_object()
>>>
>>> - clean up spacing around some
>>>    'ObjectMonitor::owner_offset_in_bytes() - 2' uses
>>> - remove optional (EmitSync & 64) code
>>> - change from cmp() to andcc() so icc.zf flag is set
>>>
>>> macroAssembler_x86.cpp: MacroAssembler::fast_lock()
>>>
>>> - remove optional (EmitSync & 2) code
>>> - rewrite LP64 inflated lock code that tries to CAS in
>>>    the new owner value to be more efficient
>>>
>>> interfaceSupport.hpp:
>>>
>>> - add JRT_BLOCK_NO_ASYNC to permit splitting a
>>>    JRT_BLOCK_ENTRY into two pieces.
>>>
>>> sharedRuntime.cpp: SharedRuntime::complete_monitor_locking_C()
>>>
>>> - change entry type from JRT_ENTRY_NO_ASYNC to JRT_BLOCK_ENTRY
>>>    to permit ObjectSynchronizer::quick_enter() call
>>> - add JRT_BLOCK_NO_ASYNC use if the quick_enter() doesn't work
>>>    to revert to JRT_ENTRY_NO_ASYNC-like semantics
>>>
>>> synchronizer.[ch]pp:
>>>
>>> - add ObjectSynchronizer::quick_enter() for entering an
>>>    inflated but unowned Java monitor without thread state
>>>    changes
>>>
>>> Testing:
>>>
>>> - Aurora Adhoc RT/SVC baseline batch
>>> - JPRT test jobs
>>> - MonitorEnterStresser micro-benchmark (in process)
>>> - CallTimerGrid stress testing (in process)
>>> - Aurora performance testing:
>>>    - out of the box for the "promotion" and 32-bit server configs
>>>    - heavy weight monitors for the "promotion" and 32-bit server 
>>> configs
>>>      (-XX:-UseBiasedLocking -XX:+UseHeavyMonitors)
>>>      (in process)
>>>
>>>
>>> Thanks, in advance, for any comments, questions or suggestions.
>>>
>>> Dan
>
>