RFR(S) Contended Locking fast enter bucket (8061553)

Tue Nov 4 14:46:51 UTC 2014

On 11/4/14 12:03 AM, David Holmes wrote:
> Hi Dan,
>
> One follow up deep below ...
>
> On 4/11/2014 11:59 AM, Daniel D. Daugherty wrote:
>> David,
>>
>> Thanks for the review! As usual, replies are embedded below...
>>
>>
>> On 11/2/14 9:44 PM, David Holmes wrote:
>>> Hi Dan,
>>>
>>> Looks good.
>>
>> Thanks!
>>
>>
>>> Couple of nits and one semantic query below ...
>>>
>>> src/cpu/sparc/vm/macroAssembler_sparc.cpp
>>>
>>> Formatting changes were a bit of a distraction.
>>
>> Yes, I have no idea what got into me. Normally I do formatting
>> changes separately so the noise does not distract...
>>
>> It turns out there is a constant defined that should be used
>> instead of all these literal '2's:
>>
>> src/share/vm/oops/markOop.hpp:         monitor_value = 2
>>
>> Typically used as follows:
>>
>> src/cpu/x86/vm/macroAssembler_x86.cpp:  int owner_offset =
>> ObjectMonitor::owner_offset_in_bytes() - markOopDesc::monitor_value;
>>
>> I will clean this up just for the files that I'm touching as
>> part of this fix.
>>
>>
>>>
>>> ---
>>>
>>> src/cpu/x86/vm/macroAssembler_x86.cpp
>>>
>>> Formatting changes were a bit of a distraction.
>>
>> Same reply as for macroAssembler_sparc.cpp.
>>
>>
>>> 1929     // unconditionally set stackBox->_displaced_header = 3
>>> 1930     movptr(Address(boxReg, 0),
>>> (int32_t)intptr_t(markOopDesc::unused_mark()));
>>>
>>> At 1870 we refer to box rather than stackBox. Also it takes some
>>> sleuthing to realize that "3" here is somehow a pseudonym for
>>> unused_mark(). Back up at 1808 we have a to-do:
>>>
>>> 1808     //   use markOop::unused_mark() instead of "3".
>>>
>>> so the current change seems to be implementing that, even though other
>>> uses of "3" are left untouched.
>>
>> I'll take a look at cleaning those up also...
>>
>> In some cases markOopDesc::marked_value will work for the literal '3',
>> but in other cases we'll use markOop::unused_mark():
>>
>>    static markOop unused_mark() {
>>      return (markOop) marked_value;
>>    }
>>
>> to save us the noise of the (markOop) cast.
>>
>>
>>> ---
>>>
>>> src/share/vm/runtime/sharedRuntime.cpp
>>>
>>> 1794 JRT_BLOCK_ENTRY(void,
>>> SharedRuntime::complete_monitor_locking_C(oopDesc* _obj, BasicLock*
>>> lock, JavaThread* thread))
>>> 1795   if (!SafepointSynchronize::is_synchronizing()) {
>>> 1796     if (ObjectSynchronizer::quick_enter(_obj, thread, lock)) 
>>> return;
>>>
>>> Is it necessary to check is_synchronizing? If we are executing this
>>> code we are not at a safepoint and the quick_enter wont change that,
>>> so I'm not sure what we are guarding against.
>>
>> So this first state checker:
>>
>> src/share/vm/runtime/safepoint.hpp:
>> inline static bool is_synchronizing()  { return _state ==
>> _synchronizing;  }
>>
>> means that we want to go to a safepoint and:
>>
>> inline static bool is_at_safepoint()   { return _state == 
>> _synchronized;  }
>>
>> means that we are at a safepoint. Dice's optimization bails out if
>> we want to go to a safepoint and ObjectSynchronizer::quick_enter()
>> has a "No_Safepoint_Verifier nsv" in it so we're expecting that
>> code to be quick (and not go to a safepoint). I'm not seeing
>> anything obvious....
>
> So it occurred to me that this is just an optimization not a true 
> guard - as the safepoint could be initiated just after we do the 
> check. So it's basically trying to ensure that if a safepoint has been 
> requested then we don't unduly delay it by taking the non-safepointing 
> quick_enter path.

Sounds reasonable to me.

Dan

>
> Cheers,
> David
>
>> Sometimes we have to be careful with JavaThread suspend requests and
>> monitor acquisition, but I don't think that's a problem here... In
>> order for the "suspend requesting" thread to be surprised, the suspend
>> API, e.g., JVM/TI SuspendThread() has to return to the caller and then
>> the suspend target has do something unexpected like acquire a monitor
>> that it was previously blocked upon when it was suspended. We've had
>> bugs like that in the past... In this optimization case, our target
>> thread is not blocked on a contended monitor...
>>
>> In this particular case, the "suspend requesting" thread will set the
>> suspend request state on the target thread, but the target thread is
>> busy trying to enter this uncontended monitor (quickly). So the
>> "suspend requesting" thread, will request a no-op safepoint, but it
>> won't return from the suspend API until that safepoint completes.
>> The safepoint won't complete until the target thread is done acquiring
>> the previously uncontended monitor... so the target thread will be
>> suspended while holding the previous uncontended monitor and the
>> "suspend requesting" thread will return from the suspend API all
>> happy...
>>
>> Well, I don't see the reason either so I'll have to ping Dave Dice
>> and Karen Kinnear to see if either of them can fill in the history
>> here. This could be an abundance of caution case.
>>
>>
>>> ---
>>>
>>> src/share/vm/runtime/synchronizer.cpp
>>>
>>> Minor nit: line 153 the usual acronym is NPE (for
>>> NullPointerException) not NPX
>>
>> I'll do a search for uses of NPX and other uses of 'X' in exception
>> acronyms...
>>
>>
>>>
>>> Nit:  159     Thread * const ox
>>>
>>> Please change ox to owner.
>>
>> Will do.
>>
>> Thanks again for the review!
>>
>> Dan
>>
>>
>>>
>>> ---
>>>
>>> Thanks,
>>> David
>>>
>>>
>>>
>>> On 31/10/2014 8:50 AM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> I have the Contended Locking fast enter bucket ready for review.
>>>>
>>>> The code changes in this bucket are primarily a quick_enter()
>>>> function that works on inflated but uncontended Java monitors.
>>>> This quick_enter() function is used on the "slow path" for Java
>>>> Monitor enter operations when the built-in "fast path" (read
>>>> assembly code) doesn't work.
>>>>
>>>> This work is being tracked by the following bug ID:
>>>>
>>>>      JDK-8061553 Contended Locking fast enter bucket
>>>>      https://bugs.openjdk.java.net/browse/JDK-8061553
>>>>
>>>> Here is the webrev URL:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8061553-webrev/0-jdk9-hs-rt/
>>>>
>>>> Here is the JEP link:
>>>>
>>>>      https://bugs.openjdk.java.net/browse/JDK-8046133
>>>>
>>>> 8061553 summary of changes:
>>>>
>>>> macroAssembler_sparc.cpp: MacroAssembler::compiler_lock_object()
>>>>
>>>> - clean up spacing around some
>>>>    'ObjectMonitor::owner_offset_in_bytes() - 2' uses
>>>> - remove optional (EmitSync & 64) code
>>>> - change from cmp() to andcc() so icc.zf flag is set
>>>>
>>>> macroAssembler_x86.cpp: MacroAssembler::fast_lock()
>>>>
>>>> - remove optional (EmitSync & 2) code
>>>> - rewrite LP64 inflated lock code that tries to CAS in
>>>>    the new owner value to be more efficient
>>>>
>>>> interfaceSupport.hpp:
>>>>
>>>> - add JRT_BLOCK_NO_ASYNC to permit splitting a
>>>>    JRT_BLOCK_ENTRY into two pieces.
>>>>
>>>> sharedRuntime.cpp: SharedRuntime::complete_monitor_locking_C()
>>>>
>>>> - change entry type from JRT_ENTRY_NO_ASYNC to JRT_BLOCK_ENTRY
>>>>    to permit ObjectSynchronizer::quick_enter() call
>>>> - add JRT_BLOCK_NO_ASYNC use if the quick_enter() doesn't work
>>>>    to revert to JRT_ENTRY_NO_ASYNC-like semantics
>>>>
>>>> synchronizer.[ch]pp:
>>>>
>>>> - add ObjectSynchronizer::quick_enter() for entering an
>>>>    inflated but unowned Java monitor without thread state
>>>>    changes
>>>>
>>>> Testing:
>>>>
>>>> - Aurora Adhoc RT/SVC baseline batch
>>>> - JPRT test jobs
>>>> - MonitorEnterStresser micro-benchmark (in process)
>>>> - CallTimerGrid stress testing (in process)
>>>> - Aurora performance testing:
>>>>    - out of the box for the "promotion" and 32-bit server configs
>>>>    - heavy weight monitors for the "promotion" and 32-bit server 
>>>> configs
>>>>      (-XX:-UseBiasedLocking -XX:+UseHeavyMonitors)
>>>>      (in process)
>>>>
>>>>
>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>
>>>> Dan
>>