RFR(S) Contended Locking fast enter bucket (8061553)

Tue Nov 11 21:23:06 UTC 2014

Thanks for the review!

As usual, replies embedded below...

On 11/11/14 12:35 PM, Karen Kinnear wrote:
> Dan,
>
> Code looks good.

Thanks! However, it is yours and Dice's code with a few tweaks
from my brain... This bucket will also have a triple contributed
by entry...

>    I like your choices of changes to pick up.

Thanks! This bucket was fairly easy to sift/tease out...

> Couple of minor questions/comments:
>
> 1. synchronizer.cpp: What does TLE stand for?

Transactional Lock Elision is my guess. If Dice confirms, then I'll
make sure the first use has it spelled out... Dave likes his TLAs!

> 2. in macrosAssembler_x86.cpp - mind keeping the comment about // Without cat to int32_t a movptr will destroy R10 which is typically obj

Yes, I kept looking at that and wondering why the comments was removed...
I'll put it back...

> thanks,
> Karen
>
> p.s. I've forgotten - is the fast_notify in a different bucket?

fast_enter is optimization #3, bucket #7
fast_exit is optimization #4, bucket #8
fast_notify is optimization #5, bucket #2

Dan

>
> On Nov 11, 2014, at 8:55 AM, Daniel D. Daugherty wrote:
>
>> On 11/11/14 1:02 AM, David Holmes wrote:
>>> On 7/11/2014 12:17 PM, Daniel D. Daugherty wrote:
>>>> The fix for JDK-8062851 has been reviewed, tested and pushed to
>>>> RT_Baseline.
>>>>
>>>> Time to get back to this review thread so here's an updated webrev:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8061553-webrev/1-jdk9-hs-rt/
>>>>
>>>> David H., I believe I've addressed all of your comments. Please
>>>> let me know if I missed something...
>>> Looks good to me - thanks Dan!
>> Thanks for the re-review!
>>
>> Dan
>>
>>
>>> David
>>> -----
>>>
>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>
>>>> Dan
>>>>
>>>>
>>>> On 11/4/14 11:26 AM, Daniel D. Daugherty wrote:
>>>>> The cleanup is turning into a bigger change than the fast enter
>>>>> bucket itself so I'm spinning the cleanup into a new bug:
>>>>>
>>>>>     JDK-8062851 cleanup ObjectMonitor offset adjustments
>>>>> https://bugs.openjdk.java.net/browse/JDK-8062851
>>>>>
>>>>> Yes, this means that the Contended Locking cleanup bucket has reopened
>>>>> for yet another change...
>>>>>
>>>>> We'll get back to "fast enter" after the dust has settled...
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>> On 11/3/14 6:59 PM, Daniel D. Daugherty wrote:
>>>>>> David,
>>>>>>
>>>>>> Thanks for the review! As usual, replies are embedded below...
>>>>>>
>>>>>>
>>>>>> On 11/2/14 9:44 PM, David Holmes wrote:
>>>>>>> Hi Dan,
>>>>>>>
>>>>>>> Looks good.
>>>>>> Thanks!
>>>>>>
>>>>>>
>>>>>>> Couple of nits and one semantic query below ...
>>>>>>>
>>>>>>> src/cpu/sparc/vm/macroAssembler_sparc.cpp
>>>>>>>
>>>>>>> Formatting changes were a bit of a distraction.
>>>>>> Yes, I have no idea what got into me. Normally I do formatting
>>>>>> changes separately so the noise does not distract...
>>>>>>
>>>>>> It turns out there is a constant defined that should be used
>>>>>> instead of all these literal '2's:
>>>>>>
>>>>>> src/share/vm/oops/markOop.hpp:         monitor_value = 2
>>>>>>
>>>>>> Typically used as follows:
>>>>>>
>>>>>> src/cpu/x86/vm/macroAssembler_x86.cpp:  int owner_offset =
>>>>>> ObjectMonitor::owner_offset_in_bytes() - markOopDesc::monitor_value;
>>>>>>
>>>>>> I will clean this up just for the files that I'm touching as
>>>>>> part of this fix.
>>>>>>
>>>>>>
>>>>>>> ---
>>>>>>>
>>>>>>> src/cpu/x86/vm/macroAssembler_x86.cpp
>>>>>>>
>>>>>>> Formatting changes were a bit of a distraction.
>>>>>> Same reply as for macroAssembler_sparc.cpp.
>>>>>>
>>>>>>
>>>>>>> 1929     // unconditionally set stackBox->_displaced_header = 3
>>>>>>> 1930     movptr(Address(boxReg, 0),
>>>>>>> (int32_t)intptr_t(markOopDesc::unused_mark()));
>>>>>>>
>>>>>>> At 1870 we refer to box rather than stackBox. Also it takes some
>>>>>>> sleuthing to realize that "3" here is somehow a pseudonym for
>>>>>>> unused_mark(). Back up at 1808 we have a to-do:
>>>>>>>
>>>>>>> 1808     //   use markOop::unused_mark() instead of "3".
>>>>>>>
>>>>>>> so the current change seems to be implementing that, even though
>>>>>>> other uses of "3" are left untouched.
>>>>>> I'll take a look at cleaning those up also...
>>>>>>
>>>>>> In some cases markOopDesc::marked_value will work for the literal '3',
>>>>>> but in other cases we'll use markOop::unused_mark():
>>>>>>
>>>>>>   static markOop unused_mark() {
>>>>>>     return (markOop) marked_value;
>>>>>>   }
>>>>>>
>>>>>> to save us the noise of the (markOop) cast.
>>>>>>
>>>>>>
>>>>>>> ---
>>>>>>>
>>>>>>> src/share/vm/runtime/sharedRuntime.cpp
>>>>>>>
>>>>>>> 1794 JRT_BLOCK_ENTRY(void,
>>>>>>> SharedRuntime::complete_monitor_locking_C(oopDesc* _obj, BasicLock*
>>>>>>> lock, JavaThread* thread))
>>>>>>> 1795   if (!SafepointSynchronize::is_synchronizing()) {
>>>>>>> 1796     if (ObjectSynchronizer::quick_enter(_obj, thread, lock))
>>>>>>> return;
>>>>>>>
>>>>>>> Is it necessary to check is_synchronizing? If we are executing this
>>>>>>> code we are not at a safepoint and the quick_enter wont change that,
>>>>>>> so I'm not sure what we are guarding against.
>>>>>> So this first state checker:
>>>>>>
>>>>>> src/share/vm/runtime/safepoint.hpp:
>>>>>> inline static bool is_synchronizing()  { return _state ==
>>>>>> _synchronizing;  }
>>>>>>
>>>>>> means that we want to go to a safepoint and:
>>>>>>
>>>>>> inline static bool is_at_safepoint()   { return _state ==
>>>>>> _synchronized;  }
>>>>>>
>>>>>> means that we are at a safepoint. Dice's optimization bails out if
>>>>>> we want to go to a safepoint and ObjectSynchronizer::quick_enter()
>>>>>> has a "No_Safepoint_Verifier nsv" in it so we're expecting that
>>>>>> code to be quick (and not go to a safepoint). I'm not seeing
>>>>>> anything obvious....
>>>>>>
>>>>>> Sometimes we have to be careful with JavaThread suspend requests and
>>>>>> monitor acquisition, but I don't think that's a problem here... In
>>>>>> order for the "suspend requesting" thread to be surprised, the suspend
>>>>>> API, e.g., JVM/TI SuspendThread() has to return to the caller and then
>>>>>> the suspend target has do something unexpected like acquire a monitor
>>>>>> that it was previously blocked upon when it was suspended. We've had
>>>>>> bugs like that in the past... In this optimization case, our target
>>>>>> thread is not blocked on a contended monitor...
>>>>>>
>>>>>> In this particular case, the "suspend requesting" thread will set the
>>>>>> suspend request state on the target thread, but the target thread is
>>>>>> busy trying to enter this uncontended monitor (quickly). So the
>>>>>> "suspend requesting" thread, will request a no-op safepoint, but it
>>>>>> won't return from the suspend API until that safepoint completes.
>>>>>> The safepoint won't complete until the target thread is done acquiring
>>>>>> the previously uncontended monitor... so the target thread will be
>>>>>> suspended while holding the previous uncontended monitor and the
>>>>>> "suspend requesting" thread will return from the suspend API all
>>>>>> happy...
>>>>>>
>>>>>> Well, I don't see the reason either so I'll have to ping Dave Dice
>>>>>> and Karen Kinnear to see if either of them can fill in the history
>>>>>> here. This could be an abundance of caution case.
>>>>>>
>>>>>>
>>>>>>> ---
>>>>>>>
>>>>>>> src/share/vm/runtime/synchronizer.cpp
>>>>>>>
>>>>>>> Minor nit: line 153 the usual acronym is NPE (for
>>>>>>> NullPointerException) not NPX
>>>>>> I'll do a search for uses of NPX and other uses of 'X' in exception
>>>>>> acronyms...
>>>>>>
>>>>>>
>>>>>>> Nit:  159     Thread * const ox
>>>>>>>
>>>>>>> Please change ox to owner.
>>>>>> Will do.
>>>>>>
>>>>>> Thanks again for the review!
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>>> ---
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 31/10/2014 8:50 AM, Daniel D. Daugherty wrote:
>>>>>>>> Greetings,
>>>>>>>>
>>>>>>>> I have the Contended Locking fast enter bucket ready for review.
>>>>>>>>
>>>>>>>> The code changes in this bucket are primarily a quick_enter()
>>>>>>>> function that works on inflated but uncontended Java monitors.
>>>>>>>> This quick_enter() function is used on the "slow path" for Java
>>>>>>>> Monitor enter operations when the built-in "fast path" (read
>>>>>>>> assembly code) doesn't work.
>>>>>>>>
>>>>>>>> This work is being tracked by the following bug ID:
>>>>>>>>
>>>>>>>>      JDK-8061553 Contended Locking fast enter bucket
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8061553
>>>>>>>>
>>>>>>>> Here is the webrev URL:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8061553-webrev/0-jdk9-hs-rt/
>>>>>>>>
>>>>>>>> Here is the JEP link:
>>>>>>>>
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8046133
>>>>>>>>
>>>>>>>> 8061553 summary of changes:
>>>>>>>>
>>>>>>>> macroAssembler_sparc.cpp: MacroAssembler::compiler_lock_object()
>>>>>>>>
>>>>>>>> - clean up spacing around some
>>>>>>>>    'ObjectMonitor::owner_offset_in_bytes() - 2' uses
>>>>>>>> - remove optional (EmitSync & 64) code
>>>>>>>> - change from cmp() to andcc() so icc.zf flag is set
>>>>>>>>
>>>>>>>> macroAssembler_x86.cpp: MacroAssembler::fast_lock()
>>>>>>>>
>>>>>>>> - remove optional (EmitSync & 2) code
>>>>>>>> - rewrite LP64 inflated lock code that tries to CAS in
>>>>>>>>    the new owner value to be more efficient
>>>>>>>>
>>>>>>>> interfaceSupport.hpp:
>>>>>>>>
>>>>>>>> - add JRT_BLOCK_NO_ASYNC to permit splitting a
>>>>>>>>    JRT_BLOCK_ENTRY into two pieces.
>>>>>>>>
>>>>>>>> sharedRuntime.cpp: SharedRuntime::complete_monitor_locking_C()
>>>>>>>>
>>>>>>>> - change entry type from JRT_ENTRY_NO_ASYNC to JRT_BLOCK_ENTRY
>>>>>>>>    to permit ObjectSynchronizer::quick_enter() call
>>>>>>>> - add JRT_BLOCK_NO_ASYNC use if the quick_enter() doesn't work
>>>>>>>>    to revert to JRT_ENTRY_NO_ASYNC-like semantics
>>>>>>>>
>>>>>>>> synchronizer.[ch]pp:
>>>>>>>>
>>>>>>>> - add ObjectSynchronizer::quick_enter() for entering an
>>>>>>>>    inflated but unowned Java monitor without thread state
>>>>>>>>    changes
>>>>>>>>
>>>>>>>> Testing:
>>>>>>>>
>>>>>>>> - Aurora Adhoc RT/SVC baseline batch
>>>>>>>> - JPRT test jobs
>>>>>>>> - MonitorEnterStresser micro-benchmark (in process)
>>>>>>>> - CallTimerGrid stress testing (in process)
>>>>>>>> - Aurora performance testing:
>>>>>>>>    - out of the box for the "promotion" and 32-bit server configs
>>>>>>>>    - heavy weight monitors for the "promotion" and 32-bit server
>>>>>>>> configs
>>>>>>>>      (-XX:-UseBiasedLocking -XX:+UseHeavyMonitors)
>>>>>>>>      (in process)
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>>>>>
>>>>>>>> Dan
>>>>>>
>>>>>
>>>>>