RFR(S) Contended Locking fast enter bucket (8061553)

Karen Kinnear karen.kinnear at oracle.com
Tue Nov 11 19:35:45 UTC 2014


Dan,

Code looks good.  I like your choices of changes to pick up.

Couple of minor questions/comments:

1. synchronizer.cpp: What does TLE stand for?
2. in macrosAssembler_x86.cpp - mind keeping the comment about // Without cat to int32_t a movptr will destroy R10 which is typically obj 

thanks,
Karen

p.s. I've forgotten - is the fast_notify in a different bucket?

On Nov 11, 2014, at 8:55 AM, Daniel D. Daugherty wrote:

> On 11/11/14 1:02 AM, David Holmes wrote:
>> On 7/11/2014 12:17 PM, Daniel D. Daugherty wrote:
>>> The fix for JDK-8062851 has been reviewed, tested and pushed to
>>> RT_Baseline.
>>> 
>>> Time to get back to this review thread so here's an updated webrev:
>>> 
>>> http://cr.openjdk.java.net/~dcubed/8061553-webrev/1-jdk9-hs-rt/
>>> 
>>> David H., I believe I've addressed all of your comments. Please
>>> let me know if I missed something...
>> 
>> Looks good to me - thanks Dan!
> 
> Thanks for the re-review!
> 
> Dan
> 
> 
>> 
>> David
>> -----
>> 
>>> Thanks, in advance, for any comments, questions or suggestions.
>>> 
>>> Dan
>>> 
>>> 
>>> On 11/4/14 11:26 AM, Daniel D. Daugherty wrote:
>>>> The cleanup is turning into a bigger change than the fast enter
>>>> bucket itself so I'm spinning the cleanup into a new bug:
>>>> 
>>>>    JDK-8062851 cleanup ObjectMonitor offset adjustments
>>>> https://bugs.openjdk.java.net/browse/JDK-8062851
>>>> 
>>>> Yes, this means that the Contended Locking cleanup bucket has reopened
>>>> for yet another change...
>>>> 
>>>> We'll get back to "fast enter" after the dust has settled...
>>>> 
>>>> Dan
>>>> 
>>>> 
>>>> On 11/3/14 6:59 PM, Daniel D. Daugherty wrote:
>>>>> David,
>>>>> 
>>>>> Thanks for the review! As usual, replies are embedded below...
>>>>> 
>>>>> 
>>>>> On 11/2/14 9:44 PM, David Holmes wrote:
>>>>>> Hi Dan,
>>>>>> 
>>>>>> Looks good.
>>>>> 
>>>>> Thanks!
>>>>> 
>>>>> 
>>>>>> Couple of nits and one semantic query below ...
>>>>>> 
>>>>>> src/cpu/sparc/vm/macroAssembler_sparc.cpp
>>>>>> 
>>>>>> Formatting changes were a bit of a distraction.
>>>>> 
>>>>> Yes, I have no idea what got into me. Normally I do formatting
>>>>> changes separately so the noise does not distract...
>>>>> 
>>>>> It turns out there is a constant defined that should be used
>>>>> instead of all these literal '2's:
>>>>> 
>>>>> src/share/vm/oops/markOop.hpp:         monitor_value = 2
>>>>> 
>>>>> Typically used as follows:
>>>>> 
>>>>> src/cpu/x86/vm/macroAssembler_x86.cpp:  int owner_offset =
>>>>> ObjectMonitor::owner_offset_in_bytes() - markOopDesc::monitor_value;
>>>>> 
>>>>> I will clean this up just for the files that I'm touching as
>>>>> part of this fix.
>>>>> 
>>>>> 
>>>>>> 
>>>>>> ---
>>>>>> 
>>>>>> src/cpu/x86/vm/macroAssembler_x86.cpp
>>>>>> 
>>>>>> Formatting changes were a bit of a distraction.
>>>>> 
>>>>> Same reply as for macroAssembler_sparc.cpp.
>>>>> 
>>>>> 
>>>>>> 1929     // unconditionally set stackBox->_displaced_header = 3
>>>>>> 1930     movptr(Address(boxReg, 0),
>>>>>> (int32_t)intptr_t(markOopDesc::unused_mark()));
>>>>>> 
>>>>>> At 1870 we refer to box rather than stackBox. Also it takes some
>>>>>> sleuthing to realize that "3" here is somehow a pseudonym for
>>>>>> unused_mark(). Back up at 1808 we have a to-do:
>>>>>> 
>>>>>> 1808     //   use markOop::unused_mark() instead of "3".
>>>>>> 
>>>>>> so the current change seems to be implementing that, even though
>>>>>> other uses of "3" are left untouched.
>>>>> 
>>>>> I'll take a look at cleaning those up also...
>>>>> 
>>>>> In some cases markOopDesc::marked_value will work for the literal '3',
>>>>> but in other cases we'll use markOop::unused_mark():
>>>>> 
>>>>>  static markOop unused_mark() {
>>>>>    return (markOop) marked_value;
>>>>>  }
>>>>> 
>>>>> to save us the noise of the (markOop) cast.
>>>>> 
>>>>> 
>>>>>> ---
>>>>>> 
>>>>>> src/share/vm/runtime/sharedRuntime.cpp
>>>>>> 
>>>>>> 1794 JRT_BLOCK_ENTRY(void,
>>>>>> SharedRuntime::complete_monitor_locking_C(oopDesc* _obj, BasicLock*
>>>>>> lock, JavaThread* thread))
>>>>>> 1795   if (!SafepointSynchronize::is_synchronizing()) {
>>>>>> 1796     if (ObjectSynchronizer::quick_enter(_obj, thread, lock))
>>>>>> return;
>>>>>> 
>>>>>> Is it necessary to check is_synchronizing? If we are executing this
>>>>>> code we are not at a safepoint and the quick_enter wont change that,
>>>>>> so I'm not sure what we are guarding against.
>>>>> 
>>>>> So this first state checker:
>>>>> 
>>>>> src/share/vm/runtime/safepoint.hpp:
>>>>> inline static bool is_synchronizing()  { return _state ==
>>>>> _synchronizing;  }
>>>>> 
>>>>> means that we want to go to a safepoint and:
>>>>> 
>>>>> inline static bool is_at_safepoint()   { return _state ==
>>>>> _synchronized;  }
>>>>> 
>>>>> means that we are at a safepoint. Dice's optimization bails out if
>>>>> we want to go to a safepoint and ObjectSynchronizer::quick_enter()
>>>>> has a "No_Safepoint_Verifier nsv" in it so we're expecting that
>>>>> code to be quick (and not go to a safepoint). I'm not seeing
>>>>> anything obvious....
>>>>> 
>>>>> Sometimes we have to be careful with JavaThread suspend requests and
>>>>> monitor acquisition, but I don't think that's a problem here... In
>>>>> order for the "suspend requesting" thread to be surprised, the suspend
>>>>> API, e.g., JVM/TI SuspendThread() has to return to the caller and then
>>>>> the suspend target has do something unexpected like acquire a monitor
>>>>> that it was previously blocked upon when it was suspended. We've had
>>>>> bugs like that in the past... In this optimization case, our target
>>>>> thread is not blocked on a contended monitor...
>>>>> 
>>>>> In this particular case, the "suspend requesting" thread will set the
>>>>> suspend request state on the target thread, but the target thread is
>>>>> busy trying to enter this uncontended monitor (quickly). So the
>>>>> "suspend requesting" thread, will request a no-op safepoint, but it
>>>>> won't return from the suspend API until that safepoint completes.
>>>>> The safepoint won't complete until the target thread is done acquiring
>>>>> the previously uncontended monitor... so the target thread will be
>>>>> suspended while holding the previous uncontended monitor and the
>>>>> "suspend requesting" thread will return from the suspend API all
>>>>> happy...
>>>>> 
>>>>> Well, I don't see the reason either so I'll have to ping Dave Dice
>>>>> and Karen Kinnear to see if either of them can fill in the history
>>>>> here. This could be an abundance of caution case.
>>>>> 
>>>>> 
>>>>>> ---
>>>>>> 
>>>>>> src/share/vm/runtime/synchronizer.cpp
>>>>>> 
>>>>>> Minor nit: line 153 the usual acronym is NPE (for
>>>>>> NullPointerException) not NPX
>>>>> 
>>>>> I'll do a search for uses of NPX and other uses of 'X' in exception
>>>>> acronyms...
>>>>> 
>>>>> 
>>>>>> 
>>>>>> Nit:  159     Thread * const ox
>>>>>> 
>>>>>> Please change ox to owner.
>>>>> 
>>>>> Will do.
>>>>> 
>>>>> Thanks again for the review!
>>>>> 
>>>>> Dan
>>>>> 
>>>>> 
>>>>>> 
>>>>>> ---
>>>>>> 
>>>>>> Thanks,
>>>>>> David
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On 31/10/2014 8:50 AM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>> 
>>>>>>> I have the Contended Locking fast enter bucket ready for review.
>>>>>>> 
>>>>>>> The code changes in this bucket are primarily a quick_enter()
>>>>>>> function that works on inflated but uncontended Java monitors.
>>>>>>> This quick_enter() function is used on the "slow path" for Java
>>>>>>> Monitor enter operations when the built-in "fast path" (read
>>>>>>> assembly code) doesn't work.
>>>>>>> 
>>>>>>> This work is being tracked by the following bug ID:
>>>>>>> 
>>>>>>>     JDK-8061553 Contended Locking fast enter bucket
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8061553
>>>>>>> 
>>>>>>> Here is the webrev URL:
>>>>>>> 
>>>>>>> http://cr.openjdk.java.net/~dcubed/8061553-webrev/0-jdk9-hs-rt/
>>>>>>> 
>>>>>>> Here is the JEP link:
>>>>>>> 
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8046133
>>>>>>> 
>>>>>>> 8061553 summary of changes:
>>>>>>> 
>>>>>>> macroAssembler_sparc.cpp: MacroAssembler::compiler_lock_object()
>>>>>>> 
>>>>>>> - clean up spacing around some
>>>>>>>   'ObjectMonitor::owner_offset_in_bytes() - 2' uses
>>>>>>> - remove optional (EmitSync & 64) code
>>>>>>> - change from cmp() to andcc() so icc.zf flag is set
>>>>>>> 
>>>>>>> macroAssembler_x86.cpp: MacroAssembler::fast_lock()
>>>>>>> 
>>>>>>> - remove optional (EmitSync & 2) code
>>>>>>> - rewrite LP64 inflated lock code that tries to CAS in
>>>>>>>   the new owner value to be more efficient
>>>>>>> 
>>>>>>> interfaceSupport.hpp:
>>>>>>> 
>>>>>>> - add JRT_BLOCK_NO_ASYNC to permit splitting a
>>>>>>>   JRT_BLOCK_ENTRY into two pieces.
>>>>>>> 
>>>>>>> sharedRuntime.cpp: SharedRuntime::complete_monitor_locking_C()
>>>>>>> 
>>>>>>> - change entry type from JRT_ENTRY_NO_ASYNC to JRT_BLOCK_ENTRY
>>>>>>>   to permit ObjectSynchronizer::quick_enter() call
>>>>>>> - add JRT_BLOCK_NO_ASYNC use if the quick_enter() doesn't work
>>>>>>>   to revert to JRT_ENTRY_NO_ASYNC-like semantics
>>>>>>> 
>>>>>>> synchronizer.[ch]pp:
>>>>>>> 
>>>>>>> - add ObjectSynchronizer::quick_enter() for entering an
>>>>>>>   inflated but unowned Java monitor without thread state
>>>>>>>   changes
>>>>>>> 
>>>>>>> Testing:
>>>>>>> 
>>>>>>> - Aurora Adhoc RT/SVC baseline batch
>>>>>>> - JPRT test jobs
>>>>>>> - MonitorEnterStresser micro-benchmark (in process)
>>>>>>> - CallTimerGrid stress testing (in process)
>>>>>>> - Aurora performance testing:
>>>>>>>   - out of the box for the "promotion" and 32-bit server configs
>>>>>>>   - heavy weight monitors for the "promotion" and 32-bit server
>>>>>>> configs
>>>>>>>     (-XX:-UseBiasedLocking -XX:+UseHeavyMonitors)
>>>>>>>     (in process)
>>>>>>> 
>>>>>>> 
>>>>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>>>> 
>>>>>>> Dan
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>> 
> 



More information about the hotspot-runtime-dev mailing list