RFR (XL) 8031320: Use Intel RTM instructions for locks

Vladimir Kozlov vladimir.kozlov at oracle.com
Thu Mar 20 05:09:47 UTC 2014


On 3/19/14 8:48 PM, Igor Veresov wrote:
> Thanks for renaming the registers! But I noticed some inconsistencies in the comments that resulted from it.
>
>
> 1334 // Perform abort ratio calculation, set no_rtm bit if high ratio
> 1335 // input:  rtm_counters_Reg (RTMLockingCounters* address)
> 1336 // tmpReg, scrReg and flags as scratch
> 1337 void MacroAssembler::rtm_abort_ratio_calculation(Register tmpReg,
> 1338                                                  Register rtm_counters_Reg,
> 1339                                                  RTMLockingCounters* rtm_counters,
> 1340                                                  Metadata* method_data) {
>
> It should probably say that rtm_counters_Reg is killed; there’s no scrReg in the params anymore.

Done. I also replaced all instances of "as scratch" with "are killed".

>
> 1392 // Update counters and perform abort ratio calculation
> 1393 // input:  boxReg (object monitor address)
> 1394 //         abort_status_Reg
> 1395 // rtm_counters_Reg, flags as scratch
> 1396 void MacroAssembler::rtm_profiling(Register abort_status_Reg,
> 1397                                    Register rtm_counters_Reg,
> 1398                                    RTMLockingCounters* rtm_counters,
> 1399                                    Metadata* method_data,
> 1400                                    bool profile_rtm) {
>
> There doesn’t seem to be a boxReg here.

Fixed.

>
> 1412     // Perform abort ratio calculation, set dontelide bit and rtm_state
> 1413     // input:  boxReg (object monitor address)
> 1414     //      :  rtm_counters_Reg
> 1415     // tmpReg, scrReg, flags as scratch
> 1416     assert(rtm_counters != NULL, "should not be NULL when profiling RTM");
> 1417     rtm_abort_ratio_calculation(abort_status_Reg, rtm_counters_Reg, rtm_counters, method_data);
>
>
> Mentions boxReg that is no longer there.

I removed this whole comment because rtm_abort_ratio_calculation() has it.

>
>
> 1426 // Retry on abort if abort's status is 0x6: can retry (0x2) | memory conflict (0x4)
> 1427 // inputs: boxReg (monitor address)
> 1428 //       : retry_count
> 1429 //       : abort_status
> 1430 // output: retry_count decremented by 1
> 1431 // flags as scratch
> 1432 void MacroAssembler::rtm_retry_lock_on_abort(Register retry_count, Register box, Register abort_status, Label& retryLabel) {
>
> Maybe add the .*_Reg or .*Reg suffix to these guys and update the comments?
>
> 1448 // Spin and retry if lock is busy,
> 1449 // inputs: box (monitor address)
> 1450 //       : retry_count
> 1451 // output: retry_count decremented by 1
> 1452 //       : clear z flag if retry count exceeded
> 1453 // scr as scratch
> 1454 void MacroAssembler::rtm_retry_lock_on_busy(Register retry_count, Register box, Register tmp, Register scr, Label& retryLabel) {
>
> .*_Reg and/or .*Reg?

I added the _Reg suffix in these 2 methods.

>
> 1542 // Use RTM for inflating locks
> 1543 // Inputes: objReg (object to lock)
> 1544 //          boxReg (on-stack box address (displaced header location) - KILLED)
> 1545 //          tmpReg (ObjectMonitor address + 2(monitor_value))
> 1546 void MacroAssembler::rtm_inflated_locking(Register objReg, Register boxReg, Register tmpReg,
> 1547                                           Register scrReg, Register retry_on_busy_count_Reg,
> 1548                                           Register retry_on_abort_count_Reg,
> 1549                                           RTMLockingCounters* rtm_counters,
> 1550                                           Metadata* method_data, bool profile_rtm,
> 1551                                           Label& DONE_LABEL) {
>
> Typo in “Inputes”.

Fixed.

>
>
> 1596     // retry on lock abort if abort status is one of 0xD
> 1597     // inputs: boxReg (monitor address)
> 1598     //       : retry_on_abort_count_Reg
> 1599     //       : abort_status_Reg
> 1600     // output: tmpReg set to boxReg, cx2Reg decremented by 1
> 1601     rtm_retry_lock_on_abort(retry_on_abort_count_Reg, boxReg, abort_status_Reg, L_rtm_retry);
>
> The output section of the comment mentions regs that are not passed.

Replaced with a one-line comment:

     // retry on lock abort if abort status is 'can retry' (0x2) or 'memory conflict' (0x4)

Thanks,
Vladimir

>
>
> igor
>
> On Mar 19, 2014, at 7:47 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>
>> I updated the changes based on the reviews.
>>
>> http://cr.openjdk.java.net/~kvn/8031320_9/webrev.01/
>>
>> The main changes are in macroAssembler_x86.cpp. I moved the RTM code from the fast_lock() method into separate methods: rtm_stack_locking(), rtm_inflated_locking(), and rtm_profiling() for the common RTM code.
>> I renamed registers in local scopes to reflect the values they contain.
>> I removed some asm instructions whose results are not used (leftovers from experimental code).
>>
>> 3 flags were converted to product flags: UseRTMLocking, UseRTMDeopt, RTMRetryCount.
>>
>> In phase1.cpp I used TypeMetadataPtr for the MDO pointer instead of RawPtr. This hit a bug in TypeMetadataPtr::xmeet(), which I fixed.
>>
>> Thanks,
>> Vladimir
>>
>> On 3/17/14 12:11 PM, Vladimir Kozlov wrote:
>>> https://bugs.openjdk.java.net/browse/JDK-8031320
>>> http://cr.openjdk.java.net/~kvn/8031320_9/webrev/
>>>
>>> Intel architectures codenamed Haswell support the RTM
>>> (Restricted Transactional Memory) instructions xbegin, xabort, xend and
>>> xtest as part of Intel Transactional Synchronization Extensions (TSX).
>>> The xbegin and xend instructions enclose a set of instructions to be
>>> executed as a transaction. If no conflict is found during execution of
>>> the transaction, the memory and register modifications are committed
>>> together at xend. If the transaction aborts, its memory updates are
>>> discarded, registers are restored (except EAX, which receives the abort
>>> status), and execution resumes at the fallback address given to xbegin.
>>> The xabort instruction can be used to explicitly abort a transaction,
>>> and xtest to check whether execution is inside a transaction.
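As an illustration of the xbegin/xabort/xend protocol described above, here is a minimal C++ lock-elision sketch. It is not the HotSpot code, which emits these instructions from MacroAssembler. It assumes GCC or Clang on x86, using the <immintrin.h> RTM intrinsics _xbegin, _xend and _xabort, plus <cpuid.h> for the CPUID check. SpinLock, try_elided, run_locked and SKETCH_HAVE_X86 are hypothetical names. When RTM is unavailable or the transaction aborts, the sketch takes the lock normally.

```cpp
#include <atomic>
#include <cassert>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#include <immintrin.h>
#define SKETCH_HAVE_X86 1  // hypothetical guard for the RTM path
#endif

// Hypothetical test-and-set spinlock used only for this illustration.
struct SpinLock {
  std::atomic<bool> held{false};
  void lock() {
    while (held.exchange(true, std::memory_order_acquire)) {
      // spin until the current holder calls unlock()
    }
  }
  void unlock() {
    assert(held.load(std::memory_order_relaxed) && "unlock of a free lock");
    held.store(false, std::memory_order_release);
  }
};

#ifdef SKETCH_HAVE_X86
// CPUID leaf 7, subleaf 0: bit 11 of EBX reports RTM support.
static bool cpu_has_rtm() {
  if (__get_cpuid_max(0, nullptr) < 7) return false;
  unsigned a, b, c, d;
  __cpuid_count(7, 0, a, b, c, d);
  return ((b >> 11) & 1u) != 0;
}

// Run `work` as a transaction that only reads the lock word.
// Returns true if the transaction committed at _xend().
template <typename F>
__attribute__((target("rtm"))) bool try_elided(SpinLock& l, F& work) {
  unsigned status = _xbegin();
  if (status == _XBEGIN_STARTED) {
    // The lock word is now in the transaction's read set; if it is held,
    // abort explicitly and let the caller take the lock for real.
    if (l.held.load(std::memory_order_relaxed)) _xabort(0xff);
    work();
    _xend();  // commit all buffered updates atomically
    return true;
  }
  return false;  // aborted: `status` holds the abort reason bits
}
#endif

// Run `work` under `l`, eliding the lock with RTM when possible.
template <typename F>
void run_locked(SpinLock& l, F work) {
#ifdef SKETCH_HAVE_X86
  static const bool has_rtm = cpu_has_rtm();
  if (has_rtm && try_elided(l, work)) return;
#endif
  l.lock();  // fallback: normal locking
  work();
  l.unlock();
}
```

Reading the lock word inside the transaction is what makes elision safe. A thread that really acquires the lock writes that word, which aborts every concurrent transaction that has read it.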
>>>
>>> RTM is useful for highly contended locks with low data conflict in the
>>> critical region. Such locks otherwise don't scale well, but with RTM
>>> they show good scaling. RTM also allows applications to use
>>> coarse-grained locking. For lightly contended locks that are used by
>>> different threads, RTM can reduce cache-line ping-pong and thereby
>>> improve performance too.
>>>
>>> Implementation:
>>>
>>> When the "UseRTMLocking" option is on, RTM locking code is generated
>>> for all inflated locks, with normal locking as the fallback mechanism.
>>> On an abort, or when the lock is busy, locking is retried a fixed number
>>> of times as specified by the "RTMRetryCount" option. Locks that abort
>>> too often can be tuned automatically or manually.
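The retry-on-abort rule discussed in this thread can be modeled in plain C++. A try is retried when the abort status has the 'can retry' (0x2) or 'memory conflict' (0x4) bit set, at most RTMRetryCount times. This is a hypothetical, hardware-independent sketch: attempt() stands in for one xbegin..xend try, and lock_with_retries, LockPath and the k* constants are illustrative names. It covers only retry on abort, not the spin-and-retry path for a busy lock.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Abort-status bits named in this thread: retry if either bit is set.
constexpr uint32_t kAbortRetry    = 0x2;  // "can retry"
constexpr uint32_t kAbortConflict = 0x4;  // "memory conflict"
constexpr uint32_t kXbeginStarted = ~0u;  // value xbegin reports on start

enum class LockPath { Transaction, Fallback };

// Hypothetical model: `attempt` returns kXbeginStarted when the transaction
// commits, or an abort status otherwise. `retry_count` plays the role of
// RTMRetryCount. `attempts_made` reports how many tries were executed.
LockPath lock_with_retries(const std::function<uint32_t()>& attempt,
                           int retry_count, int* attempts_made) {
  assert(retry_count >= 0);
  *attempts_made = 0;
  for (;;) {
    ++*attempts_made;
    uint32_t status = attempt();
    if (status == kXbeginStarted) return LockPath::Transaction;
    bool retryable = (status & (kAbortRetry | kAbortConflict)) != 0;
    if (!retryable || retry_count == 0) return LockPath::Fallback;
    --retry_count;  // retry count decremented by 1 per retry
  }
}
```

For example, a status of 0x8 (which Intel documents as a capacity abort) has neither retry bit set, so it falls back to normal locking immediately. Retrying would most likely overflow the same buffer again.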
>>>
>>> Auto-tuning can be done using the "UseRTMDeopt" flag, which adds abort
>>> ratio calculation code for each lock. The abort ratio is calculated
>>> after "RTMAbortThreshold" aborts are encountered.
>>> With "UseRTMDeopt", if the abort ratio reaches "RTMAbortRatio", the
>>> nmethod containing the lock is deoptimized and recompiled with all
>>> locks as normal (stack) locks. If the abort ratio remains low after
>>> "RTMLockingThreshold" attempted locks, the method is deoptimized and
>>> recompiled with all locks as RTM locks without the abort ratio
>>> calculation code. The abort ratio calculation can be delayed by
>>> specifying the -XX:RTMLockingCalculationDelay=<millisec> flag.
>>> Deoptimization of an nmethod is done by adding an uncommon trap at the
>>> beginning of the code, which checks the rtm state field in the MDO; that
>>> field is modified by the abort calculation code.
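The auto-tuning decision can be sketched as a pure function of the counters. This is a simplified, hypothetical model, not the generated code. RtmState, RtmTuning and decide_rtm_state are illustrative names, and the fields only mirror the RTMAbortThreshold, RTMAbortRatio and RTMLockingThreshold flags. The sketch ignores RTMLockingCalculationDelay and the sampled total counter.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-method outcome of the abort ratio calculation.
enum class RtmState { Profile, NoRtm, AlwaysRtm };

struct RtmTuning {
  uint64_t abort_threshold;    // mirrors RTMAbortThreshold
  uint64_t abort_ratio_pct;    // mirrors RTMAbortRatio, in percent
  uint64_t locking_threshold;  // mirrors RTMLockingThreshold
};

// "aborts / total >= ratio%" is tested as "aborts * 100 >= total * ratio"
// to stay in integer arithmetic (fine for counts far below 2^64 / 100).
RtmState decide_rtm_state(uint64_t aborts, uint64_t total, const RtmTuning& t) {
  assert(aborts <= total);
  if (aborts >= t.abort_threshold &&
      aborts * 100 >= total * t.abort_ratio_pct) {
    return RtmState::NoRtm;      // deopt; recompile with stack locks
  }
  if (total >= t.locking_threshold) {
    return RtmState::AlwaysRtm;  // deopt; recompile RTM locks without ratio code
  }
  return RtmState::Profile;      // keep collecting statistics
}
```

Comparing products rather than dividing keeps the check to integer multiplies. That is the natural shape for a check emitted inline in compiled code.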
>>>
>>> For manual tuning, the abort statistics for each lock can be provided
>>> to the user with the "PrintPreciseRTMLockingStatistics" diagnostic flag.
>>> Based on these statistics, users can create a .hotspot_compiler file
>>> or use the -XX:CompileCommand=<option> flag to specify the methods for
>>> which RTM locking is disabled (<option> "NoRTMLockEliding") or always
>>> enabled (<option> "UseRTMLockEliding").
>>>
>>> The abort calculation and statistics collection use RTMLockingCounters,
>>> wrapped into RTMLockingNamedCounter counters, which are generated for
>>> each lock. To reduce pressure on the cache line, the total RTM lock
>>> counter is updated randomly, at a rate controlled by RTMTotalCountIncrRate.
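Random updating of the total counter can be sketched as a sampled counter. Assuming one increment per `rate` lock attempts on average, each attempt bumps the shared count with probability 1/rate, and the true total is estimated as count * rate. This is a hypothetical, single-threaded model: SampledCounter is an illustrative name. The use of std::mt19937_64 as the randomness source is an assumption, not necessarily what the generated code uses.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical sampled counter: most events never touch the shared count,
// which is the point when that count lives on a contended cache line.
class SampledCounter {
 public:
  // `rate` mirrors RTMTotalCountIncrRate; it must be a power of two so the
  // sampling test below is a cheap bit mask.
  SampledCounter(uint64_t rate, uint64_t seed) : rate_(rate), rng_(seed) {
    assert(rate_ != 0 && (rate_ & (rate_ - 1)) == 0);
  }
  void on_event() {
    if ((rng_() & (rate_ - 1)) == 0) ++count_;  // probability 1/rate
  }
  uint64_t raw() const { return count_; }
  // Unbiased estimate of the true number of events.
  uint64_t estimate() const { return count_ * rate_; }

 private:
  uint64_t rate_;
  std::mt19937_64 rng_;
  uint64_t count_ = 0;
};
```

The trade-off is precision for traffic. With rate 64, roughly one event in 64 writes the counter, and the estimate's relative error shrinks as the number of events grows.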
>>>
>>> Note that both automatic and manual tuning are done for the whole
>>> method. There is no mechanism to tune an individual lock.
>>>
>>> RTM locking can be used for normal (stack) locks by specifying the
>>> "UseRTMForStackLocks" flag.
>>>
>>> RTM locking code requires biased locking to be switched off because the
>>> two conflict. RTM locking is most useful when there is high lock
>>> contention and low data contention. With high lock contention the lock
>>> is usually inflated, and biased locking is not suitable for that case
>>> anyway.
>>>
>>> It was requested that this code not affect other platforms. For that
>>> reason most of the code is put under #if INCLUDE_RTM_OPT, which is
>>> defined only for X86 with C2 and not for EMBEDDED builds.
>>>
>>> All new RTM flags are declared experimental and require specifying the
>>> "UnlockExperimentalVMOptions" flag.
>>>
>>>
>>> SQE did full testing on these changes. Additional tests were developed.
>>>
>>> Thanks,
>>> Vladimir
>

