RFR (XL) 8031320: Use Intel RTM instructions for locks
Igor Veresov
igor.veresov at oracle.com
Thu Mar 20 03:48:17 UTC 2014
Thanks for renaming the registers! But I noticed some inconsistencies in the comments as a result:
1334 // Perform abort ratio calculation, set no_rtm bit if high ratio
1335 // input: rtm_counters_Reg (RTMLockingCounters* address)
1336 // tmpReg, scrReg and flags as scratch
1337 void MacroAssembler::rtm_abort_ratio_calculation(Register tmpReg,
1338 Register rtm_counters_Reg,
1339 RTMLockingCounters* rtm_counters,
1340 Metadata* method_data) {
Should probably say that rtm_counters_Reg is killed; there’s no scrReg in the params anymore.
1392 // Update counters and perform abort ratio calculation
1393 // input: boxReg (object monitor address)
1394 // abort_status_Reg
1395 // rtm_counters_Reg, flags as scratch
1396 void MacroAssembler::rtm_profiling(Register abort_status_Reg,
1397 Register rtm_counters_Reg,
1398 RTMLockingCounters* rtm_counters,
1399 Metadata* method_data,
1400 bool profile_rtm) {
There doesn’t seem to be a boxReg here.
1412 // Perform abort ratio calculation, set dontelide bit and rtm_state
1413 // input: boxReg (object monitor address)
1414 // : rtm_counters_Reg
1415 // tmpReg, scrReg, flags as scratch
1416 assert(rtm_counters != NULL, "should not be NULL when profiling RTM");
1417 rtm_abort_ratio_calculation(abort_status_Reg, rtm_counters_Reg, rtm_counters, method_data);
Mentions boxReg that is no longer there.
1426 // Retry on abort if abort's status is 0x6: can retry (0x2) | memory conflict (0x4)
1427 // inputs: boxReg (monitor address)
1428 // : retry_count
1429 // : abort_status
1430 // output: retry_count decremented by 1
1431 // flags as scratch
1432 void MacroAssembler::rtm_retry_lock_on_abort(Register retry_count, Register box, Register abort_status, Label& retryLabel) {
Maybe add the .*_Reg or .*Reg suffix to these guys and update the comments?
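
For reference, the status test that comment describes can be sketched in plain C++. This is only an illustration, not the assembler code under review: the helper name and constants are hypothetical, and it assumes that any bit of the 0x6 mask makes the abort retryable, which the quoted comment does not spell out.

```cpp
#include <cassert>
#include <cstdint>

// Abort status bits named in the quoted comment.
constexpr uint32_t kAbortCanRetry       = 0x2;  // "can retry"
constexpr uint32_t kAbortMemoryConflict = 0x4;  // "memory conflict"
constexpr uint32_t kRetryMask = kAbortCanRetry | kAbortMemoryConflict;  // 0x6

// Hypothetical helper: returns true and decrements retry_count when another
// attempt should be made. Assumption: any bit of the 0x6 mask qualifies.
bool should_retry_on_abort(uint32_t abort_status, int& retry_count) {
  assert(retry_count >= 0);
  if ((abort_status & kRetryMask) == 0) return false;  // not a retryable abort
  if (retry_count == 0) return false;                  // retries exhausted
  --retry_count;                                       // output: decremented by 1
  return true;
}
```

A caller would loop back to the transaction start while this returns true and fall through to the normal locking path otherwise.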
1448 // Spin and retry if lock is busy,
1449 // inputs: box (monitor address)
1450 // : retry_count
1451 // output: retry_count decremented by 1
1452 // : clear z flag if retry count exceeded
1453 // scr as scratch
1454 void MacroAssembler::rtm_retry_lock_on_busy(Register retry_count, Register box, Register tmp, Register scr, Label& retryLabel) {
.*_Reg and/or .*Reg?
1542 // Use RTM for inflating locks
1543 // Inputes: objReg (object to lock)
1544 // boxReg (on-stack box address (displaced header location) - KILLED)
1545 // tmpReg (ObjectMonitor address + 2(monitor_value))
1546 void MacroAssembler::rtm_inflated_locking(Register objReg, Register boxReg, Register tmpReg,
1547 Register scrReg, Register retry_on_busy_count_Reg,
1548 Register retry_on_abort_count_Reg,
1549 RTMLockingCounters* rtm_counters,
1550 Metadata* method_data, bool profile_rtm,
1551 Label& DONE_LABEL) {
Typo in “Inputes”.
1596 // retry on lock abort if abort status is one of 0xD
1597 // inputs: boxReg (monitor address)
1598 // : retry_on_abort_count_Reg
1599 // : abort_status_Reg
1600 // output: tmpReg set to boxReg, cx2Reg decremented by 1
1601 rtm_retry_lock_on_abort(retry_on_abort_count_Reg, boxReg, abort_status_Reg, L_rtm_retry);
The output section of the comment mentions regs that are not passed.
igor
On Mar 19, 2014, at 7:47 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
> I updated the changes based on the reviews.
>
> http://cr.openjdk.java.net/~kvn/8031320_9/webrev.01/
>
> Main changes are in macroAssembler_x86.cpp. I moved the RTM code from the fast_lock() method into separate methods: rtm_stack_locking(), rtm_inflated_locking() and, for common RTM code, rtm_profiling().
> I renamed registers in local scopes to reflect the values they contain.
> I removed some asm instructions whose results are not used (experimental code leftover).
>
> 3 flags were converted to product flags: UseRTMLocking, UseRTMDeopt, RTMRetryCount.
>
> In phase1.cpp I used TypeMetadataPtr for the MDO pointer instead of RawPtr. I hit a bug in TypeMetadataPtr::xmeet() and fixed it.
>
> Thanks,
> Vladimir
>
> On 3/17/14 12:11 PM, Vladimir Kozlov wrote:
>> https://bugs.openjdk.java.net/browse/JDK-8031320
>> http://cr.openjdk.java.net/~kvn/8031320_9/webrev/
>>
>> The Intel architecture codenamed Haswell supports the RTM
>> (Restricted Transactional Memory) instructions xbegin, xabort, xend and
>> xtest as part of the Intel Transactional Synchronization Extensions (TSX).
>> The xbegin and xend instructions enclose a set of instructions to be
>> executed as a transaction. If no conflict is found during execution of
>> the transaction, the memory and register modifications are committed
>> together at xend. The xabort instruction can be used to abort a
>> transaction explicitly, and xtest checks whether we are in a transaction.
>>
>> RTM is useful for highly contended locks with low conflict in the
>> critical region. Highly contended locks don't scale well otherwise,
>> but with RTM they show good scaling. RTM allows applications to use
>> coarse-grained locking. Also, for lightly contended locks which are
>> used by different threads, RTM can reduce cache line ping-pong and
>> thereby improve performance too.
>>
>> Implementation:
>>
>> Generate RTM locking code for all inflated locks when the "UseRTMLocking"
>> option is on, with normal locking as the fallback mechanism. On abort or
>> lock busy the lock will be retried a fixed number of times, as specified
>> by the "RTMRetryCount" option. Locks which abort too often can be
>> auto-tuned or manually tuned.
>>
>> Auto-tuning can be done using the "UseRTMDeopt" flag, which adds abort
>> ratio calculation code for each lock. The abort ratio is calculated
>> after "RTMAbortThreshold" aborts are encountered.
>> With "UseRTMDeopt", if the abort ratio reaches "RTMAbortRatio" the
>> nmethod containing the lock will be deoptimized and recompiled with all
>> locks as normal (stack) locks. If the abort ratio continues to remain
>> low after "RTMLockingThreshold" attempted locks, then the method will be
>> deoptimized and recompiled with all locks as RTM locks without abort
>> ratio calculation code. The abort ratio calculation can be delayed by
>> specifying the -XX:RTMLockingCalculationDelay=<millisec> flag.
>> Deoptimization of the nmethod is done by adding an uncommon trap at the
>> beginning of the code which checks the rtm state field in the MDO, which
>> is modified by the abort calculation code.
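
The auto-tuning decision described above can be sketched as a small C++ function. This is a rough model under stated assumptions, not the generated code: the type and function names are hypothetical, the struct fields merely mirror the flag names, and RTMAbortRatio is assumed to be a percentage.

```cpp
#include <cassert>
#include <cstdint>

enum class RtmDecision { KeepProfiling, UseNormalLocks, UseRtmWithoutProfiling };

// Hypothetical container; fields mirror the flag names from the description.
struct RtmTuning {
  uint64_t abort_threshold;    // RTMAbortThreshold
  uint64_t abort_ratio_pct;    // RTMAbortRatio (assumed to be a percentage)
  uint64_t locking_threshold;  // RTMLockingThreshold
};

RtmDecision decide(uint64_t total_locks, uint64_t aborts, const RtmTuning& t) {
  assert(t.abort_ratio_pct <= 100);
  // The ratio is only evaluated once enough aborts have been seen.
  if (aborts >= t.abort_threshold &&
      aborts * 100 >= total_locks * t.abort_ratio_pct) {
    return RtmDecision::UseNormalLocks;  // recompile with normal locks
  }
  // Ratio stayed low for long enough: drop the calculation code.
  if (total_locks >= t.locking_threshold) {
    return RtmDecision::UseRtmWithoutProfiling;
  }
  return RtmDecision::KeepProfiling;
}
```

In this model either non-profiling outcome corresponds to a deoptimization followed by a recompile of the whole method.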
>>
>> For manual tuning, the abort statistics for each lock can be provided
>> to the user with the "PrintPreciseRTMLockingStatistics" diagnostic flag.
>> Based on the abort statistics, users can create a .hotspot_compiler file
>> or use the -XX:CompileCommand=<option> flag to specify the methods for
>> which to disable RTM locking, using <option> "NoRTMLockEliding", or to
>> always enable RTM locking, using <option> "UseRTMLockEliding".
>>
>> The abort calculation and statistics collection are done using
>> RTMLockingCounters wrapped into RTMLockingNamedCounter counters, which
>> are generated for each lock. To reduce the burden on the cache line, the
>> RTM lock total counter is updated randomly at the RTMTotalCountIncrRate rate.
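
The randomly updated total counter can be illustrated with a sampled counter in C++. This is a sketch only: the class is hypothetical, the rate is assumed to be a power of two, and std::mt19937 stands in for whatever source of randomness the generated code actually uses.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical sampled counter: on average only one event in `rate` touches
// the shared count, so the cache line holding it is written far less often.
class SampledCounter {
 public:
  explicit SampledCounter(uint32_t rate, uint32_t seed = 1)
      : rate_(rate), rng_(seed) {
    assert(rate != 0 && (rate & (rate - 1)) == 0);  // assumed power of two
  }
  void record_event() {
    // Increment only when the low bits of the random value are all zero.
    if ((rng_() & (rate_ - 1)) == 0) ++count_;
  }
  uint64_t raw() const { return count_; }
  // Scale back up to approximate the true number of events.
  uint64_t estimate() const { return count_ * rate_; }

 private:
  uint32_t rate_;
  std::mt19937 rng_;
  uint64_t count_ = 0;
};
```

The estimate is approximate by construction, which is acceptable when the counter only feeds a ratio.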
>>
>> Note, both auto and manual tuning are done for the whole method. There is
>> no mechanism to tune an individual lock.
>>
>> RTM locking can be used for normal (stack) locks by specifying
>> "UseRTMForStackLocks" flag.
>>
>> RTM locking code requires that biased locking is switched off because it
>> conflicts with it. RTM locking is most useful when there is high lock
>> contention and low data contention. With high lock contention the lock
>> is usually inflated and biased locking is not suitable for that case
>> anyway.
>>
>> It was requested that this code not affect other platforms. For that
>> reason most of the code is put under #if INCLUDE_RTM_OPT, which is
>> defined only for X86 and C2 and not EMBEDDED.
>>
>> All new RTM flags are declared as experimental and require specifying
>> the "UnlockExperimentalVMOptions" flag.
>>
>>
>> SQE did full testing on these changes. Additional tests were developed.
>>
>> Thanks,
>> Vladimir