RFR (XL) 8031320: Use Intel RTM instructions for locks

Vladimir Kozlov vladimir.kozlov at oracle.com
Thu Mar 20 02:47:25 UTC 2014


I updated the changes based on the reviews.

http://cr.openjdk.java.net/~kvn/8031320_9/webrev.01/

The main changes are in macroAssembler_x86.cpp. I moved the RTM code out 
of the fast_lock() method into separate methods: rtm_stack_locking() and 
rtm_inflated_locking(), plus rtm_profiling() for the common RTM code.
I renamed registers in local scopes to reflect the values they contain.
I removed some asm instructions whose results were not used (leftovers 
from experimental code).

3 flags were converted to product flags: UseRTMLocking, UseRTMDeopt, 
RTMRetryCount.

In parse1.cpp I used TypeMetadataPtr for the MDO pointer instead of 
RawPtr. I hit a bug in TypeMetadataPtr::xmeet() and fixed it.

Thanks,
Vladimir

On 3/17/14 12:11 PM, Vladimir Kozlov wrote:
> https://bugs.openjdk.java.net/browse/JDK-8031320
> http://cr.openjdk.java.net/~kvn/8031320_9/webrev/
>
> The Intel architecture codenamed Haswell supports the RTM
> (Restricted Transactional Memory) instructions xbegin, xabort, xend and
> xtest as part of the Intel Transactional Synchronization Extensions
> (TSX). The xbegin and xend instructions enclose a set of instructions
> to be executed as a transaction. If no conflict is found during
> execution of the transaction, the memory and register modifications are
> committed together at xend. The xabort instruction can be used to abort
> a transaction explicitly, and xtest checks whether execution is inside
> a transaction.
>
> RTM is useful for highly contended locks with low conflict in the
> critical region. Such locks don't scale well otherwise, but with RTM
> they show good scaling. This allows applications to use coarse-grained
> locking. For lightly contended locks that are used by different
> threads, RTM can also reduce cache line ping-pong and thereby improve
> performance.
>
> Implementation:
>
> When the "UseRTMLocking" option is on, RTM locking code is generated
> for all inflated locks, with normal locking as the fallback mechanism.
> On abort or lock busy, the lock is retried a fixed number of times, as
> specified by the "RTMRetryCount" option. Locks which abort too often
> can be auto-tuned or manually tuned.
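
The retry-then-fall-back policy described above can be sketched in plain C++. This is only an illustration, not the generated assembly: the names lock_with_retries, try_transaction and fallback_lock are hypothetical, and counting one initial attempt plus retry_count retries is an assumption about how "RTMRetryCount" is applied.

```cpp
#include <cassert>
#include <functional>

// How the lock ended up being taken in this illustration.
enum class Acquired { Transactional, Fallback };

// try_transaction: hypothetical callable, returns true if the transactional
//                  attempt succeeded, false on abort or lock busy.
// fallback_lock:   hypothetical callable that takes the lock the normal way.
// retry_count:     stands in for the RTMRetryCount value.
inline Acquired lock_with_retries(const std::function<bool()>& try_transaction,
                                  const std::function<void()>& fallback_lock,
                                  int retry_count) {
  assert(retry_count >= 0);
  // Assumption: one initial attempt, then up to retry_count retries.
  for (int attempt = 0; attempt <= retry_count; ++attempt) {
    if (try_transaction()) {
      return Acquired::Transactional;
    }
  }
  // All transactional attempts failed: use normal locking.
  fallback_lock();
  return Acquired::Fallback;
}
```

A caller would pass a lambda wrapping the transactional path and another wrapping the normal lock; the return value only reports which path was taken.
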
>
> Auto-tuning is enabled with the "UseRTMDeopt" flag, which adds abort
> ratio calculation code for each lock. The abort ratio is calculated
> after "RTMAbortThreshold" aborts are encountered.
> With "UseRTMDeopt", if the abort ratio reaches "RTMAbortRatio", the
> nmethod containing the lock is deoptimized and recompiled with all
> locks as normal (stack) locks. If the abort ratio remains low after
> "RTMLockingThreshold" attempted locks, the method is deoptimized and
> recompiled with all locks as RTM locks, without the abort ratio
> calculation code. The abort ratio calculation can be delayed by
> specifying the -XX:RTMLockingCalculationDelay=<millisec> flag.
> Deoptimization of the nmethod is done by adding an uncommon trap at the
> beginning of the code. The trap checks the rtm state field in the MDO,
> which is modified by the abort calculation code.
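
The auto-tuning decision described above can be written as a small pure function. This is a sketch under stated assumptions rather than the code in the webrev: RtmTuning, RtmDecision and decide are hypothetical names, the ratio is assumed to be a percentage of attempted locks, and the numbers in the usage note are illustrative, not defaults.

```cpp
#include <cassert>

// Possible outcomes for a method that is still being profiled.
enum class RtmDecision {
  KeepProfiling,           // not enough data yet
  DisableRtm,              // recompile with normal (stack) locks
  UseRtmWithoutProfiling   // recompile with RTM locks, no ratio code
};

// Hypothetical holder for the flag values named in the description.
struct RtmTuning {
  long abort_threshold;      // RTMAbortThreshold
  long abort_ratio_percent;  // RTMAbortRatio (assumed to be a percentage)
  long locking_threshold;    // RTMLockingThreshold
};

inline RtmDecision decide(long aborts, long attempts, const RtmTuning& t) {
  assert(aborts >= 0 && attempts >= aborts);
  // The ratio is only evaluated once enough aborts have been seen.
  if (aborts >= t.abort_threshold &&
      aborts * 100 >= attempts * t.abort_ratio_percent) {
    return RtmDecision::DisableRtm;
  }
  // Ratio stayed low for long enough: keep RTM, drop the profiling code.
  if (attempts >= t.locking_threshold) {
    return RtmDecision::UseRtmWithoutProfiling;
  }
  return RtmDecision::KeepProfiling;
}
```

For example, with illustrative values RtmTuning{1000, 50, 10000}, 1000 aborts out of 1500 attempts gives DisableRtm, while 1000 aborts out of 20000 attempts gives UseRtmWithoutProfiling.
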
>
> For manual tuning, the abort statistics for each lock can be provided
> to the user with the "PrintPreciseRTMLockingStatistics" diagnostic flag.
> Based on the abort statistics, users can create a .hotspot_compiler file
> or use the -XX:CompileCommand=<option> flag to specify the methods for
> which RTM locking is disabled, using <option> "NoRTMLockEliding", or
> always enabled, using <option> "UseRTMLockEliding".
>
> The abort calculation and statistics collection are done using
> RTMLockingCounters wrapped into RTMLockingNamedCounter counters, which
> are generated for each lock. To reduce the burden on the cache line,
> the RTM lock total counter is updated randomly at the rate given by
> RTMTotalCountIncrRate.
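
The idea of updating the total counter only on a random subset of attempts can be illustrated as follows. This is a sketch, not the counter code from the change: SampledCounter is a hypothetical class, the rate is assumed to be a power of two so that a mask can be used, and std::mt19937 stands in for whatever cheap randomness source the generated code uses.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Counts roughly one in every incr_rate events, so the shared counter
// (and its cache line) is written far less often than once per lock.
class SampledCounter {
 public:
  // incr_rate stands in for RTMTotalCountIncrRate; assumed power of two.
  SampledCounter(uint32_t incr_rate, uint32_t seed)
      : mask_(incr_rate - 1), rng_(seed) {
    assert(incr_rate != 0 && (incr_rate & (incr_rate - 1)) == 0);
  }

  // Called on every lock attempt; increments with probability 1/incr_rate.
  void record_attempt() {
    if ((rng_() & mask_) == 0) {
      ++count_;
    }
  }

  // Number of increments actually performed.
  uint64_t sampled() const { return count_; }

  // Scaled estimate of the true number of attempts.
  uint64_t estimated_total() const {
    return count_ * (static_cast<uint64_t>(mask_) + 1);
  }

 private:
  uint32_t mask_;
  std::mt19937 rng_;
  uint64_t count_ = 0;
};
```

With a rate of 1 every attempt is counted; with larger rates the estimate is approximate, which is the trade-off made to reduce writes to the shared counter.
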
>
> Note that both automatic and manual tuning are done for a whole method.
> There is no mechanism to tune an individual lock.
>
> RTM locking can be used for normal (stack) locks by specifying the
> "UseRTMForStackLocks" flag.
>
> RTM locking code requires that biased locking is switched off because
> the two conflict. RTM locking is most useful when there is high lock
> contention and low data contention. With high lock contention the lock
> is usually inflated, and biased locking is not suitable for that case
> anyway.
>
> It was requested that this code not affect other platforms. For that
> reason most of the code is put under #if INCLUDE_RTM_OPT, which is
> defined only for X86 and C2 and not EMBEDDED.
>
> All new RTM flags are declared as experimental and require the
> "UnlockExperimentalVMOptions" flag to be specified.
>
>
> SQE did full testing on these changes. Additional tests were developed.
>
> Thanks,
> Vladimir

