RFR (XL) 8031320: Use Intel RTM instructions for locks

Mon Mar 17 19:11:43 UTC 2014

https://bugs.openjdk.java.net/browse/JDK-8031320
http://cr.openjdk.java.net/~kvn/8031320_9/webrev/

The Intel architecture codenamed Haswell supports the RTM 
(Restricted Transactional Memory) instructions xbegin, xabort, xend and 
xtest as part of Intel Transactional Synchronization Extensions (TSX). 
The xbegin and xend instructions enclose a set of instructions to be 
executed as a transaction. If no conflict is found during execution of 
the transaction, the memory and register modifications are committed 
together at xend. The xabort instruction explicitly aborts a 
transaction, and xtest checks whether execution is currently inside a 
transaction.

RTM is useful for highly contended locks with low conflict in the 
critical region. Such locks don't scale well otherwise, but with RTM 
they show good scaling, which lets applications use coarse-grained 
locking. For lightly contended locks that are used by different 
threads, RTM can also reduce cache line ping-pong and thereby improve 
performance.
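
To make "high lock contention, low data conflict" concrete, here is a 
minimal Java sketch. It is not part of the webrev; the class name, 
thread count, and the 64-byte cache line size are assumptions made only 
for illustration. All threads fight over one coarse lock, but each one 
touches only its own slot, which is the pattern RTM is expected to help.

```java
// Illustrative only: one coarse lock guards the whole array, yet each
// thread updates only its own slot, so data conflicts are rare.
final class CoarseLockDemo {
    static final int THREADS = 4;
    static final int ITERATIONS = 100_000;
    // 16 longs = 128 bytes between slots, so no two threads share a
    // 64-byte cache line (the line size is an assumption).
    static final int STRIDE = 16;

    private final Object lock = new Object();
    private final long[] slots = new long[THREADS * STRIDE];

    void increment(int threadIndex) {
        synchronized (lock) {                 // highly contended lock
            slots[threadIndex * STRIDE]++;    // per-thread data, low conflict
        }
    }

    long total() {
        synchronized (lock) {
            long sum = 0;
            for (int t = 0; t < THREADS; t++) {
                sum += slots[t * STRIDE];
            }
            return sum;
        }
    }

    static long run() {
        CoarseLockDemo demo = new CoarseLockDemo();
        Thread[] workers = new Thread[THREADS];
        for (int t = 0; t < THREADS; t++) {
            final int index = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < ITERATIONS; i++) {
                    demo.increment(index);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return demo.total();
    }

    public static void main(String[] args) {
        System.out.println(run());   // prints 400000
    }
}
```

Whether the lock in this sketch is actually elided depends on the 
hardware, the flags in use, and whether the lock gets inflated.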

Implementation:

RTM locking code is generated for all inflated locks when the 
"UseRTMLocking" option is on, with normal locking as the fallback 
mechanism. On abort or lock busy, the lock is retried a fixed number of 
times, as specified by the "RTMRetryCount" option. Locks which abort 
too often can be auto-tuned or manually tuned.
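
The retry-then-fall-back control flow can be sketched in Java as below. 
This is only a model of the logic, not the generated code: Java cannot 
issue xbegin/xend, so a hypothetical BooleanSupplier stands in for one 
transactional attempt, and counting "one initial attempt plus 
RTMRetryCount retries" is my assumption about how the count is applied.

```java
import java.util.function.BooleanSupplier;

// Model of "try RTM a fixed number of times, then use the normal lock".
final class RtmRetrySketch {
    enum Path { TRANSACTION, FALLBACK_LOCK }

    // tryTransaction is a stand-in for one xbegin ... xend attempt and
    // returns true if the transaction committed (hypothetical helper).
    static Path acquire(int rtmRetryCount, BooleanSupplier tryTransaction) {
        // Assumption: one initial attempt plus up to rtmRetryCount retries.
        for (int attempt = 0; attempt <= rtmRetryCount; attempt++) {
            if (tryTransaction.getAsBoolean()) {
                return Path.TRANSACTION;
            }
        }
        // Every attempt aborted or found the lock busy: take the normal lock.
        return Path.FALLBACK_LOCK;
    }
}
```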

Auto-tuning is enabled with the "UseRTMDeopt" flag, which adds abort 
ratio calculation code for each lock. The abort ratio is calculated 
after "RTMAbortThreshold" aborts are encountered.
With "UseRTMDeopt", if the abort ratio reaches "RTMAbortRatio", the 
nmethod containing the lock is deoptimized and recompiled with all 
locks as normal (stack) locks. If the abort ratio remains low after 
"RTMLockingThreshold" attempted locks, the method is deoptimized and 
recompiled with all locks as RTM locks without abort ratio calculation 
code. The abort ratio calculation can be delayed by specifying the 
-XX:RTMLockingCalculationDelay=<millisec> flag.
Deoptimization of the nmethod is done by adding an uncommon trap at the 
beginning of the code; the trap checks the rtm state field in the MDO, 
which is modified by the abort calculation code.
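
The auto-tuning decision can be modelled roughly as the small state 
machine below. This is a sketch under assumptions, not the code in the 
webrev: the state names are illustrative, the abort ratio is treated as 
a percentage, and the recompilation itself is reduced to a state change.

```java
// Model of the per-method RTM state driven by abort statistics.
final class RtmAutoTuneSketch {
    enum RtmState { PROFILE_RTM, NO_RTM, USE_RTM }   // illustrative names

    private final long abortThreshold;      // models RTMAbortThreshold
    private final long abortRatioPercent;   // models RTMAbortRatio (assumed percent)
    private final long lockingThreshold;    // models RTMLockingThreshold
    private long totalCount;
    private long abortCount;
    private RtmState state = RtmState.PROFILE_RTM;

    RtmAutoTuneSketch(long abortThreshold, long abortRatioPercent,
                      long lockingThreshold) {
        this.abortThreshold = abortThreshold;
        this.abortRatioPercent = abortRatioPercent;
        this.lockingThreshold = lockingThreshold;
    }

    // Record one attempted RTM lock and return the resulting state.
    RtmState recordAttempt(boolean aborted) {
        if (state != RtmState.PROFILE_RTM) {
            return state;                    // decision already made
        }
        totalCount++;
        if (aborted) {
            abortCount++;
        }
        if (abortCount >= abortThreshold
                && abortCount * 100 >= totalCount * abortRatioPercent) {
            state = RtmState.NO_RTM;         // recompile with normal locks
        } else if (totalCount >= lockingThreshold) {
            state = RtmState.USE_RTM;        // recompile without profiling code
        }
        return state;
    }
}
```

In this model, once the state leaves PROFILE_RTM it never changes 
again, which corresponds to the recompiled method no longer carrying 
the abort ratio calculation code.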

For manual tuning, per-lock abort statistics can be printed using the 
"PrintPreciseRTMLockingStatistics" diagnostic flag. Based on these 
statistics, users can create a .hotspot_compiler file or use the 
-XX:CompileCommand=<option> flag to select methods for which RTM 
locking is disabled, using <option> "NoRTMLockEliding", or always 
enabled, using <option> "UseRTMLockEliding".

The abort calculation and statistics collection use RTMLockingCounters 
wrapped into RTMLockingNamedCounter counters, which are generated for 
each lock. To reduce the burden on the cache line, the RTM lock total 
counter is updated randomly at the RTMTotalCountIncrRate rate.
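
The idea behind the randomly updated total counter can be sketched as 
follows. This is a model under assumptions: it uses java.util.Random as 
the randomness source and reads "rate" as "update once per rate 
attempts on average", neither of which is stated above.

```java
import java.util.Random;

// Model of a sampled counter: write the shared counter only occasionally
// and scale it back up when reading.
final class SampledCounterSketch {
    private final int incrRate;     // models RTMTotalCountIncrRate
    private final Random random;    // assumed randomness source
    private long sharedWrites;      // number of writes to the shared counter

    SampledCounterSketch(int incrRate, Random random) {
        if (incrRate < 1) {
            throw new IllegalArgumentException("incrRate must be >= 1");
        }
        this.incrRate = incrRate;
        this.random = random;
    }

    // Called on every lock attempt; touches the counter about 1 in incrRate times.
    void recordLockAttempt() {
        if (random.nextInt(incrRate) == 0) {
            sharedWrites++;
        }
    }

    long sharedWrites() {
        return sharedWrites;
    }

    // Estimate of the true attempt count, scaled by the sampling rate.
    long estimatedTotal() {
        return sharedWrites * incrRate;
    }
}
```

With a rate of 1 every attempt is counted exactly; larger rates trade 
accuracy for fewer writes to the counter's cache line.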

Note that both auto and manual tuning are done for the whole method. 
There is no mechanism to tune an individual lock.

RTM locking can be used for normal (stack) locks by specifying the 
"UseRTMForStackLocks" flag.

RTM locking code requires that biased locking be switched off because 
the two conflict. RTM locking is most useful when there is high lock 
contention and low data contention.  With high lock contention the lock 
is usually inflated and biased locking is not suitable for that case 
anyway.
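
To show how the flags in this mail fit together, the sketch below 
assembles a JVM argument list in Java. The helper and the method 
pattern passed to it are hypothetical, and the "option,<method>,<name>" 
form of CompileCommand is my understanding of the syntax rather than 
something stated above. UnlockExperimentalVMOptions is included because 
the new RTM flags are experimental.

```java
import java.util.ArrayList;
import java.util.List;

// Builds a candidate argument list for running with RTM locking enabled.
final class RtmCommandLineSketch {
    // noRtmMethod is a hypothetical method pattern such as
    // "com/example/Cache.get"; pass null to skip manual tuning.
    static List<String> jvmArgs(boolean autoTune, String noRtmMethod) {
        List<String> args = new ArrayList<>();
        // Placed first so the experimental flags that follow are accepted.
        args.add("-XX:+UnlockExperimentalVMOptions");
        args.add("-XX:+UseRTMLocking");
        // Biased locking conflicts with RTM locking, so switch it off.
        args.add("-XX:-UseBiasedLocking");
        if (autoTune) {
            args.add("-XX:+UseRTMDeopt");
        }
        if (noRtmMethod != null) {
            // Assumed syntax: option,<method pattern>,<option name>
            args.add("-XX:CompileCommand=option," + noRtmMethod
                    + ",NoRTMLockEliding");
        }
        return args;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", jvmArgs(true, null)));
    }
}
```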

It was requested that this code not affect other platforms. For that 
reason, most of the code is put under #if INCLUDE_RTM_OPT, which is 
defined only for X86 and C2, and not for EMBEDDED.

All new RTM flags are declared as experimental and require the 
"UnlockExperimentalVMOptions" flag to be specified.

SQE did full testing on these changes. Additional tests were developed.

Thanks,
Vladimir