RFR (XL) 8031320: Use Intel RTM instructions for locks

Wed Mar 19 03:56:19 UTC 2014

Thank you, Roland

Most of the assembler code was copied from the existing fast_lock code.
I copied it because the current fast_lock code is a mess (a lot of
experimental EmitSync code that Dave Dice left), and it would be totally
unreadable if I inserted the RTM code there.

On 3/18/14 5:20 AM, Roland Westrelin wrote:
> Hi Vladimir,
>
>> http://cr.openjdk.java.net/~kvn/8031320_9/webrev/
>
> Typos:
>
> src/share/vm/ci/ciMethodData.hpp
>
> 483   // return chached value
>
> src/share/vm/opto/macro.cpp
>
> 2510         assert(C->profile_rtm(), "only when rtm deop code is added”);

fixed.

> src/cpu/x86/vm/macroAssembler_x86.cpp
>
> 1440   movptr(tmpReg, Address(boxReg, ObjectMonitor::owner_offset_in_bytes()-2)) ;
>
> Why -2?

The -2 subtracts the tag bit so that the address is a valid pointer.
For inflated locks, bit 1 is set in the mark word:
monitor_value            = 2

I modified the code to use the following:

   // Clean monitor_value bit to get valid pointer
   int owner_offset = ObjectMonitor::owner_offset_in_bytes() -
                      markOopDesc::monitor_value;
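
To illustrate the adjustment outside of HotSpot, here is a minimal standalone sketch. The Monitor struct, kMonitorValue, make_mark() and owner_from_mark() are hypothetical names made up for this example, not the VM's types. It only shows how folding the tag into the field offset lets the tagged mark word be used directly as a base address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for ObjectMonitor; only the layout idea matters.
struct Monitor {
  void* header;
  void* owner;
};

// Tag for an inflated lock, matching the monitor_value = 2 quoted above.
constexpr std::uintptr_t kMonitorValue = 2;

// Build a mark word: the monitor address with the tag bit set.
inline std::uintptr_t make_mark(Monitor* m) {
  std::uintptr_t raw = reinterpret_cast<std::uintptr_t>(m);
  assert((raw & kMonitorValue) == 0 && "bit 1 must be free in the address");
  return raw | kMonitorValue;
}

// Read the owner field straight from the tagged mark word by folding the
// tag into the displacement: mark + (offset - tag) == monitor + offset.
inline void* owner_from_mark(std::uintptr_t mark) {
  const std::ptrdiff_t owner_offset =
      static_cast<std::ptrdiff_t>(offsetof(Monitor, owner)) -
      static_cast<std::ptrdiff_t>(kMonitorValue);
  void** slot = reinterpret_cast<void**>(mark + owner_offset);
  return *slot;
}
```

The assembler does the same thing with an Address displacement, so no separate instruction is needed to clear the bit first.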

> In MacroAssembler::fast_lock(), Moving code starting at:
> 1608     if (UseRTMForStackLocks && use_rtm) {
> and
> 1715     if (use_rtm) {
>
> to their own methods would be nice.

Done.

> 1787       // Use either "Self" (in threadReg) or rsp as thread identity in _owner.
>
> Is this comment accurate? I don’t see rsp being used.

Removed that line because rtm code uses only thread pointer.

>
> I don’t understand that part:

Ask Dave Dice, who wrote it. Other code in the 32-bit VM uses the stack
pointer when (EmitSync & 128) == 0:

   // Ideally, I'd manifest "Self" with get_thread and then attempt
   // to CAS the register containing Self into m->Owner.
   // But we don't have enough registers, so instead we can either try
   // to CAS rsp or the address of the box (in scr) into &m->owner.
   // If the CAS succeeds we later store "Self" into m->Owner.
   // Transiently storing a stack address (rsp or the address of the
   // box) into m->owner is harmless.
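
For readers who do not know that code, the two approaches in the comment can be sketched in portable C++ roughly as follows. This is only an illustration under my own assumptions: Monitor, try_acquire() and try_acquire_two_step() are hypothetical names, and std::atomic stands in for the lock-prefixed cmpxchg the assembler emits.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical monitor with just an owner word; null means unlocked.
struct Monitor {
  std::atomic<void*> owner{nullptr};
};

// Direct form: swing owner from null to the thread pointer in one CAS.
inline bool try_acquire(Monitor& m, void* self) {
  assert(self != nullptr);
  void* expected = nullptr;  // plays the role of the cmpxchg comparand
  return m.owner.compare_exchange_strong(expected, self);
}

// Two-step form from the comment: CAS a stack address (rsp or the box
// address) into owner, then overwrite it with the real thread pointer.
inline bool try_acquire_two_step(Monitor& m, void* stack_addr, void* self) {
  assert(stack_addr != nullptr && self != nullptr);
  void* expected = nullptr;
  if (!m.owner.compare_exchange_strong(expected, stack_addr)) {
    return false;  // someone else owns the monitor
  }
  // Owner is transiently a stack address; replace it with Self.
  m.owner.store(self);
  return true;
}
```

The RTM path only needs the direct form because the thread pointer is available in a register there.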

>
> 1786       // Appears unlocked - try to swing _owner from null to non-null.
> 1787       // Use either "Self" (in threadReg) or rsp as thread identity in _owner.
> 1788       // Invariant: tmpReg == 0.  tmpReg is EAX which is the implicit cmpxchg comparand.
> 1789 #ifdef _LP64
> 1790       Register threadReg = r15_thread;
> 1791 #else
> 1792       get_thread(scrReg);
> 1793       Register threadReg = scrReg;
> 1794 #endif
> 1795       if (os::is_MP()) {
> 1796         lock();
> 1797       }
> 1798       cmpxchgptr(threadReg, Address(boxReg, ObjectMonitor::owner_offset_in_bytes()-2)); // Updates tmpReg
>
> src/cpu/x86/vm/sharedRuntime_x86_32.cpp
> src/cpu/x86/vm/sharedRuntime_x86_64.cpp
>
> Comments why you need xabort would be helpful

I think it is there to be safe. We go into runtime/native code at those
points.

>
> src/cpu/x86/vm/vm_version_x86.cpp
>
> Why are the changes to UseBiasedLocking done here:
> 964 bool VM_Version::use_biased_locking() {
>
> rather than when the other options are validated, that is code that starts with:
>
> 575   // Adjust RTM (Restricted Transactional Memory) flags

That would be too late. VM_Version::initialize() is called after all
flags have already been processed because it needs an initialized
CodeCache. We need to switch off UseBiasedLocking during argument
processing because it is used by Thread::allocate(), which is called
before VM_Version::initialize().

>
> src/share/vm/opto/compile.cpp
>
> 1082     if (method_has_option("NoRTMLockEliding") || ((rtm_state & NoRTM) != 0)) {
>
> Why do we need to check the compiler oracle? Doesn't the MDO initialization already take the compiler oracle into account?

The rtm_state in the MDO is mutable. The next time we compile a method
it could be different. For example, the code in deoptimization.cpp can
set it to ProfileRTM.
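
A small sketch of why both checks are kept, under assumptions of mine: the RtmState values, MethodProfile and rtm_state_for_compile() are hypothetical and only mirror the shape of the condition quoted from compile.cpp.

```cpp
#include <cassert>

// Illustrative state values (an assumption for this sketch).
enum RtmState : int { ProfileRTM = 0x0, UseRTM = 0x1, NoRTM = 0x2 };

// Hypothetical stand-in for the MDO: the state can change between compiles.
struct MethodProfile {
  int rtm_state;
};

// Decide the state for one compilation. The oracle option is re-checked on
// every compile because the profile value may have been rewritten since the
// profile was initialized.
inline int rtm_state_for_compile(bool oracle_no_rtm, const MethodProfile& mdo) {
  assert(mdo.rtm_state >= 0 && mdo.rtm_state <= (UseRTM | NoRTM));
  if (oracle_no_rtm || (mdo.rtm_state & NoRTM) != 0) {
    return NoRTM;
  }
  return mdo.rtm_state;
}
```

With only the profile check, a method excluded by the oracle would start using RTM again as soon as its state was reset to ProfileRTM.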

I will send an updated webrev once I have addressed the comments from
all reviews.

Thanks,
Vladimir

>
> Roland.
>
>