RFR: 8236179: C1 register allocation error with T_ADDRESS

Aditya Mandaleeka adityam at microsoft.com
Wed Dec 18 16:44:26 UTC 2019


Hi all,

I encountered a crashing bug when running a jcstress test with Shenandoah, and tracked down the problem to how C1 generates code for call nodes that use T_ADDRESS operands. Specifically, in my failure case, the C1-generated code for a load reference barrier was treating the load address parameter (which is typed T_ADDRESS in the BasicTypeList for the runtime call) as a 32-bit value and therefore missing the upper bits when calling into the native code for the barrier on my x86-64 machine.

The fix modifies the FrameMap logic to use address operands for T_ADDRESS, and also modifies the x86 LIR assembler to use pointer-sized movs when moving address-typed stack values to and from registers.

Bug: bugs.openjdk.java.net/browse/JDK-8236179
Webrev: adityamandaleeka.github.io/webrevs/c1_address_64_bit/webrev

The rest of this email contains more information about the issue and the analysis that led to the fix.

Thank you,
Aditya Mandaleeka

========
How this was found

As mentioned I was running the jcstress suite when I ran into this bug. I was running it on Windows, but the problem and the fix are OS-agnostic. I was trying to reproduce a separate issue at the time, which involved setting several options such as aggressive Shenandoah heuristics and disabling C2 (limiting tiering to level 1). I was never able to reproduce that other bug but I noticed a crash on WeakCASTest.WeakCompareAndSetPlainString, which was consistently hitting access violations on an atomic cmpxchg. I decided to investigate, running this test as my reproducer. 

========
Analysis

Looking at the dump for this crash under a debugger, it was apparent that it was something to do with the LRB code; there was an access violation when trying to do the CAS operation on the reference location after we determined the new location of the object. The reference address was indeed bogus, and I tracked down where it was coming from. Here is some code from the caller (Intel syntax ahead):

00000261`9beaf4da 488b4978         mov     rcx,qword ptr [rcx+78h]
00000261`9beaf4de 488b11           mov     rdx,qword ptr [rcx]
00000261`9beaf4e1 488d09           lea     rcx,[rcx]
00000261`9beaf4e4 48898c24b8010000 mov     qword ptr [rsp+1B8h],rcx
00000261`9beaf4ec 488bca           mov     rcx,rdx 
00000261`9beaf4ef 8b9424b8010000   mov     edx,dword ptr [rsp+1B8h]
00000261`9beaf4f6 49baa0139ecef87f0000 mov r10,offset jvm!ShenandoahRuntime::load_reference_barrier_native (00007ff8`ce9e13a0)
00000261`9beaf500 41ffd2           call    r10

The second argument (which goes in *DX) for the load_reference_barrier_native is the load address. I noticed in the code above that, in the process of moving that value around, we stored it as 64-bit to the stack but then restored it as a 32-bit value when we put it in EDX. And sure enough, when I look at the memory in that location, there are a few extra bits which go with the address to make it valid.

I found the following in the IR which seemed suspicious:

   wide_move [Base:[R653|M] Disp: 0|L] [R654|L] 
   leal [Base:[R653|M] Disp: 0|L] [R655|L] 
   move [R654|L] [rcx|L] 
   move [R655|L] [rdx|I] 
   rtcall ShenandoahRuntime::load_reference_barrier_native

As you can see, the move to RDX is done as I while the load was done as L. This explained the reason for the code being the way it was, so the next step was to figure out why the discrepancy exists in the IR itself.

In ShenandoahBarrierSetC1::load_at_resolved, we are treating this second argument as T_ADDRESS which seemed appropriate. I experimented by changing the argument type to T_OBJECT and verified that correct code was generated, though being new to C1 (and OpenJDK in general :)) it wasn't clear to me whether that was an appropriate fix or whether T_ADDRESS should indeed be fixed to work correctly. Thankfully at this point I was put into contact with Roland Westrelin who guided me through fixing C1 to make it correctly handle T_ADDRESS, which is what this patch does. Thanks Roland!

========
Testing done

Apart from verifying that the codegen issue in my jcstress repro is fixed, I've run these tests with this patch:
  tier1
  tier2
  hotspot_gc_shenandoah


More information about the shenandoah-dev mailing list