Loom x86_32 (?) and C2 problem

Aleksey Shipilev shade at redhat.com
Thu Oct 6 09:44:29 UTC 2022


Hi,

I have been struggling with an interesting bug in the x86_32 port. I am writing this down in the hope
that it becomes obvious to me after I hit "Send".

The bug manifests only with C2, not with C1. It also seems to happen when a deopt has happened
recently, but I am not sure if that is related. The reproducer I use is:

$ CONF=linux-x86-server-fastdebug make test \
    TEST=java/lang/Thread/virtual/stress/Skynet.java \
    TEST_VM_OPTS="-XX:-TieredCompilation -XX:+VerifyContinuations -XX:ActiveProcessorCount=1 -XX:+DeoptimizeALot"

#
#  Internal Error (/home/shade/trunks/jdk/src/hotspot/share/oops/stackChunkOop.cpp:513), 
pid=1645519, tid=1645536
#  fatal error: Bit not set at index 49 corresponding to 0xd9fbcee4


The underlying reason, as far as I can see, is as follows. There is a frozen chunk like this (showing
only the interesting part):

   0xd9fbcef8: 0xc3f56b68 #3 nmethod 0xf27f1988 for method J 
java.util.concurrent.SynchronousQueue.take()Ljava/lang/Object;
                        - #0 scope java.util.concurrent.SynchronousQueue.take()Ljava/lang/Object; @ 7
                          local 0 for #4 (scope 1) oop
                          oop for #4
                          unextended_sp for #4
                          sp for #4
   0xd9fbcef4: 0xf27f000c return address
   0xd9fbcef0: 0x00000009 saved fp
   0xd9fbceec: 0x0001b211
   0xd9fbcee8: 0xc63f1850
   0xd9fbcee4: 0xc63f0e40 local 0 for #3 (scope 0) oop
                          oop for #3
   0xd9fbcee0: 0x00000008 param 1 boolean for #2
                          derived pointer (base: 0xd9fbce94) for #2
   0xd9fbcedc: 0x00000000 local 3 for #2 (scope 7) normal
   0xd9fbced8: 0x00000000 #2 nmethod 0xf27de988 for method J 
java.util.concurrent.SynchronousQueue$TransferStack.transfer(Ljava/lang/Object;ZJ)Ljava/lang/Object;
                          - #7 scope
                         ...
                          local 4 for #2 (scope 7) normal
                          param 2 long for #2
                          unextended_sp for #3
                          sp for #3


...which gets partially thawed up to `SynchronousQueue.take()`, and while doing so, we over-clear 
the bitmap bits, so verification catches fire when seeing this:

   0xd9fbcef8: 0xdb5c2988 #0 nmethod 0xf27f1988 for method J 
java.util.concurrent.SynchronousQueue.take()Ljava/lang/Object;
                          - #0 scope java.util.concurrent.SynchronousQueue.take()Ljava/lang/Object; @ 7
                          local 0 for #1 (scope 1) oop
                          oop for #1
                          unextended_sp for #1
                          sp for #1
   0xd9fbcef4: 0xf27f000c return address
   0xd9fbcef0: 0x00000009 saved fp
   0xd9fbceec: 0x0001b211
   0xd9fbcee8: 0xc63f1850
   0xd9fbcee4: 0xd90a3ed8 local 0 for #0 (scope 0) oop   ; <----- this has no bitmap bit anymore
                          oop for #0
   0xd9fbcee0: 0x00000008
   0xd9fbcedc: 0x00000000
   0xd9fbced8: 0x00000000 CHUNK SP
                          unextended_sp for #0
                          sp for #0


The clearing happens here, in the code that is supposed to clear the oop-ness bitmap that covers the 
argument parts of the frame that is now gone:

void ThawBase::recurse_thaw_compiled_frame(const frame& hf, frame& caller, int num_frames, bool stub_caller) {
    ...
    clear_bitmap_bits(heap_frame_top + ContinuationHelper::CompiledFrame::size(hf), added_argsize);
    ...
}

Debug logging for that chunk says:
   heap_frame_top: 0xd9fbce88
   CompiledFrame::size: 20
   argsize: 4
   num_stack_arg_slots: 4
   clearing bitmap for [0xd9fbced8; 0xd9fbcee8)
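
To make the arithmetic behind those log lines explicit, here is a minimal sketch. It is not HotSpot
code: `SlotRange`, `cleared_range`, `contains` and `check_against_log` are hypothetical names, and it
assumes 4-byte stack slots, as on x86_32. It only shows that the logged range follows from
heap_frame_top, the frame size and the argsize, and that the range includes 0xd9fbcee4.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative only: mirrors the address arithmetic behind the log lines.
// Assumption: one stack slot is 4 bytes (x86_32).
constexpr std::uint32_t kSlotBytes = 4;

struct SlotRange {
    std::uint32_t start;  // inclusive
    std::uint32_t end;    // exclusive
};

// Range covered by "frame top + frame size", extended by argsize slots.
inline SlotRange cleared_range(std::uint32_t heap_frame_top,
                               std::uint32_t frame_size_slots,
                               std::uint32_t argsize_slots) {
    const std::uint32_t start = heap_frame_top + frame_size_slots * kSlotBytes;
    return SlotRange{start, start + argsize_slots * kSlotBytes};
}

inline bool contains(SlotRange r, std::uint32_t addr) {
    return r.start <= addr && addr < r.end;
}

// Self-check against the numbers quoted in the log.
inline void check_against_log() {
    const SlotRange r = cleared_range(0xd9fbce88u, 20, 4);
    assert(r.start == 0xd9fbced8u);
    assert(r.end == 0xd9fbcee8u);
    assert(contains(r, 0xd9fbcee4u));  // the slot named in the fatal error
}
```

With an argsize of 3 instead of 4, the same computation ends at 0xd9fbcee4 (exclusive), so that slot
would be left alone.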

The `added_argsize` for `TransferStack.transfer(Ljava/lang/Object;ZJ)Ljava/lang/Object;` is indeed
4. That is, the stack-passed arguments take 4 slots: "this" and the first oop parameter go in
registers, while the stack gets the boolean param (1 slot) and the long param (2 slots), rounded up
to 4 slots. AFAICS this matches SharedRuntime::java_calling_convention too.
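
The slot counting can be sketched as follows. This is a deliberately simplified, hypothetical model
and not the real SharedRuntime::java_calling_convention: `ArgKind`, `StackArgs` and
`count_stack_args` are made-up names, and the rules (two integer/oop argument registers, longs always
on the stack, total rounded up to an even slot count) are assumptions taken from the description
above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified model of the slot counting; NOT the HotSpot code.
enum class ArgKind { Oop, Int, Boolean, Long };

struct StackArgs {
    int used;     // slots actually occupied by arguments
    int rounded;  // used, rounded up to a multiple of 2
};

inline StackArgs count_stack_args(const std::vector<ArgKind>& args) {
    int int_regs_left = 2;  // assumption: two int/oop argument registers
    int used = 0;
    for (ArgKind k : args) {
        if (k == ArgKind::Long) {
            used += 2;        // assumption: longs always go on the stack
        } else if (int_regs_left > 0) {
            --int_regs_left;  // passed in a register, no stack slot
        } else {
            used += 1;        // one stack slot
        }
    }
    const int rounded = (used + 1) & ~1;  // round up to an even slot count
    assert(rounded >= used && rounded - used <= 1);
    return StackArgs{used, rounded};
}

// The transfer(Object, boolean, long) case: this, Object, boolean, long.
inline StackArgs transfer_stack_args() {
    return count_stack_args(
        {ArgKind::Oop, ArgKind::Oop, ArgKind::Boolean, ArgKind::Long});
}
```

Under these assumptions, `transfer_stack_args()` gives 3 occupied slots and a rounded size of 4, so
the fourth slot is padding rather than an argument.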

Yet, in the stack description above, it *looks* like "param 1 boolean for #2" and "param 2 long for
#2" only take 3 slots? The bitmap code then clears one slot too many, which corrupts the oop bitmap.
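
A toy model of the over-clearing, for illustration only. None of this is HotSpot code: `ToyChunk`,
`clear_bits`, `first_unmarked_oop` and `make_chunk` are hypothetical, and the four-slot layout is an
assumption that stands in for 0xd9fbced8..0xd9fbcee4 from the dump (three argument slots followed by
the caller's oop local).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model, not HotSpot code: one bit per stack slot.
struct ToyChunk {
    std::vector<bool> is_oop;  // ground truth: which slots hold oops
    std::vector<bool> bitmap;  // the oop bitmap that verification consults
};

// Clear `count` bitmap bits starting at `first_slot`.
inline void clear_bits(ToyChunk& c, std::size_t first_slot, std::size_t count) {
    assert(first_slot + count <= c.bitmap.size());
    for (std::size_t i = 0; i < count; ++i) {
        c.bitmap[first_slot + i] = false;
    }
}

// Index of the first oop slot whose bit is missing, or -1 if all are marked.
inline int first_unmarked_oop(const ToyChunk& c) {
    for (std::size_t i = 0; i < c.is_oop.size(); ++i) {
        if (c.is_oop[i] && !c.bitmap[i]) {
            return static_cast<int>(i);
        }
    }
    return -1;
}

// Slots 0..2 model the callee's stack arguments, slot 3 the caller's oop local.
inline ToyChunk make_chunk() {
    ToyChunk c;
    c.is_oop = {false, false, false, true};
    c.bitmap = c.is_oop;  // starts out consistent
    return c;
}
```

In this model, clearing 3 bits from slot 0 keeps the bitmap consistent, while clearing 4 makes
`first_unmarked_oop` report slot 3, which corresponds to the "Bit not set" failure above.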

It seems to manifest on x86_32 because passing arguments on the stack is the normal thing to do
there, in contrast to x86_64, which passes most arguments in registers. But it does not look like
something that x86_64 would be immune to.

I read the C2 frame setup code in Matcher::match, and it seems to be fine, with proper 2-slot
alignment for the incoming argument block.

Does it look like a C2 bug to you? Are there any other clues I am missing here?

-- 
Thanks,
-Aleksey


