Loom x86_32 (?) and C2 problem

Ron Pressler ron.pressler at oracle.com
Thu Oct 6 12:07:33 UTC 2022


I would first look at the stack layout before the freeze and see how things look there. It’s possible that it was frozen in slow mode and that the spacing between frames in the chunk differs from the spacing on the stack, possibly because of an assumption about frame alignment that does not hold on 32-bit.

— Ron

> On 6 Oct 2022, at 10:44, Aleksey Shipilev <shade at redhat.com> wrote:
> 
> Hi,
> 
> I have been struggling with an interesting bug in the x86_32 port. I am writing this down in the hope that it becomes obvious to me after I hit "Send".
> 
> The bug manifests only with C2, not with C1. It also seems to happen when a deopt happened recently, but I am not sure whether that is related. The reproducer I use is:
> 
> $ CONF=linux-x86-server-fastdebug make test TEST=java/lang/Thread/virtual/stress/Skynet.java TEST_VM_OPTS="-XX:-TieredCompilation -XX:+VerifyContinuations -XX:ActiveProcessorCount=1 -XX:+DeoptimizeALot"
> 
> #
> #  Internal Error (/home/shade/trunks/jdk/src/hotspot/share/oops/stackChunkOop.cpp:513), pid=1645519, tid=1645536
> #  fatal error: Bit not set at index 49 corresponding to 0xd9fbcee4
> 
> 
> The underlying reason, as far as I can see, is as follows. There is a frozen chunk like this (showing only the interesting part):
> 
>  0xd9fbcef8: 0xc3f56b68 #3 nmethod 0xf27f1988 for method J java.util.concurrent.SynchronousQueue.take()Ljava/lang/Object;
>                       - #0 scope java.util.concurrent.SynchronousQueue.take()Ljava/lang/Object; @ 7
>                         local 0 for #4 (scope 1) oop
>                         oop for #4
>                         unextended_sp for #4
>                         sp for #4
>  0xd9fbcef4: 0xf27f000c return address
>  0xd9fbcef0: 0x00000009 saved fp
>  0xd9fbceec: 0x0001b211
>  0xd9fbcee8: 0xc63f1850
>  0xd9fbcee4: 0xc63f0e40 local 0 for #3 (scope 0) oop
>                         oop for #3
>  0xd9fbcee0: 0x00000008 param 1 boolean for #2
>                         derived pointer (base: 0xd9fbce94) for #2
>  0xd9fbcedc: 0x00000000 local 3 for #2 (scope 7) normal
>  0xd9fbced8: 0x00000000 #2 nmethod 0xf27de988 for method J java.util.concurrent.SynchronousQueue$TransferStack.transfer(Ljava/lang/Object;ZJ)Ljava/lang/Object;
>                         - #7 scope
>                        ...
>                         local 4 for #2 (scope 7) normal
>                         param 2 long for #2
>                         unextended_sp for #3
>                         sp for #3
> 
> 
> ...which gets partially thawed up to `SynchronousQueue.take()`, and while doing so, we over-clear the bitmap bits, so verification catches fire when seeing this:
> 
>  0xd9fbcef8: 0xdb5c2988 #0 nmethod 0xf27f1988 for method J java.util.concurrent.SynchronousQueue.take()Ljava/lang/Object;
>                         - #0 scope java.util.concurrent.SynchronousQueue.take()Ljava/lang/Object; @ 7
>                         local 0 for #1 (scope 1) oop
>                         oop for #1
>                         unextended_sp for #1
>                         sp for #1
>  0xd9fbcef4: 0xf27f000c return address
>  0xd9fbcef0: 0x00000009 saved fp
>  0xd9fbceec: 0x0001b211
>  0xd9fbcee8: 0xc63f1850
>  0xd9fbcee4: 0xd90a3ed8 local 0 for #0 (scope 0) oop   ; <----- this has no bitmap bit anymore
>                         oop for #0
>  0xd9fbcee0: 0x00000008
>  0xd9fbcedc: 0x00000000
>  0xd9fbced8: 0x00000000 CHUNK SP
>                         unextended_sp for #0
>                         sp for #0
> 
> 
> The clearing happens here, in the code that is supposed to clear the oop-ness bitmap that covers the argument parts of the frame that is now gone:
> 
> void ThawBase::recurse_thaw_compiled_frame(const frame& hf, frame& caller, int num_frames, bool stub_caller) {
>   ...
>    clear_bitmap_bits(heap_frame_top + ContinuationHelper::CompiledFrame::size(hf), added_argsize);
>   ...
> }
> 
> Debug logging says, for that chunk:
>  heap_frame_top: 0xd9fbce88
>  CompiledFrame::size: 20
>  argsize: 4
>  num_stack_arg_slots: 4
>  clearing bitmap for [0xd9fbced8; 0xd9fbcee8)
> 
> The `added_argsize` for `TransferStack.transfer(Ljava/lang/Object;ZJ)Ljava/lang/Object;` is indeed 4. That is, the stack-passed arguments take 4 slots: "this" and the first oop parameter go in registers, while the stack gets a boolean param (1 slot) and a long param (2 slots), rounded up to 4 slots. AFAICS this matches SharedRuntime::java_calling_convention too.
> 
> Yet, in the stack description above, it *looks* like "param 1 boolean for #2" and "param 2 long for #2" only take 3 slots? The bitmap code then clears one slot more than that, which corrupts the oop bitmap.
> 
> It seems to manifest on x86_32 because passing arguments on the stack is a normal thing to do there, in contrast to x86_64, which passes most arguments in registers. But it does not look like something that x86_64 would be immune to.
> 
> I read the C2 frame setup code in Matcher::match, and it seems to be fine with proper 2-slot alignment for the incoming argument block.
> 
> Does it look like a C2 bug to you? Are there any other clues I am missing here?
> 
> -- 
> Thanks,
> -Aleksey
> 


