Loom x86_32 (?) and C2 problem
dean.long at oracle.com
Thu Oct 6 22:20:14 UTC 2022
It looks like x86_32 does not always store stack parameters in the same
order they were declared. DOUBLE and LONG parameters come first, to
preserve alignment. So if there is an assumption that the stack slot
for the last argument in the signature is the last one belonging to the
frame, then that would break here, because the slot for the BOOLEAN is
actually last, not the LONG.
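
To make the ordering concrete, here is a minimal sketch of such a layout. It is
a hypothetical, simplified model, not the actual
SharedRuntime::java_calling_convention code: the names (ArgType, Placement,
Layout, layout_args) are made up for illustration, two-slot values always go on
the stack, and the number of registers for one-slot arguments is assumed to be
two, as the quoted message below suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of the ordering described above.
// It is NOT HotSpot's SharedRuntime::java_calling_convention.
enum class ArgType { Oop, Boolean, Int, Long, Double };

struct Placement {
    bool in_register;  // true when the argument is passed in a register
    int  first_slot;   // first 32-bit stack slot, or -1 when in a register
};

struct Layout {
    std::vector<Placement> args;  // one entry per argument, in signature order
    int used_slots;               // slots that actually hold an argument
    int rounded_slots;            // used_slots rounded up to an even count
};

inline bool is_two_slot(ArgType t) {
    return t == ArgType::Long || t == ArgType::Double;
}

// int_regs is an assumption: the number of registers available for
// one-slot arguments (two, going by the quoted message).
inline Layout layout_args(const std::vector<ArgType>& sig, int int_regs = 2) {
    assert(int_regs >= 0);
    Layout out{std::vector<Placement>(sig.size()), 0, 0};

    // Pass 1: two-slot values (LONG, DOUBLE) come first, which keeps them
    // aligned on a 2-slot boundary.
    int next = 0;
    for (std::size_t i = 0; i < sig.size(); i++) {
        if (is_two_slot(sig[i])) {
            out.args[i] = {false, next};
            next += 2;
        }
    }

    // Pass 2: one-slot values take registers while any are left, then the
    // stack slots that follow the two-slot values.
    int regs_left = int_regs;
    for (std::size_t i = 0; i < sig.size(); i++) {
        if (is_two_slot(sig[i])) continue;
        if (regs_left > 0) {
            regs_left--;
            out.args[i] = {true, -1};
        } else {
            out.args[i] = {false, next++};
        }
    }

    out.used_slots = next;
    out.rounded_slots = (next + 1) & ~1;  // round up to a multiple of 2
    return out;
}
```

For a signature of (this, Object, boolean, long), this model puts the LONG in
slots 0-1 and the BOOLEAN in slot 2, so 3 slots are used and the rounded size
is 4. The last argument in the signature does not own the last slot.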
dl
On 10/6/22 2:44 AM, Aleksey Shipilev wrote:
> Hi,
>
> I have been struggling with an interesting bug in the x86_32 port. Writing
> this down in the hope that it becomes obvious to me after I hit "Send".
>
> The bug manifests only with C2, not with C1. It also seems to happen
> when deopt happened recently, but I am not sure if that is related.
> The reproducer I use is:
>
> $ CONF=linux-x86-server-fastdebug make test \
>     TEST=java/lang/Thread/virtual/stress/Skynet.java \
>     TEST_VM_OPTS="-XX:-TieredCompilation -XX:+VerifyContinuations -XX:ActiveProcessorCount=1 -XX:+DeoptimizeALot"
>
> #
> # Internal Error (/home/shade/trunks/jdk/src/hotspot/share/oops/stackChunkOop.cpp:513), pid=1645519, tid=1645536
> # fatal error: Bit not set at index 49 corresponding to 0xd9fbcee4
>
>
> The underlying reason, as far as I can see, is as follows. There is a
> frozen chunk like this (showing only the interesting part):
>
> 0xd9fbcef8: 0xc3f56b68 #3 nmethod 0xf27f1988 for method J
> java.util.concurrent.SynchronousQueue.take()Ljava/lang/Object;
> - #0 scope
> java.util.concurrent.SynchronousQueue.take()Ljava/lang/Object; @ 7
> local 0 for #4 (scope 1) oop
> oop for #4
> unextended_sp for #4
> sp for #4
> 0xd9fbcef4: 0xf27f000c return address
> 0xd9fbcef0: 0x00000009 saved fp
> 0xd9fbceec: 0x0001b211
> 0xd9fbcee8: 0xc63f1850
> 0xd9fbcee4: 0xc63f0e40 local 0 for #3 (scope 0) oop
> oop for #3
> 0xd9fbcee0: 0x00000008 param 1 boolean for #2
> derived pointer (base: 0xd9fbce94) for #2
> 0xd9fbcedc: 0x00000000 local 3 for #2 (scope 7) normal
> 0xd9fbced8: 0x00000000 #2 nmethod 0xf27de988 for method J
> java.util.concurrent.SynchronousQueue$TransferStack.transfer(Ljava/lang/Object;ZJ)Ljava/lang/Object;
> - #7 scope
> ...
> local 4 for #2 (scope 7) normal
> param 2 long for #2
> unextended_sp for #3
> sp for #3
>
>
> ...which gets partially thawed up to `SynchronousQueue.take()`, and
> while doing so, we over-clear the bitmap bits, so verification catches
> fire when seeing this:
>
> 0xd9fbcef8: 0xdb5c2988 #0 nmethod 0xf27f1988 for method J
> java.util.concurrent.SynchronousQueue.take()Ljava/lang/Object;
> - #0 scope
> java.util.concurrent.SynchronousQueue.take()Ljava/lang/Object; @ 7
> local 0 for #1 (scope 1) oop
> oop for #1
> unextended_sp for #1
> sp for #1
> 0xd9fbcef4: 0xf27f000c return address
> 0xd9fbcef0: 0x00000009 saved fp
> 0xd9fbceec: 0x0001b211
> 0xd9fbcee8: 0xc63f1850
> 0xd9fbcee4: 0xd90a3ed8 local 0 for #0 (scope 0) oop ; <----- this has no bitmap bit anymore
> oop for #0
> 0xd9fbcee0: 0x00000008
> 0xd9fbcedc: 0x00000000
> 0xd9fbced8: 0x00000000 CHUNK SP
> unextended_sp for #0
> sp for #0
>
>
> The clearing happens here, in the code that is supposed to clear the
> oop-ness bitmap that covers the argument parts of the frame that is
> now gone:
>
> void ThawBase::recurse_thaw_compiled_frame(const frame& hf, frame& caller,
>                                            int num_frames, bool stub_caller) {
>   ...
>   clear_bitmap_bits(heap_frame_top + ContinuationHelper::CompiledFrame::size(hf),
>                     added_argsize);
>   ...
> }
>
> Debugging logging says, for that chunk:
> heap_frame_top: 0xd9fbce88
> CompiledFrame::size: 20
> argsize: 4
> num_stack_arg_slots: 4
> clearing bitmap for [0xd9fbced8; 0xd9fbcee8)
>
> The `added_argsize` for
> `TransferStack.transfer(Ljava/lang/Object;ZJ)Ljava/lang/Object;` is
> indeed 4. That is, the stack-passed arguments take 4 slots: "this" and
> the first oop parameter go in registers, while the stack gets the boolean
> parameter (1 slot) and the long parameter (2 slots), 3 slots rounded up
> to 4. AFAICS this matches SharedRuntime::java_calling_convention too.
>
> Yet, in the stack description above, it *looks* like "param 1 boolean
> for #2" and "param 2 long for #2" only take 3 slots? The bitmap code
> then clears one slot more, which corrupts the oop bitmap.
>
> It seems to manifest on x86_32 because passing arguments on the stack is
> a normal thing to do there, in contrast to x86_64, which passes most
> arguments in registers. But it does not look like something that x86_64
> would be immune to.
>
> I read the C2 frame setup code in Matcher::match, and it seems to be
> fine with proper 2-slot alignment for the incoming argument block.
>
> Does it look like a C2 bug to you? Are there any other clues I am
> missing here?
>
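
The clearing range in the quoted debug log can be recomputed with a few lines
of arithmetic. This is a minimal sketch, not HotSpot code: the names
(ClearRange, clear_range, covers) are hypothetical, and it assumes that
CompiledFrame::size and argsize are counted in 4-byte words, which is
consistent with the logged range on x86_32.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; not part of HotSpot.
constexpr std::uint32_t kWordSize = 4;  // bytes per stack slot on x86_32

struct ClearRange {
    std::uint32_t begin;  // first address whose bitmap bit is cleared
    std::uint32_t end;    // one past the last cleared address
};

// Mirrors the shape of
//   clear_bitmap_bits(heap_frame_top + size, added_argsize)
// with both sizes assumed to be counted in words.
inline ClearRange clear_range(std::uint32_t heap_frame_top,
                              std::uint32_t frame_size_words,
                              std::uint32_t argsize_words) {
    assert(heap_frame_top % kWordSize == 0);  // stack slots are word-aligned
    std::uint32_t begin = heap_frame_top + frame_size_words * kWordSize;
    return {begin, begin + argsize_words * kWordSize};
}

// True if the bitmap bit for the slot at addr falls inside the cleared range.
inline bool covers(const ClearRange& r, std::uint32_t addr) {
    return addr >= r.begin && addr < r.end;
}
```

With the logged values (heap_frame_top 0xd9fbce88, size 20, argsize 4) this
gives [0xd9fbced8; 0xd9fbcee8), which includes 0xd9fbcee4, the slot holding
"local 0 for #3". With only the 3 slots that the boolean and long occupy, the
range would end at 0xd9fbcee4 and leave that oop's bit alone.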
More information about the loom-dev
mailing list