[aarch64-port-dev ] [15] RFR: 8248048: ZGC: AArch64: SIGILL in load barrier register spilling
Andrew Dinn
adinn at redhat.com
Fri Jun 26 10:05:35 UTC 2020
Hi Stefan,
Yes, nice catch. zr is clearly the wrong choice here. In the context of
an FP register it ends up being interpreted as q31 which, as you show,
clashes when r31 is the last register in an odd register set.
Your fix looks ok to me (so count it as reviewed).
Just for your interest, another alternative would be to continue to use
stpq instructions but replace zr with an fp register that is a) in the
save set but b) guaranteed to differ from the last register,-- the
obvious choice being regs[0]. That will work in all cases where count >=
2 (well 3 actually since the problem only arises in odd cases). So, you
would still need to special case count == 0/1 and only add regs[0] to
the set after handling those cases.
I'm agnostic over which of these two is better as I don't think either a
stpq or a strq pre/post is preferable to the other.
regards,
Andrew Dinn
-----------
Red Hat Distinguished Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill
On 26/06/2020 09:50, Stefan Karlsson wrote:
> Hi all,
>
> Please review this patch to fix a ZGC load barrier register spilling bug.
>
> https://cr.openjdk.java.net/~stefank/8248048/webrev.01/
> https://bugs.openjdk.java.net/browse/JDK-8248048
>
> The JVM crashed with an ILL_ILLOPC when executing this instruction in
> our load barrier stub:
>
> ldp q31, q31, [sp, #224]
>
> The entire load barrier stub:
>
> 0x0000ffff998ab964: stp x10, x13, [sp, #-32]!
> 0x0000ffff998ab968: stp x14, x17, [sp, #16]
> 0x0000ffff998ab96c: stp q1, q2, [sp, #-256]!
> 0x0000ffff998ab970: stp q3, q19, [sp, #32]
> 0x0000ffff998ab974: stp q20, q21, [sp, #64]
> 0x0000ffff998ab978: stp q22, q23, [sp, #96]
> 0x0000ffff998ab97c: stp q24, q25, [sp, #128]
> 0x0000ffff998ab980: stp q26, q28, [sp, #160]
> 0x0000ffff998ab984: stp q29, q30, [sp, #192]
> 0x0000ffff998ab988: stp q31, q31, [sp, #224]
> 0x0000ffff998ab98c: sub x1, x10, #0x0
> 0x0000ffff998ab990: mov x0, x11
> 0x0000ffff998ab994: mov x8, #0xfc28 // #64552
> 0x0000ffff998ab998: movk x8, #0xaf11, lsl #16
> 0x0000ffff998ab99c: movk x8, #0xffff, lsl #32
> 0x0000ffff998ab9a0: blr x8 ; branch into the JVM
> 0x0000ffff998ab9a4: mov x11, x0
> 0x0000ffff998ab9a8: ldp q3, q19, [sp, #32]
> 0x0000ffff998ab9ac: ldp q20, q21, [sp, #64]
> 0x0000ffff998ab9b0: ldp q22, q23, [sp, #96]
> 0x0000ffff998ab9b4: ldp q24, q25, [sp, #128]
> 0x0000ffff998ab9b8: ldp q26, q28, [sp, #160]
> 0x0000ffff998ab9bc: ldp q29, q30, [sp, #192]
> => 0x0000ffff998ab9c0: ldp q31, q31, [sp, #224]
> 0x0000ffff998ab9c4: ldp q1, q2, [sp], #256
> 0x0000ffff998ab9c8: ldp x14, x17, [sp, #16]
> 0x0000ffff998ab9cc: ldp x10, x13, [sp], #32
> 0x0000ffff998ab9d0: b 0xffff998aa718
>
> It seems to be illegal to use the same register twice when loading into
> a pair of registers. I verified that that was the problem, and not the
> usage of zr (see below) that caused some weird encoding, by changing the
> code to always generate stp/ldp with the same register:
>
> => 0x0000ffff757d22fc: ldp q20, q20, [sp, #32] ; Crash here as well
> 0x0000ffff757d2300: ldp q21, q21, [sp, #48]
> 0x0000ffff757d2304: ldp q22, q22, [sp, #64]
>
> The code that generates this instruction is MacroAssembler::push_fp,
> which spills the necessary registers in pairs with stp/ldp calls. If the
> number of registers to spill is odd it needs to deal with one of the
> registers separately. This is done by adding a dummy register here:
>
> 2136 regs[count++] = zr->encoding_nocheck();
> 2137 count &= ~1; // Only push an even number of regs
>
> This scheme seems to work for the normal registers
> (MacroAssembler::push), but the usage of zr seems dubious when we're
> dealing with the fp/simd version of stp/ldp.
>
> My proposed patch replaces the stp/ldp for the odd numbered register
> with the single-register versions: str/ldr. I make sure to keep the
> stack 16 bytes aligned by still bumping 16 bytes, but skipping the
> store/load to the second 8 bytes half.
>
> Note that right now MacroAssembler::push_fp is only used by ZGC.
>
> This fixes the crash. I've run this code through jtreg groups :tier1,
> tier2, tier3, and an Oracle-internal stress suite without any new problems.
>
> The smallest reproducer I have is:
> make -C ../build/fastdebug test TEST=test/jdk/java/util/concurrent/
> JTREG="JAVA_OPTIONS=-XX:+UseZGC"
>
> Does this look OK?
>
> Thanks,
> StefanK
>
More information about the aarch64-port-dev
mailing list