[aarch64-port-dev ] [15] RFR: 8248048: ZGC: AArch64: SIGILL in load barrier register spilling
Stefan Karlsson
stefan.karlsson at oracle.com
Fri Jun 26 11:35:50 UTC 2020
Hi Andrew,
On 2020-06-26 12:05, Andrew Dinn wrote:
> Hi Stefan,
>
> Yes, nice catch. zr is clearly the wrong choice here. In the context of
> an FP register it ends up being interpreted as q31 which, as you show,
> clashes when r31 is the last register in an odd register set.
>
> Your fix looks ok to me (so count it as reviewed).
Thanks for reviewing!
>
> Just for your interest, another alternative would be to continue to use
> stpq instructions but replace zr with an fp register that is a) in the
> save set but b) guaranteed to differ from the last register,-- the
> obvious choice being regs[0]. That will work in all cases where count >=
> 2 (well 3 actually since the problem only arises in odd cases). So, you
> would still need to special case count == 0/1 and only add regs[0] to
> the set after handling those cases.
I was thinking along the same lines first. The problem with the count ==
1 case made me look for another solution.
>
> I'm agnostic over which of these two is better as I don't think either a
> stpq or a strq pre/post is preferable to the other.
OK. Again, thanks for taking the time to think about the different ways
to fix this.
StefanK
>
> regards,
>
>
> Andrew Dinn
> -----------
> Red Hat Distinguished Engineer
> Red Hat UK Ltd
> Registered in England and Wales under Company Registration No. 03798903
> Directors: Michael Cunningham, Michael ("Mike") O'Neill
>
> On 26/06/2020 09:50, Stefan Karlsson wrote:
>> Hi all,
>>
>> Please review this patch to fix a ZGC load barrier register spilling bug.
>>
>> https://cr.openjdk.java.net/~stefank/8248048/webrev.01/
>> https://bugs.openjdk.java.net/browse/JDK-8248048
>>
>> The JVM crashed with an ILL_ILLOPC when executing this instruction in
>> our load barrier stub:
>>
>> ldp q31, q31, [sp, #224]
>>
>> The entire load barrier stub:
>>
>> 0x0000ffff998ab964: stp x10, x13, [sp, #-32]!
>> 0x0000ffff998ab968: stp x14, x17, [sp, #16]
>> 0x0000ffff998ab96c: stp q1, q2, [sp, #-256]!
>> 0x0000ffff998ab970: stp q3, q19, [sp, #32]
>> 0x0000ffff998ab974: stp q20, q21, [sp, #64]
>> 0x0000ffff998ab978: stp q22, q23, [sp, #96]
>> 0x0000ffff998ab97c: stp q24, q25, [sp, #128]
>> 0x0000ffff998ab980: stp q26, q28, [sp, #160]
>> 0x0000ffff998ab984: stp q29, q30, [sp, #192]
>> 0x0000ffff998ab988: stp q31, q31, [sp, #224]
>> 0x0000ffff998ab98c: sub x1, x10, #0x0
>> 0x0000ffff998ab990: mov x0, x11
>> 0x0000ffff998ab994: mov x8, #0xfc28 // #64552
>> 0x0000ffff998ab998: movk x8, #0xaf11, lsl #16
>> 0x0000ffff998ab99c: movk x8, #0xffff, lsl #32
>> 0x0000ffff998ab9a0: blr x8 ; branch into the JVM
>> 0x0000ffff998ab9a4: mov x11, x0
>> 0x0000ffff998ab9a8: ldp q3, q19, [sp, #32]
>> 0x0000ffff998ab9ac: ldp q20, q21, [sp, #64]
>> 0x0000ffff998ab9b0: ldp q22, q23, [sp, #96]
>> 0x0000ffff998ab9b4: ldp q24, q25, [sp, #128]
>> 0x0000ffff998ab9b8: ldp q26, q28, [sp, #160]
>> 0x0000ffff998ab9bc: ldp q29, q30, [sp, #192]
>> => 0x0000ffff998ab9c0: ldp q31, q31, [sp, #224]
>> 0x0000ffff998ab9c4: ldp q1, q2, [sp], #256
>> 0x0000ffff998ab9c8: ldp x14, x17, [sp, #16]
>> 0x0000ffff998ab9cc: ldp x10, x13, [sp], #32
>> 0x0000ffff998ab9d0: b 0xffff998aa718
>>
>> It seems to be illegal to use the same register twice when loading into
>> a pair of registers. I verified that that was the problem, and not the
>> usage of zr (see below) that caused some weird encoding, by changing the
>> code to always generate stp/ldp with the same register:
>>
>> => 0x0000ffff757d22fc: ldp q20, q20, [sp, #32] ; Crash here as well
>> 0x0000ffff757d2300: ldp q21, q21, [sp, #48]
>> 0x0000ffff757d2304: ldp q22, q22, [sp, #64]
>>
>> The code that generates this instruction is MacroAssembler::push_fp,
>> which spills the necessary registers in pairs with stp/ldp calls. If the
>> number of registers to spill is odd it needs to deal with one of the
>> registers separately. This is done by adding a dummy register here:
>>
>> 2136 regs[count++] = zr->encoding_nocheck();
>> 2137 count &= ~1; // Only push an even number of regs
>>
>> This scheme seems to work for the normal registers
>> (MacroAssembler::push), but the usage of zr seems dubious when we're
>> dealing with the fp/simd version of stp/ldp.
>>
>> My proposed patch replaces the stp/ldp for the odd numbered register
>> with the single-register versions: str/ldr. I make sure to keep the
>> stack 16 bytes aligned by still bumping 16 bytes, but skipping the
>> store/load to the second 8 bytes half.
>>
>> Note that right now MacroAssembler::push_fp is only used by ZGC.
>>
>> This fixes the crash. I've run this code through jtreg groups :tier1,
>> tier2, tier3, and an Oracle-internal stress suite without any new problems.
>>
>> The smallest reproducer I have is:
>> make -C ../build/fastdebug test TEST=test/jdk/java/util/concurrent/
>> JTREG="JAVA_OPTIONS=-XX:+UseZGC"
>>
>> Does this look OK?
>>
>> Thanks,
>> StefanK
>>
>
More information about the aarch64-port-dev
mailing list