RFR: 8353266: C2: Wrong execution with Integer.bitCount(int) intrinsic on AArch64
SendaoYan
syan at openjdk.org
Sat May 31 03:13:53 UTC 2025
On Fri, 30 May 2025 15:33:14 GMT, Marc Chevalier <mchevalier at openjdk.org> wrote:
> ### Problem
>
> On Aarch64, using `Integer.bitCount` can modify its argument. The problem comes from the implementation of `popCountI` on Aarch64. For instance, that's what we get with the reproducer `Reduced.java` on the related issue:
>
> ; Load lFld into local x
> ldr x11, [x10, #120]
> ; popCountI
> mov w11, w11
> mov v16.d[0], x11
> cnt v16.8b, v16.8b
> addv b16, v16.8b
> mov x13, v16.d[0]
> ; [...]
> ; store local x (which is believed to still contain lFld) into result
> str x11, [x10, #128]
>
>
> The instruction `mov w11, w11` is used to cut the 32 higher bits of `x11` since we use `popCountI` (from `Integer.bitCount`): on aarch64 (like other architectures), assigning the 32 lower bits of a register reset the 32 higher bits. Short: the input is modified, but the implementation of `popCountI` doesn't declare it:
>
> instruct popCountI(iRegINoSp dst, iRegIorL2I src, vRegF tmp) %{
> match(Set dst (PopCountI src));
> effect(TEMP tmp);
> [...]
> %}
>
>
> But then, why resetting the upper word of `x11`? It all starts with vector instructions:
>
> cnt v16.8b, v16.8b
> addv b16, v16.8b
>
> The `8b` specifies that it operates on the 8 lower bytes of `v16`, it would be nice to simply use `4b`, but that doesn't exist: vector instructions can only work on either the whole 128-bit register, or the 64 lower bits (by blocks of 1, 2, 4, 8 or 16 bytes). There is no suffix (and encoding) for a vector instruction to work only on the 32 lower bits, so not to pollute the bit count, we need to reset the 32 higher bits of `v16.d[0]` (aka `d16`), that is `v16.s[1]`, that is `v16[32:63]` in a more bit-explicit notation. Moreover, unlike with general purpose register doing
>
> mov v16.s[0], w11
>
> would set `v16[0:31]` to `w11`, but not reset `v16[32:63]`. Which makes sense! Otherwise, using vector registers would be impractical if writing any piece would reset the rest... So we indeed need to set all of `v16[0:63]`, which
>
> mov w11, w11
> mov v16.d[0], x11
>
> does, but by destroying `x11`.
>
> ### Solution
>
> Simply adding `USE_KILL src` in the effects would be nice, but unfortunately not possible: `iRegIorL2I` is an operand class (either a 32-bit register or a L2I of a 64-bit register) and those cannot be used in effect lists.
>
> The way I went for is rather not to modify the source, but rather do write the two lower words of `v16` we are interested in separately:
>
> mov v16.s[1], wzr ; Reset the 1-indexed word of v16, that is v16[32:63] <- 0
> mov v16.s[0], w11 ; Set the 0-ind...
test/hotspot/jtreg/compiler/intrinsics/BitCountIAarch64PreservesArgument.java line 58:
> 56: if (result != 0xfedc_ba98_7654_3210L) {
> 57: // Wrongly outputs the cut input 0x7654_3210 == 1985229328
> 58: throw new RuntimeException("Wrong result. lFld=" + lFld + "; result=" + result);
How about:
throw new RuntimeException("Wrong result. Expected result = " + lFld + "; Actual result = " + result);
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/25551#discussion_r2117132388
More information about the hotspot-compiler-dev
mailing list