RFR: 8353266: C2: Wrong execution with Integer.bitCount(int) intrinsic on AArch64 [v2]

Marc Chevalier mchevalier at openjdk.org
Tue Jun 3 08:08:57 UTC 2025


On Mon, 2 Jun 2025 10:37:11 GMT, Marc Chevalier <mchevalier at openjdk.org> wrote:

>> ### Problem
>> 
>> On Aarch64, using  `Integer.bitCount` can modify its argument. The problem comes from the implementation of `popCountI` on Aarch64. For instance, that's what we get with the reproducer `Reduced.java` on the related issue:
>> 
>> ; Load lFld into local x
>> ldr  x11,      [x10, #120]
>> ; popCountI
>> mov  w11,      w11
>> mov  v16.d[0], x11
>> cnt  v16.8b,   v16.8b
>> addv b16,      v16.8b
>> mov  x13,      v16.d[0]
>> ; [...]
>> ; store local x (which is believed to still contain lFld) into result
>> str  x11,      [x10, #128]
>> 
>> 
>> The instruction `mov  w11, w11` is used to cut the 32 higher bits of `x11` since we use `popCountI` (from `Integer.bitCount`): on aarch64 (like other architectures), assigning the 32 lower bits of a register reset the 32 higher bits. Short: the input is modified, but the implementation of `popCountI` doesn't declare it:
>> 
>> instruct popCountI(iRegINoSp dst, iRegIorL2I src, vRegF tmp) %{
>>   match(Set dst (PopCountI src));
>>   effect(TEMP tmp);
>>   [...]
>> %}
>> 
>> 
>> But then, why resetting the upper word of `x11`? It all starts with vector instructions:
>> 
>> cnt  v16.8b,   v16.8b
>> addv b16,      v16.8b
>> 
>> The `8b` specifies that it operates on the 8 lower bytes of `v16`, it would be nice to simply use `4b`, but that doesn't exist: vector instructions can only work on either the whole 128-bit register, or the 64 lower bits (by blocks of 1, 2, 4, 8 or 16 bytes). There is no suffix (and encoding) for a vector instruction to work only on the 32 lower bits, so not to pollute the bit count, we need to reset the 32 higher bits of `v16.d[0]` (aka `d16`), that is `v16.s[1]`, that is `v16[32:63]` in a more bit-explicit notation. Moreover, unlike with general purpose register doing
>> 
>> mov  v16.s[0], w11
>> 
>> would set `v16[0:31]` to `w11`, but not reset `v16[32:63]`. Which makes sense! Otherwise, using vector registers would be impractical if writing any piece would reset the rest... So we indeed need to set all of `v16[0:63]`, which
>> 
>> mov  w11,      w11
>> mov  v16.d[0], x11
>> 
>> does, but by destroying `x11`.
>> 
>> ### Solution
>> 
>> Simply adding `USE_KILL src` in the effects would be nice, but unfortunately not possible: `iRegIorL2I` is an operand class (either a 32-bit register or a L2I of a 64-bit register) and those cannot be used in effect lists.
>> 
>> The way I went for is rather not to modify the source, but rather do write the two lower words of `v16` we are interested in separately:
>> 
>> mov  v16.s[1], wzr      ...
>
> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Apply suggestions

Thanks @sendaoYan, @theRealAph and @TobiHartmann for review and nice suggestions!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25551#issuecomment-2934041591


More information about the hotspot-compiler-dev mailing list