RFR: 8280476: [macOS] : hotspot arm64 bug exposed by latest clang

Wed Feb 2 16:13:14 UTC 2022

On Tue, 1 Feb 2022 16:23:22 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

>> @kimbarrett - Thanks for the review! Good catch! I believe you are correct,
>> but let's wait for @adinn to chime in here.
>
> Returning 0 if count == 0 and bits otherwise is consistent with the old code,
> assuming the problematic shift doesn't cause UB data corruption or something
> otherwise strange.  It might do the "mathematically correct" thing of shifting
> out all the bits (so zeroing the value).  Another possibility is that the
> implemented shift amount for N is N mod word-size (some platforms do that, but
> I haven't looked up aarch64), which in this case is 64 mod 64, or 0.  I don't
> think there are any other non-strange non-UB-data-corruption possibilities.
> 
> 
> result is initially 0.
> with the old code:
> - if count == 0, there are no iterations, and final result is still 0.
> - if count == 1, there is one iteration.
>   result <<= nbits
>     -- if UB corruption, all bets are off
>     -- if shifts by 64, result is unchanged (== 0)
>     -- if shifts by 64 mod 64 (== 0), result is unchanged (== 0)
>   result |= (bits & mask)
>     -- result == bits
> - if count > 1, for additional iterations
>   result <<= nbits
>     -- if UB corruption, all bets are off
>     -- if shifts by 64, result == 0
>     -- if shifts by 64 mod 64 (== 0), result is unchanged (== bits)
>   result |= (bits & mask)
>     -- result == bits
> 
> So with old code, for count > 0, result == bits (or UB corrupted data).

First thing to note is that `replicate` is never actually called with `count == 0`. It's questionable whether it ever should be (especially as this function is really only meant to be used locally to this code) but logically I think the result ought to be 0 if it ever does get called that way.

What `replicate` is meant to do is replicate the bottom `nbits` bits of the `uint64_t` passed via argument `bits` across successive subfields of `result` up to `(count * bits)` worth of bits (sorry ... the names are somewhat inconveniently chosen as far as this discussion is concerned). So, anyway, a count of zero ought to insert 0 bits giving a zero result.

As to how it is used, `replicate` is only called from method `expandLogicalImmediate` with two alternative calling patterns. It is sometimes called like this

    for (int i = 0; i < 6; i++) {
      int nbits = 1 << i;
      . . .
        replicate(and_bit, 1, nbits)

n.b. argument `count` is being supplied as `nbits` and argument `nbits` is supplied as 1 (yeah, naming is hard). These calls serve to insert anywhere between 1 and 32 copies of a single bit into the rightmost portion of the result.

It is also called like this:

    for (int i = 0; i < 6; i++) {
      int nbits = 1 << i;
      . . .
          replicate(and_bits_top, 2 * nbits, 32 / nbits)

This is used to replicate a bit pattern taken from the lower end of the first input across the whole of a uint64_t result. For these calls the input arguments `count` and `nbits` satisfy the following invariants:

    nbits | 64
    count | 64
    nbits * count == 64
    2 <= nbits <= 32
    32 >= count >= 2

I am not actually clear why the caller is calling `replicate` in the way it does. The algorithm is way more complicated than it actually needs to be. The basic idea is to insert imms 1s into a 2^k bit field, right rotate them by immr and then replicate the result across the full 64 bits. There's a few wrinkles which are explained here for those who are interested:

expandLogicalImmediate(uint32_t immN, uint32_t immr, uint32_t imms, uint64_t& bimm)

-  outputs a bit pattern of width 64 by replicating a bit field of width 2^k and returns 1 or fails and returns 0

- when immN is 1 then k is 6 and immr/imms are masked to 6 bit integers.
- when immN is 0 then k is the count of the first 0 bit in imms and immr/imms are masked to k-bit integers (i.e. leading 1s in imms determine dead bits of imms and immr)

A k value of 0 (i.e. immN == 0 and imms = all 1s) fails and returns 0

After masking

An imms value of all 1s (i.e. 2^k - 1) fails and returns 0 (note given the previous check we only get here when  immN == 1)

The bit field can now be constructed as follows:

 - imms + 1 specifies the number of 1s to insert into the 2^k-field (note that imms + 1 < 2^k)
 - immr counts how far to right rotate the 2^k-field

The resulting 2^k-field is then replicated across the full 64-bit result.

-------------

PR: https://git.openjdk.java.net/jdk/pull/7270