RFR: 8280476: [macOS] : hotspot arm64 bug exposed by latest clang

Wed Feb 2 20:14:12 UTC 2022

On Wed, 2 Feb 2022 16:09:50 GMT, Andrew Dinn <adinn at openjdk.org> wrote:

>> Returning 0 if count == 0 and bits otherwise is consistent with the old code,
>> assuming the problematic shift doesn't cause UB data corruption or something
>> otherwise strange.  It might do the "mathematically correct" thing of shifting
>> out all the bits (so zeroing the value).  Another possibility is that the
>> implemented shift amount for N is N mod word-size (some platforms do that, but
>> I haven't looked up aarch64), which in this case is 64 mod 64, or 0.  I don't
>> think there are any other non-strange non-UB-data-corruption possibilities.
>> 
>> 
>> result is initially 0.
>> with the old code:
>> - if count == 0, there are no iterations, and final result is still 0.
>> - if count == 1, there is one iteration.
>>   result <<= nbits
>>     -- if UB corruption, all bets are off
>>     -- if shifts by 64, result is unchanged (== 0)
>>     -- if shifts by 64 mod 64 (== 0), result is unchanged (== 0)
>>   result |= (bits & mask)
>>     -- result == bits
>> - if count > 1, for additional iterations
>>   result <<= nbits
>>     -- if UB corruption, all bets are off
>>     -- if shifts by 64, result == 0
>>     -- if shifts by 64 mod 64 (== 0), result is unchanged (== bits)
>>   result |= (bits & mask)
>>     -- result == bits
>> 
>> So with old code, for count > 0, result == bits (or UB corrupted data).
>
> First thing to note is that `replicate` is never actually called with `count == 0`. It's questionable whether it ever should be (especially as this function is really only meant to be used locally to this code) but logically I think the result ought to be 0 if it ever does get called that way.
> 
> What `replicate` is meant to do is replicate the bottom `nbits` bits of the `uint64_t` passed via argument `bits` across successive subfields of `result` up to `(count * bits)` worth of bits (sorry ... the names are somewhat inconveniently chosen as far as this discussion is concerned). So, anyway, a count of zero ought to insert 0 bits giving a zero result.
> 
> As to how it is used, `replicate` is only called from method `expandLogicalImmediate` with two alternative calling patterns. It is sometimes called like this
> 
>     for (int i = 0; i < 6; i++) {
>       int nbits = 1 << i;
>       . . .
>         replicate(and_bit, 1, nbits)
> 
> n.b. argument `count` is being supplied as `nbits` and argument `nbits` is supplied as 1 (yeah, naming is hard). These calls serve to insert anywhere between 1 and 32 copies of a single bit into the rightmost portion of the result.
> 
> It is also called like this:
> 
>     for (int i = 0; i < 6; i++) {
>       int nbits = 1 << i;
>       . . .
>           replicate(and_bits_top, 2 * nbits, 32 / nbits)
> 
> This is used to replicate a bit pattern taken from the lower end of the first input across the whole of a uint64_t result. For these calls the input arguments `count` and `nbits` satisfy the following invariants:
> 
>     nbits | 64
>     count | 64
>     nbits * count == 64
>     2 <= nbits <= 32
>     32 >= count >= 2
> 
> I am not actually clear why the caller is calling `replicate` in the way it does. The algorithm is way more complicated than it actually needs to be. The basic idea is to insert imms 1s into a 2^k bit field, right rotate them by immr and then replicate the result across the full 64 bits. There's a few wrinkles which are explained here for those who are interested:
> 
> expandLogicalImmediate(uint32_t immN, uint32_t immr, uint32_t imms, uint64_t& bimm)
> 
> -  outputs a bit pattern of width 64 by replicating a bit field of width 2^k and returns 1 or fails and returns 0
> 
> - when immN is 1 then k is 6 and immr/imms are masked to 6 bit integers.
> - when immN is 0 then k is the count of the first 0 bit in imms and immr/imms are masked to k-bit integers (i.e. leading 1s in imms determine dead bits of imms and immr)
> 
> A k value of 0 (i.e. immN == 0 and imms = all 1s) fails and returns 0
> 
> After masking
> 
> An imms value of all 1s (i.e. 2^k - 1) fails and returns 0 (note given the previous check we only get here when  immN == 1)
> 
> The bit field can now be constructed as follows:
> 
>  - imms + 1 specifies the number of 1s to insert into the 2^k-field (note that imms + 1 < 2^k)
>  - immr counts how far to right rotate the 2^k-field
> 
> The resulting 2^k-field is then replicated across the full 64-bit result.

@adinn - Thanks for chiming in on this review thread.

I'm reading and re-reading your comments and it will take me a while
to put it all in proper context.

I do agree with one of your points:

> `replicate` is never actually called with `count == 0`

so that means that this:

    `assert(count <= 1, "must be");`

is a little too loose and should change to:

    `assert(count == 1, "must be");`

if we're going to keep it at all. I think we should keep it, but
I can be convinced otherwise since the original code had no
similar `assert()`.

@kimbarrett - I've re-read your comments and I'm not clear
on whether you think that `result <<= nbits` is UB when
`nbits == 64`. Can you please clarify?

Of course, it's not helping matters that I can't reproduce the
failure mode reported by Apple and we're getting no response
about our request for version info.

-------------

PR: https://git.openjdk.java.net/jdk/pull/7270