RFR: 8280476: [macOS] : hotspot arm64 bug exposed by latest clang
Daniel D.Daugherty
dcubed at openjdk.java.net
Wed Feb 2 20:14:12 UTC 2022
On Wed, 2 Feb 2022 16:09:50 GMT, Andrew Dinn <adinn at openjdk.org> wrote:
>> Returning 0 if count == 0 and bits otherwise is consistent with the old code,
>> assuming the problematic shift doesn't cause UB data corruption or something
>> otherwise strange. It might do the "mathematically correct" thing of shifting
>> out all the bits (so zeroing the value). Another possibility is that the
>> implemented shift amount for N is N mod word-size (some platforms do that, but
>> I haven't looked up aarch64), which in this case is 64 mod 64, or 0. I don't
>> think there are any other non-strange non-UB-data-corruption possibilities.
>>
>>
>> result is initially 0.
>> with the old code:
>> - if count == 0, there are no iterations, and final result is still 0.
>> - if count == 1, there is one iteration.
>> result <<= nbits
>> -- if UB corruption, all bets are off
>> -- if shifts by 64, result is unchanged (== 0)
>> -- if shifts by 64 mod 64 (== 0), result is unchanged (== 0)
>> result |= (bits & mask)
>> -- result == bits
>> - if count > 1, for additional iterations
>> result <<= nbits
>> -- if UB corruption, all bets are off
>> -- if shifts by 64, result == 0
>> -- if shifts by 64 mod 64 (== 0), result is unchanged (== bits)
>> result |= (bits & mask)
>> -- result == bits
>>
>> So with old code, for count > 0, result == bits (or UB corrupted data).
>
> First thing to note is that `replicate` is never actually called with `count == 0`. It's questionable whether it ever should be (especially as this function is really only meant to be used locally to this code) but logically I think the result ought to be 0 if it ever does get called that way.
>
> What `replicate` is meant to do is replicate the bottom `nbits` bits of the `uint64_t` passed via argument `bits` across successive subfields of `result` up to `(count * bits)` worth of bits (sorry ... the names are somewhat inconveniently chosen as far as this discussion is concerned). So, anyway, a count of zero ought to insert 0 bits giving a zero result.
>
> As to how it is used, `replicate` is only called from method `expandLogicalImmediate` with two alternative calling patterns. It is sometimes called like this
>
> for (int i = 0; i < 6; i++) {
> int nbits = 1 << i;
> . . .
> replicate(and_bit, 1, nbits)
>
> n.b. argument `count` is being supplied as `nbits` and argument `nbits` is supplied as 1 (yeah, naming is hard). These calls serve to insert anywhere between 1 and 32 copies of a single bit into the rightmost portion of the result.
>
> It is also called like this:
>
> for (int i = 0; i < 6; i++) {
> int nbits = 1 << i;
> . . .
> replicate(and_bits_top, 2 * nbits, 32 / nbits)
>
> This is used to replicate a bit pattern taken from the lower end of the first input across the whole of a uint64_t result. For these calls the input arguments `count` and `nbits` satisfy the following invariants:
>
> nbits | 64
> count | 64
> nbits * count == 64
> 2 <= nbits <= 32
> 32 >= count >= 2
>
> I am not actually clear why the caller is calling `replicate` in the way it does. The algorithm is way more complicated than it actually needs to be. The basic idea is to insert imms 1s into a 2^k bit field, right rotate them by immr and then replicate the result across the full 64 bits. There's a few wrinkles which are explained here for those who are interested:
>
> expandLogicalImmediate(uint32_t immN, uint32_t immr, uint32_t imms, uint64_t& bimm)
>
> - outputs a bit pattern of width 64 by replicating a bit field of width 2^k and returns 1 or fails and returns 0
>
> - when immN is 1 then k is 6 and immr/imms are masked to 6 bit integers.
> - when immN is 0 then k is the count of the first 0 bit in imms and immr/imms are masked to k-bit integers (i.e. leading 1s in imms determine dead bits of imms and immr)
>
> A k value of 0 (i.e. immN == 0 and imms = all 1s) fails and returns 0
>
> After masking
>
> An imms value of all 1s (i.e. 2^k - 1) fails and returns 0 (note given the previous check we only get here when immN == 1)
>
> The bit field can now be constructed as follows:
>
> - imms + 1 specifies the number of 1s to insert into the 2^k-field (note that imms + 1 < 2^k)
> - immr counts how far to right rotate the 2^k-field
>
> The resulting 2^k-field is then replicated across the full 64-bit result.
@adinn - Thanks for chiming in on this review thread.
I'm reading and re-reading your comments and it will take me a while
to put it all in proper context.
I do agree with one of your points:
> `replicate` is never actually called with `count == 0`
so that means that this:
`assert(count <= 1, "must be");`
is a little too loose and should change to:
`assert(count == 1, "must be");`
if we're going to keep it at all. I think we should keep it, but
I can be convinced otherwise since the original code had no
similar `assert()`.
@kimbarrett - I've re-read your comments and I'm not clear
on whether you think that `result <<= nbits` is UB when
`nbits == 64`. Can you please clarify?
Of course, it's not helping matters that I can't reproduce the
failure mode reported by Apple and we're getting no response
about our request for version info.
-------------
PR: https://git.openjdk.java.net/jdk/pull/7270
More information about the hotspot-compiler-dev
mailing list