RFR: 8280476: [macOS] : hotspot arm64 bug exposed by latest clang
Andrew Dinn
adinn at openjdk.java.net
Wed Feb 2 16:13:14 UTC 2022
On Tue, 1 Feb 2022 16:23:22 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:
>> @kimbarrett - Thanks for the review! Good catch! I believe you are correct,
>> but let's wait for @adinn to chime in here.
>
> Returning 0 if count == 0 and bits otherwise is consistent with the old code,
> assuming the problematic shift doesn't cause UB data corruption or something
> otherwise strange. It might do the "mathematically correct" thing of shifting
> out all the bits (so zeroing the value). Another possibility is that the
> implemented shift amount for N is N mod word-size (some platforms do that, but
> I haven't looked up aarch64), which in this case is 64 mod 64, or 0. I don't
> think there are any other non-strange non-UB-data-corruption possibilities.
>
>
> result is initially 0.
> with the old code:
> - if count == 0, there are no iterations, and final result is still 0.
> - if count == 1, there is one iteration.
> result <<= nbits
> -- if UB corruption, all bets are off
> -- if shifts by 64, result is unchanged (== 0)
> -- if shifts by 64 mod 64 (== 0), result is unchanged (== 0)
> result |= (bits & mask)
> -- result == bits
> - if count > 1, for additional iterations
> result <<= nbits
> -- if UB corruption, all bets are off
> -- if shifts by 64, result == 0
> -- if shifts by 64 mod 64 (== 0), result is unchanged (== bits)
> result |= (bits & mask)
> -- result == bits
>
> So with old code, for count > 0, result == bits (or UB corrupted data).
First thing to note is that `replicate` is never actually called with `count == 0`. It's questionable whether it ever should be (especially as this function is really only meant to be used locally to this code) but logically I think the result ought to be 0 if it ever does get called that way.
What `replicate` is meant to do is replicate the bottom `nbits` bits of the `uint64_t` passed via argument `bits` across successive subfields of `result` up to `(count * bits)` worth of bits (sorry ... the names are somewhat inconveniently chosen as far as this discussion is concerned). So, anyway, a count of zero ought to insert 0 bits giving a zero result.
As to how it is used, `replicate` is only called from method `expandLogicalImmediate` with two alternative calling patterns. It is sometimes called like this
for (int i = 0; i < 6; i++) {
int nbits = 1 << i;
. . .
replicate(and_bit, 1, nbits)
n.b. argument `count` is being supplied as `nbits` and argument `nbits` is supplied as 1 (yeah, naming is hard). These calls serve to insert anywhere between 1 and 32 copies of a single bit into the rightmost portion of the result.
It is also called like this:
for (int i = 0; i < 6; i++) {
int nbits = 1 << i;
. . .
replicate(and_bits_top, 2 * nbits, 32 / nbits)
This is used to replicate a bit pattern taken from the lower end of the first input across the whole of a uint64_t result. For these calls the input arguments `count` and `nbits` satisfy the following invariants:
nbits | 64
count | 64
nbits * count == 64
2 <= nbits <= 32
32 >= count >= 2
I am not actually clear why the caller is calling `replicate` in the way it does. The algorithm is way more complicated than it actually needs to be. The basic idea is to insert imms 1s into a 2^k bit field, right rotate them by immr and then replicate the result across the full 64 bits. There's a few wrinkles which are explained here for those who are interested:
expandLogicalImmediate(uint32_t immN, uint32_t immr, uint32_t imms, uint64_t& bimm)
- outputs a bit pattern of width 64 by replicating a bit field of width 2^k and returns 1 or fails and returns 0
- when immN is 1 then k is 6 and immr/imms are masked to 6 bit integers.
- when immN is 0 then k is the count of the first 0 bit in imms and immr/imms are masked to k-bit integers (i.e. leading 1s in imms determine dead bits of imms and immr)
A k value of 0 (i.e. immN == 0 and imms = all 1s) fails and returns 0
After masking
An imms value of all 1s (i.e. 2^k - 1) fails and returns 0 (note given the previous check we only get here when immN == 1)
The bit field can now be constructed as follows:
- imms + 1 specifies the number of 1s to insert into the 2^k-field (note that imms + 1 < 2^k)
- immr counts how far to right rotate the 2^k-field
The resulting 2^k-field is then replicated across the full 64-bit result.
-------------
PR: https://git.openjdk.java.net/jdk/pull/7270
More information about the hotspot-compiler-dev
mailing list