RFR: 8287925: AArch64: intrinsics for compareUnsigned method in Integer and Long

Quan Anh Mai qamai at openjdk.org
Mon Dec 5 09:17:05 UTC 2022


On Mon, 5 Dec 2022 07:30:08 GMT, Hao Sun <haosun at openjdk.org> wrote:

>> The motivation for these intrinsics, aside from unsigned comparison being a fairly basic operation, is for range checks of a load/store, which has the form of `0 <= offset && offset <= length - size`, by transforming this into `0 <= length - size && offset u<= length - size`, the first comparison as well as the computation of `length - size` can be hoisted out of the loop, which results in a single operation being loop varying. The reason this may not be recognised effectively by the idealiser is that if `size` is a constant, `(length - size) + MIN_VALUE` being folded into `length + (MIN_VALUE - size)`, breaks the pattern.
>> 
>> If we simply want to throw an exception in out-of-bound cases, then `Precondition::checkIndex` may suffice. This however may not be adequate if:
>> 
>> - We want to do something else. If the hardware does not support masked load, currently we do a load followed by a blend if the whole vector is inbound and fall back out of intrinsic otherwise.
>> - The bound is not provably loop-invariant, and not obviously non-negative. This may arise in `ArrayList` accesses, where bound checks are performed against the `size` field, which may need to be reloaded on each iteration and not obviously nonnegative to the compiler.
>> 
>> IMO the direct result of the method is less important, because the contract does not have any promise with respect to the exact return value, and the only thing that can be done with it is to compare it with 0, which will certainly be folded into a `CmpU` node.
>> 
>> @shqking I think your benchmark is not good, as the error is higher than the actual difference between samples, and the cost of the division may swallow everything inside the loop.
>> 
>> Thanks a lot.
>
>> The motivation for these intrinsics, aside from unsigned comparison being a fairly basic operation, is for range checks of a load/store, which has the form of `0 <= offset && offset <= length - size`, by transforming this into `0 <= length - size && offset u<= length - size`, the first comparison as well as the computation of `length - size` can be hoisted out of the loop, which results in a single operation being loop varying. The reason this may not be recognised effectively by the idealiser is that if `size` is a constant, `(length - size) + MIN_VALUE` being folded into `length + (MIN_VALUE - size)`, breaks the pattern.
>> 
> @merykitty Thanks for your explanation. 
> But I'm afraid I didn't fully get your point.
> I think `offset u<= length - size` will be matched with `CmpU` node, rather than `CmpU3` node, right?
> 
>> @shqking I think your benchmark is not good, as the error is higher than the actual difference between samples, and the cost of the division may swallow everything inside the loop.
>>
> How about the case as suggested by aph, i.e. shown in the previous comment?
> 
> Thanks~

@shqking

> But I'm afraid I didn't fully get your point.
I think offset u<= length - size will be matched with CmpU node, rather than CmpU3 node, right?

`CmpU3` serves the same purpose as `CmpL3` or `CmpD3`, that is to be an intermediate node that gets elided immediately. For example, this piece of code:

    static int square(long x, long y) {
        return x > y ? 1 : 0;
    }

gets compiled into:

    static int square(long, long);
        0: lload_0
        1: lload_2
        2: lcmp
        3: ifle          10
        6: iconst_1
        7: goto          11
       10: iconst_0
       11: ireturn

`lcmp` is parsed into a `CmpL3`, so the comparison part is parsed as `CmpI (CmpL3 x y) 0`, which is then transformed into `CmpL x y`.

Similarly, `Integer.compareUnsigned(x, y) > 0` is parsed as `CmpI (CmpU3 x y) 0`, which is then transformed into `CmpU x y`.

> How about the case as suggested by aph, i.e. shown in the previous comment?

I think it is better but I still think measuring the performance of `CmpU3` node is quite irrelevent.

Thanks.

-------------

PR: https://git.openjdk.org/jdk/pull/11383


More information about the hotspot-compiler-dev mailing list