RFR: 8287925: AArch64: intrinsics for compareUnsigned method in Integer and Long

Quan Anh Mai qamai at openjdk.org
Thu Dec 1 17:33:38 UTC 2022


On Mon, 28 Nov 2022 02:31:25 GMT, Hao Sun <haosun at openjdk.org> wrote:

> x86 implemented the intrinsics for compareUnsigned() method in Integer and Long. See JDK-8283726. We add the corresponding AArch64 backend support in this patch.
> 
> Note-1: minor style issues are fixed for CmpL3 related rules.
> 
> Note-2: Jtreg case TestCompareUnsigned.java is updated to cover the matching rules for "comparing reg with imm" case.
> 
> Testing: tier1~3 passed on Linux/AArch64 platform with no new failures.
> 
> Following is the performance data for the JMH case:
> 
> 
>                                                        Before          After
> Benchmark                         (size) Mode  Cnt   Score   Error  Score   Error  Units
> Integers.compareUnsignedDirect      500  avgt    5   0.994 ± 0.001  0.872 ± 0.015  us/op
> Integers.compareUnsignedIndirect    500  avgt    5   0.991 ± 0.001  0.833 ± 0.055  us/op
> Longs.compareUnsignedDirect         500  avgt    5   1.052 ± 0.001  0.974 ± 0.057  us/op
> Longs.compareUnsignedIndirect       500  avgt    5   1.053 ± 0.001  0.916 ± 0.038  us/op

Aside from unsigned comparison being a fairly basic operation, the motivation for these intrinsics is the range check of a load/store, which has the form `0 <= offset && offset <= length - size`. Transforming this into `0 <= length - size && offset u<= length - size` allows the first comparison, as well as the computation of `length - size`, to be hoisted out of the loop, leaving a single loop-varying operation. The idealiser may not recognise this pattern effectively when `size` is a constant: `(length - size) + MIN_VALUE` is folded into `length + (MIN_VALUE - size)`, which breaks the pattern.
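
To make the transformation concrete, here is a minimal sketch of the two forms of the check. The class and method names (`RangeChecks`, `signedCheck`, `unsignedCheck`, `manualUnsignedLE`) are hypothetical and only for illustration; the last method shows the `+ MIN_VALUE` idiom that the folding mentioned above can disturb.

```java
final class RangeChecks {
    // Original form: both comparisons depend on the loop-varying offset.
    static boolean signedCheck(int offset, int length, int size) {
        return 0 <= offset && offset <= length - size;
    }

    // Rewritten form: `limit` and `0 <= limit` do not depend on offset,
    // so only the unsigned comparison remains loop-varying.
    static boolean unsignedCheck(int offset, int length, int size) {
        int limit = length - size;
        return 0 <= limit && Integer.compareUnsigned(offset, limit) <= 0;
    }

    // Hand-written unsigned "a u<= b": flipping the sign bit of both
    // operands turns an unsigned comparison into a signed one.
    static boolean manualUnsignedLE(int a, int b) {
        return (a + Integer.MIN_VALUE) <= (b + Integer.MIN_VALUE);
    }
}
```

A negative `offset` is a very large value when viewed as unsigned, so it fails `offset u<= limit` whenever `limit` is non-negative; that is why the separate `0 <= offset` test can be dropped.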

If we simply want to throw an exception in out-of-bounds cases, then `Preconditions::checkIndex` may suffice. This, however, may not be adequate if:

- We want to do something other than throw. For example, if the hardware does not support masked loads, we currently do a load followed by a blend when the whole vector is in bounds, and fall back out of the intrinsic otherwise.
- The bound is not provably loop-invariant and not obviously non-negative. This may arise in `ArrayList` accesses, where bounds checks are performed against the `size` field, which may need to be reloaded on each iteration and is not obviously non-negative to the compiler.
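
As a rough illustration of both bullets, the sketch below checks an index against a mutable `size` field and returns a fallback value instead of throwing. `IntList` and `getOrDefault` are hypothetical names, not JDK APIs, and the single unsigned comparison is only valid because this class keeps `size` non-negative as an invariant, which is an assumption the compiler may not be able to see.

```java
import java.util.Arrays;

final class IntList {
    private int[] elements = new int[8];
    private int size; // invariant: 0 <= size <= elements.length

    void add(int value) {
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, size * 2);
        }
        elements[size++] = value;
    }

    // One unsigned comparison covers both index < 0 and index >= size,
    // because a negative index is larger than any non-negative size
    // when both are viewed as unsigned.
    int getOrDefault(int index, int fallback) {
        if (Integer.compareUnsigned(index, size) < 0) {
            return elements[index];
        }
        return fallback; // "do something else" instead of throwing
    }
}
```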

IMO the direct result of the method is less important: the contract promises nothing about the exact return value beyond its sign, so the only thing that can be done with it is to compare it with 0, which will certainly be folded into a `CmpU` node.
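
For illustration, typical callers only look at the sign of the result, as in the sketch below. `UnsignedOrder` and its methods are hypothetical helpers, not part of the JDK.

```java
final class UnsignedOrder {
    // Only the sign of compareUnsigned's result is used.
    static boolean unsignedLess(long a, long b) {
        return Long.compareUnsigned(a, b) < 0;
    }

    // Picks the larger operand under unsigned ordering.
    static long unsignedMax(long a, long b) {
        return Long.compareUnsigned(a, b) >= 0 ? a : b;
    }
}
```

Here `-1L` is the largest unsigned 64-bit value, so `unsignedLess(1L, -1L)` is true even though `1L < -1L` is false under signed ordering.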

@shqking I think your benchmark is not good: the error is higher than the actual difference between samples, and the cost of the division may swallow everything else inside the loop.

Thanks a lot.

-------------

PR: https://git.openjdk.org/jdk/pull/11383

