RFR: 8287925: AArch64: intrinsics for compareUnsigned method in Integer and Long
Quan Anh Mai
qamai at openjdk.org
Thu Dec 1 17:33:38 UTC 2022
On Mon, 28 Nov 2022 02:31:25 GMT, Hao Sun <haosun at openjdk.org> wrote:
> x86 implemented the intrinsics for the compareUnsigned() methods in Integer and Long. See JDK-8283726. We add the corresponding AArch64 backend support in this patch.
>
> Note-1: minor style issues are fixed for CmpL3 related rules.
>
> Note-2: Jtreg case TestCompareUnsigned.java is updated to cover the matching rules for "comparing reg with imm" case.
>
> Testing: tier1~3 passed on Linux/AArch64 platform with no new failures.
>
> Following is the performance data for the JMH case:
>
>
>                                                     Before          After
> Benchmark                        (size)  Mode  Cnt  Score   Error   Score   Error   Units
> Integers.compareUnsignedDirect      500  avgt    5  0.994 ± 0.001   0.872 ± 0.015   us/op
> Integers.compareUnsignedIndirect    500  avgt    5  0.991 ± 0.001   0.833 ± 0.055   us/op
> Longs.compareUnsignedDirect         500  avgt    5  1.052 ± 0.001   0.974 ± 0.057   us/op
> Longs.compareUnsignedIndirect       500  avgt    5  1.053 ± 0.001   0.916 ± 0.038   us/op
The motivation for these intrinsics, aside from unsigned comparison being a fairly basic operation, is range checks for loads/stores, which have the form `0 <= offset && offset <= length - size`. By transforming this into `0 <= length - size && offset u<= length - size`, the first comparison, as well as the computation of `length - size`, can be hoisted out of the loop, leaving a single loop-varying comparison. The reason the idealiser may not recognise this effectively is that the pure-Java `compareUnsigned` adds `MIN_VALUE` to both operands, and if `size` is a constant, `(length - size) + MIN_VALUE` gets folded into `length + (MIN_VALUE - size)`, which breaks the pattern.
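For illustration, a minimal Java sketch of the transformation above, assuming a constant access size (the class `RangeCheckSketch` and the constant `CHUNK` are hypothetical, not part of the JDK). It checks that the signed and unsigned forms agree, shows the loop shape in which only the unsigned comparison depends on the loop variable, and shows that the two `MIN_VALUE` expressions are the same value under wrapping arithmetic, so the fold is legal even though it hides the pattern.

```java
// Hypothetical sketch: RangeCheckSketch and CHUNK are illustrative names.
final class RangeCheckSketch {
    static final int CHUNK = 4; // assumed constant access size ("size")

    // Original form: 0 <= offset && offset <= length - size
    static boolean inBoundsSigned(int offset, int length, int size) {
        return 0 <= offset && offset <= length - size;
    }

    // Transformed form: 0 <= length - size && offset u<= length - size
    static boolean inBoundsUnsigned(int offset, int length, int size) {
        int limit = length - size;
        return 0 <= limit && Integer.compareUnsigned(offset, limit) <= 0;
    }

    // Counts offsets whose CHUNK-sized access fits in [0, length).
    // limit and the test 0 <= limit do not depend on the loop variable,
    // so only the unsigned comparison remains inside the loop.
    static int countInBounds(int[] offsets, int length) {
        int limit = length - CHUNK;
        if (limit < 0) {
            return 0; // no offset can be in bounds
        }
        int count = 0;
        for (int offset : offsets) {
            if (Integer.compareUnsigned(offset, limit) <= 0) {
                count++;
            }
        }
        return count;
    }

    // (length - size) + MIN_VALUE and length + (MIN_VALUE - size) are the
    // same int under wrapping arithmetic, so folding one into the other is
    // legal, but the folded shape may no longer match the unsigned pattern.
    static boolean foldIsEquivalent(int length, int size) {
        return (length - size) + Integer.MIN_VALUE
                == length + (Integer.MIN_VALUE - size);
    }

    public static void main(String[] args) {
        for (int length = -5; length <= 15; length++) {
            for (int size = 0; size <= 5; size++) {
                for (int offset = -10; offset <= 20; offset++) {
                    if (inBoundsSigned(offset, length, size)
                            != inBoundsUnsigned(offset, length, size)) {
                        throw new AssertionError(offset + ", " + length + ", " + size);
                    }
                }
                if (!foldIsEquivalent(length, size)) {
                    throw new AssertionError("fold " + length + ", " + size);
                }
            }
        }
        int[] offsets = {-1, 0, 3, 6, 7, Integer.MIN_VALUE};
        System.out.println(countInBounds(offsets, 10)); // prints 3 (limit is 6)
    }
}
```

Negative offsets need no separate check in the loop: as unsigned values they are larger than any non-negative `limit`, so the single unsigned comparison rejects them.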
If we simply want to throw an exception in out-of-bounds cases, then `Preconditions::checkIndex` may suffice. However, this may not be adequate if:
- We want to do something other than throw. For example, if the hardware does not support masked loads, we currently perform a full load followed by a blend when the whole vector is in bounds, and fall back out of the intrinsic otherwise.
- The bound is neither provably loop-invariant nor obviously non-negative. This can arise in `ArrayList` accesses, where bounds checks are performed against the `size` field, which may need to be reloaded on each iteration and is not obviously non-negative to the compiler.
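A rough sketch of the two situations above, assuming a hypothetical `ArrayList`-like container `ChunkedSum` whose logical size lives in a field. The throwing path uses the public `java.util.Objects.checkFromIndexSize`; the non-throwing path uses an unsigned compare to choose between a whole-chunk fast path (standing in for a full vector load) and a scalar fallback. This only illustrates the shape of the checks; it is not how the Vector API or `ArrayList` is actually implemented.

```java
import java.util.Objects;

// Hypothetical sketch: ChunkedSum, CHUNK and the size field are illustrative.
final class ChunkedSum {
    static final int CHUNK = 4; // assumed "vector" width

    private final int[] data;
    private final int size; // logical size, read from a field as in ArrayList

    ChunkedSum(int[] data, int size) {
        this.data = data;
        this.size = size;
    }

    // Throwing variant: checkFromIndexSize throws IndexOutOfBoundsException
    // unless [offset, offset + CHUNK) lies within [0, size).
    int chunkSumOrThrow(int offset) {
        Objects.checkFromIndexSize(offset, CHUNK, size);
        int s = 0;
        for (int i = 0; i < CHUNK; i++) {
            s += data[offset + i];
        }
        return s;
    }

    // Non-throwing variant: take the whole-chunk path when the chunk is in
    // bounds, otherwise fall back to one element at a time.
    int sum() {
        int s = 0;
        int offset = 0;
        while (offset < size) {
            int limit = size - CHUNK; // bound read from the field in the source
            if (0 <= limit && Integer.compareUnsigned(offset, limit) <= 0) {
                for (int i = 0; i < CHUNK; i++) { // stands in for a full load
                    s += data[offset + i];
                }
                offset += CHUNK;
            } else {
                s += data[offset]; // scalar fallback for the tail
                offset++;
            }
        }
        return s;
    }

    public static void main(String[] args) {
        ChunkedSum cs = new ChunkedSum(new int[]{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, 10);
        System.out.println(cs.sum());              // 55
        System.out.println(cs.chunkSumOrThrow(6)); // 34
        try {
            cs.chunkSumOrThrow(7); // 7 + 4 > 10
        } catch (IndexOutOfBoundsException e) {
            System.out.println("out of bounds");
        }
    }
}
```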
IMO the direct result of the method is less important: the contract makes no promise about the exact return value, so the only thing that can be done with it is to compare it with 0, which will certainly be folded into a `CmpU` node.
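To illustrate that only the sign of the result matters, a small sketch (the class `UnsignedLessSketch` and its methods are hypothetical) expressing unsigned less-than three ways: `Integer.compareUnsigned(x, y) < 0`, the sign-bit flip used by the pure-Java implementation (`x + MIN_VALUE`), and zero-extension via `Integer.toUnsignedLong`. All three agree on every pair of sample values.

```java
// Hypothetical sketch: callers of compareUnsigned only inspect the sign.
final class UnsignedLessSketch {
    // Typical use: the result is only compared with 0.
    static boolean lessThan(int x, int y) {
        return Integer.compareUnsigned(x, y) < 0;
    }

    // Same predicate by flipping the sign bit of both operands, as the
    // pure-Java Integer.compareUnsigned does with x + MIN_VALUE.
    static boolean lessThanBySignFlip(int x, int y) {
        return x + Integer.MIN_VALUE < y + Integer.MIN_VALUE;
    }

    // Same predicate by zero-extending both operands to long.
    static boolean lessThanByWidening(int x, int y) {
        return Integer.toUnsignedLong(x) < Integer.toUnsignedLong(y);
    }

    // The Long variant, again only inspecting the sign.
    static boolean lessThanLong(long x, long y) {
        return Long.compareUnsigned(x, y) < 0;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 42, -42, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            for (int y : samples) {
                boolean expected = lessThan(x, y);
                if (lessThanBySignFlip(x, y) != expected
                        || lessThanByWidening(x, y) != expected) {
                    throw new AssertionError(x + " vs " + y);
                }
            }
        }
        // -1 is 0xFFFFFFFF, the largest unsigned int, so it is not below 1.
        System.out.println(lessThan(-1, 1));       // false
        System.out.println(lessThanLong(1L, -1L)); // true
    }
}
```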
@shqking I think your benchmark is not ideal: the error is higher than the actual difference between samples, and the cost of the division may swallow everything else inside the loop.
Thanks a lot.
-------------
PR: https://git.openjdk.org/jdk/pull/11383
More information about the hotspot-compiler-dev mailing list