RFR: 8282541: AArch64: Auto-vectorize Math.round API
Nick Gasson
ngasson at openjdk.java.net
Wed Apr 13 10:14:15 UTC 2022
On Tue, 12 Apr 2022 13:26:02 GMT, Andrew Haley <aph at openjdk.org> wrote:
> Before, Apple M1:
>
> +---------------------------------------+------------+-------+----------+--------+
> | Benchmark                             | (TESTSIZE) | Mode  | Score    | Units  |
> +---------------------------------------+------------+-------+----------+--------+
> | FpRoundingBenchmark.test_round_double | 1024       | thrpt | 1612.391 | ops/ms |
> | FpRoundingBenchmark.test_round_double | 2048       | thrpt |  804.291 | ops/ms |
> | FpRoundingBenchmark.test_round_float  | 1024       | thrpt | 1558.202 | ops/ms |
> | FpRoundingBenchmark.test_round_float  | 2048       | thrpt |  775.730 | ops/ms |
> +---------------------------------------+------------+-------+----------+--------+
>
> After:
>
> +---------------------------------------+------------+-------+----------+--------+
> | Benchmark                             | (TESTSIZE) | Mode  | Score    | Units  |
> +---------------------------------------+------------+-------+----------+--------+
> | FpRoundingBenchmark.test_round_double | 1024       | thrpt | 2720.153 | ops/ms |
> | FpRoundingBenchmark.test_round_double | 2048       | thrpt | 1371.750 | ops/ms |
> | FpRoundingBenchmark.test_round_float  | 1024       | thrpt | 5940.263 | ops/ms |
> | FpRoundingBenchmark.test_round_float  | 2048       | thrpt | 3036.201 | ops/ms |
> +---------------------------------------+------------+-------+----------+--------+
>
> About the algorithm:
>
> `Math.round()` is tricky. Its specification corresponds to no standard
> rounding mode: it "returns the closest long to the argument, with ties
> rounding to positive infinity." For positive inputs this is the same
> as IEEE-754's `convertToIntegerTiesToAway` operation, which rounds
> away from zero, but there's no equivalent for negative inputs.
>
> `Math.round()` used simply to add 0.5 and convert to integer by taking
> the floor of the result, but that wasn't right because it suffers from
> double rounding. This breaks several cases, in particular because
>
> `0.4999999... (+) 0.5 == 1.0`
>
> (Here, `(+)` indicates an addition followed by roundTiesToEven.)
>
> There is no corresponding problem with `-0.4999999...` or `-0.9999999...`
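The double-rounding case above can be reproduced outside the JVM. The following is a minimal C++ sketch, assuming IEEE-754 binary64 `double` arithmetic in the default round-to-nearest-even mode; `naive_round` and `largest_below_half` are hypothetical names used only for illustration and are not part of the JDK.

```cpp
#include <cmath>

// Old-style rounding: add 0.5 in the default rounding mode, then take the floor.
// (Hypothetical helper, for illustration only.)
double naive_round(double x) {
    return std::floor(x + 0.5);
}

// The largest double strictly below 0.5, i.e. the "0.4999999..." in the text.
double largest_below_half() {
    return std::nextafter(0.5, 0.0);
}
```

With these definitions, `largest_below_half() + 0.5` compares equal to `1.0`, so `naive_round` returns 1 where `Math.round()` is specified to return 0. The negative counterparts round correctly because the sum is exactly representable.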
>
> Also, in the 32-bit `float` case,
>
> `8388609 (+) 0.5 == 8388610`
>
> because 8388609 (0x1.000002p+23) as a 32-bit float has no fraction
> bits, so adding 0.5, followed by roundTiesToEven, rounds upwards. This
> problem manifests for every odd integer within the binade from
> 0x1.000002p+23 to 0x1.fffffep+23, whether positive or negative. There
> is a corresponding problem for the `double` range.
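The `float` case can be checked the same way. This is a small C++ sketch, assuming IEEE-754 binary32 `float` arithmetic with round-to-nearest-even and no excess intermediate precision; `naive_round_float` is a hypothetical name.

```cpp
#include <cmath>

// Add 0.5f and take the floor, all in single precision.
// (Hypothetical helper, for illustration only.)
float naive_round_float(float x) {
    float sum = x + 0.5f;  // rounded to float here, ties to even
    return std::floor(sum);
}
```

For the odd integer 8388609 the sum 8388609.5 is not representable and the tie goes to the even neighbour 8388610, so the result is off by one. The even integer 8388610 survives, and -8388609 fails in the same way, coming out as -8388608.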
>
> The patch for JDK-8279508 handles this by flipping the floating-point
> rounding mode to roundTowardNegative, adding 0.5, and taking the
> floor. While this does work on AArch64, the performance is
> tragic. AArch64 implementations seem to wait for all instructions in
> flight to retire, change the rounding mode, and do the operation. This
> effectively serializes the entire thread.
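For readers unfamiliar with the rounding-mode approach, here is a portable C++ model of the idea. It is a sketch only: it uses the standard `std::fesetround` interface rather than whatever instructions the JDK-8279508 patch actually emits, and `round_via_downward_mode` is a hypothetical name. The `volatile` qualifiers are an assumption made to keep the compiler from folding or moving the addition across the mode change.

```cpp
#include <cfenv>
#include <cmath>

// Model of "switch to roundTowardNegative, add 0.5, take the floor".
// (Hypothetical helper; not the HotSpot implementation.)
double round_via_downward_mode(double x) {
    volatile double input = x;          // force a runtime load
    const int saved = std::fegetround();
    std::fesetround(FE_DOWNWARD);       // roundTowardNegative
    volatile double sum = input + 0.5;  // inexact sums now round down
    std::fesetround(saved);             // restore the caller's mode
    return std::floor(sum);
}
```

Because an inexact sum is rounded down instead of to the nearest even value, the largest double below 0.5 no longer reaches 1.0. The two mode switches around a single addition are the cost the message describes.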
>
> This patch takes a different approach. Firstly, we can observe that we
> can use the `frinta` instruction for the entire positive range. The
> negative range is a bit trickier, but we can observe that any x,
> abs(x) >= 0x1.000000p+23, has no fractional bits so it must be an
> integer. For convenience, we can convert that range with the `frinta`
> instruction as well.
>
> All that remains are x < 0, abs(x) < 0x1.000000p+23. Adding 0.5
> followed by roundTiesToEven doesn't lead to a problem because for
> x < 0 && abs(x) >= 0.5, adding 0.5 only reduces the magnitude of x;
> for all x < 0 && abs(x) < 0.5, adding 0.5 followed by roundTiesToEven
> returns 0.
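The case split described above can be modelled in scalar C++. This is a sketch under stated assumptions, not the AArch64 code in the patch: `std::round` (ties away from zero) stands in for `frinta`, the threshold is 2^52 because the model works on `double` (the text uses 2^23 for `float`), and `java_round_model` is a hypothetical name. It returns a `double` and leaves out the final conversion to `long`, so NaN and out-of-range handling are not modelled.

```cpp
#include <cmath>

// Scalar model of the two-path rounding scheme.
// (Hypothetical helper; not the HotSpot implementation.)
double java_round_model(double x) {
    // At or above 2^52 in magnitude, a double has no fraction bits.
    const double no_fraction = std::ldexp(1.0, 52);
    if (x >= 0.0 || std::fabs(x) >= no_fraction) {
        // Ties away from zero; for non-negative x this equals ties toward
        // positive infinity, and large values are already integers.
        return std::round(x);
    }
    // Negative x with fraction bits: adding 0.5 only shrinks the magnitude.
    return std::floor(x + 0.5);
}
```

For example, this model maps 2.5 to 3, -2.5 to -2, -0.5 to 0, and the largest double below 0.5 to 0, matching the ties-toward-positive-infinity rule quoted from the specification.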
src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5198:
> 5196: fcvtasd(dst, src);
> 5197: // Test if src >= 0 || abs(src) >= 0x1.0p52
> 5198: eor(rscratch1, rscratch1, 1ul << 63); // flip sign bit
This doesn't compile on Windows AArch64:
d:\a\jdk\jdk\jdk\src\hotspot\cpu\aarch64\macroAssembler_aarch64.cpp(5198): error C2220: the following warning is treated as an error
d:\a\jdk\jdk\jdk\src\hotspot\cpu\aarch64\macroAssembler_aarch64.cpp(5198): warning C4293: '<<': shift count negative or too big, undefined behavior
Windows is LLP64, isn't it? So `unsigned long` is only 32 bits there, and you probably want `1ull` or `UCONST64(1)` here.
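As a standalone illustration of the portable form (a sketch, not the HotSpot code; `kSignBit` and `flip_sign_bit` are hypothetical names), building the mask from a fixed-width 64-bit type keeps the shift well defined whatever the width of `long`:

```cpp
#include <cstdint>
#include <cstring>

// 64-bit mask with only the top bit set. The operand is 64 bits wide on
// every data model, so shifting by 63 is well defined.
constexpr std::uint64_t kSignBit = std::uint64_t{1} << 63;

// Return the bit pattern of a double with its sign bit flipped.
// (Hypothetical helper, for illustration only.)
std::uint64_t flip_sign_bit(double value) {
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // type-pun without UB
    return bits ^ kSignBit;
}
```

`1ull << 63` produces the same mask; `1ul << 63` does so only where `unsigned long` is 64 bits wide.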
-------------
PR: https://git.openjdk.java.net/jdk/pull/8204