RFR: 8282541: AArch64: Auto-vectorize Math.round API
Andrew Haley
aph at openjdk.java.net
Tue Apr 12 14:23:23 UTC 2022
Before, Apple M1:
+---------------------------------------+------------+-------+----------+--------+
| Benchmark                             | (TESTSIZE) | Mode  |    Score | Units  |
+---------------------------------------+------------+-------+----------+--------+
| FpRoundingBenchmark.test_round_double |       1024 | thrpt | 1612.391 | ops/ms |
| FpRoundingBenchmark.test_round_double |       2048 | thrpt |  804.291 | ops/ms |
| FpRoundingBenchmark.test_round_float  |       1024 | thrpt | 1558.202 | ops/ms |
| FpRoundingBenchmark.test_round_float  |       2048 | thrpt |  775.730 | ops/ms |
+---------------------------------------+------------+-------+----------+--------+
After:
+---------------------------------------+------------+-------+----------+--------+
| Benchmark                             | (TESTSIZE) | Mode  |    Score | Units  |
+---------------------------------------+------------+-------+----------+--------+
| FpRoundingBenchmark.test_round_double |       1024 | thrpt | 2720.153 | ops/ms |
| FpRoundingBenchmark.test_round_double |       2048 | thrpt | 1371.750 | ops/ms |
| FpRoundingBenchmark.test_round_float  |       1024 | thrpt | 5940.263 | ops/ms |
| FpRoundingBenchmark.test_round_float  |       2048 | thrpt | 3036.201 | ops/ms |
+---------------------------------------+------------+-------+----------+--------+
About the algorithm:
`Math.round()` is tricky. Its specification corresponds to no standard
rounding mode: it "returns the closest long to the argument, with ties
rounding to positive infinity." For positive inputs this is the same
as IEEE-754's `convertToIntegerTiesToAway` operation, which rounds
away from zero, but there's no equivalent for negative inputs.
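To make the difference concrete, here is a small Java sketch (the class and method names are illustrative, not part of the patch). It compares `Math.round()` with IEEE-754 ties-away rounding, modelled here with `BigDecimal` and `RoundingMode.HALF_UP` (which rounds ties away from zero), and with `Math.rint()`, which rounds ties to even:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class RoundingModes {
    // convertToIntegerTiesToAway, modelled with BigDecimal: new BigDecimal(double)
    // is exact, and HALF_UP rounds ties away from zero.
    static long tiesAway(double x) {
        return new BigDecimal(x).setScale(0, RoundingMode.HALF_UP).longValueExact();
    }

    public static void main(String[] args) {
        for (double x : new double[] {2.5, 1.5, 0.5, -0.5, -1.5, -2.5}) {
            System.out.printf("%5.1f  Math.round=%2d  tiesAway=%2d  rint=%4.1f%n",
                    x, Math.round(x), tiesAway(x), Math.rint(x));
        }
    }
}
```

For the positive ties `Math.round()` and ties-away agree; for -0.5, -1.5 and -2.5, `Math.round()` gives 0, -1 and -2, while ties-away gives -1, -2 and -3.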
`Math.round()` used to simply add 0.5 and convert to an integer by
taking the floor of the result, but that is wrong because it suffers
from double rounding. This breaks several cases, in particular because
`0.4999999... (+) 0.5 == 1.0`
(Here, `(+)` indicates an addition followed by roundTiesToEven.)
There is no corresponding problem with `-0.4999999...` or `-0.9999999...`,
because adding 0.5 to either of them is exact.
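The double-rounding failure is easy to reproduce in plain Java. This sketch uses a hypothetical `NaiveRound` class that implements the old add-0.5-then-floor scheme and compares it with `Math.round(double)` on a current JDK:

```java
final class NaiveRound {
    // The old scheme: add 0.5, then take the floor. The sum is rounded
    // (roundTiesToEven) before Math.floor ever sees it.
    static long naiveRound(double x) {
        return (long) Math.floor(x + 0.5);
    }

    public static void main(String[] args) {
        double x = Math.nextDown(0.5);     // 0x1.fffffffffffffp-2, i.e. 0.49999999999999994
        System.out.println(x + 0.5);       // 1.0: the exact sum 1 - 2^-54 is a tie, rounded to even
        System.out.println(naiveRound(x)); // 1, although the closest integer to x is 0
        System.out.println(Math.round(x)); // 0

        // The negative counterparts are unaffected: their sums are exact.
        System.out.println(naiveRound(-x) + " " + Math.round(-x)); // 0 0
        double y = -Math.nextDown(1.0);    // -0.9999999999999999
        System.out.println(naiveRound(y) + " " + Math.round(y));   // -1 -1
    }
}
```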
Also, in the 32-bit `float` case,
`8388609 (+) 0.5 == 8388610`
because 8388609 (0x1.000002p+23) as a 32-bit float has no fraction
bits, so the exact sum 8388609.5 is a tie that roundTiesToEven resolves
upwards, to the even neighbour 8388610. This
problem manifests for every odd integer within the binade from
0x1.000002p+23 to 0x1.fffffep+23, whether positive or negative. There
is a corresponding problem for the `double` range, for the odd integers
from 0x1.0000000000001p+52 to 0x1.fffffffffffffp+52.
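The `float` case is small enough to check exhaustively, since the affected binade holds only 2^23 integers. This hypothetical `NaiveRoundFloat` sketch scans [2^23, 2^24) for both signs and counts the inputs where the naive scheme and `Math.round(float)` disagree. If the analysis above holds, every odd integer is counted (2^22 per sign). The `double` binade is far too large to scan this way.

```java
final class NaiveRoundFloat {
    // Naive scheme in float arithmetic: the float sum x + 0.5f is rounded
    // to nearest-even before Math.floor sees it.
    static int naiveRound(float x) {
        return (int) Math.floor(x + 0.5f);
    }

    // Counts integers n in [2^23, 2^24) for which the naive scheme
    // disagrees with Math.round on sign * n.
    static int countMismatches(int sign) {
        int count = 0;
        for (int n = 1 << 23; n < 1 << 24; n++) {
            float x = sign * (float) n; // exact: every integer below 2^24 is a float
            if (naiveRound(x) != Math.round(x)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        float x = 8388609f;                // 0x1.000002p+23
        System.out.println(Math.ulp(x));   // 1.0: no fraction bits at this magnitude
        System.out.println(x + 0.5f);      // 8388610.0: the tie goes to the even neighbour
        System.out.println(naiveRound(x) + " vs " + Math.round(x));        // 8388610 vs 8388609
        System.out.println(countMismatches(1) + " " + countMismatches(-1)); // 4194304 4194304
    }
}
```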
The patch for JDK-8279508 handles this by flipping the floating-point
rounding mode to roundTowardNegative, adding 0.5, and taking the
floor. While this does work on AArch64, the performance is
tragic. AArch64 implementations seem to wait for all instructions in
flight to retire, change the rounding mode, and do the operation. This
effectively serializes the entire thread.
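Java mandates round-to-nearest and offers no way to switch the rounding mode, so the JDK-8279508 approach cannot be written directly in Java. Rounding the sum toward negative infinity can move it down to, but never past, the integer floor(x + 1/2), which is itself representable. Flooring that rounded sum therefore gives the same answer as flooring the exact sum, and that is precisely what `Math.round()` specifies. The sketch below is a hypothetical test oracle, not part of the patch. It computes the exact sum with `BigDecimal` and assumes finite inputs whose result fits in a `long`:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class ExactRoundOracle {
    // floor(x + 1/2) with the addition done exactly; new BigDecimal(double) is exact.
    // Assumes x is finite and the result fits in a long (longValueExact throws otherwise).
    static long exactRound(double x) {
        BigDecimal exactSum = new BigDecimal(x).add(BigDecimal.valueOf(0.5));
        return exactSum.setScale(0, RoundingMode.FLOOR).longValueExact();
    }

    public static void main(String[] args) {
        double[] samples = {
            Math.nextDown(0.5),      // the double-rounding case
            0x1.0000000000001p52,    // 2^52 + 1, an odd integer in the problem binade
            -0x1.0000000000001p52,
            2.5, -2.5, -0.5
        };
        for (double x : samples) {
            System.out.printf("%a: exact floor(x + 1/2) = %d, Math.round = %d%n",
                    x, exactRound(x), Math.round(x));
        }
    }
}
```

An oracle like this is handy for differential testing of any faster implementation against the specification.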
This patch takes a different approach. Firstly, we can observe that we
can use the `frinta` instruction, which rounds to nearest with ties
away from zero, for the entire positive range. The negative range is a
bit trickier, but we can observe that any x with abs(x) >= 0x1.000000p+23
(0x1.0p+52 for `double`) has no fractional bits, so it must be an
integer. For convenience, we can convert that range with the `frinta`
instruction as well.
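Java has no direct equivalent of `frinta`, so this sketch emulates it for `float` with a hypothetical `roundTiesAway` helper. The helper ignores the sign of a zero result, which does not matter once the value is converted to an integer. The sketch illustrates the two observations above: ties-away agrees with `Math.round(float)` for non-negative inputs and for any input with abs(x) >= 2^23, but not for negative ties such as -2.5f.

```java
final class TiesAwayFloat {
    // Hypothetical emulation of frinta for a float: round to the nearest
    // integral value, ties away from zero. The sign of a zero result is not preserved.
    static float roundTiesAway(float x) {
        // Floats with abs(x) >= 2^23 are already integral; NaN and the
        // infinities fail the comparison and are also returned unchanged.
        if (!(Math.abs(x) < 0x1.0p23f)) {
            return x;
        }
        float t = (float) (int) x;       // truncate toward zero (exact here)
        float frac = x - t;              // the fractional part, computed exactly
        if (Math.abs(frac) >= 0.5f) {
            t += Math.copySign(1.0f, x); // step away from zero (exact: abs(t) + 1 <= 2^23)
        }
        return t;
    }

    public static void main(String[] args) {
        float[] samples = {0.49999997f, 2.5f, -2.5f, 8388609f, -8388609f};
        for (float x : samples) {
            System.out.println(x + ": tiesAway=" + (int) roundTiesAway(x)
                    + ", Math.round=" + Math.round(x));
        }
    }
}
```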
All that remains is x < 0, abs(x) < 0x1.000000p+23. Here, adding 0.5
followed by roundTiesToEven doesn't lead to a problem: for
x < 0 && abs(x) >= 0.5, adding 0.5 only reduces the magnitude of x, so
the sum is exact and its floor is correct; for all x < 0 && abs(x) < 0.5,
adding 0.5 followed by roundTiesToEven yields a value in (0, 0.5], whose
floor is 0.
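Putting the pieces together, a scalar Java model of the `float` case might look like the sketch below. It is only a model, not the patch's vectorized AArch64 code. The hypothetical `roundTiesAway` helper stands in for `frinta`, and Java's saturating `(int)` cast stands in for the final conversion to an integer.

```java
final class RoundFloatSketch {
    // Hypothetical stand-in for frinta: nearest integral value, ties away from zero.
    private static float roundTiesAway(float x) {
        if (!(Math.abs(x) < 0x1.0p23f)) {
            return x;                  // already integral, infinite or NaN
        }
        float t = (float) (int) x;     // truncate toward zero (exact)
        if (Math.abs(x - t) >= 0.5f) { // x - t is the exact fractional part
            t += Math.copySign(1.0f, x);
        }
        return t;
    }

    // Scalar model of the algorithm described above for Math.round(float).
    static int round(float x) {
        if (x < 0.0f && Math.abs(x) < 0x1.0p23f) {
            // Negative with a possible fraction. For x <= -0.5f the sum is exact,
            // so its floor is the answer; for -0.5f < x < 0 the rounded sum lies
            // in (0, 0.5], whose floor is 0, which is also correct.
            return (int) Math.floor(x + 0.5f);
        }
        // Non-negative, already integral, infinite or NaN: ties away from zero,
        // then Java's saturating conversion (NaN becomes 0).
        return (int) roundTiesAway(x);
    }

    public static void main(String[] args) {
        float[] samples = {
            Math.nextDown(0.5f), -Math.nextDown(0.5f), 0.5f, -0.5f, 2.5f, -2.5f,
            8388609f, -8388609f, -8388607.5f, Float.NaN, Float.NEGATIVE_INFINITY
        };
        for (float x : samples) {
            System.out.println(x + ": sketch=" + round(x) + ", Math.round=" + Math.round(x));
        }
    }
}
```

Because NaN fails the `x < 0.0f` test, it takes the ties-away path, and the `(int)` cast maps it to 0, matching `Math.round(Float.NaN)`.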
-------------
Commit messages:
- Delete dead code
- Untabify
- Fix assertion
- 8282541: AArch64: Auto-vectorize Math.round API
- Cleanup
- Cleanup
- Rebase
- Rebase
Changes: https://git.openjdk.java.net/jdk/pull/8204/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=8204&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8282541
Stats: 1054 lines in 13 files changed: 489 ins; 9 del; 556 mod
Patch: https://git.openjdk.java.net/jdk/pull/8204.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/8204/head:pull/8204
PR: https://git.openjdk.java.net/jdk/pull/8204
More information about the hotspot-dev
mailing list