RFR: 8282541: AArch64: Auto-vectorize Math.round API

Tue Apr 12 14:23:23 UTC 2022

Before, Apple M1:

+-----------------------------------------+---------------------------------+
|Benchmark                                | (TESTSIZE) Mode     Score  Units|
+-----------------------------------------+---------------------------------+
|FpRoundingBenchmark.test_round_double    |   1024  thrpt    1612.391 ops/ms|
|FpRoundingBenchmark.test_round_double    |   2048  thrpt     804.291 ops/ms|
|FpRoundingBenchmark.test_round_float     |   1024  thrpt    1558.202 ops/ms|
|FpRoundingBenchmark.test_round_float     |   2048  thrpt     775.730 ops/ms|
+-----------------------------------------+---------------------------------+

After:

+-----------------------------------------+----------------------------------+
|Benchmark                                | (TESTSIZE) Mode      Score  Units|
+-----------------------------------------+----------------------------------+
|FpRoundingBenchmark.test_round_double    |    1024  thrpt   2720.153  ops/ms|
|FpRoundingBenchmark.test_round_double    |    2048  thrpt   1371.750  ops/ms|
|FpRoundingBenchmark.test_round_float     |    1024  thrpt   5940.263  ops/ms|
|FpRoundingBenchmark.test_round_float     |    2048  thrpt   3036.201  ops/ms|
+-----------------------------------------+----------------------------------+

About the algorithm:

`Math.round()` is tricky. Its specification corresponds to no standard
rounding mode: it "returns the closest long to the argument, with ties
rounding to positive infinity." For positive inputs this is the same
as IEEE-754's `convertToIntegerTiesToAway` operation, which rounds
away from zero, but there's no equivalent for negative inputs.
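
As an illustration of the difference for negative ties, here is a small
sketch. `tiesAway` is a hypothetical helper written for this example
(it is not a JDK method); it emulates round-half-away-from-zero so the
two behaviours can be compared side by side.

```java
final class RoundTiesDemo {
    // Hypothetical helper: round to nearest, ties away from zero.
    static long tiesAway(double x) {
        double a = Math.abs(x);
        double t = Math.floor(a);
        // a - t is exact, so the comparison with 0.5 is reliable.
        double r = (a - t >= 0.5) ? t + 1.0 : t;
        return (long) Math.copySign(r, x);
    }

    public static void main(String[] args) {
        // Math.round: ties go toward positive infinity, whatever the sign.
        System.out.println(Math.round(2.5));   // 3
        System.out.println(Math.round(-2.5));  // -2
        // Ties away from zero agrees for 2.5 but not for -2.5.
        System.out.println(tiesAway(2.5));     // 3
        System.out.println(tiesAway(-2.5));    // -3
    }
}
```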

`Math.round()` used simply to add 0.5 and convert to integer by taking
the floor of the result, but that wasn't right because it suffers from
double rounding. This breaks several cases, in particular because

 `0.4999999... (+) 0.5 == 1.0`

 (Here, `(+)` indicates an addition followed by roundTiesToEven.)

There is no corresponding problem with `-0.4999999...` or `-0.9999999...`
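
The double-rounding case can be reproduced in plain Java. This is only
a sketch: `naiveRound` is a name made up here for the old
add-0.5-and-floor approach, and `Math.nextDown(0.5)` is used to get
the largest double below 0.5.

```java
final class DoubleRoundingDemo {
    // Hypothetical name for the old approach: add 0.5, then take the floor.
    static long naiveRound(double x) {
        return (long) Math.floor(x + 0.5);
    }

    public static void main(String[] args) {
        double x = Math.nextDown(0.5);           // 0.49999999999999994
        System.out.println(x + 0.5 == 1.0);      // true: the sum rounds up to 1.0
        System.out.println(naiveRound(x));       // 1 (wrong)
        System.out.println(Math.round(x));       // 0 (correct)

        // The negative counterparts are unaffected: the sums are exact.
        System.out.println(naiveRound(-x));                 // 0
        System.out.println(naiveRound(Math.nextUp(-1.0)));  // -1
    }
}
```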

Also, in the 32-bit `float` case,

  `8388609 (+) 0.5 == 8388610`

because 8388609 (0x1.000002p+23) as a 32-bit float has no fraction
bits, so adding 0.5, followed by roundTiesToEven, rounds upwards. This
problem manifests for every odd integer within the binade from
0x1.000002p+23 to 0x1.fffffep+23, whether positive or negative. There
is a corresponding problem for the `double` range.
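
A short sketch of the odd-integer case, for both signs and for the
`double` analogue. The `naive` helpers are hypothetical names for the
add-0.5-and-floor approach, and 0x1.0p52 + 1 is picked here as one
example of an affected `double`.

```java
final class OddIntegerDemo {
    // Hypothetical helpers: add 0.5 (rounded to nearest even), then floor.
    static int naive(float x) {
        return (int) Math.floor(x + 0.5f);
    }

    static long naive(double x) {
        return (long) Math.floor(x + 0.5);
    }

    public static void main(String[] args) {
        float f = 8388609f;                      // 0x1.000002p+23, odd
        System.out.println(f + 0.5f == 8388610f); // true: tie rounds to even
        System.out.println(naive(f));            // 8388610 (wrong)
        System.out.println(Math.round(f));       // 8388609 (correct)

        System.out.println(naive(-f));           // -8388608 (wrong)
        System.out.println(Math.round(-f));      // -8388609 (correct)

        double d = 0x1.0p52 + 1.0;               // 4503599627370497, odd
        System.out.println(naive(d));            // 4503599627370498 (wrong)
        System.out.println(Math.round(d));       // 4503599627370497 (correct)
    }
}
```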

The patch for JDK-8279508 handles this by flipping the floating-point
rounding mode to roundTowardNegative, adding 0.5, and taking the
floor. While this does work on AArch64, the performance is
tragic. AArch64 implementations seem to wait for all instructions in
flight to retire, change the rounding mode, and do the operation. This
effectively serializes the entire thread.

This patch takes a different approach. Firstly, we can observe that we
can use the `frinta` instruction for the entire positive range. The
negative range is a bit trickier, but we can observe that any x,
abs(x) >= 0x1.000000p+23, has no fractional bits so it must be an
integer. For convenience, we can convert that range with the `frinta`
instruction as well.

All that remains are x < 0, abs(x) < 0x1.000000p+23. Adding 0.5
followed by roundTiesToEven doesn't lead to a problem because for
x < 0 && abs(x) >= 0.5, adding 0.5 only reduces the magnitude of x,
so the sum is exactly representable and no rounding occurs; for all
x < 0 && abs(x) < 0.5, adding 0.5 followed by roundTiesToEven yields
a value in (0, 0.5], whose floor is 0.
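
The case split can be sketched in scalar Java for the `float` case.
This is an illustration under stated assumptions, not the code in the
patch: `tiesAway` is a hypothetical software stand-in for what `frinta`
computes, `roundSketch` is a made-up name, and the final `(int)` cast
stands in for a saturating float-to-integer conversion.

```java
final class RoundSketch {
    // Hypothetical stand-in for frinta: round to integral, ties away from zero.
    static float tiesAway(float x) {
        float a = Math.abs(x);
        if (!(a < 0x1.0p23f)) {
            return x; // already integral, or infinite, or NaN
        }
        float t = (float) Math.floor(a);
        float r = (a - t >= 0.5f) ? t + 1.0f : t; // a - t is exact
        return Math.copySign(r, x);
    }

    static int roundSketch(float x) {
        // Positive range, plus negative values that are already integers.
        if (x >= 0.0f || Math.abs(x) >= 0x1.0p23f) {
            return (int) tiesAway(x);
        }
        // x < 0 and abs(x) < 0x1.0p23: add 0.5 and take the floor.
        // (NaN also lands here; the cast turns it into 0.)
        return (int) Math.floor(x + 0.5f);
    }

    public static void main(String[] args) {
        float[] samples = {
            0.49999997f, 0.5f, 2.5f, -0.5f, -1.5f, -2.5f,
            8388609f, -8388609f, -8388607.5f, Float.NaN,
            Float.POSITIVE_INFINITY, Float.NEGATIVE_INFINITY
        };
        for (float s : samples) {
            System.out.println(s + " -> " + roundSketch(s)
                    + " (Math.round: " + Math.round(s) + ")");
        }
    }
}
```

Comparing `roundSketch` against `Math.round` over a sample of bit
patterns is a cheap sanity check for the reasoning above.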

-------------

Commit messages:
 - Delete dead code
 - Untabify
 - Fix assertion
 - 8282541: AArch64: Auto-vectorize Math.round API
 - Cleanup
 - Cleanup
 - Rebase
 - Rebase

Changes: https://git.openjdk.java.net/jdk/pull/8204/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=8204&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8282541
  Stats: 1054 lines in 13 files changed: 489 ins; 9 del; 556 mod
  Patch: https://git.openjdk.java.net/jdk/pull/8204.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/8204/head:pull/8204

PR: https://git.openjdk.java.net/jdk/pull/8204