RFR: 8271128: InlineIntrinsics support for 32-bit ARM [v4]

Fri Aug 6 08:49:50 UTC 2021

> Hi,
> 
> Please review this patch, which adds support for `InlineIntrinsics` to the 32-bit ARM port. The old aarch32 port had this support implemented and enabled by default.
> 
> Like on many other platforms, the 32-bit ARM port simply calls into the `SharedRuntime` to intrinsify the basic `java.lang.Math` methods. InlineIntrinsics is already implemented for C1 on 32-bit ARM, which does the same thing.
> 
> Testing: hotspot tier1 on ARMv5TE (soft-float) and ARMv7-A (hard-float).
> 
> For benchmarking, I used the existing microbenchmark `test/micro/org/openjdk/bench/java/lang/MathBench.java`. The soft-float results are not that meaningful, since I ran them in QEMU.
> 
> __hard-float__ `-Xint -XX:+InlineIntrinsics`
> 
> | Benchmark                     | (seed) |  Mode | Cnt |    Score |       Error |  Units |
> | :---------------------------- | -----: | ----: | --: | -------: | ----------: | -----: |
> | MathBench.absDouble           |      0 | thrpt |   5 | 1169.574 | +/- 133.694 | ops/ms |
> | MathBench.cosDouble           |      0 | thrpt |   5 |  759.902 | +/- 573.852 | ops/ms |
> | MathBench.expDouble           |      0 | thrpt |   5 |  854.753 | +/-  67.217 | ops/ms |
> | MathBench.log10Double         |      0 | thrpt |   5 |  902.034 | +/-  22.413 | ops/ms |
> | MathBench.logDouble           |      0 | thrpt |   5 |  895.470 | +/- 113.811 | ops/ms |
> | MathBench.powDouble           |      0 | thrpt |   5 |  936.136 | +/-  40.661 | ops/ms |
> | MathBench.sinDouble           |      0 | thrpt |   5 |  864.670 | +/-  68.329 | ops/ms |
> | MathBench.sqrtDouble          |      0 | thrpt |   5 | 1082.589 | +/-  92.570 | ops/ms |
> | MathBench.tanDouble           |      0 | thrpt |   5 |  853.715 | +/- 122.427 | ops/ms |
> 
> __hard-float__ `-Xint -XX:-InlineIntrinsics`
> 
> | Benchmark                     | (seed) |  Mode | Cnt |    Score |       Error |  Units |
> | :---------------------------- | -----: | ----: | --: | -------: | ----------: | -----: |
> | MathBench.absDouble           |      0 | thrpt |   5 |  450.907 | +/-  10.402 | ops/ms |
> | MathBench.cosDouble           |      0 | thrpt |   5 |  592.242 | +/-  14.011 | ops/ms |
> | MathBench.expDouble           |      0 | thrpt |   5 |  167.614 | +/-   7.530 | ops/ms |
> | MathBench.log10Double         |      0 | thrpt |   5 |  572.099 | +/-  55.089 | ops/ms |
> | MathBench.logDouble           |      0 | thrpt |   5 |  596.588 | +/-  24.976 | ops/ms |
> | MathBench.powDouble           |      0 | thrpt |   5 |  212.673 | +/-   4.060 | ops/ms |
> | MathBench.sinDouble           |      0 | thrpt |   5 |  584.873 | +/-  42.774 | ops/ms |
> | MathBench.sqrtDouble          |      0 | thrpt |   5 |  514.690 | +/-  30.568 | ops/ms |
> | MathBench.tanDouble           |      0 | thrpt |   5 |  566.586 | +/-  23.995 | ops/ms |
> 
> __soft-float__ `-Xint -XX:+InlineIntrinsics`
> 
> | Benchmark                     | (seed) |  Mode | Cnt |    Score |       Error |  Units |
> | :---------------------------- | -----: | ----: | --: | -------: | ----------: | -----: |
> | MathBench.absDouble           |      0 | thrpt |   5 |  279.575 | +/-  56.455 | ops/ms |
> | MathBench.cosDouble           |      0 | thrpt |   5 |  137.005 | +/-  72.561 | ops/ms |
> | MathBench.expDouble           |      0 | thrpt |   5 |  117.778 | +/-  30.186 | ops/ms |
> | MathBench.log10Double         |      0 | thrpt |   5 |  107.957 | +/-  10.158 | ops/ms |
> | MathBench.logDouble           |      0 | thrpt |   5 |  101.341 | +/-   3.914 | ops/ms |
> | MathBench.powDouble           |      0 | thrpt |   5 |  222.220 | +/-   3.854 | ops/ms |
> | MathBench.sinDouble           |      0 | thrpt |   5 |  112.715 | +/-   9.088 | ops/ms |
> | MathBench.sqrtDouble          |      0 | thrpt |   5 |  119.341 | +/-  76.528 | ops/ms |
> | MathBench.tanDouble           |      0 | thrpt |   5 |  105.224 | +/-  30.477 | ops/ms |
> 
> __soft-float__ `-Xint -XX:-InlineIntrinsics`
> 
> | Benchmark                     | (seed) |  Mode | Cnt |    Score |       Error |  Units |
> | :---------------------------- | -----: | ----: | --: | -------: | ----------: | -----: |
> | MathBench.absDouble           |      0 | thrpt |   5 |  173.150 | +/-  36.279 | ops/ms |
> | MathBench.cosDouble           |      0 | thrpt |   5 |  129.774 | +/-   8.795 | ops/ms |
> | MathBench.expDouble           |      0 | thrpt |   5 |   53.524 | +/-   1.679 | ops/ms |
> | MathBench.log10Double         |      0 | thrpt |   5 |  132.503 | +/-   4.274 | ops/ms |
> | MathBench.logDouble           |      0 | thrpt |   5 |  135.483 | +/-   1.150 | ops/ms |
> | MathBench.powDouble           |      0 | thrpt |   5 |   54.266 | +/-   0.699 | ops/ms |
> | MathBench.sinDouble           |      0 | thrpt |   5 |  105.636 | +/-   4.647 | ops/ms |
> | MathBench.sqrtDouble          |      0 | thrpt |   5 |  204.550 | +/-   7.206 | ops/ms |
> | MathBench.tanDouble           |      0 | thrpt |   5 |  101.072 | +/-   3.701 | ops/ms |
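>

As a rough way to read the hard-float tables above, the per-benchmark ratio of the `-XX:+InlineIntrinsics` score to the `-XX:-InlineIntrinsics` score can be computed with a small sketch like the one below. The class and method names (`IntrinsicSpeedup`, `speedup`, `hardFloatScores`) are illustrative only and are not part of the patch; the numbers are copied from the tables.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class IntrinsicSpeedup {
    // Throughput ratio: score with intrinsics enabled divided by score with them disabled.
    static double speedup(double enabledOpsPerMs, double disabledOpsPerMs) {
        return enabledOpsPerMs / disabledOpsPerMs;
    }

    // Hard-float scores in ops/ms, copied from the tables above.
    // Each value is {score with +InlineIntrinsics, score with -InlineIntrinsics}.
    static Map<String, double[]> hardFloatScores() {
        Map<String, double[]> scores = new LinkedHashMap<>();
        scores.put("absDouble",   new double[] {1169.574, 450.907});
        scores.put("cosDouble",   new double[] { 759.902, 592.242});
        scores.put("expDouble",   new double[] { 854.753, 167.614});
        scores.put("log10Double", new double[] { 902.034, 572.099});
        scores.put("logDouble",   new double[] { 895.470, 596.588});
        scores.put("powDouble",   new double[] { 936.136, 212.673});
        scores.put("sinDouble",   new double[] { 864.670, 584.873});
        scores.put("sqrtDouble",  new double[] {1082.589, 514.690});
        scores.put("tanDouble",   new double[] { 853.715, 566.586});
        return scores;
    }

    public static void main(String[] args) {
        // Print one line per benchmark with its enabled/disabled ratio.
        for (Map.Entry<String, double[]> e : hardFloatScores().entrySet()) {
            double[] s = e.getValue();
            System.out.printf("%-12s %.2fx%n", e.getKey(), speedup(s[0], s[1]));
        }
    }
}
```

The ratios are only indicative: with five iterations some error bars are wide (for example `cosDouble` at +/- 573.852 ops/ms with intrinsics enabled). The soft-float numbers are left out because they were measured in QEMU.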

Christoph Göttschkes has updated the pull request incrementally with one additional commit since the last revision:

  Rename use_runtime_function -> use_runtime_call

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/4927/files
  - new: https://git.openjdk.java.net/jdk/pull/4927/files/3c20c980..6de922f8

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4927&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4927&range=02-03

  Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod
  Patch: https://git.openjdk.java.net/jdk/pull/4927.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/4927/head:pull/4927

PR: https://git.openjdk.java.net/jdk/pull/4927