RFR: 8271128: InlineIntrinsics support for 32-bit ARM [v4]

Fri Aug 6 10:26:32 UTC 2021

On Fri, 6 Aug 2021 08:49:50 GMT, Christoph Göttschkes <cgo at openjdk.org> wrote:

>> Hi,
>> 
>> please review this patch, which adds support for InlineIntrinsics to the 32-bit ARM port. The old aarch32 port had this feature implemented and enabled by default.
>> 
>> Like on many other platforms, the 32-bit ARM port simply calls into the `SharedRuntime` to intrinsify the basic `java.lang.Math` methods. InlineIntrinsics is already implemented for C1 on 32-bit ARM, which does the same thing.
>> 
>> Testing: hotspot tier1 on ARMv5TE (soft-float) and ARMv7-A (hard-float).
>> 
>> For benchmarking I used the existing micro benchmark `test/micro/org/openjdk/bench/java/lang/MathBench.java`. The soft-float results are not that meaningful, since I ran them in QEMU.
>> 
>> __hard-float__ `-Xint -XX:+InlineIntrinsics`
>> 
>> | Benchmark                     | (seed) |  Mode | Cnt |    Score |       Error |  Units |
>> | :---------------------------- | -----: | ----: | --: | -------: | ----------: | -----: |
>> | MathBench.absDouble           |      0 | thrpt |   5 | 1169.574 | +/- 133.694 | ops/ms |
>> | MathBench.cosDouble           |      0 | thrpt |   5 |  759.902 | +/- 573.852 | ops/ms |
>> | MathBench.expDouble           |      0 | thrpt |   5 |  854.753 | +/-  67.217 | ops/ms |
>> | MathBench.log10Double         |      0 | thrpt |   5 |  902.034 | +/-  22.413 | ops/ms |
>> | MathBench.logDouble           |      0 | thrpt |   5 |  895.470 | +/- 113.811 | ops/ms |
>> | MathBench.powDouble           |      0 | thrpt |   5 |  936.136 | +/-  40.661 | ops/ms |
>> | MathBench.sinDouble           |      0 | thrpt |   5 |  864.670 | +/-  68.329 | ops/ms |
>> | MathBench.sqrtDouble          |      0 | thrpt |   5 | 1082.589 | +/-  92.570 | ops/ms |
>> | MathBench.tanDouble           |      0 | thrpt |   5 |  853.715 | +/- 122.427 | ops/ms |
>> 
>> __hard-float__ `-Xint -XX:-InlineIntrinsics`
>> 
>> | Benchmark                     | (seed) |  Mode | Cnt |    Score |       Error |  Units |
>> | :---------------------------- | -----: | ----: | --: | -------: | ----------: | -----: |
>> | MathBench.absDouble           |      0 | thrpt |   5 |  450.907 | +/-  10.402 | ops/ms |
>> | MathBench.cosDouble           |      0 | thrpt |   5 |  592.242 | +/-  14.011 | ops/ms |
>> | MathBench.expDouble           |      0 | thrpt |   5 |  167.614 | +/-   7.530 | ops/ms |
>> | MathBench.log10Double         |      0 | thrpt |   5 |  572.099 | +/-  55.089 | ops/ms |
>> | MathBench.logDouble           |      0 | thrpt |   5 |  596.588 | +/-  24.976 | ops/ms |
>> | MathBench.powDouble           |      0 | thrpt |   5 |  212.673 | +/-   4.060 | ops/ms |
>> | MathBench.sinDouble           |      0 | thrpt |   5 |  584.873 | +/-  42.774 | ops/ms |
>> | MathBench.sqrtDouble          |      0 | thrpt |   5 |  514.690 | +/-  30.568 | ops/ms |
>> | MathBench.tanDouble           |      0 | thrpt |   5 |  566.586 | +/-  23.995 | ops/ms |
>> 
>> __soft-float__ `-Xint -XX:+InlineIntrinsics`
>> 
>> | Benchmark                     | (seed) |  Mode | Cnt |    Score |       Error |  Units |
>> | :---------------------------- | -----: | ----: | --: | -------: | ----------: | -----: |
>> | MathBench.absDouble           |      0 | thrpt |   5 |  279.575 | +/-  56.455 | ops/ms |
>> | MathBench.cosDouble           |      0 | thrpt |   5 |  137.005 | +/-  72.561 | ops/ms |
>> | MathBench.expDouble           |      0 | thrpt |   5 |  117.778 | +/-  30.186 | ops/ms |
>> | MathBench.log10Double         |      0 | thrpt |   5 |  107.957 | +/-  10.158 | ops/ms |
>> | MathBench.logDouble           |      0 | thrpt |   5 |  101.341 | +/-   3.914 | ops/ms |
>> | MathBench.powDouble           |      0 | thrpt |   5 |  222.220 | +/-   3.854 | ops/ms |
>> | MathBench.sinDouble           |      0 | thrpt |   5 |  112.715 | +/-   9.088 | ops/ms |
>> | MathBench.sqrtDouble          |      0 | thrpt |   5 |  119.341 | +/-  76.528 | ops/ms |
>> | MathBench.tanDouble           |      0 | thrpt |   5 |  105.224 | +/-  30.477 | ops/ms |
>> 
>> __soft-float__ `-Xint -XX:-InlineIntrinsics`
>> 
>> | Benchmark                     | (seed) |  Mode | Cnt |    Score |       Error |  Units |
>> | :---------------------------- | -----: | ----: | --: | -------: | ----------: | -----: |
>> | MathBench.absDouble           |      0 | thrpt |   5 |  173.150 | +/-  36.279 | ops/ms |
>> | MathBench.cosDouble           |      0 | thrpt |   5 |  129.774 | +/-   8.795 | ops/ms |
>> | MathBench.expDouble           |      0 | thrpt |   5 |   53.524 | +/-   1.679 | ops/ms |
>> | MathBench.log10Double         |      0 | thrpt |   5 |  132.503 | +/-   4.274 | ops/ms |
>> | MathBench.logDouble           |      0 | thrpt |   5 |  135.483 | +/-   1.150 | ops/ms |
>> | MathBench.powDouble           |      0 | thrpt |   5 |   54.266 | +/-   0.699 | ops/ms |
>> | MathBench.sinDouble           |      0 | thrpt |   5 |  105.636 | +/-   4.647 | ops/ms |
>> | MathBench.sqrtDouble          |      0 | thrpt |   5 |  204.550 | +/-   7.206 | ops/ms |
>> | MathBench.tanDouble           |      0 | thrpt |   5 |  101.072 | +/-   3.701 | ops/ms |
>
> Christoph Göttschkes has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Rename use_runtime_function -> use_runtime_call

ARM builds are clean.
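
For anyone who wants the hard-float numbers quoted above as ratios, here is a minimal sketch that divides the `-XX:+InlineIntrinsics` score by the `-XX:-InlineIntrinsics` score for three of the benchmarks. The class name `SpeedupSketch` and the helper `speedup` are made up for illustration and are not part of the patch or of `MathBench`; the scores are copied from the quoted tables. The error margins are ignored, so the ratios are only rough indications.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the patch: turns the quoted JMH scores into ratios.
final class SpeedupSketch {

    // Throughput with the intrinsics enabled divided by throughput with them disabled.
    static double speedup(double enabledOpsPerMs, double disabledOpsPerMs) {
        return enabledOpsPerMs / disabledOpsPerMs;
    }

    public static void main(String[] args) {
        // Each entry: {score with +InlineIntrinsics, score with -InlineIntrinsics},
        // taken from the hard-float tables in the quoted PR description (ops/ms).
        Map<String, double[]> hardFloat = new LinkedHashMap<>();
        hardFloat.put("absDouble", new double[] {1169.574, 450.907});
        hardFloat.put("expDouble", new double[] {854.753, 167.614});
        hardFloat.put("powDouble", new double[] {936.136, 212.673});

        for (Map.Entry<String, double[]> entry : hardFloat.entrySet()) {
            double[] scores = entry.getValue();
            System.out.printf("%-10s %.2fx%n", entry.getKey(), speedup(scores[0], scores[1]));
        }
    }
}
```

With those inputs, `expDouble` and `powDouble` come out at roughly 5.1x and 4.4x. Benchmarks with a wide error margin in the quoted tables, such as `cosDouble` with `+/- 573.852`, should not be compared this way without rerunning them.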

-------------

PR: https://git.openjdk.java.net/jdk/pull/4927