RFR: 8271128: InlineIntrinsics support for 32-bit ARM

Thu Jul 29 12:01:29 UTC 2021

On Thu, 29 Jul 2021 09:40:08 GMT, Christoph Göttschkes <cgo at openjdk.org> wrote:

> Hi,
> 
> please review this patch, which adds support for InlineIntrinsics to the 32-bit ARM port. The old aarch32 port had this intrinsic implemented and enabled by default.
> 
> Like on many other platforms, the 32-bit ARM port simply calls into the `SharedRuntime` to intrinsify the basic `java.lang.Math` methods. InlineIntrinsics is already implemented for C1 on 32-bit ARM, which does the same thing.
> 
> testing: hotspot tier1 on ARMv5TE (soft-float) and ARMv7-A (hard-float)
> 
> I used the existing microbenchmark `test/micro/org/openjdk/bench/java/lang/MathBench.java` for the measurements below. The soft-float results are not that meaningful, since I ran them in QEMU.
> 
> __hard-float__ `-Xint -XX:+InlineIntrinsics`
> 
> | Benchmark                     | (seed) |  Mode | Cnt |    Score |       Error |  Units |
> | :---------------------------- | -----: | ----: | --: | -------: | ----------: | -----: |
> | MathBench.absDouble           |      0 | thrpt |   5 | 1169.574 | +/- 133.694 | ops/ms |
> | MathBench.cosDouble           |      0 | thrpt |   5 |  759.902 | +/- 573.852 | ops/ms |
> | MathBench.expDouble           |      0 | thrpt |   5 |  854.753 | +/-  67.217 | ops/ms |
> | MathBench.log10Double         |      0 | thrpt |   5 |  902.034 | +/-  22.413 | ops/ms |
> | MathBench.logDouble           |      0 | thrpt |   5 |  895.470 | +/- 113.811 | ops/ms |
> | MathBench.powDouble           |      0 | thrpt |   5 |  936.136 | +/-  40.661 | ops/ms |
> | MathBench.sinDouble           |      0 | thrpt |   5 |  864.670 | +/-  68.329 | ops/ms |
> | MathBench.sqrtDouble          |      0 | thrpt |   5 | 1082.589 | +/-  92.570 | ops/ms |
> | MathBench.tanDouble           |      0 | thrpt |   5 |  853.715 | +/- 122.427 | ops/ms |
> 
> __hard-float__ `-Xint -XX:-InlineIntrinsics`
> 
> | Benchmark                     | (seed) |  Mode | Cnt |    Score |       Error |  Units |
> | :---------------------------- | -----: | ----: | --: | -------: | ----------: | -----: |
> | MathBench.absDouble           |      0 | thrpt |   5 |  450.907 | +/-  10.402 | ops/ms |
> | MathBench.cosDouble           |      0 | thrpt |   5 |  592.242 | +/-  14.011 | ops/ms |
> | MathBench.expDouble           |      0 | thrpt |   5 |  167.614 | +/-   7.530 | ops/ms |
> | MathBench.log10Double         |      0 | thrpt |   5 |  572.099 | +/-  55.089 | ops/ms |
> | MathBench.logDouble           |      0 | thrpt |   5 |  596.588 | +/-  24.976 | ops/ms |
> | MathBench.powDouble           |      0 | thrpt |   5 |  212.673 | +/-   4.060 | ops/ms |
> | MathBench.sinDouble           |      0 | thrpt |   5 |  584.873 | +/-  42.774 | ops/ms |
> | MathBench.sqrtDouble          |      0 | thrpt |   5 |  514.690 | +/-  30.568 | ops/ms |
> | MathBench.tanDouble           |      0 | thrpt |   5 |  566.586 | +/-  23.995 | ops/ms |
> 
> __soft-float__ `-Xint -XX:+InlineIntrinsics`
> 
> | Benchmark                     | (seed) |  Mode | Cnt |    Score |       Error |  Units |
> | :---------------------------- | -----: | ----: | --: | -------: | ----------: | -----: |
> | MathBench.absDouble           |      0 | thrpt |   5 |  279.575 | +/-  56.455 | ops/ms |
> | MathBench.cosDouble           |      0 | thrpt |   5 |  137.005 | +/-  72.561 | ops/ms |
> | MathBench.expDouble           |      0 | thrpt |   5 |  117.778 | +/-  30.186 | ops/ms |
> | MathBench.log10Double         |      0 | thrpt |   5 |  107.957 | +/-  10.158 | ops/ms |
> | MathBench.logDouble           |      0 | thrpt |   5 |  101.341 | +/-   3.914 | ops/ms |
> | MathBench.powDouble           |      0 | thrpt |   5 |  222.220 | +/-   3.854 | ops/ms |
> | MathBench.sinDouble           |      0 | thrpt |   5 |  112.715 | +/-   9.088 | ops/ms |
> | MathBench.sqrtDouble          |      0 | thrpt |   5 |  119.341 | +/-  76.528 | ops/ms |
> | MathBench.tanDouble           |      0 | thrpt |   5 |  105.224 | +/-  30.477 | ops/ms |
> 
> __soft-float__ `-Xint -XX:-InlineIntrinsics`
> 
> | Benchmark                     | (seed) |  Mode | Cnt |    Score |       Error |  Units |
> | :---------------------------- | -----: | ----: | --: | -------: | ----------: | -----: |
> | MathBench.absDouble           |      0 | thrpt |   5 |  173.150 | +/-  36.279 | ops/ms |
> | MathBench.cosDouble           |      0 | thrpt |   5 |  129.774 | +/-   8.795 | ops/ms |
> | MathBench.expDouble           |      0 | thrpt |   5 |   53.524 | +/-   1.679 | ops/ms |
> | MathBench.log10Double         |      0 | thrpt |   5 |  132.503 | +/-   4.274 | ops/ms |
> | MathBench.logDouble           |      0 | thrpt |   5 |  135.483 | +/-   1.150 | ops/ms |
> | MathBench.powDouble           |      0 | thrpt |   5 |   54.266 | +/-   0.699 | ops/ms |
> | MathBench.sinDouble           |      0 | thrpt |   5 |  105.636 | +/-   4.647 | ops/ms |
> | MathBench.sqrtDouble          |      0 | thrpt |   5 |  204.550 | +/-   7.206 | ops/ms |
> | MathBench.tanDouble           |      0 | thrpt |   5 |  101.072 | +/-   3.701 | ops/ms |

Sorry, I forgot to mention this.
I manually tested the java.lang.Math methods for which the new intrinsics are implemented, by comparing the results of the intrinsics with the results of the corresponding java.lang.StrictMath methods. This is possible because StrictMath and the intrinsics use the same algorithm on 32-bit ARM.
But I agree, doing some more testing doesn't hurt. I will start jdk tier1 as well and report back as soon as the run is done. It will definitely take some time, though.
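
For illustration, a comparison of this kind could look like the sketch below. It is not the exact test that was run; the class name `MathVsStrictMath`, the helper `countMismatches`, the sample count, the input range and the fixed exponent used for `pow` are all assumptions made for the example. It counts inputs for which a `java.lang.Math` method and its `java.lang.StrictMath` counterpart return different bit patterns. Bit-identical results are only expected where both use the same algorithm, as described above for 32-bit ARM; on other platforms a nonzero count for the transcendental functions is not necessarily a bug.

```java
import java.util.Random;
import java.util.function.DoubleUnaryOperator;

public class MathVsStrictMath {

    /**
     * Counts the sampled inputs for which the two implementations return
     * different bit patterns. Inputs are drawn uniformly from [-100, 100).
     * (Hypothetical helper, not part of the JDK or of the patch.)
     */
    static int countMismatches(DoubleUnaryOperator math,
                               DoubleUnaryOperator strict,
                               int samples, long seed) {
        Random rnd = new Random(seed);
        int mismatches = 0;
        for (int i = 0; i < samples; i++) {
            double x = (rnd.nextDouble() - 0.5) * 200.0;
            double a = math.applyAsDouble(x);
            double b = strict.applyAsDouble(x);
            // doubleToLongBits canonicalizes NaN, so NaN results compare equal.
            if (Double.doubleToLongBits(a) != Double.doubleToLongBits(b)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        int n = 100_000;   // assumed sample count
        long seed = 0L;    // fixed seed so a run is reproducible
        System.out.println("abs:   " + countMismatches(Math::abs,   StrictMath::abs,   n, seed));
        System.out.println("sqrt:  " + countMismatches(Math::sqrt,  StrictMath::sqrt,  n, seed));
        System.out.println("sin:   " + countMismatches(Math::sin,   StrictMath::sin,   n, seed));
        System.out.println("cos:   " + countMismatches(Math::cos,   StrictMath::cos,   n, seed));
        System.out.println("tan:   " + countMismatches(Math::tan,   StrictMath::tan,   n, seed));
        System.out.println("exp:   " + countMismatches(Math::exp,   StrictMath::exp,   n, seed));
        System.out.println("log:   " + countMismatches(Math::log,   StrictMath::log,   n, seed));
        System.out.println("log10: " + countMismatches(Math::log10, StrictMath::log10, n, seed));
        // pow takes two arguments; the exponent 1.5 is fixed here only to keep the sketch short.
        System.out.println("pow:   " + countMismatches(x -> Math.pow(x, 1.5),
                                                       x -> StrictMath.pow(x, 1.5), n, seed));
    }
}
```

To exercise the interpreter intrinsics, the program would be run with the same flags as the benchmarks, for example `java -Xint -XX:+InlineIntrinsics MathVsStrictMath`.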

-------------

PR: https://git.openjdk.java.net/jdk/pull/4927