RFR: 8294198: Implement isFinite intrinsic for RISC-V
Fei Yang
fyang at openjdk.org
Mon Sep 26 14:25:21 UTC 2022
On Thu, 22 Sep 2022 12:56:49 GMT, Aleksei Voitylov <avoitylov at openjdk.org> wrote:
> Unlike on x86 (see 8285868 and the discussion in review), isFinite intrinsic turned out to be profitable on RISC-V using the same fclass instruction as for 8293695 (isInfinite instrinsic). Therefore, I'm proposing to have it added on RISC-V in this PR.
>
> benchmark results:
>
> before:
>
> Benchmark Mode Cnt Score Error Units
> DoubleClassCheck.testIsFiniteBranch avgt 15 52.824 ± 1.744 ns/op
> DoubleClassCheck.testIsFiniteCMov avgt 15 16.104 ± 0.358 ns/op
> DoubleClassCheck.testIsFiniteStore avgt 15 14.366 ± 2.174 ns/op
> FloatClassCheck.testIsFiniteBranch avgt 15 49.821 ± 0.330 ns/op
> FloatClassCheck.testIsFiniteCMov avgt 15 14.702 ± 0.335 ns/op
> FloatClassCheck.testIsFiniteStore avgt 15 14.749 ± 0.496 ns/op
>
> after:
>
> DoubleClassCheck.testIsFiniteBranch avgt 15 48.921 ± 0.557 ns/op
> DoubleClassCheck.testIsFiniteCMov avgt 15 13.716 ± 0.304 ns/op
> DoubleClassCheck.testIsFiniteStore avgt 15 9.152 ± 0.158 ns/op
> FloatClassCheck.testIsFiniteBranch avgt 15 47.740 ± 2.028 ns/op
> FloatClassCheck.testIsFiniteCMov avgt 15 13.299 ± 0.282 ns/op
> FloatClassCheck.testIsFiniteStore avgt 15 9.185 ± 0.396 ns/op
>
> Existing isInfinite jtreg test was altered to be able to use common code for isFinite test and fine-grained requires tag filtering. Existing benchmark was modified to include isFinite case. A typo ("Atleast" -> "At least") was fixed on the way.
>
> Test passed on both release and fastdebug builds. Hotspot tier1 tests were run on x86_64 and RISC-V with no issues.
I find the cause of the fluctuations for 'testIsFiniteBranch' lies in randomness of the input.
@Benchmark
@OperationsPerInvocation(BUFFER_SIZE)
public void testIsFiniteBranch() {
for (int i = 0; i < BUFFER_SIZE; i++) {
cmovOutputs[i] = Float.isFinite(inputs[i]) ? call() : 7;
}
}
Here the C2 JIT code for invoking 'call()' has changed with this patch. The register allocation is
different and hence the difference of saving and restoring of live registers around the method call. So the probability of invoking this method call will affect the JMH result, which is not the case for the other two JMH tests.
For the other two JMH tests, I see the performance gain on SiFive platform comes from C2 loop unrolling. Since your change benefit the other two JMH tests on both SiFive and C906 platforms, looks good to me.
-------------
Marked as reviewed by fyang (Reviewer).
PR: https://git.openjdk.org/jdk/pull/10391
More information about the core-libs-dev
mailing list