RFR: 8294198: Implement isFinite intrinsic for RISC-V

Mon Sep 26 14:25:21 UTC 2022

On Thu, 22 Sep 2022 12:56:49 GMT, Aleksei Voitylov <avoitylov at openjdk.org> wrote:

> Unlike on x86 (see 8285868 and the discussion in its review), the isFinite intrinsic turned out to be profitable on RISC-V, using the same fclass instruction as 8293695 (the isInfinite intrinsic). Therefore, I propose adding it on RISC-V in this PR.
> 
> benchmark results:
> 
> before:
> 
> Benchmark                              Mode  Cnt   Score   Error  Units
> DoubleClassCheck.testIsFiniteBranch    avgt   15  52.824 ± 1.744  ns/op
> DoubleClassCheck.testIsFiniteCMov      avgt   15  16.104 ± 0.358  ns/op
> DoubleClassCheck.testIsFiniteStore     avgt   15  14.366 ± 2.174  ns/op
> FloatClassCheck.testIsFiniteBranch     avgt   15  49.821 ± 0.330  ns/op
> FloatClassCheck.testIsFiniteCMov       avgt   15  14.702 ± 0.335  ns/op
> FloatClassCheck.testIsFiniteStore      avgt   15  14.749 ± 0.496  ns/op
> 
> after:
> 
> DoubleClassCheck.testIsFiniteBranch    avgt   15  48.921 ± 0.557  ns/op
> DoubleClassCheck.testIsFiniteCMov      avgt   15  13.716 ± 0.304  ns/op
> DoubleClassCheck.testIsFiniteStore     avgt   15   9.152 ± 0.158  ns/op
> FloatClassCheck.testIsFiniteBranch     avgt   15  47.740 ± 2.028  ns/op
> FloatClassCheck.testIsFiniteCMov       avgt   15  13.299 ± 0.282  ns/op
> FloatClassCheck.testIsFiniteStore      avgt   15   9.185 ± 0.396  ns/op
> 
> The existing isInfinite jtreg test was altered so that it can share common code with the isFinite test and use fine-grained requires tag filtering. The existing benchmark was modified to include the isFinite case. A typo ("Atleast" -> "At least") was fixed along the way.
> 
> Tests passed on both release and fastdebug builds. Hotspot tier1 tests were run on x86_64 and RISC-V with no issues.
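
For readers unfamiliar with fclass, here is a minimal Java model of how a finiteness check can be derived from its result. It is only a sketch: the bit layout follows the RISC-V ISA definition of fclass.d, but the class name, the method names, and the mask constant are hypothetical and are not taken from the PR.

```java
public class FclassSketch {
    // fclass result bits 1..6: negative/positive normal, subnormal and zero.
    // Bits 0 and 7 are the infinities, bits 8 and 9 are the NaNs.
    static final int FINITE_MASK = 0b0001111110;

    // Software model of fclass.d: returns a one-hot 10-bit class mask.
    static int fclass(double d) {
        long bits = Double.doubleToRawLongBits(d);
        boolean negative = bits < 0;
        long exponent = (bits >>> 52) & 0x7FF;
        long fraction = bits & 0x000FFFFFFFFFFFFFL;
        if (exponent == 0x7FF) {
            if (fraction == 0) {
                return negative ? 1 << 0 : 1 << 7;       // -inf / +inf
            }
            boolean quiet = (fraction & (1L << 51)) != 0;
            return quiet ? 1 << 9 : 1 << 8;              // quiet / signaling NaN
        }
        if (exponent == 0) {
            if (fraction == 0) {
                return negative ? 1 << 3 : 1 << 4;       // -0 / +0
            }
            return negative ? 1 << 2 : 1 << 5;           // subnormal
        }
        return negative ? 1 << 1 : 1 << 6;               // normal
    }

    // Finite means the class is anything other than infinity or NaN.
    static boolean isFiniteViaFclass(double d) {
        return (fclass(d) & FINITE_MASK) != 0;
    }

    public static void main(String[] args) {
        double[] samples = {1.0, -0.0, Double.MIN_VALUE, Double.MAX_VALUE,
                Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY, Double.NaN};
        for (double sample : samples) {
            System.out.println(sample + " -> " + isFiniteViaFclass(sample)
                    + " (Double.isFinite: " + Double.isFinite(sample) + ")");
        }
    }
}
```

On hardware the classification is a single instruction, so the check reduces to fclass plus a mask test, which is presumably why the intrinsic pays off on RISC-V.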

I found that the fluctuations for 'testIsFiniteBranch' are caused by the randomness of the input.

    @Benchmark
    @OperationsPerInvocation(BUFFER_SIZE)
    public void testIsFiniteBranch() {
        for (int i = 0; i < BUFFER_SIZE; i++) {
            cmovOutputs[i] = Float.isFinite(inputs[i]) ? call() : 7;
        }
    }

Here the C2 JIT code for invoking 'call()' has changed with this patch. The register allocation is different, and so is the saving and restoring of live registers around the method call. The probability of taking this call therefore affects the JMH result, which is not the case for the other two JMH tests.
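
To illustrate the point about input randomness, here is a small standalone sketch that counts how many loop iterations would take the 'call()' path for differently seeded inputs. The input generator, the buffer size, and the class name are assumptions made for illustration; the benchmark's actual setup code is not shown in this thread.

```java
import java.util.Random;

public class BranchMixSketch {
    // Assumed buffer size; the benchmark's real value is not shown here.
    static final int BUFFER_SIZE = 1024;

    // Hypothetical input generator: each element is finite with probability
    // pFinite and +Infinity otherwise.
    static float[] makeInputs(long seed, double pFinite) {
        Random random = new Random(seed);
        float[] inputs = new float[BUFFER_SIZE];
        for (int i = 0; i < BUFFER_SIZE; i++) {
            inputs[i] = random.nextDouble() < pFinite
                    ? random.nextFloat()
                    : Float.POSITIVE_INFINITY;
        }
        return inputs;
    }

    // Number of iterations that would take the call() path in a loop
    // shaped like testIsFiniteBranch.
    static int countCalls(float[] inputs) {
        int calls = 0;
        for (float value : inputs) {
            if (Float.isFinite(value)) {
                calls++;
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        for (long seed = 1; seed <= 3; seed++) {
            int calls = countCalls(makeInputs(seed, 0.5));
            System.out.println("seed " + seed + ": call() taken "
                    + calls + " of " + BUFFER_SIZE + " iterations");
        }
    }
}
```

If the inputs are not generated from a fixed seed, the share of iterations that pay for the register save/restore around 'call()' changes from run to run, so the per-op score moves with it.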

For the other two JMH tests, I see that the performance gain on the SiFive platform comes from C2 loop unrolling. Since your change benefits the other two JMH tests on both the SiFive and C906 platforms, it looks good to me.

-------------

Marked as reviewed by fyang (Reviewer).

PR: https://git.openjdk.org/jdk/pull/10391