RFR: 8294198: Implement isFinite intrinsic for RISC-V

Fri Sep 23 13:29:21 UTC 2022

On Thu, 22 Sep 2022 12:56:49 GMT, Aleksei Voitylov <avoitylov at openjdk.org> wrote:

> Unlike on x86 (see 8285868 and the discussion in review), the isFinite intrinsic turned out to be profitable on RISC-V using the same fclass instruction as for 8293695 (isInfinite intrinsic). Therefore, I'm proposing to add it on RISC-V in this PR.
> 
> benchmark results:
> 
> before:
> 
> Benchmark                              Mode  Cnt   Score   Error  Units
> DoubleClassCheck.testIsFiniteBranch    avgt   15  52.824 ± 1.744  ns/op
> DoubleClassCheck.testIsFiniteCMov      avgt   15  16.104 ± 0.358  ns/op
> DoubleClassCheck.testIsFiniteStore     avgt   15  14.366 ± 2.174  ns/op
> FloatClassCheck.testIsFiniteBranch     avgt   15  49.821 ± 0.330  ns/op
> FloatClassCheck.testIsFiniteCMov       avgt   15  14.702 ± 0.335  ns/op
> FloatClassCheck.testIsFiniteStore      avgt   15  14.749 ± 0.496  ns/op
> 
> after:
> 
> DoubleClassCheck.testIsFiniteBranch    avgt   15  48.921 ± 0.557  ns/op
> DoubleClassCheck.testIsFiniteCMov      avgt   15  13.716 ± 0.304  ns/op
> DoubleClassCheck.testIsFiniteStore     avgt   15   9.152 ± 0.158  ns/op
> FloatClassCheck.testIsFiniteBranch     avgt   15  47.740 ± 2.028  ns/op
> FloatClassCheck.testIsFiniteCMov       avgt   15  13.299 ± 0.282  ns/op
> FloatClassCheck.testIsFiniteStore      avgt   15   9.185 ± 0.396  ns/op
> 
> The existing isInfinite jtreg test was altered so that it shares common code with the isFinite test and uses fine-grained requires tag filtering. The existing benchmark was modified to include the isFinite case. A typo ("Atleast" -> "At least") was fixed along the way.
> 
> Test passed on both release and fastdebug builds. Hotspot tier1 tests were run on x86_64 and RISC-V with no issues.

Good you checked on SiFive HW, as I could only check their cookbook for instruction costs. I was using C906.

You're not observing any statistically significant difference in the first benchmark from this change because of the higher latency of the fclass instruction on the SiFive CPU (4 cycles, whereas the C906 has a 3-cycle latency for fclass). However, it is still profitable in general, as we replace 2 relatively expensive FP instructions with 1 FP + 2 cheap GPR instructions. That's what the rest of the cases demonstrate on both SiFive and C906.
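
For readers unfamiliar with fclass, here is a rough Java model of the idea, not the code from the PR. It assumes the standard RISC-V fclass result layout (a 10-bit one-hot mask, with bits 1 to 6 covering the finite classes). The class name FclassSketch and its methods are made up for illustration; the real intrinsic emits machine instructions rather than calling anything like this.

```java
// Illustrative model only: emulates the fclass.d result in software and
// derives isFinite from it with one mask and one compare.
final class FclassSketch {
    // fclass result bits (RISC-V F/D extensions):
    // 0: -inf, 1: -normal, 2: -subnormal, 3: -0,
    // 4: +0,   5: +subnormal, 6: +normal, 7: +inf,
    // 8: signaling NaN, 9: quiet NaN
    static final int FINITE_MASK = 0b0001111110; // bits 1..6

    // Software stand-in for the single FP instruction (hypothetical helper).
    static int fclass(double d) {
        long bits = Double.doubleToRawLongBits(d);
        boolean neg = (bits >>> 63) != 0;
        long exp = (bits >>> 52) & 0x7FF;
        long frac = bits & 0x000FFFFFFFFFFFFFL;
        if (exp == 0x7FF) {
            if (frac == 0) {
                return neg ? 1 << 0 : 1 << 7;       // infinities
            }
            boolean quiet = (frac >>> 51) != 0;     // top fraction bit set
            return quiet ? 1 << 9 : 1 << 8;         // NaNs
        }
        if (exp == 0) {
            if (frac == 0) {
                return neg ? 1 << 3 : 1 << 4;       // zeros
            }
            return neg ? 1 << 2 : 1 << 5;           // subnormals
        }
        return neg ? 1 << 1 : 1 << 6;               // normals
    }

    // The two cheap integer steps: AND with the mask, then test for non-zero.
    static boolean isFiniteViaFclass(double d) {
        return (fclass(d) & FINITE_MASK) != 0;
    }
}
```

The mask-and-test step corresponds to the "2 cheap GPR instructions" mentioned above; which exact instructions the JIT emits for it is not shown here.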

Overall, as the platform matures, instruction costs tend to decrease, so I'd assume fclass will take fewer cycles in most future implementations.

-------------

PR: https://git.openjdk.org/jdk/pull/10391