RFR: 8294198: Implement isFinite intrinsic for RISC-V
Aleksei Voitylov
avoitylov at openjdk.org
Fri Sep 23 13:29:21 UTC 2022
On Thu, 22 Sep 2022 12:56:49 GMT, Aleksei Voitylov <avoitylov at openjdk.org> wrote:
> Unlike on x86 (see 8285868 and the discussion in its review), the isFinite intrinsic turned out to be profitable on RISC-V, using the same fclass instruction as 8293695 (the isInfinite intrinsic). I therefore propose adding it on RISC-V in this PR.
>
> benchmark results:
>
> before:
>
> Benchmark                            Mode  Cnt   Score   Error  Units
> DoubleClassCheck.testIsFiniteBranch  avgt   15  52.824 ± 1.744  ns/op
> DoubleClassCheck.testIsFiniteCMov    avgt   15  16.104 ± 0.358  ns/op
> DoubleClassCheck.testIsFiniteStore   avgt   15  14.366 ± 2.174  ns/op
> FloatClassCheck.testIsFiniteBranch   avgt   15  49.821 ± 0.330  ns/op
> FloatClassCheck.testIsFiniteCMov     avgt   15  14.702 ± 0.335  ns/op
> FloatClassCheck.testIsFiniteStore    avgt   15  14.749 ± 0.496  ns/op
>
> after:
>
> Benchmark                            Mode  Cnt   Score   Error  Units
> DoubleClassCheck.testIsFiniteBranch  avgt   15  48.921 ± 0.557  ns/op
> DoubleClassCheck.testIsFiniteCMov    avgt   15  13.716 ± 0.304  ns/op
> DoubleClassCheck.testIsFiniteStore   avgt   15   9.152 ± 0.158  ns/op
> FloatClassCheck.testIsFiniteBranch   avgt   15  47.740 ± 2.028  ns/op
> FloatClassCheck.testIsFiniteCMov     avgt   15  13.299 ± 0.282  ns/op
> FloatClassCheck.testIsFiniteStore    avgt   15   9.185 ± 0.396  ns/op
>
> The existing isInfinite jtreg test was altered so that common code can be shared with the isFinite test and so that fine-grained requires tag filtering can be used. The existing benchmark was modified to include the isFinite case. A typo ("Atleast" -> "At least") was fixed along the way.
>
> The test passed on both release and fastdebug builds. Hotspot tier1 tests were run on x86_64 and RISC-V with no issues.
Good that you checked on SiFive HW; I could only check their cookbook for instruction costs, since I was using a C906.
You're not observing any statistically significant difference in the first benchmark from this change because of the higher latency of the fclass instruction on the SiFive CPU (4 cycles, versus 3 cycles on C906). However, the change is still profitable in general, as we replace 2 relatively expensive FP instructions with 1 FP instruction plus 2 cheap GPR instructions. That's what the rest of the cases demonstrate on both SiFive and C906.
Overall, as a platform matures, instruction costs tend to decrease, so I'd assume fclass will take fewer cycles in most future implementations.
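
For readers unfamiliar with fclass, here is a rough Java emulation of the idea behind the intrinsic. It is only a sketch, not the HotSpot code from the PR: the class name FclassSketch, the constant names and the helper methods are made up for illustration. The bit layout follows the fclass.d class mask defined in the RISC-V ISA manual. The point is that once the 10-bit class mask is available, the finiteness check is just an AND with a constant and a comparison against zero, which I assume corresponds to the "2 cheap GPR instructions" mentioned above.

```java
// Illustrative emulation only; names here are hypothetical, not from the PR.
final class FclassSketch {
    // Bit positions of the 10-bit class mask produced by RISC-V fclass.d.
    static final int NEG_INF       = 1 << 0;
    static final int NEG_NORMAL    = 1 << 1;
    static final int NEG_SUBNORMAL = 1 << 2;
    static final int NEG_ZERO      = 1 << 3;
    static final int POS_ZERO      = 1 << 4;
    static final int POS_SUBNORMAL = 1 << 5;
    static final int POS_NORMAL    = 1 << 6;
    static final int POS_INF       = 1 << 7;
    static final int SIGNALING_NAN = 1 << 8;
    static final int QUIET_NAN     = 1 << 9;

    // Every class that is neither an infinity nor a NaN: 0b0001111110.
    static final int FINITE_MASK = NEG_NORMAL | NEG_SUBNORMAL | NEG_ZERO
                                 | POS_ZERO | POS_SUBNORMAL | POS_NORMAL;

    // Software stand-in for the single FP instruction (fclass.d).
    static int fclass(double d) {
        long bits = Double.doubleToRawLongBits(d);
        boolean negative = bits < 0;                 // sign bit is bit 63
        long exponent = (bits >>> 52) & 0x7FF;       // 11 exponent bits
        long fraction = bits & 0x000FFFFFFFFFFFFFL;  // 52 fraction bits
        if (exponent == 0x7FF) {
            if (fraction == 0) {
                return negative ? NEG_INF : POS_INF;
            }
            // The most significant fraction bit marks a quiet NaN.
            return (fraction & (1L << 51)) != 0 ? QUIET_NAN : SIGNALING_NAN;
        }
        if (exponent == 0) {
            if (fraction == 0) {
                return negative ? NEG_ZERO : POS_ZERO;
            }
            return negative ? NEG_SUBNORMAL : POS_SUBNORMAL;
        }
        return negative ? NEG_NORMAL : POS_NORMAL;
    }

    // The integer part of the check: mask the class bits, compare with zero.
    static boolean isFiniteViaFclass(double d) {
        return (fclass(d) & FINITE_MASK) != 0;
    }
}
```

With this sketch, FclassSketch.isFiniteViaFclass(d) should agree with Double.isFinite(d) for any double d, including zeros, subnormals, infinities and NaNs.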
-------------
PR: https://git.openjdk.org/jdk/pull/10391