Feedback.. & Question about verifying Intrinsics are being used
Paul Sandoz
paul.sandoz at oracle.com
Mon Apr 26 23:29:34 UTC 2021
Hi Ben,
[Apologies for the delay in replying. Emails from non-members were queued up for approval and we forgot to approve ‘em]
The best approach right now is to write a JMH benchmark and measure the performance. With JMH you can look at the assembler code using perfasm (on Linux) or dtraceasm (on macOS). Alternatively, use the -XX:+PrintInlining and -XX:+PrintIntrinsics options to see whether intrinsification fails.
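To make the flag-based check concrete, here is a minimal sketch of a launcher that runs a program in a child JVM with those options and keeps only the output lines that mention intrinsics. The class name IntrinsicCheck and its helper methods are hypothetical, not part of any JDK tooling. -XX:+UnlockDiagnosticVMOptions is added because both print options are diagnostic flags in HotSpot. Filtering on the substring "intrinsic" is an assumption about what the log lines contain, not a documented format, so treat the filtered output as a starting point for reading the full log.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: runs a main class in a child JVM with C2 diagnostic
// printing enabled and echoes only the lines that mention intrinsics.
public class IntrinsicCheck {

    // Builds the child JVM command line. The unlock flag is placed before
    // the two diagnostic print flags, and the main class comes last.
    static List<String> command(String classpath, String mainClass) {
        List<String> cmd = new ArrayList<>();
        cmd.add(Paths.get(System.getProperty("java.home"), "bin", "java").toString());
        cmd.add("--add-modules");
        cmd.add("jdk.incubator.vector");
        cmd.add("-XX:+UnlockDiagnosticVMOptions");
        cmd.add("-XX:+PrintInlining");
        cmd.add("-XX:+PrintIntrinsics");
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add(mainClass);
        return cmd;
    }

    // Assumption: lines of interest contain the word "intrinsic".
    static boolean looksRelevant(String line) {
        return line.contains("intrinsic");
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 2) {
            System.err.println("usage: IntrinsicCheck <classpath> <main-class>");
            return;
        }
        // Merge stderr into stdout so all JVM output is read from one stream.
        Process child = new ProcessBuilder(command(args[0], args[1]))
                .redirectErrorStream(true)
                .start();
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(child.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (looksRelevant(line)) {
                    System.out.println(line);
                }
            }
        }
        System.exit(child.waitFor());
    }
}
```

The target program needs to run its vector code in a loop long enough for C2 to compile it, otherwise nothing relevant is printed. The launcher only surfaces what the compiler reports; measuring with JMH remains the way to confirm the performance.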
We don’t currently have a way to query at runtime whether an operation is hardware supported. We could expose that if need be, since the runtime needs to know it anyway. However, that is arguably only part of the process by which the hardware instruction is leveraged, since the vector operation will be used in a larger context of code for which the C2 compiler needs to kick in.
My hope is that over time C2 gets better and better, so we can mostly ask the question from a platform perspective rather than, or in addition to, from an “is there a problem with C2” perspective.
Paul.
> On Feb 13, 2021, at 11:39 PM, Ben Hutchison <brhutchison at gmail.com> wrote:
>
> Hi Panama dev team :)
>
> First post here.
>
> 1. Some positive feedback: I tried calling the Vector SIMD API from Scala
> 3.0.0-M3 on JDK16 RC and, from the outside, everything worked as expected.
> The hardest bit was getting SBT (Scala Build Tool) to load the module (`sbt
> -J--add-modules -Jjdk.incubator.vector` worked for me, if anyone else gets
> stuck on that).
>
> Actually, I found the Panama API documentation easy to understand and
> conceptually clear. While it's low level, the API made sense and I'm
> confident I could build higher operations on top of it.
>
>
> 2. However, I'm not at all confident that the SSE/AVX intrinsics were
> actually used. My CPU is theoretically capable, but I want to be sure... Are
> there any docs, blogs or suggested methods to verify that a running program
> is actually using the vector cpu instructions, and not the fallback scalar
> implementation?
>
> -Ben