Vector API: passing vectors as arguments
Piotr Rżysko
piotr.rzysko at gmail.com
Wed Jun 5 18:00:27 UTC 2024
Hi Paul and Jatin,
Thank you for your responses. Everything is clear now.
Best regards,
Piotr
On Tue, Jun 4, 2024 at 1:08 PM Bhateja, Jatin <jatin.bhateja at intel.com>
wrote:
> > >> Are vectors declared as static final fields guaranteed to be inlined
> > properly in methods/loops using them, or is it safer to always explicitly
> > create them within methods/loops?
> >
> > Static final vector fields are pre-cooked vectors and their usage in a
> > loop will translate into a vector load from its backing storage.
> > Depending on the operation, rematerialization may be beneficial in some
> > cases, e.g. broadcasting a constant value into vector lanes may have
> > lower latency than an L1D access, but again it will vary with targets.
>
> And the compiler will move any loop invariants out of the loop anyway.
>
> Best Regards,
> Jatin
>
> > -----Original Message-----
> > From: Bhateja, Jatin
> > Sent: Tuesday, June 4, 2024 4:34 PM
> > To: Piotr Rzysko <piotr.rzysko at gmail.com>
> > Cc: panama-dev at openjdk.org
> > Subject: RE: Vector API: passing vectors as arguments
> >
> > Hi Piotr,
> >
> > >> As a general rule, to achieve the best possible performance, should all
> > vector operations in hot methods/loops always be manually inlined? If
> > that’s the case, is there still a possibility that even in fully inlined
> > code vector boxing occurs?
> >
> > Yes, an outlined method with vector arguments will incur boxing
> > penalties, more so if this happens in a loop; it may significantly
> > degrade performance, which is what your benchmarks show. We may also see
> > boxing in fully inlined methods if the target does not meet the feature
> > set required to intrinsify an operation.
> >
> > >> Are vectors declared as static final fields guaranteed to be inlined
> > properly in methods/loops using them, or is it safer to always explicitly
> > create them within methods/loops?
> >
> > Static final vector fields are pre-cooked vectors and their usage in a
> > loop will translate into a vector load from its backing storage.
> > Depending on the operation, rematerialization may be beneficial in some
> > cases, e.g. broadcasting a constant value into vector lanes may have
> > lower latency than an L1D access, but again it will vary with targets.
> >
> > Best Regards,
> > Jatin
> >
> > From: panama-dev <panama-dev-retn at openjdk.org> On Behalf Of Piotr
> > Rzysko
> > Sent: Friday, May 31, 2024 10:36 PM
> > To: panama-dev at openjdk.org
> > Subject: Vector API: passing vectors as arguments
> >
> > Hi,
> > I use the Vector API in a JSON parser (simdjson-java [1]) that I’ve been
> > developing. Recently, I’ve noticed that sometimes when I have a loop that
> > performs operations on vectors, extracting helper methods from the loop
> > and passing vectors to them causes a significant drop in performance.
> > To illustrate the problem, I’ve prepared several implementations of the
> > same algorithm. The algorithm has been extracted from the parser:
> > • OriginalStructuralIndexer [2]: An initial implementation in which, to
> > reuse some pieces of code, I extracted multiple helper methods that
> > perform operations on the vectors loaded by the main loop.
> > • LoadingInStepStructuralIndexer [3]: A modified version of the
> > OriginalStructuralIndexer in which vector loading is done in the method
> > called from the loop; performance is significantly better.
> > • InlinedStepStructuralIndexer [4]: More operations on vectors are
> > manually inlined; performance is slightly better compared to the
> > LoadingInStepStructuralIndexer.
> > • InlinedIndexStructuralIndexer [5]: All operations on vectors are
> > manually inlined in the main loop; performance is the best out of all the
> > implementations.
> > Please take a look at the comments at the top of each class. They include
> > the results I obtained from running benchmarks [6] of the implementations
> > on my desktop (256-bit registers, Temurin-21.0.1). If you would like to
> > run the benchmarks, please follow the instructions in the README [7].
> > Overall, what surprised me most was the poor performance of the
> > OriginalStructuralIndexer, which I assume was caused by vector boxing and
> > the JIT’s inability to inline the helper methods. I have two questions
> > regarding this:
> > 1. As a general rule, to achieve the best possible performance, should all
> > vector operations in hot methods/loops always be manually inlined? If
> > that’s the case, is there still a possibility that even in fully inlined
> > code vector boxing occurs?
> > 2. Are vectors declared as static final fields guaranteed to be inlined
> > properly in methods/loops using them, or is it safer to always explicitly
> > create them within methods/loops?
> > Best regards,
> > Piotr
> >
> > [1] https://github.com/simdjson/simdjson-java
> > [2] https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/src/main/java/io/github/piotrrzysko/simdjson/OriginalStructuralIndexer.java
> > [3] https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/src/main/java/io/github/piotrrzysko/simdjson/LoadingInStepStructuralIndexer.java
> > [4] https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/src/main/java/io/github/piotrrzysko/simdjson/InlinedStepStructuralIndexer.java
> > [5] https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/src/main/java/io/github/piotrrzysko/simdjson/InlinedIndexStructuralIndexer.java
> > [6] https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/src/jmh/java/io/github/piotrrzysko/StructuralIndexerBenchmark.java
> > [7] https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/README.md
>
>
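
For readers of the archive, the two code shapes discussed in this thread can be
sketched roughly as follows. This is an illustration only, not code from
simdjson-java or from the linked benchmarks: the class VectorArgSketch, its
methods, and the quote-counting task are hypothetical and chosen to keep the
example short. It assumes the incubating module is enabled (compile and run
with --add-modules jdk.incubator.vector). Both methods compute the same
result; whether the helper call actually leads to boxing depends on the JIT's
inlining decisions on a given target, which this sketch does not measure.

```java
import jdk.incubator.vector.ByteVector;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical example class; names are not taken from the thread's code.
final class VectorArgSketch {
    static final VectorSpecies<Byte> SPECIES = ByteVector.SPECIES_PREFERRED;

    // Constant vector held in a static final field (the shape asked about
    // in question 2). Per the reply, using it in a loop may become a vector
    // load from the field's backing storage.
    static final ByteVector QUOTES = ByteVector.broadcast(SPECIES, (byte) '"');

    // Helper that receives a vector as an argument. If the JIT does not
    // inline this call, the argument may have to be boxed.
    static int countInChunk(ByteVector chunk) {
        return chunk.eq(QUOTES).trueCount();
    }

    // Shape 1: the loop loads a vector and passes it to an outlined helper.
    static long countWithHelper(byte[] input) {
        long count = 0;
        int i = 0;
        int bound = SPECIES.loopBound(input.length);
        for (; i < bound; i += SPECIES.length()) {
            count += countInChunk(ByteVector.fromArray(SPECIES, input, i));
        }
        for (; i < input.length; i++) { // scalar tail
            if (input[i] == '"') count++;
        }
        return count;
    }

    // Shape 2: all vector operations are written directly in the loop, and
    // the constant is created in the method instead of read from a field.
    static long countInlined(byte[] input) {
        // Loop-invariant broadcast, created once before the loop.
        ByteVector quotes = ByteVector.broadcast(SPECIES, (byte) '"');
        long count = 0;
        int i = 0;
        int bound = SPECIES.loopBound(input.length);
        for (; i < bound; i += SPECIES.length()) {
            count += ByteVector.fromArray(SPECIES, input, i).eq(quotes).trueCount();
        }
        for (; i < input.length; i++) { // scalar tail
            if (input[i] == '"') count++;
        }
        return count;
    }
}
```

To compare the two shapes on a particular machine, a JMH benchmark like the
one referenced in [6] would be the appropriate tool; a one-off timing of these
methods says little about steady-state JIT behavior.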