Vector API: passing vectors as arguments

Bhateja, Jatin jatin.bhateja at intel.com
Tue Jun 4 11:08:00 UTC 2024


> >> Are vectors declared as static final fields guaranteed to be inlined
> properly in methods/loops using them, or is it safer to always explicitly
> create them within methods/loops?
> 
> Static final vector fields are pre-cooked vectors, and their usage in a loop will
> translate into a vector load from their backing storage.  Depending on the
> operation, rematerialization may be beneficial in some cases, e.g.
> broadcasting a constant value into vector lanes may have lower latency
> than an L1D access, but again it will vary with targets.

And the compiler will move any loop invariants out of the loop anyway.
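
To make the two options concrete, here is a minimal sketch. It is not taken from simdjson-java; the class QuoteCounter and its method names are hypothetical. It counts '"' bytes once with a constant vector held in a static final field and once with a vector broadcast inside the method, ahead of the loop. It only shows the two code shapes; which one is faster depends on the target, so it is worth measuring both. The Vector API is an incubator module, so the code needs --add-modules jdk.incubator.vector to compile and run.

```java
import jdk.incubator.vector.ByteVector;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical example class; not part of simdjson-java.
class QuoteCounter {
    static final VectorSpecies<Byte> SPECIES = ByteVector.SPECIES_PREFERRED;

    // Option 1: constant vector kept in a static final field.
    static final ByteVector QUOTES = ByteVector.broadcast(SPECIES, (byte) '"');

    static int countWithStaticField(byte[] input) {
        int count = 0;
        int i = 0;
        int bound = SPECIES.loopBound(input.length);
        for (; i < bound; i += SPECIES.length()) {
            ByteVector chunk = ByteVector.fromArray(SPECIES, input, i);
            count += chunk.eq(QUOTES).trueCount();
        }
        for (; i < input.length; i++) { // scalar tail
            if (input[i] == '"') {
                count++;
            }
        }
        return count;
    }

    // Option 2: constant vector created in the method, outside the loop body.
    static int countWithLocalBroadcast(byte[] input) {
        ByteVector quotes = ByteVector.broadcast(SPECIES, (byte) '"');
        int count = 0;
        int i = 0;
        int bound = SPECIES.loopBound(input.length);
        for (; i < bound; i += SPECIES.length()) {
            ByteVector chunk = ByteVector.fromArray(SPECIES, input, i);
            count += chunk.eq(quotes).trueCount();
        }
        for (; i < input.length; i++) { // scalar tail
            if (input[i] == '"') {
                count++;
            }
        }
        return count;
    }
}
```

Both methods return the same count for the same input; they differ only in where the constant vector comes from.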

Best Regards,
Jatin

> -----Original Message-----
> From: Bhateja, Jatin
> Sent: Tuesday, June 4, 2024 4:34 PM
> To: Piotr Rzysko <piotr.rzysko at gmail.com>
> Cc: panama-dev at openjdk.org
> Subject: RE: Vector API: passing vectors as arguments
> 
> Hi Piotr,
> 
> >> As a general rule, to achieve the best possible performance, should all
> vector operations in hot methods/loops always be manually inlined? If
> that’s the case, is there still a possibility that even in fully inlined code
> vector boxing occurs?
> 
> Yes, outlined methods with vector arguments will incur boxing penalties.
> If this happens in a loop, it may significantly degrade performance,
> which is what your benchmarks show. We may also see boxing in fully
> inlined methods if the target does not meet the feature set required
> to intrinsify an operation.
> 
> >> Are vectors declared as static final fields guaranteed to be inlined
> properly in methods/loops using them, or is it safer to always explicitly
> create them within methods/loops?
> 
> Static final vector fields are pre-cooked vectors, and their usage in a loop will
> translate into a vector load from their backing storage.  Depending on the
> operation, rematerialization may be beneficial in some cases, e.g.
> broadcasting a constant value into vector lanes may have lower latency
> than an L1D access, but again it will vary with targets.
> 
> Best Regards,
> Jatin
> 
> From: panama-dev <panama-dev-retn at openjdk.org> On Behalf Of Piotr Rzysko
> Sent: Friday, May 31, 2024 10:36 PM
> To: panama-dev at openjdk.org
> Subject: Vector API: passing vectors as arguments
> 
> Hi,
> I use the Vector API in a JSON parser (simdjson-java [1]) that I’ve been
> developing. Recently, I’ve noticed that sometimes when I have a loop that
> performs operations on vectors, extracting helper methods from the loop
> and passing vectors to them causes a significant drop in performance.
> To illustrate the problem, I’ve prepared several implementations of the
> same algorithm. The algorithm has been extracted from the parser:
> • OriginalStructuralIndexer [2]: An initial implementation in which, to reuse
> some pieces of code, I extracted multiple helper methods that perform
> operations on the vectors loaded by the main loop.
> • LoadingInStepStructuralIndexer [3]: A modified version of the
> OriginalStructuralIndexer in which vector loading is done in the method
> called from the loop; performance is significantly better.
> • InlinedStepStructuralIndexer [4]: More operations on vectors are manually
> inlined; performance is slightly better compared to the
> LoadingInStepStructuralIndexer.
> • InlinedIndexStructuralIndexer [5]: All operations on vectors are manually
> inlined in the main loop; performance is the best out of all the
> implementations.
> Please take a look at the comments at the top of each class. They include
> the results I obtained from running benchmarks [6] of the implementations
> on my desktop (256-bit registers, Temurin-21.0.1). If you would like to run
> the benchmarks, please follow the instructions in the README [7].
> Overall, what surprised me most was the poor performance of the
> OriginalStructuralIndexer, which I assume was caused by vector boxing and
> the JIT’s inability to inline the helper methods. I have two questions
> regarding this:
> 1. As a general rule, to achieve the best possible performance, should all
> vector operations in hot methods/loops always be manually inlined? If
> that’s the case, is there still a possibility that even in fully inlined code
> vector boxing occurs?
> 2. Are vectors declared as static final fields guaranteed to be inlined
> properly in methods/loops using them, or is it safer to always explicitly
> create them within methods/loops?
> Best regards,
> Piotr
> 
> [1] https://github.com/simdjson/simdjson-java
> [2] https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/src/main/java/io/github/piotrrzysko/simdjson/OriginalStructuralIndexer.java
> [3] https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/src/main/java/io/github/piotrrzysko/simdjson/LoadingInStepStructuralIndexer.java
> [4] https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/src/main/java/io/github/piotrrzysko/simdjson/InlinedStepStructuralIndexer.java
> [5] https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/src/main/java/io/github/piotrrzysko/simdjson/InlinedIndexStructuralIndexer.java
> [6] https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/src/jmh/java/io/github/piotrrzysko/StructuralIndexerBenchmark.java
> [7] https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/README.md
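
For readers who do not want to open the linked classes, the three code shapes compared in the quoted message can be sketched roughly as follows. This is a simplified illustration under my own assumptions, not the benchmark code: the class CommaCounter, its methods, and the comma-counting task are all hypothetical. Shape A passes an already loaded vector to a helper, shape B lets the helper load the vector from the array, and shape C keeps every vector operation in the loop. Whether the helpers get inlined, and so whether boxing occurs, is up to the JIT; the sketch only shows the structure. It needs --add-modules jdk.incubator.vector to compile and run.

```java
import jdk.incubator.vector.ByteVector;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical example class; not the code from the linked benchmarks.
class CommaCounter {
    static final VectorSpecies<Byte> SPECIES = ByteVector.SPECIES_PREFERRED;

    // Shape A: the loop loads the vector and passes it to a helper.
    static int countPassingVector(byte[] input) {
        int count = 0;
        int i = 0;
        int bound = SPECIES.loopBound(input.length);
        for (; i < bound; i += SPECIES.length()) {
            ByteVector chunk = ByteVector.fromArray(SPECIES, input, i);
            count += commasIn(chunk);
        }
        return count + scalarTail(input, i);
    }

    private static int commasIn(ByteVector chunk) {
        return chunk.eq((byte) ',').trueCount();
    }

    // Shape B: the helper receives the array and offset and loads the vector itself.
    static int countLoadingInStep(byte[] input) {
        int count = 0;
        int i = 0;
        int bound = SPECIES.loopBound(input.length);
        for (; i < bound; i += SPECIES.length()) {
            count += step(input, i);
        }
        return count + scalarTail(input, i);
    }

    private static int step(byte[] input, int offset) {
        return ByteVector.fromArray(SPECIES, input, offset).eq((byte) ',').trueCount();
    }

    // Shape C: all vector operations are written directly in the loop.
    static int countInlined(byte[] input) {
        int count = 0;
        int i = 0;
        int bound = SPECIES.loopBound(input.length);
        for (; i < bound; i += SPECIES.length()) {
            count += ByteVector.fromArray(SPECIES, input, i).eq((byte) ',').trueCount();
        }
        return count + scalarTail(input, i);
    }

    // Counts commas in the remaining bytes that do not fill a whole vector.
    private static int scalarTail(byte[] input, int from) {
        int count = 0;
        for (int i = from; i < input.length; i++) {
            if (input[i] == ',') {
                count++;
            }
        }
        return count;
    }
}
```

All three methods compute the same result, so they can be dropped into a JMH benchmark side by side to see how a given JDK and target treat each shape.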


