Vector API: passing vectors as arguments

Bhateja, Jatin jatin.bhateja at intel.com
Tue Jun 4 11:04:19 UTC 2024


Hi Piotr,

>> As a general rule, to achieve the best possible performance, should all vector operations in hot methods/loops always be manually inlined? If that’s the case, is there still a possibility that even in fully inlined code vector boxing occurs?

Yes, an outlined method that takes vector arguments will incur boxing penalties. If this happens in a loop, it may significantly degrade performance, which is what your benchmarks show. We may also see boxing in fully inlined methods if the target does not provide the feature set required to intrinsify an operation.
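To make the distinction concrete, here is a minimal sketch of the two shapes being discussed. It is not taken from the benchmarks; the class and method names are hypothetical, and whether the JIT actually inlines the helper depends on its heuristics. It needs --add-modules jdk.incubator.vector to compile and run.

```java
import jdk.incubator.vector.ByteVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical example: counts '"' bytes in two equivalent ways.
public class BoxingSketch {
    static final VectorSpecies<Byte> SPECIES = ByteVector.SPECIES_PREFERRED;

    // Helper with a vector argument. If this call is not inlined, the
    // caller may have to box the vector into a heap object to pass it.
    static int quotesIn(ByteVector chunk) {
        return chunk.compare(VectorOperators.EQ, (byte) '"').trueCount();
    }

    // Shape 1: the loop loads the vector and hands it to the helper.
    static int countQuotesWithHelper(byte[] data) {
        int count = 0;
        int i = 0;
        for (; i < SPECIES.loopBound(data.length); i += SPECIES.length()) {
            count += quotesIn(ByteVector.fromArray(SPECIES, data, i));
        }
        for (; i < data.length; i++) { // scalar tail
            if (data[i] == '"') count++;
        }
        return count;
    }

    // Shape 2: the same operation written directly in the loop body, so
    // no vector value crosses a method boundary.
    static int countQuotesInlined(byte[] data) {
        int count = 0;
        int i = 0;
        for (; i < SPECIES.loopBound(data.length); i += SPECIES.length()) {
            count += ByteVector.fromArray(SPECIES, data, i)
                    .compare(VectorOperators.EQ, (byte) '"')
                    .trueCount();
        }
        for (; i < data.length; i++) { // scalar tail
            if (data[i] == '"') count++;
        }
        return count;
    }
}
```

Both methods return the same result; only the second guarantees at the source level that no vector is passed as an argument.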

>> Are vectors declared as static final fields guaranteed to be inlined properly in methods/loops using them, or is it safer to always explicitly create them within methods/loops?

Static final vector fields are pre-cooked vectors, and using them in a loop translates into a vector load from their backing storage. Depending on the operation, re-materialization may be beneficial in some cases, e.g. broadcasting a constant value into vector lanes may have lower latency than an L1D access, but again this will vary with the target.
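As a rough illustration of the two options, the sketch below uses a constant held in a static final field in one method and re-creates it with a broadcast in the other. The names are hypothetical and the sketch makes no claim about which is faster on a given target. It needs --add-modules jdk.incubator.vector.

```java
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical example: adds 7 to every element of an int array.
public class ConstantVectorSketch {
    static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_PREFERRED;

    // Constant vector created once, at class initialization.
    static final IntVector SEVEN = IntVector.broadcast(SPECIES, 7);

    // Option 1: use the static final field inside the loop.
    static void addSevenFromField(int[] src, int[] dst) {
        int i = 0;
        for (; i < SPECIES.loopBound(src.length); i += SPECIES.length()) {
            IntVector.fromArray(SPECIES, src, i).add(SEVEN).intoArray(dst, i);
        }
        for (; i < src.length; i++) { // scalar tail
            dst[i] = src[i] + 7;
        }
    }

    // Option 2: re-materialize the constant with a broadcast in the method.
    static void addSevenRematerialized(int[] src, int[] dst) {
        IntVector seven = IntVector.broadcast(SPECIES, 7);
        int i = 0;
        for (; i < SPECIES.loopBound(src.length); i += SPECIES.length()) {
            IntVector.fromArray(SPECIES, src, i).add(seven).intoArray(dst, i);
        }
        for (; i < src.length; i++) { // scalar tail
            dst[i] = src[i] + 7;
        }
    }
}
```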

Best Regards,
Jatin 

From: panama-dev <panama-dev-retn at openjdk.org> On Behalf Of Piotr Rzysko
Sent: Friday, May 31, 2024 10:36 PM
To: panama-dev at openjdk.org
Subject: Vector API: passing vectors as arguments

Hi,
I use the Vector API in a JSON parser (simdjson-java [1]) that I’ve been developing. Recently, I’ve noticed that sometimes when I have a loop that performs operations on vectors, extracting helper methods from the loop and passing vectors to them causes a significant drop in performance.
To illustrate the problem, I’ve prepared several implementations of the same algorithm. The algorithm has been extracted from the parser:
• OriginalStructuralIndexer [2]: An initial implementation in which, to reuse some pieces of code, I extracted multiple helper methods that perform operations on the vectors loaded by the main loop.
• LoadingInStepStructuralIndexer [3]: A modified version of the OriginalStructuralIndexer in which vector loading is done in the method called from the loop; performance is significantly better.
• InlinedStepStructuralIndexer [4]: More operations on vectors are manually inlined; performance is slightly better compared to the LoadingInStepStructuralIndexer.
• InlinedIndexStructuralIndexer [5]: All operations on vectors are manually inlined in the main loop; performance is the best out of all the implementations.
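The difference between the first two variants can be sketched roughly as follows. This is not the code from the linked repository; the class, the method names and the comma-counting task are made up for illustration. It needs --add-modules jdk.incubator.vector.

```java
import jdk.incubator.vector.ByteVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical example: counts ',' bytes, with the per-chunk work in a helper.
public class LoadingInStepSketch {
    static final VectorSpecies<Byte> SPECIES = ByteVector.SPECIES_PREFERRED;

    // Style of the first variant: the loop loads the vector and the helper
    // receives it as an argument.
    static int stepWithVector(ByteVector chunk) {
        return chunk.compare(VectorOperators.EQ, (byte) ',').trueCount();
    }

    // Style of the second variant: the helper receives the array and an
    // offset and loads the vector itself, so only a reference and an int
    // cross the call boundary.
    static int stepWithOffset(byte[] data, int offset) {
        return ByteVector.fromArray(SPECIES, data, offset)
                .compare(VectorOperators.EQ, (byte) ',')
                .trueCount();
    }

    static int countCommas(byte[] data) {
        int count = 0;
        int i = 0;
        for (; i < SPECIES.loopBound(data.length); i += SPECIES.length()) {
            count += stepWithOffset(data, i);
        }
        for (; i < data.length; i++) { // scalar tail
            if (data[i] == ',') count++;
        }
        return count;
    }
}
```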
Please take a look at the comments at the top of each class. They include the results I obtained from running benchmarks [6] of the implementations on my desktop (256-bit registers, Temurin-21.0.1). If you would like to run the benchmarks, please follow the instructions in the README [7].
Overall, what surprised me most was the poor performance of the OriginalStructuralIndexer, which I assume was caused by vector boxing and the JIT’s inability to inline the helper methods. I have two questions regarding this:
1. As a general rule, to achieve the best possible performance, should all vector operations in hot methods/loops always be manually inlined? If that’s the case, is there still a possibility that even in fully inlined code vector boxing occurs?
2. Are vectors declared as static final fields guaranteed to be inlined properly in methods/loops using them, or is it safer to always explicitly create them within methods/loops?
Best regards,
Piotr

[1] https://github.com/simdjson/simdjson-java
[2] https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/src/main/java/io/github/piotrrzysko/simdjson/OriginalStructuralIndexer.java
[3] https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/src/main/java/io/github/piotrrzysko/simdjson/LoadingInStepStructuralIndexer.java
[4] https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/src/main/java/io/github/piotrrzysko/simdjson/InlinedStepStructuralIndexer.java
[5] https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/src/main/java/io/github/piotrrzysko/simdjson/InlinedIndexStructuralIndexer.java
[6] https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/src/jmh/java/io/github/piotrrzysko/StructuralIndexerBenchmark.java
[7] https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/README.md


