Vector API: passing vectors as arguments

Paul Sandoz paul.sandoz at oracle.com
Mon Jun 3 18:44:34 UTC 2024


Hi Piotr,

Thank you for the detailed example. For now, I recommend manual inlining, since it compiles more reliably. The end goal is for you to be able to write the code as in [2], but unfortunately we are not there yet.

When inlining fails, the vector arguments are boxed (as you noted). Once we align with Valhalla, we will be able to enhance the method calling convention to pass vector instances in vector registers, thereby avoiding boxing. Even so, inlining is preferable because it unlocks so many other optimizations. For example, it’s OK to declare static final vectors: since C2 knows the vector is constant, it can load it from memory, hoist the load out of the main loop, and hold the vector in a register (it might spill due to heavy register usage, but that is a separate issue). I have used this with many vector constants to implement a very efficient vectorized arc tangent. The same should occur if you declare the vector as a local variable outside the main loop. (In some cases it may be preferable to broadcast in the loop if all lane elements are the same.)
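
As a rough illustration of the two shapes discussed above, here is a minimal sketch, not taken from the benchmarks in [2]-[5]. The class `QuoteCounter` and its methods are hypothetical names; the sketch assumes a JDK with the incubating Vector API, compiled and run with `--add-modules jdk.incubator.vector`. Whether the helper is actually inlined, and whether the constant is actually hoisted, depends on the JIT and should be checked rather than assumed.

```java
import jdk.incubator.vector.ByteVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical example class: counts '"' bytes in two equivalent ways.
final class QuoteCounter {
    private static final VectorSpecies<Byte> SPECIES = ByteVector.SPECIES_PREFERRED;

    // Vector constant held in a static final field, so the JIT can treat it
    // as a constant and keep it out of the loop body.
    private static final ByteVector QUOTES = ByteVector.broadcast(SPECIES, (byte) '"');

    // Helper-based shape: the loop passes a vector to another method.
    // If countMatches is not inlined, the argument may be boxed.
    static int countWithHelper(byte[] input) {
        int count = 0;
        int i = 0;
        int bound = SPECIES.loopBound(input.length);
        for (; i < bound; i += SPECIES.length()) {
            ByteVector chunk = ByteVector.fromArray(SPECIES, input, i);
            count += countMatches(chunk);
        }
        return count + countTail(input, i);
    }

    private static int countMatches(ByteVector chunk) {
        return chunk.compare(VectorOperators.EQ, QUOTES).trueCount();
    }

    // Manually inlined shape: every vector operation stays in the loop body,
    // so no vector crosses a method boundary.
    static int countInlined(byte[] input) {
        int count = 0;
        int i = 0;
        int bound = SPECIES.loopBound(input.length);
        for (; i < bound; i += SPECIES.length()) {
            ByteVector chunk = ByteVector.fromArray(SPECIES, input, i);
            count += chunk.compare(VectorOperators.EQ, QUOTES).trueCount();
        }
        return count + countTail(input, i);
    }

    // Scalar loop for the bytes that do not fill a whole vector.
    private static int countTail(byte[] input, int from) {
        int count = 0;
        for (int i = from; i < input.length; i++) {
            if (input[i] == '"') {
                count++;
            }
        }
        return count;
    }
}
```

Both methods compute the same result; only the compiled code may differ, depending on whether `countMatches` is inlined.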

It’s possible to verify whether inlining is the issue with HotSpot command-line arguments: printing inlining diagnostics, increasing the inlining threshold, or explicitly forcing inlining of certain methods. Perhaps the easiest approach is the JMH annotation `@CompilerControl(INLINE)`, although I am unsure of its scope (IIRC, when JMH forks the benchmark for execution it basically adds the corresponding command-line options, which I cannot exactly recall right now!).
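
A minimal sketch of the command-line route, using a tiny stand-in program rather than the real indexer. `InliningProbe` and `helper` are hypothetical names, and the threshold value is arbitrary; the options in the leading comment are standard HotSpot flags, though their output format and defaults vary between JDK versions.

```java
// Compile with javac, then run with one of the following:
//
//   Print inlining decisions made by the JIT:
//     java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining InliningProbe
//
//   Force inlining of one method:
//     java -XX:CompileCommand=inline,InliningProbe::helper InliningProbe
//
//   Raise the bytecode-size limit for inlining hot methods (arbitrary value):
//     java -XX:FreqInlineSize=500 InliningProbe
final class InliningProbe {
    // Stand-in for a helper extracted from a hot loop.
    static int helper(int x) {
        return x * 31 + 7;
    }

    // Hot loop that calls the helper often enough to trigger JIT compilation.
    static long run(int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += helper(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(run(50_000_000));
    }
}
```

Comparing benchmark results with and without the forced-inline option is one way to tell whether a performance drop comes from a failed inline.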

Paul.

> On May 31, 2024, at 10:06 AM, Piotr Rżysko <piotr.rzysko at gmail.com> wrote:
> 
> Hi,
> I use the Vector API in a JSON parser (simdjson-java [1]) that I’ve been developing. Recently, I’ve noticed that sometimes when I have a loop that performs operations on vectors, extracting helper methods from the loop and passing vectors to them causes a significant drop in performance.
> To illustrate the problem, I’ve prepared several implementations of the same algorithm. The algorithm has been extracted from the parser:
>     • OriginalStructuralIndexer [2]: An initial implementation in which, to reuse some pieces of code, I extracted multiple helper methods that perform operations on the vectors loaded by the main loop.
>     • LoadingInStepStructuralIndexer [3]: A modified version of the OriginalStructuralIndexer in which vector loading is done in the method called from the loop; performance is significantly better.
>     • InlinedStepStructuralIndexer [4]: More operations on vectors are manually inlined; performance is slightly better compared to the LoadingInStepStructuralIndexer.
>     • InlinedIndexStructuralIndexer [5]: All operations on vectors are manually inlined in the main loop; performance is the best out of all the implementations.
> Please take a look at the comments at the top of each class. They include the results I obtained from running benchmarks [6] of the implementations on my desktop (256-bit registers, Temurin-21.0.1). If you would like to run the benchmarks, please follow the instructions in the README [7].
> Overall, what surprised me most was the poor performance of the OriginalStructuralIndexer, which I assume was caused by vector boxing and the JIT’s inability to inline the helper methods. I have two questions regarding this:
>     • As a general rule, to achieve the best possible performance, should all vector operations in hot methods/loops always be manually inlined? If that’s the case, is there still a possibility that even in fully inlined code vector boxing occurs?
>     • Are vectors declared as static final fields guaranteed to be inlined properly in methods/loops using them, or is it safer to always explicitly create them within methods/loops?
> Best regards,
> Piotr
> 
> [1] https://github.com/simdjson/simdjson-java
> [2] https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/src/main/java/io/github/piotrrzysko/simdjson/OriginalStructuralIndexer.java
> [3] https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/src/main/java/io/github/piotrrzysko/simdjson/LoadingInStepStructuralIndexer.java
> [4] https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/src/main/java/io/github/piotrrzysko/simdjson/InlinedStepStructuralIndexer.java
> [5] https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/src/main/java/io/github/piotrrzysko/simdjson/InlinedIndexStructuralIndexer.java
> [6] https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/src/jmh/java/io/github/piotrrzysko/StructuralIndexerBenchmark.java
> [7] https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/README.md
> 



More information about the panama-dev mailing list