<div dir="ltr">Hi Paul and Jatin,<br><br>Thank you for your responses. Everything is clear now.<div> <br>Best regards,<br>Piotr<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Jun 4, 2024 at 1:08 PM Bhateja, Jatin <<a href="mailto:jatin.bhateja@intel.com">jatin.bhateja@intel.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">> >> Are vectors declared as static final fields guaranteed to be inlined<br>
> properly in methods/loops using them, or is it safer to always explicitly<br>
> create them within methods/loops?<br>
> <br>
> Static final vector fields are pre-cooked vectors, and using them in a loop will<br>
> translate into a vector load from their backing storage. Depending on the<br>
> operation, rematerialization may be beneficial in some cases, e.g.<br>
> broadcasting a constant value into vector lanes may have lower latency<br>
> than an L1D access, but again this will vary by target.<br>
<br>
And the compiler will hoist any loop invariants out of the loop anyway. <br>
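The two options discussed here (a static final "pre-cooked" vector versus a broadcast inside the loop) can be sketched as below. This is a minimal, hypothetical example assuming JDK 21 with the incubating Vector API (compile and run with --add-modules jdk.incubator.vector) and a fixed 256-bit species; the class and method names are illustrative only. Both variants compute the same result, so the sketch shows the shape of the code, not which one is faster on a given target.<br>
<pre>
```java
import jdk.incubator.vector.IntVector;

// Hypothetical example class; names are illustrative only.
class ConstantVectors {
    // Pre-built constant vector: using it in a loop typically compiles to a
    // vector load from its backing storage.
    static final IntVector TENS = IntVector.broadcast(IntVector.SPECIES_256, 10);

    // Variant 1: reuse the static final constant vector.
    static void addFromStaticField(int[] a, int[] out) {
        var species = IntVector.SPECIES_256;
        int i = 0;
        for (; i < species.loopBound(a.length); i += species.length()) {
            IntVector.fromArray(species, a, i).add(TENS).intoArray(out, i);
        }
        for (; i < a.length; i++) {   // scalar tail for leftover elements
            out[i] = a[i] + 10;
        }
    }

    // Variant 2: broadcast inside the loop. The broadcast is loop-invariant,
    // so the JIT is expected to hoist it out of the loop.
    static void addWithBroadcastInLoop(int[] a, int[] out) {
        var species = IntVector.SPECIES_256;
        int i = 0;
        for (; i < species.loopBound(a.length); i += species.length()) {
            IntVector tens = IntVector.broadcast(species, 10);
            IntVector.fromArray(species, a, i).add(tens).intoArray(out, i);
        }
        for (; i < a.length; i++) {   // scalar tail for leftover elements
            out[i] = a[i] + 10;
        }
    }

    public static void main(String[] args) {
        int[] a = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        int[] out = new int[a.length];
        addWithBroadcastInLoop(a, out);
        System.out.println(java.util.Arrays.toString(out));
    }
}
```
</pre>
Moving the broadcast above the loop by hand is functionally equivalent to variant 2; whether a load or a rematerialized broadcast ends up cheaper is target-dependent and best checked with a benchmark.<br>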
<br>
Best Regards,<br>
Jatin<br>
<br>
> -----Original Message-----<br>
> From: Bhateja, Jatin<br>
> Sent: Tuesday, June 4, 2024 4:34 PM<br>
> To: Piotr Rzysko <<a href="mailto:piotr.rzysko@gmail.com" target="_blank">piotr.rzysko@gmail.com</a>><br>
> Cc: <a href="mailto:panama-dev@openjdk.org" target="_blank">panama-dev@openjdk.org</a><br>
> Subject: RE: Vector API: passing vectors as arguments<br>
> <br>
> Hi Piotr,<br>
> <br>
> >> As a general rule, to achieve the best possible performance, should all<br>
> vector operations in hot methods/loops always be manually inlined? If<br>
> that’s the case, is there still a possibility that even in fully inlined code<br>
> vector boxing occurs?<br>
> <br>
> Yes, an outlined method with vector arguments will incur boxing penalties,<br>
> more so if it is called in a loop, where it may significantly degrade<br>
> performance, which is what your benchmarks show. Boxing may occur even in<br>
> fully inlined methods if the target does not support the feature set<br>
> required to intrinsify an operation.<br>
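To illustrate the boxing point, here is a hypothetical pair of loops computing the same count of quote and backslash bytes: one calls an outlined helper that takes a ByteVector parameter, the other has the helper body inlined by hand. It assumes JDK 21 with --add-modules jdk.incubator.vector and a 256-bit species; the names are illustrative and not taken from the linked benchmarks. The two are functionally identical; the difference described above shows up only in the generated code and in performance, e.g. when the JIT decides not to inline the helper.<br>
<pre>
```java
import jdk.incubator.vector.ByteVector;

// Hypothetical example class; names are illustrative only.
class OutlinedVsInlined {
    // Outlined helper with a vector parameter. If the JIT does not inline it,
    // the ByteVector argument has to be boxed (materialized as a heap object).
    static long quoteOrBackslashMask(ByteVector chunk) {
        return chunk.eq((byte) '"').or(chunk.eq((byte) '\\')).toLong();
    }

    // Scalar tail shared by both variants.
    static int countScalar(byte[] input, int from) {
        int count = 0;
        for (int i = from; i < input.length; i++) {
            if (input[i] == '"' || input[i] == '\\') count++;
        }
        return count;
    }

    // Loop that passes each loaded vector to the outlined helper.
    static int countOutlined(byte[] input) {
        var species = ByteVector.SPECIES_256;
        int count = 0;
        int i = 0;
        for (; i < species.loopBound(input.length); i += species.length()) {
            ByteVector chunk = ByteVector.fromArray(species, input, i);
            count += Long.bitCount(quoteOrBackslashMask(chunk));
        }
        return count + countScalar(input, i);
    }

    // Same computation with the helper body manually inlined into the loop.
    static int countInlined(byte[] input) {
        var species = ByteVector.SPECIES_256;
        int count = 0;
        int i = 0;
        for (; i < species.loopBound(input.length); i += species.length()) {
            ByteVector chunk = ByteVector.fromArray(species, input, i);
            long mask = chunk.eq((byte) '"').or(chunk.eq((byte) '\\')).toLong();
            count += Long.bitCount(mask);
        }
        return count + countScalar(input, i);
    }

    public static void main(String[] args) {
        byte[] data = "{\"a\": \"b\\\\c\"}".getBytes();
        System.out.println(countOutlined(data) + " " + countInlined(data));
    }
}
```
</pre>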
> <br>
> >> Are vectors declared as static final fields guaranteed to be inlined<br>
> properly in methods/loops using them, or is it safer to always explicitly<br>
> create them within methods/loops?<br>
> <br>
> Static final vector fields are pre-cooked vectors, and using them in a loop will<br>
> translate into a vector load from their backing storage. Depending on the<br>
> operation, rematerialization may be beneficial in some cases, e.g.<br>
> broadcasting a constant value into vector lanes may have lower latency<br>
> than an L1D access, but again this will vary by target.<br>
> <br>
> Best Regards,<br>
> Jatin<br>
> <br>
> From: panama-dev <<a href="mailto:panama-dev-retn@openjdk.org" target="_blank">panama-dev-retn@openjdk.org</a>> On Behalf Of Piotr Rzysko<br>
> Sent: Friday, May 31, 2024 10:36 PM<br>
> To: <a href="mailto:panama-dev@openjdk.org" target="_blank">panama-dev@openjdk.org</a><br>
> Subject: Vector API: passing vectors as arguments<br>
> <br>
> Hi,<br>
> I use the Vector API in a JSON parser (simdjson-java [1]) that I’ve been<br>
> developing. Recently, I’ve noticed that sometimes when I have a loop that<br>
> performs operations on vectors, extracting helper methods from the loop<br>
> and passing vectors to them causes a significant drop in performance.<br>
> To illustrate the problem, I’ve prepared several implementations of the<br>
> same algorithm. The algorithm has been extracted from the parser:<br>
> • OriginalStructuralIndexer [2]: An initial implementation in which, to reuse<br>
> some pieces of code, I extracted multiple helper methods that perform<br>
> operations on the vectors loaded by the main loop.<br>
> • LoadingInStepStructuralIndexer [3]: A modified version of the<br>
> OriginalStructuralIndexer in which vector loading is done in the method<br>
> called from the loop; performance is significantly better.<br>
> • InlinedStepStructuralIndexer [4]: More operations on vectors are manually<br>
> inlined; performance is slightly better compared to the<br>
> LoadingInStepStructuralIndexer.<br>
> • InlinedIndexStructuralIndexer [5]: All operations on vectors are manually<br>
> inlined in the main loop; performance is the best out of all the<br>
> implementations.<br>
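The "loading in step" shape from the list above can be sketched roughly as follows: the step method receives the input array and an offset instead of pre-loaded vectors, so no vector crosses a method boundary and only a primitive long mask is returned. This is a hypothetical illustration, not the code from the linked repository; it assumes JDK 21 with --add-modules jdk.incubator.vector, ByteVector.SPECIES_256 (32 lanes) and 64-byte blocks. A plausible reason this shape performs better is that, even when the step method is not inlined, there is no vector argument or return value to box.<br>
<pre>
```java
import jdk.incubator.vector.ByteVector;

// Hypothetical example class; names are illustrative only.
class LoadInStepSketch {
    // Hypothetical step method: loads its own vectors from the array and
    // returns a 64-bit mask with one bit per byte of the 64-byte block.
    // Assumes SPECIES_256 (32 byte lanes), hence the shift by 32.
    static long step(byte[] input, int offset) {
        var species = ByteVector.SPECIES_256;
        ByteVector lo = ByteVector.fromArray(species, input, offset);
        ByteVector hi = ByteVector.fromArray(species, input, offset + species.length());
        long loMask = lo.eq((byte) '"').toLong();   // bits 0..31
        long hiMask = hi.eq((byte) '"').toLong();   // bits 0..31, shifted below
        return loMask | (hiMask << 32);
    }

    // Hypothetical driver: only full 64-byte blocks are processed; a trailing
    // partial block is ignored in this sketch.
    static int countQuotes(byte[] input) {
        int count = 0;
        for (int offset = 0; offset + 64 <= input.length; offset += 64) {
            count += Long.bitCount(step(input, offset));
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] data = new byte[64];
        java.util.Arrays.fill(data, (byte) 'a');
        data[0] = (byte) '"';
        data[40] = (byte) '"';
        System.out.println(Long.toBinaryString(step(data, 0)));
    }
}
```
</pre>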
> Please take a look at the comments at the top of each class. They include<br>
> the results I obtained from running benchmarks [6] of the implementations<br>
> on my desktop (256-bit registers, Temurin-21.0.1). If you would like to run<br>
> the benchmarks, please follow the instructions in the README [7].<br>
> Overall, what surprised me most was the poor performance of the<br>
> OriginalStructuralIndexer, which I assume was caused by vector boxing and<br>
> the JIT’s inability to inline the helper methods. I have two questions<br>
> regarding this:<br>
> 1. As a general rule, to achieve the best possible performance, should all<br>
> vector operations in hot methods/loops always be manually inlined? If<br>
> that’s the case, is there still a possibility that even in fully inlined code<br>
> vector boxing occurs?<br>
> 2. Are vectors declared as static final fields guaranteed to be inlined<br>
> properly in methods/loops using them, or is it safer to always explicitly<br>
> create them within methods/loops?<br>
> Best regards,<br>
> Piotr<br>
> <br>
> [1] <a href="https://github.com/simdjson/simdjson-java" rel="noreferrer" target="_blank">https://github.com/simdjson/simdjson-java</a><br>
> [2] <a href="https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/src/main/java/io/github/piotrrzysko/simdjson/OriginalStructuralIndexer.java" rel="noreferrer" target="_blank">https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/src/main/java/io/github/piotrrzysko/simdjson/OriginalStructuralIndexer.java</a><br>
> [3] <a href="https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/src/main/java/io/github/piotrrzysko/simdjson/LoadingInStepStructuralIndexer.java" rel="noreferrer" target="_blank">https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/src/main/java/io/github/piotrrzysko/simdjson/LoadingInStepStructuralIndexer.java</a><br>
> [4] <a href="https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/src/main/java/io/github/piotrrzysko/simdjson/InlinedStepStructuralIndexer.java" rel="noreferrer" target="_blank">https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/src/main/java/io/github/piotrrzysko/simdjson/InlinedStepStructuralIndexer.java</a><br>
> [5] <a href="https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/src/main/java/io/github/piotrrzysko/simdjson/InlinedIndexStructuralIndexer.java" rel="noreferrer" target="_blank">https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/src/main/java/io/github/piotrrzysko/simdjson/InlinedIndexStructuralIndexer.java</a><br>
> [6] <a href="https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/src/jmh/java/io/github/piotrrzysko/StructuralIndexerBenchmark.java" rel="noreferrer" target="_blank">https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/src/jmh/java/io/github/piotrrzysko/StructuralIndexerBenchmark.java</a><br>
> [7] <a href="https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/README.md" rel="noreferrer" target="_blank">https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/README.md</a><br>
<br>
</blockquote></div>