<div dir="ltr"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;padding:0pt 0pt 15pt"><span style="font-family:Arial,sans-serif;color:rgb(13,13,13);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;vertical-align:baseline">Hi,</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt;padding:0pt 0pt 15pt"><span style="font-family:Arial,sans-serif;color:rgb(13,13,13);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;vertical-align:baseline">I use the Vector API in a JSON parser (simdjson-java [1]) that I’ve been developing. Recently, I’ve noticed that sometimes when I have a loop that performs operations on vectors, extracting helper methods from the loop and passing vectors to them causes a significant drop in performance.</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:15pt"><span style="font-family:Arial,sans-serif;color:rgb(13,13,13);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;vertical-align:baseline">To illustrate the problem, I’ve prepared several implementations of the same algorithm. 
The algorithm has been extracted from the parser:</span></p><ul style="margin-top:0px;margin-bottom:0px"><li dir="ltr" style="list-style-type:disc;font-family:Arial,sans-serif;color:rgb(13,13,13);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;vertical-align:baseline;white-space:pre"><p dir="ltr" role="presentation" style="line-height:1.38;margin-top:21pt;margin-bottom:0pt"><span style="background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;vertical-align:baseline">OriginalStructuralIndexer [2]: An initial implementation in which, to reuse some pieces of code, I extracted multiple helper methods that perform operations on the vectors loaded by the main loop.</span></p></li><li dir="ltr" style="list-style-type:disc;font-family:Arial,sans-serif;color:rgb(13,13,13);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;vertical-align:baseline;white-space:pre"><p dir="ltr" role="presentation" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;vertical-align:baseline">LoadingInStepStructuralIndexer [3]: A modified version of the OriginalStructuralIndexer in which vector loading is done in the method called from the loop; performance is significantly better.</span></p></li><li dir="ltr" style="list-style-type:disc;font-family:Arial,sans-serif;color:rgb(13,13,13);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;vertical-align:baseline;white-space:pre"><p dir="ltr" role="presentation" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span 
style="background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;vertical-align:baseline">InlinedStepStructuralIndexer [4]: More operations on vectors are manually inlined; performance is slightly better than that of the LoadingInStepStructuralIndexer.</span></p></li><li dir="ltr" style="list-style-type:disc;font-family:Arial,sans-serif;color:rgb(13,13,13);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;vertical-align:baseline;white-space:pre"><p dir="ltr" role="presentation" style="line-height:1.38;margin-top:0pt;margin-bottom:21pt"><span style="background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;vertical-align:baseline">InlinedIndexStructuralIndexer [5]: All operations on vectors are manually inlined in the main loop; performance is the best of all the implementations.</span></p></li></ul><p dir="ltr" style="line-height:1.38;margin-top:15pt;margin-bottom:0pt;padding:0pt 0pt 15pt"><span style="font-family:Arial,sans-serif;color:rgb(13,13,13);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;vertical-align:baseline">Please take a look at the comments at the top of each class. They include the results I obtained by running the benchmarks [6] of the implementations on my desktop (256-bit registers, Temurin-21.0.1). 
If you would like to run the benchmarks, please follow the instructions in the README [7].</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:15pt"><span style="font-family:Arial,sans-serif;color:rgb(13,13,13);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;vertical-align:baseline">Overall, what surprised me most was the poor performance of the OriginalStructuralIndexer, which I assume was caused by vector boxing and the JIT’s inability to inline the helper methods. I have two questions regarding this:</span></p><ol style="margin-top:0px;margin-bottom:0px"><li><span style="background-color:transparent;color:rgb(13,13,13);font-family:Arial,sans-serif">As a general rule, to achieve the best possible performance, should all vector operations in hot methods/loops always be manually inlined? If so, can vector boxing still occur even in fully inlined code?</span></li><li style="list-style-type:decimal;font-family:Arial,sans-serif;color:rgb(13,13,13);background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;vertical-align:baseline;white-space:pre"><p role="presentation" style="line-height:1.38;margin-top:0pt;margin-bottom:21pt"><span style="background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;vertical-align:baseline">Are vectors declared as static final fields guaranteed to be inlined properly in methods/loops using them, or is it safer to always explicitly create them within methods/loops?</span></p></li></ol><span style="font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;background-color:transparent;font-family:Arial,sans-serif;color:rgb(13,13,13);vertical-align:baseline">Best regards,</span><span 
style="font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;background-color:transparent;font-family:Arial,sans-serif;color:rgb(13,13,13);vertical-align:baseline"><br></span><span style="font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;background-color:transparent;font-family:Arial,sans-serif;color:rgb(13,13,13);vertical-align:baseline">Piotr</span><div><span style="font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;background-color:transparent;font-family:Arial,sans-serif;color:rgb(13,13,13);vertical-align:baseline"><br></span></div><div><span style="font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;background-color:transparent;font-family:Arial,sans-serif;color:rgb(13,13,13);vertical-align:baseline">[1] <a href="https://github.com/simdjson/simdjson-java">https://github.com/simdjson/simdjson-java</a></span></div><div><span style="font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;background-color:transparent;font-family:Arial,sans-serif;color:rgb(13,13,13);vertical-align:baseline">[2]  <a href="https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/src/main/java/io/github/piotrrzysko/simdjson/OriginalStructuralIndexer.java">https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/src/main/java/io/github/piotrrzysko/simdjson/OriginalStructuralIndexer.java</a><br>[3] <a href="https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/src/main/java/io/github/piotrrzysko/simdjson/LoadingInStepStructuralIndexer.java">https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/src/main/java/io/github/piotrrzysko/simdjson/LoadingInStepStructuralIndexer.java</a><br>[4] <a 
href="https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/src/main/java/io/github/piotrrzysko/simdjson/InlinedStepStructuralIndexer.java">https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/src/main/java/io/github/piotrrzysko/simdjson/InlinedStepStructuralIndexer.java</a><br>[5] <a href="https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/src/main/java/io/github/piotrrzysko/simdjson/InlinedIndexStructuralIndexer.java">https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/src/main/java/io/github/piotrrzysko/simdjson/InlinedIndexStructuralIndexer.java</a><br>[6] <a href="https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/src/jmh/java/io/github/piotrrzysko/StructuralIndexerBenchmark.java">https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/src/jmh/java/io/github/piotrrzysko/StructuralIndexerBenchmark.java</a><br>[7] <a href="https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/README.md">https://github.com/piotrrzysko/vector-api-benchmarks/blob/main/README.md</a><br></span></div><div><span style="font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;background-color:transparent;font-family:Arial,sans-serif;color:rgb(13,13,13);vertical-align:baseline"><br></span></div></div>