<div dir="ltr">Hello everyone,<br><br>I've rebuilt the latest OpenJDK (24) from <a href="https://github.com/openjdk/panama-vector">https://github.com/openjdk/panama-vector</a> and rerun the array addition benchmark:<br><br><font face="monospace">AddBenchmark<br> .scalarArrayArray thrpt 5 6487636 ops/s<br> .scalarArrayArrayLongStride thrpt 5 1001515 ops/s<br> .scalarSegmentArray thrpt 5 1747531 ops/s<br> .scalarSegmentSegment thrpt 5 1154193 ops/s<br> .scalarUnsafeArray thrpt 5 6970073 ops/s<br> .scalarUnsafeUnsafe thrpt 5 1246625 ops/s<br> .unrolledArrayArray thrpt 5 1251824 ops/s<br> .unrolledSegmentArray thrpt 5 1694164 ops/s<br> .unrolledUnsafeArray thrpt 5 5043685 ops/s<br> .unrolledUnsafeUnsafe thrpt 5 1197024 ops/s<br> .vectorArrayArray thrpt 5 7200224 ops/s<br> .vectorArraySegment thrpt 5 7377553 ops/s<br> .vectorSegmentArray thrpt 5 7263505 ops/s<br> .vectorSegmentSegment thrpt 5 7143647 ops/s</font><br><br><ul><li>Performance with the Vector API is now consistently good across arrays and segments (about 7.1M to 7.4M ops/s in all four combinations).</li><li>Reading from and writing to segments still seem to disrupt auto-vectorization (about 1.2M to 1.7M ops/s, versus about 6.5M for plain arrays). Reading with Unsafe works well, but it is marked for removal.</li><li>Less importantly, manual unrolling also seems to disrupt auto-vectorization (unrolledArrayArray reaches about 1.25M ops/s, versus about 6.5M for scalarArrayArray).</li></ul><br><br>Best,<br>-Antoine<br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Mar 26, 2024 at 5:40 PM Vladimir Ivanov &lt;<a href="mailto:vladimir.x.ivanov@oracle.com">vladimir.x.ivanov@oracle.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>
>> Personally, I prefer to see vectorizer handling "MoveX2Y (LoadX mem)"<br>
>> => "VectorReinterpret (LoadVector mem)" well and then introduce rules to<br>
>> strength-reduce it to mismatched access.<br>
> <br>
> Do I understand you right that you're saying the vector node for MoveL2D<br>
> (for instance) is VectorReinterpret so we could vectorize the code.<br>
> <br>
> Are you then suggesting that we can transform:<br>
> <br>
> (VectorReinterpret (LoadVector mem))
> <br>
> into:<br>
> <br>
> (LoadVector mem)<br>
> <br>
> with that LoadVector a mismatched access?<br>
<br>
Yes, but thinking more about it, the latter step may be optional. For 
example, the VectorReinterpret implementation on x86 is a no-op, so not much 
is gained from folding VectorReinterpret+LoadVector into a mismatched 
LoadVector.
<br>
Best regards,<br>
Vladimir Ivanov<br>
</blockquote></div>