<div dir="ltr">Hello everyone,<br><br>I've rebuilt the latest OpenJDK (24) from <a href="https://github.com/openjdk/panama-vector">https://github.com/openjdk/panama-vector</a> and rerun the array addition benchmark:<br><br><font face="monospace">AddBenchmark<br> .scalarArrayArray            thrpt    5   6487636 ops/s<br> .scalarArrayArrayLongStride  thrpt    5   1001515 ops/s<br> .scalarSegmentArray          thrpt    5   1747531 ops/s<br> .scalarSegmentSegment        thrpt    5   1154193 ops/s<br> .scalarUnsafeArray           thrpt    5   6970073 ops/s<br> .scalarUnsafeUnsafe          thrpt    5   1246625 ops/s<br> .unrolledArrayArray          thrpt    5   1251824 ops/s<br> .unrolledSegmentArray        thrpt    5   1694164 ops/s<br> .unrolledUnsafeArray         thrpt    5   5043685 ops/s<br> .unrolledUnsafeUnsafe        thrpt    5   1197024 ops/s<br> .vectorArrayArray            thrpt    5   7200224 ops/s<br> .vectorArraySegment          thrpt    5   7377553 ops/s<br> .vectorSegmentArray          thrpt    5   7263505 ops/s<br> .vectorSegmentSegment        thrpt    5   7143647 ops/s</font><br><br><ul><li>Performance using the vector API is now consistently good across arrays and segments.</li><li>Reading from and writing to segments still seems to disrupt auto-vectorization. Reading with Unsafe works well, but it's marked for removal.</li><li>Less importantly, manual unrolling also seems to disrupt auto-vectorization.</li></ul><br><br>Best,<br>-Antoine<br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Mar 26, 2024 at 5:40 PM Vladimir Ivanov &lt;<a href="mailto:vladimir.x.ivanov@oracle.com">vladimir.x.ivanov@oracle.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>

>> Personally, I prefer to see vectorizer handling "MoveX2Y (LoadX mem)"<br>

>> => "VectorReinterpret (LoadVector mem)" well and then introduce rules to<br>

>> strength-reduce it to mismatched access.<br>

> <br>

> Do I understand you right that you're saying the vector node for MoveL2D<br>

> (for instance) is VectorReinterpret so we could vectorize the code?<br>

> <br>

> Are you then suggesting that we can transform:<br>

> <br>

> (VectorReinterpret (LoadVector mem))<br>

> <br>

> into:<br>

> <br>

> (LoadVector mem)<br>

> <br>

> with that LoadVector a mismatched access?<br>

<br>

Yes, but thinking more about it, the latter step may be optional. For <br>

example, the VectorReinterpret implementation on x86 is a no-op, so not much is <br>

gained from folding VectorReinterpret+LoadVector into a mismatched <br>

LoadVector.<br>

<br>

Best regards,<br>

Vladimir Ivanov<br>

</blockquote></div>
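<div dir="ltr"><br>For readers following the quoted exchange, here is a minimal sketch of the kind of scalar loop that would presumably give rise to the "MoveL2D (LoadL mem)" shape discussed above: each element is loaded as a long and its bits are reinterpreted as a double. The class and method names are hypothetical and are not taken from the benchmark; whether C2 actually vectorizes this loop depends on the JDK build.<br></div>

```java
public class MoveL2DSketch {
    // Hypothetical helper: reinterprets the raw bits of each long as a double.
    // The bit pattern is copied unchanged; no numeric conversion takes place.
    static void reinterpret(long[] src, double[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = Double.longBitsToDouble(src[i]);
        }
    }

    public static void main(String[] args) {
        long[] src = {
            Double.doubleToRawLongBits(1.5),
            Double.doubleToRawLongBits(-2.0)
        };
        double[] dst = new double[src.length];
        reinterpret(src, dst);
        System.out.println(dst[0] + " " + dst[1]);
    }
}
```

<div dir="ltr">A vectorized form of this loop would load a vector of longs and reinterpret it as a vector of doubles, which is the "VectorReinterpret (LoadVector mem)" shape mentioned in the quote.<br></div>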