<div dir="ltr">Sorry for forgetting to include the mailing list.<br><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">---------- Forwarded message ---------<br>From: <strong class="gmail_sendername" dir="auto">Dirk Toewe</strong> <span dir="auto"><<a href="mailto:dirktoewe@gmail.com">dirktoewe@gmail.com</a>></span><br>Date: Tue, Feb 28, 2023 at 9:20 PM<br>Subject: Re: Performance of FloatVector::pow vs. equivalent FloatVector::mul (oracle-jdk-19.0.2, Intel i7 8700B)<br>To: Paul Sandoz <<a href="mailto:paul.sandoz@oracle.com">paul.sandoz@oracle.com</a>><br></div><br><br><div dir="ltr">Hi,<div><br></div><div>if the exponent is a compile-time constant integer, it would be really useful if either the compiler or the JIT optimized the pow call away. Wouldn't this still beat Intel's Vector Math Library (unless the JIT does some template/macro magic)?</div><div><br></div><div>Regards</div><div>Dirk</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Feb 28, 2023 at 9:08 PM Paul Sandoz <<a href="mailto:paul.sandoz@oracle.com" target="_blank">paul.sandoz@oracle.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi Alex,<br>

<br>

The performance difference you observe is because the pow operation is falling back to scalar code (Math.pow on each lane element) and not using vector instructions.<br>

<br>

On x86 Linux or Windows you should observe better performance from the pow operation because it should leverage code from Intel’s Short Vector Math Library [1], but that code is OS-specific and has not yet been ported to macOS.<br>

<br>

Paul.<br>

<br>

[1] <br>

<a href="https://github.com/openjdk/jdk/tree/master/src/jdk.incubator.vector/linux/native/libjsvml" rel="noreferrer" target="_blank">https://github.com/openjdk/jdk/tree/master/src/jdk.incubator.vector/linux/native/libjsvml</a><br>

<a href="https://github.com/openjdk/jdk/tree/master/src/jdk.incubator.vector/windows/native/libjsvml" rel="noreferrer" target="_blank">https://github.com/openjdk/jdk/tree/master/src/jdk.incubator.vector/windows/native/libjsvml</a><br>

<br>

> On Feb 26, 2023, at 8:01 PM, Alex K <<a href="mailto:aklibisz@gmail.com" target="_blank">aklibisz@gmail.com</a>> wrote:<br>

> <br>

> Hello,<br>

> <br>

> I have a question, or possibly a bug report, regarding the performance of the jdk.incubator.vector.FloatVector class.<br>

> <br>

> Specifically, given a FloatVector fv, I've found that calling fv.mul(fv) is ~40x faster than calling fv.pow(2).<br>

> <br>

> Here is an example JMH benchmark: <a href="https://github.com/alexklibisz/site-projects/blob/782dcd53d3ee09c93f65b660c8ed4fd030a8889a/jdk-incubator-vector-optimizations/src/main/scala/com/alexklibisz/BenchPowVsMul.scala" rel="noreferrer" target="_blank">https://github.com/alexklibisz/site-projects/blob/782dcd53d3ee09c93f65b660c8ed4fd030a8889a/jdk-incubator-vector-optimizations/src/main/scala/com/alexklibisz/BenchPowVsMul.scala</a><br>

> <br>

> The results look like this, on my 2018 Mac Mini, w/ Intel i7 8700B, running oracle-jdk-19.0.2:<br>

> <br>

> [image.png]<br>

> <br>

> As far as I can tell, the two operations produce equivalent results, yet one is significantly faster than the other.<br>

> <br>

> I'm eager to learn if this is expected, a regression, or something else.<br>

> <br>

> Thanks,<br>

> Alex Klibisz<br>

<br>

</blockquote></div>

</div></div>
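<div>The idea raised in the thread, replacing a pow call that has a constant integer exponent with plain multiplications, can be sketched in ordinary Java. This is only an illustration under stated assumptions: the class <code>PowBySquaring</code> and its methods are hypothetical names, not part of the JDK or the Vector API, and a plain float array stands in for the lanes of a vector. It does not show what the JIT actually does.</div>

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of the JDK or the Vector API.
final class PowBySquaring {

    // Raises x to a non-negative integer power using only multiplications
    // (exponentiation by squaring), so no call to Math.pow is needed.
    static float powInt(float x, int n) {
        if (n < 0) {
            throw new IllegalArgumentException("exponent must be non-negative");
        }
        float result = 1.0f;
        float base = x;
        while (n > 0) {
            if ((n & 1) == 1) {
                result *= base;   // this bit of the exponent is set
            }
            n >>= 1;
            if (n > 0) {
                base *= base;     // square only while exponent bits remain
            }
        }
        return result;
    }

    // Applies powInt element-wise; each array element stands in for one vector lane.
    static float[] powIntLanes(float[] lanes, int n) {
        float[] out = new float[lanes.length];
        for (int i = 0; i < lanes.length; i++) {
            out[i] = powInt(lanes[i], n);
        }
        return out;
    }

    public static void main(String[] args) {
        float[] lanes = {1.5f, -2.0f, 3.0f, 0.5f};
        System.out.println(Arrays.toString(powIntLanes(lanes, 2)));
        System.out.println(Arrays.toString(powIntLanes(lanes, 3)));
    }
}
```

<div>For an exponent of 2 this reduces to a single multiplication per element, which is the fv.mul(fv) case from the original question. For values that are not exactly representable, results from repeated multiplication may differ slightly in rounding from Math.pow.</div>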