Fwd: Performance of FloatVector::pow vs. equivalent FloatVector::mul (oracle-jdk-19.0.2, Intel i7 8700B)

Dirk Toewe dirktoewe at gmail.com
Tue Feb 28 20:22:32 UTC 2023


Sorry for forgetting to include the mailing list.

---------- Forwarded message ---------
From: Dirk Toewe <dirktoewe at gmail.com>
Date: Tue, Feb 28, 2023 at 21:20
Subject: Re: Performance of FloatVector::pow vs. equivalent
FloatVector::mul (oracle-jdk-19.0.2, Intel i7 8700B)
To: Paul Sandoz <paul.sandoz at oracle.com>


Hi,

If the exponent is a compile-time constant integer, it would be really
useful if either the compiler or the JIT optimized the pow call away and
replaced it with multiplications. Wouldn't this still beat Intel's Vector
Math Library (unless the JIT does some template/macro magic)?
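
To illustrate the kind of rewrite I mean, here is a minimal sketch of
square-and-multiply for a non-negative integer exponent. It is plain scalar
Java so that it runs without the incubator module; the class and method names
(PowBySquaring, powInt, powIntAll) are made up for this example, and this is
not what the compiler or the JIT does today. With the Vector API, each
multiplication would presumably become a FloatVector::mul call, so pow(2)
reduces to the single fv.mul(fv) from Alex's benchmark.

```java
import java.util.Arrays;

// Hypothetical helper, for illustration only: not part of the JDK or the Vector API.
public final class PowBySquaring {
    private PowBySquaring() {}

    /** Raises x to a non-negative integer power using only multiplications. */
    public static float powInt(float x, int n) {
        if (n < 0) {
            throw new IllegalArgumentException("exponent must be >= 0");
        }
        float result = 1.0f;
        float base = x;
        while (n > 0) {
            if ((n & 1) == 1) {
                result *= base;   // this bit of the exponent is set
            }
            base *= base;         // square for the next bit
            n >>= 1;
        }
        return result;
    }

    /** Applies powInt element-wise, standing in for a lane-wise vector operation. */
    public static float[] powIntAll(float[] in, int n) {
        float[] out = new float[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = powInt(in[i], n);
        }
        return out;
    }

    public static void main(String[] args) {
        float[] squared = powIntAll(new float[] {1.5f, -2.0f, 3.0f}, 2);
        System.out.println(Arrays.toString(squared));
    }
}
```

For larger exponents the rounding of a multiply chain can differ slightly
from Math.pow, so such a rewrite would only be valid where that difference is
acceptable.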

Regards
Dirk

On Tue, Feb 28, 2023 at 21:08, Paul Sandoz <
paul.sandoz at oracle.com> wrote:

> Hi Alex,
>
> The performance difference you observe is because the pow operation is
> falling back to scalar code (Math.pow on each lane element) and not using
> vector instructions.
>
> On x86 Linux or Windows you should observe better performance of the pow
> operation because it should leverage code from Intel’s Short Vector Math
> Library [1], but that code is OS-specific and has not yet been ported to
> macOS.
>
> Paul.
>
> [1]
>
> https://github.com/openjdk/jdk/tree/master/src/jdk.incubator.vector/linux/native/libjsvml
>
> https://github.com/openjdk/jdk/tree/master/src/jdk.incubator.vector/windows/native/libjsvml
>
> > On Feb 26, 2023, at 8:01 PM, Alex K <aklibisz at gmail.com> wrote:
> >
> > Hello,
> >
> > I have a question, possibly a bug, to ask/report regarding performance
> with the jdk.incubator.vector.FloatVector class.
> >
> > Specifically, given a FloatVector fv, I've found that calling fv.mul(fv)
> is ~40x faster than calling fv.pow(2)
> >
> > Here is an example JMH benchmark:
> https://github.com/alexklibisz/site-projects/blob/782dcd53d3ee09c93f65b660c8ed4fd030a8889a/jdk-incubator-vector-optimizations/src/main/scala/com/alexklibisz/BenchPowVsMul.scala
> >
> > The results look like this, on my 2018 Mac Mini, w/ Intel i7 8700B,
> running oracle-jdk-19.0.2:
> >
> > <image.png>
> >
> > As far as I can tell, the two operations produce equivalent results, yet
> one is significantly faster than the other.
> >
> > I'm eager to learn if this is expected, a regression, or something else.
> >
> > Thanks,
> > Alex Klibisz
>
>


More information about the panama-dev mailing list