Array addition and array sum Panama benchmarks
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Mon Sep 30 12:10:55 UTC 2024
Hi Antoine,
Auto-vectorization on memory segments doesn't work in some cases. This
is mostly due to:
https://bugs.openjdk.org/browse/JDK-8324751
That is, when working with a "source" and a "target" segment, if the
auto-vectorizer cannot prove that the two segments are disjoint, no
vectorization occurs.
This is an issue for operations like add or copy, but not for something
like MemorySegment::fill (as that method only works on a single
segment).
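
To make the distinction concrete, here is a minimal sketch of the two
loop shapes. This is not the benchmark code from this thread: the class
and method names are hypothetical, and fillInts is a hand-written loop
standing in for MemorySegment::fill. The first loop touches a "source"
and a "target" segment, which is the case affected by JDK-8324751; the
second only ever touches one segment.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class SegmentAddSketch {

    // Two-segment loop: reads from src and dst, writes to dst.
    // The JIT has to reason about whether dst and src may overlap.
    static void addInto(MemorySegment dst, MemorySegment src, int count) {
        for (int i = 0; i < count; i++) {
            int sum = dst.getAtIndex(ValueLayout.JAVA_INT, i)
                    + src.getAtIndex(ValueLayout.JAVA_INT, i);
            dst.setAtIndex(ValueLayout.JAVA_INT, i, sum);
        }
    }

    // Single-segment loop: only one segment is involved, so there is
    // no source/target aliasing question to answer.
    static void fillInts(MemorySegment seg, int count, int value) {
        for (int i = 0; i < count; i++) {
            seg.setAtIndex(ValueLayout.JAVA_INT, i, value);
        }
    }

    public static void main(String[] args) {
        int count = 1024;
        try (Arena arena = Arena.ofConfined()) {
            // Two separate native allocations (disjoint in practice,
            // but the loop above does not know that statically).
            MemorySegment dst = arena.allocate((long) count * Integer.BYTES, Integer.BYTES);
            MemorySegment src = arena.allocate((long) count * Integer.BYTES, Integer.BYTES);
            fillInts(dst, count, 1);
            fillInts(src, count, 2);
            addInto(dst, src, count);
            System.out.println(dst.getAtIndex(ValueLayout.JAVA_INT, 0));
        }
    }
}
```

Whether a given loop actually gets vectorized depends on the JDK build
and the platform, so the sketch only illustrates the shapes involved.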
We hope to be able to make some progress on this issue, as that will
allow third-party routines on memory segments to benefit from
vectorization too, without needing an intrinsic in the JDK.
Maurizio
On 30/09/2024 13:04, Antoine Chambille wrote:
> Hello everyone,
>
> I've rebuilt the latest OpenJDK (24) from
> https://github.com/openjdk/panama-vector and run the arrays addition
> benchmark another time:
>
> AddBenchmark
> .scalarArrayArray thrpt 5 6487636 ops/s
> .scalarArrayArrayLongStride thrpt 5 1001515 ops/s
> .scalarSegmentArray thrpt 5 1747531 ops/s
> .scalarSegmentSegment thrpt 5 1154193 ops/s
> .scalarUnsafeArray thrpt 5 6970073 ops/s
> .scalarUnsafeUnsafe thrpt 5 1246625 ops/s
> .unrolledArrayArray thrpt 5 1251824 ops/s
> .unrolledSegmentArray thrpt 5 1694164 ops/s
> .unrolledUnsafeArray thrpt 5 5043685 ops/s
> .unrolledUnsafeUnsafe thrpt 5 1197024 ops/s
> .vectorArrayArray thrpt 5 7200224 ops/s
> .vectorArraySegment thrpt 5 7377553 ops/s
> .vectorSegmentArray thrpt 5 7263505 ops/s
> .vectorSegmentSegment thrpt 5 7143647 ops/s
>
> * Performance using the vector API is now very consistent and good
> across arrays and segments.
> * Reading and writing from/to segments still seems to be disrupting
> auto-vectorization. Reading with Unsafe works well but it's marked
> for removal.
> * Less important, manual unrolling also seems to be disrupting
> auto-vectorization.
>
>
>
> Best,
> -Antoine
>
> On Tue, Mar 26, 2024 at 5:40 PM Vladimir Ivanov
> <vladimir.x.ivanov at oracle.com> wrote:
>
>
> >> Personally, I prefer to see vectorizer handling
> >> "MoveX2Y (LoadX mem)" => "VectorReinterpret (LoadVector mem)" well
> >> and then introduce rules to strength-reduce it to mismatched access.
> >
> > Do I understand you right that you're saying the vector node for
> > MoveL2D (for instance) is VectorReinterpret so we could vectorize
> > the code.
> >
> > Are you then suggesting that we can transform:
> >
> > (VectorReinterpret (LoadVector mem))
> >
> > into:
> >
> > (LoadVector mem)
> >
> > with that LoadVector a mismatched access?
>
> Yes, but thinking more about it, the latter step may be optional. For
> example, the VectorReinterpret implementation on x86 is a no-op, so
> not much is gained from folding VectorReinterpret+LoadVector into a
> mismatched LoadVector.
>
> Best regards,
> Vladimir Ivanov
>