Array addition and array sum Panama benchmarks
Antoine Chambille
ach at activeviam.com
Mon Sep 30 12:16:23 UTC 2024
Hi Maurizio, thanks for the quick response. Looking forward to it.
-Antoine
On Mon, Sep 30, 2024 at 2:11 PM Maurizio Cimadamore <
maurizio.cimadamore at oracle.com> wrote:
> Hi Antoine,
> auto-vectorization on memory segments doesn't work in some cases. This
> issue is mostly due to:
>
> https://bugs.openjdk.org/browse/JDK-8324751
>
> That is, when working with a "source" and a "target" segment, if the
> auto-vectorizer cannot prove that the two segments are disjoint, no
> vectorization occurs.
>
> This is an issue for operations like add, or copy, but it's not an issue
> with something like MemorySegment::fill (as that method only works on a
> single segment).
>
> We hope to be able to make some progress on this issue, as that will allow
> third-party routines on memory segments to benefit from vectorization too,
> without needing an intrinsic in the JDK.
>
> Maurizio
>
>
>
>
> On 30/09/2024 13:04, Antoine Chambille wrote:
>
> Hello everyone,
>
> I've rebuilt the latest OpenJDK (24) from
> https://github.com/openjdk/panama-vector and rerun the array addition
> benchmark:
>
> AddBenchmark
> .scalarArrayArray             thrpt  5  6487636  ops/s
> .scalarArrayArrayLongStride   thrpt  5  1001515  ops/s
> .scalarSegmentArray           thrpt  5  1747531  ops/s
> .scalarSegmentSegment         thrpt  5  1154193  ops/s
> .scalarUnsafeArray            thrpt  5  6970073  ops/s
> .scalarUnsafeUnsafe           thrpt  5  1246625  ops/s
> .unrolledArrayArray           thrpt  5  1251824  ops/s
> .unrolledSegmentArray         thrpt  5  1694164  ops/s
> .unrolledUnsafeArray          thrpt  5  5043685  ops/s
> .unrolledUnsafeUnsafe         thrpt  5  1197024  ops/s
> .vectorArrayArray             thrpt  5  7200224  ops/s
> .vectorArraySegment           thrpt  5  7377553  ops/s
> .vectorSegmentArray           thrpt  5  7263505  ops/s
> .vectorSegmentSegment         thrpt  5  7143647  ops/s
>
>
> - Performance using the vector API is now very consistent and good
> across arrays and segments.
> - Reading and writing from/to segments still seems to be disrupting
> auto-vectorization. Reading with Unsafe works well but it's marked for
> removal.
> - Less important, manual unrolling also seems to be disrupting
> auto-vectorization.
>
>
>
> Best,
> -Antoine
>
> On Tue, Mar 26, 2024 at 5:40 PM Vladimir Ivanov <
> vladimir.x.ivanov at oracle.com> wrote:
>
>>
>> >> Personally, I prefer to see vectorizer handling "MoveX2Y (LoadX mem)"
>> >> => "VectorReinterpret (LoadVector mem)" well and then introduce rules to
>> >> strength-reduce it to mismatched access.
>> >
>> > Do I understand correctly that the vector node for MoveL2D
>> > (for instance) is VectorReinterpret, so we could vectorize the code?
>> >
>> > Are you then suggesting that we can transform:
>> >
>> > (VectorReinterpret (LoadVector mem))
>> >
>> > into:
>> >
>> > (LoadVector mem)
>> >
>> > with that LoadVector a mismatched access?
>>
>> Yes, but thinking more about it, the latter step may be optional. For
>> example, VectorReinterpret implementation on x86 is a no-op, so not much
>> gained from folding VectorReinterpret+LoadVector into a mismatched
>> LoadVector.
>>
>> Best regards,
>> Vladimir Ivanov
>>
>
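
To make the disjointness point in Maurizio's reply concrete, here is a minimal sketch of a scalar segment-to-segment addition loop. It is not the benchmark's actual code: the class name SegmentAddSketch, the method add, and the double element type are assumptions made for illustration. The second call passes a target that overlaps the source by one element, a case where each iteration reads a value written by the previous one. A compiler that cannot rule this case out cannot safely process several elements at once.

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.util.Arrays;

public class SegmentAddSketch {

    // Hypothetical scalar loop: target[i] += source[i] for i in [0, n).
    // Element type (double) is an assumption, not taken from the benchmark.
    static void add(MemorySegment source, MemorySegment target, long n) {
        for (long i = 0; i < n; i++) {
            double s = source.getAtIndex(ValueLayout.JAVA_DOUBLE, i);
            double t = target.getAtIndex(ValueLayout.JAVA_DOUBLE, i);
            target.setAtIndex(ValueLayout.JAVA_DOUBLE, i, s + t);
        }
    }

    public static void main(String[] args) {
        // Disjoint case: two independent heap segments.
        MemorySegment a = MemorySegment.ofArray(new double[] {1, 2, 3, 4});
        MemorySegment b = MemorySegment.ofArray(new double[] {10, 20, 30, 40});
        add(a, b, 4);
        // prints [11.0, 22.0, 33.0, 44.0]
        System.out.println(Arrays.toString(b.toArray(ValueLayout.JAVA_DOUBLE)));

        // Overlapping case: the target is the source shifted by one double
        // (8 bytes), so iteration i reads what iteration i-1 just wrote.
        MemorySegment base = MemorySegment.ofArray(new double[] {1, 1, 1, 1, 1});
        add(base, base.asSlice(8), 4);
        // prints [1.0, 2.0, 3.0, 4.0, 5.0]; loading all four source values
        // before storing any result would give [1.0, 2.0, 2.0, 2.0, 2.0].
        System.out.println(Arrays.toString(base.toArray(ValueLayout.JAVA_DOUBLE)));
    }
}
```

The sketch uses only the java.lang.foreign API, which is final as of JDK 22. It demonstrates the aliasing hazard only and makes no claim about which loop shapes C2 vectorizes.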