Array addition and array sum Panama benchmarks

Antoine Chambille ach at activeviam.com
Mon Sep 30 12:16:23 UTC 2024


Hi Maurizio, thanks for the quick response. Looking forward to it.
-Antoine

On Mon, Sep 30, 2024 at 2:11 PM Maurizio Cimadamore <
maurizio.cimadamore at oracle.com> wrote:

> Hi Antoine,
> Auto-vectorization on memory segments doesn't work in some cases. This
> is mostly due to:
>
> https://bugs.openjdk.org/browse/JDK-8324751
>
> That is, when working with a "source" and a "target" segment, if the
> auto-vectorizer cannot prove that the two segments are disjoint, no
> vectorization occurs.
>
> This is an issue for operations like add, or copy, but it's not an issue
> with something like MemorySegment::fill (as that method only works on a
> single segment).
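
A minimal sketch of the two-segment case described above, assuming the final
java.lang.foreign API (JDK 22 or later). The class and method names are
hypothetical and this is not the benchmark code from this thread. The add loop
reads one segment and writes another, which is the shape where the compiler
has to rule out overlap; the disjoint helper only shows that the property can
be checked at run time, and is not a claim about how C2 reasons about it.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical illustration only; names are not taken from the benchmark.
final class SegmentAddSketch {

    // Two-segment scalar loop: dst[i] += src[i] for 'count' doubles.
    // If src and dst overlapped, a store could change a later load,
    // so vectorizing is only safe when the segments are disjoint.
    static void add(MemorySegment src, MemorySegment dst, long count) {
        for (long i = 0; i < count; i++) {
            double a = src.getAtIndex(ValueLayout.JAVA_DOUBLE, i);
            double b = dst.getAtIndex(ValueLayout.JAVA_DOUBLE, i);
            dst.setAtIndex(ValueLayout.JAVA_DOUBLE, i, a + b);
        }
    }

    // Run-time disjointness check using MemorySegment::asOverlappingSlice.
    static boolean disjoint(MemorySegment a, MemorySegment b) {
        return a.asOverlappingSlice(b).isEmpty();
    }

    // Allocates two native segments, adds them, and returns the sum of dst.
    static double addAndSum(int count) {
        try (Arena arena = Arena.ofConfined()) {
            long bytes = (long) count * Double.BYTES;
            MemorySegment src = arena.allocate(bytes, Double.BYTES);
            MemorySegment dst = arena.allocate(bytes, Double.BYTES);
            for (long i = 0; i < count; i++) {
                src.setAtIndex(ValueLayout.JAVA_DOUBLE, i, (double) i);
                dst.setAtIndex(ValueLayout.JAVA_DOUBLE, i, 1.0);
            }
            if (!disjoint(src, dst)) {
                throw new IllegalStateException("segments unexpectedly overlap");
            }
            add(src, dst, count);
            double sum = 0.0;
            for (long i = 0; i < count; i++) {
                sum += dst.getAtIndex(ValueLayout.JAVA_DOUBLE, i);
            }
            return sum;
        }
    }
}
```

Calling SegmentAddSketch.addAndSum(n) exercises the loop on two separately
allocated segments. Whether that loop is actually vectorized depends on the
JDK build and on the issue linked above.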
>
> We hope to be able to make some progress on this issue, as that will allow
> third-party routines on memory segments to enjoy vectorization too, without
> needing an intrinsic in the JDK.
>
> Maurizio
>
>
>
>
> On 30/09/2024 13:04, Antoine Chambille wrote:
>
> Hello everyone,
>
> I've rebuilt the latest OpenJDK (24) from
> https://github.com/openjdk/panama-vector and run the array addition
> benchmark again:
>
> AddBenchmark
>  .scalarArrayArray            thrpt    5   6487636 ops/s
>  .scalarArrayArrayLongStride  thrpt    5   1001515 ops/s
>  .scalarSegmentArray          thrpt    5   1747531 ops/s
>  .scalarSegmentSegment        thrpt    5   1154193 ops/s
>  .scalarUnsafeArray           thrpt    5   6970073 ops/s
>  .scalarUnsafeUnsafe          thrpt    5   1246625 ops/s
>  .unrolledArrayArray          thrpt    5   1251824 ops/s
>  .unrolledSegmentArray        thrpt    5   1694164 ops/s
>  .unrolledUnsafeArray         thrpt    5   5043685 ops/s
>  .unrolledUnsafeUnsafe        thrpt    5   1197024 ops/s
>  .vectorArrayArray            thrpt    5   7200224 ops/s
>  .vectorArraySegment          thrpt    5   7377553 ops/s
>  .vectorSegmentArray          thrpt    5   7263505 ops/s
>  .vectorSegmentSegment        thrpt    5   7143647 ops/s
>
>
>    - Performance using the vector API is now very consistent and good
>    across arrays and segments.
>    - Reading and writing from/to segments still seems to be disrupting
>    auto-vectorization. Reading with Unsafe works well but it's marked for
>    removal.
>    - Less important, manual unrolling also seems to be disrupting
>    auto-vectorization.
>
>
>
> Best,
> -Antoine
>
> On Tue, Mar 26, 2024 at 5:40 PM Vladimir Ivanov <
> vladimir.x.ivanov at oracle.com> wrote:
>
>>
>> >> Personally, I prefer to see vectorizer handling "MoveX2Y (LoadX mem)"
>> >> => "VectorReinterpret (LoadVector mem)" well and then introduce rules
>> >> to strength-reduce it to mismatched access.
>> >
>> > Do I understand you right that you're saying the vector node for MoveL2D
>> > (for instance) is VectorReinterpret so we could vectorize the code?
>> >
>> > Are you then suggesting that we can transform:
>> >
>> > (VectorReinterpret (LoadVector mem))
>> >
>> > into:
>> >
>> > (LoadVector mem)
>> >
>> > with that LoadVector a mismatched access?
>>
>> Yes, but thinking more about it, the latter step may be optional. For
>> example, the VectorReinterpret implementation on x86 is a no-op, so not
>> much is gained from folding VectorReinterpret+LoadVector into a mismatched
>> LoadVector.
>>
>> Best regards,
>> Vladimir Ivanov
>>
>