[External] : Re: Potential performance regression with FFM compared to Unsafe
Tomer Zeltzer
tomerr90 at gmail.com
Sat Apr 19 17:40:10 UTC 2025
Same result on JDK 24 for the reads. For the writes there seems to be some
"regression": making the VarHandles static no longer brings them to the level
of the non-VarHandle version.
On Sat, Apr 19, 2025, 19:49 Chen Liang <chen.l.liang at oracle.com> wrote:
> Interesting; can you please try running this benchmark on JDK 24? 21 was
> from over a year ago, during which FFM API developers have investigated and
> addressed a lot of performance issues. I don't have a device available
> right now, but if you replicated the read performance issue, I can
> investigate when I come back later.
>
> Chen
> ------------------------------
> *From:* Tomer Zeltzer <tomerr90 at gmail.com>
> *Sent:* Saturday, April 19, 2025 11:40 AM
> *To:* Chen Liang <chen.l.liang at oracle.com>
> *Cc:* panama-dev at openjdk.org <panama-dev at openjdk.org>
> *Subject:* Re: [External] : Re: Potential performance regression with FFM
> compared to Unsafe
>
>
> Sorry, message got cut off, it made no difference for read, but it brought
> the writes to the same level as the version without the varhandles
>
> On Sat, Apr 19, 2025, 19:31 Tomer Zeltzer <tomerr90 at gmail.com> wrote:
>
> Thanks Chen, adding static seemed to have had no effect
>
> On Sat, Apr 19, 2025, 19:20 Chen Liang <chen.l.liang at oracle.com> wrote:
>
> Hi Tomer,
> Note that VarHandle instances must be constant folded against the code for
> peak performance, so they must be in static final fields. You left them in
> instance final fields, which C2 cannot inline because C2 anticipates
> serialization or reflection to change such a field.
>
> Regards, Chen
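
A minimal sketch of the distinction described above, assuming JDK 22 or later (where var handles obtained from a layout take a segment plus a base offset). The class and method names here are hypothetical and are not taken from the benchmark under discussion; the sketch only shows the two field declarations being contrasted and makes no performance claim of its own.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.VarHandle;

// Hypothetical example class, not part of the original benchmark.
final class VarHandleFields {
    // static final: the form recommended in the message above, so that the
    // JIT can constant fold the handle into the compiled access.
    static final VarHandle INT_HANDLE = ValueLayout.JAVA_INT.varHandle();

    // instance final: the form the message above says C2 does not treat as a constant.
    private final VarHandle instanceHandle = ValueLayout.JAVA_INT.varHandle();

    // Reads an int through the static final handle.
    static int readStatic(MemorySegment segment, long offset) {
        return (int) INT_HANDLE.get(segment, offset);
    }

    // Reads an int through the instance final handle.
    int readInstance(MemorySegment segment, long offset) {
        return (int) instanceHandle.get(segment, offset);
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment segment = arena.allocate(ValueLayout.JAVA_INT);
            INT_HANDLE.set(segment, 0L, 42);
            System.out.println(readStatic(segment, 0L));
            System.out.println(new VarHandleFields().readInstance(segment, 0L));
        }
    }
}
```

Both methods return the same value; any difference would only show up in how the JIT compiles them, which needs a harness such as JMH to measure.
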
> ------------------------------
> *From:* Tomer Zeltzer <tomerr90 at gmail.com>
> *Sent:* Saturday, April 19, 2025 11:09 AM
> *To:* Chen Liang <chen.l.liang at oracle.com>
> *Cc:* panama-dev at openjdk.org <panama-dev at openjdk.org>
> *Subject:* Re: [External] : Re: Potential performance regression with FFM
> compared to Unsafe
>
>
> Tried http://pastebin.com/YvE02tgj, it's like 100x slower
>
> On Sat, Apr 19, 2025, 01:35 Chen Liang <chen.l.liang at oracle.com> wrote:
>
> Hi Tomer,
> Note that your way of accessing the layout might not be the best; our
> recommended way of element access is to construct a larger layout (like a
> group layout representing a struct), and then obtain var handles with
> varHandle(PathElement). These var handles perform the same access checks,
> and these duplicate checks might be merged into one by the JIT compiler;
> such is called "loop hoisting" and is seen in JDK benchmarks. I haven't got
> time to try this out on your benchmarks yet, but I hope this might be able
> to address some of the regressions you have observed.
>
> In addition, for particular structs, we are planning record and interface
> mappers; record mappers perform a single read to copy native data to
> immutable objects, while interface mappers are more memory efficient and
> lazy, but can suffer from memory tearing issues. Those might be useful for
> the different scenarios you have mentioned as well.
>
> Chen
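
A minimal sketch of the struct-layout approach described above, assuming JDK 22 or later. The two-int "point" struct and all names are hypothetical, chosen only for illustration; the layout in the actual benchmark may differ.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemoryLayout.PathElement;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.StructLayout;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.VarHandle;

// Hypothetical example class, not part of the original benchmark.
final class PointAccess {
    // Group layout for a hypothetical struct { int x; int y; }.
    static final StructLayout POINT = MemoryLayout.structLayout(
            ValueLayout.JAVA_INT.withName("x"),
            ValueLayout.JAVA_INT.withName("y"));

    // Var handles derived from the enclosing layout via layout paths,
    // kept in static final fields.
    static final VarHandle X = POINT.varHandle(PathElement.groupElement("x"));
    static final VarHandle Y = POINT.varHandle(PathElement.groupElement("y"));

    // Writes both fields of the struct that starts at offset 0 of the segment.
    static void write(MemorySegment point, int x, int y) {
        X.set(point, 0L, x);
        Y.set(point, 0L, y);
    }

    // Reads both fields back and returns their sum.
    static int sum(MemorySegment point) {
        int x = (int) X.get(point, 0L);
        int y = (int) Y.get(point, 0L);
        return x + y;
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment point = arena.allocate(POINT);
            write(point, 3, 4);
            System.out.println(sum(point)); // prints 7
        }
    }
}
```

Each handle computes its field offset from the layout path, so the calling code passes only the segment and a base offset rather than hand-computed field offsets.
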
> ------------------------------
> *From:* Tomer Zeltzer <tomerr90 at gmail.com>
> *Sent:* Friday, April 18, 2025 4:45 PM
> *To:* Chen Liang <chen.l.liang at oracle.com>
> *Cc:* panama-dev at openjdk.org <panama-dev at openjdk.org>
> *Subject:* [External] : Re: Potential performance regression with FFM
> compared to Unsafe
>
>
> Thank you for testing this out, Chen!
> A number of other people were able to reproduce the on-heap results, so I'm
> not sure what to say here, but that's the less important conclusion, I think.
> For off-heap, having the memory segment as a final field sounds like
> something that is relevant for very few niche use cases, if at all...
> If this can't be optimized further without the final, it means a
> significant performance hit for a lot of use cases... off the top of my
> head, libraries like zstd and gzip that do JNI bindings
>
> On Fri, Apr 18, 2025, 01:17 Chen Liang <chen.l.liang at oracle.com> wrote:
>
> Hello, I think the observed performance difference is probably due to the
> heap array being static final. I tested on latest mainline, and ffm is
> consistently slower without a static final object that it can constant fold
> against: it exhibited similar performance for auto arena 100 vs byte array
> 100, both having a significant overhead compared to Unsafe, unless the byte
> array is a constant (in a static final field). Meanwhile, I cannot
> reproduce FFM being faster than Unsafe for heap access: in the best case
> FFM is still slightly slower than Unsafe.
>
> For context, I used the source code at
> https://github.com/tomerr90/UnsafeVSFMA/blob/main/src/main/java/org/example/FMASerDe.java
> and edited it. I recommend testing against release 22 or later, where FFM
> has been finalized; the preview feature in 21 is no longer maintained.
>
> Regards, Chen Liang
> ------------------------------
> *From:* panama-dev <panama-dev-retn at openjdk.org> on behalf of Tomer
> Zeltzer <tomerr90 at gmail.com>
> *Sent:* Thursday, April 17, 2025 6:31 AM
> *To:* panama-dev at openjdk.org <panama-dev at openjdk.org>
> *Subject:* Potential performance regression with FFM compared to Unsafe
>
> Hey all!
> First time emailing such a list, so apologies if something is "off
> protocol".
> I wrote the following article where I benchmarked FFM and Unsafe in JDK21
> https://itnext.io/javas-new-fma-renaissance-or-decay-372a2aee5f32
>
> Conclusions were that FFM was 42% faster for on heap accesses while 67%
> slower for off heap, which is a bit weird.
> Code is also linked in the article.
> Would love hearing your thoughts!
>
>