[External] : Re: Potential performance regression with FFM compared to Unsafe

Sat Apr 19 16:40:14 UTC 2025

Sorry, my message got cut off. It made no difference for reads, but it brought
the writes to the same level as the version without the VarHandles.

On Sat, Apr 19, 2025, 19:31 Tomer Zeltzer <tomerr90 at gmail.com> wrote:

> Thanks Chen, adding static seemed to have had no effect
>
> On Sat, Apr 19, 2025, 19:20 Chen Liang <chen.l.liang at oracle.com> wrote:
>
>> Hi Tomer,
>> Note that VarHandle instances must be constant-folded into the compiled
>> code for peak performance, so they must be in static final fields. You left
>> them in final instance fields, which C2 does not treat as constants (and so
>> cannot inline through), because it must allow for serialization or
>> reflection changing such a field.
>>
>> Regards, Chen
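Chen's point about static final fields can be illustrated with a minimal sketch. It is not the code from the pastebin linked in this thread; the class and method names are hypothetical, and it assumes JDK 22 or later, where `ValueLayout.varHandle()` returns a handle whose coordinates are `(MemorySegment, long offset)`.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.VarHandle;

public class StaticVarHandleExample {
    // Held in a static final field: C2 can treat the handle as a constant
    // and inline the memory access it performs.
    private static final VarHandle INT_HANDLE = ValueLayout.JAVA_INT.varHandle();

    // Shape to avoid: a final *instance* field is not trusted as a constant
    // by C2, so accesses through it cannot be folded the same way.
    private final VarHandle slowIntHandle = ValueLayout.JAVA_INT.varHandle();

    static int readInt(MemorySegment segment, long offset) {
        // JDK 22+: coordinates are (MemorySegment, long byte offset).
        return (int) INT_HANDLE.get(segment, offset);
    }

    static void writeInt(MemorySegment segment, long offset, int value) {
        INT_HANDLE.set(segment, offset, value);
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            // 16 bytes, 8-byte aligned, so offset 4 is suitably aligned for an int.
            MemorySegment segment = arena.allocate(16, 8);
            writeInt(segment, 4L, 42);
            System.out.println(readInt(segment, 4L)); // prints 42
        }
    }
}
```

Only `INT_HANDLE` is a candidate for constant folding; `slowIntHandle` shows the shape Chen warns against.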
>> ------------------------------
>> *From:* Tomer Zeltzer <tomerr90 at gmail.com>
>> *Sent:* Saturday, April 19, 2025 11:09 AM
>> *To:* Chen Liang <chen.l.liang at oracle.com>
>> *Cc:* panama-dev at openjdk.org <panama-dev at openjdk.org>
>> *Subject:* Re: [External] : Re: Potential performance regression with
>> FFM compared to Unsafe
>>
>>
>> Tried pastebin.com/YvE02tgj; it's roughly 100x slower.
>>
>> On Sat, Apr 19, 2025, 01:35 Chen Liang <chen.l.liang at oracle.com> wrote:
>>
>> Hi Tomer,
>> Note that your way of accessing the layout might not be the best; our
>> recommended way of element access is to construct a larger layout (like a
>> group layout representing a struct), and then obtain var handles with
>> varHandle(PathElement...). These var handles perform the same access
>> checks, and the JIT compiler may merge these duplicate checks into one;
>> this is called "loop hoisting" and can be seen in JDK benchmarks. I haven't
>> had time to try this on your benchmarks yet, but I hope it can address
>> some of the regressions you have observed.
>>
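A minimal sketch of the recommended style follows. It assumes a hypothetical two-field struct (`x`, `y`) rather than the layout from the benchmark, and JDK 22+ semantics, where a handle obtained via `varHandle(PathElement...)` takes a `MemorySegment` and a `long` base offset. Whether the JIT actually merges or hoists the checks in this loop depends on the JVM and is not demonstrated here.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemoryLayout.PathElement;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.StructLayout;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.VarHandle;

public class PointLayoutExample {
    // Hypothetical native struct: struct point { int x; int y; };
    static final StructLayout POINT = MemoryLayout.structLayout(
            ValueLayout.JAVA_INT.withName("x"),
            ValueLayout.JAVA_INT.withName("y"));

    // Field handles derived from the struct layout, kept in static final fields.
    static final VarHandle X = POINT.varHandle(PathElement.groupElement("x"));
    static final VarHandle Y = POINT.varHandle(PathElement.groupElement("y"));

    // Sums x + y over `count` consecutive structs starting at offset 0.
    static long sumPoints(MemorySegment points, long count) {
        long sum = 0;
        for (long i = 0; i < count; i++) {
            long base = i * POINT.byteSize(); // base offset of the i-th struct
            sum += (int) X.get(points, base);
            sum += (int) Y.get(points, base);
        }
        return sum;
    }

    public static void main(String[] args) {
        long count = 4;
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment points = arena.allocate(POINT.byteSize() * count, POINT.byteAlignment());
            for (long i = 0; i < count; i++) {
                long base = i * POINT.byteSize();
                X.set(points, base, (int) i);        // x = i
                Y.set(points, base, (int) (10 * i)); // y = 10 * i
            }
            System.out.println(sumPoints(points, count)); // prints 66
        }
    }
}
```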
>> In addition, for particular structs, we are planning record and interface
>> mappers; record mappers perform a single read to copy native data to
>> immutable objects, while interface mappers are more memory efficient and
>> lazy, but can suffer from memory tearing issues. Those might be useful for
>> the different scenarios you have mentioned as well.
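Chen describes record mappers as planned, so the following is only a hand-written approximation of the eager-copy idea, not that API: each field of a hypothetical `point` struct is read once into an immutable Java record. All names are illustrative.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemoryLayout.PathElement;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.StructLayout;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.VarHandle;

public class PointRecordCopy {
    // Hypothetical native struct: struct point { int x; int y; };
    static final StructLayout POINT = MemoryLayout.structLayout(
            ValueLayout.JAVA_INT.withName("x"),
            ValueLayout.JAVA_INT.withName("y"));
    static final VarHandle X = POINT.varHandle(PathElement.groupElement("x"));
    static final VarHandle Y = POINT.varHandle(PathElement.groupElement("y"));

    // Immutable Java-side copy of one struct.
    record Point(int x, int y) {}

    // Eager copy: each field is read exactly once; later writes to the
    // native memory are not visible through the returned record.
    static Point read(MemorySegment segment, long offset) {
        return new Point((int) X.get(segment, offset), (int) Y.get(segment, offset));
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment s = arena.allocate(POINT);
            X.set(s, 0L, 3);
            Y.set(s, 0L, 4);
            Point p = read(s, 0L);
            X.set(s, 0L, 99);      // does not affect p
            System.out.println(p); // prints Point[x=3, y=4]
        }
    }
}
```

A lazy, interface-style view would instead read native memory on every accessor call, which is where tearing can show up if another thread writes between the `x` and `y` reads.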
>>
>> Chen
>> ------------------------------
>> *From:* Tomer Zeltzer <tomerr90 at gmail.com>
>> *Sent:* Friday, April 18, 2025 4:45 PM
>> *To:* Chen Liang <chen.l.liang at oracle.com>
>> *Cc:* panama-dev at openjdk.org <panama-dev at openjdk.org>
>> *Subject:* [External] : Re: Potential performance regression with FFM
>> compared to Unsafe
>>
>>
>> Thank you for testing this out, Chen!
>> A number of other people were able to reproduce the on-heap results, so
>> I'm not sure what to say there, but I think that's the less important
>> conclusion.
>> For off-heap, having the memory segment in a final field sounds like
>> something relevant to very few niche use cases, if any.
>> If this can't be optimized further without the final, it means a
>> significant performance hit for a lot of use cases; off the top of my
>> head, libraries like zstd and gzip that use JNI bindings.
>>
>> On Fri, Apr 18, 2025, 01:17 Chen Liang <chen.l.liang at oracle.com> wrote:
>>
>> Hello, I think the observed performance difference is probably due to the
>> heap array being static final. I tested on the latest mainline, and FFM is
>> consistently slower without a static final object it can constant-fold
>> against: it showed similar performance for auto arena 100 vs. byte array
>> 100, both with significant overhead compared to Unsafe, unless the byte
>> array is a constant (in a static final field). Meanwhile, I cannot
>> reproduce FFM being faster than Unsafe for heap access: in the best case,
>> FFM is still slightly slower than Unsafe.
>>
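To make the distinction concrete, here is a sketch (hypothetical class, not the benchmark code) of the two shapes Chen compares: a heap segment whose backing array lives in a static final field, and one held in an instance field. It measures nothing; timings need a harness such as JMH. It uses `JAVA_INT_UNALIGNED` because segments backed by a `byte[]` only guarantee 1-byte alignment, so the aligned `JAVA_INT` layout would be rejected there.

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class SegmentHolderShapes {
    // byte[]-backed segments are only 1-byte aligned, so use the unaligned int layout.
    static final ValueLayout.OfInt INT = ValueLayout.JAVA_INT_UNALIGNED;

    // Shape 1: array and segment reachable from static final fields, so the
    // JIT may constant-fold them when compiling accesses.
    private static final byte[] CONSTANT_ARRAY = new byte[100];
    private static final MemorySegment CONSTANT_SEGMENT = MemorySegment.ofArray(CONSTANT_ARRAY);

    // Shape 2: the segment lives in an instance field, as in most libraries;
    // the JIT cannot treat it as a constant.
    private final MemorySegment segment;

    SegmentHolderShapes(byte[] array) {
        this.segment = MemorySegment.ofArray(array);
    }

    static int readConstant(long offset) { return CONSTANT_SEGMENT.get(INT, offset); }
    static void writeConstant(long offset, int v) { CONSTANT_SEGMENT.set(INT, offset, v); }

    int read(long offset) { return segment.get(INT, offset); }
    void write(long offset, int v) { segment.set(INT, offset, v); }

    public static void main(String[] args) {
        writeConstant(8, 7);
        SegmentHolderShapes holder = new SegmentHolderShapes(new byte[100]);
        holder.write(8, 7);
        System.out.println(readConstant(8) + " " + holder.read(8)); // prints "7 7"
    }
}
```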
>> For context, I used the source code at
>> https://github.com/tomerr90/UnsafeVSFMA/blob/main/src/main/java/org/example/FMASerDe.java
>> and edited it. I recommend testing against JDK 22 or later, where FFM
>> was finalized; the preview feature in 21 is no longer maintained.
>>
>> Regards, Chen Liang
>> ------------------------------
>> *From:* panama-dev <panama-dev-retn at openjdk.org> on behalf of Tomer
>> Zeltzer <tomerr90 at gmail.com>
>> *Sent:* Thursday, April 17, 2025 6:31 AM
>> *To:* panama-dev at openjdk.org <panama-dev at openjdk.org>
>> *Subject:* Potential performance regression with FFM compared to Unsafe
>>
>> Hey all!
>> This is my first time emailing a list like this, so apologies if
>> something is "off protocol".
>> I wrote the following article, in which I benchmarked FFM and Unsafe on
>> JDK 21:
>> https://itnext.io/javas-new-fma-renaissance-or-decay-372a2aee5f32
>>
>> The conclusions were that FFM was 42% faster for on-heap accesses but 67%
>> slower for off-heap, which is a bit weird.
>> The code is also linked in the article.
>> I would love to hear your thoughts!
>>
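For readers unfamiliar with the two cases benchmarked in the article, a minimal sketch of on-heap and off-heap access through the same FFM code path follows. It is not the article's code (and leaves out the Unsafe side entirely); the class name is hypothetical, and it does not reproduce the reported numbers.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class HeapVsNativeAccess {
    // Unaligned layout so the same code works on byte[]-backed (1-byte aligned) segments.
    static final ValueLayout.OfLong LONG_U = ValueLayout.JAVA_LONG_UNALIGNED;

    // Writes 0..count-1 as longs, then reads them back and returns their sum.
    static long fillAndSum(MemorySegment segment, int count) {
        for (int i = 0; i < count; i++) {
            segment.set(LONG_U, (long) i * Long.BYTES, i);
        }
        long sum = 0;
        for (int i = 0; i < count; i++) {
            sum += segment.get(LONG_U, (long) i * Long.BYTES);
        }
        return sum;
    }

    public static void main(String[] args) {
        int count = 100;
        // On-heap: a segment view over a Java byte[].
        MemorySegment onHeap = MemorySegment.ofArray(new byte[count * Long.BYTES]);
        try (Arena arena = Arena.ofConfined()) {
            // Off-heap: native memory owned by a confined arena, freed when it closes.
            MemorySegment offHeap = arena.allocate((long) count * Long.BYTES, Long.BYTES);
            System.out.println(onHeap.isNative() + " " + offHeap.isNative()); // prints "false true"
            System.out.println(fillAndSum(onHeap, count) + " " + fillAndSum(offHeap, count)); // prints "4950 4950"
        }
    }
}
```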
>>