performance: arrayElementVarHandle / calculated index / aligned vs unaligned

Matthias Ernst matthias at mernst.org
Wed Dec 18 08:26:42 UTC 2024


Hi,

I'm trying to use the foreign memory api to interpret some variable-length
encoded data, where an offset vector encodes the start offset of each
stride. Accessing element `i` in this case involves reading `offset[i+1]`
in addition to `offset[i]`. The offset vector is modeled as a
`JAVA_LONG.arrayElementVarHandle()`.
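
For illustration, a minimal sketch of such an offset vector. This is not the attached benchmark: the class name, method names and sample data are hypothetical, and the three-coordinate handle shape (segment, base offset, index) assumes JDK 22 or later.

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.VarHandle;

public class OffsetVector {
    // On JDK 22+, the handle's coordinates are (MemorySegment, long baseOffset, long index).
    static final VarHandle OFFSET = ValueLayout.JAVA_LONG.arrayElementVarHandle();

    // Start of element i: offset[i].
    static long start(MemorySegment offsets, long i) {
        return (long) OFFSET.get(offsets, 0L, i);
    }

    // End of element i: offset[i + 1], the computed-index access.
    static long end(MemorySegment offsets, long i) {
        return (long) OFFSET.get(offsets, 0L, i + 1);
    }

    // Length of element i, which needs both reads.
    static long length(MemorySegment offsets, long i) {
        return end(offsets, i) - start(offsets, i);
    }

    public static void main(String[] args) {
        // Hypothetical data: three elements occupying [0,3), [3,7) and [7,12).
        MemorySegment offsets = MemorySegment.ofArray(new long[] {0, 3, 7, 12});
        for (long i = 0; i < 3; i++) {
            System.out.println("element " + i + " has length " + length(offsets, i));
        }
    }
}
```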

Just out of curiosity about bounds and alignment checks, I switched the
layout to JAVA_LONG_UNALIGNED for reading (the data is still aligned) and saw
a large difference in performance where I didn't expect one. It seems
to boil down to the computed-index access `endOffset[i+1]`; the
`[i]` case is not affected. My expectation would have been that all variants
exhibit the same performance, since alignment checks would be moved out of
the loop.

A micro-benchmark (attached) demonstrates this. It loops over the same
elements of a long-aligned memory segment in 6 different ways,
{aligned, unaligned} x {segment[i], segment[i+1], segment[i+1] (w/ base
offset)}, and gives very different results for aligned[i+1] (but not for
aligned[i]):

Benchmark                         Mode  Cnt    Score   Error  Units
Alignment.findAligned            thrpt       217.050          ops/s
Alignment.findAlignedPlusOne     thrpt       110.366          ops/s  <= #####
Alignment.findAlignedNext        thrpt       110.377          ops/s  <= #####
Alignment.findUnaligned          thrpt       216.591          ops/s
Alignment.findUnalignedPlusOne   thrpt       215.843          ops/s
Alignment.findUnalignedNext      thrpt       216.483          ops/s

openjdk version "23.0.1" 2024-10-15
OpenJDK Runtime Environment (build 23.0.1+11-39)
OpenJDK 64-Bit Server VM (build 23.0.1+11-39, mixed mode, sharing)
MacBook Air M3
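
The three access forms compared in the benchmark might look roughly like the sketch below. The attachment is not reproduced here, so the class name, method names and the summing loop are assumptions made for illustration; only the layouts and the idea of index `i`, index `i+1`, and index `i` with a base offset of one element come from the description above. JDK 22 or later is assumed for the (segment, base offset, index) coordinates.

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.VarHandle;

public class AccessVariants {
    static final VarHandle ALIGNED =
            ValueLayout.JAVA_LONG.arrayElementVarHandle();
    static final VarHandle UNALIGNED =
            ValueLayout.JAVA_LONG_UNALIGNED.arrayElementVarHandle();
    // Size of one element in bytes, used as the base offset for the "next" form.
    static final long STRIDE = ValueLayout.JAVA_LONG.byteSize();

    // segment[i]
    static long sumAligned(MemorySegment s, long n) {
        long sum = 0;
        for (long i = 0; i < n; i++) sum += (long) ALIGNED.get(s, 0L, i);
        return sum;
    }

    // segment[i+1], with the increment applied to the index
    static long sumAlignedPlusOne(MemorySegment s, long n) {
        long sum = 0;
        for (long i = 0; i < n; i++) sum += (long) ALIGNED.get(s, 0L, i + 1);
        return sum;
    }

    // segment[i+1], with the increment expressed as a base offset
    static long sumAlignedNext(MemorySegment s, long n) {
        long sum = 0;
        for (long i = 0; i < n; i++) sum += (long) ALIGNED.get(s, STRIDE, i);
        return sum;
    }

    // The same three forms using the unaligned layout.
    static long sumUnaligned(MemorySegment s, long n) {
        long sum = 0;
        for (long i = 0; i < n; i++) sum += (long) UNALIGNED.get(s, 0L, i);
        return sum;
    }

    static long sumUnalignedPlusOne(MemorySegment s, long n) {
        long sum = 0;
        for (long i = 0; i < n; i++) sum += (long) UNALIGNED.get(s, 0L, i + 1);
        return sum;
    }

    static long sumUnalignedNext(MemorySegment s, long n) {
        long sum = 0;
        for (long i = 0; i < n; i++) sum += (long) UNALIGNED.get(s, STRIDE, i);
        return sum;
    }
}
```

In each method `n` is the number of elements read, so the `PlusOne` and `Next` forms need a segment holding at least `n + 1` longs.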

Needless to say, the difference was smaller with more app code in play,
but large enough to give me pause. Likely it wouldn't matter at all, but I
want to have a better idea of which design choices to pay attention to. With
the foreign memory API, I find it a bit difficult to distinguish
convenience from performance-relevant options (e.g. using path expressions
vs. computed offsets vs. using a base offset). Besides "make layouts and
VarHandles static final", what would be other rules of thumb?
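
To make two of these options concrete, here is a small sketch that reads the same field once through a path expression and once through a hand-computed offset. The struct layout and all names are hypothetical, JDK 22 or later is assumed, and nothing here is a claim about which form is faster.

```java
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemoryLayout.PathElement;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.StructLayout;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.VarHandle;

public class AccessStyles {
    // Hypothetical record layout: two adjacent longs.
    static final StructLayout RANGE = MemoryLayout.structLayout(
            ValueLayout.JAVA_LONG.withName("start"),
            ValueLayout.JAVA_LONG.withName("end"));

    // Path expression: the handle computes the field offset itself.
    // Coordinates are (MemorySegment, long baseOffset, long index).
    static final VarHandle END_BY_PATH =
            RANGE.arrayElementVarHandle(PathElement.groupElement("end"));

    // Byte offset of "end" within one record, for the hand-computed form.
    static final long END_OFFSET =
            RANGE.byteOffset(PathElement.groupElement("end"));

    static long endByPath(MemorySegment s, long i) {
        return (long) END_BY_PATH.get(s, 0L, i);
    }

    static long endByComputedOffset(MemorySegment s, long i) {
        return s.get(ValueLayout.JAVA_LONG, i * RANGE.byteSize() + END_OFFSET);
    }
}
```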

Thx
Matthias
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Alignment.java
Type: application/octet-stream
Size: 3569 bytes
Desc: not available
URL: <https://mail.openjdk.org/pipermail/panama-dev/attachments/20241218/139228d2/Alignment-0001.java>