Performance impact of *_UNALIGNED when segment is aligned.

Lee Rhodes leerho at gmail.com
Tue May 20 17:21:43 UTC 2025


Per-Ake,
Thank you for a very comprehensive response!  I did read Emanuel Peter's
analysis and it is very interesting.

Your statement that "generally, specifying unaligned access is faster" was
a surprise and your explanation makes sense.  The API Note on every
JAVA_*_UNALIGNED Field in the ValueLayout class led me to believe that
unaligned access might be slower.

My question about vector operations was not about using the Vector API
directly.  Rather, my code makes heavy use of the built-in MemorySegment
bulk operations, such as *copy(...)*[1], and it is my understanding that
the internal implementation of these bulk operations will use vector
instructions where possible.  Since my application uses MemorySegments both
on-heap and off-heap, I want to make sure I am not doing anything that
would inadvertently make these bulk operations slower.

A lesson I have learned from this discussion is that when creating
MemorySegments on-heap, I should always back them with *long* arrays, if
possible, as that is the only way I can guarantee that the memory alignment
is at least 8 bytes.  I will also need to take care that slices are created
on 8-byte boundaries.  This should improve the chances that the
MemorySegment bulk operations will be able to take advantage of vectors.
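
To make the alignment point concrete, here is a minimal sketch, assuming
JDK 22 or later (where java.lang.foreign is final).  The class and helper
names are hypothetical and exist only for illustration.  It probes whether
an aligned JAVA_LONG read is accepted for heap segments backed by byte[]
and long[] arrays, and for slices at different offsets.

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

final class HeapAlignmentSketch {

    // Hypothetical helper: true if an aligned JAVA_LONG read at byteOffset
    // is accepted by the segment, false if the alignment check rejects it.
    static boolean alignedLongReadAllowed(MemorySegment seg, long byteOffset) {
        try {
            seg.get(ValueLayout.JAVA_LONG, byteOffset);
            return true;
        } catch (IllegalArgumentException e) {
            // Thrown when the access is incompatible with the layout's
            // alignment constraint.
            return false;
        }
    }

    public static void main(String[] args) {
        MemorySegment byteBacked = MemorySegment.ofArray(new byte[64]);
        MemorySegment longBacked = MemorySegment.ofArray(new long[8]);

        // A byte[]-backed heap segment only guarantees 1-byte alignment,
        // so the aligned read is expected to be rejected.
        System.out.println(alignedLongReadAllowed(byteBacked, 0));

        // A long[]-backed heap segment guarantees 8-byte alignment.
        System.out.println(alignedLongReadAllowed(longBacked, 0));

        // Slices keep that guarantee only when cut on 8-byte boundaries.
        System.out.println(alignedLongReadAllowed(longBacked.asSlice(8), 0));
        System.out.println(alignedLongReadAllowed(longBacked.asSlice(4), 0));

        // The unaligned layout is accepted in every case above.
        System.out.println(
            byteBacked.get(ValueLayout.JAVA_LONG_UNALIGNED, 3));
    }
}
```

This only demonstrates what the alignment check accepts.  It says nothing
by itself about how fast the bulk operations run on either kind of segment.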

There isn't much I can do about MemorySegments or Buffers handed to me as
input.
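
For segments or buffers received as input, a sketch of the defensive
approach might look like the following, again assuming JDK 22 or later.
The class and method names are hypothetical.  Because nothing is known
about the alignment of the input, every access goes through an
*_UNALIGNED layout.

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.ByteBuffer;

final class InputSegmentSketch {

    // Hypothetical helper: reads a long at any byte offset of a segment
    // whose alignment is unknown to the callee.
    static long readLong(MemorySegment input, long byteOffset) {
        return input.get(ValueLayout.JAVA_LONG_UNALIGNED, byteOffset);
    }

    public static void main(String[] args) {
        // Simulate a caller handing over a heap ByteBuffer positioned at
        // an odd offset.
        ByteBuffer buf = ByteBuffer.allocate(32);
        buf.position(3);

        // The segment covers the bytes from the buffer's position to its
        // limit.
        MemorySegment seg = MemorySegment.ofBuffer(buf);

        seg.set(ValueLayout.JAVA_LONG_UNALIGNED, 5, 42L);
        System.out.println(readLong(seg, 5)); // prints 42
    }
}
```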

[1] The DataSketches implementations contain many loop-based algorithms
that compute various mathematical quantities over arrays.  I have found it
generally faster to copy the array from the MS to a heap array, perform
the algorithm on the heap array, and then, if required, copy the result
back to the MS, as opposed to attempting to implement the algorithm
directly on the MS.  Thus, the importance of the speed of the
*copy(...)* operations!
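
The copy-out, compute, copy-back pattern from footnote [1] can be sketched
as follows, assuming JDK 22 or later.  The class name, the method name and
the choice of a prefix sum as the "algorithm" are illustrative assumptions,
not DataSketches code.  The static MemorySegment.copy overloads that move
data between a segment and a Java array are used with an unaligned layout,
so the sketch works regardless of how the segment is aligned.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

final class CopyOutComputeCopyBack {

    // Hypothetical example algorithm: in-place prefix sum over `count`
    // longs stored in `seg`, starting at `byteOffset`.
    static void prefixSumInPlace(MemorySegment seg, long byteOffset, int count) {
        long[] tmp = new long[count];

        // 1. Bulk copy from the segment into a heap array.
        MemorySegment.copy(seg, ValueLayout.JAVA_LONG_UNALIGNED, byteOffset,
                           tmp, 0, count);

        // 2. Run the loop algorithm on the plain heap array.
        for (int i = 1; i < count; i++) {
            tmp[i] += tmp[i - 1];
        }

        // 3. Bulk copy the result back into the segment.
        MemorySegment.copy(tmp, 0, seg, ValueLayout.JAVA_LONG_UNALIGNED,
                           byteOffset, count);
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            // Off-heap segment; the data deliberately starts at byte 1.
            MemorySegment seg = arena.allocate(64);
            long[] input = {1, 2, 3, 4};
            MemorySegment.copy(input, 0, seg,
                               ValueLayout.JAVA_LONG_UNALIGNED, 1, input.length);

            prefixSumInPlace(seg, 1, input.length);

            for (int i = 0; i < input.length; i++) {
                System.out.println(
                    seg.get(ValueLayout.JAVA_LONG_UNALIGNED, 1 + 8L * i));
            }
        }
    }
}
```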

Thanks again!

Lee.

On Mon, May 19, 2025 at 4:47 AM Per-Ake Minborg <per-ake.minborg at oracle.com>
wrote:

> Hi Lee, and thank you for your questions.
>
> Generally, the impact of unaligned/aligned access by the CPU using scalar
> or vector load or store operations depends on many factors, and it is
> neither desirable nor possible to document all variants. On some newer
> platforms, there is no significant difference for scalar operations in most
> cases, whereas there might be a significant difference on other platforms.
> Some operations must operate on aligned memory (e.g., CAS operations).
>
> If you are using an aligned layout (such as JAVA_INT), the alignment
> constraint must be asserted for each access, whereas this is not the case
> for the unaligned counterparts (e.g., JAVA_INT_UNALIGNED). So, generally,
> specifying unaligned access is faster. However, in many cases, the JIT
> compiler can hoist alignment checking making the difference smaller or even
> insignificant.
>
> Vector operations, on the other hand, are impacted by alignment and cache
> line crossing. Emanuel Peter has analyzed the impact of alignment with
> vector operations in this comprehensive analysis
> <https://github.com/openjdk/jdk/pull/25065>. In short, if you are working
> with the Vector API, you may also want to worry about alignment, because
> there can be a significant performance impact (30%+ in some cases).
>
> Best,
> Per Minborg
>
> ------------------------------
> *From:* panama-dev <panama-dev-retn at openjdk.org> on behalf of Lee Rhodes <
> leerho at gmail.com>
> *Sent:* Thursday, May 15, 2025 6:55 PM
> *To:* panama-dev at openjdk.org <panama-dev at openjdk.org>
> *Subject:* Performance impact of *_UNALIGNED when segment is aligned.
>
> The DataSketches library can be handed segments from other applications
> and cannot guarantee the underlying memory alignment of these segments and
> it would be too burdensome to require a specific alignment.  As a result,
> the library will have to always use *_UNALIGNED layouts when accessing
> these segments.
>
> From the Javadocs I understand that there can be a performance impact
> accessing *_UNALIGNED values in a segment that is not appropriately
> aligned.  For example, accessing JAVA_LONG values in a
> MemorySegment.ofArray(byte[]).  This leads me to a couple of questions
> related to performance and alignment that don't seem to be answered in the
> documentation:
>
>
>    - Is there a performance impact of using *_UNALIGNED layouts on
>    segments that are fortuitously properly aligned, instead of being
>    configured with the proper alignment?
>
>
>
>    - Is the performance of underlying vector operations used on segment
>    bulk operations impacted by alignment? For example, are the vector
>    operations disabled if the segment alignment doesn't match an array element
>    natural alignment? Or are they just slower?
>
> Lee.
>
>
>
>
>