<div dir="ltr"><div>Per-Ake,</div>Thank you for a very comprehensive response! I did read Emanuel Peter's analysis and found it very interesting.<div> <div>Your statement that "generally, specifying unaligned access is faster" came as a surprise, but your explanation makes sense. The API Note on every JAVA_*_UNALIGNED field in the ValueLayout class led me to believe that unaligned access might be slower. </div><div><br></div><div>My question about vector operations was not because I intend to use the Vector API directly. Rather, my use of MemorySegments relies heavily on the built-in bulk operations, such as <i>copy(...)</i>[1], and it is my understanding that the internal MemorySegment implementation of these bulk operations will try to use vector instructions when possible. Since my application uses MemorySegments both on-heap and off-heap, I want to make sure I am not doing anything that would inadvertently slow these bulk operations down. <br></div></div><div><br></div><div>A lesson I have learned from this discussion is that when creating on-heap MemorySegments, I should back them with <i>long </i>arrays whenever possible (<i>double</i> arrays would also do), as an array with 8-byte elements is the only way I can guarantee that the memory alignment is at least 8 bytes. I will also need to take care that slices are created on 8-byte boundaries. This should improve the chances that the MemorySegment (MS) bulk operations can take advantage of vector instructions. </div><div><br></div><div>There isn't much I can do about MemorySegments or Buffers handed to me as input.</div><div><br></div><div>[1] In the DataSketches implementations, there are many loop-based algorithms that compute various mathematical quantities over arrays. I have generally found it faster to copy the array from the MS to a heap array, run the algorithm on the heap array, and then, if required, copy the result back to the MS, rather than implementing the algorithm directly on the MS. 
Thus, the importance of the speed of the <i>copy(...)</i> operations!</div><div><br></div><div>Thanks again!</div><div><br></div><div>Lee.</div></div><br><div class="gmail_quote gmail_quote_container"><div dir="ltr" class="gmail_attr">On Mon, May 19, 2025 at 4:47 AM Per-Ake Minborg <<a href="mailto:per-ake.minborg@oracle.com">per-ake.minborg@oracle.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="msg-4295395860928612968">
<div dir="ltr">
<div style="text-align:left;text-indent:0px;background-color:rgb(255,255,255);margin:0px;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:black">
Hi Lee, and thank you for your questions.<br>
<br>
Generally, the impact of unaligned/aligned access by the CPU using scalar or vector load or store operations depends on many factors, and it is neither desirable nor possible to document all variants. On some newer platforms, there is no significant difference
for scalar operations in most cases, whereas there might be a significant difference on other platforms. Some operations must operate on aligned memory (e.g., CAS operations).<br>
<br>
</div>
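<div>To illustrate the point about CAS, here is a minimal sketch, assuming JDK 22 or later (where layout var handles take a <code>(MemorySegment, long)</code> coordinate pair) on a 64-bit platform. It performs a compare-and-set through the aligned <code>JAVA_LONG</code> layout on a segment allocated with that layout's alignment. The class and method names are hypothetical.</div>

```java
// Sketch only: assumes JDK 22+ (final FFM API) on a 64-bit platform.
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.VarHandle;

final class AlignedCasSketch {
    // JAVA_LONG carries an 8-byte alignment constraint on 64-bit platforms, so its
    // var handle is suitable for atomic access modes such as compareAndSet.
    private static final VarHandle LONG_HANDLE = ValueLayout.JAVA_LONG.varHandle();

    /** Atomically replaces the long at byteOffset if it equals expected. */
    static boolean cas(MemorySegment segment, long byteOffset, long expected, long newValue) {
        // Layout var handles take (MemorySegment, long offset) coordinates in JDK 22+.
        return LONG_HANDLE.compareAndSet(segment, byteOffset, expected, newValue);
    }

    /** Returns the final value, or -1 if the CAS outcomes were not as expected. */
    static long demo() {
        try (Arena arena = Arena.ofConfined()) {
            // allocate(layout) honors JAVA_LONG's alignment, so offset 0 is suitably aligned.
            MemorySegment counter = arena.allocate(ValueLayout.JAVA_LONG);
            counter.set(ValueLayout.JAVA_LONG, 0L, 41L);
            boolean first = cas(counter, 0L, 41L, 42L);  // succeeds: current value is 41
            boolean second = cas(counter, 0L, 41L, 99L); // fails: current value is now 42
            return (first && !second) ? counter.get(ValueLayout.JAVA_LONG, 0L) : -1L;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 42
    }
}
```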
<div style="text-align:left;text-indent:0px;background-color:rgb(255,255,255);margin:0px;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:black">
If you are using an aligned layout (such as <code>JAVA_INT</code>), the alignment constraint must be checked on each access, whereas this is not the case for the unaligned counterparts (e.g.,
<code>JAVA_INT_UNALIGNED</code>). So, generally, specifying unaligned access is faster. However, in many cases, the JIT compiler can hoist the alignment check out of a loop, making the difference smaller or even insignificant.<br>
<br>
</div>
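<div>A minimal sketch of the per-access alignment check, assuming JDK 22 or later on a 64-bit platform: the aligned <code>JAVA_LONG</code> layout is rejected on a <code>byte[]</code>-backed heap segment (whose maximum alignment is 1) but accepted on a <code>long[]</code>-backed one. The exception type follows the <code>MemorySegment.get</code> documentation; the class and method names are hypothetical.</div>

```java
// Sketch only: assumes JDK 22+ (final FFM API) on a 64-bit platform.
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

final class AlignedLayoutCheckSketch {
    /** Returns true if an access through the aligned JAVA_LONG layout is rejected. */
    static boolean alignedReadRejected(MemorySegment segment, long byteOffset) {
        try {
            segment.get(ValueLayout.JAVA_LONG, byteOffset); // alignment is checked here
            return false;
        } catch (IllegalArgumentException e) {
            // Thrown when the access is incompatible with the layout's alignment constraint.
            return true;
        }
    }

    static boolean rejectedOnByteArray() {
        // A byte[]-backed heap segment only guarantees 1-byte alignment.
        return alignedReadRejected(MemorySegment.ofArray(new byte[16]), 0L);
    }

    static boolean rejectedOnLongArray() {
        // A long[]-backed heap segment guarantees 8-byte alignment at 8-byte offsets.
        return alignedReadRejected(MemorySegment.ofArray(new long[2]), 8L);
    }

    public static void main(String[] args) {
        System.out.println(rejectedOnByteArray()); // true
        System.out.println(rejectedOnLongArray()); // false
    }
}
```

<div>Switching the layout to <code>JAVA_LONG_UNALIGNED</code> removes this check, which is the per-access saving described above.</div>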
<div style="text-align:left;text-indent:0px;background-color:rgb(255,255,255);margin:0px;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:black">
Vector operations, on the other hand, are affected by alignment and by cache-line crossing. Emanuel Peter has studied the impact of alignment on vector operations
<a href="https://github.com/openjdk/jdk/pull/25065" id="m_-4295395860928612968OWAfc3a1952-8769-ddcc-7abe-54eaa4740866" title="https://github.com/openjdk/jdk/pull/25065" rel="noopener noreferrer" style="margin:0px" target="_blank">
in this comprehensive analysis</a>. In short, if you are working with the Vector API, you may also want to worry about
<span style="font-weight:600">alignment</span>, because there can be a <span style="font-weight:600">
significant performance impact</span> (30%+ in some cases).</div>
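<div>One way to control alignment for off-heap data that feeds vector code is to request it at allocation time. The sketch below assumes JDK 22 or later and, purely for illustration, a 64-byte target (a common cache-line size and the AVX-512 register width; the right value is platform dependent). It uses <code>maxByteAlignment()</code> to show that a slice keeps that alignment only when its offset is a multiple of 64. The class and method names are hypothetical.</div>

```java
// Sketch only: assumes JDK 22+ (final FFM API).
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;

final class AlignedAllocationSketch {
    // Illustrative assumption: 64 bytes matches a common cache-line size and the
    // AVX-512 register width; the right value is platform dependent.
    static final long ASSUMED_ALIGNMENT = 64;

    /** True if the segment's start is a multiple of the given power-of-two alignment. */
    static boolean startsAligned(MemorySegment segment, long alignment) {
        return segment.maxByteAlignment() >= alignment;
    }

    /** Checks a freshly allocated buffer and two slices of it. */
    static boolean[] demo() {
        try (Arena arena = Arena.ofConfined()) {
            // Request 1 KiB of off-heap memory whose start address is 64-byte aligned.
            MemorySegment buffer = arena.allocate(1024, ASSUMED_ALIGNMENT);
            return new boolean[] {
                startsAligned(buffer, ASSUMED_ALIGNMENT),              // aligned by request
                startsAligned(buffer.asSlice(128), ASSUMED_ALIGNMENT), // 128 is a multiple of 64
                startsAligned(buffer.asSlice(8), ASSUMED_ALIGNMENT)    // 8 is not
            };
        }
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(demo())); // [true, true, false]
    }
}
```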
<div style="text-align:left;text-indent:0px;background-color:rgb(255,255,255);margin:0px;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:black">
<br>
Best,<br>
Per Minborg</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div id="m_-4295395860928612968appendonsend"></div>
<hr style="display:inline-block;width:98%">
<div id="m_-4295395860928612968divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> panama-dev <<a href="mailto:panama-dev-retn@openjdk.org" target="_blank">panama-dev-retn@openjdk.org</a>> on behalf of Lee Rhodes <<a href="mailto:leerho@gmail.com" target="_blank">leerho@gmail.com</a>><br>
<b>Sent:</b> Thursday, May 15, 2025 6:55 PM<br>
<b>To:</b> <a href="mailto:panama-dev@openjdk.org" target="_blank">panama-dev@openjdk.org</a> <<a href="mailto:panama-dev@openjdk.org" target="_blank">panama-dev@openjdk.org</a>><br>
<b>Subject:</b> Performance impact of *_UNALIGNED when segment is aligned.</font>
<div> </div>
</div>
<div>
<div dir="ltr">
<div>The DataSketches library can be handed segments from other applications, so it cannot guarantee the underlying memory alignment of these segments, and it would be too burdensome to require a specific alignment. As a result, the library will always have to
use *_UNALIGNED layouts when accessing these segments.</div>
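<div>A minimal sketch of this approach, assuming JDK 22 or later: a hypothetical helper that reads longs through <code>JAVA_LONG_UNALIGNED</code>, so it works no matter how the caller's segment is aligned. The demo uses a <code>byte[]</code>-backed segment and an odd starting offset.</div>

```java
// Sketch only: assumes JDK 22+ (final FFM API).
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

final class UnalignedInputSketch {
    /** Hypothetical helper: sums count longs starting at byteOffset in a segment of unknown alignment. */
    static long sumLongs(MemorySegment input, long byteOffset, int count) {
        long sum = 0;
        for (int i = 0; i < count; i++) {
            // JAVA_LONG_UNALIGNED has no alignment constraint, so this read succeeds for
            // byte[]-backed heap segments, odd offsets, and native segments alike.
            sum += input.get(ValueLayout.JAVA_LONG_UNALIGNED, byteOffset + (long) i * Long.BYTES);
        }
        return sum;
    }

    /** Writes 1, 2, 3 at byte offset 3 of a byte[]-backed segment and sums them. */
    static long demo() {
        MemorySegment input = MemorySegment.ofArray(new byte[3 + 3 * Long.BYTES]);
        for (int i = 0; i < 3; i++) {
            input.set(ValueLayout.JAVA_LONG_UNALIGNED, 3L + (long) i * Long.BYTES, i + 1L);
        }
        return sumLongs(input, 3L, 3);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 6
    }
}
```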
<div><br>
</div>
From the Javadocs I understand that there can be a performance impact when accessing *_UNALIGNED values in a segment that is not appropriately aligned, for example, reading long values (via JAVA_LONG_UNALIGNED) from a MemorySegment.ofArray(byte[]). This leads me to a couple of questions
related to performance and alignment that don't seem to be answered in the documentation:
<div><br>
</div>
<div>
<ul>
<li>Is there a performance impact of using *_UNALIGNED layouts on segments that are fortuitously properly aligned, instead of being configured with the proper alignment?</li></ul>
<br>
<ul>
<li>Is the performance of the underlying vector operations used by segment bulk operations affected by alignment? For example, are the vector operations disabled if the segment alignment doesn't match an array element's natural alignment? Or are they just slower?</li></ul>
<div>Lee.</div>
<div><br>
</div>
<div><br>
<div><br>
</div>
<div><br>
</div>
</div>
</div>
</div>
</div>
</div>
</div></blockquote></div>
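<div>The conclusions in Lee's reply above can be sketched as follows, assuming JDK 22 or later on a 64-bit platform. The first method shows how <code>maxByteAlignment()</code> reflects the backing array type and the slice offset. The second is a hypothetical version of the copy-out, compute, copy-back pattern from footnote [1], using the <code>MemorySegment.copy</code> overloads that move elements between a segment and a Java array; the scaling step merely stands in for a real DataSketches algorithm.</div>

```java
// Sketch only: assumes JDK 22+ (final FFM API) on a 64-bit platform.
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

final class HeapSegmentSketch {
    /** Maximum alignment of heap segments by backing array type and slice offset. */
    static long[] maxAlignments() {
        MemorySegment fromBytes = MemorySegment.ofArray(new byte[64]);
        MemorySegment fromLongs = MemorySegment.ofArray(new long[8]);
        return new long[] {
            fromBytes.maxByteAlignment(),             // byte[] guarantees only 1-byte alignment
            fromLongs.maxByteAlignment(),             // long[] guarantees 8-byte alignment
            fromLongs.asSlice(16).maxByteAlignment(), // slice on an 8-byte boundary keeps 8
            fromLongs.asSlice(4).maxByteAlignment()   // slice off an 8-byte boundary drops to 4
        };
    }

    /** Hypothetical copy-out, compute, copy-back: multiplies count longs at byteOffset by factor. */
    static void scaleInPlace(MemorySegment segment, long byteOffset, int count, long factor) {
        long[] work = new long[count];
        // Bulk copy from the segment into a heap array; the unaligned layout accepts any alignment.
        MemorySegment.copy(segment, ValueLayout.JAVA_LONG_UNALIGNED, byteOffset, work, 0, count);
        for (int i = 0; i < count; i++) {
            work[i] *= factor; // the algorithm itself runs on a plain long[]
        }
        // Bulk copy the results back into the segment.
        MemorySegment.copy(work, 0, segment, ValueLayout.JAVA_LONG_UNALIGNED, byteOffset, count);
    }

    /** Scales the values 5 and 7, stored at byte offset 3 of a byte[]-backed segment, by 3. */
    static long[] scaleDemo() {
        MemorySegment segment = MemorySegment.ofArray(new byte[3 + 2 * Long.BYTES]);
        segment.set(ValueLayout.JAVA_LONG_UNALIGNED, 3L, 5L);
        segment.set(ValueLayout.JAVA_LONG_UNALIGNED, 3L + Long.BYTES, 7L);
        scaleInPlace(segment, 3L, 2, 3L);
        return new long[] {
            segment.get(ValueLayout.JAVA_LONG_UNALIGNED, 3L),
            segment.get(ValueLayout.JAVA_LONG_UNALIGNED, 3L + Long.BYTES)
        };
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(maxAlignments())); // [1, 8, 8, 4]
        System.out.println(java.util.Arrays.toString(scaleDemo()));     // [15, 21]
    }
}
```

<div>Keeping the heap side as a plain <code>long[]</code> lets the compute loop run without any per-access segment checks, which is one plausible reason the copy-based approach in footnote [1] performs well.</div>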