<!DOCTYPE html><html><head><title></title></head><body><div>Let me make my comment clearer. Here are the operations that one can expect to be affected: a character search, a copy, a byte-by-byte comparison… on non-trivial inputs that reside in fast cache. These are cases where the compute part is negligible and performance is driven by load/store operations.</div><div><br></div><div>These are cheap operations that will be very fast and might be a tad slower on some hardware when the alignment is off.</div><div><br></div><div>These operations are “common”, but they are also often supported by intrinsic functions.</div><div><br></div><div><br></div><div><br></div><div><br></div><blockquote type="cite" id="qt" style=""><div>On 7 Jan 2026, at 16:28, Daniel Lemire wrote:</div><div><br></div><div>> The 25% is real but it affects mostly simple functions that do little compute. Like a memory copy, or a quick scan of an input.</div><div>></div><div>> Daniel Lemire, "Dot product on misaligned data," in *Daniel Lemire's blog*, July 14, 2025, <a href="https://urldefense.com/v3/__https://lemire.me/blog/2025/07/14/dot-product-on-misaligned-data/__;!!ACWV5N9M2RV99hQ!K9TplKFS-WUOa0MaUq2AHthEGN65CdQg4SMS7SPHfeftYwG4A2S_2jP9hWf_6B2iBjb6V-MeGAQCO-hOcQ$">https://urldefense.com/v3/__https://lemire.me/blog/2025/07/14/dot-product-on-misaligned-data/__;!!ACWV5N9M2RV99hQ!K9TplKFS-WUOa0MaUq2AHthEGN65CdQg4SMS7SPHfeftYwG4A2S_2jP9hWf_6B2iBjb6V-MeGAQCO-hOcQ$</a> .</div><div>></div><div><br></div><div>Good point. 
We expect such affected functions to be common, right?</div><div><br></div><div>Surely vectorized hashcode or comparison is affected about as much</div><div>as dot product.</div><div><br></div><div>Also, we are in the habit of worrying about micro-benchmarks, which</div><div>are usually oversimplified, but may well show the 25% effect.</div><div>This is a sad habit for us platform folks, but a necessary one.</div><div><br></div><div>Finally, Peter’s very interesting "trip report" showed another</div><div>common story: The 25% showed up only after he removed some</div><div>performance bugs (accumulator bottlenecks). It sure would be</div><div>nice to reward the diligent coder with the full benefit,</div><div>rather than take away the last 25% due to misalignment.</div><div><br></div><div>What I think (or hope) is that a large array hyper-alignment</div><div>feature could silently patch up a number of such artifacts.</div><div><br></div><div>— John</div><div><br></div></blockquote><div><br></div></body></html>