RFR: 8324751: C2 SuperWord: Aliasing Analysis runtime check [v18]

Fri Aug 22 08:58:05 UTC 2025

On Thu, 21 Aug 2025 18:35:40 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   disable flag if not possible
>
> It would be nice to have code profiling tool which could show which part in code for these two cases is hot. Instead of guessing based on whole system behaviors.

@vnkozlov - ⚠ I'm now experimenting with replacing the fast-path with a `HaltNode`. With that change, a lot of assembly disappears (100-200 lines), and the performance difference goes away, at least for the byte case (strangely, not for the int case). Maybe it is code locality? Maybe the `perf stat` `tma_frontend_bound` results were misleading? ⚠

But I'm not sure about locality either. With a sufficiently large number of loop iterations, the slow-loop body should eventually be fully cached, so the performance difference should fade away with larger loops. But that does not seem to be the case.

Here is the `HaltNode` [patch](https://github.com/user-attachments/files/21934393/patch.txt).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24278#issuecomment-3213613126