RFR: 8339531: Improve performance of MemorySegment::mismatch [v12]

Maurizio Cimadamore mcimadamore at openjdk.org
Thu Sep 12 11:37:08 UTC 2024


On Thu, 12 Sep 2024 11:32:22 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

>> src/java.base/share/classes/jdk/internal/foreign/SegmentBulkOperations.java line 204:
>> 
>>> 202:         // This gives about 20% performance increase for large values of `length`.
>>> 203:         // On non-Aarch64 architectures, the unroll code will be eliminated at compile time.
>>> 204:         if (Architecture.isAARCH64() && NATIVE_THRESHOLD_MISMATCH > 64) {
>> 
>> I'm a bit opposed to this - it goes in the direction of adding a lot of transient complexity when, in reality, the underlying issue is that the aarch64 mismatch intrinsic should be fixed. Tinkering with thresholds is borderline, but still acceptable - having different implementations, one per platform, starts to look "more wrong".
>
> In other words, I don't think the goal of this (and related) PRs is "improve mismatch so that it blows other alternatives - like Unsafe, or arrays - out of the water", as much as it is "make sure that using MemorySegment::mismatch is competitive with other offerings".

That said, an interesting part of these PRs is that we have uncovered quite a lot of issues, both with our intrinsics (e.g. `fill` is not vectorized and works badly on Windows, mismatch works poorly on aarch64) *and* with missed optimization opportunities - e.g. "short" segment loops are not optimized as tightly as they could be. But it is not the job of these PRs to fix all those issues. The main focus remains: for small sizes it is not worth going down the intrinsics lane. Let's stick to that (in the interest of keeping the review focused).
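For readers following along, the "small sizes don't warrant intrinsics" idea can be sketched roughly as below. This is an illustrative stand-alone example, not the actual SegmentBulkOperations code: the threshold value, the class name, and the use of `Arrays.mismatch` as a stand-in for the vectorized intrinsic path are all assumptions made for the sketch.

```java
import java.util.Arrays;

// Illustrative sketch only -- not the JDK implementation.
// Idea: below a (platform-tuned) threshold, compare bytes in a plain
// Java loop, which the JIT compiles tightly for small lengths; above
// it, delegate to a vectorized routine (here, Arrays.mismatch).
public class MismatchSketch {

    // Hypothetical threshold; the real JDK value is tuned per platform.
    static final int NATIVE_THRESHOLD_MISMATCH = 64;

    // Returns the index of the first mismatching byte, or -1 if the
    // first `length` bytes of the two arrays are equal.
    static int mismatch(byte[] a, byte[] b, int length) {
        if (length < NATIVE_THRESHOLD_MISMATCH) {
            // Small sizes: a simple loop avoids intrinsic call overhead
            for (int i = 0; i < length; i++) {
                if (a[i] != b[i]) {
                    return i;
                }
            }
            return -1;
        }
        // Large sizes: hand off to the vectorized library routine
        return Arrays.mismatch(a, 0, length, b, 0, length);
    }

    public static void main(String[] args) {
        byte[] x = {1, 2, 3, 4};
        byte[] y = {1, 2, 9, 4};
        System.out.println(mismatch(x, y, 4)); // prints 2
        System.out.println(mismatch(x, x, 4)); // prints -1
    }
}
```

The point of the dispatch is exactly the one made above: the simple-loop branch keeps small operations competitive, while anything architecture-specific stays on the intrinsic side of the threshold rather than leaking into the Java code.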

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20848#discussion_r1756682363


More information about the core-libs-dev mailing list