RFR: 8339531: Improve performance of MemorySegment::mismatch [v12]
Maurizio Cimadamore
mcimadamore at openjdk.org
Thu Sep 12 11:37:08 UTC 2024
On Thu, 12 Sep 2024 11:32:22 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:
>> src/java.base/share/classes/jdk/internal/foreign/SegmentBulkOperations.java line 204:
>>
>>> 202: // This gives about 20% performance increase for large values of `length`.
>>> 203: // On non-Aarch64 architectures, the unroll code will be eliminated at compile time.
>>> 204: if (Architecture.isAARCH64() && NATIVE_THRESHOLD_MISMATCH > 64) {
>>
>> I'm a bit opposed to this - it goes in the direction of adding a lot of transient complexity when, in reality, the underlying issue is that the aarch64 mismatch intrinsic should be fixed. Tinkering with thresholds is borderline, but still acceptable - having different implementations, one per platform, starts to look "more wrong".
>
> In other words, I don't think the goal of this (and related) PRs is to "improve mismatch so that it blows other alternatives - like Unsafe, or arrays - out of the water", as much as it is to "make sure that using MemorySegment::mismatch is competitive with other offerings".
That said, an interesting part of these PRs is that we have uncovered quite a lot of issues, both with our intrinsics (e.g. `fill` is not vectorized and works badly on Windows, mismatch works poorly on aarch64) *and* missed optimization opportunities - e.g. "short" segment loops are not optimized as tightly as they could be. But it is not the job of these PRs to fix all these issues. The main focus remains: for small sizes it is not worth going down the intrinsics lane. Let's stick to that (in the interest of keeping the review focused).
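As an aside, the reason the quoted snippet can claim "the unroll code will be eliminated at compile time" is that a check like `Architecture.isAARCH64()` evaluates to a compile-time constant, so the JIT dead-code-eliminates the guarded branch on other platforms. The following is a minimal standalone sketch of that pattern, not the actual `SegmentBulkOperations` code; `IS_AARCH64` here is a hypothetical stand-in for the JDK-internal `jdk.internal.util.Architecture` check, and the unroll shape is illustrative only:

```java
public class UnrollSketch {
    // Hypothetical stand-in for Architecture.isAARCH64(); because it is
    // static final, the JIT treats it as a constant and removes the dead branch.
    static final boolean IS_AARCH64 =
            System.getProperty("os.arch", "").contains("aarch64");

    // Returns the offset of the first differing byte, or -1 if none.
    static int firstMismatch(byte[] a, byte[] b, int length) {
        int i = 0;
        if (IS_AARCH64 && length > 64) {
            // Manually unrolled comparison, 8 bytes per outer iteration.
            for (; i < length - 7; i += 8) {
                for (int j = 0; j < 8; j++) {
                    if (a[i + j] != b[i + j]) {
                        return i + j;
                    }
                }
            }
        }
        // Scalar tail (and the entire loop on non-aarch64 platforms).
        for (; i < length; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] x = new byte[100];
        byte[] y = new byte[100];
        y[70] = 1;
        System.out.println(firstMismatch(x, y, 100)); // prints 70
        y[70] = 0;
        System.out.println(firstMismatch(x, y, 100)); // prints -1
    }
}
```

Either path yields the same result; the `static final` guard only decides which loop shape the JIT keeps, which is why such platform gates cost nothing on the platforms that do not take them.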
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/20848#discussion_r1756682363
More information about the core-libs-dev
mailing list