[foreign-memaccess+abi] RFR: 8311594: Avoid GlobalSession liveness check [v2]

Tue Jul 11 14:43:23 UTC 2023

On Tue, 11 Jul 2023 14:24:10 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

>> There's a ton of different access patterns, and most of them cannot be implemented as simple loops. This statement concerns me the most: "a single memory access using MemorySegment, ByteBuffer or any other API which enforces some extra checks is always going to be slower than same access using Unsafe" If the MemorySegment is global and encompasses the entire address space, why should there be any extra checks at all compared to Unsafe?
> 
> The main point I'm trying to make here is that, so far at least, we're speculating as to what the cause of the regression might be. What we need to do is (a) be able to reproduce the issue you are experiencing (there is no evidence so far that the fix in this PR does anything to ameliorate your situation and I wouldn't be too surprised if it did not help at all) and (b) find some ways to mitigate it (where and if possible). For (a) we need help, as we are essentially discussing code without being able to see it, which is not optimal.

I'm working on a smaller test at the moment. The most performance-critical operation overall is a binary search within a b-tree node. Each comparison reads a 2-byte pointer that references a binary key, which itself starts with a one-byte length field. The comparison then checks the keys byte by byte.
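
For illustration only, here is a minimal sketch of the access pattern described above. The node layout, the class name `NodeSearchSketch`, and the helper `buildNode` are assumptions made up for this example, not the project's real format, and a heap `ByteBuffer` stands in for the `MemorySegment` or `Unsafe` accesses so that the sketch runs on any JDK.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class NodeSearchSketch {
    // Hypothetical node layout (an assumption for this sketch):
    //   [0, 2*n)   n 2-byte pointers in key order, each an offset into the node
    //   at offset  one length byte, followed by that many key bytes

    /** Returns the index of the key, or ~insertionPoint if it is absent. */
    static int search(ByteBuffer node, int keyCount, byte[] key) {
        int lo = 0, hi = keyCount - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int keyOff = node.getShort(mid * 2) & 0xffff; // 2-byte pointer
            int cmp = compare(node, keyOff, key);
            if (cmp < 0) lo = mid + 1;
            else if (cmp > 0) hi = mid - 1;
            else return mid;
        }
        return ~lo;
    }

    /** Unsigned byte-by-byte comparison of a stored key against a search key. */
    static int compare(ByteBuffer node, int keyOff, byte[] key) {
        int len = node.get(keyOff) & 0xff;                // one-byte length field
        int start = keyOff + 1;
        int n = Math.min(len, key.length);
        for (int i = 0; i < n; i++) {
            int a = node.get(start + i) & 0xff;
            int b = key[i] & 0xff;
            if (a != b) return a - b;
        }
        return len - key.length;                          // shorter key sorts first
    }

    /** Test helper: packs already-sorted keys (each at most 255 bytes) into a node. */
    static ByteBuffer buildNode(byte[][] sortedKeys) {
        int size = sortedKeys.length * 2;
        for (byte[] k : sortedKeys) size += 1 + k.length;
        ByteBuffer node = ByteBuffer.allocate(size).order(ByteOrder.LITTLE_ENDIAN);
        int off = sortedKeys.length * 2;
        for (int i = 0; i < sortedKeys.length; i++) {
            byte[] k = sortedKeys[i];
            node.putShort(i * 2, (short) off);
            node.put(off, (byte) k.length);
            for (int j = 0; j < k.length; j++) node.put(off + 1 + j, k[j]);
            off += 1 + k.length;
        }
        return node;
    }

    public static void main(String[] args) {
        byte[][] keys = {
            "apple".getBytes(StandardCharsets.US_ASCII),
            "banana".getBytes(StandardCharsets.US_ASCII),
            "cherry".getBytes(StandardCharsets.US_ASCII),
        };
        ByteBuffer node = buildNode(keys);
        System.out.println(search(node, keys.length, keys[1])); // prints 1
        System.out.println(search(node, keys.length,
                "blueberry".getBytes(StandardCharsets.US_ASCII))); // prints -3
    }
}
```

Every probe of the binary search performs one 2-byte read, one 1-byte read, and then a run of 1-byte reads, so any fixed per-access cost is paid many times per lookup.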

I've tried to optimize this in the past by doing larger comparisons (8 bytes at a time), but it didn't show much improvement. I assumed that HotSpot was already doing some optimizations that were good enough, and so I didn't bother with any more experiments.
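
As a rough sketch of what "8 bytes at a time" can look like (again with a `ByteBuffer` as a stand-in, and with the hypothetical names `WideCompareSketch` and `compareWide`), the comparison below reads big-endian longs and compares them as unsigned values, which gives the same ordering as an unsigned byte-by-byte comparison. It is not the code that was actually tried.

```java
import java.nio.ByteBuffer;

final class WideCompareSketch {
    /**
     * Lexicographic unsigned comparison of two byte ranges, 8 bytes at a time.
     * Assumption: both buffers use BIG_ENDIAN order (the ByteBuffer default),
     * so the first byte of each range is the most significant byte of the long.
     */
    static int compareWide(ByteBuffer a, int aOff, int aLen,
                           ByteBuffer b, int bOff, int bLen) {
        int n = Math.min(aLen, bLen);
        int i = 0;
        // Wide part: compare whole 8-byte words while both ranges have them.
        for (; i + 8 <= n; i += 8) {
            long x = a.getLong(aOff + i);
            long y = b.getLong(bOff + i);
            if (x != y) return Long.compareUnsigned(x, y);
        }
        // Tail: fewer than 8 common bytes remain, fall back to single bytes.
        for (; i < n; i++) {
            int c = (a.get(aOff + i) & 0xff) - (b.get(bOff + i) & 0xff);
            if (c != 0) return c;
        }
        return aLen - bLen; // equal common prefix: the shorter range sorts first
    }
}
```

For short keys most of the work lands in the tail loop, which may be one reason the wider comparison showed little benefit.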

Anyhow, with a test program that stresses just this critical bottleneck, the Panama version is twice as slow as the Unsafe version. I'll work on creating something smaller and standalone that doesn't pull in the whole project.

-------------

PR Review Comment: https://git.openjdk.org/panama-foreign/pull/844#discussion_r1259842846