[foreign-memaccess+abi] RFR: 8311594: Avoid GlobalSession liveness check [v2]
Maurizio Cimadamore
mcimadamore at openjdk.org
Mon Jul 31 11:23:08 UTC 2023
On Mon, 10 Jul 2023 15:09:56 GMT, Jorn Vernee <jvernee at openjdk.org> wrote:
>> This patch overrides `checkValidStateRaw` in GlobalSession, to do nothing. The current `checkValidStateRaw` in MemorySessionImpl checks the owner thread, and liveness, which are both not needed for the global session.
>>
>> Since the state field is mutable, I found that the JIT can not fold away the liveness check. This has a very marginal effect on performance, but since the fix is also very non-intrusive, I think it's okay to do.
>>
>> I've added 2 new benchmarks that I've used to measure the difference. One measures the performance of MS::copy, and the other measures MS::get. Both use a special `ALL` segment which spans `0` to `Long.MAX_VALUE`, which means that the target address can be used as an offset when doing the accesses. This helps the bounds checks being eliminated, and makes the result of overriding `checkValidStateRaw` in GlobalSession easier to see.
>>
>> Here are some of the interesting numbers:
>>
>> Before:
>>
>>
>> Benchmark Mode Cnt Score Error Units
>> MemorySegmentCopyALL.panama_ALL_to_ALL avgt 30 7.788 ± 0.029 ns/op
>> MemorySegmentCopyALL.unsafe_array_to_addr avgt 30 7.543 ± 0.018 ns/op
>>
>> Benchmark Mode Cnt Score Error Units
>> MemorySegmentGet.segment_ALL_unaligned avgt 30 0.486 ± 0.002 ns/op
>> MemorySegmentGet.unsafe avgt 30 0.398 ± 0.003 ns/op
>>
>>
>> After:
>>
>>
>> Benchmark Mode Cnt Score Error Units
>> MemorySegmentCopyALL.panama_ALL_to_ALL avgt 30 7.571 ± 0.017 ns/op
>> MemorySegmentCopyALL.unsafe_array_to_addr avgt 30 7.550 ± 0.019 ns/op
>>
>> Benchmark Mode Cnt Score Error Units
>> MemorySegmentGet.segment_ALL_unaligned avgt 30 0.409 ± 0.002 ns/op
>> MemorySegmentGet.unsafe avgt 30 0.397 ± 0.003 ns/op
>
> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
>
> Add global session polluted benchmark
> For liveness check, a solution on the top of my head is to push the branch to the slow path. Instead of having one `state` field, we can have 2, which can take values according to the state of the session, (0, x): Open, (1, 0): Closing, (1, 1): Closed. These can be marked as stable, since they do not transit from value 1 to value 0. Then, we can special case the global session to have the `openState` field being 1, and we do type branching when encountering a 1 when reading this state. This have the advantage that it does not affect the execution of other sessions, as well as being able to fold the check for the global ones.
>
> For ownership check, maybe we can tell the compiler to trust this particular field. Stable does not work here, though maybe we can add a dummy `Thread` object to use as the pivot value instead of `null`. But comparing to another pointer is a bit tad more expensive than comparing to `null`, so changing the compiler seems more preferable.
>
> For range check, maybe we can do the same as liveness one, instead of failing on out of bounds, we branch on a constant field to see if it is a global one, and only throw if it is not.
>
> Thanks a lot.
What you suggest are sensible optimization tactics. However, it is unclear how much this path needs to be optimized further. In most cases, when we see a significant performance difference between MS and Unsafe, the liveness check is never the culprit, the bound check is.
And, when using tactics to e.g. generate wrapper segments on the fly to avoid bound checks [1], we can observe that liveness check is also gone from the corresponding assembly. So, this optimization seems to be in YAGNI territory, at least for now.
[1] - https://mail.openjdk.org/pipermail/panama-dev/2023-July/019487.html
-------------
PR Comment: https://git.openjdk.org/panama-foreign/pull/844#issuecomment-1658176358
More information about the panama-dev
mailing list