[foreign-memaccess+abi] RFR: Performance improvement to unchecked segment ofNativeRestricted [v4]
Maurizio Cimadamore
mcimadamore at openjdk.java.net
Wed Jan 20 14:23:52 UTC 2021
On Sat, 16 Jan 2021 21:00:04 GMT, Radoslaw Smogura <github.com+7535718+rsmogura at openjdk.org> wrote:
>> This change removes the range and temporal checks for the `ofNativeRestricted` segment by turning them into no-ops. Because this segment is global, these checks are not needed.
>>
>> The generated native code is smaller, and access outperforms plain Java arrays (depending on the CPU).
>> Changed
>> Benchmark                           Mode   Cnt          Score        Error  Units
>> AccessBenchmark.foreignAddress     thrpt    5  128946129.691 ± 317433.113  ops/s
>> AccessBenchmark.foreignAddressRaw  thrpt    5  136883439.221 ± 749390.255  ops/s
>> AccessBenchmark.target             thrpt    5  125325586.957 ±  32129.931  ops/s
>> Base
>> Benchmark                           Mode   Cnt          Score        Error  Units
>> AccessBenchmark.foreignAddress     thrpt    5  125257424.876 ± 230508.169  ops/s
>> AccessBenchmark.foreignAddressRaw  thrpt    5  128818591.434 ± 241806.765  ops/s
>> AccessBenchmark.target             thrpt    5  125083379.819 ± 184070.467  ops/s
>> ---
>> This PR is a replacement for https://github.com/openjdk/panama-foreign/pull/431 (OCA)
>> and was partially discussed (before these changes) in https://mail.openjdk.java.net/pipermail/panama-dev/2021-January/011747.htm
>>
>> ---
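>> Benchmark
>> import java.lang.invoke.VarHandle;
>> import java.nio.ByteOrder;
>> import jdk.incubator.foreign.MemoryAddress;
>> import jdk.incubator.foreign.MemoryHandles;
>> import jdk.incubator.foreign.MemorySegment;
>> import org.openjdk.jmh.annotations.Benchmark;
>> import org.openjdk.jmh.annotations.Scope;
>> import org.openjdk.jmh.annotations.Setup;
>> import org.openjdk.jmh.annotations.State;
>> import org.openjdk.jmh.infra.Blackhole;
>>
>> @State(Scope.Thread)
>> public class AccessBenchmark {
>>     // Global (unchecked) segment used for all foreign accesses below.
>>     static final MemorySegment ms = MemorySegment.ofNativeRestricted();
>>     static final VarHandle intHandle = MemoryHandles.varHandle(int.class, ByteOrder.nativeOrder());
>>
>>     int[] intData = new int[12];
>>     volatile int intDataOffset = 0;
>>
>>     volatile MemoryAddress address;
>>     volatile long addressRaw;
>>
>>     @Setup
>>     public void setup() {
>>         // Local segment (shadows the static field); only its base address is kept.
>>         var ms = MemorySegment.allocateNative(256);
>>         address = ms.address();
>>         addressRaw = address.toRawLongValue();
>>     }
>>
>>     // Baseline: two reads from a plain Java int[].
>>     @Benchmark
>>     public void target(Blackhole bh) {
>>         int[] local = intData;
>>         int localOffset = intDataOffset;
>>         bh.consume(local[localOffset]);
>>         bh.consume(local[localOffset + 1]);
>>     }
>>
>>     // Two reads through the global segment, offsets derived from a MemoryAddress.
>>     @Benchmark
>>     public void foreignAddress(Blackhole bh) {
>>         var a = address;
>>         bh.consume((int) intHandle.get(ms, a.addOffset(0).toRawLongValue()));
>>         bh.consume((int) intHandle.get(ms, a.addOffset(4).toRawLongValue()));
>>     }
>>
>>     // Two reads through the global segment, offsets taken from a raw long address.
>>     @Benchmark
>>     public void foreignAddressRaw(Blackhole bh) {
>>         var a = addressRaw;
>>         bh.consume((int) intHandle.get(ms, a));
>>         bh.consume((int) intHandle.get(ms, a + 4));
>>     }
>> }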
>
> Radoslaw Smogura has updated the pull request incrementally with one additional commit since the last revision:
>
> Replaced the strided access with a normal VarHandle.
>
> Added a no_align benchmark to compare performance with alignment checks turned off.
> ```
> Benchmark                                         Mode  Cnt  Score   Error  Units
> LoopOverNonConstant.BB_get                        avgt   30  3.892 ± 0.012  ns/op
> LoopOverNonConstant.BB_loop                       avgt   30  0.230 ± 0.001  ms/op
> LoopOverNonConstant.global_segment_get            avgt   30  3.887 ± 0.008  ns/op
> LoopOverNonConstant.global_segment_loop           avgt   30  0.396 ± 0.002  ms/op
> LoopOverNonConstant.global_segment_loop_no_align  avgt   30  0.247 ± 0.001  ms/op
> LoopOverNonConstant.segment_get                   avgt   30  5.489 ± 0.014  ns/op
> LoopOverNonConstant.segment_loop                  avgt   30  0.229 ± 0.001  ms/op
> LoopOverNonConstant.segment_loop_readonly         avgt   30  0.236 ± 0.001  ms/op
> LoopOverNonConstant.segment_loop_slice            avgt   30  0.241 ± 0.001  ms/op
> LoopOverNonConstant.segment_loop_static           avgt   30  0.230 ± 0.001  ms/op
> LoopOverNonConstant.unsafe_get                    avgt   30  3.425 ± 0.006  ns/op
> LoopOverNonConstant.unsafe_loop                   avgt   30  0.230 ± 0.001  ms/op
> ```
> Without the `ofNativeRestricted` optimization:
> ```
> LoopOverNonConstant.global_segment_get   avgt   30  4.126 ± 0.006  ns/op
> LoopOverNonConstant.global_segment_loop  avgt   30  0.603 ± 0.001  ms/op
> ```
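To put the quoted numbers side by side, the relative change between the base and changed runs can be computed directly from the reported scores. The sketch below is only arithmetic on figures copied from the tables above; the class and method names are made up for illustration, and the error margins are ignored.

```java
class SpeedupSketch {
    // Relative change in percent; positive means `changed` is larger than `base`.
    static double percentChange(double base, double changed) {
        return (changed - base) / base * 100.0;
    }

    public static void main(String[] args) {
        // Throughput in ops/s (higher is better), from the AccessBenchmark tables.
        System.out.printf("foreignAddress:      %+.2f%%%n",
                percentChange(125257424.876, 128946129.691));
        System.out.printf("foreignAddressRaw:   %+.2f%%%n",
                percentChange(128818591.434, 136883439.221));
        // Average time in ms/op (lower is better), from the LoopOverNonConstant tables.
        System.out.printf("global_segment_loop: %+.2f%%%n",
                percentChange(0.603, 0.396));
    }
}
```

Note that the two groups use different JMH modes: a positive change is an improvement for `thrpt`, while a negative change is an improvement for `avgt`.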
Looks good for now - we can reassess once the HotSpot improvements for loops with `long` induction variables start to have visible effects. Thanks!
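For readers unfamiliar with the "long in loops" remark: segment offsets are `long` values, so loops over a segment naturally use a `long` induction variable, whereas array and ByteBuffer loops use `int`. The sketch below only shows the two loop shapes on a plain array; it is an illustration with made-up names, it assumes this is the loop pattern the remark refers to, and it makes no claim about how any particular JIT build compiles either loop.

```java
class LongLoopSketch {
    // Loop with an int induction variable, the usual shape for array access.
    static long sumIntIndex(int[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
        }
        return sum;
    }

    // Same work with a long induction variable, the shape that
    // offset-based segment loops tend to have.
    static long sumLongIndex(int[] data) {
        long sum = 0;
        for (long i = 0; i < data.length; i++) {
            sum += data[(int) i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = new int[1_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        // Both loops compute the same result; only the index type differs.
        System.out.println(sumIntIndex(data) == sumLongIndex(data));
    }
}
```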
-------------
Marked as reviewed by mcimadamore (Committer).
PR: https://git.openjdk.java.net/panama-foreign/pull/437
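As a rough illustration of the idea described in the PR (range and temporal checks that become no-ops for a segment that is global), here is a minimal sketch. It is not the jdk.incubator.foreign implementation: `Region`, `GlobalRegion`, and every method name are hypothetical stand-ins chosen for this example.

```java
class CheckElisionSketch {
    // Hypothetical stand-in for a bounded, closeable segment.
    static class Region {
        final long length;
        boolean alive = true;

        Region(long length) {
            this.length = length;
        }

        // Temporal check: reject access once the region has been closed.
        void checkAlive() {
            if (!alive) {
                throw new IllegalStateException("region is closed");
            }
        }

        // Range check: reject access outside [0, length).
        void checkBounds(long offset, long size) {
            if (offset < 0 || size < 0 || offset > length - size) {
                throw new IndexOutOfBoundsException("offset " + offset + ", size " + size);
            }
        }

        final void checkAccess(long offset, long size) {
            checkAlive();
            checkBounds(offset, size);
        }
    }

    // Stand-in for a global region: it is never closed and has no meaningful
    // bounds, so both checks are overridden with empty bodies.
    static final class GlobalRegion extends Region {
        GlobalRegion() {
            super(Long.MAX_VALUE);
        }

        @Override
        void checkAlive() { }

        @Override
        void checkBounds(long offset, long size) { }
    }

    // Helper for the demo: reports whether the checks let the access through.
    static boolean accessAllowed(Region region, long offset, long size) {
        try {
            region.checkAccess(offset, size);
            return true;
        } catch (RuntimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Region bounded = new Region(256);
        System.out.println(accessAllowed(bounded, 252, 4));
        System.out.println(accessAllowed(bounded, 253, 4));
        System.out.println(accessAllowed(new GlobalRegion(), 1L << 40, 4));
    }
}
```

With empty check bodies there is nothing left for the JIT to compile on the access path of the global region, which matches the smaller generated code mentioned in the PR description. The trade-off is that such a region offers no protection against invalid addresses.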