[foreign-memaccess+abi] RFR: Performance improvement to unchecked segment ofNativeRestricted [v3]
Maurizio Cimadamore
mcimadamore at openjdk.java.net
Sat Jan 16 15:34:06 UTC 2021
On Sat, 16 Jan 2021 02:08:55 GMT, Radoslaw Smogura <github.com+7535718+rsmogura at openjdk.org> wrote:
>> This change removes (by turning them into no-ops) the range and temporal checks for the `ofNativeRestricted` segment. As this segment is global, those checks are not needed.
>>
>> Generated native code is smaller, and execution outperforms Java native arrays (depending on CPU)
>> Changed
>> Benchmark                           Mode  Cnt          Score        Error  Units
>> AccessBenchmark.foreignAddress     thrpt    5  128946129.691 ± 317433.113  ops/s
>> AccessBenchmark.foreignAddressRaw  thrpt    5  136883439.221 ± 749390.255  ops/s
>> AccessBenchmark.target             thrpt    5  125325586.957 ±  32129.931  ops/s
>> Base
>> Benchmark                           Mode  Cnt          Score        Error  Units
>> AccessBenchmark.foreignAddress     thrpt    5  125257424.876 ± 230508.169  ops/s
>> AccessBenchmark.foreignAddressRaw  thrpt    5  128818591.434 ± 241806.765  ops/s
>> AccessBenchmark.target             thrpt    5  125083379.819 ± 184070.467  ops/s
>> ---
>> This PR is a replacement for https://github.com/openjdk/panama-foreign/pull/431 (OCA)
>> and was partially discussed (before changes) in https://mail.openjdk.java.net/pipermail/panama-dev/2021-January/011747.htm
>>
>> ---
>> Benchmark
>> @State(Scope.Thread)
>> public class AccessBenchmark {
>>     static final MemorySegment ms = MemorySegment.ofNativeRestricted();
>>     static final VarHandle intHandle = MemoryHandles.varHandle(int.class, ByteOrder.nativeOrder());
>>
>>     int[] intData = new int[12];
>>     volatile int intDataOffset = 0;
>>
>>     volatile MemoryAddress address;
>>     volatile long addressRaw;
>>
>>     @Setup
>>     public void setup() {
>>         var ms = MemorySegment.allocateNative(256);
>>         address = ms.address();
>>         addressRaw = address.toRawLongValue();
>>     }
>>
>>     @Benchmark
>>     public void target(Blackhole bh) {
>>         int[] local = intData;
>>         int localOffset = intDataOffset;
>>         bh.consume(local[localOffset]);
>>         bh.consume(local[localOffset + 1]);
>>     }
>>
>>     @Benchmark
>>     public void foreignAddress(Blackhole bh) {
>>         var a = address;
>>         bh.consume((int) intHandle.get(ms, a.addOffset(0).toRawLongValue()));
>>         bh.consume((int) intHandle.get(ms, a.addOffset(4).toRawLongValue()));
>>     }
>>
>>     @Benchmark
>>     public void foreignAddressRaw(Blackhole bh) {
>>         var a = addressRaw;
>>         bh.consume((int) intHandle.get(ms, a));
>>         bh.consume((int) intHandle.get(ms, a + 4));
>>     }
>> }
>
> Radoslaw Smogura has updated the pull request incrementally with one additional commit since the last revision:
>
> JMH Benchmarks for evaluation of `ofNativeRestricted`
>
> Original benchmark comparing performance of accessing
> data using var handles vs ordinary arrays
>
> Modified existing benchmark `LoopOverNonConstant` to
> show the difference between range/temporal-checking and non-checking segments.
>
> ```
> Benchmark Mode Cnt Score Error Units
> LoopOverNonConstant.BB_get avgt 30 3.885 ± 0.003 ns/op
> LoopOverNonConstant.BB_loop avgt 30 0.229 ± 0.001 ms/op
> LoopOverNonConstant.global_segment_get avgt 30 3.663 ± 0.006 ns/op
> LoopOverNonConstant.global_segment_loop avgt 30 0.374 ± 0.001 ms/op
> LoopOverNonConstant.segment_get avgt 30 5.514 ± 0.023 ns/op
> LoopOverNonConstant.segment_loop avgt 30 0.229 ± 0.001 ms/op
> ```
> Not optimized `ofNativeRestricted`
> ```
> LoopOverNonConstant.global_segment_get avgt 30 4.126 ± 0.006 ns/op
> LoopOverNonConstant.global_segment_loop avgt 30 0.603 ± 0.001 ms/op
> ```
test/micro/org/openjdk/bench/jdk/incubator/foreign/LoopOverNonConstant.java line 65:
> 63: static final MemorySegment globalRestrictedSegment = MemorySegment.ofNativeRestricted();
> 64:
> 65: static final VarHandle VH_int = MemoryLayout.ofSequence(JAVA_INT).varHandle(int.class, sequenceElement());
This is a strided VarHandle, i.e. it takes a logical index (the index of the sequence element inside the segment) and dereferences that element. So, you use it like this:
VH_int.get(segment, 0); // first int element
VH_int.get(segment, 1); // second int element (segment base address + 4)
VH_int.get(segment, 2); // third int element (segment base address + 8)
...
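To make the logical-index vs. byte-offset distinction concrete, here is a minimal sketch that runs on a stock JDK without the incubator module. It is only an analogy: it uses `MethodHandles.byteBufferViewVarHandle` (which takes a byte offset) over a heap `ByteBuffer`, and the helper `getAtIndex` and the class name are hypothetical, introduced just to show the `index * stride` scaling that a strided handle performs for you.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Analogy only: a ByteBuffer stands in for the memory segment.
class StridedIndexSketch {
    // Unstrided handle: coordinates are (ByteBuffer, int byteOffset).
    static final VarHandle INT_AT_BYTE_OFFSET =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.nativeOrder());

    // Hypothetical helper mimicking a strided handle:
    // the caller passes an element index, the scaling by 4 happens here.
    static int getAtIndex(ByteBuffer buf, int index) {
        return (int) INT_AT_BYTE_OFFSET.get(buf, index * Integer.BYTES);
    }

    // Three ints stored back to back at byte offsets 0, 4 and 8.
    static ByteBuffer sample() {
        ByteBuffer buf = ByteBuffer.allocate(3 * Integer.BYTES).order(ByteOrder.nativeOrder());
        buf.putInt(0, 10);
        buf.putInt(4, 20);
        buf.putInt(8, 30);
        return buf;
    }

    public static void main(String[] args) {
        ByteBuffer buf = sample();
        for (int i = 0; i < 3; i++) {
            // Logical index i and byte offset i * 4 address the same element.
            int viaIndex = getAtIndex(buf, i);
            int viaOffset = (int) INT_AT_BYTE_OFFSET.get(buf, i * Integer.BYTES);
            System.out.println(i + ": " + viaIndex + " / " + viaOffset);
        }
    }
}
```

The point is that with a strided handle the second coordinate is the `i` in the loop above, never `i * 4` and never an address.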
test/micro/org/openjdk/bench/jdk/incubator/foreign/LoopOverNonConstant.java line 140:
> 138: int res = 0;
> 139: for (int i = 0; i < ELEM_SIZE; i ++) {
> 140: res += (int) VH_int.get(globalRestrictedSegment, segment_addr_idx + i);
This looks wrong. You are passing an absolute address to a "logical index" argument. I see that you are attempting to divide the segment address by the carrier size, and that kind of offsets things, but it still leaves you with suboptimal performance.
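The arithmetic behind that observation can be sketched in plain Java. This is not the benchmark code: the class, the method names and the sample addresses are hypothetical, and the sketch only models the `index * stride` scaling a strided handle applies, assuming a 4-byte `int` carrier. It also shows a side effect of the trick that I am adding here as an aside: integer division truncates, so the round trip only recovers the original address when it is a multiple of the carrier size.

```java
// Models the address computation only; no memory is accessed.
class AbsoluteIndexSketch {
    static final long STRIDE = Integer.BYTES; // assumed carrier size: 4 bytes

    // What a strided handle does with its logical index.
    static long scale(long logicalIndex) {
        return logicalIndex * STRIDE;
    }

    // The pattern under review: absolute address divided by the carrier
    // size, then used as a logical index (global segment base is 0).
    static long viaDividedAddress(long absoluteAddress, long i) {
        long index = absoluteAddress / STRIDE + i; // truncating division
        return scale(index);
    }

    // Intended use: index relative to the segment base address.
    static long viaSegmentIndex(long segmentBase, long i) {
        return segmentBase + scale(i);
    }

    public static void main(String[] args) {
        long aligned = 0x1000L;   // hypothetical 4-byte aligned address
        long unaligned = 0x1002L; // hypothetical unaligned address
        System.out.println(Long.toHexString(viaDividedAddress(aligned, 2)));
        System.out.println(Long.toHexString(viaSegmentIndex(aligned, 2)));
        System.out.println(Long.toHexString(viaDividedAddress(unaligned, 0)));
        System.out.println(Long.toHexString(viaSegmentIndex(unaligned, 0)));
    }
}
```

For the aligned address both computations agree; for the unaligned one the divided form lands on a different address than the one requested.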
-------------
PR: https://git.openjdk.java.net/panama-foreign/pull/437