[foreign-memaccess+abi] RFR: Performance improvement to unchecked segment ofNativeRestricted [v3]
Radoslaw Smogura
github.com+7535718+rsmogura at openjdk.java.net
Sat Jan 16 02:08:55 UTC 2021
> This change removes the range and temporal checks for the `ofNativeRestricted` segment by turning them into no-ops. Because this segment is global, these checks are not needed.
>
> The generated native code is smaller and, depending on the CPU, execution outperforms plain Java arrays.
> Changed:
> Benchmark                           Mode  Cnt          Score         Error  Units
> AccessBenchmark.foreignAddress     thrpt    5  128946129.691  ± 317433.113  ops/s
> AccessBenchmark.foreignAddressRaw  thrpt    5  136883439.221  ± 749390.255  ops/s
> AccessBenchmark.target             thrpt    5  125325586.957  ±  32129.931  ops/s
> Base:
> Benchmark                           Mode  Cnt          Score         Error  Units
> AccessBenchmark.foreignAddress     thrpt    5  125257424.876  ± 230508.169  ops/s
> AccessBenchmark.foreignAddressRaw  thrpt    5  128818591.434  ± 241806.765  ops/s
> AccessBenchmark.target             thrpt    5  125083379.819  ± 184070.467  ops/s
> ---
> This PR is a replacement for https://github.com/openjdk/panama-foreign/pull/431 (OCA)
> and was partially discussed (before the changes) in https://mail.openjdk.java.net/pipermail/panama-dev/2021-January/011747.htm
>
> ---
> Benchmark
> @State(Scope.Thread)
> public class AccessBenchmark {
>     // Global segment; offsets passed to the var handle are absolute addresses.
>     static final MemorySegment ms = MemorySegment.ofNativeRestricted();
>     static final VarHandle intHandle = MemoryHandles.varHandle(int.class, ByteOrder.nativeOrder());
>
>     int[] intData = new int[12];
>     volatile int intDataOffset = 0;
>
>     volatile MemoryAddress address;
>     volatile long addressRaw;
>
>     @Setup
>     public void setup() {
>         // Local segment (shadows the static `ms`); only its address is kept.
>         var ms = MemorySegment.allocateNative(256);
>         address = ms.address();
>         addressRaw = address.toRawLongValue();
>     }
>
>     // Baseline: plain Java array reads.
>     @Benchmark
>     public void target(Blackhole bh) {
>         int[] local = intData;
>         int localOffset = intDataOffset;
>         bh.consume(local[localOffset]);
>         bh.consume(local[localOffset + 1]);
>     }
>
>     @Benchmark
>     public void foreignAddress(Blackhole bh) {
>         var a = address;
>         bh.consume((int) intHandle.get(ms, a.addOffset(0).toRawLongValue()));
>         bh.consume((int) intHandle.get(ms, a.addOffset(4).toRawLongValue()));
>     }
>
>     @Benchmark
>     public void foreignAddressRaw(Blackhole bh) {
>         var a = addressRaw;
>         bh.consume((int) intHandle.get(ms, a));
>         bh.consume((int) intHandle.get(ms, a + 4));
>     }
> }
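For readers unfamiliar with the two checks named in the description, the idea can be sketched with a toy model. This is not the `jdk.incubator.foreign` implementation: the classes `CheckedSegment` and `GlobalSegment` and their methods are hypothetical, and a `ByteBuffer` stands in for native memory. It only illustrates what "range check" and "temporal check" mean, and how a segment that spans everything and is never closed might turn both into no-ops.

```java
import java.nio.ByteBuffer;

// Toy model only: NOT the jdk.incubator.foreign implementation.
// All class and method names here are hypothetical.
class SegmentCheckSketch {

    /** A bounded, closeable view over a buffer; every access is checked. */
    static class CheckedSegment {
        private final ByteBuffer memory;
        private final long length;
        private volatile boolean alive = true;

        CheckedSegment(ByteBuffer memory, long length) {
            this.memory = memory;
            this.length = length;
        }

        void close() { alive = false; }

        /** Temporal check: the segment must not have been closed. */
        void checkAlive() {
            if (!alive) throw new IllegalStateException("segment is closed");
        }

        /** Range check: [offset, offset + size) must lie inside the segment. */
        void checkBounds(long offset, long size) {
            if (offset < 0 || offset > length - size) {
                throw new IndexOutOfBoundsException("offset " + offset);
            }
        }

        final int getInt(long offset) {
            checkAlive();
            checkBounds(offset, Integer.BYTES);
            // The backing ByteBuffer still does its own bounds check;
            // the toy only models the segment-level checks.
            return memory.getInt((int) offset);
        }
    }

    /** Models a global segment: it spans everything and is never closed. */
    static final class GlobalSegment extends CheckedSegment {
        GlobalSegment(ByteBuffer memory) { super(memory, Long.MAX_VALUE); }

        @Override void close() { throw new UnsupportedOperationException("global segment"); }
        @Override void checkAlive() { /* no-op: this segment can never be closed */ }
        @Override void checkBounds(long offset, long size) { /* no-op: spans everything */ }
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putInt(4, 42);
        CheckedSegment checked = new CheckedSegment(buf, 8);
        CheckedSegment global = new GlobalSegment(buf);
        System.out.println(checked.getInt(4));
        System.out.println(global.getInt(12)); // outside the 8-byte view, inside the buffer
    }
}
```

In this sketch the saving comes from the JIT being able to drop two empty calls. Whether the actual patch is structured this way is not stated in this message.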
Radoslaw Smogura has updated the pull request incrementally with one additional commit since the last revision:
JMH Benchmarks for evaluation of `ofNativeRestricted`
Original benchmark comparing the performance of accessing
data using var handles vs ordinary arrays
Modified the existing `LoopOverNonConstant` benchmark to
show the differences between segments with range and temporal checks and segments without them.
```
Benchmark                                Mode  Cnt  Score    Error  Units
LoopOverNonConstant.BB_get               avgt   30  3.885  ± 0.003  ns/op
LoopOverNonConstant.BB_loop              avgt   30  0.229  ± 0.001  ms/op
LoopOverNonConstant.global_segment_get   avgt   30  3.663  ± 0.006  ns/op
LoopOverNonConstant.global_segment_loop  avgt   30  0.374  ± 0.001  ms/op
LoopOverNonConstant.segment_get          avgt   30  5.514  ± 0.023  ns/op
LoopOverNonConstant.segment_loop         avgt   30  0.229  ± 0.001  ms/op
```
Results for `ofNativeRestricted` without the optimization:
```
LoopOverNonConstant.global_segment_get   avgt   30  4.126  ± 0.006  ns/op
LoopOverNonConstant.global_segment_loop  avgt   30  0.603  ± 0.001  ms/op
```
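To put the scores reported in this message side by side, the relative change between two runs can be computed as below. This is a small helper sketch and not part of the PR: the class name `BenchmarkDelta` is made up, and the inputs are the scores copied from the tables in this message, with error margins ignored.

```java
// Hypothetical helper, not part of the PR: relative change between two JMH scores.
class BenchmarkDelta {

    /** Relative change from base to changed, as a fraction of base. */
    static double relativeChange(double base, double changed) {
        return (changed - base) / base;
    }

    public static void main(String[] args) {
        // avgt rows (lower is better): not optimized vs. optimized global segment
        System.out.printf("global_segment_get:  %+.1f%%%n", 100 * relativeChange(4.126, 3.663));
        System.out.printf("global_segment_loop: %+.1f%%%n", 100 * relativeChange(0.603, 0.374));
        // thrpt rows (higher is better): Base vs. Changed
        System.out.printf("foreignAddress:      %+.1f%%%n",
                100 * relativeChange(125257424.876, 128946129.691));
        System.out.printf("foreignAddressRaw:   %+.1f%%%n",
                100 * relativeChange(128818591.434, 136883439.221));
    }
}
```

On these figures the optimized global segment takes roughly 11% less time per `get` and roughly 38% less per loop than the non-optimized one, although `global_segment_loop` (0.374 ms/op) remains slower than `segment_loop` (0.229 ms/op).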
-------------
Changes:
- all: https://git.openjdk.java.net/panama-foreign/pull/437/files
- new: https://git.openjdk.java.net/panama-foreign/pull/437/files/c7d4fdf1..ee220f9d
Webrevs:
- full: https://webrevs.openjdk.java.net/?repo=panama-foreign&pr=437&range=02
- incr: https://webrevs.openjdk.java.net/?repo=panama-foreign&pr=437&range=01-02
Stats: 127 lines in 2 files changed: 125 ins; 0 del; 2 mod
Patch: https://git.openjdk.java.net/panama-foreign/pull/437.diff
Fetch: git fetch https://git.openjdk.java.net/panama-foreign pull/437/head:pull/437
PR: https://git.openjdk.java.net/panama-foreign/pull/437