[foreign-memaccess+abi] RFR: Performance improvement to unchecked segment ofNativeRestricted [v2]
Radoslaw Smogura
github.com+7535718+rsmogura at openjdk.java.net
Thu Jan 14 23:24:23 UTC 2021
> This change removes (by turning them into no-ops) the range and temporal checks for the `ofNativeRestricted` segment. Because this segment is global, these checks are not needed.
>
> The generated native code is smaller, and execution outperforms plain Java arrays (depending on the CPU).
> Changed
> Benchmark                           Mode  Cnt          Score          Error  Units
> AccessBenchmark.foreignAddress     thrpt    5  128946129.691 ±   317433.113  ops/s
> AccessBenchmark.foreignAddressRaw  thrpt    5  136883439.221 ±   749390.255  ops/s
> AccessBenchmark.target             thrpt    5  125325586.957 ±    32129.931  ops/s
> Base
> Benchmark                           Mode  Cnt          Score          Error  Units
> AccessBenchmark.foreignAddress     thrpt    5  125257424.876 ±   230508.169  ops/s
> AccessBenchmark.foreignAddressRaw  thrpt    5  128818591.434 ±   241806.765  ops/s
> AccessBenchmark.target             thrpt    5  125083379.819 ±   184070.467  ops/s
> ---
> This PR is a replacement for https://github.com/openjdk/panama-foreign/pull/431 (OCA)
> and was partially discussed (before changes) in https://mail.openjdk.java.net/pipermail/panama-dev/2021-January/011747.htm
>
> ---
> Benchmark
> @State(Scope.Thread)
> public class AccessBenchmark {
>     static final MemorySegment ms = MemorySegment.ofNativeRestricted();
>     static final VarHandle intHandle = MemoryHandles.varHandle(int.class, ByteOrder.nativeOrder());
>
>     int[] intData = new int[12];
>     volatile int intDataOffset = 0;
>
>     volatile MemoryAddress address;
>     volatile long addressRaw;
>
>     @Setup
>     public void setup() {
>         var ms = MemorySegment.allocateNative(256);
>         address = ms.address();
>         addressRaw = address.toRawLongValue();
>     }
>
>     @Benchmark
>     public void target(Blackhole bh) {
>         int[] local = intData;
>         int localOffset = intDataOffset;
>         bh.consume(local[localOffset]);
>         bh.consume(local[localOffset + 1]);
>     }
>
>     @Benchmark
>     public void foreignAddress(Blackhole bh) {
>         var a = address;
>         bh.consume((int) intHandle.get(ms, a.addOffset(0).toRawLongValue()));
>         bh.consume((int) intHandle.get(ms, a.addOffset(4).toRawLongValue()));
>     }
>
>     @Benchmark
>     public void foreignAddressRaw(Blackhole bh) {
>         var a = addressRaw;
>         bh.consume((int) intHandle.get(ms, a));
>         bh.consume((int) intHandle.get(ms, a + 4));
>     }
> }
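The idea in the description, turning the range and temporal checks into no-ops for a segment that spans all memory and cannot be closed, can be pictured with a small stand-alone sketch. Everything here is hypothetical: the names `Checks`, `Bounded`, `Everything` and `access` are made up for illustration and are not taken from the pull request or from `jdk.incubator.foreign`, and the "memory read" is only a placeholder.

```java
// Hypothetical sketch; none of these names come from the actual patch.
final class NoOpCheckSketch {

    /** Minimal stand-in for a segment: a range check plus a temporal (liveness) check. */
    interface Checks {
        void checkRange(long offset, long length);
        void checkAlive();
    }

    /** A bounded, closeable segment has to validate every access. */
    static final class Bounded implements Checks {
        private final long size;
        private volatile boolean alive = true;

        public Bounded(long size) { this.size = size; }

        public void close() { alive = false; }

        @Override
        public void checkRange(long offset, long length) {
            // Written as "offset > size - length" to avoid overflow in offset + length.
            if (offset < 0 || length < 0 || offset > size - length) {
                throw new IndexOutOfBoundsException("offset=" + offset + ", length=" + length);
            }
        }

        @Override
        public void checkAlive() {
            if (!alive) throw new IllegalStateException("segment is closed");
        }
    }

    /** A segment covering all memory that can never be closed: both checks are empty. */
    static final class Everything implements Checks {
        @Override public void checkRange(long offset, long length) { /* nothing to check */ }
        @Override public void checkAlive() { /* cannot be closed */ }
    }

    /** Shared access path; for Everything both calls are empty methods. */
    public static long access(Checks segment, long offset, long length) {
        segment.checkAlive();
        segment.checkRange(offset, length);
        return offset; // placeholder for the actual memory read
    }

    public static void main(String[] args) {
        System.out.println(access(new Everything(), 1L << 40, 4));
        System.out.println(access(new Bounded(256), 252, 4));
    }
}
```

Whether a JIT compiler actually eliminates the empty calls depends on inlining and on the type profile at the call site, so this only shows the shape of the change, not its measured effect.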
Radoslaw Smogura has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains four new commits since the last revision:
- Small naming & comments improvements.
- Revert "Next iteration of tuning"
This change was introduced because the JVM was found to perform
a null check and inline the empty method. However, this
phenomenon can no longer be observed, so the change is reverted,
as it could cause a number of NPEs.
This reverts commit 9e29818b8a2f4ba3a3bec8a1edace072c993ccd4.
- Next iteration of tuning
After checking the source code, it looks like it is better to set the scope to `null`.
The results outpaced Java array access.
```
Benchmark                           Mode  Cnt          Score          Error  Units
AccessBenchmark.foreignAddress     thrpt    4   86860188.499 ± 13454393.406  ops/s
AccessBenchmark.foreignAddressRaw  thrpt    4   96150181.668 ±  7025145.700  ops/s
AccessBenchmark.target             thrpt    4   93673099.539 ± 23272596.145  ops/s
```
versus tests on the original repo
```
Benchmark                           Mode  Cnt          Score          Error  Units
AccessBenchmark.foreignAddress     thrpt    4   81907199.092 ±  2663269.652  ops/s
AccessBenchmark.foreignAddressRaw  thrpt    4   83629168.611 ±  1025857.535  ops/s
AccessBenchmark.target             thrpt    4   94023553.582 ±  6128411.421  ops/s
```
# Benchmark code
```
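import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

import jdk.incubator.foreign.MemoryAddress;
import jdk.incubator.foreign.MemoryHandles;
import jdk.incubator.foreign.MemorySegment;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Thread)
public class AccessBenchmark {
    // Global, unchecked segment under test.
    static final MemorySegment ms = MemorySegment.ofNativeRestricted();
    static final VarHandle intHandle = MemoryHandles.varHandle(int.class, ByteOrder.nativeOrder());

    int[] intData = new int[12];
    volatile int intDataOffset = 0;

    volatile MemoryAddress address;
    volatile long addressRaw;

    @Setup
    public void setup() {
        // Local segment (shadows the static field); only its address is kept.
        var ms = MemorySegment.allocateNative(256);
        address = ms.address();
        addressRaw = address.toRawLongValue();
    }

    // Baseline: plain Java array access.
    @Benchmark
    public void target(Blackhole bh) {
        int[] local = intData;
        int localOffset = intDataOffset;
        bh.consume(local[localOffset]);
        bh.consume(local[localOffset + 1]);
    }

    // Access through a MemoryAddress converted to a raw offset.
    @Benchmark
    public void foreignAddress(Blackhole bh) {
        var a = address;
        bh.consume((int) intHandle.get(ms, a.addOffset(0).toRawLongValue()));
        bh.consume((int) intHandle.get(ms, a.addOffset(4).toRawLongValue()));
    }

    // Access through a precomputed raw long address.
    @Benchmark
    public void foreignAddressRaw(Blackhole bh) {
        var a = addressRaw;
        bh.consume((int) intHandle.get(ms, a));
        bh.consume((int) intHandle.get(ms, a + 4));
    }
}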
```
- [WIP] Performance improvement to unchecked segment ofNativeRestricted
Accessing native memory using ofNativeRestricted could generate range and temporal checks. As this scope can't be closed and represents the whole memory, these checks are not needed; they are leftovers from NativeMemorySegmentImpl.
To overcome this, adding a special segment and scope that allow HotSpot to better optimize the code would be a good solution.
The JMH benchmarks, baselined against the performance of plain array access, show an improvement from 89% of array access to 94% (% = foreignAddress / target).
Improved version
```
Benchmark                           Mode  Cnt          Score          Error  Units
AccessBenchmark.foreignAddress     thrpt    4   87981021.113 ±  4496953.479  ops/s
AccessBenchmark.target             thrpt    4   92840761.490 ± 15994108.441  ops/s
```
Original version
```
Benchmark                           Mode  Cnt          Score          Error  Units
AccessBenchmark.foreignAddress     thrpt    4   82076915.820 ±  3076568.791  ops/s
AccessBenchmark.target             thrpt    4   91962637.002 ±  5104697.571  ops/s
```
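The percentages quoted in the last commit message (% = foreignAddress / target) can be recomputed from the two tables above. A minimal sketch, in which the class and method names are illustrative only and the scores are copied from the tables:

```java
// Illustrative helper; the names are not part of the benchmark or the patch.
final class RatioSketch {

    /** Foreign-access throughput relative to plain array access, as a percentage. */
    public static double percentOfTarget(double foreignOpsPerSec, double targetOpsPerSec) {
        return 100.0 * foreignOpsPerSec / targetOpsPerSec;
    }

    public static void main(String[] args) {
        // Scores from the "Original version" and "Improved version" tables.
        double original = percentOfTarget(82076915.820, 91962637.002);
        double improved = percentOfTarget(87981021.113, 92840761.490);
        System.out.printf("original: %.1f%% of array access%n", original);
        System.out.printf("improved: %.1f%% of array access%n", improved);
    }
}
```

The ratios use the mean scores only. The reported error on `target` in the improved run (± 15994108.441 ops/s) is large compared with the difference being measured, so the percentages are best read as indicative.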
-------------
Changes:
- all: https://git.openjdk.java.net/panama-foreign/pull/437/files
- new: https://git.openjdk.java.net/panama-foreign/pull/437/files/98ad3a9c..c7d4fdf1
Webrevs:
- full: https://webrevs.openjdk.java.net/?repo=panama-foreign&pr=437&range=01
- incr: https://webrevs.openjdk.java.net/?repo=panama-foreign&pr=437&range=00-01
Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod
Patch: https://git.openjdk.java.net/panama-foreign/pull/437.diff
Fetch: git fetch https://git.openjdk.java.net/panama-foreign pull/437/head:pull/437
PR: https://git.openjdk.java.net/panama-foreign/pull/437