[foreign-memaccess+abi] RFR: Performance improvement to unchecked segment ofNativeRestricted [v4]

Radoslaw Smogura github.com+7535718+rsmogura at openjdk.java.net
Sat Jan 16 21:00:04 UTC 2021


> This change removes (by turning them into no-ops) the range and temporal checks for the `ofNativeRestricted` segment. Because this segment is global, these checks are not needed.
> 
> The generated native code is smaller, and execution outperforms plain Java arrays (depending on the CPU).
> Changed
> Benchmark                           Mode  Cnt          Score        Error  Units
> AccessBenchmark.foreignAddress     thrpt    5  128946129.691 ± 317433.113  ops/s
> AccessBenchmark.foreignAddressRaw  thrpt    5  136883439.221 ± 749390.255  ops/s
> AccessBenchmark.target             thrpt    5  125325586.957 ±  32129.931  ops/s
> Base
> Benchmark                           Mode  Cnt          Score        Error  Units
> AccessBenchmark.foreignAddress     thrpt    5  125257424.876 ± 230508.169  ops/s
> AccessBenchmark.foreignAddressRaw  thrpt    5  128818591.434 ± 241806.765  ops/s
> AccessBenchmark.target             thrpt    5  125083379.819 ± 184070.467  ops/s
> ---
> This PR is a replacement for https://github.com/openjdk/panama-foreign/pull/431 (OCA)
> and was partially discussed (before the changes) in https://mail.openjdk.java.net/pipermail/panama-dev/2021-January/011747.htm
> 
> ---
> Benchmark
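> import java.lang.invoke.VarHandle;
> import java.nio.ByteOrder;
> 
> import jdk.incubator.foreign.MemoryAddress;
> import jdk.incubator.foreign.MemoryHandles;
> import jdk.incubator.foreign.MemorySegment;
> import org.openjdk.jmh.annotations.Benchmark;
> import org.openjdk.jmh.annotations.Scope;
> import org.openjdk.jmh.annotations.Setup;
> import org.openjdk.jmh.annotations.State;
> import org.openjdk.jmh.infra.Blackhole;
> 
> @State(Scope.Thread)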
> public class AccessBenchmark {
>     static final MemorySegment ms = MemorySegment.ofNativeRestricted();
>     static final VarHandle intHandle = MemoryHandles.varHandle(int.class, ByteOrder.nativeOrder());
> 
>     int[] intData = new int[12];
>     volatile int intDataOffset = 0;
> 
>     volatile MemoryAddress address;
>     volatile long addressRaw;
> 
>     @Setup
>     public void setup() {
>         var ms = MemorySegment.allocateNative(256);
>         address = ms.address();
>         addressRaw = address.toRawLongValue();
>     }
> 
>     @Benchmark
>     public void target(Blackhole bh) {
>         int[] local = intData;
>         int localOffset = intDataOffset;
>         bh.consume(local[localOffset]);
>         bh.consume(local[localOffset + 1]);
>     }
> 
>     @Benchmark
>     public void foreignAddress(Blackhole bh) {
>         var a = address;
>         bh.consume((int) intHandle.get(ms, a.addOffset(0).toRawLongValue()));
>         bh.consume((int) intHandle.get(ms, a.addOffset(4).toRawLongValue()));
>     }
> 
>     @Benchmark
>     public void foreignAddressRaw(Blackhole bh) {
>         var a = addressRaw;
>         bh.consume((int) intHandle.get(ms, a));
>         bh.consume((int) intHandle.get(ms, a + 4));
>     }
> }
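
The idea described in the quoted text (turning the range and temporal checks into no-ops for a segment that is global) can be sketched in plain Java. This is only an illustration under stated assumptions: `ToySegment` and `ToyGlobalSegment` are hypothetical names, the storage is a heap `ByteBuffer` rather than native memory, and none of this is taken from the panama-foreign sources.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch; class and method names are illustrative only.
public class SegmentChecksSketch {

    /** Toy segment that performs a temporal and a range check on every access. */
    static class ToySegment {
        private final ByteBuffer data;
        private final long length;
        private volatile boolean alive = true;

        ToySegment(int length) {
            this.data = ByteBuffer.allocate(length).order(ByteOrder.nativeOrder());
            this.length = length;
        }

        void close() {
            alive = false;
        }

        /** Temporal check: the segment must not have been closed. */
        void checkAlive() {
            if (!alive) {
                throw new IllegalStateException("segment is closed");
            }
        }

        /** Range check: [offset, offset + size) must lie inside the segment. */
        void checkRange(long offset, long size) {
            if (offset < 0 || size < 0 || offset > length - size) {
                throw new IndexOutOfBoundsException("offset " + offset + ", size " + size);
            }
        }

        final int getInt(long offset) {
            checkAlive();
            checkRange(offset, Integer.BYTES);
            return data.getInt((int) offset);
        }

        final void setInt(long offset, int value) {
            checkAlive();
            checkRange(offset, Integer.BYTES);
            data.putInt((int) offset, value);
        }
    }

    /**
     * Toy "global" segment: assumed to stay valid for the whole run, so both
     * checks are overridden as no-ops. In this sketch the backing ByteBuffer
     * still does its own bounds check; real native access has no such safety net.
     */
    static final class ToyGlobalSegment extends ToySegment {
        ToyGlobalSegment(int length) {
            super(length);
        }

        @Override
        void checkAlive() {
            // intentionally empty
        }

        @Override
        void checkRange(long offset, long size) {
            // intentionally empty
        }
    }
}
```

With empty check methods, a JIT compiler has nothing left to emit for them once they are inlined, which is consistent with the smaller generated code reported above.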

Radoslaw Smogura has updated the pull request incrementally with one additional commit since the last revision:

  Replaced the stride access with a normal VarHandle.
  
  Added a no_align benchmark to compare performance with alignment checks turned off.
  ```
  Benchmark                                         Mode  Cnt  Score   Error  Units
  LoopOverNonConstant.BB_get                        avgt   30  3.892 ± 0.012  ns/op
  LoopOverNonConstant.BB_loop                       avgt   30  0.230 ± 0.001  ms/op
  LoopOverNonConstant.global_segment_get            avgt   30  3.887 ± 0.008  ns/op
  LoopOverNonConstant.global_segment_loop           avgt   30  0.396 ± 0.002  ms/op
  LoopOverNonConstant.global_segment_loop_no_align  avgt   30  0.247 ± 0.001  ms/op
  LoopOverNonConstant.segment_get                   avgt   30  5.489 ± 0.014  ns/op
  LoopOverNonConstant.segment_loop                  avgt   30  0.229 ± 0.001  ms/op
  LoopOverNonConstant.segment_loop_readonly         avgt   30  0.236 ± 0.001  ms/op
  LoopOverNonConstant.segment_loop_slice            avgt   30  0.241 ± 0.001  ms/op
  LoopOverNonConstant.segment_loop_static           avgt   30  0.230 ± 0.001  ms/op
  LoopOverNonConstant.unsafe_get                    avgt   30  3.425 ± 0.006  ns/op
  LoopOverNonConstant.unsafe_loop                   avgt   30  0.230 ± 0.001  ms/op
  ```
  Without the `ofNativeRestricted` optimization:
  ```
  LoopOverNonConstant.global_segment_get     avgt   30  4.126 ±  0.006  ns/op
  LoopOverNonConstant.global_segment_loop    avgt   30  0.603 ±  0.001  ms/op
  ```
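
For readers unfamiliar with what the `no_align` variant switches off, the following is a minimal sketch of a per-access alignment check inside a loop. It assumes power-of-two alignments and uses a heap `ByteBuffer` offset as a stand-in for a native address; `AlignmentCheckSketch` and its methods are hypothetical names and do not reflect how the VarHandle implementation performs the check.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch; not the VarHandle implementation.
public class AlignmentCheckSketch {

    /** True if address is a multiple of alignment (alignment assumed to be a power of two). */
    static boolean isAligned(long address, long alignment) {
        return (address & (alignment - 1)) == 0;
    }

    /** Throws if the address is misaligned, otherwise returns it unchanged. */
    static long checkAligned(long address, long alignment) {
        if (!isAligned(address, alignment)) {
            throw new IllegalStateException("Misaligned access at address: " + address);
        }
        return address;
    }

    /** Sums all ints in the buffer, optionally running the alignment check on every access. */
    static long sumInts(ByteBuffer buf, boolean checkAlignment) {
        long sum = 0;
        for (int off = 0; off + Integer.BYTES <= buf.limit(); off += Integer.BYTES) {
            if (checkAlignment) {
                // The byte offset stands in for the native address in this sketch.
                checkAligned(off, Integer.BYTES);
            }
            sum += buf.getInt(off);
        }
        return sum;
    }
}
```

The check is a mask and a branch per access. Whether it explains the gap between `global_segment_loop` and `global_segment_loop_no_align` depends on what the JIT does with it inside the loop, which this sketch does not measure.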

-------------

Changes:
  - all: https://git.openjdk.java.net/panama-foreign/pull/437/files
  - new: https://git.openjdk.java.net/panama-foreign/pull/437/files/ee220f9d..a262b6d6

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=panama-foreign&pr=437&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=panama-foreign&pr=437&range=02-03

  Stats: 23 lines in 1 file changed: 14 ins; 3 del; 6 mod
  Patch: https://git.openjdk.java.net/panama-foreign/pull/437.diff
  Fetch: git fetch https://git.openjdk.java.net/panama-foreign pull/437/head:pull/437

PR: https://git.openjdk.java.net/panama-foreign/pull/437

