[foreign-memaccess+abi] RFR: 8291826: Rework MemoryLayout Sealed Hierarchy [v5]

Mon Aug 22 16:19:53 UTC 2022

On Mon, 22 Aug 2022 11:16:36 GMT, Per Minborg <duke at openjdk.org> wrote:

>> src/java.base/share/classes/java/lang/foreign/MemorySegment.java line 1159:
>> 
>>> 1157:     @ForceInline
>>> 1158:     default byte get(ValueLayout.OfByte layout, long offset) {
>>> 1159:         return (byte) ((ValueLayouts.OfByteImpl) layout).accessHandle().get(this, offset);
>> 
>> That looks better. Let's measure it using the microbenchmarks under test/micro/org/openjdk/bench/java/lang/foreign. Look for a few that perform get/set via MemorySegment; the LoopOver* benchmarks, for example, are a good candidate set.
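The casting pattern in the snippet above can be illustrated with a self-contained miniature. This is a hypothetical sketch, not the JDK code: `OfByte`, `OfByteImpl`, `ArraySegment` and `accessHandle()` here are stand-ins, and the "segment" is assumed to be a plain `byte[]` accessed through `MethodHandles.arrayElementVarHandle`. The idea it shows: the public layout interface is sealed with a single final implementation, so casting to that implementation should be a cheap type check, and `accessHandle()` returns a `VarHandle` cached once per type.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class CastingSketch {

    // Hypothetical public layout type: sealed, so OfByteImpl is its only implementation.
    sealed interface OfByte permits OfByteImpl {
        long byteSize();
    }

    // Hypothetical internal implementation that caches its access VarHandle.
    static final class OfByteImpl implements OfByte {
        // Coordinates of this handle are (byte[] array, int index).
        private static final VarHandle HANDLE = MethodHandles.arrayElementVarHandle(byte[].class);

        @Override
        public long byteSize() {
            return 1;
        }

        VarHandle accessHandle() {
            return HANDLE;
        }
    }

    // Hypothetical stand-in for a heap memory segment backed by a byte array.
    record ArraySegment(byte[] data) {

        byte get(OfByte layout, long offset) {
            // Cast to the single permitted implementation to reach the internal handle,
            // mirroring the shape of the reviewed MemorySegment.get code.
            return (byte) ((OfByteImpl) layout).accessHandle().get(data, Math.toIntExact(offset));
        }

        void set(OfByte layout, long offset, byte value) {
            ((OfByteImpl) layout).accessHandle().set(data, Math.toIntExact(offset), value);
        }
    }

    public static void main(String[] args) {
        OfByte javaByte = new OfByteImpl();
        ArraySegment segment = new ArraySegment(new byte[4]);
        segment.set(javaByte, 2, (byte) 42);
        System.out.println(segment.get(javaByte, 2)); // prints 42
    }
}
```

Because the `permits` clause admits only `OfByteImpl`, the cast can fail only for a `null` layout, which is why the sketch can skip a type switch over several implementations.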
>
> I've run some benchmarks:
> 
> 
> Main branch ("Baseline"):
> 
> 
> Benchmark                                                (polluteProfile)  Mode  Cnt  Score   Error  Units
> LoopOverNonConstant.segment_loop_instance                             N/A  avgt   30  0.300 ± 0.003  ms/op
> LoopOverNonConstant.segment_loop_instance_index                       N/A  avgt   30  0.321 ± 0.005  ms/op
> LoopOverNonConstant.segment_loop_instance_unaligned                   N/A  avgt   30  0.339 ± 0.005  ms/op
> LoopOverNonConstantHeap.segment_loop_instance                       false  avgt   30  0.243 ± 0.004  ms/op
> LoopOverNonConstantHeap.segment_loop_instance                        true  avgt   30  0.255 ± 0.010  ms/op
> LoopOverNonConstantHeap.segment_loop_instance_unaligned             false  avgt   30  0.251 ± 0.011  ms/op
> LoopOverNonConstantHeap.segment_loop_instance_unaligned              true  avgt   30  0.254 ± 0.009  ms/op
> LoopOverNonConstantMapped.segment_loop_instance                       N/A  avgt   30  0.266 ± 0.012  ms/op
> LoopOverNonConstantShared.segment_loop_instance                       N/A  avgt   30  0.256 ± 0.011  ms/op
> LoopOverNonConstantShared.segment_loop_instance_address               N/A  avgt   30  0.264 ± 0.010  ms/op
> 
> 
> With the proposed solution ("Casting" to a specific `Of*Impl` class):
> 
> 
> Benchmark                                                (polluteProfile)  Mode  Cnt  Score   Error  Units
> LoopOverNonConstant.segment_loop_instance                             N/A  avgt   30  0.263 ±  0.006  ms/op
> LoopOverNonConstant.segment_loop_instance_index                       N/A  avgt   30  0.281 ±  0.004  ms/op
> LoopOverNonConstant.segment_loop_instance_unaligned                   N/A  avgt   30  0.286 ±  0.015  ms/op
> LoopOverNonConstantHeap.segment_loop_instance                       false  avgt   30  0.282 ±  0.003  ms/op
> LoopOverNonConstantHeap.segment_loop_instance                        true  avgt   30  0.272 ±  0.008  ms/op
> LoopOverNonConstantHeap.segment_loop_instance_unaligned             false  avgt   30  0.279 ±  0.003  ms/op
> LoopOverNonConstantHeap.segment_loop_instance_unaligned              true  avgt   30  0.274 ±  0.009  ms/op
> LoopOverNonConstantMapped.segment_loop_instance                       N/A  avgt   30  0.252 ±  0.011  ms/op
> LoopOverNonConstantShared.segment_loop_instance                       N/A  avgt   30  0.267 ±  0.009  ms/op
> LoopOverNonConstantShared.segment_loop_instance_address               N/A  avgt   30  0.261 ±  0.012  ms/op
> 
> 
> This can be summarized in the following table (values are in ms/op; LONC abbreviates LoopOverNonConstant):
> 
> 
> Benchmark | Baseline | Casting
> -- | -- | --
> LONC.segment_loop_instance | 0.300 | 0.263
> LONC.segment_loop_instance_index | 0.321 | 0.281
> LONC.segment_loop_instance_unaligned | 0.339 | 0.286
> LONCHeap.segment_loop_instance (non-polluted) | 0.243 | 0.282
> LONCHeap.segment_loop_instance (polluted) | 0.255 | 0.272
> LONCHeap.segment_loop_instance_unaligned (non-polluted) | 0.251 | 0.279
> LONCHeap.segment_loop_instance_unaligned (polluted) | 0.254 | 0.274
> LONCMapped.segment_loop_instance | 0.266 | 0.252
> LONCShared.segment_loop_instance | 0.256 | 0.267
> LONCShared.segment_loop_instance_address | 0.264 | 0.261
> 
> The results are also plotted in the following graph, which shows estimated error margins of around 0.01 ms/op:
> 
> ![image](https://user-images.githubusercontent.com/7457876/185908494-42b04f68-ce11-463b-a375-0710e38a3607.png)
> 
> NOTE: Compared to the "Baseline", the PR ("Casting") contains more changes than just the casting operations. The benchmarks were run on a MacBook Pro (16-inch, 2019) with a 2.3 GHz 8-Core Intel Core i9, running macOS 12.5.1.

Those numbers look ok.
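To put that assessment in numbers, the relative change per row can be computed from the scores and errors in the two JMH runs above. This is a sketch using only those reported figures; the "intervals disjoint" check (treating each JMH `Error` value as an interval half-width and flagging a difference only when the two intervals do not overlap) is a conservative rule of thumb assumed here, not a statistical test from this thread.

```java
import java.util.List;

public class BenchmarkDelta {

    // One benchmark row: JMH avgt score and error (ms/op) for Baseline and Casting.
    record Row(String name, double baseline, double baselineErr, double casting, double castingErr) {

        // Relative change of Casting vs Baseline; negative means Casting is faster (lower ms/op).
        double relativeChange() {
            return (casting - baseline) / baseline;
        }

        // Conservative check: true only if the two [score - err, score + err] intervals do not overlap.
        boolean intervalsDisjoint() {
            return Math.abs(casting - baseline) > baselineErr + castingErr;
        }
    }

    // Scores and errors copied from the Baseline and Casting JMH outputs above.
    static final List<Row> ROWS = List.of(
        new Row("LONC.segment_loop_instance", 0.300, 0.003, 0.263, 0.006),
        new Row("LONC.segment_loop_instance_index", 0.321, 0.005, 0.281, 0.004),
        new Row("LONC.segment_loop_instance_unaligned", 0.339, 0.005, 0.286, 0.015),
        new Row("LONCHeap.segment_loop_instance (non-polluted)", 0.243, 0.004, 0.282, 0.003),
        new Row("LONCHeap.segment_loop_instance (polluted)", 0.255, 0.010, 0.272, 0.008),
        new Row("LONCHeap.segment_loop_instance_unaligned (non-polluted)", 0.251, 0.011, 0.279, 0.003),
        new Row("LONCHeap.segment_loop_instance_unaligned (polluted)", 0.254, 0.009, 0.274, 0.009),
        new Row("LONCMapped.segment_loop_instance", 0.266, 0.012, 0.252, 0.011),
        new Row("LONCShared.segment_loop_instance", 0.256, 0.011, 0.267, 0.009),
        new Row("LONCShared.segment_loop_instance_address", 0.264, 0.010, 0.261, 0.012));

    public static void main(String[] args) {
        for (Row r : ROWS) {
            System.out.printf("%-58s %+6.1f%% %s%n",
                    r.name(), 100 * r.relativeChange(),
                    r.intervalsDisjoint() ? "" : "(within error)");
        }
    }
}
```

With these inputs, the three LONC rows come out roughly 12% to 16% faster with Casting, three of the four LONCHeap rows are slower by more than the combined error, and the Mapped and Shared differences stay within it. As noted above, the Casting branch contains more than the casts, so the deltas cannot be attributed to the casts alone.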

-------------

PR: https://git.openjdk.org/panama-foreign/pull/710