[foreign-memaccess+abi] RFR: 8291826: Rework MemoryLayout Sealed Hierarchy [v5]
Paul Sandoz
psandoz at openjdk.org
Mon Aug 22 16:19:53 UTC 2022
On Mon, 22 Aug 2022 11:16:36 GMT, Per Minborg <duke at openjdk.org> wrote:
>> src/java.base/share/classes/java/lang/foreign/MemorySegment.java line 1159:
>>
>>> 1157:     @ForceInline
>>> 1158:     default byte get(ValueLayout.OfByte layout, long offset) {
>>> 1159:         return (byte) ((ValueLayouts.OfByteImpl) layout).accessHandle().get(this, offset);
>>
>> That looks better. Let's measure using the micros under test/micro/org/openjdk/bench/java/lang/foreign. Search for a few that perform get/set via MemorySegment; the LoopOver* benchmarks are a good candidate set.
>
> I've run some benchmarks:
>
>
> Main branch ("Baseline"):
>
>
> Benchmark (polluteProfile) Mode Cnt Score Error Units
> LoopOverNonConstant.segment_loop_instance N/A avgt 30 0.300 ± 0.003 ms/op
> LoopOverNonConstant.segment_loop_instance_index N/A avgt 30 0.321 ± 0.005 ms/op
> LoopOverNonConstant.segment_loop_instance_unaligned N/A avgt 30 0.339 ± 0.005 ms/op
> LoopOverNonConstantHeap.segment_loop_instance false avgt 30 0.243 ± 0.004 ms/op
> LoopOverNonConstantHeap.segment_loop_instance true avgt 30 0.255 ± 0.010 ms/op
> LoopOverNonConstantHeap.segment_loop_instance_unaligned false avgt 30 0.251 ± 0.011 ms/op
> LoopOverNonConstantHeap.segment_loop_instance_unaligned true avgt 30 0.254 ± 0.009 ms/op
> LoopOverNonConstantMapped.segment_loop_instance N/A avgt 30 0.266 ± 0.012 ms/op
> LoopOverNonConstantShared.segment_loop_instance N/A avgt 30 0.256 ± 0.011 ms/op
> LoopOverNonConstantShared.segment_loop_instance_address N/A avgt 30 0.264 ± 0.010 ms/op
>
>
> With the proposed solution ("Casting" to a specific `Of*Impl` class):
>
>
> Benchmark (polluteProfile) Mode Cnt Score Error Units
> LoopOverNonConstant.segment_loop_instance N/A avgt 30 0.263 ± 0.006 ms/op
> LoopOverNonConstant.segment_loop_instance_index N/A avgt 30 0.281 ± 0.004 ms/op
> LoopOverNonConstant.segment_loop_instance_unaligned N/A avgt 30 0.286 ± 0.015 ms/op
> LoopOverNonConstantHeap.segment_loop_instance false avgt 30 0.282 ± 0.003 ms/op
> LoopOverNonConstantHeap.segment_loop_instance true avgt 30 0.272 ± 0.008 ms/op
> LoopOverNonConstantHeap.segment_loop_instance_unaligned false avgt 30 0.279 ± 0.003 ms/op
> LoopOverNonConstantHeap.segment_loop_instance_unaligned true avgt 30 0.274 ± 0.009 ms/op
> LoopOverNonConstantMapped.segment_loop_instance N/A avgt 30 0.252 ± 0.011 ms/op
> LoopOverNonConstantShared.segment_loop_instance N/A avgt 30 0.267 ± 0.009 ms/op
> LoopOverNonConstantShared.segment_loop_instance_address N/A avgt 30 0.261 ± 0.012 ms/op
>
>
> This can be summarized in the following table (values are in ms/op):
>
>
> Benchmark | Baseline | Casting
> -- | -- | --
> LONC.segment_loop_instance | 0.300 | 0.263
> LONC.segment_loop_instance_index | 0.321 | 0.281
> LONC.segment_loop_instance_unaligned | 0.339 | 0.286
> LONCHeap.segment_loop_instance (non-polluted) | 0.243 | 0.282
> LONCHeap.segment_loop_instance (polluted) | 0.255 | 0.272
> LONCHeap.segment_loop_instance_unaligned (non-polluted) | 0.251 | 0.279
> LONCHeap.segment_loop_instance_unaligned (polluted) | 0.254 | 0.274
> LONCMapped.segment_loop_instance | 0.266 | 0.252
> LONCShared.segment_loop_instance | 0.256 | 0.267
> LONCShared.segment_loop_instance_address | 0.264 | 0.261
>
> and in the following graph (also showing estimated error margins of around 0.01 ms):
>
> ![image](https://user-images.githubusercontent.com/7457876/185908494-42b04f68-ce11-463b-a375-0710e38a3607.png)
>
> NOTE: The PR ("Casting") contains more than just the casting operations compared to the "Baseline". The benchmarks were performed on a MacBook Pro (16-inch, 2019) with a 2.3 GHz 8-core Intel Core i9 running macOS 12.5.1.
Those numbers look ok.
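
For a rough cross-check of the summary table, the relative change per benchmark can be computed from the quoted scores. This is only a sketch: the class name `RelativeChange` and the helper `percentChange` are made up for illustration and are not part of the PR or of the JMH output. The scores are copied from the table above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RelativeChange {

    // Percentage change from baseline to candidate.
    // Scores are in ms/op, so a negative result means the candidate is faster.
    static double percentChange(double baseline, double candidate) {
        return (candidate - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // {Baseline, Casting} scores in ms/op, copied from the summary table.
        Map<String, double[]> scores = new LinkedHashMap<>();
        scores.put("LONC.segment_loop_instance", new double[] {0.300, 0.263});
        scores.put("LONC.segment_loop_instance_index", new double[] {0.321, 0.281});
        scores.put("LONC.segment_loop_instance_unaligned", new double[] {0.339, 0.286});
        scores.put("LONCHeap.segment_loop_instance (non-polluted)", new double[] {0.243, 0.282});
        scores.put("LONCHeap.segment_loop_instance (polluted)", new double[] {0.255, 0.272});
        scores.put("LONCHeap.segment_loop_instance_unaligned (non-polluted)", new double[] {0.251, 0.279});
        scores.put("LONCHeap.segment_loop_instance_unaligned (polluted)", new double[] {0.254, 0.274});
        scores.put("LONCMapped.segment_loop_instance", new double[] {0.266, 0.252});
        scores.put("LONCShared.segment_loop_instance", new double[] {0.256, 0.267});
        scores.put("LONCShared.segment_loop_instance_address", new double[] {0.264, 0.261});

        for (Map.Entry<String, double[]> e : scores.entrySet()) {
            double[] s = e.getValue();
            System.out.printf("%-58s %+6.1f%%%n", e.getKey(), percentChange(s[0], s[1]));
        }
    }
}
```

By this calculation LONC.segment_loop_instance is about 12.3% faster with casting, while LONCHeap.segment_loop_instance (non-polluted) is about 16.0% slower. With error margins of around 0.01 ms on scores of roughly 0.25 to 0.35 ms, differences of a few percent (the Mapped and Shared rows) are within noise.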
-------------
PR: https://git.openjdk.org/panama-foreign/pull/710