[foreign-memaccess+abi] RFR: 8291826: Rework MemoryLayout Sealed Hierarchy [v5]
Per Minborg
duke at openjdk.org
Mon Aug 22 11:28:02 UTC 2022
On Fri, 19 Aug 2022 17:19:59 GMT, Paul Sandoz <psandoz at openjdk.org> wrote:
>> Per Minborg has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Fix problems in static initializers
>
> src/java.base/share/classes/java/lang/foreign/MemorySegment.java line 1159:
>
>> 1157: @ForceInline
>> 1158: default byte get(ValueLayout.OfByte layout, long offset) {
>> 1159: return (byte) ((ValueLayouts.OfByteImpl) layout).accessHandle().get(this, offset);
>
> That looks better. Let's measure using micros under test/micro/org/openjdk/bench/java/lang/foreign, search for a few that perform get/set via MemorySegment e.g. the LoopOver* benchmarks are a good candidate set.
I've run some benchmarks:
Main branch ("Baseline"):
Benchmark                                                (polluteProfile)  Mode  Cnt  Score   Error  Units
LoopOverNonConstant.segment_loop_instance                             N/A  avgt   30  0.300 ± 0.003  ms/op
LoopOverNonConstant.segment_loop_instance_index                       N/A  avgt   30  0.321 ± 0.005  ms/op
LoopOverNonConstant.segment_loop_instance_unaligned                   N/A  avgt   30  0.339 ± 0.005  ms/op
LoopOverNonConstantHeap.segment_loop_instance                       false  avgt   30  0.243 ± 0.004  ms/op
LoopOverNonConstantHeap.segment_loop_instance                        true  avgt   30  0.255 ± 0.010  ms/op
LoopOverNonConstantHeap.segment_loop_instance_unaligned             false  avgt   30  0.251 ± 0.011  ms/op
LoopOverNonConstantHeap.segment_loop_instance_unaligned              true  avgt   30  0.254 ± 0.009  ms/op
LoopOverNonConstantMapped.segment_loop_instance                       N/A  avgt   30  0.266 ± 0.012  ms/op
LoopOverNonConstantShared.segment_loop_instance                       N/A  avgt   30  0.256 ± 0.011  ms/op
LoopOverNonConstantShared.segment_loop_instance_address               N/A  avgt   30  0.264 ± 0.010  ms/op
With the proposed solution ("Casting" to a specific `Of*Impl` class):
Benchmark                                                (polluteProfile)  Mode  Cnt  Score   Error  Units
LoopOverNonConstant.segment_loop_instance                             N/A  avgt   30  0.263 ± 0.006  ms/op
LoopOverNonConstant.segment_loop_instance_index                       N/A  avgt   30  0.281 ± 0.004  ms/op
LoopOverNonConstant.segment_loop_instance_unaligned                   N/A  avgt   30  0.286 ± 0.015  ms/op
LoopOverNonConstantHeap.segment_loop_instance                       false  avgt   30  0.282 ± 0.003  ms/op
LoopOverNonConstantHeap.segment_loop_instance                        true  avgt   30  0.272 ± 0.008  ms/op
LoopOverNonConstantHeap.segment_loop_instance_unaligned             false  avgt   30  0.279 ± 0.003  ms/op
LoopOverNonConstantHeap.segment_loop_instance_unaligned              true  avgt   30  0.274 ± 0.009  ms/op
LoopOverNonConstantMapped.segment_loop_instance                       N/A  avgt   30  0.252 ± 0.011  ms/op
LoopOverNonConstantShared.segment_loop_instance                       N/A  avgt   30  0.267 ± 0.009  ms/op
LoopOverNonConstantShared.segment_loop_instance_address               N/A  avgt   30  0.261 ± 0.012  ms/op
This can be summarized in the following table (values are in ms/op; LONC is short for LoopOverNonConstant):
Benchmark | Baseline | Casting
-- | -- | --
LONC.segment_loop_instance | 0.300 | 0.263
LONC.segment_loop_instance_index | 0.321 | 0.281
LONC.segment_loop_instance_unaligned | 0.339 | 0.286
LONCHeap.segment_loop_instance (non-polluted) | 0.243 | 0.282
LONCHeap.segment_loop_instance (polluted) | 0.255 | 0.272
LONCHeap.segment_loop_instance_unaligned (non-polluted) | 0.251 | 0.279
LONCHeap.segment_loop_instance_unaligned (polluted) | 0.254 | 0.274
LONCMapped.segment_loop_instance | 0.266 | 0.252
LONCShared.segment_loop_instance | 0.256 | 0.267
LONCShared.segment_loop_instance_address | 0.264 | 0.261
The following graph shows the same data, with estimated error margins of around 0.01 ms:
![image](https://user-images.githubusercontent.com/7457876/185908494-42b04f68-ce11-463b-a375-0710e38a3607.png)
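To make the summary table easier to compare row by row, the relative change of "Casting" against "Baseline" can be computed directly from the scores listed above. The following is only a minimal sketch: the class name `RelativeChange` and the method `percentChange` are hypothetical and not part of the PR. The inputs are the ms/op values from the table. Given error margins of around 0.01 ms, differences of only a few percent should be read with caution.

```java
public class RelativeChange {

    // Percentage change of the "Casting" score relative to "Baseline".
    // A negative value means "Casting" needed less time per operation.
    static double percentChange(double baseline, double casting) {
        return (casting - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // Rows and scores copied from the summary table (ms/op).
        String[] names = {
            "LONC.segment_loop_instance",
            "LONC.segment_loop_instance_index",
            "LONC.segment_loop_instance_unaligned",
            "LONCHeap.segment_loop_instance (non-polluted)",
            "LONCHeap.segment_loop_instance (polluted)",
            "LONCHeap.segment_loop_instance_unaligned (non-polluted)",
            "LONCHeap.segment_loop_instance_unaligned (polluted)",
            "LONCMapped.segment_loop_instance",
            "LONCShared.segment_loop_instance",
            "LONCShared.segment_loop_instance_address"
        };
        double[] baseline = {0.300, 0.321, 0.339, 0.243, 0.255, 0.251, 0.254, 0.266, 0.256, 0.264};
        double[] casting  = {0.263, 0.281, 0.286, 0.282, 0.272, 0.279, 0.274, 0.252, 0.267, 0.261};

        for (int i = 0; i < names.length; i++) {
            // Print each row with a signed percentage, one decimal place.
            System.out.printf("%-58s %+6.1f%%%n", names[i], percentChange(baseline[i], casting[i]));
        }
    }
}
```

Negative percentages correspond to rows where "Casting" scored lower (faster) than "Baseline", positive ones to rows where it scored higher.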
NOTE: The PR ("Casting") contains more than just the casting operations compared to the "Baseline". The benchmarks were performed on a MacBook Pro (16-inch, 2019) with a 2.3 GHz 8-core Intel Core i9 running macOS 12.5.1.
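For readers who want to see the casting operation in isolation, the shape of the quoted accessor at line 1159 can be sketched outside the JDK. This is an illustration under stated assumptions, not the PR's code: `ByteLayout` and `ByteLayoutImpl` are hypothetical stand-ins for `ValueLayout.OfByte` and `ValueLayouts.OfByteImpl`, a plain `byte[]` stands in for the memory segment, and the interface is left unsealed so the sketch compiles on older Java versions.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class CastingSketch {

    // Hypothetical public layout type; stands in for ValueLayout.OfByte.
    interface ByteLayout {}

    // Hypothetical implementation class; stands in for ValueLayouts.OfByteImpl.
    static final class ByteLayoutImpl implements ByteLayout {
        // A VarHandle that reads and writes elements of a byte[].
        private static final VarHandle HANDLE =
                MethodHandles.arrayElementVarHandle(byte[].class);

        VarHandle accessHandle() {
            return HANDLE;
        }
    }

    // Same shape as the quoted accessor: cast the layout to its implementation
    // class, fetch the handle from it, and read through the handle.
    static byte get(byte[] storage, ByteLayout layout, int index) {
        return (byte) ((ByteLayoutImpl) layout).accessHandle().get(storage, index);
    }

    public static void main(String[] args) {
        byte[] storage = {10, 20, 30};
        // Reads the element at index 1 through the handle.
        System.out.println(get(storage, new ByteLayoutImpl(), 1));
    }
}
```

The cast names a single concrete class at the call site. Whether that accounts for the differences measured above cannot be concluded from this sketch alone, since the PR changes more than the casts.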
-------------
PR: https://git.openjdk.org/panama-foreign/pull/710