[foreign-memaccess+abi] RFR: Prevent maxAlign virtual calls for polluted accesses [v2]
Maurizio Cimadamore
mcimadamore at openjdk.org
Mon Aug 8 09:31:36 UTC 2022
On Mon, 8 Aug 2022 02:45:07 GMT, Radoslaw Smogura <duke at openjdk.org> wrote:
>> In the case of polluted accesses (when different kinds of segments are accessed
>> from the same code), `maxAlign()` can become a virtual call, which would prevent
>> effective inlining and loop optimizations.
>>
>> This patch moves `maxAlign` to a field of `AbstractMemorySegmentImpl` and makes the method
>> final. The alignment value is passed as a constructor argument.
>>
>> _Note: This patch can slightly increase memory usage, since each memory segment will carry a `maxAlign` value; this can be optimized by using a smaller container for the value, i.e. `byte` or `short`._
>>
>> After
>>
>> Benchmark                                  (size)  Mode  Cnt       Score     Error  Units
>> MixedAccessBenchmarks.directCopy          1048576  avgt   10   16410.733 ±  79.901  ns/op
>> MixedAccessBenchmarks.pollutedAccessCopy  1048576  avgt   10  168497.502 ± 632.578  ns/op
>>
>>
>> Before
>>
>> Benchmark                                  (size)  Mode  Cnt        Score        Error  Units
>> MixedAccessBenchmarks.directCopy          1048576  avgt   10    18336.054 ±     63.133  ns/op
>> MixedAccessBenchmarks.pollutedAccessCopy  1048576  avgt   10  2069032.456 ± 167512.633  ns/op
>
> Radoslaw Smogura has updated the pull request incrementally with one additional commit since the last revision:
>
> The previous version caused a performance drop in the `LoopOverNonConstantHeap` benchmark; this commit fixes that and keeps the same results for the tests with vectors.
src/java.base/share/classes/jdk/internal/foreign/AbstractMemorySegmentImpl.java line 326:
> 324: // Helper methods
> 325:
> 326: @ForceInline
When adding `maxAlign`, I experimented with something similar to what you have done. My recollection is that this doesn't 100% solve the issue, as it's still possible to get a polluted profile for `maxAlignMask` (e.g. the JVM will bias the method implementation towards the 1-2 layouts that seem to be most common, and put everything else in an uncommon branch). Maybe what you are seeing is that, since the virtual call is gone, there is still a net gain.
Does the problem only manifest with bulk copy? Or also with plain memory access? If the former, perhaps some other inlining issue with bulk copy (which is a big method) could be at play.
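
For readers following the thread, the shape of the change under discussion can be sketched roughly as below. This is only an illustration under stated assumptions: `SegmentSketch`, `HeapSketch`, `NativeSketch`, `isAligned` and the alignment values are hypothetical names and numbers, not the actual `AbstractMemorySegmentImpl` code. The point is that a `final` method reading a `final` field has a single implementation, so a call site that sees several segment kinds does not need virtual dispatch for it.

```java
// Hypothetical sketch; these are NOT the JDK's actual classes.
abstract class SegmentSketch {
    // Alignment bound fixed at construction time (assumed to be a power of two).
    private final long maxAlign;

    protected SegmentSketch(long maxAlign) {
        this.maxAlign = maxAlign;
    }

    // Final method reading a final field: there is only one implementation,
    // no matter how many subclasses reach the same call site.
    final long maxAlign() {
        return maxAlign;
    }

    // Simplified, hypothetical alignment check (align must be a power of two).
    final boolean isAligned(long offset, long align) {
        return align <= maxAlign && (offset & (align - 1)) == 0;
    }
}

final class HeapSketch extends SegmentSketch {
    HeapSketch() { super(8); }    // assumed value, for illustration only
}

final class NativeSketch extends SegmentSketch {
    NativeSketch() { super(64); } // assumed value, for illustration only
}

class PollutedAccessSketch {
    // A "polluted" call site: the receiver type varies between iterations.
    static long sumAlign(SegmentSketch[] segments) {
        long total = 0;
        for (SegmentSketch s : segments) {
            total += s.maxAlign();
        }
        return total;
    }

    public static void main(String[] args) {
        SegmentSketch[] mixed = { new HeapSketch(), new NativeSketch(), new HeapSketch() };
        System.out.println(sumAlign(mixed));
    }
}
```

Whether this removes all profile pollution is exactly the open question above; the sketch only shows why the virtual call itself goes away.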
-------------
PR: https://git.openjdk.org/panama-foreign/pull/700
More information about the panama-dev mailing list