[foreign-memaccess+abi] RFR: Prevent maxAlign virtual calls for polluted accesses [v2]

Mon Aug 8 09:31:36 UTC 2022

On Mon, 8 Aug 2022 02:45:07 GMT, Radoslaw Smogura <duke at openjdk.org> wrote:

>> In case of polluted accesses (when different kinds of segments are accessed
>> from the same code), `maxAlign()` can become a virtual call, which would prevent
>> effective inlining and loop optimizations.
>> 
>> This patch moves `maxAlign` to a field of `AbstractMemorySegmentImpl` and makes the method
>> final. The alignment value is passed as a constructor argument.
>> 
>> _Note: This patch can cause slightly higher memory usage, as each memory segment will carry a `maxAlign` value. This can be optimized by using a smaller container for the value, i.e. `byte` or `short`._
>> 
>> After
>> 
>> Benchmark                                  (size)  Mode  Cnt       Score     Error  Units
>> MixedAccessBenchmarks.directCopy          1048576  avgt   10   16410.733 ±  79.901  ns/op
>> MixedAccessBenchmarks.pollutedAccessCopy  1048576  avgt   10  168497.502 ± 632.578  ns/op
>> 
>> 
>> Before
>> 
>> Benchmark                                  (size)  Mode  Cnt        Score        Error  Units
>> MixedAccessBenchmarks.directCopy          1048576  avgt   10    18336.054 ±     63.133  ns/op
>> MixedAccessBenchmarks.pollutedAccessCopy  1048576  avgt   10  2069032.456 ± 167512.633  ns/op
>
> Radoslaw Smogura has updated the pull request incrementally with one additional commit since the last revision:
> 
>   The previous version caused a performance drop in the `LoopOverNonConstantHeap` benchmark; this commit fixes that and keeps the same results for the tests with vectors.

src/java.base/share/classes/jdk/internal/foreign/AbstractMemorySegmentImpl.java line 326:

> 324:     // Helper methods
> 325: 
> 326:     @ForceInline

When adding `maxAlign` I played with something similar to what you have done. My recollection is that this doesn't 100% solve the issue, as it's still possible to get a polluted profile for `maxAlignMask` (e.g. the JVM will bias the method implementation towards the 1-2 layouts that seem to be more common, and put everything else in an uncommon branch). Maybe what you see is that, since the virtual call is gone, there is still a net gain.

Does the problem only manifest with bulk copy? Or also with plain memory access? If the former, perhaps some other inlining issue with bulk copy (which is a big method) could be at play.
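
For readers following along, the shape of the change being discussed can be sketched roughly as below. This is only an illustration under stated assumptions: the class names (`VirtualAlignSegment`, `FieldAlignSegment`, and so on) and the alignment values 8 and 16 are hypothetical stand-ins, not the actual `AbstractMemorySegmentImpl` hierarchy or its real values.

```java
// Hypothetical stand-ins; not the real jdk.internal.foreign classes.

// "Before" shape: every segment kind overrides maxAlign(), so a call site
// that sees several kinds has to dispatch virtually.
abstract class VirtualAlignSegment {
    abstract long maxAlign();
}

final class HeapLikeVirtual extends VirtualAlignSegment {
    @Override long maxAlign() { return 8; }   // arbitrary illustrative value
}

final class NativeLikeVirtual extends VirtualAlignSegment {
    @Override long maxAlign() { return 16; }  // arbitrary illustrative value
}

// "After" shape: the value is stored in a final field of the base class and
// read through a final accessor, so no override exists to dispatch on.
abstract class FieldAlignSegment {
    private final long maxAlign;

    FieldAlignSegment(long maxAlign) {
        this.maxAlign = maxAlign;
    }

    final long maxAlign() {
        return maxAlign;
    }
}

final class HeapLikeField extends FieldAlignSegment {
    HeapLikeField() { super(8); }
}

final class NativeLikeField extends FieldAlignSegment {
    NativeLikeField() { super(16); }
}

class MaxAlignSketch {
    // A "polluted" call site: it receives more than one segment kind.
    static long sumAlign(FieldAlignSegment[] segments) {
        long sum = 0;
        for (FieldAlignSegment s : segments) {
            sum += s.maxAlign(); // final method: a plain field load
        }
        return sum;
    }
}
```

The cost of the second shape is the extra field per segment instance, which is the memory trade-off mentioned in the patch description. Whether the JIT then produces equally good code for every segment kind still depends on the profile at the call site, which is the concern raised above.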

-------------

PR: https://git.openjdk.org/panama-foreign/pull/700