[foreign-memaccess+abi] RFR: Split foreign vector load and store by null or not null base

Mon Aug 22 20:16:47 UTC 2022

On Mon, 22 Aug 2022 07:31:57 GMT, Radoslaw Smogura <duke at openjdk.org> wrote:

> Split the store / load operations by checking whether the base is null
> or not null.
> 
> When this happens, the VM no longer treats the base passed to Unsafe as a
> mixed access, and the VM does not insert barriers.
> 
> The test results give the expected values, where the polluted-access case is a 2x multiple of normal access.
> 
> After
> 
> Benchmark                                    (size)  Mode  Cnt    Score     Error  Units
> MemorySegmentVectorAccess.arrayCopy            1024  avgt   10    7.437 ±   0.195  ns/op
> MemorySegmentVectorAccess.directSegments       1024  avgt   10   15.593 ±   0.371  ns/op
> MemorySegmentVectorAccess.heapSegments         1024  avgt   10   16.997 ±   0.118  ns/op
> MemorySegmentVectorAccess.pollutedSegments2    1024  avgt   10   58.673 ± 105.783  ns/op
> MemorySegmentVectorAccess.pollutedSegments3    1024  avgt   10   67.216 ±  16.157  ns/op
> MemorySegmentVectorAccess.pollutedSegments4    1024  avgt   10  122.567 ± 263.950  ns/op
> MemorySegmentVectorAccess.pollutedSegments5    1024  avgt   10  114.725 ± 209.183  ns/op
> 
> 
> Before
> 
> Benchmark                                    (size)  Mode  Cnt    Score   Error  Units
> MemorySegmentVectorAccess.arrayCopy            1024  avgt   10    8.547 ± 0.115  ns/op
> MemorySegmentVectorAccess.directSegments       1024  avgt   10   15.536 ± 0.082  ns/op
> MemorySegmentVectorAccess.heapSegments         1024  avgt   10   15.818 ± 0.101  ns/op
> MemorySegmentVectorAccess.pollutedSegments2    1024  avgt   10  146.380 ± 1.127  ns/op
> MemorySegmentVectorAccess.pollutedSegments3    1024  avgt   10  290.784 ± 7.274  ns/op
> MemorySegmentVectorAccess.pollutedSegments4    1024  avgt   10  297.187 ± 5.096  ns/op
> MemorySegmentVectorAccess.pollutedSegments5    1024  avgt   10  310.166 ± 9.310  ns/op
> 
> 
> Additionally, with profiling of the `load` and `store` method arguments as
> described in [1]
> 
> Benchmark                                    (size)  Mode  Cnt    Score   Error  Units
> MemorySegmentVectorAccess.arrayCopy            1024  avgt   10    7.480 ± 0.169  ns/op
> MemorySegmentVectorAccess.directSegments       1024  avgt   10   15.497 ± 0.062  ns/op
> MemorySegmentVectorAccess.heapSegments         1024  avgt   10   16.829 ± 0.132  ns/op
> MemorySegmentVectorAccess.pollutedSegments2    1024  avgt   10  145.436 ± 1.081  ns/op
> MemorySegmentVectorAccess.pollutedSegments3    1024  avgt   10  291.081 ± 2.297  ns/op
> MemorySegmentVectorAccess.pollutedSegments4    1024  avgt   10  305.388 ± 7.518  ns/op
> MemorySegmentVectorAccess.pollutedSegments5    1024  avgt   10  303.931 ± 3.412  ns/op
> 
> 
> [1] https://github.com/openjdk/panama-foreign/pull/700

Yes, the third result set compares against https://github.com/openjdk/panama-foreign/pull/700.

I'm not sure how to interpret these results, or how to explain why avoiding barriers gives better results than profiling the arguments. Maybe it's because the barrier prevents some other optimizations.
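
For readers following along, the shape of the split described in the quoted text can be sketched roughly as below. This is only an illustration, not the code from the PR: it uses scalar `long` accesses through `sun.misc.Unsafe` instead of the vector load / store intrinsics, and the class and method names (`SplitAccessSketch`, `loadMixed`, `loadSplit`, `storeSplit`, and the round-trip helpers) are made up for the example. Whether the JIT actually drops the barriers depends on the VM; the sketch only shows how each call site ends up with a base that is statically either null or not null.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical sketch; names are illustrative and not taken from the PR.
public class SplitAccessSketch {
    private static final Unsafe UNSAFE = loadUnsafe();

    private static Unsafe loadUnsafe() {
        try {
            // Usual reflective lookup of the Unsafe singleton.
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Unsplit form: the same call site sees both null (off-heap) and
    // non-null (on-heap) bases.
    static long loadMixed(Object base, long offset) {
        return UNSAFE.getLong(base, offset);
    }

    // Split form: each branch has a base that is known to be null or not null.
    static long loadSplit(Object base, long offset) {
        if (base == null) {
            return UNSAFE.getLong(null, offset);   // off-heap: offset is an address
        } else {
            return UNSAFE.getLong(base, offset);   // on-heap: offset is relative to base
        }
    }

    static void storeSplit(Object base, long offset, long value) {
        if (base == null) {
            UNSAFE.putLong(null, offset, value);
        } else {
            UNSAFE.putLong(base, offset, value);
        }
    }

    // Writes and reads back one long through a heap array.
    static long heapRoundTrip(long value) {
        long[] heap = new long[1];
        storeSplit(heap, Unsafe.ARRAY_LONG_BASE_OFFSET, value);
        return loadSplit(heap, Unsafe.ARRAY_LONG_BASE_OFFSET);
    }

    // Writes and reads back one long through native memory.
    static long offHeapRoundTrip(long value) {
        long address = UNSAFE.allocateMemory(Long.BYTES);
        try {
            storeSplit(null, address, value);
            return loadSplit(null, address);
        } finally {
            UNSAFE.freeMemory(address);
        }
    }

    public static void main(String[] args) {
        System.out.println(heapRoundTrip(42L));
        System.out.println(offHeapRoundTrip(7L));
    }
}
```

Both helpers return the value they were given, so the split form behaves the same as the unsplit one; the only difference is which branch the access goes through. On recent JDKs the `sun.misc.Unsafe` memory access methods are deprecated and may print a warning.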

-------------

PR: https://git.openjdk.org/panama-foreign/pull/711