[foreign-memaccess+abi] RFR: Split foreign vector load and store by null or not null base [v3]

Maurizio Cimadamore mcimadamore at openjdk.org
Mon Sep 4 18:02:08 UTC 2023


On Mon, 5 Sep 2022 06:02:04 GMT, Radoslaw Smogura <duke at openjdk.org> wrote:

>> Split the store/load operations by checking whether the base is null
>> or non-null.
>> 
>> With this split, the VM no longer perceives the base passed to Unsafe
>> as a mixed (on-heap/off-heap) access, so it does not insert barriers.
>> 
>> The test results give the expected values, with the polluted-access case at roughly 2x the cost of normal access.
>> 
>> After
>> 
>> Benchmark                                    (size)  Mode  Cnt    Score     Error  Units
>> MemorySegmentVectorAccess.arrayCopy            1024  avgt   10    7.437 ±   0.195  ns/op
>> MemorySegmentVectorAccess.directSegments       1024  avgt   10   15.593 ±   0.371  ns/op
>> MemorySegmentVectorAccess.heapSegments         1024  avgt   10   16.997 ±   0.118  ns/op
>> MemorySegmentVectorAccess.pollutedSegments2    1024  avgt   10   58.673 ± 105.783  ns/op
>> MemorySegmentVectorAccess.pollutedSegments3    1024  avgt   10   67.216 ±  16.157  ns/op
>> MemorySegmentVectorAccess.pollutedSegments4    1024  avgt   10  122.567 ± 263.950  ns/op
>> MemorySegmentVectorAccess.pollutedSegments5    1024  avgt   10  114.725 ± 209.183  ns/op
>> 
>> 
>> Before
>> 
>> Benchmark                                    (size)  Mode  Cnt    Score   Error  Units
>> MemorySegmentVectorAccess.arrayCopy            1024  avgt   10    8.547 ± 0.115  ns/op
>> MemorySegmentVectorAccess.directSegments       1024  avgt   10   15.536 ± 0.082  ns/op
>> MemorySegmentVectorAccess.heapSegments         1024  avgt   10   15.818 ± 0.101  ns/op
>> MemorySegmentVectorAccess.pollutedSegments2    1024  avgt   10  146.380 ± 1.127  ns/op
>> MemorySegmentVectorAccess.pollutedSegments3    1024  avgt   10  290.784 ± 7.274  ns/op
>> MemorySegmentVectorAccess.pollutedSegments4    1024  avgt   10  297.187 ± 5.096  ns/op
>> MemorySegmentVectorAccess.pollutedSegments5    1024  avgt   10  310.166 ± 9.310  ns/op
>> 
>> 
>> Additionally, with profiling of the `load` and `store` method arguments as
>> described in [1]:
>> 
>> Benchmark                                    (size)  Mode  Cnt    Score   Error  Units
>> MemorySegmentVectorAccess.arrayCopy            1024  avgt   10    7.480 ± 0.169  ns/op
>> MemorySegmentVectorAccess.directSegments       1024  avgt   10   15.497 ± 0.062  ns/op
>> MemorySegmentVectorAccess.heapSegments         1024  avgt   10   16.829 ± 0.132  ns/op
>> MemorySegmentVectorAccess.pollutedSegments2    1024  avgt   10  145.436 ± 1.081  ns/op
>> MemorySegmentVectorAccess.pollutedSegments3    1024  avgt   10  291.081 ± 2.297  ns/op
>> MemorySegmentVectorAccess.pollutedSegments4    1024  avgt   10  305.388 ± 7.518  ns/op
> ...
>
> Radoslaw Smogura has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
> 
>  - Merge remote-tracking branch 'upstream-foreign/foreign-memaccess+abi' into split-load-store-by-null-base
>  - Add unswitching to masked vector operations
>    Add benchmark covering this.
>    
>    After
>    ```
>    Benchmark                                          (size)  Mode  Cnt    Score    Error  Units
>    MemorySegmentMaskedVectorAccess.arrayCopy            1024  avgt   10   16.700 ±  0.612  ns/op
>    MemorySegmentMaskedVectorAccess.directSegments       1024  avgt   10   80.429 ±  2.897  ns/op
>    MemorySegmentMaskedVectorAccess.heapSegments         1024  avgt   10   25.528 ±  0.296  ns/op
>    MemorySegmentMaskedVectorAccess.pollutedSegments2    1024  avgt   10  122.809 ±  0.894  ns/op
>    MemorySegmentMaskedVectorAccess.pollutedSegments3    1024  avgt   10  252.930 ±  4.623  ns/op
>    MemorySegmentMaskedVectorAccess.pollutedSegments4    1024  avgt   10  451.579 ±  6.429  ns/op
>    MemorySegmentMaskedVectorAccess.pollutedSegments5    1024  avgt   10  446.500 ± 39.156  ns/op
>    ```
>    
>    Before
>    ```
>    Benchmark                                          (size)  Mode  Cnt    Score     Error  Units
>    MemorySegmentMaskedVectorAccess.arrayCopy            1024  avgt   10   21.089 ±   0.219  ns/op
>    MemorySegmentMaskedVectorAccess.directSegments       1024  avgt   10   81.384 ±   1.008  ns/op
>    MemorySegmentMaskedVectorAccess.heapSegments         1024  avgt   10   25.626 ±   0.522  ns/op
>    MemorySegmentMaskedVectorAccess.pollutedSegments2    1024  avgt   10  217.733 ±   5.467  ns/op
>    MemorySegmentMaskedVectorAccess.pollutedSegments3    1024  avgt   10  441.045 ±   9.749  ns/op
>    MemorySegmentMaskedVectorAccess.pollutedSegments4    1024  avgt   10  522.613 ± 104.997  ns/op
>    MemorySegmentMaskedVectorAccess.pollutedSegments5    1024  avgt   10  449.814 ±   8.203  ns/op
>    ```
>  - Split foreign vector load and store by null or not null base
>    
> >    Split the store/load operations by checking whether the base is null
> >    or non-null.
> >    
> >    With this split, the VM no longer perceives the base passed to Unsafe
> >    as a mixed (on-heap/off-heap) access, so it does not insert barriers.
>    
>    Test results
>    
>    After
>    ```
>    Benchmark                                    (size)  Mode  Cnt    Score     Error  Units
>    MemorySegmentVectorAccess.arrayCopy   ...
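
The null/non-null split described in the first commit can be sketched at the scalar level with `sun.misc.Unsafe`. This is a hypothetical illustration, not the code from the pull request, which targets vector loads and stores. The names `SplitByNullBase`, `loadMixed`, `loadSplit` and `storeSplit` are made up for this sketch. Whether the JIT actually drops the barriers is a VM-internal effect that a plain Java program cannot observe; the sketch only shows the shape of the split.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical scalar illustration of splitting an Unsafe access on a null
// versus non-null base; not the vector code from the pull request.
public final class SplitByNullBase {
    // Package-private so callers in the same package can allocate off-heap memory.
    static final Unsafe U = loadUnsafe();

    private static Unsafe loadUnsafe() {
        try {
            // sun.misc.Unsafe has no public constructor; read its singleton field reflectively.
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Unsplit form: one call site receives both null (off-heap) and array
    // (on-heap) bases, so the JIT may have to treat it as a mixed access.
    static long loadMixed(Object base, long offset) {
        return U.getLong(base, offset);
    }

    // Split form: each call site sees only one kind of base.
    static long loadSplit(Object base, long offset) {
        if (base == null) {
            // Off-heap: with a null base, offset is an absolute address.
            return U.getLong(null, offset);
        }
        // On-heap: offset is relative to the start of the base object.
        return U.getLong(base, offset);
    }

    // Same split applied to stores.
    static void storeSplit(Object base, long offset, long value) {
        if (base == null) {
            U.putLong(null, offset, value);
        } else {
            U.putLong(base, offset, value);
        }
    }

    public static void main(String[] args) {
        long[] heap = new long[2];
        // Byte offset of heap[1] within the array object.
        long heapOffset = Unsafe.ARRAY_LONG_BASE_OFFSET + (long) Unsafe.ARRAY_LONG_INDEX_SCALE;
        storeSplit(heap, heapOffset, 42L);
        System.out.println(loadSplit(heap, heapOffset)); // prints 42

        long address = U.allocateMemory(Long.BYTES);
        try {
            storeSplit(null, address, 7L);
            System.out.println(loadSplit(null, address)); // prints 7
        } finally {
            U.freeMemory(address);
        }
    }
}
```

The design choice is that each `Unsafe` call site sees only one kind of base: a literal `null` in one branch and an object already known to be non-null in the other. A compiler that treats a possibly-null base as mixed access therefore need not do so at either site, which is the effect the commit message describes.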

Ping: does this PR still need to be open?
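
As a worked check on the unmasked `MemorySegmentVectorAccess` tables quoted above, the Before/After ratio of mean scores can be computed directly. The class name `SpeedupRatios` is hypothetical. The scores are copied from the quoted "Before" and "After" tables (size 1024, ns/op).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: Before/After ratios of the quoted mean scores.
public final class SpeedupRatios {
    static final String[] NAMES = {
        "arrayCopy", "directSegments", "heapSegments",
        "pollutedSegments2", "pollutedSegments3", "pollutedSegments4", "pollutedSegments5"
    };
    // Mean scores in ns/op from the quoted tables; lower is better.
    static final double[] BEFORE = {8.547, 15.536, 15.818, 146.380, 290.784, 297.187, 310.166};
    static final double[] AFTER  = {7.437, 15.593, 16.997,  58.673,  67.216, 122.567, 114.725};

    // A value above 1 means the "After" run took less time per operation.
    static double ratio(double before, double after) {
        return before / after;
    }

    // Benchmark name -> Before/After ratio, in table order.
    static Map<String, Double> ratios() {
        Map<String, Double> out = new LinkedHashMap<>();
        for (int i = 0; i < NAMES.length; i++) {
            out.put(NAMES[i], ratio(BEFORE[i], AFTER[i]));
        }
        return out;
    }

    public static void main(String[] args) {
        ratios().forEach((name, r) -> System.out.printf("%-18s %.2fx%n", name, r));
    }
}
```

These are ratios of means only. Several "After" rows report an error larger than the score itself (for example 58.673 ± 105.783 ns/op for `pollutedSegments2`), so the ratios for those rows are rough.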

-------------

PR Comment: https://git.openjdk.org/panama-foreign/pull/711#issuecomment-1705584554


More information about the panama-dev mailing list