[foreign-memaccess+abi] RFR: Split foreign vector load and store by null or not null base [v2]

Radoslaw Smogura duke at openjdk.org
Wed Aug 24 19:02:59 UTC 2022


On Wed, 24 Aug 2022 18:33:00 GMT, Radoslaw Smogura <duke at openjdk.org> wrote:

>> Split store / load operations by checking whether the base is null
>> or not null.
>> 
>> When this happens, the VM no longer perceives the base passed to
>> Unsafe as a mixed access, and does not insert barriers.
>> 
>> Test results give the expected values, where the polluted-access case is a 2x multiple of normal access.
>> 
>> After
>> 
>> Benchmark                                    (size)  Mode  Cnt    Score     Error  Units
>> MemorySegmentVectorAccess.arrayCopy            1024  avgt   10    7.437 ±   0.195  ns/op
>> MemorySegmentVectorAccess.directSegments       1024  avgt   10   15.593 ±   0.371  ns/op
>> MemorySegmentVectorAccess.heapSegments         1024  avgt   10   16.997 ±   0.118  ns/op
>> MemorySegmentVectorAccess.pollutedSegments2    1024  avgt   10   58.673 ± 105.783  ns/op
>> MemorySegmentVectorAccess.pollutedSegments3    1024  avgt   10   67.216 ±  16.157  ns/op
>> MemorySegmentVectorAccess.pollutedSegments4    1024  avgt   10  122.567 ± 263.950  ns/op
>> MemorySegmentVectorAccess.pollutedSegments5    1024  avgt   10  114.725 ± 209.183  ns/op
>> 
>> 
>> Before
>> 
>> Benchmark                                    (size)  Mode  Cnt    Score   Error  Units
>> MemorySegmentVectorAccess.arrayCopy            1024  avgt   10    8.547 ± 0.115  ns/op
>> MemorySegmentVectorAccess.directSegments       1024  avgt   10   15.536 ± 0.082  ns/op
>> MemorySegmentVectorAccess.heapSegments         1024  avgt   10   15.818 ± 0.101  ns/op
>> MemorySegmentVectorAccess.pollutedSegments2    1024  avgt   10  146.380 ± 1.127  ns/op
>> MemorySegmentVectorAccess.pollutedSegments3    1024  avgt   10  290.784 ± 7.274  ns/op
>> MemorySegmentVectorAccess.pollutedSegments4    1024  avgt   10  297.187 ± 5.096  ns/op
>> MemorySegmentVectorAccess.pollutedSegments5    1024  avgt   10  310.166 ± 9.310  ns/op
>> 
>> 
>> Additionally, with profiling of the `load` and `store` method arguments as
>> described in [1]
>> 
>> Benchmark                                    (size)  Mode  Cnt    Score   Error  Units
>> MemorySegmentVectorAccess.arrayCopy            1024  avgt   10    7.480 ± 0.169  ns/op
>> MemorySegmentVectorAccess.directSegments       1024  avgt   10   15.497 ± 0.062  ns/op
>> MemorySegmentVectorAccess.heapSegments         1024  avgt   10   16.829 ± 0.132  ns/op
>> MemorySegmentVectorAccess.pollutedSegments2    1024  avgt   10  145.436 ± 1.081  ns/op
>> MemorySegmentVectorAccess.pollutedSegments3    1024  avgt   10  291.081 ± 2.297  ns/op
>> MemorySegmentVectorAccess.pollutedSegments4    1024  avgt   10  305.388 ± 7.518  ns/op
>> MemorySegmentVectorAccess.pollutedSegments5    1024  avgt   10  303.931 ± 3.412  ns/op
>> 
>> 
>> [1] https://github.com/openjdk/panama-foreign/pull/700
>
> Radoslaw Smogura has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Add unswitching to masked vector operations
>   Add benchmark covering this.
>   
>   After
>   ```
>   Benchmark                                          (size)  Mode  Cnt    Score    Error  Units
>   MemorySegmentMaskedVectorAccess.arrayCopy            1024  avgt   10   16.700 ±  0.612  ns/op
>   MemorySegmentMaskedVectorAccess.directSegments       1024  avgt   10   80.429 ±  2.897  ns/op
>   MemorySegmentMaskedVectorAccess.heapSegments         1024  avgt   10   25.528 ±  0.296  ns/op
>   MemorySegmentMaskedVectorAccess.pollutedSegments2    1024  avgt   10  122.809 ±  0.894  ns/op
>   MemorySegmentMaskedVectorAccess.pollutedSegments3    1024  avgt   10  252.930 ±  4.623  ns/op
>   MemorySegmentMaskedVectorAccess.pollutedSegments4    1024  avgt   10  451.579 ±  6.429  ns/op
>   MemorySegmentMaskedVectorAccess.pollutedSegments5    1024  avgt   10  446.500 ± 39.156  ns/op
>   ```
>   
>   Before
>   ```
>   Benchmark                                          (size)  Mode  Cnt    Score     Error  Units
>   MemorySegmentMaskedVectorAccess.arrayCopy            1024  avgt   10   21.089 ±   0.219  ns/op
>   MemorySegmentMaskedVectorAccess.directSegments       1024  avgt   10   81.384 ±   1.008  ns/op
>   MemorySegmentMaskedVectorAccess.heapSegments         1024  avgt   10   25.626 ±   0.522  ns/op
>   MemorySegmentMaskedVectorAccess.pollutedSegments2    1024  avgt   10  217.733 ±   5.467  ns/op
>   MemorySegmentMaskedVectorAccess.pollutedSegments3    1024  avgt   10  441.045 ±   9.749  ns/op
>   MemorySegmentMaskedVectorAccess.pollutedSegments4    1024  avgt   10  522.613 ± 104.997  ns/op
>   MemorySegmentMaskedVectorAccess.pollutedSegments5    1024  avgt   10  449.814 ±   8.203  ns/op
>   ```

I can try to add it to the VM. I wonder if it would be enough to create such an If in the VM and then duplicate the current intrinsic call for each branch. However, I'm not sure how far I'll get with my paramount ;) VM skills.
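
For illustration only, the shape of the split (one branch per base kind, with the same call duplicated in each branch) can be sketched at the Java level as below. This is a rough sketch under stated assumptions, not the code in the PR: `SplitByBase`, `rawLoad` and `OFF_HEAP` are hypothetical names, `rawLoad` stands in for the Unsafe-backed intrinsic, and a plain array simulates native memory.

```java
// Hypothetical sketch: names and the simulated memory are assumptions,
// not the PR's code and not a real JDK API.
final class SplitByBase {
    // Simulated off-heap memory; a real accessor would use a native address.
    static final long[] OFF_HEAP = new long[16];

    // Stand-in for an Unsafe-style accessor: a null base means "off-heap".
    static long rawLoad(long[] base, int index) {
        return base == null ? OFF_HEAP[index] : base[index];
    }

    // The split: each branch has its own call site, so one call site always
    // sees a null base and the other always sees a non-null base.
    static long load(long[] base, int index) {
        if (base == null) {
            return rawLoad(null, index);   // base is the constant null here
        } else {
            return rawLoad(base, index);   // base is known to be non-null here
        }
    }
}
```

Both branches return the same value as a single unsplit call would; the only difference is that the null check is hoisted in front of the accessor.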

-------------

PR: https://git.openjdk.org/panama-foreign/pull/711


More information about the panama-dev mailing list