[foreign-memaccess+abi] RFR: Split foreign vector load and store by null or not null base [v2]
Radoslaw Smogura
duke at openjdk.org
Wed Aug 24 19:02:59 UTC 2022
On Wed, 24 Aug 2022 18:33:00 GMT, Radoslaw Smogura <duke at openjdk.org> wrote:
>> Split the store / load operations by checking whether the base is null
>> or not null.
>>
>> When this happens, the VM no longer perceives the base in Unsafe as a
>> mixed (heap or off-heap) access, and the VM does not insert barriers.
>>
>> Test results give the expected values, where the polluted-access case is a 2x multiple of normal access.
>>
>> After
>>
>> Benchmark                                    (size)  Mode  Cnt    Score     Error  Units
>> MemorySegmentVectorAccess.arrayCopy            1024  avgt   10    7.437 ±   0.195  ns/op
>> MemorySegmentVectorAccess.directSegments       1024  avgt   10   15.593 ±   0.371  ns/op
>> MemorySegmentVectorAccess.heapSegments         1024  avgt   10   16.997 ±   0.118  ns/op
>> MemorySegmentVectorAccess.pollutedSegments2    1024  avgt   10   58.673 ± 105.783  ns/op
>> MemorySegmentVectorAccess.pollutedSegments3    1024  avgt   10   67.216 ±  16.157  ns/op
>> MemorySegmentVectorAccess.pollutedSegments4    1024  avgt   10  122.567 ± 263.950  ns/op
>> MemorySegmentVectorAccess.pollutedSegments5    1024  avgt   10  114.725 ± 209.183  ns/op
>>
>>
>> Before
>>
>> Benchmark                                    (size)  Mode  Cnt    Score     Error  Units
>> MemorySegmentVectorAccess.arrayCopy            1024  avgt   10    8.547 ±   0.115  ns/op
>> MemorySegmentVectorAccess.directSegments       1024  avgt   10   15.536 ±   0.082  ns/op
>> MemorySegmentVectorAccess.heapSegments         1024  avgt   10   15.818 ±   0.101  ns/op
>> MemorySegmentVectorAccess.pollutedSegments2    1024  avgt   10  146.380 ±   1.127  ns/op
>> MemorySegmentVectorAccess.pollutedSegments3    1024  avgt   10  290.784 ±   7.274  ns/op
>> MemorySegmentVectorAccess.pollutedSegments4    1024  avgt   10  297.187 ±   5.096  ns/op
>> MemorySegmentVectorAccess.pollutedSegments5    1024  avgt   10  310.166 ±   9.310  ns/op
>>
>>
>> Additionally, with profiling of the `load` and `store` method arguments as
>> described in [1]:
>>
>> Benchmark                                    (size)  Mode  Cnt    Score     Error  Units
>> MemorySegmentVectorAccess.arrayCopy            1024  avgt   10    7.480 ±   0.169  ns/op
>> MemorySegmentVectorAccess.directSegments       1024  avgt   10   15.497 ±   0.062  ns/op
>> MemorySegmentVectorAccess.heapSegments         1024  avgt   10   16.829 ±   0.132  ns/op
>> MemorySegmentVectorAccess.pollutedSegments2    1024  avgt   10  145.436 ±   1.081  ns/op
>> MemorySegmentVectorAccess.pollutedSegments3    1024  avgt   10  291.081 ±   2.297  ns/op
>> MemorySegmentVectorAccess.pollutedSegments4    1024  avgt   10  305.388 ±   7.518  ns/op
>> MemorySegmentVectorAccess.pollutedSegments5    1024  avgt   10  303.931 ±   3.412  ns/op
>>
>>
>> [1] https://github.com/openjdk/panama-foreign/pull/700
>
> Radoslaw Smogura has updated the pull request incrementally with one additional commit since the last revision:
>
> Add unswitching to masked vector operations
> Add benchmark covering this.
>
> After
> ```
> Benchmark                                          (size)  Mode  Cnt    Score     Error  Units
> MemorySegmentMaskedVectorAccess.arrayCopy            1024  avgt   10   16.700 ±   0.612  ns/op
> MemorySegmentMaskedVectorAccess.directSegments       1024  avgt   10   80.429 ±   2.897  ns/op
> MemorySegmentMaskedVectorAccess.heapSegments         1024  avgt   10   25.528 ±   0.296  ns/op
> MemorySegmentMaskedVectorAccess.pollutedSegments2    1024  avgt   10  122.809 ±   0.894  ns/op
> MemorySegmentMaskedVectorAccess.pollutedSegments3    1024  avgt   10  252.930 ±   4.623  ns/op
> MemorySegmentMaskedVectorAccess.pollutedSegments4    1024  avgt   10  451.579 ±   6.429  ns/op
> MemorySegmentMaskedVectorAccess.pollutedSegments5    1024  avgt   10  446.500 ±  39.156  ns/op
> ```
>
> Before
> ```
> Benchmark                                          (size)  Mode  Cnt    Score     Error  Units
> MemorySegmentMaskedVectorAccess.arrayCopy            1024  avgt   10   21.089 ±   0.219  ns/op
> MemorySegmentMaskedVectorAccess.directSegments       1024  avgt   10   81.384 ±   1.008  ns/op
> MemorySegmentMaskedVectorAccess.heapSegments         1024  avgt   10   25.626 ±   0.522  ns/op
> MemorySegmentMaskedVectorAccess.pollutedSegments2    1024  avgt   10  217.733 ±   5.467  ns/op
> MemorySegmentMaskedVectorAccess.pollutedSegments3    1024  avgt   10  441.045 ±   9.749  ns/op
> MemorySegmentMaskedVectorAccess.pollutedSegments4    1024  avgt   10  522.613 ± 104.997  ns/op
> MemorySegmentMaskedVectorAccess.pollutedSegments5    1024  avgt   10  449.814 ±   8.203  ns/op
> ```
I can try to add it to the VM. I wonder if it would be enough to create such an `if` in the VM and then duplicate the current intrinsic call for each branch. However, I'm not sure how far I'll get with my paramount ;) VM skills.
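
For readers following along, here is a rough Java-level sketch of the shape of the split being discussed: branch on whether the base is null, and give each branch its own copy of the call. It is not the code from the PR. The class name, `rawLoad`, `load`, the counters, and the `FAKE_NATIVE` buffer are all hypothetical stand-ins, and the off-heap side is only simulated with a direct `ByteBuffer` rather than a real Unsafe access with a null base.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative sketch only; every name here is hypothetical and not taken from the PR.
final class BaseUnswitchSketch {
    // Stand-in for off-heap memory. Real code would pass a null base plus an address.
    static final ByteBuffer FAKE_NATIVE =
            ByteBuffer.allocateDirect(64).order(ByteOrder.nativeOrder());

    // Counters that exist only to make the taken branch observable.
    static int nullBaseLoads;
    static int heapBaseLoads;

    // Stand-in for a (base, offset) style accessor that accepts either kind of base.
    static long rawLoad(Object base, long offset) {
        if (base == null) {
            return FAKE_NATIVE.getLong((int) offset);
        }
        return ByteBuffer.wrap((byte[]) base)
                .order(ByteOrder.nativeOrder())
                .getLong((int) offset);
    }

    // The split: each branch has its own copy of the call, so one call site
    // always passes null and the other always passes a non-null base.
    static long load(Object base, long offset) {
        if (base == null) {
            nullBaseLoads++;
            return rawLoad(null, offset);
        } else {
            heapBaseLoads++;
            return rawLoad(base, offset);
        }
    }

    public static void main(String[] args) {
        byte[] heap = new byte[16];
        ByteBuffer.wrap(heap).order(ByteOrder.nativeOrder()).putLong(8, 42L);
        FAKE_NATIVE.putLong(0, 7L);
        System.out.println(load(heap, 8)); // heap branch
        System.out.println(load(null, 0)); // null-base branch
    }
}
```

The sketch shows only the control flow. Whether duplicating the call per branch is enough for the VM to stop treating the base as mixed is exactly the open question here.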
-------------
PR: https://git.openjdk.org/panama-foreign/pull/711