[foreign-memaccess+abi] RFR: Split foreign vector load and store by null or not null base [v3]
Radoslaw Smogura
duke at openjdk.org
Mon Sep 5 06:02:04 UTC 2022
> Split the store / load operations by checking whether the base is null
> or not null.
>
> With this split, the VM no longer perceives the base passed to Unsafe
> as mixed access, and does not insert barriers.
>
> Test results give the expected values, with polluted access costing about 2x normal access.
>
> After
>
> Benchmark (size) Mode Cnt Score Error Units
> MemorySegmentVectorAccess.arrayCopy 1024 avgt 10 7.437 ± 0.195 ns/op
> MemorySegmentVectorAccess.directSegments 1024 avgt 10 15.593 ± 0.371 ns/op
> MemorySegmentVectorAccess.heapSegments 1024 avgt 10 16.997 ± 0.118 ns/op
> MemorySegmentVectorAccess.pollutedSegments2 1024 avgt 10 58.673 ± 105.783 ns/op
> MemorySegmentVectorAccess.pollutedSegments3 1024 avgt 10 67.216 ± 16.157 ns/op
> MemorySegmentVectorAccess.pollutedSegments4 1024 avgt 10 122.567 ± 263.950 ns/op
> MemorySegmentVectorAccess.pollutedSegments5 1024 avgt 10 114.725 ± 209.183 ns/op
>
>
> Before
>
> Benchmark (size) Mode Cnt Score Error Units
> MemorySegmentVectorAccess.arrayCopy 1024 avgt 10 8.547 ± 0.115 ns/op
> MemorySegmentVectorAccess.directSegments 1024 avgt 10 15.536 ± 0.082 ns/op
> MemorySegmentVectorAccess.heapSegments 1024 avgt 10 15.818 ± 0.101 ns/op
> MemorySegmentVectorAccess.pollutedSegments2 1024 avgt 10 146.380 ± 1.127 ns/op
> MemorySegmentVectorAccess.pollutedSegments3 1024 avgt 10 290.784 ± 7.274 ns/op
> MemorySegmentVectorAccess.pollutedSegments4 1024 avgt 10 297.187 ± 5.096 ns/op
> MemorySegmentVectorAccess.pollutedSegments5 1024 avgt 10 310.166 ± 9.310 ns/op
>
>
> Additionally, with profiling of the `load` and `store` method arguments as
> described in [1]
>
> Benchmark (size) Mode Cnt Score Error Units
> MemorySegmentVectorAccess.arrayCopy 1024 avgt 10 7.480 ± 0.169 ns/op
> MemorySegmentVectorAccess.directSegments 1024 avgt 10 15.497 ± 0.062 ns/op
> MemorySegmentVectorAccess.heapSegments 1024 avgt 10 16.829 ± 0.132 ns/op
> MemorySegmentVectorAccess.pollutedSegments2 1024 avgt 10 145.436 ± 1.081 ns/op
> MemorySegmentVectorAccess.pollutedSegments3 1024 avgt 10 291.081 ± 2.297 ns/op
> MemorySegmentVectorAccess.pollutedSegments4 1024 avgt 10 305.388 ± 7.518 ns/op
> MemorySegmentVectorAccess.pollutedSegments5 1024 avgt 10 303.931 ± 3.412 ns/op
>
>
> [1] https://github.com/openjdk/panama-foreign/pull/700
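To put the quoted polluted-access numbers side by side, the following is a small sketch that divides each "Before" score by the matching "After" score. The class and method names are made up for illustration; only the scores come from the tables above. The "After" runs carry very large error bars (for example ± 105.783 on a score of 58.673), so the ratios should be read as indicative at best.

```java
import java.util.Locale;

// Hypothetical helper, not part of the PR: compares the quoted JMH scores.
class PollutedSpeedup {
    // Ratio of the old score to the new one; above 1.0 means "after" is faster.
    static double speedup(double beforeNs, double afterNs) {
        return beforeNs / afterNs;
    }

    public static void main(String[] args) {
        String[] names = {
            "pollutedSegments2", "pollutedSegments3",
            "pollutedSegments4", "pollutedSegments5"
        };
        // Scores in ns/op, copied from the "Before" and "After" tables.
        double[] before = {146.380, 290.784, 297.187, 310.166};
        double[] after = {58.673, 67.216, 122.567, 114.725};

        for (int i = 0; i < names.length; i++) {
            System.out.printf(Locale.ROOT, "%s: %.2fx%n",
                    names[i], speedup(before[i], after[i]));
        }
    }
}
```

On these numbers the ratios come out at roughly 2.5x, 4.3x, 2.4x and 2.7x.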
Radoslaw Smogura has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
- Merge remote-tracking branch 'upstream-foreign/foreign-memaccess+abi' into split-load-store-by-null-base
- Add unswitching to masked vector operations
Add benchmark covering this.
After
```
Benchmark (size) Mode Cnt Score Error Units
MemorySegmentMaskedVectorAccess.arrayCopy 1024 avgt 10 16.700 ± 0.612 ns/op
MemorySegmentMaskedVectorAccess.directSegments 1024 avgt 10 80.429 ± 2.897 ns/op
MemorySegmentMaskedVectorAccess.heapSegments 1024 avgt 10 25.528 ± 0.296 ns/op
MemorySegmentMaskedVectorAccess.pollutedSegments2 1024 avgt 10 122.809 ± 0.894 ns/op
MemorySegmentMaskedVectorAccess.pollutedSegments3 1024 avgt 10 252.930 ± 4.623 ns/op
MemorySegmentMaskedVectorAccess.pollutedSegments4 1024 avgt 10 451.579 ± 6.429 ns/op
MemorySegmentMaskedVectorAccess.pollutedSegments5 1024 avgt 10 446.500 ± 39.156 ns/op
```
Before
```
Benchmark (size) Mode Cnt Score Error Units
MemorySegmentMaskedVectorAccess.arrayCopy 1024 avgt 10 21.089 ± 0.219 ns/op
MemorySegmentMaskedVectorAccess.directSegments 1024 avgt 10 81.384 ± 1.008 ns/op
MemorySegmentMaskedVectorAccess.heapSegments 1024 avgt 10 25.626 ± 0.522 ns/op
MemorySegmentMaskedVectorAccess.pollutedSegments2 1024 avgt 10 217.733 ± 5.467 ns/op
MemorySegmentMaskedVectorAccess.pollutedSegments3 1024 avgt 10 441.045 ± 9.749 ns/op
MemorySegmentMaskedVectorAccess.pollutedSegments4 1024 avgt 10 522.613 ± 104.997 ns/op
MemorySegmentMaskedVectorAccess.pollutedSegments5 1024 avgt 10 449.814 ± 8.203 ns/op
```
- Split foreign vector load and store by null or not null base
Split the store / load operations by checking whether the base is null
or not null.
With this split, the VM no longer perceives the base passed to Unsafe
as mixed access, and does not insert barriers.
Test results
After
```
Benchmark (size) Mode Cnt Score Error Units
MemorySegmentVectorAccess.arrayCopy 1024 avgt 10 7.437 ± 0.195 ns/op
MemorySegmentVectorAccess.directSegments 1024 avgt 10 15.593 ± 0.371 ns/op
MemorySegmentVectorAccess.heapSegments 1024 avgt 10 16.997 ± 0.118 ns/op
MemorySegmentVectorAccess.pollutedSegments2 1024 avgt 10 58.673 ± 105.783 ns/op
MemorySegmentVectorAccess.pollutedSegments3 1024 avgt 10 67.216 ± 16.157 ns/op
MemorySegmentVectorAccess.pollutedSegments4 1024 avgt 10 122.567 ± 263.950 ns/op
MemorySegmentVectorAccess.pollutedSegments5 1024 avgt 10 114.725 ± 209.183 ns/op
```
Before
```
Benchmark (size) Mode Cnt Score Error Units
MemorySegmentVectorAccess.arrayCopy 1024 avgt 10 8.547 ± 0.115 ns/op
MemorySegmentVectorAccess.directSegments 1024 avgt 10 15.536 ± 0.082 ns/op
MemorySegmentVectorAccess.heapSegments 1024 avgt 10 15.818 ± 0.101 ns/op
MemorySegmentVectorAccess.pollutedSegments2 1024 avgt 10 146.380 ± 1.127 ns/op
MemorySegmentVectorAccess.pollutedSegments3 1024 avgt 10 290.784 ± 7.274 ns/op
MemorySegmentVectorAccess.pollutedSegments4 1024 avgt 10 297.187 ± 5.096 ns/op
MemorySegmentVectorAccess.pollutedSegments5 1024 avgt 10 310.166 ± 9.310 ns/op
```
Additionally, with profiling of the `load` and `store` method arguments as
described in [1]
```
Benchmark (size) Mode Cnt Score Error Units
MemorySegmentVectorAccess.arrayCopy 1024 avgt 10 7.480 ± 0.169 ns/op
MemorySegmentVectorAccess.directSegments 1024 avgt 10 15.497 ± 0.062 ns/op
MemorySegmentVectorAccess.heapSegments 1024 avgt 10 16.829 ± 0.132 ns/op
MemorySegmentVectorAccess.pollutedSegments2 1024 avgt 10 145.436 ± 1.081 ns/op
MemorySegmentVectorAccess.pollutedSegments3 1024 avgt 10 291.081 ± 2.297 ns/op
MemorySegmentVectorAccess.pollutedSegments4 1024 avgt 10 305.388 ± 7.518 ns/op
MemorySegmentVectorAccess.pollutedSegments5 1024 avgt 10 303.931 ± 3.412 ns/op
```
[1] https://github.com/openjdk/panama-foreign/pull/700
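As a rough illustration of the split described in the commit message above, the sketch below branches on the base before calling an Unsafe-style accessor, so that one call site only ever passes a null base and the other only ever passes a non-null one. This is not the code from the PR: `rawLoad`, `load`, and the `NATIVE` buffer are hypothetical stand-ins, and a direct `ByteBuffer` simulates off-heap memory so the example runs without Unsafe or the incubating vector API. The actual change is in the linked diff.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch of splitting an access by null / non-null base.
class SplitByBaseSketch {
    // Stand-in for off-heap memory; real code would use a raw address.
    static final ByteBuffer NATIVE =
            ByteBuffer.allocateDirect(64).order(ByteOrder.LITTLE_ENDIAN);

    // Hypothetical Unsafe-style accessor: a null base means "off-heap",
    // a non-null base is a heap byte[] and offset indexes into it.
    static long rawLoad(Object base, long offset) {
        if (base == null) {
            return NATIVE.getLong((int) offset);
        }
        return ByteBuffer.wrap((byte[]) base)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getLong((int) offset);
    }

    // The split: each branch has its own call site, so neither call site
    // sees a mix of null and non-null bases.
    static long load(Object base, long offset) {
        if (base == null) {
            return rawLoad(null, offset);
        } else {
            return rawLoad(base, offset);
        }
    }

    public static void main(String[] args) {
        byte[] heap = new byte[64];
        ByteBuffer.wrap(heap).order(ByteOrder.LITTLE_ENDIAN).putLong(8, 42L);
        NATIVE.putLong(8, 7L);

        System.out.println(load(heap, 8)); // prints 42
        System.out.println(load(null, 8)); // prints 7
    }
}
```

The point of the extra branch is only to duplicate the call site; the accessor itself is unchanged. Whether a given JIT actually drops the barriers in this situation depends on the VM, as the benchmarks in this thread measure.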
-------------
Changes:
- all: https://git.openjdk.org/panama-foreign/pull/711/files
- new: https://git.openjdk.org/panama-foreign/pull/711/files/b4680af8..ecbb21d0
Webrevs:
- full: https://webrevs.openjdk.org/?repo=panama-foreign&pr=711&range=02
- incr: https://webrevs.openjdk.org/?repo=panama-foreign&pr=711&range=01-02
Stats: 96177 lines in 1935 files changed: 43930 ins; 42023 del; 10224 mod
Patch: https://git.openjdk.org/panama-foreign/pull/711.diff
Fetch: git fetch https://git.openjdk.org/panama-foreign pull/711/head:pull/711
PR: https://git.openjdk.org/panama-foreign/pull/711