RFR: 8334342: Add MergeStore JMH benchmarks [v5]
Shaojin Wen
duke at openjdk.org
Tue Jun 18 01:12:15 UTC 2024
On Mon, 17 Jun 2024 23:02:47 GMT, Shaojin Wen <duke at openjdk.org> wrote:
>> [8318446](https://github.com/openjdk/jdk/pull/16245) brings MergeStore. We need a JMH Benchmark to evaluate the performance of various batch operations and the effect of MergeStore.
>
> Shaojin Wen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision:
>
> - Merge remote-tracking branch 'upstream/master' into merge_store_bench
> - bug fix for `putChars4C`
> - bug fix for `putChars4C` and `putChars4S`
> - use VarHandler CHAR_L & CHAR_B
> - copyright
> - bug fix for putIntBU
> - add cases for `getChar` & `putChar`
> - code format
> - add `setIntRL` & `setIntRLU`
> - add comments
> - ... and 3 more: https://git.openjdk.org/jdk/compare/d4942ac0...4c9b9418
I re-ran the performance test based on WebRevs 04: [Full](https://webrevs.openjdk.org/?repo=jdk&pr=19734&range=04) - [Incremental](https://webrevs.openjdk.org/?repo=jdk&pr=19734&range=03-04) ([4c9b9418](https://git.openjdk.org/jdk/pull/19734/files/4c9b9418fc4a95504867b6019b3e94605917f747)).
# 1. Cases where MergeStore does not work
@eme64
I found two cases, `putChars4BV` and `putChars4LV`, where MergeStore does not take effect. If support for them can be added, it would be useful for people using VarHandle.
putChars4BV
putChars4LV
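To make the pattern concrete, here is a minimal sketch of what a `putChars4LV`-style method might look like, next to a hand-merged equivalent. This is an assumption for illustration, not the benchmark's actual source: the class name `PutChars4Sketch`, the method `putChars4Merged`, and the `LONG_L` handle are hypothetical. It only shows that four adjacent 2-byte VarHandle stores write the same bytes as a single 8-byte store, which is the kind of merge MergeStore would ideally perform.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.Arrays;

public class PutChars4Sketch {
    // Little-endian views over a byte[] (handle names are assumptions).
    static final VarHandle CHAR_L =
            MethodHandles.byteArrayViewVarHandle(char[].class, ByteOrder.LITTLE_ENDIAN);
    static final VarHandle LONG_L =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    // Four adjacent 2-byte stores: the shape that is a candidate for merging.
    public static void putChars4LV(byte[] a, int off, char c0, char c1, char c2, char c3) {
        CHAR_L.set(a, off,     c0);
        CHAR_L.set(a, off + 2, c1);
        CHAR_L.set(a, off + 4, c2);
        CHAR_L.set(a, off + 6, c3);
    }

    // Hand-merged version: pack the four chars into one long, store once.
    public static void putChars4Merged(byte[] a, int off, char c0, char c1, char c2, char c3) {
        long packed = (long) c0
                | ((long) c1 << 16)
                | ((long) c2 << 32)
                | ((long) c3 << 48);
        LONG_L.set(a, off, packed);
    }

    public static void main(String[] args) {
        byte[] x = new byte[8];
        byte[] y = new byte[8];
        putChars4LV(x, 0, 'n', 'u', 'l', 'l');
        putChars4Merged(y, 0, 'n', 'u', 'l', 'l');
        // Both paths should produce the same byte layout.
        System.out.println(Arrays.equals(x, y));
    }
}
```

Whether C2 can recognize the VarHandle form is the question raised above; the sketch only demonstrates that the two forms are equivalent in memory.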
I also found that the cases using VarHandle perform particularly well. Why? For example:
setIntBV
setIntLV
setLongBV
setLongLV
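To quantify "particularly good", the ratio between a byte-by-byte case and its VarHandle counterpart can be computed directly from the scores reported in section 2. The sketch below is only a convenience calculation; the class name `VarHandleSpeedup` and the helper `speedup` are hypothetical, and the inputs are the `setIntB` and `setIntBV` scores copied from the three tables.

```java
public class VarHandleSpeedup {
    // Ratio of average times; a value above 1 means the candidate is faster.
    public static double speedup(double baselineNsPerOp, double candidateNsPerOp) {
        return baselineNsPerOp / candidateNsPerOp;
    }

    public static void main(String[] args) {
        // setIntB vs setIntBV, scores in ns/op taken from sections 2.1, 2.2 and 2.3.
        System.out.printf("M1 Pro     : %.1fx%n", speedup(5095.236, 1224.506));
        System.out.printf("AMD Genoa  : %.1fx%n", speedup(8036.698, 586.392));
        System.out.printf("Intel Xeon : %.1fx%n", speedup(6891.353, 899.335));
    }
}
```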
# 2. Performance numbers
The case names end in one or more of the following suffixes:
* B: BigEndian
* L: LittleEndian
* V: VarHandle
* U: Unsafe
* R: reverseBytes
* C: Unsafe.getChar & putChar
* S: Unsafe.getShort & putShort
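As a rough illustration of the naming scheme, the following sketch shows what the plain B and L variants and their VarHandle (V) counterparts could look like for `setInt`. It is not the benchmark source: the class name `SuffixSketch` and the handle names `INT_B` and `INT_L` are assumptions, and the Unsafe-based (U, C, S) and reverseBytes (R) variants are left out.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

public class SuffixSketch {
    // V: byte[] views with an explicit byte order (handle names are assumptions).
    static final VarHandle INT_B =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.BIG_ENDIAN);
    static final VarHandle INT_L =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    // B: four single-byte stores, most significant byte first.
    public static void setIntB(byte[] a, int off, int v) {
        a[off]     = (byte) (v >> 24);
        a[off + 1] = (byte) (v >> 16);
        a[off + 2] = (byte) (v >> 8);
        a[off + 3] = (byte) v;
    }

    // L: four single-byte stores, least significant byte first.
    public static void setIntL(byte[] a, int off, int v) {
        a[off]     = (byte) v;
        a[off + 1] = (byte) (v >> 8);
        a[off + 2] = (byte) (v >> 16);
        a[off + 3] = (byte) (v >> 24);
    }

    // BV / LV: one VarHandle store with the matching byte order.
    public static void setIntBV(byte[] a, int off, int v) { INT_B.set(a, off, v); }
    public static void setIntLV(byte[] a, int off, int v) { INT_L.set(a, off, v); }
}
```

In this sketch `setIntB` and `setIntBV` write identical bytes, as do `setIntL` and `setIntLV`; the benchmark compares how fast each form runs.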
## 2.1 MacBook M1 Pro (aarch64)
Benchmark Mode Cnt Score Error Units
MergeStoreBench.getCharB avgt 15 5340.200 ± 7.038 ns/op
MergeStoreBench.getCharBU avgt 15 5482.163 ± 7.922 ns/op
MergeStoreBench.getCharBV avgt 15 5074.165 ± 6.759 ns/op
MergeStoreBench.getCharC avgt 15 5051.763 ± 6.552 ns/op
MergeStoreBench.getCharL avgt 15 5374.464 ± 9.783 ns/op
MergeStoreBench.getCharLU avgt 15 5487.532 ± 6.368 ns/op
MergeStoreBench.getCharLV avgt 15 5071.263 ± 9.717 ns/op
MergeStoreBench.getIntB avgt 15 6277.984 ± 6.284 ns/op
MergeStoreBench.getIntBU avgt 15 5232.984 ± 10.384 ns/op
MergeStoreBench.getIntBV avgt 15 1206.264 ± 1.193 ns/op
MergeStoreBench.getIntL avgt 15 6172.779 ± 1.962 ns/op
MergeStoreBench.getIntLU avgt 15 5157.317 ± 16.077 ns/op
MergeStoreBench.getIntLV avgt 15 2558.110 ± 3.402 ns/op
MergeStoreBench.getIntRB avgt 15 6889.916 ± 36.955 ns/op
MergeStoreBench.getIntRBU avgt 15 5769.950 ± 11.499 ns/op
MergeStoreBench.getIntRL avgt 15 6625.605 ± 10.662 ns/op
MergeStoreBench.getIntRLU avgt 15 5746.742 ± 11.945 ns/op
MergeStoreBench.getIntRU avgt 15 2544.586 ± 2.769 ns/op
MergeStoreBench.getIntU avgt 15 2541.119 ± 3.252 ns/op
MergeStoreBench.getLongB avgt 15 12098.129 ± 31.451 ns/op
MergeStoreBench.getLongBU avgt 15 9760.621 ± 16.427 ns/op
MergeStoreBench.getLongBV avgt 15 2593.635 ± 4.698 ns/op
MergeStoreBench.getLongL avgt 15 12031.065 ± 19.820 ns/op
MergeStoreBench.getLongLU avgt 15 9653.938 ± 18.372 ns/op
MergeStoreBench.getLongLV avgt 15 2557.521 ± 3.338 ns/op
MergeStoreBench.getLongRB avgt 15 12092.061 ± 18.026 ns/op
MergeStoreBench.getLongRBU avgt 15 9763.489 ± 17.347 ns/op
MergeStoreBench.getLongRL avgt 15 12027.686 ± 17.472 ns/op
MergeStoreBench.getLongRLU avgt 15 9649.433 ± 8.384 ns/op
MergeStoreBench.getLongRU avgt 15 2546.239 ± 2.088 ns/op
MergeStoreBench.getLongU avgt 15 2539.762 ± 1.439 ns/op
MergeStoreBench.putChars4B avgt 15 8487.381 ± 23.170 ns/op
MergeStoreBench.putChars4BU avgt 15 3830.198 ± 7.083 ns/op
MergeStoreBench.putChars4BV avgt 15 5154.819 ± 10.348 ns/op
MergeStoreBench.putChars4C avgt 15 5162.766 ± 15.041 ns/op
MergeStoreBench.putChars4L avgt 15 8381.231 ± 20.135 ns/op
MergeStoreBench.putChars4LU avgt 15 3827.784 ± 3.163 ns/op
MergeStoreBench.putChars4LV avgt 15 5151.508 ± 4.907 ns/op
MergeStoreBench.putChars4S avgt 15 5152.123 ± 7.407 ns/op
MergeStoreBench.setCharBS avgt 15 5317.319 ± 28.445 ns/op
MergeStoreBench.setCharBV avgt 15 5175.400 ± 7.110 ns/op
MergeStoreBench.setCharC avgt 15 5085.752 ± 6.222 ns/op
MergeStoreBench.setCharLS avgt 15 5294.766 ± 9.742 ns/op
MergeStoreBench.setCharLV avgt 15 5108.269 ± 6.692 ns/op
MergeStoreBench.setIntB avgt 15 5095.236 ± 2.838 ns/op
MergeStoreBench.setIntBU avgt 15 5097.007 ± 4.249 ns/op
MergeStoreBench.setIntBV avgt 15 1224.506 ± 0.976 ns/op
MergeStoreBench.setIntL avgt 15 2764.388 ± 2.400 ns/op
MergeStoreBench.setIntLU avgt 15 2573.624 ± 6.677 ns/op
MergeStoreBench.setIntLV avgt 15 5105.804 ± 11.551 ns/op
MergeStoreBench.setIntRB avgt 15 5348.785 ± 4.974 ns/op
MergeStoreBench.setIntRBU avgt 15 5422.049 ± 31.009 ns/op
MergeStoreBench.setIntRL avgt 15 5293.414 ± 8.204 ns/op
MergeStoreBench.setIntRLU avgt 15 5126.889 ± 7.435 ns/op
MergeStoreBench.setIntRU avgt 15 5097.927 ± 3.588 ns/op
MergeStoreBench.setIntU avgt 15 5087.192 ± 11.806 ns/op
MergeStoreBench.setLongB avgt 15 10249.037 ± 19.538 ns/op
MergeStoreBench.setLongBU avgt 15 10238.910 ± 11.998 ns/op
MergeStoreBench.setLongBV avgt 15 2663.647 ± 4.147 ns/op
MergeStoreBench.setLongL avgt 15 6304.458 ± 4.588 ns/op
MergeStoreBench.setLongLU avgt 15 2921.575 ± 10.649 ns/op
MergeStoreBench.setLongLV avgt 15 2663.323 ± 1.188 ns/op
MergeStoreBench.setLongRB avgt 15 10255.875 ± 19.754 ns/op
MergeStoreBench.setLongRBU avgt 15 10227.856 ± 9.970 ns/op
MergeStoreBench.setLongRL avgt 15 6641.173 ± 3.836 ns/op
MergeStoreBench.setLongRLU avgt 15 3241.057 ± 22.250 ns/op
MergeStoreBench.setLongRU avgt 15 2608.399 ± 2.243 ns/op
MergeStoreBench.setLongU avgt 15 2594.970 ± 3.490 ns/op
## 2.2 Aliyun ecs.c8a.xlarge (x64)
* CPU: AMD EPYC™ Genoa
Benchmark Mode Cnt Score Error Units
MergeStoreBench.getCharB avgt 15 5969.667 ± 75.660 ns/op
MergeStoreBench.getCharBU avgt 15 4576.650 ± 27.489 ns/op
MergeStoreBench.getCharBV avgt 15 3085.061 ± 3.206 ns/op
MergeStoreBench.getCharC avgt 15 2237.624 ± 1.383 ns/op
MergeStoreBench.getCharL avgt 15 6044.112 ± 8.960 ns/op
MergeStoreBench.getCharLU avgt 15 4538.252 ± 3.747 ns/op
MergeStoreBench.getCharLV avgt 15 2221.833 ± 0.727 ns/op
MergeStoreBench.getIntB avgt 15 11983.238 ± 74.190 ns/op
MergeStoreBench.getIntBU avgt 15 9039.309 ± 6.332 ns/op
MergeStoreBench.getIntBV avgt 15 303.874 ± 0.305 ns/op
MergeStoreBench.getIntL avgt 15 10521.992 ± 15.238 ns/op
MergeStoreBench.getIntLU avgt 15 8867.106 ± 7.014 ns/op
MergeStoreBench.getIntLV avgt 15 2226.223 ± 0.887 ns/op
MergeStoreBench.getIntRB avgt 15 12332.136 ± 19.948 ns/op
MergeStoreBench.getIntRBU avgt 15 11114.256 ± 8.652 ns/op
MergeStoreBench.getIntRL avgt 15 11206.728 ± 15.291 ns/op
MergeStoreBench.getIntRLU avgt 15 9349.279 ± 7.379 ns/op
MergeStoreBench.getIntRU avgt 15 2507.213 ± 1.222 ns/op
MergeStoreBench.getIntU avgt 15 2495.432 ± 1.278 ns/op
MergeStoreBench.getLongB avgt 15 26832.797 ± 19.316 ns/op
MergeStoreBench.getLongBU avgt 15 13996.454 ± 17.628 ns/op
MergeStoreBench.getLongBV avgt 15 605.548 ± 0.538 ns/op
MergeStoreBench.getLongL avgt 15 26859.909 ± 31.234 ns/op
MergeStoreBench.getLongLU avgt 15 14519.709 ± 23.482 ns/op
MergeStoreBench.getLongLV avgt 15 2227.782 ± 0.535 ns/op
MergeStoreBench.getLongRB avgt 15 26846.549 ± 17.321 ns/op
MergeStoreBench.getLongRBU avgt 15 13994.948 ± 14.752 ns/op
MergeStoreBench.getLongRL avgt 15 26838.819 ± 14.425 ns/op
MergeStoreBench.getLongRLU avgt 15 14547.807 ± 73.859 ns/op
MergeStoreBench.getLongRU avgt 15 3061.373 ± 1.690 ns/op
MergeStoreBench.getLongU avgt 15 3049.441 ± 1.162 ns/op
MergeStoreBench.putChars4B avgt 15 13411.014 ± 4.491 ns/op
MergeStoreBench.putChars4BU avgt 15 4206.040 ± 4.317 ns/op
MergeStoreBench.putChars4BV avgt 15 7948.154 ± 904.918 ns/op
MergeStoreBench.putChars4C avgt 15 5316.859 ± 3.066 ns/op
MergeStoreBench.putChars4L avgt 15 13419.757 ± 11.175 ns/op
MergeStoreBench.putChars4LU avgt 15 4205.094 ± 5.079 ns/op
MergeStoreBench.putChars4LV avgt 15 6734.543 ± 6.452 ns/op
MergeStoreBench.putChars4S avgt 15 5323.487 ± 10.605 ns/op
MergeStoreBench.setCharBS avgt 15 9225.082 ± 11.461 ns/op
MergeStoreBench.setCharBV avgt 15 5242.360 ± 12.546 ns/op
MergeStoreBench.setCharC avgt 15 4497.345 ± 7.426 ns/op
MergeStoreBench.setCharLS avgt 15 8991.865 ± 7.281 ns/op
MergeStoreBench.setCharLV avgt 15 2535.475 ± 4.230 ns/op
MergeStoreBench.setIntB avgt 15 8036.698 ± 6.763 ns/op
MergeStoreBench.setIntBU avgt 15 10332.333 ± 10.071 ns/op
MergeStoreBench.setIntBV avgt 15 586.392 ± 1.024 ns/op
MergeStoreBench.setIntL avgt 15 2541.327 ± 4.538 ns/op
MergeStoreBench.setIntLU avgt 15 6122.574 ± 46.593 ns/op
MergeStoreBench.setIntLV avgt 15 597.930 ± 0.672 ns/op
MergeStoreBench.setIntRB avgt 15 9740.301 ± 3.367 ns/op
MergeStoreBench.setIntRBU avgt 15 10648.285 ± 29.338 ns/op
MergeStoreBench.setIntRL avgt 15 6227.445 ± 15.378 ns/op
MergeStoreBench.setIntRLU avgt 15 8409.781 ± 61.847 ns/op
MergeStoreBench.setIntRU avgt 15 631.337 ± 6.930 ns/op
MergeStoreBench.setIntU avgt 15 604.432 ± 0.682 ns/op
MergeStoreBench.setLongB avgt 15 17184.183 ± 11.490 ns/op
MergeStoreBench.setLongBU avgt 15 21377.695 ± 51.384 ns/op
MergeStoreBench.setLongBV avgt 15 1191.037 ± 10.983 ns/op
MergeStoreBench.setLongL avgt 15 3342.476 ± 4.704 ns/op
MergeStoreBench.setLongLU avgt 15 6194.791 ± 13.241 ns/op
MergeStoreBench.setLongLV avgt 15 1194.042 ± 2.943 ns/op
MergeStoreBench.setLongRB avgt 15 17946.742 ± 26.888 ns/op
MergeStoreBench.setLongRBU avgt 15 21342.899 ± 22.937 ns/op
MergeStoreBench.setLongRL avgt 15 4034.050 ± 3.792 ns/op
MergeStoreBench.setLongRLU avgt 15 4825.627 ± 11.409 ns/op
MergeStoreBench.setLongRU avgt 15 1170.252 ± 1.582 ns/op
MergeStoreBench.setLongU avgt 15 1192.220 ± 1.060 ns/op
## 2.3 Aliyun ecs.c8i.xlarge (x64)
* CPU: Intel® Xeon® Emerald
Benchmark Mode Cnt Score Error Units
MergeStoreBench.getCharB avgt 15 5374.604 ± 11.001 ns/op
MergeStoreBench.getCharBU avgt 15 4760.386 ± 20.612 ns/op
MergeStoreBench.getCharBV avgt 15 3068.661 ± 2.712 ns/op
MergeStoreBench.getCharC avgt 15 2591.548 ± 0.428 ns/op
MergeStoreBench.getCharL avgt 15 5224.986 ± 3.388 ns/op
MergeStoreBench.getCharLU avgt 15 4781.157 ± 19.001 ns/op
MergeStoreBench.getCharLV avgt 15 2577.009 ± 1.374 ns/op
MergeStoreBench.getIntB avgt 15 10512.241 ± 17.214 ns/op
MergeStoreBench.getIntBU avgt 15 9271.460 ± 17.628 ns/op
MergeStoreBench.getIntBV avgt 15 255.186 ± 0.731 ns/op
MergeStoreBench.getIntL avgt 15 9728.629 ± 2.364 ns/op
MergeStoreBench.getIntLU avgt 15 8983.810 ± 2.463 ns/op
MergeStoreBench.getIntLV avgt 15 2569.886 ± 1.389 ns/op
MergeStoreBench.getIntRB avgt 15 11285.198 ± 15.566 ns/op
MergeStoreBench.getIntRBU avgt 15 10321.709 ± 4.604 ns/op
MergeStoreBench.getIntRL avgt 15 10567.777 ± 3.931 ns/op
MergeStoreBench.getIntRLU avgt 15 9436.647 ± 16.046 ns/op
MergeStoreBench.getIntRU avgt 15 2327.805 ± 0.495 ns/op
MergeStoreBench.getIntU avgt 15 2310.299 ± 2.477 ns/op
MergeStoreBench.getLongB avgt 15 21698.862 ± 58.286 ns/op
MergeStoreBench.getLongBU avgt 15 14682.074 ± 22.913 ns/op
MergeStoreBench.getLongBV avgt 15 649.422 ± 2.738 ns/op
MergeStoreBench.getLongL avgt 15 21584.034 ± 29.685 ns/op
MergeStoreBench.getLongLU avgt 15 14346.370 ± 5.548 ns/op
MergeStoreBench.getLongLV avgt 15 2574.877 ± 0.748 ns/op
MergeStoreBench.getLongRB avgt 15 21689.446 ± 31.897 ns/op
MergeStoreBench.getLongRBU avgt 15 14678.181 ± 3.447 ns/op
MergeStoreBench.getLongRL avgt 15 21578.598 ± 4.353 ns/op
MergeStoreBench.getLongRLU avgt 15 14350.201 ± 37.668 ns/op
MergeStoreBench.getLongRU avgt 15 2988.364 ± 3.983 ns/op
MergeStoreBench.getLongU avgt 15 2941.190 ± 0.582 ns/op
MergeStoreBench.putChars4B avgt 15 10434.718 ± 3.309 ns/op
MergeStoreBench.putChars4BU avgt 15 3008.607 ± 1.378 ns/op
MergeStoreBench.putChars4BV avgt 15 7151.913 ± 483.572 ns/op
MergeStoreBench.putChars4C avgt 15 6489.426 ± 1.369 ns/op
MergeStoreBench.putChars4L avgt 15 10436.577 ± 5.568 ns/op
MergeStoreBench.putChars4LU avgt 15 2837.432 ± 0.697 ns/op
MergeStoreBench.putChars4LV avgt 15 7024.161 ± 9.887 ns/op
MergeStoreBench.putChars4S avgt 15 6495.194 ± 12.316 ns/op
MergeStoreBench.setCharBS avgt 15 8865.676 ± 6.476 ns/op
MergeStoreBench.setCharBV avgt 15 5002.613 ± 20.300 ns/op
MergeStoreBench.setCharC avgt 15 3936.314 ± 7.415 ns/op
MergeStoreBench.setCharLS avgt 15 6989.120 ± 23.404 ns/op
MergeStoreBench.setCharLV avgt 15 2589.797 ± 2.805 ns/op
MergeStoreBench.setIntB avgt 15 6891.353 ± 13.239 ns/op
MergeStoreBench.setIntBU avgt 15 10188.827 ± 21.409 ns/op
MergeStoreBench.setIntBV avgt 15 899.335 ± 2.777 ns/op
MergeStoreBench.setIntL avgt 15 2889.929 ± 6.582 ns/op
MergeStoreBench.setIntLU avgt 15 5314.714 ± 5.170 ns/op
MergeStoreBench.setIntLV avgt 15 945.432 ± 1.255 ns/op
MergeStoreBench.setIntRB avgt 15 8159.294 ± 16.214 ns/op
MergeStoreBench.setIntRBU avgt 15 10625.120 ± 12.809 ns/op
MergeStoreBench.setIntRL avgt 15 6035.911 ± 47.780 ns/op
MergeStoreBench.setIntRLU avgt 15 7148.487 ± 73.927 ns/op
MergeStoreBench.setIntRU avgt 15 969.966 ± 6.127 ns/op
MergeStoreBench.setIntU avgt 15 988.272 ± 2.214 ns/op
MergeStoreBench.setLongB avgt 15 15857.394 ± 9.621 ns/op
MergeStoreBench.setLongBU avgt 15 22955.799 ± 6.266 ns/op
MergeStoreBench.setLongBV avgt 15 1831.898 ± 5.519 ns/op
MergeStoreBench.setLongL avgt 15 4344.954 ± 4.273 ns/op
MergeStoreBench.setLongLU avgt 15 5452.006 ± 9.333 ns/op
MergeStoreBench.setLongLV avgt 15 1910.294 ± 22.688 ns/op
MergeStoreBench.setLongRB avgt 15 16990.616 ± 59.974 ns/op
MergeStoreBench.setLongRBU avgt 15 24951.367 ± 47.760 ns/op
MergeStoreBench.setLongRL avgt 15 4484.135 ± 5.756 ns/op
MergeStoreBench.setLongRLU avgt 15 4891.413 ± 26.743 ns/op
MergeStoreBench.setLongRU avgt 15 1820.416 ± 11.285 ns/op
MergeStoreBench.setLongU avgt 15 1932.694 ± 28.488 ns/op
[MergeStoreBench.txt](https://github.com/user-attachments/files/15878863/MergeStoreBench.txt)
-------------
PR Comment: https://git.openjdk.org/jdk/pull/19734#issuecomment-2174717821