RFR: 8334342: Add MergeStore JMH benchmarks

Mon Jun 17 07:54:15 UTC 2024

On Sun, 16 Jun 2024 07:17:16 GMT, Shaojin Wen <duke at openjdk.org> wrote:

> [8318446](https://github.com/openjdk/jdk/pull/16245)  brings MergeStore. We need a JMH Benchmark to evaluate the performance of various batch operations and the effect of MergeStore.

A few extra comments:
There is already a `MergeStore` benchmark. I would prefer if you put yours next to it, unless if you have a good reason.

About your list:

getIntB    
getIntBU   
getIntL    
getIntLU   
getIntRB   
getIntRBU  
getIntRL   
getIntRLU  
getLongB   
getLongBU  
getLongL   
getLongLU  
getLongRB  
getLongRBU 
getLongRL  
getLongRLU
-> Obviously a "MergeStore" optimization does not work for loads. But if it is important, then maybe we could generalize the optimizations from stores to loads.

putChars4UC
-> Does the putChars4UC get inlined? Because here you are storing 4 variables, this pattern is not handled by MergeStore.

setIntB
-> You say that here MergeStore does not work. That is because the indices are increasing, but the shifts decreasing. So that does not work on little-endian machines (most architectures), but I would expect it to work on big-endian machines with https://github.com/openjdk/jdk/pull/19218.

setIntBU
-> order seems messed up

setIntRB    
setIntRBU   
setLongB
setLongBU  
setLongRB  
setLongRBU
-> I leave the rest for you to investigate.

test/micro/org/openjdk/bench/java/lang/MergeStoreBench.java line 2:

> 1: /*
> 2:  * Copyright (c) 2014, 2024, Oracle and/or its affiliates. All rights reserved.

I think for a new file, you don't want to backdate the copyright to `2014` ;)

test/micro/org/openjdk/bench/java/lang/MergeStoreBench.java line 57:

> 55:             HI_BYTE_SHIFT = 0;
> 56:             LO_BYTE_SHIFT = 8;
> 57:         }

I wonder if it would just be better to duplicate the tests. Because we may extend the `MergeStore` optimization such that it allows either order, and just adds a `Reverse` operations on the bytes/chars/ints. Then it would be nice to benchmark both orders on both little and big endian architectures. What do you think?

test/micro/org/openjdk/bench/java/lang/MergeStoreBench.java line 656:

> 654:         array[offset + 2] = (byte) (value >>  8);
> 655:         array[offset + 3] = (byte) (value      );
> 656:     }

You say that here `MergeStore` does not work. That is because the indices are increasing, but the shifts decreasing. So that does not work on little-endian machines (most architectures), but I would expect it to work on big-endian machines with https://github.com/openjdk/jdk/pull/19218.

test/micro/org/openjdk/bench/java/lang/MergeStoreBench.java line 664:

> 662:         UNSAFE.putByte(array, address + 2, (byte) (value >> 16));
> 663:         UNSAFE.putByte(array, address + 3, (byte) (value      ));
> 664:     }

Order messed up?

test/micro/org/openjdk/bench/java/lang/MergeStoreBench.java line 850:

> 848:         UNSAFE.putChar(array, address + 4, c2);
> 849:         UNSAFE.putChar(array, address + 6, c3);
> 850:     }

Does the `putChars4UC` get inlined? Because here you are storing 4 variables, this pattern is not handled by `MergeStore`.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19734#issuecomment-2172548153
PR Review Comment: https://git.openjdk.org/jdk/pull/19734#discussion_r1642327120
PR Review Comment: https://git.openjdk.org/jdk/pull/19734#discussion_r1642329554
PR Review Comment: https://git.openjdk.org/jdk/pull/19734#discussion_r1642341934
PR Review Comment: https://git.openjdk.org/jdk/pull/19734#discussion_r1642350184
PR Review Comment: https://git.openjdk.org/jdk/pull/19734#discussion_r1642343781