RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v7]

Thu Mar 13 08:17:01 UTC 2025

On Tue, 11 Mar 2025 19:49:41 GMT, Kangcheng Xu <kxu at openjdk.org> wrote:

>> [JDK-8347555](https://bugs.openjdk.org/browse/JDK-8347555) is a redo of [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) was [first merged](https://git.openjdk.org/jdk/pull/20754) then backed out due to a regression. This patch redos the feature and fixes the bit shift overflow problem. For more information please refer to the previous PR.
>> 
>> When constanlizing multiplications (possibly in forms on `lshifts`), the multiplier is upgraded to long and then later narrowed to int if needed. However, when a `lshift` operand is exactly `32`, overflowing an int, using long has an unexpected result. (i.e., `(1 << 32) = 1` and `(int) (1L << 32) = 0`)
>> 
>> The following was implemented to address this issue.
>> 
>> if (UseNewCode2) {
>>     *multiplier = bt == T_INT
>>         ? (jlong) (1 << con->get_int()) // loss of precision is expected for int as it overflows
>>         : ((jlong) 1) << con->get_int();
>> } else {
>>     *multiplier = ((jlong) 1 << con->get_int());
>> }
>> 
>> 
>> Two new bitshift overflow tests were added.
>
> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision:
> 
>   add micro benchmark

This looks really interesting!

I see that you are doing some special pattern matching. I wonder if it might be worth generalizing the algorithm, to search through an arbitrary "tree" of additions, collect all "leaves" of (`variable * multiplier`), sort by `variable`, and compute new additions for each `variable`. What do you think?

src/hotspot/share/opto/addnode.cpp line 407:

> 405: }
> 406: 
> 407: // Try to convert a serial of additions into a single multiplication. Also convert `(a * CON) + a` to `(CON + 1) * a` as

What about `(a * CON1) + (a * CON2)`? Like `11 * a + 5 * a`. Do we also optimize that?

src/hotspot/share/opto/addnode.cpp line 413:

> 411:   // power-of-2 addition (e.g., 3 * a => (a << 2) + a). Without this check, GVN would keep trying to optimize the same
> 412:   // node and can't progress. For example, 3 * a => (a << 2) + a => 3 * a => (a << 2) + a => ...
> 413:   if (find_power_of_two_addition_pattern(this, bt, nullptr) != nullptr) {

Where does the optimization `3 * a => (a << 2) + a` happen? Do we use `find_power_of_two_addition_pattern` there too? If not: how do we prevent the code of the two locations from diverging in the future?

src/hotspot/share/opto/addnode.cpp line 427:

> 425:       || find_simple_multiplication_pattern(in1, bt, &multiplier) == in2
> 426:       || find_power_of_two_addition_pattern(in1, bt, &multiplier) == in2) {
> 427:     multiplier++; // +1 for the in2 term

Nit: I think we generally have the `||` at the end of the line, not the beginning.

src/hotspot/share/opto/addnode.cpp line 447:

> 445: 
> 446:   return nullptr;
> 447: }

I'm not a great fan of "output arguments" such as the `multiplier` here.
Why not create a class/struct `Multiplication`, which has a field `valid` (instead of returning `nullptr`). And fields `variable` and `multiplier`. The fields can all be constant.
You could even have an `add` method, that adds two such `Multiplication`s together.

src/hotspot/share/opto/addnode.cpp line 480:

> 478:     if (!con->is_Con()) {
> 479:       swap(con, base);
> 480:     }

Is that necessary? Does `Mul` not automatically get canonicalized so that the constant is on the rhs?

src/hotspot/share/opto/addnode.cpp line 500:

> 498: // Note that one of the term of the addition could simply be `a` (i.e., a << 0). Calling this function with `multiplier`
> 499: // being null is safe.
> 500: Node* AddNode::find_power_of_two_addition_pattern(Node* n, BasicType bt, jlong* multiplier) {

This code here looks quite complicated. Why not parse both sides of the add with a `find_simple_lshift_pattern`, and then check that they use the same variable?

test/hotspot/jtreg/compiler/c2/TestSerialAdditions.java line 43:

> 41:  */
> 42: public class TestSerialAdditions {
> 43:     private static final Random RNG = Utils.getRandomInstance();

Could you use `Generators.java` instead? It will produce more "interesting" random values, such as powers of 2, and values close to powers of 2.

-------------

PR Review: https://git.openjdk.org/jdk/pull/23506#pullrequestreview-2680840690
PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r1992974452
PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r1992971118
PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r1992953844
PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r1992966421
PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r1992978164
PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r1992980967
PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r1992984685