RFR: 8320649: C2: Optimize scoped values [v4]

Fri Jan 5 12:47:22 UTC 2024

On Fri, 5 Jan 2024 00:59:03 GMT, Dean Long <dlong at openjdk.org> wrote:

> I'm not a C2 expert, so my high-level comments might not all make sense, but here goes.
> 
>     * My first reaction was why does this need to be so complicated?

That's a fair reaction.

>   Can't we treat the slow path and cache implementation as a black box, and just common up the high-level get()?  In my mind the ScopedValue get should be similar to a read from a "condy" dynamic constant.

Initially, I thought about delaying the inlining of  `get()` methods and simply have a pass that look for `get()` calls with the same inputs. I don't think that works well because the current late inlining framework can't delay inlining very late. We don't run loop opts before we're done with inlining for instance. If we wanted to hoist a call out of loop we would need loop opts. For instance, tt's likely a call to `get()` depends on a null check that we would need to hoist first. 

The other thing about optimizing `get()` calls is that they are heavy weight nodes (a high level `get()` macro node would be very similar to a `get()` call node whichever way you look at it). We don't know how to hoist a call out of loop. A call acts as a barrier on the entire memory state and get in the way of memory optimizations. If profile reports the slow path to be never taken then the shape of the `get()` becomes lighter weight. It doesn't disrupt other optimizations. Probing the cache acts as a load + test which we know how to hoist from a loop.

It felt to me that it would be fairly common for the slow path to not be needed and given the shape without the slow path is much easier to optimize, it was important to be able to expose early on if the slow path was there or not.

> 
>     * The reason for breaking up the operations into individual nodes seems to be because of the profiling information.  So I'm wondering how much this helps, given the added complexity.

The thing about `get()` is that in simple cases, it optimizes well because of profile data. A `get()` call once inlined can essentially be hoisted out of loop if all goes well. It doesn't take much for simple optimizations on `get()` to not happen anymore. The goal of this patch is to bring consistency and have optimizations work well in all sort of scenarios. But it would be hard to sell if the simple cases don't work as well as they do without the patch. And I believe that requires profile data.

> 
>     * I would expect multiple get() calls in the same method or in loops to be rare and/or programmer errors.  The real advantage seems to be when the binding and the get() can be connected from caller to callee through inlining.

Eliminating `get()` calls with the same inputs may not be common in java code but that transformation is a building block for optimizations. Hoisting a `get()` out of loop can be achieved by peeling one iteration and letting the `get()` from the loop body be removed because it's redundant with the one from the peeled iteration. Also, code that c2 optimizes once inlining has happened and dead paths have been trimmed doesn't necessarily look like the java code the programmer wrote.

> 
>     * Needing to do things like treat ScopedValueGetHitsInCache as always successful give be a bad feeling for some reason, and seem unnecessary if we did more at a higher (macro?) level rather than eagerly expanding the high-level operation into individual nodes.

I think my comments above cover that one.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-1878609794