RFR: 8320649: C2: Optimize scoped values [v15]

Emanuel Peter epeter at openjdk.org
Thu Apr 18 10:20:07 UTC 2024


On Tue, 16 Apr 2024 14:43:21 GMT, Roland Westrelin <roland at openjdk.org> wrote:

>> This change implements C2 optimizations for calls to
>> ScopedValue.get(). Indeed, in:
>> 
>> 
>> v1 = scopedValue.get();
>> ...
>> v2 = scopedValue.get();
>> 
>> 
>> `v2` can be replaced by `v1` and the second call to `get()` can be
>> optimized out. That's true whatever is between the 2 calls unless a
>> new mapping for `scopedValue` is created in between (when that happens
>> no optimizations is performed for the method being compiled). Hoisting
>> a `get()` call out of loop for a loop invariant `scopedValue` should
>> also be legal in most cases.
>> 
>> `ScopedValue.get()` is implemented in java code as a 2 step process. A
>> cache is attached to the current thread object. If the `ScopedValue`
>> object is in the cache then the result from `get()` is read from
>> there. Otherwise a slow call is performed that also inserts the
>> mapping in the cache. The cache itself is lazily allocated. One
>> `ScopedValue` can be hashed to 2 different indexes in the cache. On a
>> cache probe, both indexes are checked. As a consequence, the process
>> of probing the cache is a multi step process (check if the cache is
>> present, check first index, check second index if first index
>> failed). If the cache is populated early on, then when the method that
>> calls `ScopedValue.get()` is compiled, profile reports the slow path
>> as never taken and only the read from the cache is compiled.
>> 
>> To perform the optimizations, I added 3 new node types to C2:
>> 
>> - the pair
>>   ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for
>>   the cache probe
>>   
>> - a cfg node ScopedValueGetResultNode to help locate the result of the
>>   `get()` call in the IR graph.
>> 
>> In pseudo code, once the nodes are inserted, the code of a `get()` is:
>> 
>> 
>> hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue)
>> if (hits_in_the_cache) {
>>   res = ScopedValueGetLoadFromCache(hits_in_the_cache);
>> } else {
>>   res = ..; //slow call possibly inlined. Subgraph can be arbitray complex
>> }
>> res = ScopedValueGetResult(res)
>> 
>> 
>> In the snippet:
>> 
>> 
>> v1 = scopedValue.get();
>> ...
>> v2 = scopedValue.get();
>> 
>> 
>> Replacing `v2` by `v1` is then done by starting from the
>> `ScopedValueGetResult` node for the second `get()` and looking for a
>> dominating `ScopedValueGetResult` for the same `ScopedValue`
>> object. When one is found, it is used as a replacement. Eliminating
>> the second `get()` call is achieved by making
>> `ScopedValueGetHitsInCache` always successful if there's a dominating
>> `Scoped...
>
> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits:
> 
>  - Merge branch 'master' into JDK-8320649
>  - review
>  - test fix
>  - test fix
>  - Merge branch 'master' into JDK-8320649
>  - whitespaces
>  - review
>  - Merge branch 'master' into JDK-8320649
>  - review
>  - 32 bit build fix
>  - ... and 12 more: https://git.openjdk.org/jdk/compare/bfff02ee...a4ffc11e

Review of your tests.

test/hotspot/jtreg/compiler/c2/irTests/TestScopedValue.java line 2:

> 1: /*
> 2:  * Copyright (c) 2024, Red Hat, Inc. All rights reserved.

I like the tests, there is a lot of material here.

A few more ideas:
- have two scoped values, and then have a sequence of `get` and `getValue` calls on them, in some random mix. And check that everything gets commoned, and the result is correct.
- have a method that directly uses `get`, but also has inner scopes of `where`/`get`. Interleave these, maybe even with multiple different scoped values. And nest them with various depths. And then verify both the expected number of calls / loads, as well as the result.

Also: is it possible to stuff ScopedValues into ScopedValues? That would be another interesting stress-test with lots of options.

test/hotspot/jtreg/compiler/c2/irTests/TestScopedValue.java line 24:

> 22:  */
> 23: 
> 24: package compiler.c2.irTests;

Christian and I have discussed this a while back: it would be nicer to put tests where they belong thematically. For example now it would be difficult to find all ScopedValue compiler tests, some are in the `irTests` directory, some elsewhere. Hence, I suggest you put them all under `compiler/scoped_value` or similar.

test/hotspot/jtreg/compiler/c2/irTests/TestScopedValue.java line 137:

> 135:         MyDouble sv1 = sv.get();
> 136:         notInlined();
> 137:         MyDouble sv2 = sv.get(); // Doesn't optimize out (load of sv cannot common)

Is this a necessary constraint, or a limitation of the optimization? Please add a corresponding comment. That would be helpful if this test all of the sudden failed the IR rule, and one has to debug.

test/hotspot/jtreg/compiler/c2/irTests/TestScopedValue.java line 185:

> 183:     @IR(counts = {IRNode.IF, "<= 4", IRNode.LOAD_P_OR_N, "<= 5" })
> 184:     public static void testFastPath5() {
> 185:         Object unused = svObject.get(); // cannot be removed if result not used

why? could there be some exception? please add comment why.

test/hotspot/jtreg/compiler/c2/irTests/TestScopedValue.java line 236:

> 234:     @IR(counts = {IRNode.LOAD_D, "1" })
> 235:     public static double testFastPath7(boolean[] flags) {
> 236:         double res = 0;

Suggestion:

        double res = 0;
        // hoisted here before the loop, and commoned.

Would that be correct?

test/hotspot/jtreg/compiler/c2/irTests/TestScopedValue.java line 485:

> 483:         TestFramework.assertDeoptimizedByC2(m);
> 484:         // Compile again
> 485:         runAndCompile15();

Might it be good to do some result verification, i.e. that the `get` always returns the expected object?

test/hotspot/jtreg/compiler/c2/irTests/TestScopedValue.java line 524:

> 522:         MyDouble sv1 = localSV.get();
> 523:         notInlined();
> 524:         MyDouble sv2 = localSV.get(); // should optimize out

Why does this work now, and some other cases with `notInlined` in between do not work?

test/hotspot/jtreg/compiler/scoped_value/TestScopedValueBadDominatorAfterExpansion.java line 30:

> 28:  * @summary SIGSEGV in PhaseIdealLoop::get_early_ctrl()
> 29:  * @compile --enable-preview -source ${jdk.version} TestScopedValueBadDominatorAfterExpansion.java
> 30:  * @run main/othervm --enable-preview -XX:-BackgroundCompilation -XX:-TieredCompilation -XX:+UseParallelGC TestScopedValueBadDominatorAfterExpansion

Why is the parallel GC required here? Can you also have a run without flags, so that other GC's could be tried with this code?

test/hotspot/jtreg/compiler/scoped_value/TestScopedValueBadDominatorAfterExpansion.java line 40:

> 38: 
> 39:     public static void main(String[] args) {
> 40:         Object o = new Object();

Would it not be nice to use 2 objects, and verify the fields afterwards, that they have the correct objects?

-------------

Changes requested by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/16966#pullrequestreview-2008386881
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570432089
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570395734
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570400334
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570403293
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570406718
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570419353
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570421326
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570390446
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570388963


More information about the hotspot-compiler-dev mailing list