RFR: 8320649: C2: Optimize scoped values [v13]

Thu Apr 4 13:42:18 UTC 2024

On Mon, 25 Mar 2024 14:19:57 GMT, Roland Westrelin <roland at openjdk.org> wrote:

>> This change implements C2 optimizations for calls to
>> ScopedValue.get(). Indeed, in:
>> 
>> 
>> v1 = scopedValue.get();
>> ...
>> v2 = scopedValue.get();
>> 
>> 
>> `v2` can be replaced by `v1` and the second call to `get()` can be
>> optimized out. That's true whatever is between the 2 calls unless a
>> new mapping for `scopedValue` is created in between (when that happens
>> no optimizations is performed for the method being compiled). Hoisting
>> a `get()` call out of loop for a loop invariant `scopedValue` should
>> also be legal in most cases.
>> 
>> `ScopedValue.get()` is implemented in java code as a 2 step process. A
>> cache is attached to the current thread object. If the `ScopedValue`
>> object is in the cache then the result from `get()` is read from
>> there. Otherwise a slow call is performed that also inserts the
>> mapping in the cache. The cache itself is lazily allocated. One
>> `ScopedValue` can be hashed to 2 different indexes in the cache. On a
>> cache probe, both indexes are checked. As a consequence, the process
>> of probing the cache is a multi step process (check if the cache is
>> present, check first index, check second index if first index
>> failed). If the cache is populated early on, then when the method that
>> calls `ScopedValue.get()` is compiled, profile reports the slow path
>> as never taken and only the read from the cache is compiled.
>> 
>> To perform the optimizations, I added 3 new node types to C2:
>> 
>> - the pair
>>   ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for
>>   the cache probe
>>   
>> - a cfg node ScopedValueGetResultNode to help locate the result of the
>>   `get()` call in the IR graph.
>> 
>> In pseudo code, once the nodes are inserted, the code of a `get()` is:
>> 
>> 
>> hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue)
>> if (hits_in_the_cache) {
>>   res = ScopedValueGetLoadFromCache(hits_in_the_cache);
>> } else {
>>   res = ..; //slow call possibly inlined. Subgraph can be arbitray complex
>> }
>> res = ScopedValueGetResult(res)
>> 
>> 
>> In the snippet:
>> 
>> 
>> v1 = scopedValue.get();
>> ...
>> v2 = scopedValue.get();
>> 
>> 
>> Replacing `v2` by `v1` is then done by starting from the
>> `ScopedValueGetResult` node for the second `get()` and looking for a
>> dominating `ScopedValueGetResult` for the same `ScopedValue`
>> object. When one is found, it is used as a replacement. Eliminating
>> the second `get()` call is achieved by making
>> `ScopedValueGetHitsInCache` always successful if there's a dominating
>> `Scoped...
>
> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:
> 
>   test fix

Generally looks much better again :)
I left a few more comments.

I only did a quick pass this time. Now I'm tired. Maybe I can look at some critical parts again tomorrow. Then I'll approve it, but I want at least 2 other Reviewers to look at it, just for the sheer complexity.

src/hotspot/share/opto/callGenerator.cpp line 1018:

> 1016:     //                               result = slowGet();                        result = slowGet();
> 1017:     //                               goto continue;                             goto continue;
> 1018:     //

This is really nice, definitely keep this!

src/hotspot/share/opto/callGenerator.cpp line 1218:

> 1216:     // slow_call:
> 1217:     // result = slowGet();
> 1218:     // goto continue;

Now you have duplication of these comments, see above `remove_first_probe_if_when_it_never_hits`. Would it make sense to put this somewhere more "central"?

src/hotspot/share/opto/graphKit.cpp line 4276:

> 4274:   // carrier thread's cache.
> 4275:   // return _gvn.transform(LoadNode::make(_gvn, nullptr, immutable_memory(), p, p->bottom_type()->is_ptr(),
> 4276:   //       TypeRawPtr::NOTNULL, T_ADDRESS, MemNode::unordered));

This is not dead code, but for the purpose of showing that `immutable_memory` does not work?

src/hotspot/share/opto/loopnode.cpp line 5181:

> 5179: }
> 5180: 
> 5181: bool PhaseIdealLoop::optimize_scoped_value_get_nodes() {

This is a bit of a monster method, with deep nesting. Hard to read. Can you break it up somehow into smaller methods?

src/hotspot/share/opto/loopnode.cpp line 5193:

> 5191:       }
> 5192:       IfNode* iff = hits_in_cache->success_proj()->in(0)->as_If();
> 5193:       for (uint j = 0; j < _scoped_value_get_nodes.size(); j++) {

Do you need the whole range? Now you have all i's and all j's. That is intended?

src/hotspot/share/opto/type.cpp line 617:

> 615:   TypeInstKlassPtr::OBJECT_OR_NULL = TypeInstKlassPtr::make(TypePtr::BotPTR, current->env()->Object_klass(), 0);
> 616: 
> 617:   const Type** fgetfromcache =(const Type**)shared_type_arena->AmallocWords(3*sizeof(Type*));

Suggestion:

  const Type** fgetfromcache = (const Type**)shared_type_arena->AmallocWords(3*sizeof(Type*));

src/hotspot/share/opto/type.cpp line 622:

> 620:   fgetfromcache[2] = TypeAryPtr::OOPS;
> 621:   TypeTuple::make(3, fgetfromcache);
> 622:   const Type** fsvgetresult =(const Type**)shared_type_arena->AmallocWords(2*sizeof(Type*));

Suggestion:

  const Type** fsvgetresult = (const Type**)shared_type_arena->AmallocWords(2*sizeof(Type*));

test/hotspot/jtreg/compiler/c2/irTests/TestScopedValue.java line 68:

> 66:                     "testSlowPath1,testSlowPath2,testSlowPath3,testSlowPath4,testSlowPath5,testSlowPath6,testSlowPath7,testSlowPath8,testSlowPath9,testSlowPath10");
> 67:             for (String test : tests) {
> 68:                 TestFramework.runWithFlags("-XX:+TieredCompilation", "--enable-preview", "-XX:CompileCommand=dontinline,java.lang.ScopedValue::slowGet", "-DTest=" + test);

What is the reason for running each test individually?

-------------

Changes requested by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/16966#pullrequestreview-1979811468
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1551634190
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1551630624
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1551643903
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1551686872
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1551693575
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1551700779
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1551701152
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1551703782