RFR: 8320649: C2: Optimize scoped values [v13]
Emanuel Peter
epeter at openjdk.org
Thu Apr 4 13:42:18 UTC 2024
On Mon, 25 Mar 2024 14:19:57 GMT, Roland Westrelin <roland at openjdk.org> wrote:
>> This change implements C2 optimizations for calls to
>> ScopedValue.get(). Indeed, in:
>>
>>
>> v1 = scopedValue.get();
>> ...
>> v2 = scopedValue.get();
>>
>>
>> `v2` can be replaced by `v1` and the second call to `get()` can be
>> optimized out. That's true whatever is between the 2 calls unless a
>> new mapping for `scopedValue` is created in between (when that happens
>> no optimizations is performed for the method being compiled). Hoisting
>> a `get()` call out of loop for a loop invariant `scopedValue` should
>> also be legal in most cases.
>>
>> `ScopedValue.get()` is implemented in java code as a 2 step process. A
>> cache is attached to the current thread object. If the `ScopedValue`
>> object is in the cache then the result from `get()` is read from
>> there. Otherwise a slow call is performed that also inserts the
>> mapping in the cache. The cache itself is lazily allocated. One
>> `ScopedValue` can be hashed to 2 different indexes in the cache. On a
>> cache probe, both indexes are checked. As a consequence, the process
>> of probing the cache is a multi step process (check if the cache is
>> present, check first index, check second index if first index
>> failed). If the cache is populated early on, then when the method that
>> calls `ScopedValue.get()` is compiled, profile reports the slow path
>> as never taken and only the read from the cache is compiled.
>>
>> To perform the optimizations, I added 3 new node types to C2:
>>
>> - the pair
>> ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for
>> the cache probe
>>
>> - a cfg node ScopedValueGetResultNode to help locate the result of the
>> `get()` call in the IR graph.
>>
>> In pseudo code, once the nodes are inserted, the code of a `get()` is:
>>
>>
>> hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue)
>> if (hits_in_the_cache) {
>> res = ScopedValueGetLoadFromCache(hits_in_the_cache);
>> } else {
>> res = ..; //slow call possibly inlined. Subgraph can be arbitray complex
>> }
>> res = ScopedValueGetResult(res)
>>
>>
>> In the snippet:
>>
>>
>> v1 = scopedValue.get();
>> ...
>> v2 = scopedValue.get();
>>
>>
>> Replacing `v2` by `v1` is then done by starting from the
>> `ScopedValueGetResult` node for the second `get()` and looking for a
>> dominating `ScopedValueGetResult` for the same `ScopedValue`
>> object. When one is found, it is used as a replacement. Eliminating
>> the second `get()` call is achieved by making
>> `ScopedValueGetHitsInCache` always successful if there's a dominating
>> `Scoped...
>
> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:
>
> test fix
Generally looks much better again :)
I left a few more comments.
I only did a quick pass this time. Now I'm tired. Maybe I can look at some critical parts again tomorrow. Then I'll approve it, but I want at least 2 other Reviewers to look at it, just for the sheer complexity.
src/hotspot/share/opto/callGenerator.cpp line 1018:
> 1016: // result = slowGet(); result = slowGet();
> 1017: // goto continue; goto continue;
> 1018: //
This is really nice, definitely keep this!
src/hotspot/share/opto/callGenerator.cpp line 1218:
> 1216: // slow_call:
> 1217: // result = slowGet();
> 1218: // goto continue;
Now you have duplication of these comments, see above `remove_first_probe_if_when_it_never_hits`. Would it make sense to put this somewhere more "central"?
src/hotspot/share/opto/graphKit.cpp line 4276:
> 4274: // carrier thread's cache.
> 4275: // return _gvn.transform(LoadNode::make(_gvn, nullptr, immutable_memory(), p, p->bottom_type()->is_ptr(),
> 4276: // TypeRawPtr::NOTNULL, T_ADDRESS, MemNode::unordered));
This is not dead code, but for the purpose of showing that `immutable_memory` does not work?
src/hotspot/share/opto/loopnode.cpp line 5181:
> 5179: }
> 5180:
> 5181: bool PhaseIdealLoop::optimize_scoped_value_get_nodes() {
This is a bit of a monster method, with deep nesting. Hard to read. Can you break it up somehow into smaller methods?
src/hotspot/share/opto/loopnode.cpp line 5193:
> 5191: }
> 5192: IfNode* iff = hits_in_cache->success_proj()->in(0)->as_If();
> 5193: for (uint j = 0; j < _scoped_value_get_nodes.size(); j++) {
Do you need the whole range? Now you have all i's and all j's. That is intended?
src/hotspot/share/opto/type.cpp line 617:
> 615: TypeInstKlassPtr::OBJECT_OR_NULL = TypeInstKlassPtr::make(TypePtr::BotPTR, current->env()->Object_klass(), 0);
> 616:
> 617: const Type** fgetfromcache =(const Type**)shared_type_arena->AmallocWords(3*sizeof(Type*));
Suggestion:
const Type** fgetfromcache = (const Type**)shared_type_arena->AmallocWords(3*sizeof(Type*));
src/hotspot/share/opto/type.cpp line 622:
> 620: fgetfromcache[2] = TypeAryPtr::OOPS;
> 621: TypeTuple::make(3, fgetfromcache);
> 622: const Type** fsvgetresult =(const Type**)shared_type_arena->AmallocWords(2*sizeof(Type*));
Suggestion:
const Type** fsvgetresult = (const Type**)shared_type_arena->AmallocWords(2*sizeof(Type*));
test/hotspot/jtreg/compiler/c2/irTests/TestScopedValue.java line 68:
> 66: "testSlowPath1,testSlowPath2,testSlowPath3,testSlowPath4,testSlowPath5,testSlowPath6,testSlowPath7,testSlowPath8,testSlowPath9,testSlowPath10");
> 67: for (String test : tests) {
> 68: TestFramework.runWithFlags("-XX:+TieredCompilation", "--enable-preview", "-XX:CompileCommand=dontinline,java.lang.ScopedValue::slowGet", "-DTest=" + test);
What is the reason for running each test individually?
-------------
Changes requested by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/16966#pullrequestreview-1979811468
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1551634190
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1551630624
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1551643903
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1551686872
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1551693575
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1551700779
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1551701152
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1551703782
More information about the hotspot-compiler-dev
mailing list