RFR: 8320649: C2: Optimize scoped values [v15]
Emanuel Peter
epeter at openjdk.org
Wed Apr 17 13:49:11 UTC 2024
On Tue, 16 Apr 2024 14:43:21 GMT, Roland Westrelin <roland at openjdk.org> wrote:
>> This change implements C2 optimizations for calls to
>> ScopedValue.get(). Indeed, in:
>>
>>
>> v1 = scopedValue.get();
>> ...
>> v2 = scopedValue.get();
>>
>>
>> `v2` can be replaced by `v1` and the second call to `get()` can be
>> optimized out. That's true whatever is between the 2 calls unless a
>> new mapping for `scopedValue` is created in between (when that happens
>> no optimizations is performed for the method being compiled). Hoisting
>> a `get()` call out of loop for a loop invariant `scopedValue` should
>> also be legal in most cases.
>>
>> `ScopedValue.get()` is implemented in java code as a 2 step process. A
>> cache is attached to the current thread object. If the `ScopedValue`
>> object is in the cache then the result from `get()` is read from
>> there. Otherwise a slow call is performed that also inserts the
>> mapping in the cache. The cache itself is lazily allocated. One
>> `ScopedValue` can be hashed to 2 different indexes in the cache. On a
>> cache probe, both indexes are checked. As a consequence, the process
>> of probing the cache is a multi step process (check if the cache is
>> present, check first index, check second index if first index
>> failed). If the cache is populated early on, then when the method that
>> calls `ScopedValue.get()` is compiled, profile reports the slow path
>> as never taken and only the read from the cache is compiled.
>>
>> To perform the optimizations, I added 3 new node types to C2:
>>
>> - the pair
>> ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for
>> the cache probe
>>
>> - a cfg node ScopedValueGetResultNode to help locate the result of the
>> `get()` call in the IR graph.
>>
>> In pseudo code, once the nodes are inserted, the code of a `get()` is:
>>
>>
>> hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue)
>> if (hits_in_the_cache) {
>> res = ScopedValueGetLoadFromCache(hits_in_the_cache);
>> } else {
>> res = ..; //slow call possibly inlined. Subgraph can be arbitray complex
>> }
>> res = ScopedValueGetResult(res)
>>
>>
>> In the snippet:
>>
>>
>> v1 = scopedValue.get();
>> ...
>> v2 = scopedValue.get();
>>
>>
>> Replacing `v2` by `v1` is then done by starting from the
>> `ScopedValueGetResult` node for the second `get()` and looking for a
>> dominating `ScopedValueGetResult` for the same `ScopedValue`
>> object. When one is found, it is used as a replacement. Eliminating
>> the second `get()` call is achieved by making
>> `ScopedValueGetHitsInCache` always successful if there's a dominating
>> `Scoped...
>
> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits:
>
> - Merge branch 'master' into JDK-8320649
> - review
> - test fix
> - test fix
> - Merge branch 'master' into JDK-8320649
> - whitespaces
> - review
> - Merge branch 'master' into JDK-8320649
> - review
> - 32 bit build fix
> - ... and 12 more: https://git.openjdk.org/jdk/compare/bfff02ee...a4ffc11e
src/hotspot/share/opto/loopnode.cpp line 5293:
> 5291: // ScopedValue object: either the ScopedValueGetResult and ScopedValueGetHitsInCache are from the same
> 5292: // ScopedValue.get() and we remove the ScopedValueGetResult because it's only useful to optimize
> 5293: // ScopedValue.get() where the slow path is taken. Or They are from difference ScopedValue.get() and we
Suggestion:
// ScopedValue.get() where the slow path is taken. Or they are from different ScopedValue.get() and we
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1568875169
More information about the hotspot-compiler-dev
mailing list