RFR: 8320649: C2: Optimize scoped values [v15]
Emanuel Peter
epeter at openjdk.org
Thu Apr 18 12:31:19 UTC 2024
On Tue, 16 Apr 2024 14:43:21 GMT, Roland Westrelin <roland at openjdk.org> wrote:
>> This change implements C2 optimizations for calls to
>> ScopedValue.get(). Indeed, in:
>>
>>
>> v1 = scopedValue.get();
>> ...
>> v2 = scopedValue.get();
>>
>>
>> `v2` can be replaced by `v1` and the second call to `get()` can be
>> optimized out. That's true whatever is between the 2 calls unless a
>> new mapping for `scopedValue` is created in between (when that happens
>> no optimizations is performed for the method being compiled). Hoisting
>> a `get()` call out of loop for a loop invariant `scopedValue` should
>> also be legal in most cases.
>>
>> `ScopedValue.get()` is implemented in java code as a 2 step process. A
>> cache is attached to the current thread object. If the `ScopedValue`
>> object is in the cache then the result from `get()` is read from
>> there. Otherwise a slow call is performed that also inserts the
>> mapping in the cache. The cache itself is lazily allocated. One
>> `ScopedValue` can be hashed to 2 different indexes in the cache. On a
>> cache probe, both indexes are checked. As a consequence, the process
>> of probing the cache is a multi step process (check if the cache is
>> present, check first index, check second index if first index
>> failed). If the cache is populated early on, then when the method that
>> calls `ScopedValue.get()` is compiled, profile reports the slow path
>> as never taken and only the read from the cache is compiled.
>>
>> To perform the optimizations, I added 3 new node types to C2:
>>
>> - the pair
>> ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for
>> the cache probe
>>
>> - a cfg node ScopedValueGetResultNode to help locate the result of the
>> `get()` call in the IR graph.
>>
>> In pseudo code, once the nodes are inserted, the code of a `get()` is:
>>
>>
>> hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue)
>> if (hits_in_the_cache) {
>> res = ScopedValueGetLoadFromCache(hits_in_the_cache);
>> } else {
>> res = ..; //slow call possibly inlined. Subgraph can be arbitray complex
>> }
>> res = ScopedValueGetResult(res)
>>
>>
>> In the snippet:
>>
>>
>> v1 = scopedValue.get();
>> ...
>> v2 = scopedValue.get();
>>
>>
>> Replacing `v2` by `v1` is then done by starting from the
>> `ScopedValueGetResult` node for the second `get()` and looking for a
>> dominating `ScopedValueGetResult` for the same `ScopedValue`
>> object. When one is found, it is used as a replacement. Eliminating
>> the second `get()` call is achieved by making
>> `ScopedValueGetHitsInCache` always successful if there's a dominating
>> `Scoped...
>
> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits:
>
> - Merge branch 'master' into JDK-8320649
> - review
> - test fix
> - test fix
> - Merge branch 'master' into JDK-8320649
> - whitespaces
> - review
> - Merge branch 'master' into JDK-8320649
> - review
> - 32 bit build fix
> - ... and 12 more: https://git.openjdk.org/jdk/compare/bfff02ee...a4ffc11e
Some more requests / comments.
src/hotspot/share/opto/graphKit.hpp line 71:
> 69:
> 70: const Type* scopedValueCache_type();
> 71: Node* scopedValueCache_handle();
Should these not all be `make_` methods?
src/hotspot/share/opto/graphKit.hpp line 914:
> 912: Node* vector_shift_count(Node* cnt, int shift_op, BasicType bt, int num_elem);
> 913:
> 914: Node* scopedValueCache();
Should these not all be `make_` methods?
src/hotspot/share/opto/intrinsicnode.cpp line 380:
> 378:
> 379: IfNode* ScopedValueGetLoadFromCacheNode::iff() const {
> 380: return in(0)->in(0)->as_If();
Suggestion:
return in(0)->as_IfTrue()->in(0)->as_If();
Would make verification below implied.
src/hotspot/share/opto/intrinsicnode.cpp line 386:
> 384: void ScopedValueGetLoadFromCacheNode::verify() const {
> 385: // check a ScopedValueGetHitsInCache guards this ScopedValueGetLoadFromCache
> 386: assert(in(0)->Opcode() == Op_IfTrue, "unexpected ScopedValueGetLoadFromCache shape");
You could remove this, if you just added `as_IfTrue` to `iff()`
src/hotspot/share/opto/intrinsicnode.cpp line 390:
> 388: assert(iff->in(1)->is_Bool(), "unexpected ScopedValueGetLoadFromCache shape");
> 389: assert(iff->in(1)->in(1)->Opcode() == Op_ScopedValueGetHitsInCache, "unexpected ScopedValueGetLoadFromCache shape");
> 390: assert(iff->in(1)->in(1) == in(1), "unexpected ScopedValueGetLoadFromCache shape");
Suggestion:
assert(iff()->in(1)->is_Bool(), "unexpected ScopedValueGetLoadFromCache shape");
assert(iff()->in(1)->in(1)->Opcode() == Op_ScopedValueGetHitsInCache, "unexpected ScopedValueGetLoadFromCache shape");
assert(iff()->in(1)->in(1) == in(1), "unexpected ScopedValueGetLoadFromCache shape");
src/hotspot/share/opto/loopPredicate.cpp line 1561:
> 1559: IfNode* iff, IfProjNode*& new_predicate_proj) {
> 1560: BoolNode* bol = iff->in(1)->as_Bool();
> 1561: if (bol->in(1)->Opcode() != Op_ScopedValueGetHitsInCache){
Suggestion:
if (bol->in(1)->Opcode() != Op_ScopedValueGetHitsInCache) {
src/hotspot/share/opto/loopPredicate.cpp line 1627:
> 1625: // It is easier to re-create the cache load subgraph rather than trying to change the inputs of the existing one to move
> 1626: // it out of loops
> 1627: Node* PhaseIdealLoop::scoped_value_cache_node(Node* raw_mem) {
Suggestion:
Node* PhaseIdealLoop::make_scoped_value_cache_node(Node* raw_mem_slice) {
This is really a `make` method. Not sure about the `slice`, just an idea.
src/hotspot/share/opto/loopnode.hpp line 703:
> 701: bool policy_peeling(PhaseIdealLoop* phase, bool scoped_value_only);
> 702:
> 703: uint estimate_peeling(PhaseIdealLoop* phase, bool peel_only_if_has_scoped_value);
Can we use the same name for `scoped_value_only` and `peel_only_if_has_scoped_value`? In `policy_peeling` you pass the value into `estimate_peeling`, so it seems to be the same.
Somehow it does not sit well with me that we have such a special-case flag in such a high-level and general method. But I don't know a fix now. It just looks like not the best design. But that may not be your fault. Are there any alternatives?
src/hotspot/share/opto/loopnode.hpp line 1801:
> 1799: Node*&second_index,
> 1800: float &prob_cache_miss_at_first_if, float &first_if_cnt,
> 1801: float &prob_cache_miss_at_second_if, float &second_if_cnt) const;
Suggestion:
void find_most_likely_cache_index(const ScopedValueGetHitsInCacheNode* hits_in_cache, Node*& first_index,
Node*& second_index,
float& prob_cache_miss_at_first_if, float& first_if_cnt,
float& prob_cache_miss_at_second_if, float& second_if_cnt) const;
src/hotspot/share/opto/loopopts.cpp line 3783:
> 3781: // ScopedValueGetLoadFromCache and companion ScopedValueGetHitsInCacheNode must stay together
> 3782: move_scoped_value_nodes_to_not_peel(peel, not_peel, peel_list, sink_list, i);
> 3783: incr = false;
Do we not have to increment the `cloned_for_outside_use`, which affects the `estimate`?
src/hotspot/share/opto/loopopts.cpp line 3997:
> 3995: }
> 3996:
> 3997: void PhaseIdealLoop::move_scoped_value_nodes_to_not_peel(VectorSet &peel, VectorSet ¬_peel, Node_List &peel_list,
Can you please add more comments to help the reader understand? So we are not peeling in this case?
src/hotspot/share/opto/loopopts.cpp line 4010:
> 4008: peel.remove(hits_in_cache->_idx);
> 4009: not_peel.set(hits_in_cache->_idx);
> 4010: peel_list.remove(i);
Looks like duplicated code from the call-site. A refactoring may help.
src/hotspot/share/opto/multnode.cpp line 284:
> 282: if (u->is_CFG()) {
> 283: if (wq.size() >= path_limit) {
> 284: return false;
Can you add a comment why it is safe to just return `false`, even if we might have returned `true` if the limit was higher?
src/hotspot/share/opto/node.hpp line 1660:
> 1658: map(i, top_of_stack);
> 1659: }
> 1660: }
Hmm. Technically, I think you now need to implement the `delete_at` also for `Node_List` and `Unique_Node_List`. Otherwise, if someone should use `delete_at`, for e `Unique_Node_List`, and expect a sane result, they will encounter strange bugs. At least, you should implement them with a `ShouldNotReachHere` or similar.
src/hotspot/share/opto/subnode.hpp line 341:
> 339: assert(req() == Index1, "wrong of inputs for ScopedValueGetHitsInCacheNode");
> 340: add_req(index1);
> 341: assert(req() == Index2, "wrong of inputs for ScopedValueGetHitsInCacheNode");
Suggestion:
assert(req() == Index2, "wrong number of inputs for ScopedValueGetHitsInCacheNode");
-------------
PR Review: https://git.openjdk.org/jdk/pull/16966#pullrequestreview-2008454328
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570604281
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570604339
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570601339
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570600882
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570623797
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570592956
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570582483
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570577911
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570565104
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570556270
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570467750
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570558913
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570461310
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570453913
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570440916
More information about the hotspot-compiler-dev
mailing list