RFR: 8320649: C2: Optimize scoped values [v15]

Thu Apr 18 12:31:19 UTC 2024

On Tue, 16 Apr 2024 14:43:21 GMT, Roland Westrelin <roland at openjdk.org> wrote:

>> This change implements C2 optimizations for calls to
>> ScopedValue.get(). Indeed, in:
>> 
>> 
>> v1 = scopedValue.get();
>> ...
>> v2 = scopedValue.get();
>> 
>> 
>> `v2` can be replaced by `v1` and the second call to `get()` can be
>> optimized out. That's true whatever is between the 2 calls unless a
>> new mapping for `scopedValue` is created in between (when that happens
>> no optimizations is performed for the method being compiled). Hoisting
>> a `get()` call out of loop for a loop invariant `scopedValue` should
>> also be legal in most cases.
>> 
>> `ScopedValue.get()` is implemented in java code as a 2 step process. A
>> cache is attached to the current thread object. If the `ScopedValue`
>> object is in the cache then the result from `get()` is read from
>> there. Otherwise a slow call is performed that also inserts the
>> mapping in the cache. The cache itself is lazily allocated. One
>> `ScopedValue` can be hashed to 2 different indexes in the cache. On a
>> cache probe, both indexes are checked. As a consequence, the process
>> of probing the cache is a multi step process (check if the cache is
>> present, check first index, check second index if first index
>> failed). If the cache is populated early on, then when the method that
>> calls `ScopedValue.get()` is compiled, profile reports the slow path
>> as never taken and only the read from the cache is compiled.
>> 
>> To perform the optimizations, I added 3 new node types to C2:
>> 
>> - the pair
>>   ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for
>>   the cache probe
>>   
>> - a cfg node ScopedValueGetResultNode to help locate the result of the
>>   `get()` call in the IR graph.
>> 
>> In pseudo code, once the nodes are inserted, the code of a `get()` is:
>> 
>> 
>> hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue)
>> if (hits_in_the_cache) {
>>   res = ScopedValueGetLoadFromCache(hits_in_the_cache);
>> } else {
>>   res = ..; //slow call possibly inlined. Subgraph can be arbitray complex
>> }
>> res = ScopedValueGetResult(res)
>> 
>> 
>> In the snippet:
>> 
>> 
>> v1 = scopedValue.get();
>> ...
>> v2 = scopedValue.get();
>> 
>> 
>> Replacing `v2` by `v1` is then done by starting from the
>> `ScopedValueGetResult` node for the second `get()` and looking for a
>> dominating `ScopedValueGetResult` for the same `ScopedValue`
>> object. When one is found, it is used as a replacement. Eliminating
>> the second `get()` call is achieved by making
>> `ScopedValueGetHitsInCache` always successful if there's a dominating
>> `Scoped...
>
> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits:
> 
>  - Merge branch 'master' into JDK-8320649
>  - review
>  - test fix
>  - test fix
>  - Merge branch 'master' into JDK-8320649
>  - whitespaces
>  - review
>  - Merge branch 'master' into JDK-8320649
>  - review
>  - 32 bit build fix
>  - ... and 12 more: https://git.openjdk.org/jdk/compare/bfff02ee...a4ffc11e

Some more requests / comments.

src/hotspot/share/opto/graphKit.hpp line 71:

> 69: 
> 70:   const Type* scopedValueCache_type();
> 71:   Node* scopedValueCache_handle();

Should these not all be `make_` methods?

src/hotspot/share/opto/graphKit.hpp line 914:

> 912:   Node* vector_shift_count(Node* cnt, int shift_op, BasicType bt, int num_elem);
> 913: 
> 914:   Node* scopedValueCache();

Should these not all be `make_` methods?

src/hotspot/share/opto/intrinsicnode.cpp line 380:

> 378: 
> 379: IfNode* ScopedValueGetLoadFromCacheNode::iff() const {
> 380:   return in(0)->in(0)->as_If();

Suggestion:

  return in(0)->as_IfTrue()->in(0)->as_If();

Would make verification below implied.

src/hotspot/share/opto/intrinsicnode.cpp line 386:

> 384: void ScopedValueGetLoadFromCacheNode::verify() const {
> 385:   // check a ScopedValueGetHitsInCache guards this ScopedValueGetLoadFromCache
> 386:   assert(in(0)->Opcode() == Op_IfTrue, "unexpected ScopedValueGetLoadFromCache shape");

You could remove this, if you just added `as_IfTrue` to `iff()`

src/hotspot/share/opto/intrinsicnode.cpp line 390:

> 388:   assert(iff->in(1)->is_Bool(), "unexpected ScopedValueGetLoadFromCache shape");
> 389:   assert(iff->in(1)->in(1)->Opcode() == Op_ScopedValueGetHitsInCache, "unexpected ScopedValueGetLoadFromCache shape");
> 390:   assert(iff->in(1)->in(1) == in(1), "unexpected ScopedValueGetLoadFromCache shape");

Suggestion:

  assert(iff()->in(1)->is_Bool(), "unexpected ScopedValueGetLoadFromCache shape");
  assert(iff()->in(1)->in(1)->Opcode() == Op_ScopedValueGetHitsInCache, "unexpected ScopedValueGetLoadFromCache shape");
  assert(iff()->in(1)->in(1) == in(1), "unexpected ScopedValueGetLoadFromCache shape");

src/hotspot/share/opto/loopPredicate.cpp line 1561:

> 1559:                                                            IfNode* iff, IfProjNode*& new_predicate_proj) {
> 1560:   BoolNode* bol = iff->in(1)->as_Bool();
> 1561:   if (bol->in(1)->Opcode() != Op_ScopedValueGetHitsInCache){

Suggestion:

  if (bol->in(1)->Opcode() != Op_ScopedValueGetHitsInCache) {

src/hotspot/share/opto/loopPredicate.cpp line 1627:

> 1625: // It is easier to re-create the cache load subgraph rather than trying to change the inputs of the existing one to move
> 1626: // it out of loops
> 1627: Node* PhaseIdealLoop::scoped_value_cache_node(Node* raw_mem) {

Suggestion:

Node* PhaseIdealLoop::make_scoped_value_cache_node(Node* raw_mem_slice) {

This is really a `make` method. Not sure about the `slice`, just an idea.

src/hotspot/share/opto/loopnode.hpp line 703:

> 701:   bool policy_peeling(PhaseIdealLoop* phase, bool scoped_value_only);
> 702: 
> 703:   uint estimate_peeling(PhaseIdealLoop* phase, bool peel_only_if_has_scoped_value);

Can we use the same name for `scoped_value_only` and `peel_only_if_has_scoped_value`? In `policy_peeling` you pass the value into `estimate_peeling`, so it seems to be the same.

Somehow it does not sit well with me that we have such a special-case flag in such a high-level and general method. But I don't know a fix now. It just looks like not the best design. But that may not be your fault. Are there any alternatives?

src/hotspot/share/opto/loopnode.hpp line 1801:

> 1799:                                     Node*&second_index,
> 1800:                                     float &prob_cache_miss_at_first_if, float &first_if_cnt,
> 1801:                                     float &prob_cache_miss_at_second_if, float &second_if_cnt) const;

Suggestion:

  void find_most_likely_cache_index(const ScopedValueGetHitsInCacheNode* hits_in_cache, Node*& first_index,
                                    Node*& second_index,
                                    float& prob_cache_miss_at_first_if, float& first_if_cnt,
                                    float& prob_cache_miss_at_second_if, float& second_if_cnt) const;

src/hotspot/share/opto/loopopts.cpp line 3783:

> 3781:             // ScopedValueGetLoadFromCache and companion ScopedValueGetHitsInCacheNode must stay together
> 3782:             move_scoped_value_nodes_to_not_peel(peel, not_peel, peel_list, sink_list, i);
> 3783:             incr = false;

Do we not have to increment the `cloned_for_outside_use`, which affects the `estimate`?

src/hotspot/share/opto/loopopts.cpp line 3997:

> 3995: }
> 3996: 
> 3997: void PhaseIdealLoop::move_scoped_value_nodes_to_not_peel(VectorSet &peel, VectorSet &not_peel, Node_List &peel_list,

Can you please add more comments to help the reader understand? So we are not peeling in this case?

src/hotspot/share/opto/loopopts.cpp line 4010:

> 4008:   peel.remove(hits_in_cache->_idx);
> 4009:   not_peel.set(hits_in_cache->_idx);
> 4010:   peel_list.remove(i);

Looks like duplicated code from the call-site. A refactoring may help.

src/hotspot/share/opto/multnode.cpp line 284:

> 282:         if (u->is_CFG()) {
> 283:           if (wq.size() >= path_limit) {
> 284:             return false;

Can you add a comment why it is safe to just return `false`, even if we might have returned `true` if the limit was higher?

src/hotspot/share/opto/node.hpp line 1660:

> 1658:       map(i, top_of_stack);
> 1659:     }
> 1660:   }

Hmm. Technically, I think you now need to implement the `delete_at` also for `Node_List` and `Unique_Node_List`. Otherwise, if someone should use `delete_at`, for e `Unique_Node_List`, and expect a sane result, they will encounter strange bugs. At least, you should implement them with a `ShouldNotReachHere` or similar.

src/hotspot/share/opto/subnode.hpp line 341:

> 339:     assert(req() == Index1, "wrong of inputs for ScopedValueGetHitsInCacheNode");
> 340:     add_req(index1);
> 341:     assert(req() == Index2, "wrong of inputs for ScopedValueGetHitsInCacheNode");

Suggestion:

    assert(req() == Index2, "wrong number of inputs for ScopedValueGetHitsInCacheNode");

-------------

PR Review: https://git.openjdk.org/jdk/pull/16966#pullrequestreview-2008454328
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570604281
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570604339
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570601339
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570600882
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570623797
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570592956
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570582483
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570577911
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570565104
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570556270
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570467750
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570558913
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570461310
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570453913
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570440916