RFR: 8320649: C2: Optimize scoped values [v4]

Emanuel Peter epeter at openjdk.org
Thu Jan 18 10:45:32 UTC 2024


On Thu, 4 Jan 2024 08:12:22 GMT, Roland Westrelin <roland at openjdk.org> wrote:

>> This change implements C2 optimizations for calls to
>> ScopedValue.get(). Indeed, in:
>> 
>> 
>> v1 = scopedValue.get();
>> ...
>> v2 = scopedValue.get();
>> 
>> 
>> `v2` can be replaced by `v1` and the second call to `get()` can be
>> optimized out. That's true whatever is between the 2 calls unless a
>> new mapping for `scopedValue` is created in between (when that happens
>> no optimizations is performed for the method being compiled). Hoisting
>> a `get()` call out of loop for a loop invariant `scopedValue` should
>> also be legal in most cases.
>> 
>> `ScopedValue.get()` is implemented in java code as a 2 step process. A
>> cache is attached to the current thread object. If the `ScopedValue`
>> object is in the cache then the result from `get()` is read from
>> there. Otherwise a slow call is performed that also inserts the
>> mapping in the cache. The cache itself is lazily allocated. One
>> `ScopedValue` can be hashed to 2 different indexes in the cache. On a
>> cache probe, both indexes are checked. As a consequence, the process
>> of probing the cache is a multi step process (check if the cache is
>> present, check first index, check second index if first index
>> failed). If the cache is populated early on, then when the method that
>> calls `ScopedValue.get()` is compiled, profile reports the slow path
>> as never taken and only the read from the cache is compiled.
>> 
>> To perform the optimizations, I added 3 new node types to C2:
>> 
>> - the pair
>>   ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for
>>   the cache probe
>>   
>> - a cfg node ScopedValueGetResultNode to help locate the result of the
>>   `get()` call in the IR graph.
>> 
>> In pseudo code, once the nodes are inserted, the code of a `get()` is:
>> 
>> 
>> hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue)
>> if (hits_in_the_cache) {
>>   res = ScopedValueGetLoadFromCache(hits_in_the_cache);
>> } else {
>>   res = ..; //slow call possibly inlined. Subgraph can be arbitray complex
>> }
>> res = ScopedValueGetResult(res)
>> 
>> 
>> In the snippet:
>> 
>> 
>> v1 = scopedValue.get();
>> ...
>> v2 = scopedValue.get();
>> 
>> 
>> Replacing `v2` by `v1` is then done by starting from the
>> `ScopedValueGetResult` node for the second `get()` and looking for a
>> dominating `ScopedValueGetResult` for the same `ScopedValue`
>> object. When one is found, it is used as a replacement. Eliminating
>> the second `get()` call is achieved by making
>> `ScopedValueGetHitsInCache` always successful if there's a dominating
>> `Scoped...
>
> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:
> 
>   merge fix

Path 1 of today.

src/hotspot/share/opto/callGenerator.cpp line 872:

> 870:     Node* second_index = nullptr; // index in the cache for second hash
> 871:     CallStaticJavaNode* slow_call = nullptr; // slowGet() call if any
> 872:     {

This setup really looks like it should be a class, maybe called `ScopedValueGetPatternMatcher`?
All your variables here could be fields, and the scope below a method, or even split into multiple methods.

src/hotspot/share/opto/loopPredicate.cpp line 1572:

> 1570: 
> 1571: 
> 1572: bool PhaseIdealLoop::loop_predication_for_scoped_value_get(IdealLoopTree* loop, IfProjNode* if_success_proj,

Add a short comment above, that we are trying to hoist the `If` for a `ScopedValueGetHitsInCache` out of the loop, if possible.
You have one on the first line, but I think generally we place them at the top, right?

src/hotspot/share/opto/loopPredicate.cpp line 1575:

> 1573:                                                            ParsePredicateSuccessProj* parse_predicate_proj,
> 1574:                                                            Invariance &invar, Deoptimization::DeoptReason reason,
> 1575:                                                            IfNode* iff, IfProjNode*&new_predicate_proj) {

Suggestion:

                                                           IfNode* iff, IfProjNode* &new_predicate_proj) {

src/hotspot/share/opto/loopPredicate.cpp line 1579:

> 1577:   BoolNode* bol = iff->in(1)->as_Bool();
> 1578:   if (bol->in(1)->Opcode() == Op_ScopedValueGetHitsInCache && invar.is_invariant(((ScopedValueGetHitsInCacheNode*)bol->in(1))->scoped_value()) &&
> 1579:       invar.is_invariant(((ScopedValueGetHitsInCacheNode*)bol->in(1))->index1()) && invar.is_invariant(((ScopedValueGetHitsInCacheNode*)bol->in(1))->index2())) {

Please refactor this if:
- if we don't take it, you return false, so make it a bailout. This allows you to already bailout if `bol->in(1)->Opcode() == Op_ScopedValueGetHitsInCache` fails.
- After that check, you can already have this line: `ScopedValueGetHitsInCacheNode* hits_in_the_cache = (ScopedValueGetHitsInCacheNode*) bol->in(1);`, and simplify the other 3 conditions of the if, make it more readable.
- But make them bailouts as well, instead of indenting the rest of the function.

src/hotspot/share/opto/loopPredicate.cpp line 1648:

> 1646:       tty->print("Predicate invariant if: %d ", new_predicate_iff->_idx);
> 1647:       loop->dump_head();
> 1648:     } else if (TraceLoopOpts) {

Why not have them as separate ifs? What if someone enables both, will they not miss a line?

src/hotspot/share/opto/loopnode.cpp line 4717:

> 4715:   assert(!_igvn.delay_transform(), "");
> 4716:   _igvn.set_delay_transform(true);
> 4717:   for (uint i = _scoped_value_get_nodes.size(); i > 0; i--) {

Suggestion:

  for (uint i = _scoped_value_get_nodes.size()-1; i >= 0; i--) {

src/hotspot/share/opto/loopnode.cpp line 4718:

> 4716:   _igvn.set_delay_transform(true);
> 4717:   for (uint i = _scoped_value_get_nodes.size(); i > 0; i--) {
> 4718:     Node* n = _scoped_value_get_nodes.at(i - 1);

Suggestion:

    Node* n = _scoped_value_get_nodes.at(i);

src/hotspot/share/opto/loopnode.cpp line 4720:

> 4718:     Node* n = _scoped_value_get_nodes.at(i - 1);
> 4719:     if (n->Opcode() == Op_ScopedValueGetResult) {
> 4720:       // Remove the ScopedValueGetResult entirely

Suggestion:

      // Remove the ScopedValueGetResult and (its projections) entirely

src/hotspot/share/opto/loopnode.cpp line 4722:

> 4720:       // Remove the ScopedValueGetResult entirely
> 4721:       ScopedValueGetResultNode* get_result = (ScopedValueGetResultNode*) n;
> 4722:       Node* result_out = get_result->result_out();

Suggestion:

      ProjNode* result_out_proj = get_result->result_out();

knowing it is the projection helps understand that this is not a use of the result, but just the projection, which in turn has the uses below it.

src/hotspot/share/opto/loopnode.cpp line 4725:

> 4723:       Node* result_in = get_result->in(ScopedValueGetResultNode::GetResult);
> 4724:       if (result_out != nullptr) {
> 4725:         _igvn.replace_node(result_out, result_in);

Suggestion:

        _igvn.replace_node(result_out_proj, result_in);

Otherwise it has me wondering why you can replace the use here, if I don't know it is a projection.

src/hotspot/share/opto/loopnode.cpp line 4731:

> 4729:       lazy_replace(get_result->control_out(), get_result->in(ScopedValueGetResultNode::Control));
> 4730:       progress = true;
> 4731:       remove_scoped_value_get_at(i-1);

Suggestion:

      remove_scoped_value_get_at(i);

src/hotspot/share/opto/loopnode.cpp line 4734:

> 4732:     }
> 4733:   }
> 4734:   while (_scoped_value_get_nodes.size() > 0) {

Add a comment about why we need a separate loop here.

src/hotspot/share/opto/loopnode.cpp line 4737:

> 4735:     Node* n = _scoped_value_get_nodes.pop();
> 4736:     assert (n->Opcode() == Op_ScopedValueGetHitsInCache, "");
> 4737:     ScopedValueGetHitsInCacheNode* get_from_cache = (ScopedValueGetHitsInCacheNode*) n;

It would be nice to have some variable consistency. Elsewhere you use `sv_hits_in_cache`.
The name here suggest that this node actually "gets" something from the cache, but it only checks if we have a hit, and another node does the "getting", right?

src/hotspot/share/opto/loopnode.cpp line 4745:

> 4743: }
> 4744: 
> 4745: void PhaseIdealLoop::expand_get_from_sv_cache(ScopedValueGetHitsInCacheNode* get_from_cache) {

Suggestion:

void PhaseIdealLoop::expand_sv_get_hits_in_cache_and_load_from_cache(ScopedValueGetHitsInCacheNode* get_from_cache) {

src/hotspot/share/opto/loopnode.cpp line 4752:

> 4750:     assert(u->is_Bool() || u->Opcode() == Op_ScopedValueGetLoadFromCache, "");
> 4751:   }
> 4752: #endif

Why not make this an enhanced version of `ScopedValueGetHitsInCacheNode::verify`?

src/hotspot/share/opto/loopnode.cpp line 4758:

> 4756:   ProjNode* success = iff->proj_out(1);
> 4757:   ProjNode* failure = iff->proj_out(0);
> 4758: 

Suggestion:

src/hotspot/share/opto/loopnode.cpp line 4760:

> 4758: 
> 4759: 
> 4760:   ScopedValueGetLoadFromCacheNode* load_from_cache = (ScopedValueGetLoadFromCacheNode*)success->find_unique_out_with(Op_ScopedValueGetLoadFromCache);

Would it not be nice if `find_unique_out_with` already casted the type?

src/hotspot/share/opto/loopnode.cpp line 4767:

> 4765:   Node* second_index = get_from_cache->index2();
> 4766: 
> 4767:   if (first_index == C->top() && second_index == C->top()) {

could this not be done during igvn?

src/hotspot/share/opto/loopnode.cpp line 4772:

> 4770:     _igvn.replace_input_of(iff, 1, zero);
> 4771:     _igvn.replace_node(get_from_cache, C->top());
> 4772: 

Suggestion:

src/hotspot/share/opto/loopnode.cpp line 4776:

> 4774:   }
> 4775: 
> 4776:   Node* load_of_cache = get_from_cache->in(1);

Suggestion:

  Node* cache_adr = get_from_cache->in(1);

src/hotspot/share/opto/loopnode.cpp line 5027:

> 5025: }
> 5026: 
> 5027: void PhaseIdealLoop::remove_scoped_value_get_at(uint i) {

Should we not have this functionality at the level of `Node_List`?
`GrowableArray` also has an order-preserving and a non-preserving method (delete/remove).

src/hotspot/share/opto/node.cpp line 977:

> 975: }
> 976: 
> 977: Node* Node::find_unique_out_with(int opcode) const {

Random idea:
Would it not be nice if this method automatically casted the node to that node-class?
Suggestions:
- using templates: give the class name and the opcode. A bit annoying to use
- using macros: give it the node-type name: i.e. `Add` for `AddNode`. The macro then uses the template, filling in `AddNode` and `Op_Add`. What do you think?

src/hotspot/share/opto/subnode.hpp line 300:

> 298: };
> 299: 
> 300: // Does a ScopedValue.get() hits in the cache?

I finally reconstructed what this node is, and please add a comment saying something like this:
This node returns true iff this gets us a cache hit (cache reference not null, and at least one of the indices leads to a hit).
It is essencially a Cmp, comparing the `cache_adr` (you name it scoped_value_cache) with a nullptr.
But it also gets 2 indices to that cache (ints), which will either score a hit or miss.

src/hotspot/share/opto/subnode.hpp line 340:

> 338: 
> 339:   Node* mem() const {
> 340:     return in(Memory);

Why not verify that this is a `MemNode`?

-------------

Changes requested by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/16966#pullrequestreview-1829122181
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457091960
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457097392
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457106474
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457105820
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457116768
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457126784
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457126959
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457133264
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457134974
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457136244
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457127201
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457137992
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457168449
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457219598
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457189722
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457196100
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457204400
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457255333
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457226374
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457244894
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457163283
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457203842
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457213443
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457251863


More information about the hotspot-compiler-dev mailing list