RFR: 8320649: C2: Optimize scoped values [v9]

Emanuel Peter epeter at openjdk.org
Mon Feb 26 16:13:55 UTC 2024


On Fri, 16 Feb 2024 09:40:17 GMT, Roland Westrelin <roland at openjdk.org> wrote:

>> This change implements C2 optimizations for calls to
>> ScopedValue.get(). Indeed, in:
>> 
>> 
>> v1 = scopedValue.get();
>> ...
>> v2 = scopedValue.get();
>> 
>> 
>> `v2` can be replaced by `v1` and the second call to `get()` can be
>> optimized out. That's true whatever is between the 2 calls unless a
>> new mapping for `scopedValue` is created in between (when that happens
>> no optimizations is performed for the method being compiled). Hoisting
>> a `get()` call out of loop for a loop invariant `scopedValue` should
>> also be legal in most cases.
>> 
>> `ScopedValue.get()` is implemented in java code as a 2 step process. A
>> cache is attached to the current thread object. If the `ScopedValue`
>> object is in the cache then the result from `get()` is read from
>> there. Otherwise a slow call is performed that also inserts the
>> mapping in the cache. The cache itself is lazily allocated. One
>> `ScopedValue` can be hashed to 2 different indexes in the cache. On a
>> cache probe, both indexes are checked. As a consequence, the process
>> of probing the cache is a multi step process (check if the cache is
>> present, check first index, check second index if first index
>> failed). If the cache is populated early on, then when the method that
>> calls `ScopedValue.get()` is compiled, profile reports the slow path
>> as never taken and only the read from the cache is compiled.
>> 
>> To perform the optimizations, I added 3 new node types to C2:
>> 
>> - the pair
>>   ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for
>>   the cache probe
>>   
>> - a cfg node ScopedValueGetResultNode to help locate the result of the
>>   `get()` call in the IR graph.
>> 
>> In pseudo code, once the nodes are inserted, the code of a `get()` is:
>> 
>> 
>> hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue)
>> if (hits_in_the_cache) {
>>   res = ScopedValueGetLoadFromCache(hits_in_the_cache);
>> } else {
>>   res = ..; //slow call possibly inlined. Subgraph can be arbitray complex
>> }
>> res = ScopedValueGetResult(res)
>> 
>> 
>> In the snippet:
>> 
>> 
>> v1 = scopedValue.get();
>> ...
>> v2 = scopedValue.get();
>> 
>> 
>> Replacing `v2` by `v1` is then done by starting from the
>> `ScopedValueGetResult` node for the second `get()` and looking for a
>> dominating `ScopedValueGetResult` for the same `ScopedValue`
>> object. When one is found, it is used as a replacement. Eliminating
>> the second `get()` call is achieved by making
>> `ScopedValueGetHitsInCache` always successful if there's a dominating
>> `Scoped...
>
> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:
> 
>   32 bit build fix

And now I'm all the way through to:
`src/hotspot/share/opto/loopTransform.cpp`

src/hotspot/share/opto/compile.cpp line 2352:

> 2350:   if (failing())  return;
> 2351: 
> 2352:     inline_scoped_value_get_calls(igvn);

Suggestion:

  inline_scoped_value_get_calls(igvn);

Indentation was wrong. Was this a result of an IDE correcting the missing braces around the if above?

src/hotspot/share/opto/escape.cpp line 1039:

> 1037:       if ((n->as_Proj()->_con == TypeFunc::Parms && n->in(0)->is_Call() &&
> 1038:            n->in(0)->as_Call()->returns_pointer()) ||
> 1039:           (n->as_Proj()->_con == ScopedValueGetResultNode::Result && n->in(0)->Opcode() == Op_ScopedValueGetResult)) {

These conditions are repeated below

src/hotspot/share/opto/escape.cpp line 1208:

> 1206:       assert((n->as_Proj()->_con == TypeFunc::Parms && n->in(0)->is_Call() &&
> 1207:              n->in(0)->as_Call()->returns_pointer()) ||
> 1208:              (n->as_Proj()->_con == ScopedValueGetResultNode::Result && n->in(0)->Opcode() == Op_ScopedValueGetResult), "Unexpected node type");

You have the same condition above.
Would it make sense to encapsulate the condition in some method?
Or better: two methods: one for the pointer return, one for the scoped-value case?

src/hotspot/share/opto/graphKit.cpp line 4269:

> 4267: }
> 4268: 
> 4269: Node* GraphKit::scopedValueCache_helper() {

Suggestion:

Node* GraphKit::scopedValueCache_handle() {

That would be less generic, and it seems that is what you call it on the call-site, i.e. the variable.

src/hotspot/share/opto/graphKit.cpp line 4281:

> 4279: 
> 4280: Node* GraphKit::scopedValueCache() {
> 4281:   Node* cache_obj_handle = scopedValueCache_helper();

Suggestion:

  Node* cache_obj_handle = scopedValueCache_handle();

src/hotspot/share/opto/graphKit.hpp line 71:

> 69: 
> 70:   const Type* scopedValueCache_type();
> 71:   Node* scopedValueCache_helper();

Suggestion:

  Node* scopedValueCache_handle();

src/hotspot/share/opto/intrinsicnode.cpp line 388:

> 386:   assert(in(0)->in(0)->in(1)->is_Bool(), "unexpected ScopedValueGetLoadFromCache shape");
> 387:   assert(in(0)->in(0)->in(1)->in(1)->Opcode() == Op_ScopedValueGetHitsInCache, "unexpected ScopedValueGetLoadFromCache shape");
> 388:   assert(in(0)->in(0)->in(1)->in(1) == in(1), "unexpected ScopedValueGetLoadFromCache shape");

Suggestion:

  assert(iff()->in(1)->is_Bool(), "unexpected ScopedValueGetLoadFromCache shape");
  assert(iff()->in(1)->in(1)->Opcode() == Op_ScopedValueGetHitsInCache, "unexpected ScopedValueGetLoadFromCache shape");
  assert(iff()->in(1)->in(1) == in(1), "unexpected ScopedValueGetLoadFromCache shape");

Might help a little with readability

src/hotspot/share/opto/intrinsicnode.hpp line 346:

> 344: public:
> 345:   ScopedValueGetLoadFromCacheNode(Compile* C, Node* c, Node* hits_in_cache)
> 346:           : Node(c, hits_in_cache) {

Suggestion:

  ScopedValueGetLoadFromCacheNode(Compile* C, Node* ctrl, Node* hits_in_cache)
          : Node(ctrl, hits_in_cache) {

Is `ScopedValueGetLoadFromCacheNode` a CFG node?

src/hotspot/share/opto/loopPredicate.cpp line 1612:

> 1610: 
> 1611:   Node* all_mem = call->in(TypeFunc::Memory);
> 1612:   MergeMemNode* mm = all_mem->is_MergeMem() ? all_mem->as_MergeMem() : nullptr;

Suggestion:

  MergeMemNode* mm = all_mem->isa_MergeMem();

Would be equivalent, right? Not sure if you prefer this, or if you want the explicit formula...

src/hotspot/share/opto/loopPredicate.cpp line 1662:

> 1660:                                      T_ADDRESS, MemNode::unordered);
> 1661:   _igvn.register_new_node_with_optimizer(handle_load);
> 1662:   set_subtree_ctrl(handle_load, true);

How impossible is it to share code with the similar code in `GraphKit`?

src/hotspot/share/opto/loopTransform.cpp line 450:

> 448: // execute before we enter the loop. When TRUE, the estimated node budget is
> 449: // also requested.
> 450: bool IdealLoopTree::policy_peeling(PhaseIdealLoop* phase, bool scoped_value_only) {

What exactly is the meaning of `scoped_value_only`?
That we only have a scoped value?

Or that we only should peel if we see a scoped value? If that: `peel_only_if_has_scoped_value`.

src/hotspot/share/opto/loopTransform.cpp line 487:

> 485: 
> 486:   while (test != _head) {   // Scan till run off top of loop
> 487:     if (test->is_If() && !scoped_value_only) {    // Test?

What is happening here? Would a comment be helpful?

src/hotspot/share/opto/loopTransform.cpp line 503:

> 501:         return estimate;    // Found reason to peel!
> 502:       }
> 503:     } else if (test->Opcode() == Op_ScopedValueGetResult && !is_member(phase->get_loop(phase->get_ctrl((test->as_ScopedValueGetResult())->scoped_value())))) {

This line is a bit difficult to read 😅

src/hotspot/share/opto/loopTransform.cpp line 504:

> 502:       }
> 503:     } else if (test->Opcode() == Op_ScopedValueGetResult && !is_member(phase->get_loop(phase->get_ctrl((test->as_ScopedValueGetResult())->scoped_value())))) {
> 504:       return estimate;

Add a comment why we return here, and with that value.

src/hotspot/share/opto/loopTransform.cpp line 3786:

> 3784:       }
> 3785:       // if the loop body has a ScopedValueGetResult for a loop invariant ScopedValue object that dominates the backedge,
> 3786:       // then peeling one iteration of the loop body will allow the entire ScopedValue subgraph from the loop body

Suggestion:

      // If the loop body has a ScopedValueGetResult for a loop invariant ScopedValue object that dominates the backedge,
      // then peeling one iteration of the loop body will allow the entire ScopedValue subgraph to be hoisted from the loop body

src/hotspot/share/opto/loopTransform.cpp line 3790:

> 3788:         phase->do_peeling(this, old_new);
> 3789:         return false;
> 3790:       }

Just because I'm curious: why do the other places not already peel these loops? I.e. why do we need this here?

-------------

Changes requested by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/16966#pullrequestreview-1901178273
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1502798150
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1502805222
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1502804702
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1502811677
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1502811930
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1502812303
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1502817370
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1502823004
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1502839951
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1502864041
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1502903287
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1502876269
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1502881761
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1502897355
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1502910544
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1502915649


More information about the hotspot-compiler-dev mailing list