RFR: 8320649: C2: Optimize scoped values [v4]
Emanuel Peter
epeter at openjdk.org
Thu Jan 18 10:45:32 UTC 2024
On Thu, 4 Jan 2024 08:12:22 GMT, Roland Westrelin <roland at openjdk.org> wrote:
>> This change implements C2 optimizations for calls to
>> ScopedValue.get(). Indeed, in:
>>
>>
>> v1 = scopedValue.get();
>> ...
>> v2 = scopedValue.get();
>>
>>
>> `v2` can be replaced by `v1` and the second call to `get()` can be
>> optimized out. That's true whatever is between the 2 calls unless a
>> new mapping for `scopedValue` is created in between (when that happens
>> no optimizations is performed for the method being compiled). Hoisting
>> a `get()` call out of loop for a loop invariant `scopedValue` should
>> also be legal in most cases.
>>
>> `ScopedValue.get()` is implemented in java code as a 2 step process. A
>> cache is attached to the current thread object. If the `ScopedValue`
>> object is in the cache then the result from `get()` is read from
>> there. Otherwise a slow call is performed that also inserts the
>> mapping in the cache. The cache itself is lazily allocated. One
>> `ScopedValue` can be hashed to 2 different indexes in the cache. On a
>> cache probe, both indexes are checked. As a consequence, the process
>> of probing the cache is a multi step process (check if the cache is
>> present, check first index, check second index if first index
>> failed). If the cache is populated early on, then when the method that
>> calls `ScopedValue.get()` is compiled, profile reports the slow path
>> as never taken and only the read from the cache is compiled.
>>
>> To perform the optimizations, I added 3 new node types to C2:
>>
>> - the pair
>> ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for
>> the cache probe
>>
>> - a cfg node ScopedValueGetResultNode to help locate the result of the
>> `get()` call in the IR graph.
>>
>> In pseudo code, once the nodes are inserted, the code of a `get()` is:
>>
>>
>> hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue)
>> if (hits_in_the_cache) {
>> res = ScopedValueGetLoadFromCache(hits_in_the_cache);
>> } else {
>> res = ..; //slow call possibly inlined. Subgraph can be arbitray complex
>> }
>> res = ScopedValueGetResult(res)
>>
>>
>> In the snippet:
>>
>>
>> v1 = scopedValue.get();
>> ...
>> v2 = scopedValue.get();
>>
>>
>> Replacing `v2` by `v1` is then done by starting from the
>> `ScopedValueGetResult` node for the second `get()` and looking for a
>> dominating `ScopedValueGetResult` for the same `ScopedValue`
>> object. When one is found, it is used as a replacement. Eliminating
>> the second `get()` call is achieved by making
>> `ScopedValueGetHitsInCache` always successful if there's a dominating
>> `Scoped...
>
> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:
>
> merge fix
Path 1 of today.
src/hotspot/share/opto/callGenerator.cpp line 872:
> 870: Node* second_index = nullptr; // index in the cache for second hash
> 871: CallStaticJavaNode* slow_call = nullptr; // slowGet() call if any
> 872: {
This setup really looks like it should be a class, maybe called `ScopedValueGetPatternMatcher`?
All your variables here could be fields, and the scope below a method, or even split into multiple methods.
src/hotspot/share/opto/loopPredicate.cpp line 1572:
> 1570:
> 1571:
> 1572: bool PhaseIdealLoop::loop_predication_for_scoped_value_get(IdealLoopTree* loop, IfProjNode* if_success_proj,
Add a short comment above, that we are trying to hoist the `If` for a `ScopedValueGetHitsInCache` out of the loop, if possible.
You have one on the first line, but I think generally we place them at the top, right?
src/hotspot/share/opto/loopPredicate.cpp line 1575:
> 1573: ParsePredicateSuccessProj* parse_predicate_proj,
> 1574: Invariance &invar, Deoptimization::DeoptReason reason,
> 1575: IfNode* iff, IfProjNode*&new_predicate_proj) {
Suggestion:
IfNode* iff, IfProjNode* &new_predicate_proj) {
src/hotspot/share/opto/loopPredicate.cpp line 1579:
> 1577: BoolNode* bol = iff->in(1)->as_Bool();
> 1578: if (bol->in(1)->Opcode() == Op_ScopedValueGetHitsInCache && invar.is_invariant(((ScopedValueGetHitsInCacheNode*)bol->in(1))->scoped_value()) &&
> 1579: invar.is_invariant(((ScopedValueGetHitsInCacheNode*)bol->in(1))->index1()) && invar.is_invariant(((ScopedValueGetHitsInCacheNode*)bol->in(1))->index2())) {
Please refactor this if:
- if we don't take it, you return false, so make it a bailout. This allows you to already bailout if `bol->in(1)->Opcode() == Op_ScopedValueGetHitsInCache` fails.
- After that check, you can already have this line: `ScopedValueGetHitsInCacheNode* hits_in_the_cache = (ScopedValueGetHitsInCacheNode*) bol->in(1);`, and simplify the other 3 conditions of the if, make it more readable.
- But make them bailouts as well, instead of indenting the rest of the function.
src/hotspot/share/opto/loopPredicate.cpp line 1648:
> 1646: tty->print("Predicate invariant if: %d ", new_predicate_iff->_idx);
> 1647: loop->dump_head();
> 1648: } else if (TraceLoopOpts) {
Why not have them as separate ifs? What if someone enables both, will they not miss a line?
src/hotspot/share/opto/loopnode.cpp line 4717:
> 4715: assert(!_igvn.delay_transform(), "");
> 4716: _igvn.set_delay_transform(true);
> 4717: for (uint i = _scoped_value_get_nodes.size(); i > 0; i--) {
Suggestion:
for (uint i = _scoped_value_get_nodes.size()-1; i >= 0; i--) {
src/hotspot/share/opto/loopnode.cpp line 4718:
> 4716: _igvn.set_delay_transform(true);
> 4717: for (uint i = _scoped_value_get_nodes.size(); i > 0; i--) {
> 4718: Node* n = _scoped_value_get_nodes.at(i - 1);
Suggestion:
Node* n = _scoped_value_get_nodes.at(i);
src/hotspot/share/opto/loopnode.cpp line 4720:
> 4718: Node* n = _scoped_value_get_nodes.at(i - 1);
> 4719: if (n->Opcode() == Op_ScopedValueGetResult) {
> 4720: // Remove the ScopedValueGetResult entirely
Suggestion:
// Remove the ScopedValueGetResult and (its projections) entirely
src/hotspot/share/opto/loopnode.cpp line 4722:
> 4720: // Remove the ScopedValueGetResult entirely
> 4721: ScopedValueGetResultNode* get_result = (ScopedValueGetResultNode*) n;
> 4722: Node* result_out = get_result->result_out();
Suggestion:
ProjNode* result_out_proj = get_result->result_out();
knowing it is the projection helps understand that this is not a use of the result, but just the projection, which in turn has the uses below it.
src/hotspot/share/opto/loopnode.cpp line 4725:
> 4723: Node* result_in = get_result->in(ScopedValueGetResultNode::GetResult);
> 4724: if (result_out != nullptr) {
> 4725: _igvn.replace_node(result_out, result_in);
Suggestion:
_igvn.replace_node(result_out_proj, result_in);
Otherwise it has me wondering why you can replace the use here, if I don't know it is a projection.
src/hotspot/share/opto/loopnode.cpp line 4731:
> 4729: lazy_replace(get_result->control_out(), get_result->in(ScopedValueGetResultNode::Control));
> 4730: progress = true;
> 4731: remove_scoped_value_get_at(i-1);
Suggestion:
remove_scoped_value_get_at(i);
src/hotspot/share/opto/loopnode.cpp line 4734:
> 4732: }
> 4733: }
> 4734: while (_scoped_value_get_nodes.size() > 0) {
Add a comment about why we need a separate loop here.
src/hotspot/share/opto/loopnode.cpp line 4737:
> 4735: Node* n = _scoped_value_get_nodes.pop();
> 4736: assert (n->Opcode() == Op_ScopedValueGetHitsInCache, "");
> 4737: ScopedValueGetHitsInCacheNode* get_from_cache = (ScopedValueGetHitsInCacheNode*) n;
It would be nice to have some variable consistency. Elsewhere you use `sv_hits_in_cache`.
The name here suggest that this node actually "gets" something from the cache, but it only checks if we have a hit, and another node does the "getting", right?
src/hotspot/share/opto/loopnode.cpp line 4745:
> 4743: }
> 4744:
> 4745: void PhaseIdealLoop::expand_get_from_sv_cache(ScopedValueGetHitsInCacheNode* get_from_cache) {
Suggestion:
void PhaseIdealLoop::expand_sv_get_hits_in_cache_and_load_from_cache(ScopedValueGetHitsInCacheNode* get_from_cache) {
src/hotspot/share/opto/loopnode.cpp line 4752:
> 4750: assert(u->is_Bool() || u->Opcode() == Op_ScopedValueGetLoadFromCache, "");
> 4751: }
> 4752: #endif
Why not make this an enhanced version of `ScopedValueGetHitsInCacheNode::verify`?
src/hotspot/share/opto/loopnode.cpp line 4758:
> 4756: ProjNode* success = iff->proj_out(1);
> 4757: ProjNode* failure = iff->proj_out(0);
> 4758:
Suggestion:
src/hotspot/share/opto/loopnode.cpp line 4760:
> 4758:
> 4759:
> 4760: ScopedValueGetLoadFromCacheNode* load_from_cache = (ScopedValueGetLoadFromCacheNode*)success->find_unique_out_with(Op_ScopedValueGetLoadFromCache);
Would it not be nice if `find_unique_out_with` already casted the type?
src/hotspot/share/opto/loopnode.cpp line 4767:
> 4765: Node* second_index = get_from_cache->index2();
> 4766:
> 4767: if (first_index == C->top() && second_index == C->top()) {
could this not be done during igvn?
src/hotspot/share/opto/loopnode.cpp line 4772:
> 4770: _igvn.replace_input_of(iff, 1, zero);
> 4771: _igvn.replace_node(get_from_cache, C->top());
> 4772:
Suggestion:
src/hotspot/share/opto/loopnode.cpp line 4776:
> 4774: }
> 4775:
> 4776: Node* load_of_cache = get_from_cache->in(1);
Suggestion:
Node* cache_adr = get_from_cache->in(1);
src/hotspot/share/opto/loopnode.cpp line 5027:
> 5025: }
> 5026:
> 5027: void PhaseIdealLoop::remove_scoped_value_get_at(uint i) {
Should we not have this functionality at the level of `Node_List`?
`GrowableArray` also has an order-preserving and a non-preserving method (delete/remove).
src/hotspot/share/opto/node.cpp line 977:
> 975: }
> 976:
> 977: Node* Node::find_unique_out_with(int opcode) const {
Random idea:
Would it not be nice if this method automatically casted the node to that node-class?
Suggestions:
- using templates: give the class name and the opcode. A bit annoying to use
- using macros: give it the node-type name: i.e. `Add` for `AddNode`. The macro then uses the template, filling in `AddNode` and `Op_Add`. What do you think?
src/hotspot/share/opto/subnode.hpp line 300:
> 298: };
> 299:
> 300: // Does a ScopedValue.get() hits in the cache?
I finally reconstructed what this node is, and please add a comment saying something like this:
This node returns true iff this gets us a cache hit (cache reference not null, and at least one of the indices leads to a hit).
It is essencially a Cmp, comparing the `cache_adr` (you name it scoped_value_cache) with a nullptr.
But it also gets 2 indices to that cache (ints), which will either score a hit or miss.
src/hotspot/share/opto/subnode.hpp line 340:
> 338:
> 339: Node* mem() const {
> 340: return in(Memory);
Why not verify that this is a `MemNode`?
-------------
Changes requested by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/16966#pullrequestreview-1829122181
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457091960
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457097392
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457106474
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457105820
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457116768
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457126784
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457126959
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457133264
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457134974
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457136244
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457127201
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457137992
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457168449
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457219598
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457189722
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457196100
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457204400
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457255333
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457226374
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457244894
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457163283
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457203842
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457213443
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457251863
More information about the hotspot-compiler-dev
mailing list