RFR: 8320649: C2: Optimize scoped values [v7]
Emanuel Peter
epeter at openjdk.org
Thu Feb 8 14:10:14 UTC 2024
On Wed, 7 Feb 2024 08:06:09 GMT, Roland Westrelin <roland at openjdk.org> wrote:
>> This change implements C2 optimizations for calls to
>> ScopedValue.get(). Indeed, in:
>>
>>
>> v1 = scopedValue.get();
>> ...
>> v2 = scopedValue.get();
>>
>>
>> `v2` can be replaced by `v1` and the second call to `get()` can be
>> optimized out. That's true whatever is between the 2 calls unless a
>> new mapping for `scopedValue` is created in between (when that happens
>> no optimizations is performed for the method being compiled). Hoisting
>> a `get()` call out of loop for a loop invariant `scopedValue` should
>> also be legal in most cases.
>>
>> `ScopedValue.get()` is implemented in java code as a 2 step process. A
>> cache is attached to the current thread object. If the `ScopedValue`
>> object is in the cache then the result from `get()` is read from
>> there. Otherwise a slow call is performed that also inserts the
>> mapping in the cache. The cache itself is lazily allocated. One
>> `ScopedValue` can be hashed to 2 different indexes in the cache. On a
>> cache probe, both indexes are checked. As a consequence, the process
>> of probing the cache is a multi step process (check if the cache is
>> present, check first index, check second index if first index
>> failed). If the cache is populated early on, then when the method that
>> calls `ScopedValue.get()` is compiled, profile reports the slow path
>> as never taken and only the read from the cache is compiled.
>>
>> To perform the optimizations, I added 3 new node types to C2:
>>
>> - the pair
>> ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for
>> the cache probe
>>
>> - a cfg node ScopedValueGetResultNode to help locate the result of the
>> `get()` call in the IR graph.
>>
>> In pseudo code, once the nodes are inserted, the code of a `get()` is:
>>
>>
>> hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue)
>> if (hits_in_the_cache) {
>> res = ScopedValueGetLoadFromCache(hits_in_the_cache);
>> } else {
>> res = ..; //slow call possibly inlined. Subgraph can be arbitray complex
>> }
>> res = ScopedValueGetResult(res)
>>
>>
>> In the snippet:
>>
>>
>> v1 = scopedValue.get();
>> ...
>> v2 = scopedValue.get();
>>
>>
>> Replacing `v2` by `v1` is then done by starting from the
>> `ScopedValueGetResult` node for the second `get()` and looking for a
>> dominating `ScopedValueGetResult` for the same `ScopedValue`
>> object. When one is found, it is used as a replacement. Eliminating
>> the second `get()` call is achieved by making
>> `ScopedValueGetHitsInCache` always successful if there's a dominating
>> `Scoped...
>
> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits:
>
> - review comment
> - Merge branch 'master' into JDK-8320649
> - Update src/hotspot/share/opto/callGenerator.cpp
>
> Co-authored-by: Emanuel Peter <emanuel.peter at oracle.com>
> - Update test/hotspot/jtreg/compiler/c2/irTests/TestScopedValue.java
>
> Co-authored-by: Andrey Turbanov <turbanoff at gmail.com>
> - merge fix
> - Merge branch 'master' into JDK-8320649
> - test failures
> - white spaces + bug id in test
> - test & fix
Nice work, Roland, everything looks better already.
Pushing a first part of today's review.
src/hotspot/share/opto/callGenerator.cpp line 816:
> 814: }
> 815:
> 816: class LateInlineScopedValueCallGenerator : public LateInlineCallGenerator {
Add comments above. What does this class do?
- inline
- parse inlined nodes
- transform inlined code to use the special nodes
This here is also a monster class.
Maybe you could split the "parse" component into a "Parse" class, and the "transform" part in yet another?
That would allow a clearer separation, also of all the fields.
One of these "stages" could take the last stages as "const".
Knowing what is modifiable and what is already given helps reading the code.
src/hotspot/share/opto/callGenerator.cpp line 865:
> 863: // Pattern matches:
> 864: // if (objects[n] == this) {
> 865: bool cache_probe(Node* c) {
Suggestion:
bool parse_cache_probe(Node* maybe_iff) {
if (maybe_iff->Opcode() != Op_If) {
return false;
}
IfNode* iff = maybe_if->as_If();
src/hotspot/share/opto/callGenerator.cpp line 908:
> 906: Node* offset2 = addp2->in(AddPNode::Offset);
> 907: assert(offset2->Opcode() == Op_LShiftX && offset2->in(2)->find_int_con(-1) == shift, "Not an array access?");
> 908: offset2 = offset2->in(1);
Why not just use a new name after the shift is removed?
src/hotspot/share/opto/callGenerator.cpp line 935:
> 933: // an uncommon trap, it could reach either the first or the second cache probe if first. Figure out which is the first
> 934: // here.
> 935: void find_first_probe_if(Unique_Node_List &wq) {
Suggestion:
void find_first_probe_if(const Unique_Node_List &wq) {
I don't think you are modifying `wq`?
What are the assumptions about `wq`? A better name would help here.
src/hotspot/share/opto/callGenerator.cpp line 936:
> 934: // here.
> 935: void find_first_probe_if(Unique_Node_List &wq) {
> 936: if (_second_cache_probe_iff != nullptr) {
bailout instead, less nesting.
src/hotspot/share/opto/callGenerator.cpp line 952:
> 950: swap(_first_cache_probe_iff, _second_cache_probe_iff);
> 951: swap(_first_index_in_cache, _second_index_in_cache);
> 952: break;
instead of breaking, can you return?
src/hotspot/share/opto/callGenerator.cpp line 961:
> 959: }
> 960: }
> 961: }
And do we assume that we find one of the two?
If so: we can assert that we don't return at the end of the method-body.
src/hotspot/share/opto/callGenerator.cpp line 966:
> 964: // first and second locations were probed. If the first if's other branch is to an uncommon trap, then that location
> 965: // never saw a cache hit. In that case, when the ScopedValueGetHitsInCacheNode is expanded, only code to probe
> 966: // the second location is added back to the IR.
> If the first if's other branch is to an uncommon trap, then that location never saw a cache hit
Does that imply that we would never have a hit, and don't even need to check for that?
src/hotspot/share/opto/callGenerator.cpp line 972:
> 970: _first_cache_probe_iff->in(1)->as_Bool()->_test._test == BoolTest::ne ? 0 : 1);
> 971: CallStaticJavaNode* get_first_iff_unc = get_first_iff_failure->is_uncommon_trap_proj(Deoptimization::Reason_none);
> 972: if (get_first_iff_unc != nullptr) {
This method here could again benefit from "bailout" instead of nesting.
src/hotspot/share/opto/callGenerator.cpp line 982:
> 980: }
> 981:
> 982: // The call for ScopedValue.get() was just inlined. The code here pattern matches the resulting subgraph. To make it
Is this "inlined" graph connected with the rest of the graph? If yes: how do we know that we only stay in the subgraph with this query? Do you have an argument for that?
src/hotspot/share/opto/callGenerator.cpp line 990:
> 988: // Thread.scopedValueCache() which acts as a marker for the beginning of the subgraph of interest. In the process a
> 989: // number of checks from the java code of ScopedValue.get() are expected to be encountered. They are recorded:
> 990: void pattern_match() {
Either rename to `parse_...`, or replace my `parse` suggestion with `pattern_match`
src/hotspot/share/opto/callGenerator.cpp line 992:
> 990: void pattern_match() {
> 991: ResourceMark rm;
> 992: Unique_Node_List wq;
Suggestion:
Unique_Node_List cfg_bfs_queue;
Would also help when you pass this to `adjust_order_or_first_and_second_probe_if` /`find_first_probe_if`, so that we know internally what this "set" holds when we query it there.
src/hotspot/share/opto/callGenerator.cpp line 995:
> 993: wq.push(_kit->control());
> 994: for (uint i = 0; i < wq.size(); ++i) {
> 995: Node* c = wq.at(i);
Suggestion:
Node* c = wq.at(i);
assert(c->is_CFG(), "only cfg nodes");
src/hotspot/share/opto/callGenerator.cpp line 1009:
> 1007: continue;
> 1008: }
> 1009: if (!cache_probe(c)) {
Suggestion:
if (!parse_cache_probe(c)) {
src/hotspot/share/opto/callGenerator.cpp line 1011:
> 1009: if (!cache_probe(c)) {
> 1010: if (c->is_RangeCheck()) {
> 1011: // Kill the range checks as they are known to always succeed
This really makes me nervous. Can you state a proof/argument in the comments?
Can this be avoided? Or verified? Or can the RC's detect that themselves and fold away?
What kinds of RC's even show up here?
Here it would be especially important to know that we do not ever under any circumstances leave the subgraph that we want to query. Otherwise all sorts of security-bugs can arise.
src/hotspot/share/opto/callGenerator.cpp line 1024:
> 1022: }
> 1023: }
> 1024: wq.push(c->in(0));
Given the complicated if/else/continue blocks, I'm not sure when the recursion happens.
Can you refactor this somehow? Maybe if you do what I suggested above that would already suffice.
src/hotspot/share/opto/callGenerator.cpp line 1032:
> 1030: // get_first_iff/get_second_iff contain the first/second check we ran into during the graph traversal but they may
> 1031: // not be the first/second one in execution order. Perform another traversal to figure out which is first.
> 1032: find_first_probe_if(wq);
Suggestion:
adjust_order_or_first_and_second_probe_if(wq);
Why does the first traversal not give the desired order? Is one traversal through the data-subgraph (no ordering) and the other through the cfg-subgraph (ordered)? If so: say so in the comment ;)
src/hotspot/share/opto/callGenerator.cpp line 1082:
> 1080: // slow_call:
> 1081: // result = slowGet();
> 1082: // goto continue;
Nice, this now looks better! 😊
src/hotspot/share/opto/callGenerator.cpp line 1086:
> 1084: // the transformed graph includes 2 copies of the cache probing logic. One represented by the
> 1085: // ScopedValueGetHitsInCache/ScopedValueGetLoadFromCache pair that is amenable to optimizations. The other from
> 1086: // the result of the parsing of the java code where the success path ends with an Halt node. The reason for that is
Suggestion:
// the result of the parsing of the java code where the success path ends with a Halt node. The reason for that is
src/hotspot/share/opto/callGenerator.cpp line 1087:
> 1085: // ScopedValueGetHitsInCache/ScopedValueGetLoadFromCache pair that is amenable to optimizations. The other from
> 1086: // the result of the parsing of the java code where the success path ends with an Halt node. The reason for that is
> 1087: // that some paths may end with an uncommon trap and if one traps, we want the trap to be recorded for the right bci.
> want the trap to be recorded for the right bci
Do we have testing for this kind of think, i.e. that we deopt correctly?
src/hotspot/share/opto/callGenerator.cpp line 1089:
> 1087: // that some paths may end with an uncommon trap and if one traps, we want the trap to be recorded for the right bci.
> 1088: // When the ScopedValueGetHitsInCache/ScopedValueGetLoadFromCache pair is expanded, split if finds the duplicate
> 1089: // logic and cleans it up.
Do we have an IR test for that?
src/hotspot/share/opto/callGenerator.cpp line 1092:
> 1090: void transform_get_subgraph() {
> 1091: Compile* C = _kit->C;
> 1092: make_current_exit_of_get_dead();
Where in the "pseudo code" above are we here? In one of the "halt" cases?
If so, we may want to rename:
Suggestion:
replace_current_exit_of_get_with_halt();
src/hotspot/share/opto/callGenerator.cpp line 1110:
> 1108:
> 1109: // replace it with its intrinsic code:
> 1110: Node* scoped_value_cache_load = scoped_value_cache_node();
Suggestion:
Node* load_scoped_value_cache_adr = make_load_scoped_value_cache_adr();
src/hotspot/share/opto/callGenerator.cpp line 1127:
> 1125: hits_in_cache->set_profile_data(1, if_cnt(_first_cache_probe_iff), probability_first_cache_probe_fails);
> 1126: float probability_second_cache_probe_fails = canonical_if_prob(_second_cache_probe_iff);
> 1127: hits_in_cache->set_profile_data(2, if_cnt(_second_cache_probe_iff), probability_second_cache_probe_fails);
Can these not be constructor arguments of `ScopedValueGetHitsInCacheNode`?
If so, maybe we can drop the `set_profile_data` method. Or do we need to modify it from elsewhere?
src/hotspot/share/opto/callGenerator.cpp line 1130:
> 1128:
> 1129: Node* sv_hits_in_cachex = _kit->gvn().transform(hits_in_cache);
> 1130: assert(sv_hits_in_cachex == hits_in_cache, "shouldn't be transformed to new node");
Suggestion:
Node* transformed_hits_in_cachex = _kit->gvn().transform(hits_in_cache);
assert(transformed_hits_in_cachex == hits_in_cache, "shouldn't be transformed to new node");
You also name it "transformed_iff" below ;)
That is nicer.
src/hotspot/share/opto/callGenerator.cpp line 1141:
> 1139: } else {
> 1140: float probability_cache_does_not_exist = 1 - probability_cache_exists;
> 1141: cache_miss_prob = probability_cache_does_not_exist + probability_cache_exists * probability_first_cache_probe_fails * probability_second_cache_probe_fails;
Nice, this is so much more readable than the previous formula 😊
src/hotspot/share/opto/callGenerator.cpp line 1150:
> 1148: assert(transformed_iff == iff, "shouldn't be transformed to new node");
> 1149: Node* not_in_cache = _kit->gvn().transform(new IfFalseNode(iff));
> 1150: Node* in_cache = _kit->gvn().transform(new IfTrueNode(iff));
names sound like conditions. But they are rather "proj" or "branch".
`if_in_cache` or `in_cache_proj`?
src/hotspot/share/opto/callGenerator.cpp line 1167:
> 1165: _kit->gvn().set_type(fallthrough, fallthrough->bottom_type());
> 1166: region_fast_slow->init_req(2, fallthrough);
> 1167: C->gvn_replace_by(slow_projs.fallthrough_catchproj, C->top());
Suggestion:
With this line moved up, the "region + 3 phi" are adjacent lines, just like you defined them above. Would be visually pleasing and easy to read.
src/hotspot/share/opto/callGenerator.cpp line 1172:
> 1170: phi_cache_value->init_req(2, slow_projs.resproj);
> 1171: }
> 1172: region_fast_slow->init_req(1, in_cache);
Move this line down to the 3 phis, so we have it all adjacent, like above.
src/hotspot/share/opto/callGenerator.cpp line 1209:
> 1207: }
> 1208: return iff->_fcnt;
> 1209: }
Might it be better to put this at `IfNode::...` as static methods?
src/hotspot/share/opto/callGenerator.cpp line 1211:
> 1209: }
> 1210:
> 1211: Node* scoped_value_cache_node() {
Suggestion:
Node* make_load_scoped_value_cache_adr() {
src/hotspot/share/opto/callGenerator.cpp line 1221:
> 1219: const TypeAryPtr* objects_type = TypeAryPtr::make(TypePtr::BotPTR, arr0, nullptr, true, 0);
> 1220: Node* scoped_value_cache_load = _kit->access_load(cache_obj_handle, objects_type, T_OBJECT, IN_NATIVE);
> 1221: return scoped_value_cache_load;
Suggestion:
Node* scoped_value_cache_load = _kit->access_load(cache_obj_handle, objects_type, T_OBJECT, IN_NATIVE);
return scoped_value_cache_load;
Suggestion:
Node* load_scoped_value_cache_adr = _kit->access_load(cache_obj_handle, objects_type, T_OBJECT, IN_NATIVE);
return load_scoped_value_cache_adr;
Because you are not loading **from** cache, but the cache adr itself
src/hotspot/share/opto/callGenerator.cpp line 1224:
> 1222: }
> 1223:
> 1224: void remove_thread_scoped_cache_call(const CallProjections& scoped_value_cache_projs) const {
What is the "thread" for?
src/hotspot/share/opto/callGenerator.cpp line 1243:
> 1241: }
> 1242:
> 1243: // Either the if leads to an Halt: that branch is never taken or it leads to an uncommon trap and the probability is
Suggestion:
// Either the if leads to a Halt: that branch is never taken or it leads to an uncommon trap and the probability is
src/hotspot/share/opto/callGenerator.cpp line 1315:
> 1313: pattern_match();
> 1314: // Now transform the subgraph in a way that makes it amenable to optimizations
> 1315: transform_get_subgraph();
Ok. So the pattern here is:
`parse_get_subgraph` -> find all the relevant inlined nodes
`transform_set_subgraph` -> replace the inlined code with dedicated Nodes that are easier to optimize.
consistent "parse" and "transform" naming would be nice. And state this at the top of the class in a comment, that we parse -> transform.
src/hotspot/share/opto/loopPredicate.cpp line 1589:
> 1587: return false;
> 1588: }
> 1589: ScopedValueGetHitsInCacheNode* hits_in_the_cache = bol->in(1)->as_ScopedValueGetHitsInCache();
Suggestion:
ScopedValueGetHitsInCacheNode* hits_in_cache = bol->in(1)->as_ScopedValueGetHitsInCache();
All other times you don't have "the" in the name
src/hotspot/share/opto/loopPredicate.cpp line 1604:
> 1602: Node* new_bol = bol->clone();
> 1603: register_new_node(new_bol, ctrl);
> 1604: Node* new_hits_in_the_cache = hits_in_the_cache->clone();
Suggestion:
Node* new_hits_in_cache = hits_in_cache->clone();
-------------
Changes requested by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/16966#pullrequestreview-1869771738
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1482963343
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1482896387
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1482742265
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1482842284
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1482845085
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1482846897
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1482847077
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1482873042
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1482873639
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1482913420
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1482880408
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1482933632
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1482931231
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1482897834
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1482910876
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1482891499
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1482920953
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1482938482
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1482939484
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1482941304
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1482941879
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1482949803
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1482984953
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1482990328
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1483003092
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1483001334
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1483005298
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1483022249
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1483023328
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1482993542
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1482981682
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1482987060
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1482974034
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1483027560
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1482958401
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1482995585
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1482996589
More information about the hotspot-compiler-dev
mailing list