RFR: 8320649: C2: Optimize scoped values [v4]
Emanuel Peter
epeter at openjdk.org
Wed Jan 17 11:24:10 UTC 2024
On Thu, 4 Jan 2024 08:12:22 GMT, Roland Westrelin <roland at openjdk.org> wrote:
>> This change implements C2 optimizations for calls to
>> ScopedValue.get(). Indeed, in:
>>
>>
>> v1 = scopedValue.get();
>> ...
>> v2 = scopedValue.get();
>>
>>
>> `v2` can be replaced by `v1` and the second call to `get()` can be
>> optimized out. That's true whatever is between the 2 calls unless a
>> new mapping for `scopedValue` is created in between (when that happens
>> no optimizations is performed for the method being compiled). Hoisting
>> a `get()` call out of loop for a loop invariant `scopedValue` should
>> also be legal in most cases.
>>
>> `ScopedValue.get()` is implemented in java code as a 2 step process. A
>> cache is attached to the current thread object. If the `ScopedValue`
>> object is in the cache then the result from `get()` is read from
>> there. Otherwise a slow call is performed that also inserts the
>> mapping in the cache. The cache itself is lazily allocated. One
>> `ScopedValue` can be hashed to 2 different indexes in the cache. On a
>> cache probe, both indexes are checked. As a consequence, the process
>> of probing the cache is a multi step process (check if the cache is
>> present, check first index, check second index if first index
>> failed). If the cache is populated early on, then when the method that
>> calls `ScopedValue.get()` is compiled, profile reports the slow path
>> as never taken and only the read from the cache is compiled.
>>
>> To perform the optimizations, I added 3 new node types to C2:
>>
>> - the pair
>> ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for
>> the cache probe
>>
>> - a cfg node ScopedValueGetResultNode to help locate the result of the
>> `get()` call in the IR graph.
>>
>> In pseudo code, once the nodes are inserted, the code of a `get()` is:
>>
>>
>> hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue)
>> if (hits_in_the_cache) {
>> res = ScopedValueGetLoadFromCache(hits_in_the_cache);
>> } else {
>> res = ..; //slow call possibly inlined. Subgraph can be arbitray complex
>> }
>> res = ScopedValueGetResult(res)
>>
>>
>> In the snippet:
>>
>>
>> v1 = scopedValue.get();
>> ...
>> v2 = scopedValue.get();
>>
>>
>> Replacing `v2` by `v1` is then done by starting from the
>> `ScopedValueGetResult` node for the second `get()` and looking for a
>> dominating `ScopedValueGetResult` for the same `ScopedValue`
>> object. When one is found, it is used as a replacement. Eliminating
>> the second `get()` call is achieved by making
>> `ScopedValueGetHitsInCache` always successful if there's a dominating
>> `Scoped...
>
> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:
>
> merge fix
Nice work @rwestrel
I'm sending out a first batch or comments, more coming later.
src/hotspot/share/gc/shared/c2/barrierSetC2.cpp line 109:
> 107: Node* ctl = opt_access.ctl();
> 108: assert(opt_access.mem()->is_MergeMem(), "");
> 109: MergeMemNode* mm = opt_access.mem()->as_MergeMem();
Does `as_MergeMem` not assert `is_MergeMem`?
src/hotspot/share/opto/callGenerator.cpp line 817:
> 815:
> 816: class LateInlineScopedValueCallGenerator : public LateInlineCallGenerator {
> 817: Node* _sv;
I would prefer a longer name. `_scoped_value`? Is this the pointer to the ScopedValue object? An oop? `_soped_value_oop`?
A fuller name would help me understand the pattern-matching code more easily.
src/hotspot/share/opto/callGenerator.cpp line 851:
> 849: }
> 850:
> 851: virtual void process_result(GraphKit& kit) {
It would be really nice if you refactored this huge method (6+ pages of code) into smaller units.
It would for example make separation of pattern-matching and transformation easier to see.
src/hotspot/share/opto/callGenerator.cpp line 854:
> 852: if (!_process_result) {
> 853: return;
> 854: }
Since it is not set in the constructor, and we seem to need `_sv` here, could we add this?
`assert(_sv != nullptr, "must have set scoped value to be pattern matched")`
src/hotspot/share/opto/callGenerator.cpp line 877:
> 875: wq.push(kit.control());
> 876: for (uint i = 0; i < wq.size(); ++i) {
> 877: Node* c = wq.at(i);
Could we assert that these are all CFG nodes? And give it a more expressive name?
src/hotspot/share/opto/callGenerator.cpp line 888:
> 886: } else {
> 887: if (c->Opcode() == Op_If) {
> 888: Node* bol = c->in(1);
Suggestion:
BoolNode* bol = c->in(1)->as_Bool();
Then you can drop the assert below
src/hotspot/share/opto/callGenerator.cpp line 897:
> 895: in1->in(0)->as_CallJava()->method()->intrinsic_id() == vmIntrinsics::_scopedValueCache) {
> 896: assert(in2->bottom_type() == TypePtr::NULL_PTR, "");
> 897: assert(get_cache_iff == nullptr, "");
Suggestion:
assert(get_cache_iff == nullptr, "should only find one get_cache_if");
src/hotspot/share/opto/callGenerator.cpp line 902:
> 900: scoped_value_cache = in1->in(0)->as_Call();
> 901: } else {
> 902: assert(scoped_value_cache == in1->in(0), "");
Suggestion:
assert(scoped_value_cache == in1->in(0), "should only find one scoped_value_cache");
src/hotspot/share/opto/callGenerator.cpp line 915:
> 913: assert(scoped_value_cache == in1->in(0), "");
> 914: }
> 915: continue;
Code duplication. Could easily be extracted as a helper method.
src/hotspot/share/opto/callGenerator.cpp line 923:
> 921: assert(in2 = _sv, "");
> 922: in = in1;
> 923: }
Might be less verbose:
assert(in1 == _sv || in2 == _sv, "one of the comparison values must be the scoped value (oop?)");
// pick the other:
Node* in = (in1 == _sv) ? in2 : in1;
Could we have a more expressive name for `in`?
src/hotspot/share/opto/callGenerator.cpp line 931:
> 929: assert(in->Opcode() == Op_LoadP || in->Opcode() == Op_LoadN, "");
> 930: assert(C->get_alias_index(in->adr_type()) == C->get_alias_index(TypeAryPtr::OOPS), "");
> 931: Node* addp1 = in->in(MemNode::Address);
Suggestion:
AddPNode* addp1 = in->in(MemNode::Address)->as_AddP();
And drop the assert below.
src/hotspot/share/opto/callGenerator.cpp line 940:
> 938: } else {
> 939: assert(scoped_value_cache == addp1->in(AddPNode::Base)->uncast()->in(0), "");
> 940: }
Looks like a reference to:
`ProjNode* sv_cache_proj = addp1->in(AddPNode::Base)->uncast()->as_Proj()`
Would get rid of half of your assert, and also make the code easier to read because of reuse.
And you could further simplify:
assert(scoped_value_cache == nullptr || scoped_value_cache == sv_cache_proj->in(0), "only one cache allowed");
scoped_value_cache == sv_cache_proj->in(0);
src/hotspot/share/opto/callGenerator.cpp line 949:
> 947: int header = arrayOopDesc::base_offset_in_bytes(bt);
> 948: assert(const_offset >= header, "");
> 949: const_offset -= header;
More comments about pattern would be nice, you lost me here 😅
src/hotspot/share/opto/callGenerator.cpp line 951:
> 949: const_offset -= header;
> 950:
> 951: Node* index = kit.gvn().intcon(const_offset >> shift);
Index for what? Please pick a more descriptive name.
src/hotspot/share/opto/callGenerator.cpp line 955:
> 953: assert(!addp2->in(AddPNode::Address)->is_AddP() &&
> 954: addp2->in(AddPNode::Base) == addp1->in(AddPNode::Base),
> 955: "");
An explanation in the string of the assert would be nice too.
src/hotspot/share/opto/callGenerator.cpp line 964:
> 962: if (offset2->Opcode() == Op_CastII && offset2->in(0)->is_Proj() &&
> 963: offset2->in(0)->in(0) == get_cache_iff) {
> 964: ShouldNotReachHere();
Why?
src/hotspot/share/opto/callGenerator.cpp line 981:
> 979: }
> 980: } else if (c->is_RangeCheck()) {
> 981: // Kill the range checks as they are known to always succeed
Wow, that looks a bit scary 😅
How do we know we do not accidentally kill a unrelated RangeCheck?
And what is the argument for why they always succeed?
src/hotspot/share/opto/callGenerator.cpp line 988:
> 986: assert(slow_call == nullptr, "");
> 987: slow_call = c->as_CallStaticJava();
> 988: assert(slow_call->method()->intrinsic_id() == vmIntrinsics::_SVslowGet, "");
I would move the assert to before the assignment, but that is a matter of taste.
src/hotspot/share/opto/callGenerator.cpp line 990:
> 988: assert(slow_call->method()->intrinsic_id() == vmIntrinsics::_SVslowGet, "");
> 989: } else {
> 990: assert(c->is_Proj() || c->is_Catch(), "");
Suggestion:
assert(c->is_Proj() || c->is_Catch(), "unexpected node in pattern matching");
src/hotspot/share/opto/callGenerator.cpp line 996:
> 994: }
> 995: // get_first_iff/get_second_iff contain the first/second check we ran into during the graph traversal but they may
> 996: // not be the first/second one in execution order. Perform another traversal to figure out which is first.
Why can this not be done in the first traversal, and why does this (down) traversal do the right thing?
Can we assert that `c` is always CFG?
Please mention in a comment that we are only traversing the same CFG nodes from the first traversal.
src/hotspot/share/opto/callGenerator.cpp line 998:
> 996: // not be the first/second one in execution order. Perform another traversal to figure out which is first.
> 997: if (get_second_iff != nullptr) {
> 998: Node_Stack stack(0);
No visited set. Can this trigger an exponential explosion with if/region diamonds?
src/hotspot/share/opto/callGenerator.cpp line 1031:
> 1029: CallStaticJavaNode* get_first_iff_unc = get_first_iff_failure->is_uncommon_trap_proj(Deoptimization::Reason_none);
> 1030: if (get_first_iff_unc != nullptr) {
> 1031: // first cache check never hits, keep only the second.
I'm struggling to understand:
We still have an unc-trap for the first. So we never failed so far, right? So we always found it in the cache, or am I wrong?
We are not removing this unc-trap though, right?
src/hotspot/share/opto/callGenerator.cpp line 1050:
> 1048: // Now move right above the scopedValueCache() call
> 1049: Node* mem = scoped_value_cache->in(TypeFunc::Memory);
> 1050: Node* c = scoped_value_cache->in(TypeFunc::Control);
Suggestion:
Node* ctrl = scoped_value_cache->in(TypeFunc::Control);
src/hotspot/share/opto/callGenerator.cpp line 1193:
> 1191: // continue:
> 1192: //
> 1193: // slow_call:
Makes it look like continue is a fall-through to slow_call, that is not what you want, right?
src/hotspot/share/opto/callGenerator.cpp line 1220:
> 1218: // goto continue;
> 1219: //
> 1220: // the transformed graph includes 2 copies of the cache probing logic. One represented by the
Suggestion:
// The transformed graph includes 2 copies of the cache probing logic. One represented by the
src/hotspot/share/opto/callGenerator.cpp line 1225:
> 1223: // that some paths may end with an uncommon trap and if one traps, we want the trap to be recorded for the right bci.
> 1224: // When the ScopedValueGetHitsInCache/ScopedValueGetLoadFromCache pair is expanded, split if finds the duplicate
> 1225: // logic and cleans it up.
I would prefer the comment section at the beginning of the method. Otherwise I may start reading down linearly, reverse-engineer the code, and only discover this afterwards... 😅
-------------
Changes requested by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/16966#pullrequestreview-1826905074
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455120226
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455185080
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455153771
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455190286
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455257893
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455157115
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455169129
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455170848
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455175485
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455200497
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455206181
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455225844
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455230819
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455245344
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455233801
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455241722
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455251712
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455253875
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455255529
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455274616
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455279448
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455292221
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455296497
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455143110
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455147489
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455150085
More information about the hotspot-compiler-dev
mailing list