RFR: 8320649: C2: Optimize scoped values [v4]
Emanuel Peter
epeter at openjdk.org
Wed Jan 17 14:44:00 UTC 2024
On Thu, 4 Jan 2024 08:12:22 GMT, Roland Westrelin <roland at openjdk.org> wrote:
>> This change implements C2 optimizations for calls to
>> ScopedValue.get(). Indeed, in:
>>
>>
>> v1 = scopedValue.get();
>> ...
>> v2 = scopedValue.get();
>>
>>
>> `v2` can be replaced by `v1` and the second call to `get()` can be
>> optimized out. That's true whatever is between the 2 calls unless a
>> new mapping for `scopedValue` is created in between (when that happens
>> no optimizations is performed for the method being compiled). Hoisting
>> a `get()` call out of loop for a loop invariant `scopedValue` should
>> also be legal in most cases.
>>
>> `ScopedValue.get()` is implemented in java code as a 2 step process. A
>> cache is attached to the current thread object. If the `ScopedValue`
>> object is in the cache then the result from `get()` is read from
>> there. Otherwise a slow call is performed that also inserts the
>> mapping in the cache. The cache itself is lazily allocated. One
>> `ScopedValue` can be hashed to 2 different indexes in the cache. On a
>> cache probe, both indexes are checked. As a consequence, the process
>> of probing the cache is a multi step process (check if the cache is
>> present, check first index, check second index if first index
>> failed). If the cache is populated early on, then when the method that
>> calls `ScopedValue.get()` is compiled, profile reports the slow path
>> as never taken and only the read from the cache is compiled.
>>
>> To perform the optimizations, I added 3 new node types to C2:
>>
>> - the pair
>> ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for
>> the cache probe
>>
>> - a cfg node ScopedValueGetResultNode to help locate the result of the
>> `get()` call in the IR graph.
>>
>> In pseudo code, once the nodes are inserted, the code of a `get()` is:
>>
>>
>> hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue)
>> if (hits_in_the_cache) {
>> res = ScopedValueGetLoadFromCache(hits_in_the_cache);
>> } else {
>> res = ..; //slow call possibly inlined. Subgraph can be arbitray complex
>> }
>> res = ScopedValueGetResult(res)
>>
>>
>> In the snippet:
>>
>>
>> v1 = scopedValue.get();
>> ...
>> v2 = scopedValue.get();
>>
>>
>> Replacing `v2` by `v1` is then done by starting from the
>> `ScopedValueGetResult` node for the second `get()` and looking for a
>> dominating `ScopedValueGetResult` for the same `ScopedValue`
>> object. When one is found, it is used as a replacement. Eliminating
>> the second `get()` call is achieved by making
>> `ScopedValueGetHitsInCache` always successful if there's a dominating
>> `Scoped...
>
> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:
>
> merge fix
Second badge of comments.
src/hotspot/share/classfile/vmIntrinsics.hpp line 301:
> 299: do_name( slowGet_name, "slowGet") \
> 300: do_intrinsic(_SVCacheInvalidate, java_lang_ScopedValue_Cache, invalidate_name, int_void_signature, F_S) \
> 301: do_name( invalidate_name, "invalidate") \
What prevents us from writing out `_scopedValueGet` etc, just like the other names here?
src/hotspot/share/opto/callGenerator.cpp line 1051:
> 1049: Node* mem = scoped_value_cache->in(TypeFunc::Memory);
> 1050: Node* c = scoped_value_cache->in(TypeFunc::Control);
> 1051: Node* io = scoped_value_cache->in(TypeFunc::I_O);
I would use `input_mem`, `input_ctrl`, `input_io`. Then the replacements below would read more intuitively.
src/hotspot/share/opto/callGenerator.cpp line 1058:
> 1056:
> 1057: // remove the scopedValueCache() call
> 1058: CallProjections scoped_value_cache_projs = CallProjections();
Suggestion:
CallProjections scoped_value_cache_projs;
Is the assignment really necessary, or style-wise preferrable? I see you use it without elsewhere.
src/hotspot/share/opto/callGenerator.cpp line 1089:
> 1087: second_index == nullptr ? C->top() : second_index);
> 1088:
> 1089: // It will later be expanded back to all the checks so record profile data
Should we also copy the node info (e.g. line number etc)?
src/hotspot/share/opto/callGenerator.cpp line 1115:
> 1113: } else {
> 1114: sv_hits_in_cache->set_profile_data(2, 0, 0);
> 1115: }
Another case of code duplication. Why not write a method that extracts `cnt, prob` for an `iff`?`Or maybe that already exists?
src/hotspot/share/opto/callGenerator.cpp line 1120:
> 1118:
> 1119: // And compute the probability of a miss in the cache
> 1120: float prob;
Suggestion:
float cache_miss_prob;
src/hotspot/share/opto/callGenerator.cpp line 1122:
> 1120: float prob;
> 1121: // get_cache_prob: probability that cache array is not null
> 1122: // get_first_prob: probability of a miss
`get_first_prob` sounds like the probability of "getting it", so not a miss. Which is it?
src/hotspot/share/opto/callGenerator.cpp line 1127:
> 1125: prob = PROB_UNKNOWN;
> 1126: } else {
> 1127: prob = (1 - get_cache_prob) + get_cache_prob * (get_first_prob + (1 - get_first_prob) * get_second_prob);
Suggestion:
cache_miss_prob = (1 - get_cache_prob) + // cache array is null
get_cache_prob * ( // cache array not null, and:
get_first_prob + // first has cache miss
(1 - get_first_prob) * get_second_prob // first hits, but second misses
);
src/hotspot/share/opto/callGenerator.cpp line 1139:
> 1137:
> 1138: // Merge the paths that produce the result (in case there's a slow path)
> 1139: Node* r = new RegionNode(3);
Suggestion:
Node* region_fast_slow = new RegionNode(3);
I think we can affort a slightly more expressive name.
src/hotspot/share/opto/callGenerator.cpp line 1151:
> 1149: phi_cache_value->init_req(1, C->top());
> 1150: phi_mem->init_req(1, C->top());
> 1151: phi_io->init_req(1, C->top());
Why not just put the slow path on input `2`, and make the size of the RegionNode depend on if there is a `slow_call`? Then you can avoid these top inputs, right?
src/hotspot/share/opto/callGenerator.cpp line 1155:
> 1153: CallProjections slow_projs;
> 1154: slow_call->extract_projections(&slow_projs, false);
> 1155: Node* fallthrough = slow_projs.fallthrough_catchproj->clone();
Why does that have to be cloned?
src/hotspot/share/opto/callGenerator.cpp line 1166:
> 1164:
> 1165: // ScopedValueGetLoadFromCache is a single that represents the result of a hit in the cache
> 1166: Node* cache_value = kit.gvn().transform(new ScopedValueGetLoadFromCacheNode(C, in_cache, sv_hits_in_cache));
Suggestion:
Node* sv_load_from_cache = kit.gvn().transform(new ScopedValueGetLoadFromCacheNode(C, in_cache, sv_hits_in_cache));
For consistency with `sv_hits_in_cache` and the node class name.
src/hotspot/share/opto/cfgnode.hpp line 737:
> 735: ProjNode* result_out() {
> 736: return proj_out_or_null(Result);
> 737: }
Either verify that we have not null, or else rename to `result_out_or_null`.
src/hotspot/share/opto/compile.cpp line 465:
> 463: remove_useless_late_inlines( &_boxing_late_inlines, useful);
> 464: remove_useless_late_inlines(&_vector_reboxing_late_inlines, useful);
> 465: remove_useless_late_inlines( &_scoped_value_late_inlines, useful);
Suggestion:
remove_useless_late_inlines( &_scoped_value_late_inlines, useful);
src/hotspot/share/opto/compile.cpp line 2034:
> 2032:
> 2033: void Compile::inline_scoped_value_calls(PhaseIterGVN& igvn) {
> 2034: if (_scoped_value_late_inlines.length() > 0) {
Rather than indenting everything, I would just check `_scoped_value_late_inlines.is_empty()` and return.
-------------
Changes requested by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/16966#pullrequestreview-1827249142
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455708715
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455535612
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455507963
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455600271
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455631389
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455639533
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455649353
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455651376
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455669565
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455678033
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455683828
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455692682
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455721814
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455724546
PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455742818
More information about the hotspot-compiler-dev
mailing list