RFR: 8320649: C2: Optimize scoped values [v18]

Andrew Haley aph at openjdk.org
Fri May 31 09:23:12 UTC 2024


On Thu, 2 May 2024 14:54:17 GMT, Roland Westrelin <roland at openjdk.org> wrote:

>> This change implements C2 optimizations for calls to
>> ScopedValue.get(). Indeed, in:
>> 
>> 
>> v1 = scopedValue.get();
>> ...
>> v2 = scopedValue.get();
>> 
>> 
>> `v2` can be replaced by `v1` and the second call to `get()` can be
>> optimized out. That's true whatever is between the 2 calls unless a
>> new mapping for `scopedValue` is created in between (when that happens
>> no optimizations is performed for the method being compiled). Hoisting
>> a `get()` call out of loop for a loop invariant `scopedValue` should
>> also be legal in most cases.
>> 
>> `ScopedValue.get()` is implemented in java code as a 2 step process. A
>> cache is attached to the current thread object. If the `ScopedValue`
>> object is in the cache then the result from `get()` is read from
>> there. Otherwise a slow call is performed that also inserts the
>> mapping in the cache. The cache itself is lazily allocated. One
>> `ScopedValue` can be hashed to 2 different indexes in the cache. On a
>> cache probe, both indexes are checked. As a consequence, the process
>> of probing the cache is a multi step process (check if the cache is
>> present, check first index, check second index if first index
>> failed). If the cache is populated early on, then when the method that
>> calls `ScopedValue.get()` is compiled, profile reports the slow path
>> as never taken and only the read from the cache is compiled.
>> 
>> To perform the optimizations, I added 3 new node types to C2:
>> 
>> - the pair
>>   ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for
>>   the cache probe
>>   
>> - a cfg node ScopedValueGetResultNode to help locate the result of the
>>   `get()` call in the IR graph.
>> 
>> In pseudo code, once the nodes are inserted, the code of a `get()` is:
>> 
>> 
>> hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue)
>> if (hits_in_the_cache) {
>>   res = ScopedValueGetLoadFromCache(hits_in_the_cache);
>> } else {
>>   res = ..; //slow call possibly inlined. Subgraph can be arbitray complex
>> }
>> res = ScopedValueGetResult(res)
>> 
>> 
>> In the snippet:
>> 
>> 
>> v1 = scopedValue.get();
>> ...
>> v2 = scopedValue.get();
>> 
>> 
>> Replacing `v2` by `v1` is then done by starting from the
>> `ScopedValueGetResult` node for the second `get()` and looking for a
>> dominating `ScopedValueGetResult` for the same `ScopedValue`
>> object. When one is found, it is used as a replacement. Eliminating
>> the second `get()` call is achieved by making
>> `ScopedValueGetHitsInCache` always successful if there's a dominating
>> `Scoped...
>
> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:
> 
>   whitespaces

On 5/30/24 21:54, Dean Long wrote:
> I guess I don't understand how random replacement is supposed to help.
> Do you have a pointer to where I can read up on the topic?

https://en.wikipedia.org/wiki/Cache_replacement_policies#Random_replacement_(RR)
has links.

The maths is pretty simple: if you have only one slot for each entry, one
time in 16 two scoped locals will hit the same slot, and repeatedly kick
each other out. If you have two slots with random replacement, then each
get() will retry a couple of times until each entry is in a different slot.

Random replacement is good for software implementation because it doesn't
require history, unlike (say) LRU. Maintaining access history in order to
use LRU would be as expensive as the actual get().

> I would think that for small numbers of scoped values, assigning hash
> slots sequentially would work well.  Only if there is a collision, use
> secondary slot.

In practice, today's processors speculate both primary and secondary loads
in parallel, so there's no added latency for doing both. The cost is
almost nothing.

Random replacement is a low-cost way to increase the hit ratio of a cache
without incurring the added runtime cost of (say) LRU.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-2141579998


More information about the hotspot-compiler-dev mailing list