RFR: 8320649: C2: Optimize scoped values [v18]

Thu May 30 16:41:15 UTC 2024

On Thu, 23 May 2024 23:18:49 GMT, Dean Long <dlong at openjdk.org> wrote:

> What's a good benchmark to run to show the benefit of this change, or to show the effect of different cache sizes and/or Java implementation changes?
> 
> I tried running micro:ScopedValue benchmarks with -Djava.lang.ScopedValue.cacheSize=2 and didn't see a difference. But the new compiler/scoped_value/TestScopedValue.java test fails in compiler.c2.irTests.TestScopedValue.testFastPath16 with the cache size set to 2.

With `-jvmArgsAppend -Djava.lang.ScopedValue.cacheSize=4` I get

Benchmark                               Mode  Cnt   Score   Error  Units
`ScopedValues.sixValues_ScopedValue      avgt   10  11.881 ± 0.017  us/op`

before this patch and 

`ScopedValues.sixValues_ScopedValue      avgt   10  0.006 ±  0.001  us/op`

after.

> Given the right benchmark, there are some experiments I'd like to try, related to the ScopedValue Java implemenation:
> 
>     1. use only a primary slot probe, no secondary
> 
>     2. use a deterministic secondary probe (based on the hash), not random

Looks pretty deterministic to me. Every value has two hash codes, primary and secondary,and they are different.

>     3. fix put() so it will reuse an existing slot.  Currently it blindly set both `victim` and `other` slots.  It seems like it should check the `other` slot first and reuse it if already set.

Put another conditional load in the control flow? I'm not sure that would do much, but OK. I guess I don't know how this would work.

>     4. separate cache bitmap from slow path bitmaps, which could be 64-bits with only 1 bit per SV, not 2.

I guess that might help.

>     5. Use a per-SV MethodHandle getter using MethodHandles.guardWithTest() to avoid profile pollution

Interesting. I did a version of the code that used bytecode generation to produce a new accessor method for each scoped value a year or two ago, for that same reason. It did work, but was rather heavyweight.

Re benchmarks: the benchmarks are all there, but the current design is based on principles, as well as benchmark results.

0. As much as possible, and this is hard to do with just random hashing, I (and I believe, we) want the performance to be linear and predictable, rather than mostly luck. That's what this patch really brings to the party! 

1. We have a hard guarantee, not just a probabilistic one, that if you are repeatedly using two different scoped values, we never have to do a fallback linear search. 
This is hard to capture with a benchmark, but I guess setting `java.lang.ScopedValue.cacheSize=2` would do it.

2. The bind method may do a little bit of extra work to help the `get()`, but not too much: I can see some cases where binding is done fairly frequently, and it should not be too heavyweight. But I don't know what _too heavyweight_ really means, so...

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-2140201010