RFR: 8372701: Randomized profile counters [v2]
Andrew Haley
aph at openjdk.org
Thu Nov 27 17:59:55 UTC 2025
On Thu, 27 Nov 2025 17:18:35 GMT, Andrew Haley <aph at openjdk.org> wrote:
>> Please use [this link](https://github.com/openjdk/jdk/pull/28541/files?w=1) to view the files changed.
>>
>> Profile counters scale very badly.
>>
>> The overhead for profiled code isn't too bad with one thread, but it grows very quickly as the thread count increases.
>>
>> For example, here's a benchmark from the OpenJDK test suite, run at TieredLevel 3 with one thread, then three threads:
>>
>>
>> Benchmark                        (randomized)  Mode  Cnt    Score    Error  Units
>> InterfaceCalls.test2ndInt5Types         false  avgt    4   27.468  ± 2.631  ns/op
>> InterfaceCalls.test2ndInt5Types         false  avgt    4  240.010  ± 6.329  ns/op
>>
>>
>> This slowdown is caused by high memory contention on the profile counters. Not only is this slow, but it can also lose profile counts.
>>
>> This patch is for C1 only. It'd be easy to randomize C2 counters as well in another PR, if anyone thinks it's worth doing.
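A minimal sketch of the general idea behind a randomized (sampled) counter, under stated assumptions: instead of writing the shared counter on every event, each event consults a per-thread RNG and, with probability 1/R, adds R. The expected counter value still equals the true event count, but the contended cache line is written roughly R times less often. `XorShift32`, `sampled_increment` and `kRatio` are hypothetical names chosen for illustration; `kRatio` only stands in for `-XX:ProfileCaptureRatio`, and this is not the patch's code, which works inside C1-generated code.

```cpp
#include <cstdint>

// Hypothetical per-thread RNG (xorshift32). Not the generator used by the patch.
struct XorShift32 {
  uint32_t state;
  explicit XorShift32(uint32_t seed) : state(seed ? seed : 1u) {}  // state must be nonzero
  uint32_t next() {
    state ^= state << 13;
    state ^= state >> 17;
    state ^= state << 5;
    return state;
  }
};

// Stand-in for -XX:ProfileCaptureRatio; assumed to be a power of two so the
// 1-in-kRatio test is a single mask-and-compare.
constexpr uint32_t kRatio = 64;

// Sampled update: taken with probability 1/kRatio, adding kRatio on a hit, so
// E[counter] equals the number of events. The add is a plain non-atomic
// read-modify-write; racing threads can still lose updates, but far fewer
// writes reach the shared counter.
inline void sampled_increment(uint64_t& counter, XorShift32& rng) {
  if ((rng.next() & (kRatio - 1)) == 0) {
    counter += kRatio;
  }
}
```

The price of the power-of-two ratio trick is an extra, essentially random branch on every profiled event.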
>>
>> One other thing to note is that randomized profile counters degrade very badly with small decimation ratios. For example, running with `-XX:ProfileCaptureRatio=2` on a single thread results in:
>>
>>
>> Benchmark                        (randomized)  Mode  Cnt    Score    Error  Units
>> InterfaceCalls.test2ndInt5Types         false  avgt    4   80.147  ± 9.991  ns/op
>>
>>
>> The problem is that branch prediction accuracy drops sharply, leading to many mispredictions. Only higher decimation ratios, e.g. 64, really make sense.
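A rough way to see why small ratios hurt, under the idealizing assumption that each sampling decision is an independent random draw taken with probability p = 1/R: no predictor can then do better than always guessing the more likely outcome, so the sampling branch alone mispredicts at least this often:

```latex
\begin{aligned}
p &= \frac{1}{R}, \qquad P(\text{mispredict}) \ge \min(p,\; 1 - p) \\
R = 2 &:\quad \min\!\left(\tfrac{1}{2}, \tfrac{1}{2}\right) = 0.5 \\
R = 64 &:\quad \min\!\left(\tfrac{1}{64}, \tfrac{63}{64}\right) = \tfrac{1}{64} \approx 0.016
\end{aligned}
```

At R = 2 the branch is a coin flip on every profiled event; at R = 64 it is almost always not taken and predicts well.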
>
> Andrew Haley has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 52 commits:
>
> - Merge remote-tracking branch 'refs/remotes/origin/JDK-8134940' into JDK-8134940
> - whitespace
> - AArch64
> - Minimize deltas to master
> - Better
> - Inter
> - Cleanup
> - Cleanup
> - Merge master
> - D'oh
> - ... and 42 more: https://git.openjdk.org/jdk/compare/b2f97131...49d52d82
> > Also, I believe there are some kinds of event that should never be missed, even when subsampling profile counters in this way. I'd like people to advise me which events these are.
>
> One other thing that comes to mind: the initial swing from `0` -> `1` for a type counter is important, since `0` means "never seen the type at all", and `>0` means "maybe the type is present, however rare". I suspect that subsampling a small count down to `0` would cause performance anomalies, especially if, say, such an anomaly causes a deopt-reprofile-recompile cycle. It would hurt doubly if _reprofiling_ missed the type _again_. This is probably hard to do with an RNG, but maybe we should seed the initial counter on installation without consulting the RNG. I don't think the current patch does that, but maybe I am looking in the wrong place. It would be fairly trivial to do after #25305.
OK, all useful thoughts. I'll have a look.
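The suggestion above, seeding a newly installed type row deterministically and subsampling only later events, could look roughly like the following. This is a hypothetical sketch: `TypeProfile`, `record`, the two-row layout, `XorShift32` and `kRatio` are invented for illustration and do not reflect HotSpot's profile data layout or the patch's code.

```cpp
#include <array>
#include <cstdint>

// Hypothetical per-thread RNG (xorshift32); not the patch's generator.
struct XorShift32 {
  uint32_t state;
  explicit XorShift32(uint32_t seed) : state(seed ? seed : 1u) {}
  uint32_t next() {
    state ^= state << 13;
    state ^= state >> 17;
    state ^= state << 5;
    return state;
  }
};

// Stand-in for -XX:ProfileCaptureRatio; assumed to be a power of two.
constexpr uint32_t kRatio = 64;

// Hypothetical two-row receiver-type profile.
struct TypeProfile {
  static constexpr int kRows = 2;
  std::array<uintptr_t, kRows> type{};  // 0 marks an empty row
  std::array<uint64_t, kRows> count{};
  uint64_t overflow = 0;                // sampled events for types with no free row

  static bool sampled(XorShift32& rng) { return (rng.next() & (kRatio - 1)) == 0; }

  void record(uintptr_t klass, XorShift32& rng) {
    for (int i = 0; i < kRows; i++) {
      if (type[i] == klass) {
        // Type already installed: subsample, adding kRatio on a hit.
        if (sampled(rng)) count[i] += kRatio;
        return;
      }
    }
    for (int i = 0; i < kRows; i++) {
      if (type[i] == 0) {
        // First sighting: install and seed without consulting the RNG,
        // so a type that was seen can never read back as 0.
        type[i] = klass;
        count[i] = 1;
        return;
      }
    }
    if (sampled(rng)) overflow += kRatio;
  }
};
```

Seeding with 1 rather than kRatio keeps the estimate unbiased: the first event contributes exactly 1, and each later event contributes kRatio with probability 1/kRatio, i.e. 1 in expectation.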
-------------
PR Comment: https://git.openjdk.org/jdk/pull/28541#issuecomment-3586932951
More information about the hotspot-dev mailing list