RFR: 8372701: Randomized profile counters [v2]

Tobias Hartmann thartmann at openjdk.org
Wed Dec 3 11:54:02 UTC 2025


On Thu, 27 Nov 2025 17:18:35 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> Please use [this link](https://github.com/openjdk/jdk/pull/28541/files?w=1) to view the files changed.
>> 
>> Profile counters scale very badly.
>> 
>> The overhead for profiled code isn't too bad with one thread, but as the thread count increases, things go wrong very quickly.
>> 
>> For example, here's a benchmark from the OpenJDK test suite, run at TieredLevel 3 with one thread, then three threads:
>> 
>> 
>> Benchmark                        (randomized)  Mode  Cnt    Score   Error  Units
>> InterfaceCalls.test2ndInt5Types         false  avgt    4   27.468 ± 2.631  ns/op
>> InterfaceCalls.test2ndInt5Types         false  avgt    4  240.010 ± 6.329  ns/op
>> 
>> 
>> This slowdown is caused by high memory contention on the profile counters. Not only is this slow, but it can also lose profile counts.
>> 
>> This patch is for C1 only. It'd be easy to randomize interpreter counters as well in another PR, if anyone thinks it's worth doing.
>> 
>> One other thing to note is that randomized profile counters degrade very badly with small decimation ratios. For example, running a single thread with a ratio of 2 (`-XX:ProfileCaptureRatio=2`) results in
>> 
>> 
>> Benchmark                        (randomized)  Mode  Cnt   Score   Error  Units
>> InterfaceCalls.test2ndInt5Types         false  avgt    4  80.147 ± 9.991  ns/op
>> 
>> 
>> The problem is that, with so small a ratio, the sampling branch is taken about half the time at random, so branch prediction accuracy drops sharply and mispredictions dominate. It only really makes sense to use higher decimation ratios, e.g. 64.
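
For anyone following along, here is a rough sketch of the decimation idea as I read the description above. It is not the code from the PR: the names `SampledCounter`, `xorshift32`, `profile_event` and `simulate` are hypothetical, the random number generator is just a convenient stand-in, and the assumption that a sampled event adds `ratio` to the counter (so the expected total is unchanged) is mine.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for one profile counter cell.
struct SampledCounter {
    uint64_t count = 0;
};

// Marsaglia xorshift32 step; the state must be non-zero.
// Assumed here only as a cheap per-thread random source.
static inline uint32_t xorshift32(uint32_t& state) {
    state ^= state << 13;
    state ^= state >> 17;
    state ^= state << 5;
    return state;
}

// Record one profiled event with decimation.
// 'ratio' must be a power of two. Roughly one event in 'ratio' touches
// the shared counter, and that event adds 'ratio' so that the expected
// value of the counter still matches the true event count.
static inline void profile_event(SampledCounter& c, uint32_t& rng, uint32_t ratio) {
    if ((xorshift32(rng) & (ratio - 1)) == 0) {
        c.count += ratio;  // the only write to the (possibly contended) counter
    }
}

// Run 'events' profiled events through one counter and return its value.
uint64_t simulate(uint64_t events, uint32_t ratio, uint32_t seed) {
    SampledCounter c;
    uint32_t rng = seed;  // per-thread state in a real setting (assumption)
    for (uint64_t i = 0; i < events; i++) {
        profile_event(c, rng, ratio);
    }
    return c.count;
}
```

With `ratio == 1` the mask is zero, so every event is counted exactly. With a ratio of 2 the `if` goes either way with equal probability, which is the hard-to-predict case described above; at 64 the branch is almost always not taken.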
>
> Andrew Haley has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 52 commits:
> 
>  - Merge remote-tracking branch 'refs/remotes/origin/JDK-8134940' into JDK-8134940
>  - whitespace
>  - AArch64
>  - Minimize deltas to master
>  - Better
>  - Inter
>  - Cleanup
>  - Cleanup
>  - Merge master
>  - D'oh
>  - ... and 42 more: https://git.openjdk.org/jdk/compare/b2f97131...49d52d82

I can run our internal performance testing with this, but it currently fails to build on AArch64:

[2025-12-03T11:49:29,644Z] * For target hotspot_variant-server_libjvm_objs_c1_LIRAssembler_aarch64.o:
[2025-12-03T11:49:29,644Z] /System/Volumes/Data/mesos/work_dir/slaves/da1065b5-7b94-4f0d-85e9-a3a252b9a32e-S5842/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/a3ab2e9e-0898-4ceb-94ab-4f606db9de4d/runs/44169997-4fbe-4f98-98b9-d11781843c5e/workspace/open/src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp:2739:18: error: lambda capture 'op' is not used [-Werror,-Wunused-lambda-capture]
[2025-12-03T11:49:29,644Z]   auto lambda = [op, stub] (LIR_Assembler* ce, LIR_Op* base_op) {
[2025-12-03T11:49:29,644Z]                  ^~~
[2025-12-03T11:49:29,644Z] 1 error generated.
[2025-12-03T11:49:29,644Z] * For target hotspot_variant-server_libjvm_objs_static_c1_LIRAssembler_aarch64.o:
[2025-12-03T11:49:29,644Z] /System/Volumes/Data/mesos/work_dir/slaves/da1065b5-7b94-4f0d-85e9-a3a252b9a32e-S5842/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/a3ab2e9e-0898-4ceb-94ab-4f606db9de4d/runs/44169997-4fbe-4f98-98b9-d11781843c5e/workspace/open/src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp:2739:18: error: lambda capture 'op' is not used [-Werror,-Wunused-lambda-capture]
[2025-12-03T11:49:29,644Z]   auto lambda = [op, stub] (LIR_Assembler* ce, LIR_Op* base_op) {
[2025-12-03T11:49:29,644Z]                  ^~~
[2025-12-03T11:49:29,644Z] 1 error generated.
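
The usual way to silence this kind of warning is to capture only what the lambda body actually reads. A minimal sketch, using hypothetical stand-in types (`Op`, `Stub`, `Assembler`, `emit_with_stub`) rather than the real `LIR_Assembler` code; whether `op` should be dropped from the capture list or was meant to be used in the body is a call for the patch author:

```cpp
#include <cassert>

// Hypothetical stand-ins for the HotSpot types named in the log.
struct Op        { int id; };
struct Stub      { int entry; };
struct Assembler { int emitted = 0; };

// Shape that triggers clang's -Wunused-lambda-capture, because 'op'
// is captured but never read inside the body:
//
//   auto lambda = [op, stub] (Assembler* ce, Op* base_op) { ... };

int emit_with_stub(Assembler* ce, Op* op, Stub* stub) {
    // Capture only 'stub'; the operation reaches the body as 'base_op'.
    auto lambda = [stub] (Assembler* a, Op* base_op) {
        a->emitted += base_op->id + stub->entry;
        return a->emitted;
    };
    return lambda(ce, op);
}
```

If the capture really is needed on other platforms or in debug-only code, the alternative is to reference `op` in the body instead of removing it.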

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28541#issuecomment-3606493180


More information about the hotspot-dev mailing list