RFR: 8357258: x86: Improve receiver type profiling reliability [v8]
Vladimir Ivanov
vlivanov at openjdk.org
Thu Dec 4 19:17:29 UTC 2025
On Tue, 2 Dec 2025 10:31:22 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:
>> See the bug for a discussion of the issues the current machinery has.
>>
>> This PR executes the plan outlined in the bug:
>> 1. Common the receiver type profiling code in interpreter and C1
>> 2. Rewrite receiver type profiling code to only do atomic receiver slot installations
>> 3. Trim `C1OptimizeVirtualCallProfiling` to only claim slots when receiver is installed
>>
>> This PR does _not_ do atomic counter updates themselves, as it may have much wider performance implications, including regressions. This PR should be at least performance neutral.
>>
>> Additional testing:
>> - [x] Linux x86_64 server fastdebug, `compiler/`
>> - [x] Linux x86_64 server fastdebug, `all`
>
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits:
>
> - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls
> - More comments
> - Tighten up the comments
> - Simplify third case: no need to loop, just restart the search
> - Actually have a second "fast" case: receiver is not found in the table, and the table is full
> - Pushing/popping for rare CAS path is counter-productive
> - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls
> - Tighten up some more
> - Offset is always rscratch1, no need to save it
> - Grossly simplify register shuffling
> - ... and 11 more: https://git.openjdk.org/jdk/compare/7278d2e8...3c5019d9
Overall, looks good to me. Nice work, Aleksey!
I'm curious how performance-sensitive that part of the code is. Does it make sense to try to optimize it further?
For example:
- 2 slots is the most common case; is there any benefit from optimizing specifically for it (e.g., unrolling the loops)?
- the fast path can be further optimized for the no-nulls case by offloading more work onto the found_null slow path [1]
[1]
// Fastest: receiver is already installed
int i = 0;
for (; i < receiver_count(); i++) {
  if (receiver(i) == recv) goto found_recv(i);
  if (receiver(i) == null) goto found_null(i);
}
goto polymorphic

// Slow: try to install receiver
found_null(i):
  // Finish the search
  for (int j = i; j < receiver_count(); j++) {
    if (receiver(j) == recv) goto found_recv(j);
  }
  CAS(&receiver(i), null, recv);
  goto restart
...
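
To make the control flow of [1] concrete, here is a rough Java model of the same idea. It is only a sketch, not the assembly the PR emits: the class `ReceiverProfileSketch`, its methods, and the use of `AtomicReferenceArray` are hypothetical stand-ins for the profile rows, and it assumes slots are never cleared once claimed. Counters are plain (non-atomic) increments, matching the PR's stated scope; only the slot installation goes through a CAS.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical model of a receiver type profile row; not HotSpot code.
final class ReceiverProfileSketch {
    private final AtomicReferenceArray<Class<?>> receivers;
    private final long[] counts;   // deliberately non-atomic counters
    private long polymorphic;      // bumped when the table is full and recv is absent

    ReceiverProfileSketch(int slots) {
        receivers = new AtomicReferenceArray<>(slots);
        counts = new long[slots];
    }

    void record(Class<?> recv) {
        while (true) {                      // "restart" label
            int n = receivers.length();
            int firstNull = -1;
            // Fast path: stop at the first match or the first empty slot.
            for (int i = 0; i < n; i++) {
                Class<?> r = receivers.get(i);
                if (r == recv) { counts[i]++; return; }      // found_recv(i)
                if (r == null) { firstNull = i; break; }     // found_null(i)
            }
            if (firstNull < 0) { polymorphic++; return; }    // table full, no match
            // Slow path: finish the search before claiming the empty slot.
            for (int j = firstNull; j < n; j++) {
                if (receivers.get(j) == recv) { counts[j]++; return; }
            }
            // Claim the slot atomically, then restart whether or not the CAS won:
            // either recv is now installed, or another thread filled the slot.
            receivers.compareAndSet(firstNull, null, recv);
        }
    }

    Class<?> receiver(int i) { return receivers.get(i); }
    long count(int i) { return counts[i]; }
    long polymorphicCount() { return polymorphic; }
}
```

With two slots, recording `String`, `String`, `Integer`, `Long` in that order should leave `String` in slot 0 with count 2, `Integer` in slot 1 with count 1, and one polymorphic hit. The restart after the CAS terminates under the no-clearing assumption because each failed CAS means some slot became non-null.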
-------------
PR Comment: https://git.openjdk.org/jdk/pull/25305#issuecomment-3613949570