RFR: 8357258: x86: Improve receiver type profiling reliability [v8]

Vladimir Ivanov vlivanov at openjdk.org
Thu Dec 4 19:17:29 UTC 2025


On Tue, 2 Dec 2025 10:31:22 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> See the bug for a discussion of the issues the current machinery has.
>> 
>> This PR executes the plan outlined in the bug:
>>  1. Common the receiver type profiling code in interpreter and C1
>>  2. Rewrite receiver type profiling code to only do atomic receiver slot installations
>>  3. Trim `C1OptimizeVirtualCallProfiling` to only claim slots when receiver is installed 
>> 
>> This PR does _not_ make the counter updates themselves atomic, as that may have much wider performance implications, including regressions. This PR should be at least performance-neutral.
>> 
>> Additional testing:
>>   - [x] Linux x86_64 server fastdebug, `compiler/`
>>   - [x] Linux x86_64 server fastdebug, `all`
>
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits:
> 
>  - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls
>  - More comments
>  - Tighten up the comments
>  - Simplify third case: no need to loop, just restart the search
>  - Actually have a second "fast" case: receiver is not found in the table, and the table is full
>  - Pushing/popping for rare CAS path is counter-productive
>  - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls
>  - Tighten up some more
>  - Offset is always rscratch1, no need to save it
>  - Grossly simplify register shuffling
>  - ... and 11 more: https://git.openjdk.org/jdk/compare/7278d2e8...3c5019d9

Overall, looks good to me. Nice work, Aleksey!

I'm curious how performance-sensitive that part of the code is. Does it make sense to try to optimize it further?

For example:
  - Two slots is the most common case; is there any benefit from optimizing specifically for it (e.g., unrolling the loops)?
  - The fast path can be further optimized for the no-nulls case by offloading more work to the `found_null` slow path [1]

[1]

  restart:
    // Fastest: receiver is already installed
    int i = 0;
    for (; i < receiver_count(); i++) {
      if (receiver(i) == recv) goto found_recv(i);
      if (receiver(i) == null) goto found_null(i);
    }

    // Table is full and the receiver is not in it
    goto polymorphic;

    // Slow: try to install receiver
  found_null(i):
    // Finish the search: the receiver may already sit in a later slot
    for (int j = i; j < receiver_count(); j++) {
      if (receiver(j) == recv) goto found_recv(j);
    }
    // Not found: try to claim the empty slot, then restart the search
    CAS(&receiver(i), null, recv);
    goto restart;
...
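
For illustration only, the control flow in [1] can be written as a small standalone C++ sketch. This is not the HotSpot code, which emits assembly in the interpreter and C1. The names `ReceiverProfile`, `kReceiverCount`, and `profile_receiver` are hypothetical, the two-slot table size is an assumption taken from the "most common case" remark, and the labels `found_recv` and `polymorphic` are modeled here as simple counter increments.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a receiver type profile; not HotSpot's real layout.
constexpr std::size_t kReceiverCount = 2;  // assumption: two slots

struct ReceiverProfile {
  std::array<std::atomic<const void*>, kReceiverCount> receivers;  // nullptr == empty slot
  std::array<std::uint64_t, kReceiverCount> counts{};              // plain, non-atomic counters
  std::uint64_t polymorphic_count = 0;

  ReceiverProfile() {
    for (auto& r : receivers) r.store(nullptr);
  }
};

// Records one call with receiver `recv`.
// Returns the index of the slot whose counter was bumped,
// or -1 if the table was full and the polymorphic counter was bumped instead.
int profile_receiver(ReceiverProfile& p, const void* recv) {
  for (;;) {  // each iteration corresponds to "goto restart"
    // Fast path: one pass that stops at the first match or the first empty slot.
    std::size_t i = 0;
    bool found_null = false;
    for (; i < kReceiverCount; i++) {
      const void* cur = p.receivers[i].load();
      if (cur == recv) {  // found_recv(i)
        p.counts[i]++;
        return static_cast<int>(i);
      }
      if (cur == nullptr) {  // found_null(i)
        found_null = true;
        break;
      }
    }
    if (!found_null) {  // polymorphic: table is full, receiver not present
      p.polymorphic_count++;
      return -1;
    }

    // Slow path: finish the search before claiming the empty slot,
    // since the receiver may already be installed in a later slot.
    for (std::size_t j = i; j < kReceiverCount; j++) {
      if (p.receivers[j].load() == recv) {  // found_recv(j)
        p.counts[j]++;
        return static_cast<int>(j);
      }
    }

    // Try to install the receiver atomically. Whether the CAS wins or loses,
    // restart the search: the next pass sees the table as it now stands.
    const void* expected = nullptr;
    p.receivers[i].compare_exchange_strong(expected, recv);
  }
}
```

In this sketch slots only ever go from empty to occupied, so every restart either finds the receiver, finds the table full, or fills one more slot, and the outer loop runs a bounded number of times. The counter increments are left non-atomic on purpose, mirroring the PR description's statement that only slot installation is made atomic.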

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25305#issuecomment-3613949570

