RFR: 8357258: x86: Improve receiver type profiling reliability [v3]

Aleksey Shipilev shade at openjdk.org
Wed Nov 26 11:59:04 UTC 2025


On Sat, 22 Nov 2025 21:21:49 GMT, John R Rose <jrose at openjdk.org> wrote:

>> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits:
>> 
>>  - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls
>>  - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls
>>  - Drop atomic counters
>>  - Initial version
>
> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 4760:
> 
>> 4758: }
>> 4759: 
>> 4760: void MacroAssembler::type_profile(Register recv, Register mdp, int mdp_offset) {
> 
> The name chosen is subtly misleading.  We have value (argument/parameter/return) profiling as well as receiver profiling.  Since this particular macro-instruction is closely coupled to `ReceiverTypeData`, I suggest calling it `profile_receiver_type`, and documenting, up top, that it is precisely for collecting data into that structure.
> 
> The name being replaced (`record_klass_in_profile_helper`) has the same problem.  This is a historical artifact; the name was chosen before other sorts of type profiles were introduced.
> 
> (And `profile_receiver_type` is surely better than `receiver_type_profile`, which is not a verb phrase.)
> 
> Eventually we may wish to improve the other kinds of profiling, which have their own structures and representations.  I thought for a while about what that might look like, and particularly if it factored into a different set of macro-instructions.  Could we factor this proposed macro into a "find entry" part and an "increment counter" part?  But no, it doesn't seem to pay off.  There's benefit to preserving the jewel-like conciseness of the code pattern here.  So I guess future work on other type profiles is mostly independent.
> 
> But we do need a more specific name, that makes very clear the coupling to `ReceiverTypeData`.  Even if the old code had that problem also.  Putting it way out here in the macro-assembler makes such a problem worse, since the interpreter "knows about" MDOs, but the macro-assembler doesn't.
> 
> I don't object to moving this down to the macro-assembler.  It is no longer coupled to the interpreter, after the JIT learned the same trick.  I think we should prepare ourselves, mentally, for similar moves with the other type profile mechanisms.
> 
> I think the definition of `class ReceiverTypeData` should mention this macro.  Otherwise we won't know where to look for updates (since it's no longer bundled with the interpreter).  This macro is, in effect, a member of that class.   (That's true of other MDO structures:  Random assembly code is part of their APIs.  The C++ code is very vague about how and where this happens.  That's a problem for another time, I guess.)
> 
> Another point.  I would like to see pseudo-code that sketches what this complicated macro emits.  (I was the author of the other pseudo-code deleted by this patch; I like that sort of thing.)  I sugge...

Renamed to `profile_receiver_type`, added some comments.

My gripe with overly verbose comments outside the code is that they fall out of sync pretty often. So I opted for a somewhat more generic version of the comments and inlined them near the code in question.

> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 4812:
> 
>> 4810: 
>> 4811:   // Optimistic: search for already set up receiver.
>> 4812:   movptr(offset, base_receiver_offset);
> 
> I wondered about using REP-CMPSQ to search the receiver array.  It would require reformatting the MDO to make the receiver klasses contiguous.  The x86 manual ORM (August 2023) cheers me down:
> 
>> Using a REP prefix with string move instructions can provide high performance in the situations described above. However, using a REP prefix with string scan instructions (SCASB, SCASW, SCASD, SCASQ) or compare instructions (CMPSB, CMPSW, CMPSD, CMPSQ) is not recommended for high performance. Consider using SIMD instructions instead.
> 
> I still wonder if, at some point, it will be profitable to make the receivers contiguous so we can use SIMD instructions to search them.  Probably not any time soon.

I would say we cross that bridge when we come to it. I think a SIMD search would only be useful if we bump `TypeProfileWidth` beyond `2` for C2 configurations. Otherwise, a very dense loop looks more profitable. We should also see what comes out of the scalable compiler counters work before we make any other moves in this area.
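
For reference, the kind of receiver search discussed here can be modeled in plain C++. This is only a rough sketch, not the code the macro emits: `ReceiverRow`, `ReceiverProfile`, `kTypeProfileWidth`, and `profile_receiver_type_model` are hypothetical names, the row layout is simplified relative to the real `ReceiverTypeData`, and concurrent updates are ignored entirely.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified stand-in for the receiver rows of a type profile.
// The real MDO layout differs; every name here is illustrative only.
struct ReceiverRow {
  const void* klass = nullptr;  // receiver klass, nullptr while the row is free
  std::uint64_t count = 0;      // how many times this receiver was seen
};

// Assumed number of rows, mirroring a TypeProfileWidth of 2.
constexpr std::size_t kTypeProfileWidth = 2;

struct ReceiverProfile {
  std::array<ReceiverRow, kTypeProfileWidth> rows{};
  std::uint64_t polymorphic_count = 0;  // receivers that did not fit in any row
};

// Single-threaded model of a receiver type profile update.
inline void profile_receiver_type_model(ReceiverProfile& p, const void* recv) {
  // Optimistic: search for an already set up receiver.
  for (ReceiverRow& row : p.rows) {
    if (row.klass == recv) {
      row.count++;
      return;
    }
  }
  // Not found: claim the first free row, if there is one.
  for (ReceiverRow& row : p.rows) {
    if (row.klass == nullptr) {
      row.klass = recv;
      row.count = 1;
      return;
    }
  }
  // All rows are taken by other receivers: only bump the overflow counter.
  p.polymorphic_count++;
}
```

With only two rows, both loops are a handful of compares, which is why a dense scalar loop is hard to beat at this width.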

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25305#discussion_r2564696630
PR Review Comment: https://git.openjdk.org/jdk/pull/25305#discussion_r2564703946


More information about the hotspot-compiler-dev mailing list