RFR: 8357258: x86: Improve receiver type profiling reliability [v3]
Aleksey Shipilev
shade at openjdk.org
Wed Nov 26 11:59:04 UTC 2025
On Sat, 22 Nov 2025 21:21:49 GMT, John R Rose <jrose at openjdk.org> wrote:
>> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits:
>>
>> - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls
>> - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls
>> - Drop atomic counters
>> - Initial version
>
> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 4760:
>
>> 4758: }
>> 4759:
>> 4760: void MacroAssembler::type_profile(Register recv, Register mdp, int mdp_offset) {
>
> The name chosen is subtly misleading. We have value (argument/parameter/return) profiling as well as receiver profiling. Since this particular macro-instruction is closely coupled to `ReceiverTypeData`, I suggest calling it `profile_receiver_type`, and documenting, up top, that it is precisely for collecting data into that structure.
>
> The name being replaced (`record_klass_in_profile_helper`) has the same problem. This is a historical artifact; the name was chosen before other sorts of type profiles were introduced.
>
> (And `profile_receiver_type` is surely better than `receiver_type_profile`, which is not a verb phrase.)
>
> Eventually we may wish to improve the other kinds of profiling, which have their own structures and representations. I thought for a while about what that might look like, and particularly if it factored into a different set of macro-instructions. Could we factor this proposed macro into a "find entry" part and an "increment counter" part? But no, it doesn't seem to pay off. There's benefit to preserving the jewel-like conciseness of the code pattern here. So I guess future work on other type profiles is mostly independent.
>
> But we do need a more specific name, that makes very clear the coupling to `ReceiverTypeData`. Even if the old code had that problem also. Putting it way out here in the macro-assembler makes such a problem worse, since the interpreter "knows about" MDOs, but the macro-assembler doesn't.
>
> I don't object to moving this down to the macro-assembler. It is no longer coupled to the interpreter, after the JIT learned the same trick. I think we should prepare ourselves, mentally, for similar moves with the other type profile mechanisms.
>
> I think the definition of `class ReceiverTypeData` should mention this macro. Otherwise we won't know where to look for updates (since it's no longer bundled with the interpreter). This macro is, in effect, a member of that class. (That's true of other MDO structures: Random assembly code is part of their APIs. The C++ code is very vague about how and where this happens. That's a problem for another time, I guess.)
>
> Another point. I would like to see pseudo-code that sketches what this complicated macro emits. (I was the author of the other pseudo-code deleted by this patch; I like that sort of thing.) I sugge...
Renamed to `profile_receiver_type`, added some comments.
My gripe with verbose comments that live outside the code is that they get out of sync pretty often. So I opted for a somewhat more generic version of the comments and inlined them near the code in question.
> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 4812:
>
>> 4810:
>> 4811: // Optimistic: search for already set up receiver.
>> 4812: movptr(offset, base_receiver_offset);
>
> I wondered about using REP-CMPSQ to search the receiver array. It would require reformatting the MDO to make the receiver klasses contiguous. The x86 manual ORM (August 2023) cheers me down:
>
>> Using a REP prefix with string move instructions can provide high performance in the situations described above. However, using a REP prefix with string scan instructions (SCASB, SCASW, SCASD, SCASQ) or compare instructions (CMPSB, CMPSW, CMPSD, CMPSQ) is not recommended for high performance. Consider using SIMD instructions instead.
>
> I still wonder if, at some point, it will be profitable to make the receivers contiguous so we can use SIMD instructions to search them. Probably not any time soon.
I would say we cross that bridge when we come to it. I think it would only be useful if we bump `TypeProfileWidth` beyond `2` for C2 configurations. Otherwise, having a very dense loop looks more profitable. We should also wait to see what comes out of scalable compiler counters before we make any other moves in this area.
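For readers following the thread, the logic under discussion can be modeled roughly as below. This is a simplified C++ sketch, not the assembly that `profile_receiver_type` emits. The names `ReceiverProfileModel`, `Row`, `overflow_count`, `KlassId` and `profile_receiver_type_model` are hypothetical and are not HotSpot identifiers. The sketch assumes interleaved (receiver, count) rows, plain non-atomic counters, and a compare-and-swap to claim an empty row; those are assumptions of this illustration, not statements about the patch.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a Klass*; not HotSpot's type.
using KlassId = const void*;

// Simplified model: Width interleaved (receiver, count) rows, plus one
// counter for receivers that did not fit into any row.
template <std::size_t Width>
struct ReceiverProfileModel {
  struct Row {
    std::atomic<KlassId> receiver{nullptr};
    std::uint64_t count = 0;  // plain counter: racy updates may be lost
  };
  Row rows[Width];
  std::uint64_t overflow_count = 0;
};

template <std::size_t Width>
void profile_receiver_type_model(ReceiverProfileModel<Width>& p, KlassId recv) {
  assert(recv != nullptr);

  // Optimistic pass: search for an already installed receiver.
  for (auto& row : p.rows) {
    if (row.receiver.load(std::memory_order_relaxed) == recv) {
      row.count++;
      return;
    }
  }

  // Slow pass: try to claim the first empty row for this receiver.
  for (auto& row : p.rows) {
    KlassId expected = nullptr;
    if (row.receiver.compare_exchange_strong(expected, recv,
                                             std::memory_order_relaxed)) {
      row.count++;
      return;
    }
    // The CAS failed, so `expected` now holds the row's current receiver.
    // If another thread installed the same receiver, count it here.
    if (expected == recv) {
      row.count++;
      return;
    }
  }

  // No matching or free row: record the call as polymorphic overflow.
  p.overflow_count++;
}
```

With `Width` set to 2 and receivers A, A, B, C profiled in that order, the model ends with A counted twice in the first row, B counted once in the second row, and one overflow. With only two rows, both passes are very short loops over interleaved rows, which is the "dense loop" case; a contiguous receiver array would only start to matter for a SIMD search at larger widths.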
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/25305#discussion_r2564696630
PR Review Comment: https://git.openjdk.org/jdk/pull/25305#discussion_r2564703946