RFR: 8367611: Enable vblendvp[sd] on Future ECore [v3]
Mohamed Issa
missa at openjdk.org
Tue Sep 23 19:11:50 UTC 2025
On Tue, 23 Sep 2025 19:06:49 GMT, Mohamed Issa <missa at openjdk.org> wrote:
>> The patch itself looks fine, but the way instruction emulation is implemented (as introduced by JDK-8320347) doesn't fit the design well. It does look appealing to abstract away the choice, but it becomes confusing when there's a need to reason about exact code shape. Another consequence is code duplication where both x86.ad and `MacroAssembler::vblendvp[sd]` provide their own emulated versions.
>>
>> I suggest to reshape it and extract emulated variant into separate methods leaving `Assembler::vblendvp[sd]` as is. Then, for convenience, `C2_MacroAssembler` can provide new methods which do CPU dispatching or just do dispatching explicitly at call sites in `c2_MacroAssembler_x86.cpp`. Also, it's better to move CPU sensing logic to `VM_Version`, so the checks and code emission logic can be unified across all use sites (in `c2_MacroAssembler_x86.cpp` and `x86.ad`).
>>
>> I'm fine with addressing it either as part of this PR or as a follow-up enhancement.
>
>> The patch itself looks fine, but the way instruction emulation is implemented (as introduced by JDK-8320347) doesn't fit the design well. It does look appealing to abstract away the choice, but it becomes confusing when there's a need to reason about exact code shape. Another consequence is code duplication where both x86.ad and `MacroAssembler::vblendvp[sd]` provide their own emulated versions.
>>
>> I suggest to reshape it and extract emulated variant into separate methods leaving `Assembler::vblendvp[sd]` as is. Then, for convenience, `C2_MacroAssembler` can provide new methods which do CPU dispatching or just do dispatching explicitly at call sites in `c2_MacroAssembler_x86.cpp`. Also, it's better to move CPU sensing logic to `VM_Version`, so the checks and code emission logic can be unified across all use sites (in `c2_MacroAssembler_x86.cpp` and `x86.ad`).
>>
>> I'm fine with addressing it either as part of this PR or as a follow-up enhancement.
>
> Thank you for the suggestion. I created [JDK-8368492](https://bugs.openjdk.org/browse/JDK-8368492) to track the enhancement.
> Hi @missa-prime , VBLENDVPS was always supported by AVX2 targets. We enabled its software emulation for performance reasons, as these needs microcode assistance on AVX2 targets.
> <img alt="image" width="1187" height="228" src="https://private-user-images.githubusercontent.com/59989778/492717717-a32c66bb-df82-4534-925a-1c75dd78bea7.png?jwt=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NTg2MTk2NjcsIm5iZiI6MTc1ODYxOTM2NywicGF0aCI6Ii81OTk4OTc3OC80OTI3MTc3MTctYTMyYzY2YmItZGY4Mi00NTM0LTkyNWEtMWM3NWRkNzhiZWE3LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTA5MjMlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwOTIzVDA5MjI0N1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTg5ZTI3N2EyMjAyNDE0ZTBiMGM0ODI0ZTM4NjcwMmFiNDVhNDlkYmViNjhiMTVmOTRmZWY2MmFmZmMwMjIyZGQmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.aiPMZsYuQMVbnY0DLqkoycN0SxmckTMUbdGckxNxdoQ">
>
> It seems your PR is lifting this limitation for Darkmont under a constraint on dst and src1 register equivalence?
>
> Please add appropriate comments in code and documentary reference to the relevant section of the Intel Architectural Set Extension Manual.
Currently, there is no public documentation mentioning the (dst == src1) register optimization on Darkmont. Once it's available, I'll add comments and reference it.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/27354#issuecomment-3325219482
More information about the hotspot-dev
mailing list