RFR: 8373613: PEXT/PDEP intrinsics cause performance regression on AMD pre-Zen 3 CPUs [v4]
Alessandro Autiero
duke at openjdk.org
Wed Feb 25 21:46:27 UTC 2026
> The `Integer/Long.compress()` and `Integer/Long.expand()` intrinsics (added in JDK 19 by JDK-8283893) unconditionally use BMI2 PEXT/PDEP instructions on all BMI2-capable x86 CPUs. However, AMD processors before Zen 3 (Family < 0x19) and Zhaoxin CPUs implement PEXT/PDEP via microcode with ~18-cycle latency, making the intrinsified path 1.4-2.4x slower than the Java software fallback.
>
> This patch introduces a `CPU_FAST_BMI2` feature flag that distinguishes CPUs with native hardware PEXT/PDEP support (Intel Haswell+, AMD Zen 3+) from those with slow microcoded implementations. The `Op_CompressBits`/`Op_ExpandBits` match rules in `x86.ad` are gated on this new flag instead of the general `supports_bmi2()`, so CPUs without fast hardware support fall back to the Java implementation. C2 IR tests are updated to use the new `fast_bmi2` CPU feature predicate.
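
For readers unfamiliar with the operations being gated, the following is a minimal sketch of what `compress` (PEXT) and `expand` (PDEP) compute for 32-bit values. It is a plain bit-by-bit reference written for illustration only: the class name `CompressExpandSketch` and its methods are hypothetical, and this is not the JDK's own fallback code nor anything taken from the patch. The sample values are the ones used in the `Integer.compress`/`Integer.expand` Javadoc.

```java
// Illustrative reference only: a simple loop, not the JDK's software fallback.
final class CompressExpandSketch {

    // PEXT semantics: gather the bits of value selected by mask
    // and pack them contiguously into the low-order bits of the result.
    static int compress(int value, int mask) {
        int result = 0;
        int outBit = 0; // next free low-order position in the result
        for (int bit = 0; bit < Integer.SIZE; bit++) {
            if (((mask >>> bit) & 1) != 0) {
                result |= ((value >>> bit) & 1) << outBit;
                outBit++;
            }
        }
        return result;
    }

    // PDEP semantics: scatter the low-order bits of value
    // into the positions where mask has a 1 bit.
    static int expand(int value, int mask) {
        int result = 0;
        int inBit = 0; // next low-order bit of value to consume
        for (int bit = 0; bit < Integer.SIZE; bit++) {
            if (((mask >>> bit) & 1) != 0) {
                result |= ((value >>> inBit) & 1) << bit;
                inBit++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints "cabab"
        System.out.println(Integer.toHexString(compress(0xCAFEBABE, 0xFF00FFF0)));
        // prints "ca00bab0"
        System.out.println(Integer.toHexString(expand(0x000CABAB, 0xFF00FFF0)));
    }
}
```

On a JDK 19 or later runtime, the same inputs passed to `Integer.compress` and `Integer.expand` should give the same results whether or not the intrinsic is used. The patch only changes which code path C2 emits, not the values returned.
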

Alessandro Autiero has updated the pull request incrementally with one additional commit since the last revision:

  added fast_bmi2 to verifiedCPUFeatures in ApplicableIRRulesPrinter

-------------
Changes:
- all: https://git.openjdk.org/jdk/pull/29809/files
- new: https://git.openjdk.org/jdk/pull/29809/files/64748dc1..fe1f7cb8
Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=29809&range=03
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=29809&range=02-03
Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod
Patch: https://git.openjdk.org/jdk/pull/29809.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/29809/head:pull/29809
PR: https://git.openjdk.org/jdk/pull/29809