RFR: 8373613: PEXT/PDEP intrinsics cause performance regression on AMD pre-Zen 3 CPUs
Jasmine Karthikeyan
jkarthikeyan at openjdk.org
Wed Feb 25 11:02:25 UTC 2026
On Tue, 24 Feb 2026 13:08:18 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>> The `Integer/Long.compress()` and `Integer/Long.expand()` intrinsics (added in JDK 19 by JDK-8283893) unconditionally use BMI2 PEXT/PDEP instructions on all BMI2-capable x86 CPUs. However, AMD processors before Zen 3 (Family < 0x19) and Zhaoxin CPUs implement PEXT/PDEP via microcode with ~18-cycle latency, making the intrinsified path 1.4-2.4x slower than the Java software fallback.
>>
>> This patch introduces a `CPU_FAST_BMI2` feature flag that distinguishes CPUs with native hardware PEXT/PDEP support (Intel Haswell+, AMD Zen 3+) from those with slow microcoded implementations. The `Op_CompressBits`/`Op_ExpandBits` match rules in `x86.ad` are gated on this new flag instead of the general `supports_bmi2()`, so CPUs without fast hardware support fall back to the Java implementation. C2 IR tests are updated to use the new `fast_bmi2` CPU feature predicate.
>
> @Auties00 Is there a benchmark integrated that shows the performance impact of the regression and this patch?
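The gating rule in the quoted description can be modeled roughly as below. This is a hypothetical Java sketch: the class, the `Vendor` enum and `hasFastBmi2` are illustrative names, not HotSpot code (HotSpot does its CPU feature detection in C++ from CPUID). It assumes Intel parts with BMI2 are fast, AMD is fast from family 0x19 (Zen 3) onward, Zhaoxin is slow, and any other vendor is treated conservatively as slow.

```java
// Hypothetical model of the "fast BMI2" rule from the PR description; not HotSpot's implementation.
final class FastBmi2Check {
    enum Vendor { INTEL, AMD, ZHAOXIN, OTHER }

    /**
     * Returns true if PEXT/PDEP are assumed to execute natively rather than in microcode.
     * Assumption: vendors not named in the PR description are treated as slow.
     */
    static boolean hasFastBmi2(Vendor vendor, int cpuFamily, boolean supportsBmi2) {
        if (!supportsBmi2) {
            return false; // no PEXT/PDEP at all, so nothing to intrinsify
        }
        return switch (vendor) {
            case INTEL -> true;                 // BMI2-capable Intel parts (Haswell+)
            case AMD -> cpuFamily >= 0x19;      // Zen 3 and later; Zen/Zen+/Zen 2 are family 0x17
            default -> false;                   // Zhaoxin (microcoded) and unknown vendors
        };
    }

    public static void main(String[] args) {
        System.out.println(hasFastBmi2(Vendor.AMD, 0x17, true));   // Zen 2: false
        System.out.println(hasFastBmi2(Vendor.AMD, 0x19, true));   // Zen 3: true
        System.out.println(hasFastBmi2(Vendor.INTEL, 6, true));    // true
        System.out.println(hasFastBmi2(Vendor.ZHAOXIN, 0, true));  // family ignored here: false
    }
}
```

In this model the match rules would then consult `hasFastBmi2(...)` instead of plain BMI2 support, so slow parts fall back to the Java code.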
@eme64 It looks like the existing `java/lang/Integers.java` and `java/lang/Longs.java` microbenchmarks already include compress tests. Running them on a Zen 2 machine gives me these results:
Benchmark          (size)  Mode  Cnt  Baseline (us/op)  Patch (us/op)  Improvement
Integers.compress     500  avgt   15  7.499 ± 0.011     0.755 ± 0.001        9.93x
Longs.compress        500  avgt   15  9.011 ± 0.014     0.851 ± 0.001       10.58x
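For context on what these benchmarks measure: `Integer.compress(i, mask)` gathers the bits of `i` selected by `mask` into the low-order bits of the result (what PEXT does), and `Integer.expand(i, mask)` scatters low-order bits back to the masked positions (what PDEP does). The sketch below is a deliberately naive bit-by-bit reference with a hypothetical class name, cross-checked against the JDK methods; it is not the software fallback the JDK actually uses, and running `main` needs JDK 19 or later.

```java
import java.util.Random;

// Naive reference semantics for compress (PEXT-like) and expand (PDEP-like); illustrative only.
final class CompressExpandReference {
    /** Gathers the bits of i selected by mask into the low-order bits of the result. */
    static int compressRef(int i, int mask) {
        int result = 0;
        int outPos = 0;
        for (int bit = 0; bit < 32; bit++) {
            if ((mask & (1 << bit)) != 0) {
                if ((i & (1 << bit)) != 0) {
                    result |= 1 << outPos;
                }
                outPos++; // next selected bit goes one position higher
            }
        }
        return result;
    }

    /** Scatters the low-order bits of i to the positions selected by mask. */
    static int expandRef(int i, int mask) {
        int result = 0;
        int inPos = 0;
        for (int bit = 0; bit < 32; bit++) {
            if ((mask & (1 << bit)) != 0) {
                if ((i & (1 << inPos)) != 0) {
                    result |= 1 << bit;
                }
                inPos++; // consume the next low-order input bit
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        for (int n = 0; n < 100_000; n++) {
            int i = rnd.nextInt();
            int mask = rnd.nextInt();
            if (compressRef(i, mask) != Integer.compress(i, mask)
                    || expandRef(i, mask) != Integer.expand(i, mask)) {
                throw new AssertionError("mismatch for i=" + i + " mask=" + mask);
            }
        }
        System.out.println(Integer.toHexString(compressRef(0xCAFEBABE, 0xFF00FFF0))); // cabab
    }
}
```

On a CPU where the intrinsic is enabled, `Integer.compress` compiles down to a single PEXT, which is exactly why a microcoded PEXT hurts so much on pre-Zen 3 parts.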
-------------
PR Comment: https://git.openjdk.org/jdk/pull/29809#issuecomment-3956664287