RFR: 8365205: C2: Optimize popcount value computation using knownbits
Jatin Bhateja
jbhateja at openjdk.org
Thu Sep 25 16:01:05 UTC 2025
On Thu, 4 Sep 2025 06:26:36 GMT, Hannes Greule <hgreule at openjdk.org> wrote:
>> This patch optimizes PopCount value transforms using KnownBits information.
>> Following are the results of the micro-benchmark included with the patch
>>
>>
>>
>> System: 13th Gen Intel(R) Core(TM) i3-1315U
>>
>> Baseline:
>> Benchmark Mode Cnt Score Error Units
>> PopCountValueTransform.LogicFoldingKerenLong thrpt 2 215460.670 ops/s
>> PopCountValueTransform.LogicFoldingKerenlInt thrpt 2 294014.826 ops/s
>>
>> Withopt:
>> Benchmark Mode Cnt Score Error Units
>> PopCountValueTransform.LogicFoldingKerenLong thrpt 2 389978.082 ops/s
>> PopCountValueTransform.LogicFoldingKerenlInt thrpt 2 417261.583 ops/s
>>
>>
>> Kindly review and share your feedback.
>>
>> Best Regards,
>> Jatin
>
> The change looks good, but I wonder:
>
> - if it makes sense to have some kind of IR tests (i.e., it's folded away when unneeded, when the input is a constant, ...)?
> - whether the explanation could be simplified: Assuming a correct implementation of the KnownBits canonicalization, we can argue
> - `_zeroes` has the bits set that are known to be always 0. So `BitsPer<Type> - popCount(x)` gives you an upper limit of how many bits *might* be 1. And `BitsPer<Type> - popCount(_zeroes)` is equivalent to `popCount(~_zeroes)`.
> - `_ones` has the bits set that are known to be always 1. Trivially, `popCount(_ones)` is a valid lower bound.
> - The rest repeats how `adjust_bits_from_unsigned_bounds` works, but that's not specific to the popcount nodes.
Hi @SirYwell , @chhagedorn , @eme64 , I have addressed your comments. Let me know if this is good to land in.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/27075#issuecomment-3334870778
More information about the hotspot-compiler-dev
mailing list