RFR: 8365205: C2: Optimize popcount value computation using knownbits

Thu Sep 25 16:01:05 UTC 2025

On Thu, 4 Sep 2025 06:26:36 GMT, Hannes Greule <hgreule at openjdk.org> wrote:

>> This patch optimizes PopCount value transforms using KnownBits information.
>> Following are the results of the micro-benchmark included with the patch
>> 
>> 
>> 
>> System: 13th Gen Intel(R) Core(TM) i3-1315U
>> 
>> Baseline:
>> Benchmark                                      Mode  Cnt       Score   Error  Units
>> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  215460.670          ops/s
>> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  294014.826          ops/s
>> 
>> Withopt:
>> Benchmark                                      Mode  Cnt       Score   Error  Units
>> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  389978.082          ops/s
>> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  417261.583          ops/s
>> 
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>
> The change looks good, but I wonder:
> 
> - if it makes sense to have some kind of IR tests (i.e., it's folded away when unneeded, when the input is a constant, ...)?
> - whether the explanation could be simplified: Assuming a correct implementation of the KnownBits canonicalization, we can argue
> 	- `_zeroes` has the bits set that are known to be always 0. So `BitsPer<Type> - popCount(x)` gives you an upper limit of how many bits *might* be 1. And `BitsPer<Type> - popCount(_zeroes)` is equivalent to `popCount(~_zeroes)`.
> 	- `_ones` has the bits set that are known to be always 1. Trivially, `popCount(_ones)` is a valid lower bound.
> 	- The rest repeats how `adjust_bits_from_unsigned_bounds` works, but that's not specific to the popcount nodes.

Hi @SirYwell , @chhagedorn , @eme64 , I have addressed your comments. Let me know if this is good to land in.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27075#issuecomment-3334870778