RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v7]

Thu Aug 7 04:12:19 UTC 2025

On Thu, 7 Aug 2025 02:13:16 GMT, Qizheng Xing <qxing at openjdk.org> wrote:

>> src/hotspot/share/opto/countbitsnode.cpp line 89:
>> 
>>> 87:   return TypeInt::make(count_leading_zeros_long(~tl->_bits._zeros),
>>> 88:                        count_leading_zeros_long(tl->_bits._ones),
>>> 89:                        tl->_widen);
>> 
>> Hi @MaxXSoft, Taking the liberty to add some comments related to the proof of the above assumptions, hope you won't mind :-)
>> 
>> **Z3 proof for [KnownBits.ONES, ~KnownBits.ZEROS] >= [LB, UB] where >= is a superset relation.**
>> 
>> 
>> from z3 import *
>> from functools import reduce
>> 
>> # 64-bit symbolic unsigned integers
>> UB = BitVec('UB', 64)
>> LB = BitVec('LB', 64)
>> 
>> # XOR for detecting differing bits
>> xor_val = UB ^ LB
>> 
>> # COUNT_LEADING_ZEROS
>> def count_leading_zeros(x):
>>     """Returns number of leading zeros in a 64-bit word."""
>>     clauses = []
>>     for i in range(64):
>>         bit_is_one = LShR(x, 63 - i) & 1 == 1
>>         bits_before_zero = And([LShR(x, 63 - j) & 1 == 0 for j in range(i)])
>>         clauses.append(If(And(bits_before_zero, bit_is_one), BitVecVal(i, 64), BitVecVal(64, 64)))
>>     return reduce(lambda a, b: If(a == 64, b, a), clauses)
>> 
>> # Step 1: Compute common prefix length
>> CPL = count_leading_zeros(xor_val)
>> 
>> # Step 2: COMMON_PREFIX_MASK = ((1 << CPL) - 1) << (64 - CPL)
>> one_shifted = (BitVecVal(1, 64) << CPL)
>> mask = one_shifted - 1
>> COMMON_PREFIX_MASK = mask << (64 - CPL)
>> 
>> # Step 3: COMMON_PREFIX
>> COMMON_PREFIX = UB & COMMON_PREFIX_MASK
>> 
>> # Step 4: ZEROS and ONES
>> ZEROS = COMMON_PREFIX_MASK & (~COMMON_PREFIX)
>> ONES = COMMON_PREFIX
>> 
>> # Step 5: Prove that [ONES, ~ZEROS] ⊇ [LB, UB]
>> prop = And(ULE(ONES, LB), ULE(UB, ~ZEROS))
>> 
>> # Step 6: Try to disprove (i.e., check if any UB, LB violates the above)
>> s = Solver()
>> s.add(Not(prop))  # Look for counterexamples
>> 
>> # Check the result
>> if s.check() == sat:
>>     m = s.model()
>>     ub_val = m.eval(UB).as_long()
>>     lb_val = m.eval(LB).as_long()
>>     ones_val = m.eval(ONES).as_long()
>>     zeros_val = m.eval(ZEROS).as_long()
>>     not_zeros_val = (~zeros_val) & 0xFFFFFFFFFFFFFFFF
>> 
>>     print("❌ Property does NOT hold. Counterexample found:")
>>     print(f"LB     = {lb_val:#018x}")
>>     print(f"UB     = {ub_val:#018x}")
>>     print(f"ONES   = {ones_val:#018x}")
>>     print(f"~ZEROS = {not_zeros_val:#018x}")
>>     print(f"ONES <= LB? {ones_val <= lb_val}")
>>     print(f"UB <= ~ZEROS? {ub_val <= not_zeros_val}")
>> else:
>>     print("✅ Property holds: [ONES, ~ZEROS] always covers [LB, UB] (UNS...
>
> Thanks for the addition! Very nice and solid proof.

Hi @MaxXSoft, Constant folding optimizations like these qualify for addition of a new benchmark. Please add one; I see a significant gain with following micro kernel on my Intel 13th Gen Intel(R) Core(TM) i3-1315U laptop.

    public static long micro(long param) {
        long constrained_param = Math.min(175, Math.max(param, 160));
        return Long.numberOfLeadingZeros(constrained_param);
    }

PROMPT>#baseline
PROMPT>java -Xbatch -XX:-TieredCompilation -cp . test_clz
[time] 17ms [res] 11200000

PROMPT>#withopt
PROMPT>java -Xbatch -XX:-TieredCompilation -cp . test_clz
[time] 5ms [res] 11200000

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2258958363