RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v7]
Jatin Bhateja
jbhateja at openjdk.org
Thu Aug 7 04:12:19 UTC 2025
On Thu, 7 Aug 2025 02:13:16 GMT, Qizheng Xing <qxing at openjdk.org> wrote:
>> src/hotspot/share/opto/countbitsnode.cpp line 89:
>>
>>> 87: return TypeInt::make(count_leading_zeros_long(~tl->_bits._zeros),
>>> 88: count_leading_zeros_long(tl->_bits._ones),
>>> 89: tl->_widen);
>>
>> Hi @MaxXSoft, Taking the liberty to add some comments related to the proof of the above assumptions, hope you won't mind :-)
>>
>> **Z3 proof for [KnownBits.ONES, ~KnownBits.ZEROS] >= [LB, UB] where >= is a superset relation.**
>>
>>
>> from z3 import *
>> from functools import reduce
>>
>> # 64-bit symbolic unsigned integers
>> UB = BitVec('UB', 64)
>> LB = BitVec('LB', 64)
>>
>> # XOR for detecting differing bits
>> xor_val = UB ^ LB
>>
>> # COUNT_LEADING_ZEROS
>> def count_leading_zeros(x):
>> """Returns number of leading zeros in a 64-bit word."""
>> clauses = []
>> for i in range(64):
>> bit_is_one = LShR(x, 63 - i) & 1 == 1
>> bits_before_zero = And([LShR(x, 63 - j) & 1 == 0 for j in range(i)])
>> clauses.append(If(And(bits_before_zero, bit_is_one), BitVecVal(i, 64), BitVecVal(64, 64)))
>> return reduce(lambda a, b: If(a == 64, b, a), clauses)
>>
>> # Step 1: Compute common prefix length
>> CPL = count_leading_zeros(xor_val)
>>
>> # Step 2: COMMON_PREFIX_MASK = ((1 << CPL) - 1) << (64 - CPL)
>> one_shifted = (BitVecVal(1, 64) << CPL)
>> mask = one_shifted - 1
>> COMMON_PREFIX_MASK = mask << (64 - CPL)
>>
>> # Step 3: COMMON_PREFIX
>> COMMON_PREFIX = UB & COMMON_PREFIX_MASK
>>
>> # Step 4: ZEROS and ONES
>> ZEROS = COMMON_PREFIX_MASK & (~COMMON_PREFIX)
>> ONES = COMMON_PREFIX
>>
>> # Step 5: Prove that [ONES, ~ZEROS] ⊇ [LB, UB]
>> prop = And(ULE(ONES, LB), ULE(UB, ~ZEROS))
>>
>> # Step 6: Try to disprove (i.e., check if any UB, LB violates the above)
>> s = Solver()
>> s.add(Not(prop)) # Look for counterexamples
>>
>> # Check the result
>> if s.check() == sat:
>> m = s.model()
>> ub_val = m.eval(UB).as_long()
>> lb_val = m.eval(LB).as_long()
>> ones_val = m.eval(ONES).as_long()
>> zeros_val = m.eval(ZEROS).as_long()
>> not_zeros_val = (~zeros_val) & 0xFFFFFFFFFFFFFFFF
>>
>> print("❌ Property does NOT hold. Counterexample found:")
>> print(f"LB = {lb_val:#018x}")
>> print(f"UB = {ub_val:#018x}")
>> print(f"ONES = {ones_val:#018x}")
>> print(f"~ZEROS = {not_zeros_val:#018x}")
>> print(f"ONES <= LB? {ones_val <= lb_val}")
>> print(f"UB <= ~ZEROS? {ub_val <= not_zeros_val}")
>> else:
>> print("✅ Property holds: [ONES, ~ZEROS] always covers [LB, UB] (UNS...
>
> Thanks for the addition! Very nice and solid proof.
Hi @MaxXSoft, Constant folding optimizations like these qualify for addition of a new benchmark. Please add one; I see a significant gain with following micro kernel on my Intel 13th Gen Intel(R) Core(TM) i3-1315U laptop.
public static long micro(long param) {
long constrained_param = Math.min(175, Math.max(param, 160));
return Long.numberOfLeadingZeros(constrained_param);
}
PROMPT>#baseline
PROMPT>java -Xbatch -XX:-TieredCompilation -cp . test_clz
[time] 17ms [res] 11200000
PROMPT>#withopt
PROMPT>java -Xbatch -XX:-TieredCompilation -cp . test_clz
[time] 5ms [res] 11200000
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2258958363
More information about the hotspot-compiler-dev
mailing list