RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v7]

Jatin Bhateja jbhateja at openjdk.org
Wed Aug 6 10:46:07 UTC 2025


On Wed, 6 Aug 2025 08:24:45 GMT, Qizheng Xing <qxing at openjdk.org> wrote:

>> The result of count leading/trailing zeros is always non-negative, and the maximum value is integer type's size in bits. In previous versions, when C2 can not know the operand value of a CLZ/CTZ node at compile time, it will generate a full-width integer type for its result. This can significantly affect the efficiency of code in some cases.
>> 
>> This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. For example, the following implementation runs ~115% faster on x86-64 with this patch:
>> 
>> 
>> public static int numberOfNibbles(int i) {
>>   int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i);
>>   return Math.max((mag + 3) / 4, 1);
>> }
>> 
>> 
>> Testing: tier1, IR test
>
> Qizheng Xing has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision:
> 
>  - Merge branch 'master' into enhance-clz-type
>  - Add checks for results of all test methods
>  - Replace `isa_*` with `is_*` and add checks for `Type::BOTTOM`
>  - Merge branch 'master' into enhance-clz-type
>  - Move `TestCountBitsRange` to `compiler.c2.gvn`
>  - Fix null checks
>  - Narrow type bound
>  - Use `BitsPerX` constant instead of `sizeof`
>  - Make the type of count leading/trailing zero nodes more precise

src/hotspot/share/opto/countbitsnode.cpp line 89:

> 87:   return TypeInt::make(count_leading_zeros_long(~tl->_bits._zeros),
> 88:                        count_leading_zeros_long(tl->_bits._ones),
> 89:                        tl->_widen);

Hi @MaxXSoft, Taking the librty to add some comments related to the proof of the above assumptions, hope you won't mind :-)

**Z3 proof for [KnownBits.ONES, ~KnownBits.ZEROS] >= [LB, UB] where >= is a superset relation.**


from z3 import *
from functools import reduce

# 64-bit symbolic unsigned integers
UB = BitVec('UB', 64)
LB = BitVec('LB', 64)

# XOR for detecting differing bits
xor_val = UB ^ LB

# COUNT_LEADING_ZEROS
def count_leading_zeros(x):
    """Returns number of leading zeros in a 64-bit word."""
    clauses = []
    for i in range(64):
        bit_is_one = LShR(x, 63 - i) & 1 == 1
        bits_before_zero = And([LShR(x, 63 - j) & 1 == 0 for j in range(i)])
        clauses.append(If(And(bits_before_zero, bit_is_one), BitVecVal(i, 64), BitVecVal(64, 64)))
    return reduce(lambda a, b: If(a == 64, b, a), clauses)

# Step 1: Compute common prefix length
CPL = count_leading_zeros(xor_val)

# Step 2: COMMON_PREFIX_MASK = ((1 << CPL) - 1) << (64 - CPL)
one_shifted = (BitVecVal(1, 64) << CPL)
mask = one_shifted - 1
COMMON_PREFIX_MASK = mask << (64 - CPL)

# Step 3: COMMON_PREFIX
COMMON_PREFIX = UB & COMMON_PREFIX_MASK

# Step 4: ZEROS and ONES
ZEROS = COMMON_PREFIX_MASK & (~COMMON_PREFIX)
ONES = COMMON_PREFIX

# Step 5: Prove that [ONES, ~ZEROS] ⊇ [LB, UB]
prop = And(ULE(ONES, LB), ULE(UB, ~ZEROS))

# Step 6: Try to disprove (i.e., check if any UB, LB violates the above)
s = Solver()
s.add(Not(prop))  # Look for counterexamples

# Check the result
if s.check() == sat:
    m = s.model()
    ub_val = m.eval(UB).as_long()
    lb_val = m.eval(LB).as_long()
    ones_val = m.eval(ONES).as_long()
    zeros_val = m.eval(ZEROS).as_long()
    not_zeros_val = (~zeros_val) & 0xFFFFFFFFFFFFFFFF

    print("❌ Property does NOT hold. Counterexample found:")
    print(f"LB     = {lb_val:#018x}")
    print(f"UB     = {ub_val:#018x}")
    print(f"ONES   = {ones_val:#018x}")
    print(f"~ZEROS = {not_zeros_val:#018x}")
    print(f"ONES <= LB? {ones_val <= lb_val}")
    print(f"UB <= ~ZEROS? {ub_val <= not_zeros_val}")
else:
    print("✅ Property holds: [ONES, ~ZEROS] always covers [LB, UB] (UNSAT)")		 



<img width="975" height="88" alt="image" src="https://github.com/user-attachments/assets/244a373d-99ac-45d7-ab57-24e78b1ea216" />

Manually worked out Example 1:-
        UB = 0xFFFF00FF
        LB = 0xFFFF0000
        COMMON_PREFIX = 0xFFFF0000
        ONES  = 0xFFFF0000
        ZEROS = 0x00000000
        ---------------------------
        MAX = ~ZEROS
            = 0xFFFFFFFF
        MIN = ONES
            = 0xFFFF0000
 
**Thus, it's evident that MAX >= UB AND MIN <= LB**

Manually worked out Example 2:-
         UB = 0xFF0F00FF
         LB = 0xFF0F0000
         COMMON_PREFIX = 0xFF0F0000
         ONES  = 0xFF0F0000
         ZEROS = 0x00F00000

Since logical OR b/w ONES and ZEROS forms the COMMON_PREFIX MASK i.e. both ZEROS and ONES are derived from COMMON_PREFIX therefore COMMON_PREFIX_MASK & ~ZERO == COMMON_PREFIX_MASK & ONES, apart from COMMON_PREFIX there are other lower order bits which are not included in COMMON_PREFIX, when we flip the ZEROS then we get a VALUE which equals ONES + we set all other lower order bits, thus this value is guaranteed to be greater than UB since it also set other non-set bits of UB.

Thus, UB = COMMON_PREFIX + OTHER_SET_LOWER_ORDER_BITS_IN_UB
      ~ZEROS = COMMON_PREGIX + REMAINING_BITS_SET_TO_ONE.

While ONES is just the common prefix, which is guaranteed to be less than UB since we don’t account for set bits of UB, which are not part of the common prefix. 

Thus, the following inequality holds good.

**~ZEROS >= UB >= LB >= ONES**

**Proof for the KnownBits application to CTZ / CLZ**

-	In both these cases, we are only concerned about the number of ZEROs which are either present at the start or towards the end of the integral bits sequence. 
-	KnownBits.ZEROS and KnownBits.ONES captures the known set bits in value range, i.e., if a bit is set on ONES it means it's guaranteed to be one in all the values in the value range; similarly, a set bit in ZEROS signifies that the corresponding bit remains unset in all the values in the range. All unset bits in ONES/ZEROS signify not known at compile time.

Since the flipped value of ZEROS gives us the maximum value in the range, and given that number of leading zeros of maximum value will never be less than the number of leading zeros of minimum value of the range hence, value range of CLZ varies b/w CLZ(~ZEROS) and CLZ(ONES) as its lower and upper bounds.

While counting leading zeros is relatively straightforward to prove given that each integral value is the sum of 2^position of set bits and given that TypeInt/TypeLong comply with the invariant (LB <= UB)
Hence highest set bits position of UB can never be less than the LB. 

For CTZ, consider the following example
    UB = 0xFF800000
    LB = 0xFF8FF0F0
    COMMON_PREFIX = 0xFF800000
    ONES  = 0xFF800000
    ZEROS = 0x00700000

Again, both ONES and ZEROS bits are extracted from the common prefix of the delimiting bounds of the value range.

   MAX = ~ZEROS = 0xFF8FFFFF 
   MIN = ONES   = 0xFF800000 
 
Thus, the number of trailing zeros will be [lb:0, ub:20], given that the lower order bits of knowbits which are not part of the common prefix will always be set to zero, hence CTZ(~ZERO) will always be 0. This is also in line with the fact that the number of leading/trailing zeros will never be a non-negative value. Therefore, delimiting lower and upper bounds of CTZ are CTZ(~ZEROS) and CTZ(ONES).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2256750507


More information about the hotspot-compiler-dev mailing list