RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v64]

Emanuel Peter epeter at openjdk.org
Tue Jun 10 07:48:58 UTC 2025


On Tue, 10 Jun 2025 07:36:22 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> @merykitty, your comments on the following understanding will be helpful.
>> 
>> Q. Is it ok to keep a bool flag in a value which signifies that bounds hold unsigned values?
>> 
>> A. Java primitive types are inherently signed. The C2 compiler represents all the integral types, i.e. byte, short, and int, through TypeInt by simply constraining the value range, and long through TypeLong. For float the C2 type system creates Type::FLOAT, and for double Type::DOUBLE; unlike the integral types, these two types do not record an actual value range. For floating-point constants C2 creates different types: TypeF for a float constant and TypeD for a double constant.
>> 
>> Currently, the scope of the unsigned type is limited to comparison, multiplication, and division operations. Since the signed and unsigned value ranges overlap, keeping a flag alongside _lo and _hi should suffice. The new scheme accepts bounds for both the signed and the unsigned value range and then finds the effective value range; this allows the user to feed arbitrary signed and unsigned value ranges into a TypeInt and let the compiler compute the effective value range by canonicalization. A TypeInt is only useful after canonicalization. This mimics a constructor, where a newly allocated object is usable only after it has been pushed through the constructor; likewise, a TypeInt accepts different signed and unsigned bounds but is only usable after normalization, which computes the effective value range. After normalization, the signed bounds, unsigned bounds, and known bits are in sync.
>> 
>> During dataflow analysis, the flow functions associated with different operators may modify the value ranges (signed or unsigned), which triggers re-normalization. In other cases, flow analysis may transfer only the known bits, which are then used to prune the value ranges. At any given point, the signed bounds, unsigned bounds, and known bits should be in sync; otherwise the type is inconsistent and not usable. Iterative canonicalization ensures this.
>> 
>> Thus, to be flexible in the implementation, keeping a separate value range for unsigned bounds is justified, but it may not add huge value, as all Java primitive types are inherently signed and mixing signed and unsigned operations in the type flow is not possible. The whole idea of keeping the implementation flexible with unsigned bounds is based on the assumption that during dataflow any of the lattices associated with an integral type TypeInt or TypeLong, i.e. unsigned bounds, signed bounds, or known bits, may change. In practice only known bits (bit-level df) and sign...
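(To make the canonicalization idea above concrete: here is an illustrative sketch, not C2 code, that brute-forces the effective value range of a toy 4-bit integral type from independently supplied signed and unsigned bounds. All names and the 4-bit width are hypothetical, chosen only so the whole domain can be enumerated.)

```java
// Illustrative sketch only, not C2 code: canonicalize independently supplied
// signed and unsigned bounds of a toy 4-bit integral type by brute force.
public class CanonicalizeDemo {
    // Returns the effective {slo, shi, ulo, uhi} after intersecting the two
    // views, or null if the constraints admit no value (an empty type).
    static int[] canonicalize(int slo, int shi, int ulo, int uhi) {
        int nslo = 8, nshi = -9, nulo = 16, nuhi = -1;
        for (int v = -8; v <= 7; v++) {       // all 4-bit signed values
            int u = v & 0xF;                  // the unsigned view of the same bits
            if (v < slo || v > shi) continue; // violates signed bounds
            if (u < ulo || u > uhi) continue; // violates unsigned bounds
            nslo = Math.min(nslo, v); nshi = Math.max(nshi, v);
            nulo = Math.min(nulo, u); nuhi = Math.max(nuhi, u);
        }
        return nshi < nslo ? null : new int[] {nslo, nshi, nulo, nuhi};
    }

    public static void main(String[] args) {
        // Feed in loose bounds; canonicalization tightens both views.
        int[] t = canonicalize(-6, 2, 3, 12);
        System.out.println(java.util.Arrays.toString(t)); // [-6, -4, 10, 12]
    }
}
```

Note how both views tighten at once: the unsigned constraint [3, 12] removes the non-negative part of the signed range, and the surviving values then shrink the unsigned range too.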
>
>> @jatin-bhateja Thanks a lot for your suggestion. I will address your concerns below:
>> 
>> In addition to being extra information, unsigned bounds make canonicalization easier. Because bits are inherently unsigned, canonicalizing bits together with unsigned bounds is an easier task than canonicalizing bits with signed bounds. I think it is also beneficial to be consistent: keeping a boolean to signify unsigned bounds splits the set of all `TypeInt`s in 2, which makes it hard to reason about and verify the results of different operations. For example, consider subtracting 2 `TypeInt`s; it will be significantly more complex if we have to consider 4 cases: signed - signed, signed - unsigned, unsigned - signed, unsigned - unsigned.
>> 
>> I don't think it suffices to think that Java integral types are inherently signed. Bitwise and, or, xor, and not are inherently unsigned. Left shift is both signed and unsigned; right shift has both a signed variant and an unsigned variant. Add, sub, and mul are both signed and unsigned. Only cmp, div, mod, toString, and the conversions are signed, but we have methods for the unsigned variants of all of them: `Integer::compareUnsigned`, `Integer::divide/remainderUnsigned`, `Integer::toUnsignedString`, and `Integer::toUnsignedLong`. Mul-hi does not have a native operation for either variant, and `j.l.Math` provides both the signed and unsigned variants as utility methods. As a result, I think it is better to think of an `int` as a 32-bit integral value with unspecified signedness, where the operations applied to it decide whether it is signed or unsigned. And as you can see, for all operations we have both the signed and unsigned variants.
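(For reference, the unsigned counterparts listed above are all real `java.lang.Integer` APIs; the following small demo shows how the same bit pattern behaves under the signed and unsigned variants:)

```java
// One bit pattern, two interpretations: the operation decides the signedness.
public class UnsignedOpsDemo {
    public static void main(String[] args) {
        int a = -1; // bit pattern 0xFFFFFFFF, which reads as 4294967295 unsigned
        System.out.println(Integer.compare(a, 1));           // -1: signed, -1 < 1
        System.out.println(Integer.compareUnsigned(a, 1));   //  1: unsigned, 0xFFFFFFFF > 1
        System.out.println(a / 2);                           //  0: signed division
        System.out.println(Integer.divideUnsigned(a, 2));    //  2147483647
        System.out.println(Integer.remainderUnsigned(a, 2)); //  1
        System.out.println(Integer.toUnsignedString(a));     //  4294967295
        System.out.println(Integer.toUnsignedLong(a));       //  4294967295
    }
}
```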
> 
> "As a result, I think it is better to think of an `int` as a 32-bit integral value with unspecified signedness, where the operations applied to it decide whether it is signed or unsigned."
> 
> Sounds good. I think, since integral types now have multiple lattice points associated with them, and all our existing Value transforms are based on signed value ranges, we will also need to extend the existing Value transforms to make use of KnownBits. For that, we will need to extend the newly added KnownBits class to support other operations which accept KnownBits inputs, operate on them as per the operation semantics, and return a new [KnownBits](https://github.com/llvm/llvm-project/blob/0c3a7725375ec583147429cc367320f0e8506847/llvm/include/llvm/Support/KnownBits.h#L384) whic...
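For readers unfamiliar with the LLVM class linked above: the per-operation transfer functions it provides are typically tiny bit manipulations. Here is a hypothetical Java sketch in the same spirit; the class, field, and method names are illustrative only and do not reflect the actual class added by this PR.

```java
// Hypothetical sketch in the spirit of LLVM's KnownBits; names are illustrative
// and do not reflect the actual class added by this PR.
public final class KnownBitsSketch {
    final int zeros; // bits known to be 0
    final int ones;  // bits known to be 1

    KnownBitsSketch(int zeros, int ones) {
        assert (zeros & ones) == 0 : "a bit cannot be known 0 and known 1";
        this.zeros = zeros;
        this.ones = ones;
    }

    // Transfer function for v & w: a result bit is known 0 if it is known 0 in
    // either input, and known 1 only if it is known 1 in both.
    static KnownBitsSketch and(KnownBitsSketch a, KnownBitsSketch b) {
        return new KnownBitsSketch(a.zeros | b.zeros, a.ones & b.ones);
    }

    // Transfer function for v | w: the dual of and().
    static KnownBitsSketch or(KnownBitsSketch a, KnownBitsSketch b) {
        return new KnownBitsSketch(a.zeros & b.zeros, a.ones | b.ones);
    }

    // Is the concrete value v consistent with everything we know?
    boolean contains(int v) {
        return (v & zeros) == 0 && (v & ones) == ones;
    }
}
```

Each new operator the Value transforms want to exploit would add one such transfer function; the pruning of value ranges then only ever consumes the resulting (zeros, ones) pair.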

@jatin-bhateja 
FYI: https://github.com/openjdk/jdk/pull/17508#issuecomment-2847009418

The goal is to make it possible to run gtests with integral types of any size (e.g. 4-bit), so that we can efficiently test the corresponding value optimizations. @merykitty Did you already file a JBS issue for that?

So when we refactor, we should try to create methods that take in types, and not nodes. That would allow us to generate types in the gtest, and get a type back. We can do all sorts of enhanced verification that way. For example, we can feed in wider and narrower types as inputs, and then expect that narrower types lead to narrower outputs, and wider types to wider outputs. If constants are fed in, then constants should come out. etc. This would really allow us to exhaustively verify for all sorts of ranges and bit patterns - at least for the smaller types (e.g. 4 bits).
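To sketch what such an exhaustive check could look like, here is a toy example over a 4-bit domain that verifies a known-bits transfer function for AND: soundness (every concrete result lies in the abstract result) and constants-in-constants-out. The names and the standalone-class shape are illustrative, not the proposed gtest API.

```java
// Sketch of the kind of exhaustive check a type-in/type-out API would enable:
// over a toy 4-bit domain, verify a known-bits transfer function for AND.
public class ExhaustiveBitsCheck {
    // Is v consistent with the abstract value (zeros, ones)?
    static boolean contains(int zeros, int ones, int v) {
        return (v & zeros) == 0 && (v & ones) == ones;
    }

    // Returns the number of abstract input pairs verified.
    static int verifyAnd() {
        int checked = 0;
        for (int z1 = 0; z1 < 16; z1++) for (int o1 = 0; o1 < 16; o1++) {
            if ((z1 & o1) != 0) continue; // skip inconsistent abstract values
            for (int z2 = 0; z2 < 16; z2++) for (int o2 = 0; o2 < 16; o2++) {
                if ((z2 & o2) != 0) continue;
                int rz = z1 | z2, ro = o1 & o2; // transfer function for AND
                // Soundness: every concrete result lies in the abstract result.
                for (int v = 0; v < 16; v++) for (int w = 0; w < 16; w++) {
                    if (contains(z1, o1, v) && contains(z2, o2, w)
                            && !contains(rz, ro, v & w)) {
                        throw new AssertionError(v + " & " + w + " escapes");
                    }
                }
                // Constants in => constant out: fully known inputs
                // ((z|o) == 15) must yield a fully known output.
                if ((z1 | o1) == 15 && (z2 | o2) == 15 && (rz | ro) != 15) {
                    throw new AssertionError("constant inputs, unknown output");
                }
                checked++;
            }
        }
        return checked;
    }

    public static void main(String[] args) {
        System.out.println("verified " + verifyAnd() + " abstract pairs");
    }
}
```

With 4 bits there are only 3^4 = 81 consistent abstract values, so all 6561 input pairs (and all concrete pairs under each) are enumerated in milliseconds; the same test would be hopeless at 32 or 64 bits, which is the point of parameterizing the type width.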

So there is indeed a lot of extension work to do, like @jatin-bhateja said. But we can use that to also refactor the code for testability.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2958020317
PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2958023447


More information about the hotspot-compiler-dev mailing list