RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v64]

Tue Jun 10 05:37:49 UTC 2025

On Tue, 10 Jun 2025 04:00:49 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> @vnkozlov I have merged this branch with master, can you run your tests and approve the changes, please?
>
> @merykitty, your comments on my understanding below would be helpful.
> 
> Q. Is it ok to keep a bool flag in a value which signifies that bounds hold unsigned values?
> 
> A.  Java primitive types are inherently signed, and the C2 compiler represents the integral types byte, short, and int through TypeInt by simply constraining the value range, and long through TypeLong. For float the C2 type system creates Type::FLOAT, and for double Type::DOUBLE; unlike the integral types, these two types do not record the actual value range bounds. For floating-point constants C2 creates a different type, i.e. TypeF, and for double constants TypeD.
> 
> Currently, the scope of the unsigned type is limited to comparison, multiplication, and division operations. Since the signed and unsigned value ranges overlap, keeping a flag with _lo and _hi should suffice. The new scheme accepts bounds of signed and unsigned value ranges and then finds the effective value range; this allows the user to feed arbitrary signed and unsigned value ranges to a TypeInt and let the compiler find the effective value range by canonicalization. A TypeInt is only useful after canonicalization. This mimics the job of a constructor, where a newly allocated object is usable only after it has been pushed through the constructor: likewise, a TypeInt accepts different signed and unsigned bounds but is only usable after normalization, which computes the effective value range and leaves the signed bounds, unsigned bounds, and known bits in sync.
> 
> During the dataflow analysis, flow functions associated with different operators may modify the value ranges (signed or unsigned), which will trigger re-normalization. In other cases flow analysis may transfer only the known bits, which are then used to prune the value ranges. At any given point, the signed bounds, unsigned bounds, and known bits should be in sync; otherwise the type is inconsistent and not usable. Iterative canonicalization ensures this.
> 
> Thus, to be flexible in implementation, keeping a separate value range for unsigned bounds is justified, but may not add huge value, as all Java primitive types are inherently signed and mixing signed and unsigned operations in type flow is not possible. The whole idea of keeping the implementation flexible with unsigned bounds is based on the assumption that during data flow any of the lattices associated with the integral types TypeInt or TypeLong, i.e. unsigned bounds, signed bounds, or known bits, may change. In practice only known bits (bit level df) and signed bounds may be usable a ...

@jatin-bhateja Thanks a lot for your suggestion. I will address your concerns below:

Besides providing additional information, unsigned bounds make canonicalization easier. Bits are inherently unsigned, so canonicalizing bits together with unsigned bounds is an easier task than canonicalizing bits together with signed bounds. I also think it is beneficial to be consistent: keeping a boolean to signify unsigned bounds splits the set of all `TypeInt`s in two, which makes it hard to reason about and verify the results of different operations. For example, consider subtracting 2 `TypeInt`s: it becomes significantly more complex if we have to consider 4 cases: signed - signed, signed - unsigned, unsigned - signed, and unsigned - unsigned.
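To illustrate why bits pair more naturally with unsigned bounds, here is a minimal sketch. It is not the canonicalization algorithm from this PR, and the class and method names are hypothetical. It assumes known bits are given as two disjoint masks, `zeros` (bits known to be 0) and `ones` (bits known to be 1), and only derives the loosest bounds those masks imply. The unsigned bounds can be read off the masks directly, while the signed bounds need a special case for the sign bit.

```java
// Hypothetical sketch, not code from the PR: bounds implied by known bits alone.
// Assumes (zeros & ones) == 0, i.e. no bit is claimed to be both 0 and 1.
final class KnownBitsSketch {
    // Smallest unsigned value: set only the bits known to be 1.
    static int unsignedMin(int zeros, int ones) {
        return ones;
    }

    // Largest unsigned value: set every bit not known to be 0.
    static int unsignedMax(int zeros, int ones) {
        return ~zeros;
    }

    // Smallest signed value: the sign bit must be 1 unless it is known to be 0.
    static int signedMin(int zeros, int ones) {
        boolean signKnownZero = (zeros & Integer.MIN_VALUE) != 0;
        return signKnownZero ? ones : (ones | Integer.MIN_VALUE);
    }

    // Largest signed value: the sign bit must be 0 unless it is known to be 1.
    static int signedMax(int zeros, int ones) {
        boolean signKnownOne = (ones & Integer.MIN_VALUE) != 0;
        return signKnownOne ? ~zeros : (~zeros & Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        int zeros = 0b11; // the two low bits are known to be 0 (a multiple of 4)
        int ones = 0;     // nothing is known to be 1
        System.out.println(Integer.toUnsignedString(unsignedMin(zeros, ones)));
        System.out.println(Integer.toUnsignedString(unsignedMax(zeros, ones)));
        System.out.println(signedMin(zeros, ones));
        System.out.println(signedMax(zeros, ones));
    }
}
```

The unsigned case has no branches because the unsigned order agrees with the bit order from the most significant bit down; the signed case has to invert the role of the top bit.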

I don't think it suffices to think of Java integral types as inherently signed. Bitwise and, or, xor, and not are inherently unsigned. Left shift is both signed and unsigned, and right shift has both a signed variant and an unsigned variant. Add, sub, and mul are both signed and unsigned. Only cmp, div, mod, toString, and conversions are signed, but we have methods that provide the unsigned variants of all of them: `Integer::compareUnsigned`, `Integer::divideUnsigned`, `Integer::remainderUnsigned`, `Integer::toUnsignedString`, and `Integer::toUnsignedLong`. Mul-hi does not have a native operation for either variant, and `j.l.Math` provides both the signed and unsigned variants as utility methods. As a result, I think it is better to think of an `int` as a 32-bit integral value with unspecified signedness, where the operations applied to it decide whether it is treated as signed or unsigned. And as you can see, for all operations, we have both the signed and unsigned variants.
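As a small illustration of this point, the sketch below applies signed and unsigned operations to the same `int` bit patterns. The class name `SignednessDemo` and the helper `addSubMulAgree` are made up for this example; every `Integer` method it calls is part of the standard library.

```java
// Illustrative sketch: the same 32 bits, read as signed or unsigned by the operation.
final class SignednessDemo {
    // Add, sub and mul produce the same low 32 bits whether the operands are
    // interpreted as signed ints or as unsigned values widened to long.
    static boolean addSubMulAgree(int x, int y) {
        long ux = Integer.toUnsignedLong(x);
        long uy = Integer.toUnsignedLong(y);
        return (int) (ux + uy) == x + y
            && (int) (ux - uy) == x - y
            && (int) (ux * uy) == x * y;
    }

    public static void main(String[] args) {
        int x = -8; // bit pattern 0xFFFFFFF8, i.e. 4294967288 when read as unsigned
        int y = 3;

        System.out.println(addSubMulAgree(x, y));            // true

        // For these operations the signed and unsigned variants differ.
        System.out.println(Integer.compare(x, y) < 0);       // true: -8 < 3
        System.out.println(Integer.compareUnsigned(x, y) > 0); // true: 4294967288 > 3
        System.out.println(x / y);                           // -2
        System.out.println(Integer.divideUnsigned(x, y));    // 1431655762
        System.out.println(x % y);                           // -2
        System.out.println(Integer.remainderUnsigned(x, y)); // 2
        System.out.println(x >> 1);                          // -4
        System.out.println(x >>> 1);                         // 2147483644
        System.out.println(Integer.toUnsignedString(x));     // 4294967288
    }
}
```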

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2957736235