RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v64]

Jatin Bhateja jbhateja at openjdk.org
Tue Jun 10 04:03:49 UTC 2025


On Mon, 9 Jun 2025 17:25:47 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:

>> This looks fine to me but to be on safe side lets push it into JDK 26 when it is forked.
>> And I don't see link in RFE to recent testing of this. It needs to be tested in all tiers including tier10, xcomp and stress.
>
> @vnkozlov I have merged this branch with master, can you run your tests and approve the changes, please?

@merykitty, your comments on the following understanding would be helpful.

Q. Is it OK to keep a bool flag in the value which signifies that the bounds hold unsigned values?

A. Java primitive types are inherently signed. The C2 compiler represents all the integral subword types, i.e. byte, short, and int, through TypeInt by simply constraining the value range, and represents long through TypeLong. For float the C2 type system creates Type::FLOAT, and for double Type::DOUBLE; unlike the integral types, these two types do not record an actual value range. For floating-point constants C2 creates different types, i.e. TypeF for a float constant and TypeD for a double constant.
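To make that concrete, here is a minimal C++ sketch (assumed names and constants, not the actual C2 classes) of the idea that every integral subword type is the same range-based representation with different bounds:

    // Minimal sketch only: a range-based integer type, loosely modelled on
    // TypeInt's _lo/_hi bounds. Names and constants here are illustrative.
    #include <cstdint>
    #include <iostream>

    struct SimpleTypeInt {
      int32_t _lo;   // inclusive signed lower bound
      int32_t _hi;   // inclusive signed upper bound
    };

    // byte, short, char and int all collapse onto the same representation;
    // only the bounds differ.
    const SimpleTypeInt TYPE_BYTE  = { -128,      127 };
    const SimpleTypeInt TYPE_SHORT = { -32768,    32767 };
    const SimpleTypeInt TYPE_CHAR  = { 0,         65535 };
    const SimpleTypeInt TYPE_INT   = { INT32_MIN, INT32_MAX };

    int main() {
      std::cout << "byte = int constrained to [" << TYPE_BYTE._lo
                << ", " << TYPE_BYTE._hi << "]\n";
      return 0;
    }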

Currently, the scope of unsigned information is limited to comparison, multiplication, and division operations. Since the signed and unsigned value ranges overlap, keeping a flag alongside _lo and _hi should suffice. The new scheme instead accepts both signed and unsigned bounds and then finds the effective value range: a user may feed arbitrary signed and unsigned value ranges into a TypeInt and let the compiler compute the effective range by canonicalization. A TypeInt is only useful after canonicalization; this mimics a constructor, where a newly allocated object becomes usable only after it has been pushed through the constructor. Likewise, a TypeInt accepts independent signed and unsigned bounds, but it is only usable after normalization, which computes the effective value range; after normalization the signed bounds, unsigned bounds, and known bits are in sync.
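As a hedged, brute-force reference sketch of what canonicalization means (8-bit domain only, assumed names, not the PR's analytic implementation): given possibly sloppy signed bounds, unsigned bounds, and known bits, compute the tightest constraints consistent with all three lattices, or report the type as empty.

    #include <algorithm>
    #include <cstdint>
    #include <optional>

    struct KnownBits8 {
      uint8_t zeros; // bits known to be 0
      uint8_t ones;  // bits known to be 1
    };

    struct Lattice8 {
      int8_t     lo,  hi;   // signed bounds
      uint8_t    ulo, uhi;  // unsigned bounds
      KnownBits8 bits;      // known bits
    };

    // Does value v satisfy all three constraints?
    static bool admits(const Lattice8& t, uint8_t v) {
      int8_t s = static_cast<int8_t>(v);
      if (s < t.lo || s > t.hi)             return false; // signed range
      if (v < t.ulo || v > t.uhi)           return false; // unsigned range
      if ((v & t.bits.zeros) != 0)          return false; // a known-zero bit is set
      if ((v & t.bits.ones) != t.bits.ones) return false; // a known-one bit is clear
      return true;
    }

    // Canonicalize by enumerating the admissible values; returns nothing if
    // the constraints are contradictory (an empty type).
    static std::optional<Lattice8> canonicalize(const Lattice8& t) {
      Lattice8 r = { INT8_MAX, INT8_MIN, UINT8_MAX, 0, { 0xFF, 0xFF } };
      bool any = false;
      for (int v = 0; v <= 0xFF; v++) {
        uint8_t u = static_cast<uint8_t>(v);
        if (!admits(t, u)) continue;
        any = true;
        int8_t s = static_cast<int8_t>(u);
        r.lo  = std::min(r.lo,  s);  r.hi  = std::max(r.hi,  s);
        r.ulo = std::min(r.ulo, u);  r.uhi = std::max(r.uhi, u);
        r.bits.zeros &= static_cast<uint8_t>(~u); // 0 in every admitted value
        r.bits.ones  &= u;                        // 1 in every admitted value
      }
      if (!any) return std::nullopt;
      return r;
    }

After canonicalization the three views agree with each other, which is the "usable after normalization" state described above; the real 32/64-bit implementation presumably reaches the same fixpoint analytically rather than by enumeration.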

During dataflow analysis, the flow functions associated with different operators may modify the value ranges (signed or unsigned), which triggers re-normalization. In other cases a flow function may transfer only the known bits, which are then used to prune the value ranges. At any given point the signed bounds, unsigned bounds, and known bits should be in sync; otherwise the type is inconsistent and not usable. Iterative canonicalization ensures this.
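As a simplified illustration of such a flow function (assumed names, not the PR's code), the known bits of a bitwise AND follow directly from the known bits of its inputs, and the known bits in turn bound the unsigned range that re-normalization can intersect with the existing bounds:

    #include <cstdint>

    struct KnownBits32 {
      uint32_t zeros; // bits known to be 0
      uint32_t ones;  // bits known to be 1
    };

    // Transfer function for (x & y): a result bit is known 0 if it is known 0
    // in either input, and known 1 only if it is known 1 in both.
    KnownBits32 and_bits(KnownBits32 a, KnownBits32 b) {
      return { a.zeros | b.zeros, a.ones & b.ones };
    }

    // Extreme unsigned values compatible with the known bits; intersecting
    // these with the current bounds is one way known bits prune the ranges.
    uint32_t min_unsigned(KnownBits32 k) { return k.ones;   }
    uint32_t max_unsigned(KnownBits32 k) { return ~k.zeros; }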

Thus, to keep the implementation flexible, keeping a separate value range for unsigned bounds is justified, but it may not add huge value, since all Java primitive types are inherently signed and mixing signed and unsigned operations in the type flow is not possible. The whole idea of keeping the implementation flexible with unsigned bounds rests on the assumption that during dataflow any of the lattices associated with an integral type TypeInt or TypeLong, i.e. unsigned bounds, signed bounds, or known bits, may change. In practice only the known bits (bit-level dataflow) and the signed bounds may be usable, so a flag signifying that a bound is unsigned may suffice. Associating three lattices with TypeInt is done for the sake of flexibility; it may let optimization passes inject opaque IR with unsigned bounds into the sea of nodes and then let type canonicalization and iterative dataflow analysis do the magic.
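For example (an illustration of the semantics only, not of C2 internals), an unsigned comparison is the natural producer of an unsigned bound, which canonicalization can then fold back into a signed bound plus known bits:

    #include <cassert>
    #include <cstdint>

    uint32_t clamped(uint32_t x) {
      if (x < 10u) {   // unsigned compare, cf. Integer.compareUnsigned(x, 10) < 0 in Java
        // Learned fact: x lies in the unsigned range [0, 9]. Because that range
        // does not wrap past the sign bit, it is also the signed range [0, 9],
        // and the upper 28 bits of x are known to be zero.
        assert(x <= 9u);
        return x;
      }
      return 0;
    }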

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2957609331

