RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v39]

Emanuel Peter epeter at openjdk.org
Thu Jan 30 13:54:05 UTC 2025


On Thu, 30 Jan 2025 07:12:38 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:

>> Hi,
>> 
>> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot.
>> 
>> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance.
>> 
>> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization.
>> 
>> Please kindly review, thanks a lot.
>> 
>> Testing
>> 
>> - [x] GHA
>> - [x] Linux x64, tier 1-4
>
> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:
> 
>   include

Wow, this stuff is really impressive!

I'm dedicated to doing this bit by bit, as it takes a lot of concentration and time.

I'm leaving you with a list of comments, and hope we can interactively get there 😊 

Thanks for your patience! If I'm too slow, and there are other reviewers that are willing to reviewer faster, then someone else can take it over, but otherwise I will just keep on reviewing at my speed 😅

src/hotspot/share/opto/rangeinference.cpp line 59:

> 57: // This class deals with each interval with both bounds being >= 0 or < 0 in
> 58: // the signed domain. We call it Simple because a canonicalized TypeIntPrototype
> 59: // may contain 1 or 2 SimpleCanonicalResult.

Aha! Now I understand why "Simple"!
It could be nice if you had a constructor where you validate the "simple" constraint, i.e. that we are in either of these cases:
- empty.
- `[lo, hi] >= 0` and `[lo, hi] == [ulo, uhi]`
- `[lo, hi] < 0` and `[ulo, uhi] >= 2^31` and `[lo, hi] = [(jint)ulo, (jint)uhi]`
I did understand this correctly, right?

src/hotspot/share/opto/rangeinference.cpp line 66:

> 64:   bool _present; // whether this is an empty set
> 65:   RangeInt<U> _bounds;
> 66:   KnownBits<U> _bits;

Could these be const? That would additionally guaratee that nobody can come in and violate the "simple" constraint.

src/hotspot/share/opto/rangeinference.cpp line 85:

> 83: // bit (the LSB), a bit comes before (being higher than) another if it is more
> 84: // significant, and a bit comes after (being lower than) another if it is less
> 85: // significant.

I would already explain that `v[0]` is the first bit of `v` here.

src/hotspot/share/opto/rangeinference.cpp line 95:

> 93:   // lo. Similarly, one_violation = 0001, i.e the last bit should be one, but
> 94:   // it is 0 in lo. These make lo not satisfy the bit constraints, which
> 95:   // results in us having to find the smallest value that satisfies bits

Suggestion:

  // results in us having to find the smallest value that satisfies bits.

src/hotspot/share/opto/rangeinference.cpp line 126:

> 124: 
> 125:   2. Formality:
> 126:   Call i the largest value such that (with v[0] being the first bit of v, v[1]

Suggestion:

  Call i the largest index such that (with v[0] being the first bit of v, v[1]

Or maybe `bit index`. That would avoid confusing the `values` (bit patterns) from `index`, i.e. position inside the bit pattern.

src/hotspot/share/opto/rangeinference.cpp line 127:

> 125:   2. Formality:
> 126:   Call i the largest value such that (with v[0] being the first bit of v, v[1]
> 127:   being the second bit of v and so on):

Why not move the `v[0]` etc comment up to the `MSB` / `LSB` explanation about first and last bits?

src/hotspot/share/opto/rangeinference.cpp line 137:

> 135:   - v[x] = lo[x], for 0 <= x < i
> 136:   - v[i] = 1
> 137:   - v[x] = ones[x], for j > i

Suggestion:

  - v[x] = ones[x], for i < x

src/hotspot/share/opto/rangeinference.cpp line 142:

> 140:   satisfies bits.
> 141: 
> 142:   Call r the smallest value not smaller than lo that satisfies bits.

Above you call `r` `res`, right? I would pick one consistently ;)

src/hotspot/share/opto/rangeinference.cpp line 144:

> 142:   Call r the smallest value not smaller than lo that satisfies bits.
> 143: 
> 144:   a. Firstly, we prove that r <= v:

A bit of indentation would make it clearer what belongs to what section below.

src/hotspot/share/opto/rangeinference.cpp line 146:

> 144:   a. Firstly, we prove that r <= v:
> 145: 
> 146:   Trivially, lo < v since lo[i] < v[i] and lo[x] == v[x] for x < i.

Suggestion:

  Trivially, lo < v since:
    lo[x] = v[x], for 0 <= x < i
    lo[i] < v[i]
    bits for x < i have lower significance, and are thus irrelevant.

To make it complete

src/hotspot/share/opto/rangeinference.cpp line 148:

> 146:   Trivially, lo < v since lo[i] < v[i] and lo[x] == v[x] for x < i.
> 147: 
> 148:   As established above, the first (i + 1) bits of v satisfy bits.

When reading through here, I'm getting a sense of overwhealm with all the defined variables and statements we have proven already. It could be very helpful if you give statements a number / name, so that we can refer back to them.

That way, I never need to fully cache the whole proof in my head, I can just combine the steps and check if they are correct.

That is what I have extensively done in proofs of `AlignmentSolver::solve`, which also got a little long...

src/hotspot/share/opto/type.hpp line 593:

> 591:  *
> 592:  * A TypeInt represents a set of jint values. An jint v is an element of a
> 593:  * TypeInt iff:

You may want to say `non-empty`, right?

src/hotspot/share/opto/type.hpp line 597:

> 595:  * v >= _lo && v <= _hi && juint(v) >= _ulo && juint(v) <= _uhi && _bits.is_satisfied_by(v)
> 596:  *
> 597:  * Multiple set of parameters can represent the same set.

Suggestion:

 * Multiple sets of parameters can represent the same set.

src/hotspot/share/opto/type.hpp line 601:

> 599:  *
> 600:  * t1._lo = 2, t1._hi = 7, t1._ulo = 0, t1._uhi = 5, t1._bits._zeros = 0x0, t1._bits._ones = 0x1
> 601:  * t2._lo = 3, t2._hi = 5, t2._ulo = 3, t2._uhi = 5, t2._bits._zeros = 0xFFFFFFF8, t2._bits._ones = 0x1

Suggestion:

 * t1._lo = 2, t1._hi = 7, t1._ulo = 0, t1._uhi = 5, t1._bits._zeros = 0x0000000, t1._bits._ones = 0x1
 * t2._lo = 3, t2._hi = 5, t2._ulo = 3, t2._uhi = 5, t2._bits._zeros = 0xFFFFFFF8, t2._bits._ones = 0x1

Just for alignment, easier to read.

src/hotspot/share/opto/type.hpp line 617:

> 615:  * E.g a TypeInt t with t._lo < 0 will definitely contain negative values. It
> 616:  * also makes it trivial to determine if a TypeInt instance is a subset of
> 617:  * another.

First you talk about "optimal", later about "canonicalized".
This is a bit of a "nit", but there is no cost function we are optimizing for here ;)
Maybe you could say that it is narrowest, and that this is what we would like to canonicalize to.

Then you can define what the canonical form is:

 * t3._lo > t2._lo || t3._hi < t2._hi || t3._ulo > t2._ulo || t3._uhi < t2._uhi ||
 *     (t3._bits._zeros & t2._bis._zeros) != t3._bits._zeros || (t3._bits._ones & t2._bits._ones) != t3._bits._ones

src/hotspot/share/opto/type.hpp line 646:

> 644:  *   is _lo. Which means that _lo <= _ulo in the signed domain, or in a more
> 645:  *   programmatical way, _lo <= jint(_ulo).
> 646:  *   The other inequalities can be proved in a similar manner.

I don't see how you would prove `_lo <= _hi` in a similar way. I suppose if it was not true, then the type would be empty?

Now if you say that `TypeInt` is `non-empty`, then we could prove it by contradiction, but that is a different approach ;)

It also seems that there are duplicates here: `_lo <= _hi` and `_hi >= _lo` for example.

src/hotspot/share/opto/type.hpp line 650:

> 648:  * 3. Either _lo == jint(_ulo) and _hi == jint(_uhi), or all elements of a
> 649:  * TypeInt lie in the intervals [_lo, jint(_uhi)] or [jint(_ulo), _hi] (note
> 650:  * that these intervals are disjoint in this case).

Aha, these are the simple intervals again. Maybe giving them a proper definition would help?

Then you could refer to that definition from other places!

-------------

Changes requested by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/17508#pullrequestreview-2583823880
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1935561400
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1935563518
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1935579338
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1935565578
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1935582215
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1935578619
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1935593740
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1935587208
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1935589189
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1935596694
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1935605483
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1935636922
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1935610376
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1935611532
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1935618744
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1935639651
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1935643097


More information about the hotspot-compiler-dev mailing list