RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v62]

Emanuel Peter epeter at openjdk.org
Mon May 5 06:42:08 UTC 2025


On Sat, 3 May 2025 01:23:18 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:

>> Hi,
>> 
>> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot.
>> 
>> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance.
>> 
>> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization.
>> 
>> Please kindly review, thanks a lot.
>> 
>> Testing
>> 
>> - [x] GHA
>> - [x] Linux x64, tier 1-4
>
> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:
> 
>   alignment wording

Some more minor comments around `adjust_lo`.

src/hotspot/share/opto/rangeinference.cpp line 366:

> 364:     // We start with all bits where lo[x] == zeros[x] == 0:
> 365:     //           0 1 1 0 0 0 0 1
> 366:     U either = lo | bits._zeros;

I can let this one slide, but the code gives you exactly the negation of what the text describes. The reader might be confused about this, and have to figure out that the bits are all inverted. Not horrible, but not hard to fix either. Up to you.

src/hotspot/share/opto/rangeinference.cpp line 370:

> 368:     // lo[x] == zeros[x] == 0. The last one of these bits must be at index i.
> 369:     //           0 1 1 0 0 0 0 0
> 370:     U tmp = ~either & find_mask;

To me a variable name `tmp` smells a little. I prefer expressive names. Up to you :)

src/hotspot/share/opto/rangeinference.cpp line 379:

> 377:     // In our example, i == 2
> 378:     //           0 0 1 0 0 0 0 0
> 379:     U alignment = tmp & (-tmp);

This line is still magic. Most compiler devs I know do not see this as "standard" math. Could be nice to at least refer to something one could find online on this.

I did sleep over it and had a proof in mind:
What do we know about `-tmp`? It cannot have any bits after the last bit set in `bit`, otherwise those bits would not zero out in `tmp + -tmp`. `-tmp` must have the same last bit set as `tmp`, otherwise it would not cancle out. The addition of those bits create a carry bit, that must be cancled out all the way up. This means that the bits before that last bit must be set either exactly in `tmp` or in `-tmp`, but certainly not in both, otherwise the carry bit would not be cancled away. Hence, only that last bit remains in `tmp & (-tmp)`.

-------------

PR Review: https://git.openjdk.org/jdk/pull/17508#pullrequestreview-2813893517
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2072898993
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2072900385
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2072904495


More information about the hotspot-compiler-dev mailing list