RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v5]
John R Rose
jrose at openjdk.org
Sat Jan 20 02:23:28 UTC 2024
On Wed, 10 Jan 2024 18:08:39 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:
>> Hi,
>>
>> This patch adds unsigned bounds and known bits constraints to `TypeInt` and `TypeLong`. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. The new constraints are applied to identity and value calls of the common nodes (Add, Sub, L/R/URShift, And, Or, Xor, bit counting, Cmp, Bool, ConvI2L/L2I), the detailed ideas for each node will be presented below.
>>
>> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must normalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance.
>>
>> Please kindly review, thanks very much.
>>
>> Testing
>>
>> - [x] GHA
>> - [x] Linux x64, tier 1-4
>
> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 27 commits:
>
> - Merge branch 'master' into improvevalue
> - Merge branch 'master' into improvevalue
> - improve add/sub implementation
> - Merge branch 'master' into improvevalue
> - typo
> - whitespace
> - fix tests for x86_32
> - fix widen of ConvI2L
> - problem lists
> - format
> - ... and 17 more: https://git.openjdk.org/jdk/compare/f0169341...843ad076
As you might expect (from an examination of [JDK-8001436](https://bugs.openjdk.org/browse/JDK-8001436)) I’m very excited to see this work going forward.
A very long time ago I worked out tight, cheap estimates for bitwise effects of arithmetic operations, and arithmetic effects for bitwise operators. (BTW, did you know that ‘x^y = x+y-2*(x&y)’? That’s the sort of identity we are working with here.) The overall point is that, if you know both arithmetic and bitwise ranges for operands, you can infer tight arithmetic and bitwise ranges for all common operators. I see you’ve gone farther than I did, adding unsigned ranges as well, and more ops (popc). Excellent, excellent.
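Identities like that one are easy to machine-check. A standalone brute-force sketch (my own illustration, not PR code) over small operands:

```cpp
#include <cstdint>

// Exhaustively check x ^ y == x + y - 2 * (x & y) over small operands.
// Unsigned arithmetic is used so any intermediate wraparound is well-defined.
bool xor_identity_holds(uint64_t limit) {
  for (uint64_t x = 0; x < limit; x++) {
    for (uint64_t y = 0; y < limit; y++) {
      if ((x ^ y) != x + y - 2 * (x & y)) {
        return false;
      }
    }
  }
  return true;
}
```

The identity follows from x + y = (x ^ y) + 2 * (x & y): the XOR is the sum without carries, and the AND marks exactly the bit positions that generate a carry.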
I have a request, though, and it is the same request I made for your work on division strength reduction: the inference rules need to be factored out and covered by their own gtests.
I see equations like this in the middle of the C2 code in this PR:
    U known = (min ^~ max) & (i1->_zeros | i1->_ones) & (i2->_zeros | i2->_ones);
It is hard to demonstrate that this is correct. It cannot be unit-tested separately when it appears like this in the midst of IR transforms. We have had many subtle bugs before from “math inclusions” like this in the middle of the IR optimizer.
I believe that formula is PROBABLY correct, because I deeply respect your mathematical ability, but that is not good enough. If someone accidentally changes it (or needs to change it for some maintenance reason), we might not have a math savant handy to re-verify it. We need an executable test to give us confidence.
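To make the request concrete, here is the shape of brute-force check I have in mind, applied to one of the simpler known-bits rules (AND), with invented helper names and 8-bit values so the loops are exhaustive. This illustrates the testing style, not the PR's formula:

```cpp
#include <cstdint>

// Known-bits transfer rule for AND: a result bit is known-zero if it is
// known-zero in either input, and known-one only if known-one in both.
// (Hypothetical helper, illustrating the kind of rule to factor out.)
struct KnownBits { uint8_t zeros; uint8_t ones; };  // 8-bit lanes for testing

KnownBits and_known_bits(KnownBits a, KnownBits b) {
  return { static_cast<uint8_t>(a.zeros | b.zeros),
           static_cast<uint8_t>(a.ones & b.ones) };
}

// Soundness check: for every concrete x consistent with a and every y
// consistent with b, the concrete x & y must be consistent with the
// predicted bits (same consistency predicate as in the PR description:
// (v & zeros) == 0 && (~v & ones) == 0).
bool and_rule_sound(KnownBits a, KnownBits b) {
  KnownBits r = and_known_bits(a, b);
  for (int x = 0; x < 256; x++) {
    if ((x & a.zeros) != 0 || (~x & a.ones) != 0) continue;  // x not in a
    for (int y = 0; y < 256; y++) {
      if ((y & b.zeros) != 0 || (~y & b.ones) != 0) continue;  // y not in b
      int v = x & y;
      if ((v & r.zeros) != 0 || (~v & r.ones) != 0) return false;  // escaped
    }
  }
  return true;
}
```

A gtest could sweep all disjoint (zeros, ones) pairs for both inputs and assert soundness for every combination; at 8 bits that sweep is cheap.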
In addition, I personally dislike doing pencil-and-paper proofs of code snippets that I first have to extract manually from wherever they do their work. I would much prefer to see the relevant algorithms factored out into their own source files (if they are subtle, as these are). I like to reason from clean formulas, not from formulas I had to tear away from their duty stations in the optimizer.
I think we need a separate file (header and source) for this set of algorithms, something like rangeInference.[ch]pp. The API points would take one or two inputs, each represented as a full bundle of range data (hi/lo/uh/ul/zs/os). They would also take an operator name (not one API point per operator, please, but maybe one for unaries and one for binaries), and they would pass 64-bit ints plus length indicators rather than duplicating the logic for each primitive size. For naming the operators I suggest either opto node names (Mul, not MulL) or, as a non-opto-centric choice, the operator names from the Panama Vector API (JEP 460 at last sighting).
The API point would symbolically execute the named op over the given symbolic (ranged) arguments, returning a new set of ranges (via C++ references, I assume, or maybe a “little struct” for range tuples).
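For concreteness, the API surface might look roughly like this. All names here are invented for illustration, with only And filled in; the real rangeInference.[ch]pp would cover the full operator set and normalize all six fields against each other:

```cpp
#include <cstdint>

// Invented sketch of the proposed rangeInference API (names are mine, not
// HotSpot's). One bundle of range data per value, 64-bit fields plus an
// explicit bit width, and one entry point per arity.
struct RangeBundle {
  int64_t  lo, hi;       // signed bounds
  uint64_t ulo, uhi;     // unsigned bounds
  uint64_t zeros, ones;  // known-zero / known-one bit masks
};

enum class BinaryOp { Add, Sub, And, Or, Xor };

// Only And is filled in, as a minimal illustration: known bits combine
// exactly, and they imply unsigned bounds ulo = ones, uhi = ~zeros (masked).
// Signed bounds are left maximally wide for brevity; a real implementation
// would tighten them as part of normalization.
RangeBundle eval_binary(BinaryOp op, const RangeBundle& a,
                        const RangeBundle& b, int bits) {
  uint64_t mask = (bits == 64) ? ~uint64_t(0) : (uint64_t(1) << bits) - 1;
  int64_t lo = (bits == 64) ? INT64_MIN
                            : -(int64_t)(uint64_t(1) << (bits - 1));
  int64_t hi = (bits == 64) ? INT64_MAX
                            : (int64_t)((uint64_t(1) << (bits - 1)) - 1);
  RangeBundle r{lo, hi, 0, mask, 0, 0};
  if (op == BinaryOp::And) {
    r.zeros = (a.zeros | b.zeros) & mask;
    r.ones  = a.ones & b.ones;
    r.ulo   = r.ones;
    r.uhi   = ~r.zeros & mask;
  }
  return r;
}
```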
The switch-over-op at the heart of it would be carefully written for clarity, to prove (once and for all) that we know what we are talking about. The gtest would work like my BitTypes demo program, exercising all of the operations through a systematic set of concrete input values and (then) ranges containing those values. We would make sure that (a) the concrete result never “escapes” the predicted range (for any of a series of representative containing ranges). And also, when possible, (b) that the predicted range is “tight” in some good way, probably that the concrete result meets each of its inclusive bounds (for each range data point), for at least one set of concrete inputs.
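In that style, a self-contained soundness-and-tightness harness for a single transfer function (unsigned Add over 8-bit values; my own sketch with invented names, not the PR's code) might look like:

```cpp
#include <cstdint>

// Predicted unsigned bounds of (x + y) & 0xff for x in [alo, ahi] and
// y in [blo, bhi], all bounds assumed <= 255 and inclusive.
struct URange { uint32_t ulo, uhi; };

URange add_urange8(URange a, URange b) {
  uint32_t lo = a.ulo + b.ulo, hi = a.uhi + b.uhi;
  if ((lo >> 8) == (hi >> 8)) {
    return { lo & 0xff, hi & 0xff };   // no wrap boundary crossed
  }
  return { 0, 0xff };                  // sum range crosses a multiple of 256
}

// (a) soundness: no concrete result escapes the predicted range;
// (b) tightness: both predicted bounds are met by some concrete inputs.
bool check_add_urange8(URange a, URange b) {
  URange r = add_urange8(a, b);
  bool lo_hit = false, hi_hit = false;
  for (uint32_t x = a.ulo; x <= a.uhi; x++) {
    for (uint32_t y = b.ulo; y <= b.uhi; y++) {
      uint32_t v = (x + y) & 0xff;
      if (v < r.ulo || v > r.uhi) return false;  // escaped the prediction
      lo_hit |= (v == r.ulo);
      hi_hit |= (v == r.uhi);
    }
  }
  return lo_hit && hi_hit;
}
```

The gtest would then sweep a systematic set of input ranges (including ones that straddle the wrap boundary) and assert the checker for each pair, exactly the (a)/(b) discipline described above.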
I think this kind of testing work is necessary to ensure that our system can be understood and maintained by more than a few people on the planet. And (although I came up with some of the formulas) I’d personally prefer it that way as well. It would be nice to know that if someone bumps an operator or drops a parenthesis, a unit test will catch the bug, rather than relying on a PhD student to re-analyze the code or (more likely) on a bug showing up long after the system regression tests.
The benefits of these optimizations are, basically, that we can push around many more inferences about bitwise operations and unsigned comparisons, beyond what the original C2 design envisioned for inferences about (signed) arithmetic operations.
This in turn will give us CC_NE (a nice thing, finally). Moreover, it will help us infer the low-order bits of loop indexes, something useful for proving alignment (in Panama at least). If applied to the lanes of vector types, it will give us a much more robust set of information about what’s going on in vectorized code.
So I’m all for it! But, with unit tests, please…
-------------
PR Comment: https://git.openjdk.org/jdk/pull/15440#issuecomment-1901609719
More information about the hotspot-compiler-dev mailing list