RFR: 8282365: Consolidate and improve division by constant idealizations [v33]

Raffaello Giulietti rgiulietti at openjdk.org
Mon Oct 30 11:47:59 UTC 2023


On Tue, 24 Oct 2023 13:12:20 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:

>> This patch implements idealisation for unsigned divisions to change a division by a constant into a series of multiplication and shift. I also change the idealisation of `DivI` to get a more efficient series when the magic constant overflows an int32.
>> 
>> In general, the idea behind a signed division transformation is that for a positive constant `d`, we would need to find constants `c` and `m` so that:
>> 
>>     floor(x / d) = floor(x * c / 2**m) for 0 < x < 2**(N - 1) (1)
>>     ceil(x / d) = floor(x * c / 2**m) + 1 for -2**(N - 1) <= x < 0 (2)
>> 
>> The implementation in the original book takes into consideration that the machine may not be able to perform the full multiplication `x * c`, so the constant overflow and we need to add back the dividend as in `DivLNode::Ideal` cases. However, for int32 division, `x * c` cannot overflow an int64. As a result, it is always feasible to just calculate the product and shift the result.
>> 
>> For unsigned multiplication, the situation is somewhat trickier because the condition needs to be twice as strong (the condition (1) and (2) above are mostly the same). This results in the magic constant `c` calculated based on the method presented in Hacker's Delight by Henry S. Warren, Jr. may overflow an uintN. For int division, we can depend on the theorem devised by Arch D. Robison in N-Bit Unsigned Division Via N-Bit Multiply-Add, which states that there exists either:
>> 
>>     c1 in uint32 and m1, such that floor(x / d) = floor(x * c1 / 2**m1) for 0 < x < 2**32 (3)
>>     c2 in uint32 and m2, such that floor(x / d) = floor((x + 1) * c2 / 2**m2) for 0 < x < 2**32 (4)
>> 
>> which means that either `x * c1` never overflows an uint64 or `(x + 1) * c2` never overflows an uint64. And we can perform a full multiplication.
>> 
>> For longs, there is no way to do a full multiplication so we do some basic transformations to achieve a computable formula. The details I have written as comments in the overflow case.
>> 
>> More tests are added to cover the possible patterns.
>> 
>> Please take a look and have some reviews. Thank you very much.
>
> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 74 commits:
> 
>  - fix proof
>  - Merge branch 'master' into unsignedDiv
>  - fix assert macro, benchmarks
>  - comment styles
>  - disable test with Xcomp
>  - remove verify
>  - fix x86 test
>  - more rigorous control
>  - verify the effectiveness of test
>  - require x64
>  - ... and 64 more: https://git.openjdk.org/jdk/compare/5224e979...529bd0f9

The aim of my proposal is to come up with very simple calculations and proofs that one can understand in a short time.

Yes, the unsigned case is harder not because the theory is limited, but because the multiplication can exceed $2W$-bit values. I would need more time to come up with a scheme for unsigned `int`.

I have no experience with vectorization or low-level code, but I don't see how to perform the computations in $W$-bit arithmetic without severely restricting the range of the dividend $x$. That is, I think that $2 W$-bit arithmetic is intrinsically unavoidable for the full range of the dividend, even if you only care about the upper half of the product $x c$.

During the compilation step, one could find the maximal integer $k$ such that $2^k$ divides $c$, and reduce both $c$ and $m$

int k = Integer.numberOfTrailingZeros((int) c);
c >>>= k;
m -= k;

In the alternative algorithm for the `int` case, after this reduction the inequality $m > W$ still holds, but $c$ might turn out to meet $c < 2^{W-1}$ (a signed `int`). This might help in emitting better code for the multiplication.

And yes, for `long` one would need to implement a 128-bit division with known dividend for the compile-time preparation.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/9947#issuecomment-1785014147


More information about the hotspot-compiler-dev mailing list