RFR: 8282365: Optimize divideUnsigned and remainderUnsigned for constants [v13]

Fri Dec 9 20:27:09 UTC 2022

On Tue, 29 Nov 2022 14:29:27 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:

>> This patch implements idealisation for unsigned divisions to change a division by a constant into a series of multiplication and shift. I also change the idealisation of `DivI` to get a more efficient series when the magic constant overflows an int32.
>> 
>> In general, the idea behind a signed division transformation is that for a positive constant `d`, we would need to find constants `c` and `m` so that:
>> 
>>     floor(x / d) = floor(x * c / 2**m) for 0 < x < 2**(N - 1) (1)
>>     ceil(x / d) = floor(x * c / 2**m) + 1 for -2**(N - 1) <= x < 0 (2)
>> 
>> The implementation in the original book takes into consideration that the machine may not be able to perform the full multiplication `x * c`, so the constant overflow and we need to add back the dividend as in `DivLNode::Ideal` cases. However, for int32 division, `x * c` cannot overflow an int64. As a result, it is always feasible to just calculate the product and shift the result.
>> 
>> For unsigned multiplication, the situation is somewhat trickier because the condition needs to be twice as strong (the condition (1) and (2) above are mostly the same). This results in the magic constant `c` calculated based on the method presented in Hacker's Delight by Henry S. Warren, Jr. may overflow an uintN. For int division, we can depend on the theorem devised by Arch D. Robison in N-Bit Unsigned Division Via N-Bit Multiply-Add, which states that there exists either:
>> 
>>     c1 in uint32 and m1, such that floor(x / d) = floor(x * c1 / 2**m1) for 0 < x < 2**32 (3)
>>     c2 in uint32 and m2, such that floor(x / d) = floor((x + 1) * c2 / 2**m2) for 0 < x < 2**32 (4)
>> 
>> which means that either `x * c1` never overflows an uint64 or `(x + 1) * c2` never overflows an uint64. And we can perform a full multiplication.
>> 
>> For longs, there is no way to do a full multiplication so we do some basic transformations to achieve a computable formula. The details I have written as comments in the overflow case.
>> 
>> More tests are added to cover the possible patterns.
>> 
>> Please take a look and have some reviews. Thank you very much.
>
> Quan Anh Mai has updated the pull request incrementally with seven additional commits since the last revision:
> 
>  - change julong to uint64_t
>  - uint
>  - various fixes
>  - add constexpr
>  - add constexpr
>  - add message to static_assert
>  - missing powerOfTwo.hpp

I like the way this is shaping up.  Thank you for doing the gtests; they provide significant protection against hard-to-track JIT bugs.

I suggest hardening the gtests a litte more, by actually performing the arithmetic in question in the gtest.

For example, test_magic_int_divide could loop through a few thousand numbers performing signed division both ways and verifying that the answers agree.

Suggestion for test numbers:  critical values (min, max, zero, selected multiples of d), plus or minus 0, 1, … N (some small N like 3).  Selected multiples of d should span the full dynamic range, so 0 +/- dK, max - dK, min + dK, maybe for about 20 K in 0,1,2,4,8,16,20,…,56,60.  That's about 300 numbers.  Maybe also sums of pairs of the preceding multiples (there are about 200 such pairs), offset by small N from zero, min, and max, for a total of about 3000 division operations to verify.

The gtest should also try a an arbitrary set of additional d values, without the need to quote a magic constant obtained from a C++ compiler.  If we try a few hundred extra d values (of varying shapes and sizes), the gtest will complete after doing only a few million divisions.

-------------

PR: https://git.openjdk.org/jdk/pull/9947