RFR: 8282365: Optimize divideUnsigned and remainderUnsigned for constants [v3]

Tue Aug 23 01:39:36 UTC 2022

On Tue, 23 Aug 2022 01:21:07 GMT, Quan Anh Mai <duke at openjdk.org> wrote:

>> This patch implements idealisation for unsigned divisions to change a division by a constant into a series of multiplication and shift. I also change the idealisation of `DivI` to get a more efficient series when the magic constant overflows an int32.
>> 
>> In general, the idea behind a signed division transformation is that for a positive constant `d`, we would need to find constants `c` and `m` so that:
>> 
>>     floor(x / d) = floor(x * c / 2**m) for 0 < x < 2**(N - 1) (1)
>>     ceil(x / d) = floor(x * c / 2**m) + 1 for -2**(N - 1) <= x < 0 (2)
>> 
>> The implementation in the original book takes into consideration that the machine may not be able to perform the full multiplication `x * c`, so the constant overflow and we need to add back the dividend as in `DivLNode::Ideal` cases. However, for int32 division, `x * c` cannot overflow an int64. As a result, it is always feasible to just calculate the product and shift the result.
>> 
>> For unsigned multiplication, the situation is somewhat trickier because the condition needs to be twice as strong (the condition (1) and (2) above are mostly the same). This results in the magic constant `c` calculated based on the method presented in Hacker's Delight by Henry S. Warren, Jr. may overflow an uintN. For int division, we can depend on the theorem devised by Arch D. Robison in N-Bit Unsigned Division Via N-Bit Multiply-Add, which states that there exists either:
>> 
>>     c1 in uint32 and m1, such that floor(x / d) = floor(x * c1 / 2**m1) for 0 < x < 2**32 (3)
>>     c2 in uint32 and m2, such that floor(x / d) = floor((x + 1) * c2 / 2**m2) for 0 < x < 2**32 (4)
>> 
>> which means that either `x * c1` never overflows an uint64 or `(x + 1) * c2` never overflows an uint64. And we can perform a full multiplication.
>> 
>> For longs, there is no way to do a full multiplication so we do some basic transformations to achieve a computable formula. The details I have written as comments in the overflow case.
>> 
>> More tests are added to cover the possible patterns.
>> 
>> Please take a look and have some reviews. Thank you very much.
>
> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits:
> 
>  - Merge branch 'master' into unsignedDiv
>  - micro
>  - whitespace
>  - whitespace
>  - large divisor
>  - fix build
>  - fix 32-bit
>  - constant folding, mod idealisation, tests
>  - 32 bit
>  - backend
>  - ... and 7 more: https://git.openjdk.org/jdk/compare/38a81913...3052b4ee

The updated benchmark shows improvements in unsigned divisions by a constant, also signed divisions where the magic constant lies outside the limit of an i32 also show improvements in comparison with the baseline:

                                                                                  Before              After
    Benchmark                                    (BUFFER_SIZE)  Mode  Cnt     Score     Error     Score    Error  Units
    IntegerDivMod.testDivideConstant                      1024  avgt   15  1017.464 ±  43.674   899.816 ± 14.511  ns/op
    IntegerDivMod.testDivideUnsignedConstant              1024  avgt   15  2125.401 ±  54.692   762.749 ± 17.280  ns/op
    IntegerDivMod.testRemainderUnsignedConstant           1024  avgt   15  2155.461 ±  60.759   918.103 ± 10.772  ns/op
    LongDivMod.testDivideUnsignedConstant                 1024  avgt   15  6668.891 ±  81.072   949.033 ± 15.485  ns/op
    LongDivMod.testRemainderUnsignedConstant              1024  avgt   15  7275.035 ± 135.520  1355.340 ± 17.853  ns/op

-------------

PR: https://git.openjdk.org/jdk/pull/9947