RFR: 8282365: Consolidate and improve division by constant idealizations [v33]

Mon Oct 30 10:53:00 UTC 2023

On Sun, 29 Oct 2023 16:14:29 GMT, Raffaello Giulietti <rgiulietti at openjdk.org> wrote:

>> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 74 commits:
>> 
>>  - fix proof
>>  - Merge branch 'master' into unsignedDiv
>>  - fix assert macro, benchmarks
>>  - comment styles
>>  - disable test with Xcomp
>>  - remove verify
>>  - fix x86 test
>>  - more rigorous control
>>  - verify the effectiveness of test
>>  - require x64
>>  - ... and 64 more: https://git.openjdk.org/jdk/compare/5224e979...529bd0f9
>
> Please take a look [here](https://github.com/rgiulietti/docu/blob/main/div.md) for an alternative.
> 
> It covers the signed `int` case where the divisor is neither a power of 2, nor a negated power of 2.
> 
> The unsigned case, if I can find a similar scheme, will follow later.

@rgiulietti I can see that your approach is derived from the idea that given `m` such that `2^m > d.2^W + 1`, there always exists `c` such that `2^m < c.d < 2^m + 2^m / v` with `v` being the largest positive integer not larger than `2^W` that `v + 1` is divisible by `d`. This will not work for unsigned integer because the upper bound is approximately twice as large, which leads to `m` being `W + ceil(log2(d))` and `c` being bounded by `2^(W + 1)`. A solution to this presented by Robison, A.D. is that we try the previous value of `m`, which is `W + floor(log2(d))`, if there is a value of `c` satisfying `2^m < c.d < 2^m + 2^m / v` then we can use that, otherwise we round down `1/d` instead of rounding up (`c/2^m` is essentially the rounding result of `1/d` up to `m` binary places).

I think that this may lead to faster calculation due to us not having to iterate each value of `m` and test if a corresponding value of `c` exists. However, there are some limitations:

- It may not give the smallest satisfying value of `c` and `m`. This may be important as a small enough value of `c` can limit the calculation magnitude such that it can be done using W-bit integer arithmetic, enable vectorisation at a higher level of parallelism.
- It is not applicable for 64-bit division since we need 128-bit division to find `c`. This leads to the need to calculate by induction, which is essentially what Warren does. I also use induction to calculate the values of `c` and `m`.

Thanks.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/9947#issuecomment-1784932619