RFR: 8282365: Optimize divideUnsigned and remainderUnsigned for constants [v4]

Fri Sep 30 23:41:21 UTC 2022

On Fri, 30 Sep 2022 20:20:43 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:

>> The question is: do we complicate the code for very rare case (negative variable devisor with unknown type during compilation) or we slightly improve common case (by remove check for negative devisor at runtime) but introducing regression for that rare case?
>> 
>> May be this discussion is mute if regression is small. Please, get data for negative divisor in your JMH test before we decide what to do.
>> 
>> I assume `divl` instruction produces correct result for negative divisor - we will be using it now with your changes.
>
> The JMH results and the source are below. The overhead of a branch in the common path is less noticeable in the cases of long divisions where the cost of the division is much larger. However, my CPU is based on Skylake architecture with really expensive `divq` instruction, and according to my knowledge, other architectures (Intel from Icelake, AMD Zen, and maybe AArch) may have a `divq` with a cost similar to a `divl`.
> 
>                                                                                        Baseline           This patch
>     Benchmark                                  (BUFFER_SIZE)  (type)  Mode  Cnt     Score     Error     Score     Error  Units
>     IntegerDivMod.testDivideUnsignedNaive               1024     pos  avgt   25  2549.410 ±  45.532  2162.241 ±  38.560  ns/op
>     IntegerDivMod.testDivideUnsignedNaive               1024     neg  avgt   25  1115.032 ±  33.985  2176.231 ±  23.436  ns/op
>     IntegerDivMod.testDivideUnsignedNaive               1024     mix  avgt   25  1814.831 ±  28.754  2185.103 ±  47.785  ns/op
>     IntegerDivMod.testDivideUnsignedOptimised           1024     pos  avgt   25  2703.873 ±  47.586  2345.205 ±  35.003  ns/op
>     IntegerDivMod.testDivideUnsignedOptimised           1024     neg  avgt   25   926.830 ±  16.540   922.096 ±  17.555  ns/op
>     IntegerDivMod.testDivideUnsignedOptimised           1024     mix  avgt   25  1877.774 ± 204.858  1547.250 ± 120.576  ns/op
>     LongDivMod.testDivideUnsignedNaive                  1024     pos  avgt   25  7245.008 ±  92.670  7239.190 ±  67.525  ns/op
>     LongDivMod.testDivideUnsignedNaive                  1024     neg  avgt   25   987.773 ±  13.338  7176.254 ±  68.509  ns/op
>     LongDivMod.testDivideUnsignedNaive                  1024     mix  avgt   25  4537.198 ±  63.926  7189.320 ±  58.801  ns/op
>     LongDivMod.testDivideUnsignedOptimised              1024     pos  avgt   25  7787.518 ±  74.144  7379.395 ±  63.933  ns/op
>     LongDivMod.testDivideUnsignedOptimised              1024     neg  avgt   25   617.233 ±   8.030   615.633 ±   5.545  ns/op
>     LongDivMod.testDivideUnsignedOptimised              1024     mix  avgt   25  4532.729 ±  57.028  4440.418 ±  42.612  ns/op
> 
> The benchmark is derived from the existing ones:
> 
>     public class IntegerDivMod {
> 
>         RandomGenerator randomGenerator;
> 
>         @Param({"pos", "neg", "mix"})
>         String type;
> 
>         @Param({"1024"})
>         int BUFFER_SIZE;
>         int[] dividends, divisors, quotients, remainders;
> 
>         @Setup
>         public void setup() {
>             dividends = new int[BUFFER_SIZE];
>             divisors = new int[BUFFER_SIZE];
>             quotients =  new int[BUFFER_SIZE];
>             remainders =  new int[BUFFER_SIZE];
>             var rng = RandomGenerator.getDefault();
>             for (int i = 0; i < BUFFER_SIZE; i++) {
>                 dividends[i] = rng.nextInt();
>                 int divisor = rng.nextInt();
>                 divisor = divisor == 0 ? 1 : divisor;
>                 if (type.equals("pos")) {
>                     if (divisor == Integer.MIN_VALUE) {
>                         divisor = Integer.MAX_VALUE;
>                     } else if (divisor < 0) {
>                         divisor = -divisor;
>                     }
>                 } else if (type.equals("neg")) {
>                     if (divisor > 0) {
>                         divisor = -divisor;
>                     }
>                 }
>                 divisors[i] = divisor;
>             }
>         }
> 
>         @Benchmark
>         public void testDivideUnsignedNaive() {
>             for (int i = 0; i < BUFFER_SIZE; i++) {
>                 quotients[i] = Integer.divideUnsigned(dividends[i], divisors[i]);
>             }
>         }
> 
>         @Benchmark
>         public void testDivideUnsignedOptimised() {
>             for (int i = 0; i < BUFFER_SIZE; i++) {
>                 int divisor = divisors[i];
>                 if (divisor < 0) {
>                     quotients[i] = Integer.compareUnsigned(dividends[i], divisor) >= 0 ? 1 : 0;
>                 } else if (divisor > 0) {
>                     quotients[i] = Integer.divideUnsigned(dividends[i], divisor);
>                 } else {
>                     throw new ArithmeticException();
>                 }
>             }
>         }
>     }

Even ignoring Long case the difference is almost 2x for Integer negative case vs ~15% improvement for positive. It is not good tradeoff.

RFE is filed for constants. I don't think you need to touch variable case.

-------------

PR: https://git.openjdk.org/jdk/pull/9947