RFR: 8282365: Optimize divideUnsigned and remainderUnsigned for constants [v4]

Quan Anh Mai qamai at openjdk.org
Fri Sep 30 20:24:31 UTC 2022


On Thu, 29 Sep 2022 19:32:03 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

>> Thanks for your explanation. I believe this can be achieved by changing the name of the intrinsic to `divideUnsigned0` and implementing `Integer/Long::divideUnsigned` as follow:
>> 
>>     int divideUnsigned(int dividend, int divisor) {
>>         if (divisor > 0) {
>>             return divideUnsigned0(dividend, divisor);
>>         } else if (divisor < 0) {
>>             return compareUnsigned(dividend, divisor) >= 0 ? 1 : 0;
>>         } else {
>>             throw new ArithmeticException();
>>         }
>>     }
>> 
>> However, I think that this is premature optimisation as:
>> 
>> - Integer values are heavily biased toward 0, in unsigned contexts, a negative divisor is a really large divisor, which is highly unlikely. As a result, this results in redundant code for the majority of cases.
>> - This optimisation can be implemented purely in Java, which means that the users of this method can implement it themselves if it is known that the distribution of divisors is different from the general cases.
>> 
>> If you don't agree then I would implement the optimisation in Java as presented above. Thanks very much.
>
> The question is: do we complicate the code for very rare case (negative variable devisor with unknown type during compilation) or we slightly improve common case (by remove check for negative devisor at runtime) but introducing regression for that rare case?
> 
> May be this discussion is mute if regression is small. Please, get data for negative divisor in your JMH test before we decide what to do.
> 
> I assume `divl` instruction produces correct result for negative divisor - we will be using it now with your changes.

The JMH results and the source are below. The overhead of a branch in the common path is less noticeable in the cases of long divisions where the cost of the division is much larger. However, my CPU is based on Skylake architecture with really expensive `divq` instruction, and according to my knowledge, other architectures (Intel from Icelake, AMD Zen, and maybe AArch) may have a `divq` with a cost similar to a `divl`.

                                                                                       Baseline           This patch
    Benchmark                                  (BUFFER_SIZE)  (type)  Mode  Cnt     Score     Error     Score     Error  Units
    IntegerDivMod.testDivideUnsignedNaive               1024     pos  avgt   25  2549.410 ±  45.532  2162.241 ±  38.560  ns/op
    IntegerDivMod.testDivideUnsignedNaive               1024     neg  avgt   25  1115.032 ±  33.985  2176.231 ±  23.436  ns/op
    IntegerDivMod.testDivideUnsignedNaive               1024     mix  avgt   25  1814.831 ±  28.754  2185.103 ±  47.785  ns/op
    IntegerDivMod.testDivideUnsignedOptimised           1024     pos  avgt   25  2703.873 ±  47.586  2345.205 ±  35.003  ns/op
    IntegerDivMod.testDivideUnsignedOptimised           1024     neg  avgt   25   926.830 ±  16.540   922.096 ±  17.555  ns/op
    IntegerDivMod.testDivideUnsignedOptimised           1024     mix  avgt   25  1877.774 ± 204.858  1547.250 ± 120.576  ns/op
    LongDivMod.testDivideUnsignedNaive                  1024     pos  avgt   25  7245.008 ±  92.670  7239.190 ±  67.525  ns/op
    LongDivMod.testDivideUnsignedNaive                  1024     neg  avgt   25   987.773 ±  13.338  7176.254 ±  68.509  ns/op
    LongDivMod.testDivideUnsignedNaive                  1024     mix  avgt   25  4537.198 ±  63.926  7189.320 ±  58.801  ns/op
    LongDivMod.testDivideUnsignedOptimised              1024     pos  avgt   25  7787.518 ±  74.144  7379.395 ±  63.933  ns/op
    LongDivMod.testDivideUnsignedOptimised              1024     neg  avgt   25   617.233 ±   8.030   615.633 ±   5.545  ns/op
    LongDivMod.testDivideUnsignedOptimised              1024     mix  avgt   25  4532.729 ±  57.028  4440.418 ±  42.612  ns/op

The benchmark is derived from the existing ones:

    public class IntegerDivMod {

        RandomGenerator randomGenerator;

        @Param({"pos", "neg", "mix"})
        String type;

        @Param({"1024"})
        int BUFFER_SIZE;
        int[] dividends, divisors, quotients, remainders;

        @Setup
        public void setup() {
            dividends = new int[BUFFER_SIZE];
            divisors = new int[BUFFER_SIZE];
            quotients =  new int[BUFFER_SIZE];
            remainders =  new int[BUFFER_SIZE];
            var rng = RandomGenerator.getDefault();
            for (int i = 0; i < BUFFER_SIZE; i++) {
                dividends[i] = rng.nextInt();
                int divisor = rng.nextInt();
                divisor = divisor == 0 ? 1 : divisor;
                if (type.equals("pos")) {
                    if (divisor == Integer.MIN_VALUE) {
                        divisor = Integer.MAX_VALUE;
                    } else if (divisor < 0) {
                        divisor = -divisor;
                    }
                } else if (type.equals("neg")) {
                    if (divisor > 0) {
                        divisor = -divisor;
                    }
                }
                divisors[i] = divisor;
            }
        }

        @Benchmark
        public void testDivideUnsignedNaive() {
            for (int i = 0; i < BUFFER_SIZE; i++) {
                quotients[i] = Integer.divideUnsigned(dividends[i], divisors[i]);
            }
        }

        @Benchmark
        public void testDivideUnsignedOptimised() {
            for (int i = 0; i < BUFFER_SIZE; i++) {
                int divisor = divisors[i];
                if (divisor < 0) {
                    quotients[i] = Integer.compareUnsigned(dividends[i], divisor) >= 0 ? 1 : 0;
                } else if (divisor > 0) {
                    quotients[i] = Integer.divideUnsigned(dividends[i], divisor);
                } else {
                    throw new ArithmeticException();
                }
            }
        }
    }

-------------

PR: https://git.openjdk.org/jdk/pull/9947


More information about the hotspot-compiler-dev mailing list