RFR: 8282365: Optimize divideUnsigned and remainderUnsigned for constants [v4]
Vladimir Kozlov
kvn at openjdk.org
Fri Sep 30 23:41:21 UTC 2022
On Fri, 30 Sep 2022 20:20:43 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:
>> The question is: do we complicate the code for very rare case (negative variable devisor with unknown type during compilation) or we slightly improve common case (by remove check for negative devisor at runtime) but introducing regression for that rare case?
>>
>> May be this discussion is mute if regression is small. Please, get data for negative divisor in your JMH test before we decide what to do.
>>
>> I assume `divl` instruction produces correct result for negative divisor - we will be using it now with your changes.
>
> The JMH results and the source are below. The overhead of a branch in the common path is less noticeable in the cases of long divisions where the cost of the division is much larger. However, my CPU is based on Skylake architecture with really expensive `divq` instruction, and according to my knowledge, other architectures (Intel from Icelake, AMD Zen, and maybe AArch) may have a `divq` with a cost similar to a `divl`.
>
> Baseline This patch
> Benchmark (BUFFER_SIZE) (type) Mode Cnt Score Error Score Error Units
> IntegerDivMod.testDivideUnsignedNaive 1024 pos avgt 25 2549.410 ± 45.532 2162.241 ± 38.560 ns/op
> IntegerDivMod.testDivideUnsignedNaive 1024 neg avgt 25 1115.032 ± 33.985 2176.231 ± 23.436 ns/op
> IntegerDivMod.testDivideUnsignedNaive 1024 mix avgt 25 1814.831 ± 28.754 2185.103 ± 47.785 ns/op
> IntegerDivMod.testDivideUnsignedOptimised 1024 pos avgt 25 2703.873 ± 47.586 2345.205 ± 35.003 ns/op
> IntegerDivMod.testDivideUnsignedOptimised 1024 neg avgt 25 926.830 ± 16.540 922.096 ± 17.555 ns/op
> IntegerDivMod.testDivideUnsignedOptimised 1024 mix avgt 25 1877.774 ± 204.858 1547.250 ± 120.576 ns/op
> LongDivMod.testDivideUnsignedNaive 1024 pos avgt 25 7245.008 ± 92.670 7239.190 ± 67.525 ns/op
> LongDivMod.testDivideUnsignedNaive 1024 neg avgt 25 987.773 ± 13.338 7176.254 ± 68.509 ns/op
> LongDivMod.testDivideUnsignedNaive 1024 mix avgt 25 4537.198 ± 63.926 7189.320 ± 58.801 ns/op
> LongDivMod.testDivideUnsignedOptimised 1024 pos avgt 25 7787.518 ± 74.144 7379.395 ± 63.933 ns/op
> LongDivMod.testDivideUnsignedOptimised 1024 neg avgt 25 617.233 ± 8.030 615.633 ± 5.545 ns/op
> LongDivMod.testDivideUnsignedOptimised 1024 mix avgt 25 4532.729 ± 57.028 4440.418 ± 42.612 ns/op
>
> The benchmark is derived from the existing ones:
>
> public class IntegerDivMod {
>
> RandomGenerator randomGenerator;
>
> @Param({"pos", "neg", "mix"})
> String type;
>
> @Param({"1024"})
> int BUFFER_SIZE;
> int[] dividends, divisors, quotients, remainders;
>
> @Setup
> public void setup() {
> dividends = new int[BUFFER_SIZE];
> divisors = new int[BUFFER_SIZE];
> quotients = new int[BUFFER_SIZE];
> remainders = new int[BUFFER_SIZE];
> var rng = RandomGenerator.getDefault();
> for (int i = 0; i < BUFFER_SIZE; i++) {
> dividends[i] = rng.nextInt();
> int divisor = rng.nextInt();
> divisor = divisor == 0 ? 1 : divisor;
> if (type.equals("pos")) {
> if (divisor == Integer.MIN_VALUE) {
> divisor = Integer.MAX_VALUE;
> } else if (divisor < 0) {
> divisor = -divisor;
> }
> } else if (type.equals("neg")) {
> if (divisor > 0) {
> divisor = -divisor;
> }
> }
> divisors[i] = divisor;
> }
> }
>
> @Benchmark
> public void testDivideUnsignedNaive() {
> for (int i = 0; i < BUFFER_SIZE; i++) {
> quotients[i] = Integer.divideUnsigned(dividends[i], divisors[i]);
> }
> }
>
> @Benchmark
> public void testDivideUnsignedOptimised() {
> for (int i = 0; i < BUFFER_SIZE; i++) {
> int divisor = divisors[i];
> if (divisor < 0) {
> quotients[i] = Integer.compareUnsigned(dividends[i], divisor) >= 0 ? 1 : 0;
> } else if (divisor > 0) {
> quotients[i] = Integer.divideUnsigned(dividends[i], divisor);
> } else {
> throw new ArithmeticException();
> }
> }
> }
> }
Even ignoring Long case the difference is almost 2x for Integer negative case vs ~15% improvement for positive. It is not good tradeoff.
RFE is filed for constants. I don't think you need to touch variable case.
-------------
PR: https://git.openjdk.org/jdk/pull/9947
More information about the hotspot-compiler-dev
mailing list