RFR: 8282365: Optimize divideUnsigned and remainderUnsigned for constants [v4]
Quan Anh Mai
qamai at openjdk.org
Fri Sep 30 20:24:31 UTC 2022
On Thu, 29 Sep 2022 19:32:03 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:
>> Thanks for your explanation. I believe this can be achieved by changing the name of the intrinsic to `divideUnsigned0` and implementing `Integer/Long::divideUnsigned` as follow:
>>
>> int divideUnsigned(int dividend, int divisor) {
>> if (divisor > 0) {
>> return divideUnsigned0(dividend, divisor);
>> } else if (divisor < 0) {
>> return compareUnsigned(dividend, divisor) >= 0 ? 1 : 0;
>> } else {
>> throw new ArithmeticException();
>> }
>> }
>>
>> However, I think that this is premature optimisation as:
>>
>> - Integer values are heavily biased toward 0, in unsigned contexts, a negative divisor is a really large divisor, which is highly unlikely. As a result, this results in redundant code for the majority of cases.
>> - This optimisation can be implemented purely in Java, which means that the users of this method can implement it themselves if it is known that the distribution of divisors is different from the general cases.
>>
>> If you don't agree then I would implement the optimisation in Java as presented above. Thanks very much.
>
> The question is: do we complicate the code for very rare case (negative variable devisor with unknown type during compilation) or we slightly improve common case (by remove check for negative devisor at runtime) but introducing regression for that rare case?
>
> May be this discussion is mute if regression is small. Please, get data for negative divisor in your JMH test before we decide what to do.
>
> I assume `divl` instruction produces correct result for negative divisor - we will be using it now with your changes.
The JMH results and the source are below. The overhead of a branch in the common path is less noticeable in the cases of long divisions where the cost of the division is much larger. However, my CPU is based on Skylake architecture with really expensive `divq` instruction, and according to my knowledge, other architectures (Intel from Icelake, AMD Zen, and maybe AArch) may have a `divq` with a cost similar to a `divl`.
Baseline This patch
Benchmark (BUFFER_SIZE) (type) Mode Cnt Score Error Score Error Units
IntegerDivMod.testDivideUnsignedNaive 1024 pos avgt 25 2549.410 ± 45.532 2162.241 ± 38.560 ns/op
IntegerDivMod.testDivideUnsignedNaive 1024 neg avgt 25 1115.032 ± 33.985 2176.231 ± 23.436 ns/op
IntegerDivMod.testDivideUnsignedNaive 1024 mix avgt 25 1814.831 ± 28.754 2185.103 ± 47.785 ns/op
IntegerDivMod.testDivideUnsignedOptimised 1024 pos avgt 25 2703.873 ± 47.586 2345.205 ± 35.003 ns/op
IntegerDivMod.testDivideUnsignedOptimised 1024 neg avgt 25 926.830 ± 16.540 922.096 ± 17.555 ns/op
IntegerDivMod.testDivideUnsignedOptimised 1024 mix avgt 25 1877.774 ± 204.858 1547.250 ± 120.576 ns/op
LongDivMod.testDivideUnsignedNaive 1024 pos avgt 25 7245.008 ± 92.670 7239.190 ± 67.525 ns/op
LongDivMod.testDivideUnsignedNaive 1024 neg avgt 25 987.773 ± 13.338 7176.254 ± 68.509 ns/op
LongDivMod.testDivideUnsignedNaive 1024 mix avgt 25 4537.198 ± 63.926 7189.320 ± 58.801 ns/op
LongDivMod.testDivideUnsignedOptimised 1024 pos avgt 25 7787.518 ± 74.144 7379.395 ± 63.933 ns/op
LongDivMod.testDivideUnsignedOptimised 1024 neg avgt 25 617.233 ± 8.030 615.633 ± 5.545 ns/op
LongDivMod.testDivideUnsignedOptimised 1024 mix avgt 25 4532.729 ± 57.028 4440.418 ± 42.612 ns/op
The benchmark is derived from the existing ones:
public class IntegerDivMod {
RandomGenerator randomGenerator;
@Param({"pos", "neg", "mix"})
String type;
@Param({"1024"})
int BUFFER_SIZE;
int[] dividends, divisors, quotients, remainders;
@Setup
public void setup() {
dividends = new int[BUFFER_SIZE];
divisors = new int[BUFFER_SIZE];
quotients = new int[BUFFER_SIZE];
remainders = new int[BUFFER_SIZE];
var rng = RandomGenerator.getDefault();
for (int i = 0; i < BUFFER_SIZE; i++) {
dividends[i] = rng.nextInt();
int divisor = rng.nextInt();
divisor = divisor == 0 ? 1 : divisor;
if (type.equals("pos")) {
if (divisor == Integer.MIN_VALUE) {
divisor = Integer.MAX_VALUE;
} else if (divisor < 0) {
divisor = -divisor;
}
} else if (type.equals("neg")) {
if (divisor > 0) {
divisor = -divisor;
}
}
divisors[i] = divisor;
}
}
@Benchmark
public void testDivideUnsignedNaive() {
for (int i = 0; i < BUFFER_SIZE; i++) {
quotients[i] = Integer.divideUnsigned(dividends[i], divisors[i]);
}
}
@Benchmark
public void testDivideUnsignedOptimised() {
for (int i = 0; i < BUFFER_SIZE; i++) {
int divisor = divisors[i];
if (divisor < 0) {
quotients[i] = Integer.compareUnsigned(dividends[i], divisor) >= 0 ? 1 : 0;
} else if (divisor > 0) {
quotients[i] = Integer.divideUnsigned(dividends[i], divisor);
} else {
throw new ArithmeticException();
}
}
}
}
-------------
PR: https://git.openjdk.org/jdk/pull/9947
More information about the hotspot-compiler-dev
mailing list