RFR: 8282365: Optimize divideUnsigned and remainderUnsigned for constants [v16]
Emanuel Peter
epeter at openjdk.org
Mon Jul 3 13:29:30 UTC 2023
On Sat, 10 Jun 2023 01:30:09 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:
>> This patch implements idealisation for unsigned divisions to change a division by a constant into a series of multiplication and shift. I also change the idealisation of `DivI` to get a more efficient series when the magic constant overflows an int32.
>>
>> In general, the idea behind a signed division transformation is that for a positive constant `d`, we would need to find constants `c` and `m` so that:
>>
>> floor(x / d) = floor(x * c / 2**m) for 0 < x < 2**(N - 1) (1)
>> ceil(x / d) = floor(x * c / 2**m) + 1 for -2**(N - 1) <= x < 0 (2)
>>
>> The implementation in the original book takes into consideration that the machine may not be able to perform the full multiplication `x * c`, so the constant overflow and we need to add back the dividend as in `DivLNode::Ideal` cases. However, for int32 division, `x * c` cannot overflow an int64. As a result, it is always feasible to just calculate the product and shift the result.
>>
>> For unsigned multiplication, the situation is somewhat trickier because the condition needs to be twice as strong (the condition (1) and (2) above are mostly the same). This results in the magic constant `c` calculated based on the method presented in Hacker's Delight by Henry S. Warren, Jr. may overflow an uintN. For int division, we can depend on the theorem devised by Arch D. Robison in N-Bit Unsigned Division Via N-Bit Multiply-Add, which states that there exists either:
>>
>> c1 in uint32 and m1, such that floor(x / d) = floor(x * c1 / 2**m1) for 0 < x < 2**32 (3)
>> c2 in uint32 and m2, such that floor(x / d) = floor((x + 1) * c2 / 2**m2) for 0 < x < 2**32 (4)
>>
>> which means that either `x * c1` never overflows an uint64 or `(x + 1) * c2` never overflows an uint64. And we can perform a full multiplication.
>>
>> For longs, there is no way to do a full multiplication so we do some basic transformations to achieve a computable formula. The details I have written as comments in the overflow case.
>>
>> More tests are added to cover the possible patterns.
>>
>> Please take a look and have some reviews. Thank you very much.
>
> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 50 commits:
>
> - missing java_negate
> - Merge branch 'master' into unsignedDiv
> - whitespace
> - move asserts to use sites
> - windows complaints
> - compiler complaints
> - undefined internal linkage
> - add tests, special casing large shift
> - draft
> - Merge branch 'master' into unsignedDiv
> - ... and 40 more: https://git.openjdk.org/jdk/compare/5b147eb5...eb1f5dd9
@merykitty This is a great RFE, let's keep working on it!
But I'm concerned about the tests. Yes, the gtests do maybe ensure that the magic constant computation is ok. But all the other code in all the `Ideal` melthods is not tested by your gtests. For that you can only really generate java tests. I think it would be really nice to see end to end tests here. Also the `Ideal` methods do different things for different ranges of the arguments. This really requires more testing. Maybe you already have some of those, but it would be nice to hear from you if they are covering all cases.
Ah. I did not review the magic constant computation code itself. It looks reasonable, but reviewing that for correctness is almost impossible anyway. We really have to rely on the tests for that.
Thanks again for the work, and let's make this happen!
Emanuel
src/hotspot/share/opto/divnode.cpp line 39:
> 37: #include "utilities/powerOfTwo.hpp"
> 38:
> 39: // Portions of code courtesy of Clifford Click
Not sure if this line should be removed?
src/hotspot/share/opto/divnode.cpp line 123:
> 121: // magic_const should be a u32
> 122: assert(magic_const >= 0 && magic_const <= jlong(max_juint), "sanity");
> 123: assert(shift_const >= 0 && shift_const < 32, "sanity");
Maybe just move this inside the `magic_int_divide_constants`? You already have similar asserts there.
src/hotspot/share/opto/divnode.cpp line 141:
> 139: // this has the effect of negating the quotient.
> 140: if (!d_pos) {
> 141: Node* temp = addend0; addend0 = addend1; addend1 = temp;
`swap(addend0, addend1)` - would that work? Would be easier to read.
src/hotspot/share/opto/divnode.cpp line 152:
> 150: }
> 151:
> 152: //--------------------------transform_int_udivide------------------------------
You could remove these lines `//------xxx------`, they are not required by the style guide any more
src/hotspot/share/opto/divnode.cpp line 154:
> 152: //--------------------------transform_int_udivide------------------------------
> 153: // Convert an unsigned division by constant divisor into an alternate Ideal graph.
> 154: // Return NULL if no transformation occurs.
There may have been the `NULL` -> `nullptr` change since you wrote this code. Please update your patch accordingly.
src/hotspot/share/opto/divnode.cpp line 154:
> 152: //--------------------------transform_int_udivide------------------------------
> 153: // Convert an unsigned division by constant divisor into an alternate Ideal graph.
> 154: // Return NULL if no transformation occurs.
null
src/hotspot/share/opto/divnode.cpp line 160:
> 158:
> 159: // Result
> 160: Node* q = NULL;
nullptr
src/hotspot/share/opto/divnode.cpp line 160:
> 158:
> 159: // Result
> 160: Node* q = NULL;
nullptr
src/hotspot/share/opto/divnode.cpp line 178:
> 176: magic_int_unsigned_divide_constants_down(divisor, magic_const, shift_const);
> 177: assert(magic_const >= 0 && magic_const <= 0x1FFFFFFFFL, "sanity");
> 178: assert(shift_const >= 0 && shift_const < 33, "sanity");
Add these inside the function, that way also your gtest run these asserts.
src/hotspot/share/opto/divnode.cpp line 184:
> 182: julong max_dividend;
> 183: if (dividend_type->_hi < 0 || dividend_type->_lo >= 0) {
> 184: max_dividend = julong(juint(dividend_type->_hi));
This is quite dense. A bit more explanation would help.
Ah, you are checking if conversion from `i32` to `u32` of the dividend leads to an overflow?
Maybe also add an assert like this (this should hold, right?):
`assert( julong(juint(dividend_type->_hi)) >= julong(juint(dividend_type->_lo)), "sanity")`
This code also tells me that we probably want to have some tests with different dividends (ie dividends that have different kinds of ranges). Ranges like that could be acheived if you use the tripcount `i` as the `dividend`. The tripcount has a range defined by init and limit of the counted loop.
src/hotspot/share/opto/divnode.cpp line 188:
> 186: max_dividend = max_juint;
> 187: }
> 188: if (julong(magic_const) <= max_julong / max_dividend) {
Could `max_dividend` ever be `zero`? I guess only if the dividend was exactly `zero`, in which case we should probably not end up here, or is that somehow possible?
src/hotspot/share/opto/divnode.cpp line 191:
> 189: // No overflow here, just do the transformation
> 190: if (shift_const == 32) {
> 191: q = phase->intcon(0);
Would it not be nicer to handle this special case directly in the `URShiftLNode`? Just replace it during `Value` with zero, if the shift constant is too large.
src/hotspot/share/opto/divnode.cpp line 208:
> 206: magic_int_unsigned_divide_constants_up(divisor, magic_const, shift_const);
> 207: assert(magic_const >= 0 && magic_const <= jlong(max_juint), "sanity");
> 208: assert(shift_const >= 0 && shift_const < 32, "sanity");
Again, add asserts inside the method.
src/hotspot/share/opto/divnode.cpp line 406:
> 404: // this has the effect of negating the quotient.
> 405: if (!d_pos) {
> 406: Node *temp = addend0; addend0 = addend1; addend1 = temp;
If we are already here, might as well replace with `swap`
src/hotspot/share/opto/divnode.cpp line 419:
> 417: //--------------------------transform_long_udivide-----------------------------
> 418: // Convert an unsigned division by constant divisor into an alternate Ideal graph.
> 419: // Return NULL if no transformation occurs.
null
src/hotspot/share/opto/divnode.cpp line 424:
> 422:
> 423: // Result
> 424: Node* q = NULL;
nullptr
src/hotspot/share/opto/divnode.cpp line 441:
> 439: jlong magic_const;
> 440: jint shift_const;
> 441: bool magic_const_ovf;
`does_magic_const_overflow` Would that work too?
src/hotspot/share/opto/divnode.cpp line 443:
> 441: bool magic_const_ovf;
> 442: magic_long_unsigned_divide_constants(divisor, magic_const, shift_const, magic_const_ovf);
> 443: assert(shift_const >= 0 && shift_const < 65, "sanity");
Move asserts inside function.
src/hotspot/share/opto/divnode.cpp line 448:
> 446: Node* mul_hi = phase->transform(new UMulHiLNode(dividend, magic));
> 447:
> 448: if (!magic_const_ovf) {
Don't understand this case, what happens here?
src/hotspot/share/opto/divnode.cpp line 462:
> 460: }
> 461:
> 462: // Just do the minimum for now
Minimum of what? Not sure what you mean
src/hotspot/share/opto/divnode.cpp line 469:
> 467: mul_hi = phase->transform(new AddLNode(mul_hi, dividend));
> 468: q = new URShiftLNode(mul_hi, phase->intcon(shift_const));
> 469: }
I need more comments here.
src/hotspot/share/opto/divnode.cpp line 906:
> 904: }
> 905:
> 906: // TODO: Improve Value inference of both signed and unsigned division
Did you miss a `TODO` here?
src/hotspot/share/opto/divnode.cpp line 915:
> 913:
> 914: //------------------------------Idealize---------------------------------------
> 915: Node *UDivLNode::Ideal(PhaseGVN *phase, bool can_reshape) {
Exactly same comments apply as for `UDivINode::Ideal`
src/hotspot/share/opto/divnode.cpp line 921:
> 919: Node* UDivINode::Ideal(PhaseGVN* phase, bool can_reshape) {
> 920: // Check for dead control input
> 921: if (in(0) && remove_dead_region(phase, can_reshape)) {
Please make nullptr check explicit
Suggestion:
if (in(0) != nullptr && remove_dead_region(phase, can_reshape)) {
src/hotspot/share/opto/divnode.cpp line 925:
> 923: }
> 924: // Don't bother trying to transform a dead node
> 925: if(in(0) && in(0)->is_top()) {
Suggestion:
if(in(0) != nullptr && in(0)->is_top()) {
src/hotspot/share/opto/divnode.cpp line 931:
> 929: const Type* t = phase->type(in(2));
> 930: if(t == TypeInt::ONE) { // Identity?
> 931: return nullptr; // Skip it
Does `Value` handle this?
src/hotspot/share/opto/divnode.cpp line 936:
> 934: const TypeInt* ti = t->isa_int();
> 935: if(ti == nullptr) {
> 936: return nullptr;
Can this ever happen? Only if it is top? If so, add assert!
src/hotspot/share/opto/divnode.cpp line 941:
> 939: // Check for useless control input
> 940: // Check for excluding div-zero case
> 941: if (in(0) && (ti->_hi < 0 || ti->_lo > 0)) {
Suggestion:
if (in(0) != nullptr && (ti->_hi < 0 || ti->_lo > 0)) {
src/hotspot/share/opto/divnode.cpp line 947:
> 945:
> 946: // Divisor very large, constant 2**31 can be transform to a shift
> 947: if (ti->_hi <= 0 && ti->_hi > min_jint) {
Would be easier to read as a range like this
Suggestion:
if (min_jint < ti->_hi && ti->_hi <= 0) {
src/hotspot/share/opto/divnode.cpp line 956:
> 954: return nullptr;
> 955: }
> 956: juint i = ti->get_con(); // Get divisor
I'd replace `i` with `divisor_con`.
src/hotspot/share/opto/divnode.cpp line 994:
> 992: }
> 993:
> 994: // TODO: Improve Value inference of both signed and unsigned division
Another stranded `TODO`
src/hotspot/share/opto/divnode.cpp line 997:
> 995: const TypeLong* i1 = t1->isa_long();
> 996: const TypeLong* i2 = t2->isa_long();
> 997: assert(i1 != nullptr && i2 != nullptr, "");
Use `t1->is_long()`, it already has a built in assert for `long` :)
src/hotspot/share/opto/divnode.cpp line 1038:
> 1036: Node* cmp = phase->transform(new CmpULNode(in(1), in(2)));
> 1037: Node* bol = phase->transform(new BoolNode(cmp, BoolTest::ge));
> 1038: return new CMoveLNode(bol, phase->longcon(0), phase->longcon(1), TypeLong::make(0, 1, Type::WidenMin));
Do we have tests for this case?
src/hotspot/share/opto/divnode.cpp line 1391:
> 1389: //=============================================================================
> 1390: //------------------------------Idealize---------------------------------------
> 1391: Node* UModINode::Ideal(PhaseGVN* phase, bool can_reshape) {
Same comments again
src/hotspot/share/opto/divnode.cpp line 1419:
> 1417: }
> 1418: juint con = ti->get_con();
> 1419: const Type* u = phase->type(in(1));
This is a constant foldable bailout? Why do you do it earlier here?
Generally, I'm starting to wonder if all this code duplication makes sense in all the `Ideal` methods?
src/hotspot/share/opto/divnode.cpp line 1426:
> 1424: // See if we are MOD'ing by 2^k
> 1425: if (is_power_of_2(con)) {
> 1426: return new AndINode(in(1), phase->intcon(con - 1));
Do we have tests?
src/hotspot/share/opto/divnode.cpp line 1428:
> 1426: return new AndINode(in(1), phase->intcon(con - 1));
> 1427: }
> 1428: // TODO: This can be calculated directly, see https://arxiv.org/abs/1902.01961
Stranded `TODO`?
src/hotspot/share/opto/divnode.cpp line 1431:
> 1429: Node* q = transform_int_udivide(phase, in(1), con);
> 1430: if (q == nullptr) {
> 1431: return nullptr;
Is this possible? Can it ever return `nullptr`?
src/hotspot/share/opto/divnode.cpp line 1465:
> 1463: //=============================================================================
> 1464: //------------------------------Idealize---------------------------------------
> 1465: Node* UModLNode::Ideal(PhaseGVN* phase, bool can_reshape) {
Same comments as for other `Ideal` methods
src/hotspot/share/opto/divnode.cpp line 1503:
> 1501: }
> 1502: Node* q = transform_long_udivide(phase, in(1), con);
> 1503: if (q == nullptr) {
Maybe assert `!Matcher::match_rule_supported(Op_UMulHiL)`, if that is the only case it should happen?
src/hotspot/share/utilities/javaArithmetic.cpp line 71:
> 69:
> 70: assert(M < java_shift_left(jlong(1), 32), "");
> 71: assert(s < 32, "");
Just in case: assert that they are non-negative (`>=0`)
src/hotspot/share/utilities/javaArithmetic.cpp line 148:
> 146: int64_t p;
> 147: uint64_t ad, anc, delta, q1, r1, q2, r2, t;
> 148: const uint64_t two63 = UCONST64(0x8000000000000000); // 2**63.
A shifted one might be more readable
src/hotspot/share/utilities/javaArithmetic.cpp line 176:
> 174:
> 175: M = q2 + 1;
> 176: s = p - 64; // shift amount to return.
Add asserts here
-------------
Changes requested by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/9947#pullrequestreview-1510911188
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250669894
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250686803
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250702239
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250729639
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250732261
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250732637
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250730579
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250732745
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250736080
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250765990
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250776391
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250788763
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250789401
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250794972
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250733095
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250733208
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250802121
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250800018
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250807470
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250808763
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250811574
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250814645
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250850870
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250817906
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250818223
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250819940
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250822359
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250827220
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250830792
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250837670
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250842873
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250846780
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250852739
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250853251
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250855965
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250856771
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250857127
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250861546
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250865067
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250869173
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250682316
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250877478
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250792970
More information about the hotspot-compiler-dev
mailing list