RFR(M): 8027754: Enable loop optimizations for loops with MathExact inside

Tue Feb 11 10:17:55 PST 2014

On 2/11/14 6:15 AM, Rickard Bäckman wrote:
> Vladimir,
>
> sorry for missing those!
> Now here is an updated webrev again: http://cr.openjdk.java.net/~rbackman/8027754.4/

Looks good.

>
> It doesn't replace ConINode::make() with phase->intcon(), because
> it doesn't always return a NEW node and we assert if Ideal doesn't
> return a new Node.

Okay, it make sense.

Thanks,
Vladimir

>
> Thanks
> /R
>
> On 02/10, Vladimir Kozlov wrote:
>> On 2/10/14 4:19 AM, Rickard Bäckman wrote:
>>> Vladimir,
>>>
>>> thanks for your suggestions.
>>> Here is an updated webrev with the changes: http://cr.openjdk.java.net/~rbackman/8027754.3/
>>
>> You missed to replace rax with general register in instructions
>> which generate CMP. These instructions does not kill the register.
>>
>> In x86_64.ad format for overflowMulL_rReg_imm() misses op2.
>>
>> I asked to move can_overflow() method before IdealHelper template
>> definition in mathexactnode.cpp. Why you kept them at the end of
>> file?
>>
>> Why you did not replace ConINode::make(phase->C, 0)?
>>
>> Nothing was done with (i1 == NULL || i2 == NULL) check.
>>
>>>
>>> Unfortunately DOMAIN is defined to 1 by math.h so I named my new field
>>> in type.hpp to TypeInt::TYPE_DOMAIN and TypeLong::TYPE_DOMAIN. Not quite
>>> as pretty, hope it works.
>>>
>>> I'll file a RFE for unifying rax/eax and eflags/rflags.
>>
>> Thanks,
>> Vladimir
>>
>>>
>>> Thanks
>>> /R
>>>
>>> On 02/05, Vladimir Kozlov wrote:
>>>> Hi Rickard,
>>>>
>>>> About .ad cahgnes. I talked with Igor and he said that you have to
>>>> use specific register when the effect is USE_KILL. I agree with
>>>> that. But when you don't need to KILL or TEMP, please use general
>>>> register.
>>>>
>>>> Second. Mach instructions names, for example addofI_rReg for
>>>> OverflowAddI, are not clear. I would read addofI as 'add of
>>>> integer'. I think we should use full name to match ideal node name:
>>>> overflowAddI_rReg.
>>>>
>>>> Third. It would be nice to have these new mach instructions in one
>>>> place in x86.ad to avoid duplication. The only differences now
>>>> between x86_32.ad and x86_64.ad changes are eAXRegI vs rax_RegI and
>>>> rFlagsReg vs eFlagsReg names. rRegI was unified sometime ago.
>>>>
>>>> Since we can't push anything anyway this week. Can you file RFE and
>>>> rename eAXRegI, eFlagsReg and may be other similar things in
>>>> x86_32.ad to match x86_64.ad? After that your code can be put in
>>>> x86.ad.
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 2/4/14 4:42 PM, Vladimir Kozlov wrote:
>>>>> I forgot to say about tests.
>>>>> Rickard, you don't need /othervm if flags are not used. The test will
>>>>> run in the same VM as jtreg tool in such case and will complete faster.
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 2/4/14 4:26 PM, Vladimir Kozlov wrote:
>>>>>> Rickard,
>>>>>>
>>>>>> I like new implementation because it is more clean solution (no
>>>>>> specialized checks for FlagsProj all over our code).
>>>>>>
>>>>>> About changes in .ad files.
>>>>>> Why you use specifically rax instead of normal reg? Before it was
>>>>>> required because of projection nodes matching. Now it is not needed.
>>>>>>
>>>>>> You can skip affects like next because they are default:
>>>>>> effect(DEF cr, USE op1, USE op2);
>>>>>>
>>>>>> New formats are missing tab "\t".
>>>>>>
>>>>>> For OverflowMul with immediate the temp register should be declared as
>>>>>> next (in such case you do need to use specific register):
>>>>>>
>>>>>> instruct mulofI_rReg_imm(rFlagsReg cr, rRegI op1, immI op2, raxRegI tmp)
>>>>>> %{
>>>>>>    match(Set cr (OverflowMulI op1 op2));
>>>>>>    effect(TEMP tmp);
>>>>>>
>>>>>>    format %{ "imull    $tmp, $op1, $op2 #overflow check int" %}
>>>>>>    ins_encode %{
>>>>>>      __ imull($tmp$$Register, $op1$$Register, $op2$$constant);
>>>>>>    %}
>>>>>>    ins_pipe(ialu_reg_reg_alu0);
>>>>>> %}
>>>>>>
>>>>>> The question is do you want to kill an other register (rax) or you are
>>>>>> fine if you kill the input one (op1)?:
>>>>>>
>>>>>> instruct mulofI_rReg_imm(rFlagsReg cr, rRegI op1, immI op2)
>>>>>> %{
>>>>>>    match(Set cr (OverflowMulI op1 op2));
>>>>>>    effect(USE_KILL op1);
>>>>>>    format %{ "imull    $op1, $op1, $op2 #overflow check int" %}
>>>>>>    ins_encode %{
>>>>>>      __ imull($op1$$Register, $op1$$Register, $op2$$constant);
>>>>>>    %}
>>>>>>    ins_pipe(ialu_reg_reg_alu0);
>>>>>> %}
>>>>>>
>>>>>> Not big deal but: in OverflowINode class missed 'virtual' keyword in
>>>>>> Ideal() declaration. Missed 'virtual' for Value() in both classes. In
>>>>>> OverflowNode sub() is also virtual.
>>>>>>
>>>>>> Why you put OverflowNode() constructor out-of-line? It is empty and
>>>>>> could be defined in head file.
>>>>>>
>>>>>> mathexactnode.cpp: extra unneeded '()' and space:
>>>>>>
>>>>>> 46     if ( (((value1 ^ result) & (value2 ^ result)) >= 0)) {
>>>>>>
>>>>>> Can you move can_overflow() methods after will_overflow() methods and
>>>>>> before IdealHelper definition? They use the same templates.
>>>>>>
>>>>>> Why OverflowSubLNode::can_overflow() does not have (in(1) == in(2))
>>>>>> check?
>>>>>>
>>>>>> I think you can use phase->intcon(0) instead of ConINode::make(phase->C,
>>>>>> 0) in IdealHelper::Ideal() as you did in library_call.cpp.
>>>>>>
>>>>>> In IdealHelper::Value() the check (i1 == NULL || i2 == NULL) should be
>>>>>> before singleton() check otherwise you will reference through NULL in
>>>>>> non-singleton case. It would be nice to have the same check in
>>>>>> IdealHelper::Ideal().
>>>>>>
>>>>>> ConINode::make(phase->C, 0);
>>>>>>
>>>>>> In OverflowNode::sub() use fatal() instead of ShouldNotReachHere() to
>>>>>> print for what node it is called:
>>>>>>
>>>>>> fatal(err_msg_res("sub() should not be called for '%s'",
>>>>>> NodeClassNames[this->Opcode()]));
>>>>>>
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>>
>>>>>> On 2/4/14 2:00 AM, Rickard Bäckman wrote:
>>>>>>> Igor,
>>>>>>>
>>>>>>> what about DOMAIN?
>>>>>>> Also looking for an additional Reviewer.
>>>>>>>
>>>>>>> Thanks
>>>>>>> /R
>>>>>>>
>>>>>>>
>>>>>>> On 01/31, Igor Veresov wrote:
>>>>>>>> library_call.cpp:
>>>>>>>>
>>>>>>>> You forgot to remove this:
>>>>>>>>   208   template <typename OverflowOp>
>>>>>>>>   209   bool inline_math_overflow(bool is_unary);
>>>>>>>>
>>>>>>>> type.hpp:
>>>>>>>>
>>>>>>>> Come to think of it again bottom() is not really a bottom. I can’t
>>>>>>>> come up with good name for it. Perhaps SIGNED ?
>>>>>>>> And define it as static constant:
>>>>>>>> static const TypeInt *SIGNED = INT;
>>>>>>>> and
>>>>>>>> static const TypeLong *SIGNED = LONG;
>>>>>>>> to follow the existing convention?
>>>>>>>>
>>>>>>>> Otherwise looks excellent.
>>>>>>>>
>>>>>>>> igor
>>>>>>>>
>>>>>>>>
>>>>>>>> On Jan 31, 2014, at 10:22 AM, Rickard Bäckman
>>>>>>>> <rickard.backman at oracle.com> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Seem to have pasted the wrong link, try this one:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~rbackman/8027754.2/
>>>>>>>>>
>>>>>>>>> /R
>>>>>>>>>
>>>>>>>>> On 01/31, Christian Thalinger wrote:
>>>>>>>>>>
>>>>>>>>>> On Jan 31, 2014, at 3:44 AM, Rickard Bäckman
>>>>>>>>>> <rickard.backman at oracle.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Igor,
>>>>>>>>>>>
>>>>>>>>>>> I changed the calls in library_call.cpp according to your
>>>>>>>>>>> suggestions.
>>>>>>>>>>> The same with mathexactnode.cpp. Also did some additional cleanup
>>>>>>>>>>> there.
>>>>>>>>>>>
>>>>>>>>>>> Made will_overflow and can_overflow pure virtual in the base
>>>>>>>>>>> classes.
>>>>>>>>>>>
>>>>>>>>>>> The sub method is required as OverflowNode inherits from CmpNode.
>>>>>>>>>>>
>>>>>>>>>>> Updated webrev: http://cr.openjdk.java.net/~rbackman/8028997.2/
>>>>>>>>>>
>>>>>>>>>> That webrev seems to not have all changes.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> /R
>>>>>>>>>>>
>>>>>>>>>>> On 01/30, Igor Veresov wrote:
>>>>>>>>>>>> Hi Rickard,
>>>>>>>>>>>>
>>>>>>>>>>>> In library_call.cpp:
>>>>>>>>>>>>
>>>>>>>>>>>> May be it’s just me but I’d get rid of bool
>>>>>>>>>>>> LibraryCallKit::inline_math_overflow(bool is_unary) and reference
>>>>>>>>>>>> the arguments explicitly. Otherwise it feels like too much depth
>>>>>>>>>>>> to follow through when reading the code(since we keep specialized
>>>>>>>>>>>> functions like inline_math_addExactI() anyway).
>>>>>>>>>>>>
>>>>>>>>>>>> Something like:
>>>>>>>>>>>> bool LibraryCallKit::inline_math_addExactI(bool is_increment) {
>>>>>>>>>>>>   return inline_math_overflow<OverflowAddINode>(argument(0),
>>>>>>>>>>>> is_increment ? intcon(1) : argument(1));
>>>>>>>>>>>> }
>>>>>>>>>>>> Which will also allow you to avoid adding the IsLong enum.
>>>>>>>>>>>>
>>>>>>>>>>>> In mathexactnode.cpp:
>>>>>>>>>>>>
>>>>>>>>>>>> I’d move the AddHelper, SubHelper, MulHelper to the cpp file and
>>>>>>>>>>>> reference them directly in a more verbose way.
>>>>>>>>>>>> Something like:
>>>>>>>>>>>> bool OverflowAddINode::will_overflow(jint v1, jint v2) const {
>>>>>>>>>>>>    return AddHelper<OverflowINode>::will_overflow(v1, v2);
>>>>>>>>>>>> }
>>>>>>>>>>>> Otherwise the reader has to go and find out what OverflowHelper
>>>>>>>>>>>> means in a particular context.  This will also reduce the amount
>>>>>>>>>>>> of implementation specifics in the hpp.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Is there a particular reason why you define will_overflow() and
>>>>>>>>>>>> can_overflow() to do ShoundNotReachHere() is the base classes
>>>>>>>>>>>> instead of making them pure virtual and let the compiler make
>>>>>>>>>>>> sure that you define them? I guess there is an expectation that
>>>>>>>>>>>> there are going to be overflow nodes that don’t redefine these
>>>>>>>>>>>> methods but are there really going to be any?
>>>>>>>>>>>>
>>>>>>>>>>>> You also define const Type* sub(const Type* t1, const Type* t2)
>>>>>>>>>>>> const as ShouldNotReachHere() and never redefine it. What are
>>>>>>>>>>>> your plans for it?
>>>>>>>>>>>>
>>>>>>>>>>>> igor
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Jan 30, 2014, at 5:10 AM, Rickard Bäckman
>>>>>>>>>>>> <rickard.backman at oracle.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Igor,
>>>>>>>>>>>>>
>>>>>>>>>>>>> thanks for looking at this and the suggestions. I added some
>>>>>>>>>>>>> templates
>>>>>>>>>>>>> to reduce the amount of code.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~rbackman/8027754.1/
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>> /R
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 01/23, Igor Veresov wrote:
>>>>>>>>>>>>>> Nice!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In library_call.cpp:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Could the LibaryCallKit::inline_math*() family of functions be
>>>>>>>>>>>>>> factored with templates to shave a few lines? There is quite a
>>>>>>>>>>>>>> lot of common code.
>>>>>>>>>>>>>> I think the overflow idiom insertion can be something like:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> template<typename OperationNodeType, template OverflowNodeType>
>>>>>>>>>>>>>> void LibraryCallKit::inline_overflow(Node* arg1, Node* arg2) {
>>>>>>>>>>>>>> Node* op = _gvn.transform(new(C) OperationNodeType(arg1, arg2));
>>>>>>>>>>>>>> Node* of = _gvn.transform(new(C) OverflowNodeType(arg1, arg2));
>>>>>>>>>>>>>> inline_math_mathExact(op, of);
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In mathexactnode.cpp:
>>>>>>>>>>>>>> You’ve already commoned many things up by introducing
>>>>>>>>>>>>>> OverflowINode and OverflowLNode in hierarchy.
>>>>>>>>>>>>>> But it feels like some of the code there could factored up as
>>>>>>>>>>>>>> well using template helpers. In many cases the code looks
>>>>>>>>>>>>>> exactly the same for ints and longs, differing only in some
>>>>>>>>>>>>>> types.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> igor
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Jan 23, 2014, at 3:27 AM, Rickard Bäckman
>>>>>>>>>>>>>> <rickard.backman at oracle.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> this change is going to 9 (and backporting to 8u20). Can I
>>>>>>>>>>>>>>> please have
>>>>>>>>>>>>>>> this change reviewed?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The implementation of different j.l.Math.mathExact() didn't
>>>>>>>>>>>>>>> work very
>>>>>>>>>>>>>>> well with different optimizations because there was one node
>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>> generated both control and data. This change has a new
>>>>>>>>>>>>>>> implementation
>>>>>>>>>>>>>>> where each call to j.l.Math.mathExact() generates a Overflow
>>>>>>>>>>>>>>> node and a
>>>>>>>>>>>>>>> normal math operation node (in the integer add example:
>>>>>>>>>>>>>>> OverflowAddINode
>>>>>>>>>>>>>>> and a AddINode). The Overflow node is responsible for generating
>>>>>>>>>>>>>>> control.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In the end we generate assembly like:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> mov rdx, rdi
>>>>>>>>>>>>>>> add rdx, rsi
>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>> mov rax, rdi
>>>>>>>>>>>>>>> add rax, rsi
>>>>>>>>>>>>>>> jo <overflow label>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> With one add instruction for the data and one for flags. Future
>>>>>>>>>>>>>>> improvements could be to try to match the Overflow and the math
>>>>>>>>>>>>>>> operation and remove one of them.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8027754
>>>>>>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~rbackman/8027754/
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>> /R
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>> /R
>>>