Request for reviews (L): 7063629: use cbcond in C2 generated code on T4
Vladimir Kozlov
vladimir.kozlov at oracle.com
Thu Aug 11 11:52:37 PDT 2011
They are different but result is the same. I ran with assert as you suggested
and found it immediately (-Xcomp). Anyway I will revert the change since we
still have "wrong number of instructions" assert which should catch problems.
Thanks,
Vladimir
Tom Rodriguez wrote:
> On Aug 11, 2011, at 11:22 AM, Vladimir Kozlov wrote:
>
>>>> ! if( bb->_nodes[bb->_nodes.size()-1] != n ) {
>>>> ! if( bb->_nodes[_bb_end-1] != n ) {
>>>>
>>>> That code is complicated enough that I can't reason about it's
>>>> correctness from a webrev. Is this because of the trailing NOPs?
>>> I hit next assert during development because the loop above pushed nodes
>>> which are not for schedule.
>>>
>>> assert( _scheduled.size() == _bb_end - _bb_start, "wrong number of
>>> instructions" );
>>>
>>> It may happened before I split shorten_branches() and there were
>>> trailing NOPs. But it is not only trailing NOPs, it is also projections
>>> after calls and MachNullCheck nodes (see code in DoScheduling()). I
>>> think in general the check above should check the last node for schedule
>>> and not the last node in block.
>> Tom,
>>
>> I ran full CTW without this change with my latest changes and did not hit the assert which confirms that it was problem in early development when trailing NOPs were inserted before DoScheduling() call. Do you think I should remove this change?
>
> If it isn't be needed then I think should be removed. You could put in an assert that the old and new value are equal and then investigate any cases where they are different to confirm which value is correct. It may be that they are different but both could be correct.
>
> tom
>
>> Thanks,
>> Vladimir
>>
>> Vladimir Kozlov wrote:
>>> Thank you, Tom
>>> Tom Rodriguez wrote:
>>>> This looks really good.
>>>>
>>>> This might be for another day but now that label must be non-NULL, maybe it should be a Label& instead of a Label*. That would make it easier to use it directly during code generation, as in:
>>>>
>>>> + __ jmpb($labl$$label);
>>> Yes, I would leave it for an other time. I will file RFE.
>>>> sparc.ad:
>>>>
>>>> It might be nice to factor this out:
>>>>
>>>> Assembler::Predict predict_taken =
>>>> + cbuf.is_backward_branch(*L) ? Assembler::pt : Assembler::pn;
>>> I will file RFE for that: use probability from IfNode to determine the pt value as you suggested before.
>>>> x86_32.ad:
>>>>
>>>> Would you get averse to inlining Jcc and JccShort?
>>> I did not realize that it is just one instruction now :)
>>> They are used in a lot of places and I did not want to duplicate the original code. I will inline them now.
>>>> output.cpp:
>>>>
>>>> Why does the first round of shorten_branches occur in the middle of init_buffer? Couldn't it be done right afterwards? It's just odd that it's buried inside there.
>>> First loop in shorten_branches() estimates code, locals, stubs sizes which are used later in init_buffer() to allocate CodeBuffer. I would need to split shorten_branches() method which is not easy since the first loop also collects information about branches which could be replaced.
>>>> That first round is conservative since we haven't done all padding yet, right?
>>> Correct.
>>>> Then shorten_branches_final does a last pass based on the real offsets?
>>> Yes, backward branches inserted in this method use final offsets. For forward branches we still have only conservative offsets since following blocks are not processed yet.
>>>> shorten_branches_final isn't a great name. Maybe finalize_offsets_and_shorten?
>>> I also did not like it, I will use finalize_offsets_and_shorten()
>>>> The core shorten branch logic is duplicated in those functions. Could it be factored out or is there too much local state?
>>> I thought about it but as you said "too much local state".
>>>> Why was this needed?
>>>>
>>>> *** 2182,2192 ****
>>>> --- 2383,2393 ----
>>>> (op != Op_Node && // Not an unused antidepedence node and
>>>> // not an unallocated boxlock
>>>> (OptoReg::is_valid(_regalloc->get_reg_first(n)) || op != Op_BoxLock)) ) {
>>>> // Push any trailing projections
>>>> ! if( bb->_nodes[bb->_nodes.size()-1] != n ) {
>>>> ! if( bb->_nodes[_bb_end-1] != n ) {
>>>> for (DUIterator_Fast imax, i = n->fast_outs(imax); i < imax; i++) {
>>>> Node *foi = n->fast_out(i);
>>>> if( foi->is_Proj() )
>>>> _scheduled.push(foi);
>>>> }
>>>>
>>>> That code is complicated enough that I can't reason about it's correctness from a webrev. Is this because of the trailing NOPs?
>>> I hit next assert during development because the loop above pushed nodes which are not for schedule.
>>> assert( _scheduled.size() == _bb_end - _bb_start, "wrong number of instructions" );
>>> It may happened before I split shorten_branches() and there were trailing NOPs. But it is not only trailing NOPs, it is also projections after calls and MachNullCheck nodes (see code in DoScheduling()). I think in general the check above should check the last node for schedule and not the last node in block.
>>>> Can you add this comment to the that last anti_do_def piece I added:
>>>>
>>>> // kill projections on a branch should appear to occur on the
>>>> // branch, not afterwards, so grab the masks from the projections
>>>> // and process them.
>>> Done.
>>> Thanks,
>>> Vladimir
>>>> tom
>>>>
>>>>
>>>> On Aug 4, 2011, at 6:19 PM, Vladimir Kozlov wrote:
>>>>
>>>>> http://cr.openjdk.java.net/~kvn/7063629/webrev
>>>>>
>>>>> 7063629: use cbcond in C2 generated code on T4
>>>>>
>>>>> The code is finally shaped as I want and it passed CTW, regression, nsk tests on T4 and x86.
>>>>>
>>>>> Added new fused compare and branch instructions into sparc.ad and corresponding short versions which use cbcond instruction. Added new flag avoid_back_to_back to avoid generation of cbcond back to back.
>>>>>
>>>>> Split shorten_branches() into 2 methods. First method conservatively estimates code size and branches location and does few rounds of branch shortening. It is executed before ScheduleAndBundle(). Step 3 is moved to new method shorten_branches_final() called after ScheduleAndBundle(). It does final paddings, alignment and final branch replacement. Method fill_buffer() does verification instead of padding.
>>>>>
>>>>> Labels are binded now only during code generation in fill_buffer(). As result they are not available when forward branches are emitted. To fix that MacroAssembler branch instructions are used now in x86 .ad files. I replaced unused rtype parameter with maybe_short flag to force using only long branches in .ad long branch instructions.
>>>>>
>>>>> Added check to adlc to verify that short version of a branch instructions has the same declaration in .ad file.
>>>>>
>>>>> Added assert to verify that the size of emitted instruction matches the value returned by MachNode::size(). Found that MachBreakpointNode::size() returned incorrect value on x64.
>>>>>
>>>>> Fixed loop alignment for Sparc (min alignment should be instruction size which is 4 bytes instead of 1 byte).
>>>>>
>>>>> The prototype was done by Tom and I took some of his additional fixes. The block changes go with some code in output to put opto assembly style block comments in the PrintNMethods output. There's also snippet in there that deals with the fact kill projections on branches make it appear the kill occurs after the branch instead of being part of it.
>
More information about the hotspot-compiler-dev
mailing list