Request for reviews (L): 7063629: use cbcond in C2 generated code on T4

Tue Aug 9 15:55:37 PDT 2011

Thank you, Tom

Tom Rodriguez wrote:
> This looks really good.
> 
> This might be for another day but now that label must be non-NULL, maybe it should be a Label& instead of a Label*.  That would make it easier to use it directly during code generation, as in:
> 
> +     __ jmpb($labl$$label);

Yes, I would leave it for an other time. I will file RFE.

> 
> sparc.ad:
> 
> It might be nice to factor this out:
> 
>       Assembler::Predict predict_taken =
> +       cbuf.is_backward_branch(*L) ? Assembler::pt : Assembler::pn;

I will file RFE for that: use probability from IfNode to determine the pt value 
as you suggested before.

> 
> x86_32.ad:
> 
> Would you get averse to inlining Jcc and JccShort?

I did not realize that it is just one instruction now :)
They are used in a lot of places and I did not want to duplicate the original 
code. I will inline them now.

> 
> output.cpp:
> 
> Why does the first round of shorten_branches occur in the middle of init_buffer?  Couldn't it be done right afterwards?  It's just odd that it's buried inside there.

First loop in shorten_branches() estimates code, locals, stubs sizes which are 
used later in init_buffer() to allocate CodeBuffer. I would need to split 
shorten_branches() method which is not easy since the first loop also collects 
information about branches which could be replaced.

> 
> That first round is conservative since we haven't done all padding yet, right?

Correct.

> Then shorten_branches_final does a last pass based on the real offsets? 

Yes, backward branches inserted in this method use final offsets. For forward 
branches we still have only conservative offsets since following blocks are not 
processed yet.

> shorten_branches_final isn't a great name.  Maybe finalize_offsets_and_shorten?

I also did not like it, I will use finalize_offsets_and_shorten()

> 
> The core shorten branch logic is duplicated in those functions.  Could it be factored out or is there too much local state?

I thought about it but as you said "too much local state".

> 
> Why was this needed?
> 
> *** 2182,2192 ****
> --- 2383,2393 ----
>         (op != Op_Node &&         // Not an unused antidepedence node and
>          // not an unallocated boxlock
>          (OptoReg::is_valid(_regalloc->get_reg_first(n)) || op != Op_BoxLock)) ) {
>   
>       // Push any trailing projections
> !     if( bb->_nodes[bb->_nodes.size()-1] != n ) {
> !     if( bb->_nodes[_bb_end-1] != n ) {
>         for (DUIterator_Fast imax, i = n->fast_outs(imax); i < imax; i++) {
>           Node *foi = n->fast_out(i);
>           if( foi->is_Proj() )
>             _scheduled.push(foi);
>         }
> 
> That code is complicated enough that I can't reason about it's correctness from a webrev.  Is this because of the trailing NOPs?

I hit next assert during development because the loop above pushed nodes which 
are not for schedule.

     assert( _scheduled.size() == _bb_end - _bb_start, "wrong number of 
instructions" );

It may happened before I split shorten_branches() and there were trailing NOPs. 
But it is not only trailing NOPs, it is also projections after calls and 
MachNullCheck nodes (see code in DoScheduling()). I think in general the check 
above should check the last node for schedule and not the last node in block.

> 
> Can you add this comment to the that last anti_do_def piece I added:
> 
> // kill projections on a branch should appear to occur on the
> // branch, not afterwards, so grab the masks from the projections
> // and process them.

Done.

Thanks,
Vladimir

> 
> tom
> 
> 
> On Aug 4, 2011, at 6:19 PM, Vladimir Kozlov wrote:
> 
>> http://cr.openjdk.java.net/~kvn/7063629/webrev
>>
>> 7063629: use cbcond in C2 generated code on T4
>>
>> The code is finally shaped as I want and it passed CTW, regression, nsk tests on T4 and x86.
>>
>> Added new fused compare and branch instructions into sparc.ad and corresponding short versions which use cbcond instruction. Added new flag avoid_back_to_back to avoid generation of cbcond back to back.
>>
>> Split shorten_branches() into 2 methods. First method conservatively estimates code size and branches location and does few rounds of branch shortening. It is executed before ScheduleAndBundle(). Step 3 is moved to new method shorten_branches_final() called after ScheduleAndBundle(). It does final paddings, alignment and final branch replacement. Method fill_buffer() does verification instead of padding.
>>
>> Labels are binded now only during code generation in fill_buffer(). As result they are not available when forward branches are emitted. To fix that MacroAssembler branch instructions are used now in x86 .ad files. I replaced unused rtype parameter with maybe_short flag to force using only long branches in .ad long branch instructions.
>>
>> Added check to adlc to verify that short version of a branch instructions has the same declaration in .ad file.
>>
>> Added assert to verify that the size of emitted instruction matches the value returned by MachNode::size(). Found that MachBreakpointNode::size() returned incorrect value on x64.
>>
>> Fixed loop alignment for Sparc (min alignment should be instruction size which is 4 bytes instead of 1 byte).
>>
>> The prototype was done by Tom and I took some of his additional fixes. The block changes go with some code in output to put opto assembly style block comments in the PrintNMethods output. There's also snippet in there that deals with the fact kill projections on branches make it appear the kill occurs after the branch instead of being part of it.
>