RFR(L): 8003854: PPC64 (part 115): Introduce lateExpand that expands nodes after register allocation.

Wed Nov 13 15:37:48 PST 2013

Hi Vladimir, 

Thanks for the thorough review!!
I made a new webrev:
http://cr.openjdk.java.net/~goetz/webrevs/8003854-lateExp-2/

> It is good feature. 
Thanks :)
> Can you name it postaloc_expand to be clear when we do it?
Fixed.

> Names. I know it is picking but for methods and .ad statements names we 
> use low case and underscore. Camel style is used for class names.
No matter, changing names isn't a big deal. I hope I've got all.

> I know this was written long time ago but we are trying now to not use 
> separate enc_class unless it is used by several instructions. Can you 
> also implement ability to write postaloc_expand inside mach instruction 
> (As we do now for ins_encode)?  You can do it as separate changes later 
> if it a lot of work.
Done.  Was really simple ;)  I'll adapt our ad file eventually, but that contains
370 ins_encode and 460 instruct declarations, so it will take some time.

> Are you allow branches in late expand? I assume not because you don't 
> have separate graph's blocks for that. It would be nice to have some 
> verification code which enforces restrictions.
No, no branches.  That would need a very different approach, as the cfg
would have to be changed.  We have some conditional moves with branches 
that would be much better represented if blocks were possible.  But we did not implement
that as it seems far too much effort.  Anyways, on ia64 we have predicated
instructions, so there we don't need branches.  On Power, starting with 
Power7, there is a conditional move.

> Also it would be nice if in lateExpand rule you need only define nodes 
> and connect them and the rest (opnds clonning, RA info patching and 
> pushing result nodes) be done by adlc. But it will require more code in 
> adlc so it may be for the future. I am worried that current code 
> requires very careful coding: which operand to clone and to which assign 
> it, which RA methods to use for resetting (set1(), set2(), set_pair()). 
> It is very bugs prone.
Yes, it is a bit complicated.  Basically, adlc could generate constructors
for nodes that generate the operands.  At least if the node clearly specifies
it's operands.  If there are operand classes that's ambiguous.
I would at least like some checker code like machNode->check_operands
that checks that all operands are there and that they have the proper
type.
The advantage  is that it's more flexible than the expand mechanism.
For example there are no C-if's possible in expands. Many other things
are restricted, like generating nodes that require TEMPs. All these
features could be added to expand, but that's always a lot of overhead
if you just want to generate some instructions.

> Will it be difficult to totally separate lateExpands from ins_encode? 
> Your current code use _insencode + _is_lateExpand. Can you do one 
> InsEncode* _lateExpand? Or it will be too intrusive?
I will just have to copy lots of code.  Also, for the postaloc_expand %{ %} 
support I just had to call the proper ins_encode function. I don't think
replicating all that code makes sense.

> block.cpp:
> Node_List already has method contains(Node* n), use it in 
> Block::contains() (you can define it in header file then).
Done.  Also added const to Node_List::contains().

PhaseCFG::LateExpand():

> TraceLateExpand is develop flag so it will be true only for #ifdef 
> ASSERT. It seems all #ifndef PRODUCT should be replaced by #ifdef ASSERT 
> in this method.
Done.

> "These kills are not required any more after expanding" - what about 
> flags kill Proj nodes? Disconnecting input nodes worries me. Do you 
> clean graph after that to remove unused nodes?
Yes, the graph is clean afterwards.  Many kills/temps are explicit register
deps after expanding.  Those that should be kept have to be added in the
postaloc_expand method of the node.  If that method wants to reuse a Proj, it can
add it to the node list returned.  Then it will be first removed, and
then added again.  This is necessary, as the position in the blocks
node list must be correct.

> I need to look more on LateExpand() in next round.

> compile.cpp: Can you add TracePhase/TraceTime to get time spent in 
> LateExpand.
Done.  I used TracePhase.

> Why you need lateExpand() method in Node class? Generated mach nodes are 
> based on MachNode so this method could be declared only in MachNode. 
> (Putting new virtual method into top Node class will increase virtual 
> tables for all ideal and mach nodes).
We put it just right next to format() and emit().  Moved to machnode.

Best regards,
  Goetz.

Thanks,
Vladimir

On 11/12/13 8:32 AM, Lindenmaier, Goetz wrote:
> Hi,
>
> I also updated this webrev to work with the latest staging repository.
> http://cr.openjdk.java.net/~goetz/webrevs/8003854-lateExp/
>
> Best regards,
>    Goetz.
>
> From: goetz.lindenmaier at sap.com
> Sent: Mittwoch, 2. Oktober 2013 15:49
> To: hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; 'Vladimir Kozlov'
> Subject: RFR(L): 8003854: PPC64 (part 115): Introduce lateExpand that expands nodes after register allocation.
>
> Hi,
>
> we extended C2 by what we call lateExpand. LateExpand expands nodes
> after register allocation.
> http://cr.openjdk.java.net/~goetz/webrevs/8003854-lateExp/
>
> Some nodes can not be expanded during matching. E.g., register
> allocation might not be able to deal with the resulting pattern. To
> allow better scheduling in such cases, we introduce lateExpand which
> runs after register allocation. Whether and how nodes are expanded is
> specified in the ad-file. See block.cpp for a detailed
> documentation. We use this for some nodes on ppc, and extensively on
> ia64.
> For an example, see the webrev.
>
> We added some utility functions in node.hpp and block.hpp used by
> PhaseCFG::LateExpand() or the Node::lateExpand functions.  Also,
> MachConstantBaseNode gets the two new functions, as we need to
> late expand it.
>
> To skip the walk over the IR if no lateExpand is needed, we use
> Matcher::require_late_expand set in the ad file. This ends up in
> ad_<cpu>.cpp, thus can not be evaluated by the C++ compiler.  If you
> agree, I would use a "const bool EnableLateExpand" in
> globalDefinitions_<cpu>.hpp.  (There is no suitable c2_<xxx>_<cpu>.hpp
> file).
>
> We allready mailed a webrev with this change after last year's
> JavaOne, but there was no discussion on it.
> Again, this change is 'L', but the code is not used on other
> platforms than PPC64 (so far).  So the impact on those platforms
> should be minimal.
>
> Please review and test this change.
>
> Thanks and best regards,
>    Goetz.
>