A question about bytecodes + unsigned load performance ./. add performace

Tom Rodriguez Thomas.Rodriguez at Sun.COM
Mon Jan 12 13:52:27 PST 2009


> I tried to take a look at it, but now I'm stuck.

I suspect this is a problem I remember seeing long ago in the logic  
for breaking match trees with multiple memory edges.  When converting  
the ideal form into the mach form, the matcher has to take the ideal  
form, which is a DAG, and cut it up into trees.  The matcher can only  
operate on binary trees.  Within these binary trees it's possible for  
multiple ideal nodes to be swallowed into a single mach node.  One  
danger with this is when matching multiple nodes that have memory  
inputs it might be possible to violate anti-dependence.  There's a  
some cutouts in Matcher::Label_Root that force a break in the match  
when there are multiple memory inputs in the match tree:

     // Check for leaves of the State Tree; things that cannot be a  
part of
     // the current tree.  If it finds any, that value is matched as a
     // register operand.  If not, then the normal matching is used.
     if( match_into_reg(n, m, control, i, is_shared(m)) ||
         //
         // Stop recursion if this is LoadNode and the root of this  
tree is a
         // StoreNode and the load & store have different memories.
         ((mem!=(Node*)1) && m->is_Load() && m->in(MemNode::Memory) !=  
mem) ||
         // Can NOT include the match of a subtree when its memory state
         // is used by any of the other subtrees
         (input_mem == NodeSentinel) ) {

The ideal for the simple example is something like (StoreC mem2 addr2  
(AndI (LoadB mem1 addr1) (ConI 0xff))).  The code above will break the  
match at the load, forcing the value into a register.  It's seem like  
an excessively strong cutout but I'm not sure how to phrase it better,  
particularly since I don't know what exactly what problem it designed  
to eliminate.  I believe it's probably the anti-dep issue but without  
a concrete failure it's hard to know what exactly it should look like.

tom

>
>
> The ideal nodes in question are:
>
> 129	LoadB	===  311  51  127  [[ 141 ]]  @byte[int:>=0]:exact+any *,  
> idx=4; #byte !jvms: test::foo @ bci:28
> 140	ConI	===  0  [[ 141  217  268  347  439  441 ]]  #int:255
> 141	AndI	===  458  129  140  [[ 164 ]]  !orig=[377] !jvms:  
> test::decode @ bci:4 test::foo @ bci:29
>
> So loadUB should match but it does not (and I don't know why, yet).   
> The
> opto output is:
>
> 102   B7: #	B6 B8 <- B6  Freq: 2
> 102   	movslq  R10, R11	# i2l
> 105   	movq    R8, [rsp + #8]	# spill
> 10a   	movsbl  R8, [R8 + #24 + R10]	# byte
> 110   	incl    R11	# int
> 113   	movzbl  R8, R8	# int & 0xFF
> 117   	movw    [R9 + #24 + R10 << #1], R8	# char/short
> 11d   	cmpl    R11, #1
> 121   	jl,s   B6	# loop end  P=0.500000 C=22950.000000
>
> It seems the increment of the loop variable gets scheduled between  
> LoadB
> and immI_255 and thus loadUB cannot match.
>
> Not sure yet when matching is applied and if I'm right with my
> assumption above.  I'm looking further...
>
> -- Christian
>




More information about the hotspot-dev mailing list