A question about bytecodes + unsigned load performance ./. add performace

Vladimir Kozlov Vladimir.Kozlov at Sun.COM
Thu Jan 15 09:44:18 PST 2009


The loop in match_into_reg() is simplified check for
control nodes domination. You have to follow all
merging paths above a region to check that they all dominate.
And since it didn't reach m->in(0) we can not say that it dominates
so we bailout.

I wonder what are different control nodes. One is a loop head (region),
I assume, and an other? And why AndI has control edge?

AndI	===  458  129  140  [[ 164 ]]

Thanks,
Vladimir

Christian Thalinger wrote:
> 
> On Tue, 2009-01-13 at 10:26 -0800, Vladimir Kozlov wrote:
>> This code exists as far as history goes (1997), so no bugs or comments
>>
>>    if( _shared[n->_idx] || 	// Shared; stop recursion
>>        // Also stop recursion if they have different Controls.
>>        // Slot 0 of constants is not really a Control.
>>        (control != n->in(0) && n->in(0) && !n->is_Con()) ||
>>        // Finally, stop recursion if this a LoadNode and the root of
>>        // this tree is a StoreNode and the load & store have different
>>        // memories.
>>        ((mem!=(Node*)1) && n->is_Load() && n->in(LoadNode::Memory) != mem) ) {
>>      // Switch to a register-only opcode; this value must be in a register
>>      // and cannot be subsumed as part of a larger instruction.
>>      svec->DFA( n->ideal_reg(this), n );
> 
> Thank you Vladimir.  As I currently do not understand all of that code I
> can simple guess and try.
> 
> I have to do two changes to get the loadUB matched for all cases in that
> method:
> 
> --- hotspot.35ae4dd6c27c/src/share/vm/opto/matcher.cpp	2009-01-15 16:12:29.748744167 +0100
> +++ /export/home/twisti/projects/openjdk/jdk7/hotspot-comp/hotspot/src/share/vm/opto/matcher.cpp	2009-01-15 16:12:24.732215699 +0100
> @@ -1252,7 +1252,8 @@ static bool match_into_reg( const Node *
>        uint j;
>        for( j=0; j<max_scan; j++ ) {
>          if( x->is_Region() )    // Bail out at merge points
> -          return true;
> +          //return true;
> +          break;
>          x = x->in(0);
>          if( x == m->in(0) )     // Does 'control' post-dominate
>            break;                // m->in(0)?  If so, we can use it
> @@ -1331,10 +1332,12 @@ Node *Matcher::Label_Root( const Node *n
>      // the current tree.  If it finds any, that value is matched as a
>      // register operand.  If not, then the normal matching is used.
>      if( match_into_reg(n, m, control, i, is_shared(m)) ||
> +#if 0
>          //
>          // Stop recursion if this is LoadNode and the root of this tree is a
>          // StoreNode and the load & store have different memories.
>          ((mem!=(Node*)1) && m->is_Load() && m->in(MemNode::Memory) != mem) ||
> +#endif
>          // Can NOT include the match of a subtree when its memory state
>          // is used by any of the other subtrees
>          (input_mem == NodeSentinel) ) {
> 
> Maybe the second patch is OK (but I'm not sure on that one too), but I
> doubt the first one is correct.  Can someone explain to me why we need
> to bail out at merge points (as the comment states)?
> 
> At least with these two changes I get the inner loop as follows:
> 
> 130   B9: #     B12 B10 <- B8 B10       Loop: B9-B10 inner stride: not constant main of N106 Freq: 10.9989
> 130     movslq  R10, R11        # i2l
> 133     movq    R8, [rsp + #8]  # spill
> 138     movzbl  RBX, [R8 + #24 + R10]   # ubyte
> 13e     movw    [R9 + #24 + R10 << #1], RBX     # char/short
> 144     movl    RBP, R11        # spill
> 147     addl    RBP, #4 # int
> 14a     movslq  R11, R11        # i2l
> 14d     movzbl  R8, [R8 + #25 + R11]    # ubyte
> 153     movw    [R9 + #26 + R11 << #1], R8      # char/short
> 159     movq    R10, [rsp + #8] # spill
> 15e     movzbl  R10, [R10 + #27 + R11]  # ubyte
> 164     movq    R8, [rsp + #8]  # spill
> 169     movzbl  R8, [R8 + #26 + R11]    # ubyte
> 16f     movw    [R9 + #28 + R11 << #1], R8      # char/short
> 175     movw    [R9 + #30 + R11 << #1], R10     # char/short
> 17b     cmpl    RBP, RCX
> 17d     jge,s   B12     # loop end  P=0.090918 C=97000.000000
> 17d
> 17f   B10: #    B9 <- B9  Freq: 9.99894
> 17f     movl    R11, RBP        # spill
> 182     jmp,s   B9
> 
> It's better than before but there's still optimization potential.
> 
> -- Christian
> 



More information about the hotspot-compiler-dev mailing list