A question about bytecodes + unsigned load performance ./. add performace
Vladimir Kozlov
Vladimir.Kozlov at Sun.COM
Thu Jan 15 09:44:18 PST 2009
The loop in match_into_reg() is simplified check for
control nodes domination. You have to follow all
merging paths above a region to check that they all dominate.
And since it didn't reach m->in(0) we can not say that it dominates
so we bailout.
I wonder what are different control nodes. One is a loop head (region),
I assume, and an other? And why AndI has control edge?
AndI === 458 129 140 [[ 164 ]]
Thanks,
Vladimir
Christian Thalinger wrote:
>
> On Tue, 2009-01-13 at 10:26 -0800, Vladimir Kozlov wrote:
>> This code exists as far as history goes (1997), so no bugs or comments
>>
>> if( _shared[n->_idx] || // Shared; stop recursion
>> // Also stop recursion if they have different Controls.
>> // Slot 0 of constants is not really a Control.
>> (control != n->in(0) && n->in(0) && !n->is_Con()) ||
>> // Finally, stop recursion if this a LoadNode and the root of
>> // this tree is a StoreNode and the load & store have different
>> // memories.
>> ((mem!=(Node*)1) && n->is_Load() && n->in(LoadNode::Memory) != mem) ) {
>> // Switch to a register-only opcode; this value must be in a register
>> // and cannot be subsumed as part of a larger instruction.
>> svec->DFA( n->ideal_reg(this), n );
>
> Thank you Vladimir. As I currently do not understand all of that code I
> can simple guess and try.
>
> I have to do two changes to get the loadUB matched for all cases in that
> method:
>
> --- hotspot.35ae4dd6c27c/src/share/vm/opto/matcher.cpp 2009-01-15 16:12:29.748744167 +0100
> +++ /export/home/twisti/projects/openjdk/jdk7/hotspot-comp/hotspot/src/share/vm/opto/matcher.cpp 2009-01-15 16:12:24.732215699 +0100
> @@ -1252,7 +1252,8 @@ static bool match_into_reg( const Node *
> uint j;
> for( j=0; j<max_scan; j++ ) {
> if( x->is_Region() ) // Bail out at merge points
> - return true;
> + //return true;
> + break;
> x = x->in(0);
> if( x == m->in(0) ) // Does 'control' post-dominate
> break; // m->in(0)? If so, we can use it
> @@ -1331,10 +1332,12 @@ Node *Matcher::Label_Root( const Node *n
> // the current tree. If it finds any, that value is matched as a
> // register operand. If not, then the normal matching is used.
> if( match_into_reg(n, m, control, i, is_shared(m)) ||
> +#if 0
> //
> // Stop recursion if this is LoadNode and the root of this tree is a
> // StoreNode and the load & store have different memories.
> ((mem!=(Node*)1) && m->is_Load() && m->in(MemNode::Memory) != mem) ||
> +#endif
> // Can NOT include the match of a subtree when its memory state
> // is used by any of the other subtrees
> (input_mem == NodeSentinel) ) {
>
> Maybe the second patch is OK (but I'm not sure on that one too), but I
> doubt the first one is correct. Can someone explain to me why we need
> to bail out at merge points (as the comment states)?
>
> At least with these two changes I get the inner loop as follows:
>
> 130 B9: # B12 B10 <- B8 B10 Loop: B9-B10 inner stride: not constant main of N106 Freq: 10.9989
> 130 movslq R10, R11 # i2l
> 133 movq R8, [rsp + #8] # spill
> 138 movzbl RBX, [R8 + #24 + R10] # ubyte
> 13e movw [R9 + #24 + R10 << #1], RBX # char/short
> 144 movl RBP, R11 # spill
> 147 addl RBP, #4 # int
> 14a movslq R11, R11 # i2l
> 14d movzbl R8, [R8 + #25 + R11] # ubyte
> 153 movw [R9 + #26 + R11 << #1], R8 # char/short
> 159 movq R10, [rsp + #8] # spill
> 15e movzbl R10, [R10 + #27 + R11] # ubyte
> 164 movq R8, [rsp + #8] # spill
> 169 movzbl R8, [R8 + #26 + R11] # ubyte
> 16f movw [R9 + #28 + R11 << #1], R8 # char/short
> 175 movw [R9 + #30 + R11 << #1], R10 # char/short
> 17b cmpl RBP, RCX
> 17d jge,s B12 # loop end P=0.090918 C=97000.000000
> 17d
> 17f B10: # B9 <- B9 Freq: 9.99894
> 17f movl R11, RBP # spill
> 182 jmp,s B9
>
> It's better than before but there's still optimization potential.
>
> -- Christian
>
More information about the hotspot-compiler-dev
mailing list