Multiple copies of same code

Ian Rogers ian.rogers at manchester.ac.uk
Tue Nov 24 22:32:26 PST 2009


Hi Ulf,

I don't know if it is useful, but 2 years ago I had a go at optimizing GNU
Classpath's NIO charset implementation, in particular for byte
charsets like ASCII [1]. The approach I wanted was for small methods
that would inline easily and final fields that could be chased through
to avoid runtime indirections. In MRP [2] (source is Eclipse Public
License) I kick the compiler to inline some of the core routines
further [3].
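
For illustration only, here is a minimal sketch of that style. It is not
the Classpath or MRP code: the class AsciiSketch and its methods are
hypothetical names, and whether a given JIT actually inlines them or
folds the static final table reference is an assumption, not something
measured here.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CoderResult;

/** Hypothetical ASCII-only decoder built from small, easily inlined methods. */
final class AsciiSketch {
    /** Lookup table behind a static final reference that never changes. */
    private static final char[] TABLE = buildTable();

    private static char[] buildTable() {
        char[] t = new char[128];
        for (int i = 0; i < t.length; i++) {
            t[i] = (char) i;
        }
        return t;
    }

    /** Small predicate: a non-negative byte is 7-bit ASCII. */
    static boolean isAscii(byte b) {
        return b >= 0;
    }

    /** Small lookup; the caller must have checked isAscii(b) first. */
    static char toChar(byte b) {
        return TABLE[b];
    }

    /** Decodes until the input is exhausted, the output is full, or a byte is malformed. */
    static CoderResult decode(ByteBuffer src, CharBuffer dst) {
        while (src.hasRemaining()) {
            if (!dst.hasRemaining()) {
                return CoderResult.OVERFLOW;
            }
            byte b = src.get(src.position());      // peek without consuming
            if (!isAscii(b)) {
                return CoderResult.malformedForLength(1);
            }
            dst.put(toChar(b));
            src.position(src.position() + 1);      // consume only after a successful write
        }
        return CoderResult.UNDERFLOW;
    }
}
```

The hot loop only calls tiny static methods and reads one static final
field, so there is no virtual dispatch or mutable indirection in it.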

Regards,
Ian Rogers
(now at Azul Systems in Mountain View)

[1] http://cvs.savannah.gnu.org/viewvc/classpath/gnu/java/nio/charset/?root=classpath
[2] http://mrp.codehaus.org/
[3] http://git.codehaus.org/gitweb.cgi?p=mrp.git;a=blob;f=tools/asm-tasks/src/org/jikesrvm/tools/asm/AnnotationAdder.java;hb=HEAD

2009/11/24 Ulf Zibis <Ulf.Zibis at gmx.de>:
> I think it's not only the code size that matters, but also the performance
> lost to all these jumps.
>
> In the method code below, you see a 2-line finally block. Looking at the
> compile result, I can see that this block is repeated 6 times and takes up
> 1/3 of the whole assembly code for this method. Additionally, there are
> plenty of range-check and null-check blocks which also seem to be
> copy-and-pasted, so I guess removing the redundant blocks from this example
> would halve the code size.
>
> On the other hand, the 1-length int[] dp could be optimized to a normal int
> field, and pushing the 6 parameters onto the stack could be avoided, if method
> decode() were inlined. It isn't, because of the inline threshold, which
> sadly isn't frequency-related. Inlining would further improve
> performance.
>
>
>         private CoderResult decodeArrayLoop(ByteBuffer src, CharBuffer dst) {
>
>             byte[] sa = src.array();
>             int sp = src.arrayOffset() + src.position();
>             int sl = sp + src.remaining();
>
>             char[] da = dst.array();
>             int [] dp = new int[1];
>             dp[0] = dst.arrayOffset() + dst.position();
>             int dl = dp[0] + dst.remaining();
>             try {
>                 while (sp < sl) {
>                     CoderResult result;
>                     byte byte1 = sa[sp];
>                     if (byte1 >= 0) {               // ASCII      G0
>                         if (dp[0] == dl)
>                             return CoderResult.OVERFLOW;
>                         da[dp[0]++] = (char)(byte1 & 0xff);
>                         sp++;
>                     } else if (byte1 != SS2) {      // Codeset 1  G1
>                         if (sp + 1 == sl)
>                             break;
>                         result = decode(byte1, sa[sp+1], 0, da, dp, dl);
>                         if (result != null)
>                             return result;
>                         sp += 2;
>                     } else {                        // Codeset 2  G2
>                         if (sp + 4 > sl)
>                             break;
>                         int cnsPlane = cnspToIndex[sa[sp+1] & 0xff];
>                         if (cnsPlane < 0)
>                             return CoderResult.malformedForLength(2);
>                         result = decode(sa[sp+2], sa[sp+3], cnsPlane, da,
>                                         dp, dl);
>                         if (result != null)
>                             return result;
>                         sp += 4;
>                     }
>                 }
>                 return CoderResult.UNDERFLOW;
>             } finally {
>                 src.position(sp - src.arrayOffset());
>                 dst.position(dp[0] - dst.arrayOffset());
>             }
>         }
>
>
> -Ulf
>
>
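
As a rough sketch of the two ideas in the message above (a plain local
instead of the 1-length int[] dp, and a single exit instead of a finally
block that gets copied to every return), something like the following
could be tried. Everything here is an assumption for illustration:
SingleExitSketch and decodePair() are hypothetical, the two-byte mapping
is made up and is not EUC-TW, and no claim is made about what C2 emits
for it.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CoderResult;

/** Hypothetical decoder loop with a local dp and one write-back point. */
final class SingleExitSketch {
    /** Stand-in for decode(): returns the char value, or -1 if unmappable. */
    static int decodePair(byte b1, byte b2) {
        int hi = b1 & 0xff;
        int lo = b2 & 0xff;
        if (hi < 0xa1 || hi > 0xfe || lo < 0xa1 || lo > 0xfe) {
            return -1;
        }
        return 0x4e00 + (hi - 0xa1) * 94 + (lo - 0xa1);   // made-up mapping
    }

    /** Both buffers must be array-backed, as in the original decodeArrayLoop. */
    static CoderResult decodeArrayLoop(ByteBuffer src, CharBuffer dst) {
        byte[] sa = src.array();
        int sp = src.arrayOffset() + src.position();
        int sl = sp + src.remaining();

        char[] da = dst.array();
        int dp = dst.arrayOffset() + dst.position();      // plain local, no int[1]
        int dl = dp + dst.remaining();

        CoderResult result = CoderResult.UNDERFLOW;
        while (sp < sl) {
            byte b1 = sa[sp];
            if (b1 >= 0) {                                // ASCII
                if (dp == dl) { result = CoderResult.OVERFLOW; break; }
                da[dp++] = (char) b1;
                sp++;
            } else {                                      // two-byte form
                if (sp + 1 == sl) break;                  // incomplete input
                if (dp == dl) { result = CoderResult.OVERFLOW; break; }
                int c = decodePair(b1, sa[sp + 1]);
                if (c < 0) { result = CoderResult.unmappableForLength(2); break; }
                da[dp++] = (char) c;
                sp += 2;
            }
        }
        // Single exit: the buffer positions are written back exactly once.
        src.position(sp - src.arrayOffset());
        dst.position(dp - dst.arrayOffset());
        return result;
    }
}
```

One behavioural difference from the try/finally version: if the loop
throws (for example an ArrayIndexOutOfBoundsException), the positions are
not written back.
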
> On 22.11.2009 17:59, Chuck Rasbold wrote:
>
> Sure.  It would be great to merge redundant code paths.  But I don't
> think the cost/benefit ratio is worth it.
> In the case you cite, there would be a savings of 4 bytes per path
> removed, which are projected to be very infrequent. In a JIT, you
> have to spend your compilation budget wisely.
> It's not that it can't be done. There are just better places to spend time.
> On Sat, Nov 21, 2009 at 5:54 AM, Ulf Zibis <Ulf.Zibis at gmx.de> wrote:
>>
>> In the output of PrintAssembly I frequently see:
>>
>> ...
>> ...   # more than 10 recurrences
>> ...
>> 726   B108: #        B114 <- B10  Freq: 9.99898e-006
>> 726           # exception oop is in EAX; no code emitted
>> 726           MOV    ECX,EAX
>> 728           JMP,s  B114
>> 728
>> 72a   B109: #        B114 <- B9  Freq: 9.99918e-006
>> 72a           # exception oop is in EAX; no code emitted
>> 72a           MOV    ECX,EAX
>> 72c           JMP,s  B114
>> 72c
>> 72e   B110: #        B114 <- B6  Freq: 9.99938e-006
>> 72e           # exception oop is in EAX; no code emitted
>> 72e           MOV    ECX,EAX
>> 730           JMP,s  B114
>> 730
>> 732   B111: #        B114 <- B4  Freq: 9.99959e-006
>> 732           # exception oop is in EAX; no code emitted
>> 732           MOV    ECX,EAX
>> 734           JMP,s  B114
>> 734
>> 736   B112: #        B114 <- B3  Freq: 9.99979e-006
>> 736           # exception oop is in EAX; no code emitted
>> 736           MOV    ECX,EAX
>> 738           JMP,s  B114
>> 738
>> 73a   B113: #        B114 <- B2  Freq: 9.99999e-006
>> 73a           # exception oop is in EAX; no code emitted
>> 73a           MOV    ECX,EAX
>> 73a
>> 73c   B114: #        N1132 <- B79 B113 B112 B111 B110 B109 B108 B103 B102
>>           B101 B100 B93 B92 B91 B90 B87 B86 B85 B84 B83 B82 B81 B80 B107 B106 B105
>>           B104 B78 B77 B76 B75 B99  Freq: 7.11172e-005
>>
>>
>> Wouldn't it be better to have:
>>
>> ...
>> ...   # more than 10 recurrences
>> ...
>> 73a   B108: #        B114 <- B10  Freq: 9.99898e-006
>> 73a   B109: #        B114 <- B9  Freq: 9.99918e-006
>> 73a   B110: #        B114 <- B6  Freq: 9.99938e-006
>> 73a   B111: #        B114 <- B4  Freq: 9.99959e-006
>> 73a   B112: #        B114 <- B3  Freq: 9.99979e-006
>> 73a   B113: #        B114 <- B2  Freq: 9.99999e-006
>> 73a           # exception oop is in EAX; no code emitted
>> 73a           MOV    ECX,EAX
>> 73a
>> 73c   B114: #        N1132 <- B79 B113 B112 B111 B110 B109 B108 B103 B102
>>           B101 B100 B93 B92 B91 B90 B87 B86 B85 B84 B83 B82 B81 B80 B107 B106 B105
>>           B104 B78 B77 B76 B75 B99  Freq: 7.11172e-005
>>
>>
>
>


More information about the hotspot-compiler-dev mailing list