Multiple copies of same code
Ian Rogers
ian.rogers at manchester.ac.uk
Tue Nov 24 22:32:26 PST 2009
Hi Ulf,
I don't know if it is useful, but 2 years ago I had a go at optimizing GNU
Classpath's NIO charset implementation, in particular for byte
charsets like ASCII [1]. The approach I wanted was for small methods
that would inline easily and final fields that could be chased through
to avoid runtime indirections. In MRP [2] (source is Eclipse Public
License) I kick the compiler to inline some of the core routines
further [3].
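
A minimal sketch of that shape, for illustration only: the class
ByteTableDecoder and its methods are hypothetical names, not the Classpath
source. The lookup table sits in a final field and the per-byte work is a
tiny method that should be an easy inlining candidate. Whether a particular
JIT actually folds the final-field load is VM-specific.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;

/** Hypothetical single-byte decoder; illustrative, not the GNU Classpath code. */
final class ByteTableDecoder {
    // Final field: the table reference never changes after construction.
    private final char[] table;   // 256 entries, indexed by unsigned byte value

    ByteTableDecoder(char[] table) {
        if (table.length != 256)
            throw new IllegalArgumentException("need 256 entries");
        this.table = table.clone();
    }

    /** One masked array load; small enough to be a cheap inlining candidate. */
    char decodeByte(byte b) {
        return table[b & 0xff];
    }

    /** Decodes as many bytes as fit into dst and returns how many were decoded. */
    int decode(ByteBuffer src, CharBuffer dst) {
        int n = Math.min(src.remaining(), dst.remaining());
        for (int i = 0; i < n; i++) {
            dst.put(decodeByte(src.get()));
        }
        return n;
    }

    /** Builds an ASCII table; bytes 0x80 and above map to U+FFFD. */
    static ByteTableDecoder ascii() {
        char[] t = new char[256];
        for (int i = 0; i < 256; i++) {
            t[i] = i < 0x80 ? (char) i : '\uFFFD';
        }
        return new ByteTableDecoder(t);
    }
}
```

Usage would be along the lines of
ByteTableDecoder.ascii().decode(src, dst), with src and dst positioned as
usual for NIO buffers.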
Regards,
Ian Rogers
(now at Azul Systems in Mountain View)
[1] http://cvs.savannah.gnu.org/viewvc/classpath/gnu/java/nio/charset/?root=classpath
[2] http://mrp.codehaus.org/
[3] http://git.codehaus.org/gitweb.cgi?p=mrp.git;a=blob;f=tools/asm-tasks/src/org/jikesrvm/tools/asm/AnnotationAdder.java;hb=HEAD
2009/11/24 Ulf Zibis <Ulf.Zibis at gmx.de>:
> I think it's not only the code size that matters, but also the performance
> lost to all these jumps.
>
> In the method code below, you see a 2-line finally block. Looking at the
> compiled result, I can see that this block is repeated 6 times and consumes
> 1/3 of the whole assembly code for this method. Additionally, there are
> plenty of range-check and null-check blocks which also seem to be
> copy-and-pasted, so I guess removing the redundant blocks from this example
> would halve the code size.
>
> On the other hand, the 1-length int[] dp could be optimized to a normal int
> field, and pushing the 6 parameters onto the stack could be avoided, if method
> decode() were inlined; but it isn't, because of the inline threshold, which
> sadly isn't frequency-related. Inlining would additionally increase
> performance.
>
>
> private CoderResult decodeArrayLoop(ByteBuffer src, CharBuffer dst)
> {
>
>     byte[] sa = src.array();
>     int sp = src.arrayOffset() + src.position();
>     int sl = sp + src.remaining();
>
>     char[] da = dst.array();
>     int[] dp = new int[1];
>     dp[0] = dst.arrayOffset() + dst.position();
>     int dl = dp[0] + dst.remaining();
>     try {
>         while (sp < sl) {
>             CoderResult result;
>             byte byte1 = sa[sp];
>             if (byte1 >= 0) { // ASCII G0
>                 if (dp[0] == dl)
>                     return CoderResult.OVERFLOW;
>                 da[dp[0]++] = (char)(byte1 & 0xff);
>                 sp++;
>             } else if (byte1 != SS2) { // Codeset 1 G1
>                 if (sp + 1 == sl)
>                     break;
>                 result = decode(byte1, sa[sp+1], 0, da, dp, dl);
>                 if (result != null)
>                     return result;
>                 sp += 2;
>             } else { // Codeset 2 G2
>                 if (sp + 4 > sl)
>                     break;
>                 int cnsPlane = cnspToIndex[sa[sp+1] & 0xff];
>                 if (cnsPlane < 0)
>                     return CoderResult.malformedForLength(2);
>                 result = decode(sa[sp+2], sa[sp+3], cnsPlane, da,
>                                 dp, dl);
>                 if (result != null)
>                     return result;
>                 sp += 4;
>             }
>         }
>         return CoderResult.UNDERFLOW;
>     } finally {
>         src.position(sp - src.arrayOffset());
>         dst.position(dp[0] - dst.arrayOffset());
>     }
> }
>
>
> -Ulf
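
For illustration, the two source-level changes suggested in the quoted
message (a plain int in place of the 1-length int[] dp, and the two-byte
decode done inline) might look roughly like the sketch below. The class
SingleExitDecoder and its map() helper are hypothetical stand-ins, not the
JDK charset code, and the two-byte mapping is invented for the example. The
loop also leaves the try block through a single normal exit, so javac, which
copies a finally body onto every exit path of the try, has only the normal
path and the exception path to emit. How many copies survive in the JIT
output is a separate question that this sketch does not answer.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CoderResult;

/** Hypothetical two-byte decoder used only to illustrate the loop shape. */
final class SingleExitDecoder {
    /** Invented mapping for a two-byte sequence; a real charset uses tables. */
    private static char map(byte b1, byte b2) {
        return (char) (((b1 & 0x7f) << 7) | (b2 & 0x7f));
    }

    static CoderResult decodeArrayLoop(ByteBuffer src, CharBuffer dst) {
        byte[] sa = src.array();
        int sp = src.arrayOffset() + src.position();
        int sl = sp + src.remaining();

        char[] da = dst.array();
        int dp = dst.arrayOffset() + dst.position();  // plain local, no int[1]
        int dl = dp + dst.remaining();

        CoderResult result = CoderResult.UNDERFLOW;
        try {
            while (sp < sl) {
                if (dp == dl) {                // no room for another char
                    result = CoderResult.OVERFLOW;
                    break;
                }
                byte b1 = sa[sp];
                if (b1 >= 0) {                 // ASCII
                    da[dp++] = (char) b1;
                    sp++;
                } else {                       // two-byte sequence
                    if (sp + 1 == sl)
                        break;                 // incomplete input: UNDERFLOW
                    da[dp++] = map(b1, sa[sp + 1]);
                    sp += 2;
                }
            }
        } finally {
            // Reached from one normal exit plus the exception path.
            src.position(sp - src.arrayOffset());
            dst.position(dp - dst.arrayOffset());
        }
        return result;
    }
}
```

Unlike the quoted method, this sketch checks for a full destination before
looking at the input byte, so its OVERFLOW/UNDERFLOW precedence differs
slightly when both conditions hold.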
>
>
> On 22.11.2009 17:59, Chuck Rasbold wrote:
>
> Sure. It would be great to merge redundant code paths. But I don't
> think the cost/benefit ratio is worth it.
> In the case you cite, there would be a savings of 4 bytes per path
> removed, which are projected to be very infrequent. In a JIT, you
> have to spend your compilation budget wisely.
> It's not that it can't be done. There are just better places to spend time.
> On Sat, Nov 21, 2009 at 5:54 AM, Ulf Zibis <Ulf.Zibis at gmx.de> wrote:
>>
>> In the output of PrintAssembly I frequently see:
>>
>> ...
>> ... # more than 10 recurrences
>> ...
>> 726 B108: # B114 <- B10 Freq: 9.99898e-006
>> 726 # exception oop is in EAX; no code emitted
>> 726 MOV ECX,EAX
>> 728 JMP,s B114
>> 728
>> 72a B109: # B114 <- B9 Freq: 9.99918e-006
>> 72a # exception oop is in EAX; no code emitted
>> 72a MOV ECX,EAX
>> 72c JMP,s B114
>> 72c
>> 72e B110: # B114 <- B6 Freq: 9.99938e-006
>> 72e # exception oop is in EAX; no code emitted
>> 72e MOV ECX,EAX
>> 730 JMP,s B114
>> 730
>> 732 B111: # B114 <- B4 Freq: 9.99959e-006
>> 732 # exception oop is in EAX; no code emitted
>> 732 MOV ECX,EAX
>> 734 JMP,s B114
>> 734
>> 736 B112: # B114 <- B3 Freq: 9.99979e-006
>> 736 # exception oop is in EAX; no code emitted
>> 736 MOV ECX,EAX
>> 738 JMP,s B114
>> 738
>> 73a B113: # B114 <- B2 Freq: 9.99999e-006
>> 73a # exception oop is in EAX; no code emitted
>> 73a MOV ECX,EAX
>> 73a
>> 73c B114: # N1132 <- B79 B113 B112 B111 B110 B109 B108 B103 B102
>> B101 B100 B93 B92 B91 B90 B87 B86 B85 B84 B83 B82 B81 B80 B107 B106 B105
>> B104 B78 B77 B76 B75 B99 Freq: 7.11172e-005
>>
>>
>> Wouldn't it be better to have:
>>
>> ...
>> ... # more than 10 recurrences
>> ...
>> 73a B108: # B114 <- B10 Freq: 9.99898e-006
>> 73a B109: # B114 <- B9 Freq: 9.99918e-006
>> 73a B110: # B114 <- B6 Freq: 9.99938e-006
>> 73a B111: # B114 <- B4 Freq: 9.99959e-006
>> 73a B112: # B114 <- B3 Freq: 9.99979e-006
>> 73a B113: # B114 <- B2 Freq: 9.99999e-006
>> 73a # exception oop is in EAX; no code emitted
>> 73a MOV ECX,EAX
>> 73a
>> 73c B114: # N1132 <- B79 B113 B112 B111 B110 B109 B108 B103 B102
>> B101 B100 B93 B92 B91 B90 B87 B86 B85 B84 B83 B82 B81 B80 B107 B106 B105
>> B104 B78 B77 B76 B75 B99 Freq: 7.11172e-005
>>
>>
>
>