Loop compilation weirdness

Wed Dec 3 11:53:06 PST 2008

As many of you may know, in the appserver we spend a great deal of time
converting characters (from Strings) to bytes (for HTTP traffic). The
vast majority of this is ISO_8859_1 encoding, so the conversion is a
simple loop that does b[bIndex++] = (byte) c[cIndex++].
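
For reference, a minimal sketch of the kind of loop meant here. The
class and method names are illustrative and are not taken from the
attached C2BTest.java; the cast is only a correct ISO_8859_1 encoding
for characters up to 0xFF.

```java
// Hypothetical helper for illustration; not the code from the attachment.
final class SimpleLatin1 {
    /**
     * Narrows len chars starting at c[cOff] into b starting at b[bOff].
     * Assumes every char is <= 0xFF and that b has room for len bytes.
     * Returns the number of bytes written.
     */
    static int convertSimple(char[] c, int cOff, int len, byte[] b, int bOff) {
        int cIndex = cOff;
        int bIndex = bOff;
        int cEnd = cOff + len;
        while (cIndex < cEnd) {
            b[bIndex++] = (byte) c[cIndex++];
        }
        return bIndex - bOff;
    }
}
```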

Still, we'd been using the NIO converters for this rather than coding it
ourselves. In certain cases, the overhead of wrapping the arrays into
buffers introduces a huge performance penalty (and increased GC), so
we're thinking of doing this special-case conversion directly.
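
To make the wrapping overhead concrete, here is a sketch of the NIO
route using the standard java.nio.charset API. The class and method
names are made up for illustration, and the appserver's real call site
may look different; the point is only that each call allocates two
buffer wrappers around the arrays before any conversion happens.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;

// Hypothetical wrapper for illustration; not the code from the attachment.
final class NioLatin1 {
    // An encoder is stateful, so this instance is not thread-safe.
    private final CharsetEncoder encoder =
            Charset.forName("ISO-8859-1").newEncoder();

    /** Encodes len chars from c[cOff] into b; returns bytes written. */
    int convertNio(char[] c, int cOff, int len, byte[] b) {
        // Two short-lived wrapper objects per call: this is the fixed
        // cost that matters for very small inputs.
        CharBuffer in = CharBuffer.wrap(c, cOff, len);
        ByteBuffer out = ByteBuffer.wrap(b);
        encoder.reset();
        CoderResult r = encoder.encode(in, out, true);
        if (r.isError() || r.isOverflow()) {
            throw new IllegalStateException(r.toString());
        }
        encoder.flush(out);
        return out.position();
    }
}
```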

So we wrote a simple loop to do the conversion and found out that for 8K
buffers of HTTP requests, the simple loop takes significantly longer
than calling the NIO converter. If we take the loop from
sun/nio/cs/ISO_8859_1.java and code it directly, we get the expected
performance. [An aside for the curious: we often convert 8K buffers
because that's a JSP page size. But we also often convert really small
strings, and it's in the conversion of those small 16-character strings
that the overhead of calling the NIO converter kicks in.]

The attached code boils this all down: the convertFast method (derived
from the ISO_8859_1 class) is the fastest for arrays of all sizes;
convertNio is fast for large arrays and slow for small arrays; and
convertSimple is generally fast for small arrays, but 2-3x slower than
convertFast for large arrays. In fact, the key part of the convertFast
method is this:

    while (...characters to process...) {
        if (dp >= dl) {
            return CoderResult.OVERFLOW;
            // We need to reset the output byte array. If we reset it
            // directly (e.g. if we just set dp=0 here), performance
            // drops by 50% and is hence equivalent to convertSimple
        }
        da[dp++] = (byte) c;
    }

So if we return out of this loop, reset the output byte buffer, and
re-enter the loop -- then the compiled loop runs faster than if we reset
the output byte buffer directly in the if statement (a little clearer if
you look at the entire code). Oddly, this is true even though in this
particular test, the body of the if statement is never executed.
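
The two shapes being compared can be sketched side by side as below.
This is a reconstruction from the description above, not the attached
code: the class and method names are invented, and the sketch only
shows the control flow. It makes no claim about how any particular JVM
compiles either variant. Both methods assume da.length > 0.

```java
// Hypothetical reconstruction for illustration; not the attachment.
final class OverflowVariants {
    /**
     * Inner loop that leaves via return when the output array is full.
     * Returns the number of chars consumed from sa.
     */
    static int encodeUntilFull(char[] sa, int sp, int sl, byte[] da) {
        int start = sp;
        int dp = 0;
        int dl = da.length;
        while (sp < sl) {
            if (dp >= dl) {
                return sp - start; // caller resets the output and re-enters
            }
            da[dp++] = (byte) sa[sp++];
        }
        return sp - start;
    }

    /** Variant A: reset happens outside the loop, by calling it again. */
    static void convertWithReturn(char[] sa, byte[] da) {
        int sp = 0;
        while (sp < sa.length) {
            sp += encodeUntilFull(sa, sp, sa.length, da);
        }
    }

    /** Variant B: reset happens inside the loop body (dp = 0). */
    static void convertWithInlineReset(char[] sa, byte[] da) {
        int dp = 0;
        int dl = da.length;
        for (int sp = 0; sp < sa.length; sp++) {
            if (dp >= dl) {
                dp = 0; // overwrite the output array from the start
            }
            da[dp++] = (byte) sa[sp];
        }
    }
}
```

Both variants leave the same bytes in da, so any timing difference
between them would come from how the loop is compiled rather than from
the work done.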

So my long-winded question is why the loop is compiled differently when
we return out of the if statement rather than execute a simple
statement in the if statement. And in general, why the simpler
convertSimple method in the attachment is slower than the more complex
convertFast, at least for large arrays. Is this just an odd
corner-case-or-something bug, or is there some basic fast loop-writing
construct I'm not aware of here?

If you run as is, the key thing to look at is the second test for
strings of 8192, which on my very old sparc machine gives me 537ms for
fast and 1474ms for simple; on my x4100 with Linux it gives me
96ms/207ms respectively. You can comment out the return from the if and
uncomment the dp=0 line, and see the performance of the "fast" test drop
by 2-3x.

We've tested this with both JDK 6_07 and the latest promotion of JDK 7.

-Scott
-------------- next part --------------
A non-text attachment was scrubbed...
Name: C2BTest.java
Type: text/x-java
Size: 3704 bytes
Desc: not available
Url : http://mail.openjdk.java.net/pipermail/hotspot-dev/attachments/20081203/316ad1d3/attachment.bin