Request for review: 6896617: Optimize sun.nio.cs.ISO_8859_1$Encode.encodeArrayLoop() on x86

Mon Jan 21 23:30:38 UTC 2013

Am 17.01.2013 05:27, schrieb Vladimir Kozlov:
> On 1/12/13 12:37 AM, Ulf Zibis wrote:
>> 3) bugs.sun.com/bugdatabase/view_bug.do?bug_id=6896617 ==> This bug is
>> not available.
>
> I opened it, should show up in few days.

Thanks!

>
>> 4) What specific operation should be done by the intrinsic, i.e. is
>> there a fixed API for that method ???
>
> When C2 (server JIT compiler in JVM) compiles encode methods it will replace new method 
> encodeArray() (matched by signature) with hand optimized assembler code which uses latest 
> processor instructions. I will send Hotspot changes soon. So it is nothing to do with interpreter 
> or bytecode sequence.
>
>> 5) Can an intrinsic write back more than 1 value (see my hack via int[]
>> p) ?
>> 6) Vladimir's webrev shows an integer as return type for that method,
>> I've added a variant with boolean return type, and the code from my last
>> approach could be transformed to a method with Object return type.

I wanted to say, there would maybe different results for the surrounding code depending on the API 
we choose for the intrinsic method call. If the surrounding code is *as short as possible*, there is 
a better chance, it will be JIT-compiled too after fewer invocations of encodeArrayLoop(). I guess, 
the fastest would be:

// while inlinig, JIT will erase the surrounding int[] p
private static CoderResult copyISOs(CoderResult cr,
         char[] sa, byte[] da, final int[] p, int sl) {
     for (int sp = p[0], dp = p[1]; sp < sl; sp++) {
         char c = sa[sp];
         // if (c & '\uFF00' != 0) // needs bug 6935994 to be fixed, would be fastest
         if ((byte)(c >> 8) != 0) // temporary replacement, fast byte-length operation on x86
             return null;
         da[dp++] = (byte)c;
     }
     return cr;
}

// No more needs try...finally block
private CoderResult encodeArrayLoop(
         CharBuffer src, ByteBuffer dst) {
     char[] sa = src.array();
     int soff = src.arrayOffset();
     int sp = soff + src.position();
     int sr = src.remaining();
     byte[] da = dst.array();
     int doff = dst.arrayOffset();
     int dp = doff + dst.position();
     int dr = dst.remaining();
     CoderResult cr;
     if (dr < sr) {
         sr = dr;
         cr = CoderResult.OVERFLOW;
     } else
         cr = CoderResult.UNDERFLOW;
     int sl = sp + sr;
     final int[] p = { sp, dp };
     cr = copyISOs(cr, sa, da, p, sl);
     src.position(p[0] - soff);
     dst.position(p[1] - doff);
     return result(cr, sa, p[0], sl);

}

// if adapted, maybe could also be reused in encodeBufferLoop()
private static CoderResult result(CoderResult cr, byte[] sa, int sp, int sl) {
     return cr != null ? cr :
         sgp.parse(sa[sp], sa, sp, sl) < 0
             ? sgp.error();
             : sgp.unmappableResult();
}

>>
>> ... so waiting for Vladimir's feedback :-[
>> (especially on performance/hsdis results)
>
> Performance on x86 tested with next code (whole test will be in Hotspot changes) :
>
>         ba = CharBuffer.wrap(a);
>         bb = ByteBuffer.wrap(b);
>         long start = System.currentTimeMillis();
>         for (int i = 0; i < 1000000; i++) {
>             ba.clear(); bb.clear();
>             enc_res = enc_res && enc.encode(ba, bb, true).isUnderflow();
>         }
>         long end = System.currentTimeMillis();

1.) Wouldn't System.nanoTime() give more accurate results?
2.) I want to point out that it is not real world scenario, encoding the same data 1.000.000 times. 
If same data is used, it is likely, that the data itself becomes cached in the inner CPU cache so 
should have very fast access times, which would be not the case on real world data.
3.) It would also be interesting to see the results for less than 1.000.000 iterations in 
considering, the surrounding code would be JIT-compiled or not. Also performance on C1 should be tested.

I also worry about the naming of method encodeArray(...). I think, it should reflect the fact, that 
it only encodes ISO-8859-1 charset characters.

Please add a comment on the fact, that method encodeArray(...) is intended to be intrinsified.

-Ulf