Review request for 6636323_6636319

Fri Mar 20 18:25:24 UTC 2009

The change has been (and is still being) reviewed by Alan and Ulf. I am
sending it to the alias to see if anyone else is interested in taking a
look (Ulf suggested we should be more open :-)

http://cr.openjdk.java.net/~sherman/6636323_6636319/webrev

(a) It's a pure performance improvement for new String(byte[], cs/csn)
and String.getBytes(cs/csn) when the data size is relatively small.
(We're not ready to add those methods to CharsetDe/Encoder for now.)

The preliminary micro-benchmark data is at
http://cr.openjdk.java.net/~sherman/6636323_6636319/benchmark.txt

(b) This now covers ASCII, 8859-1 and all SingleByte-based charsets. I
have yet to find time to migrate the DB charsets.

Thanks,
Sherman

Simple writeup for the changes.
-------------------------------
Problem/Issue to solve:
StringCoding.java is "slow" and creates "too many" objects when doing
byte[] <-> String conversion.

Root Cause:
There are "too many" layers and logic before the byte[]/char[] can reach
the real de/encoding code, and again on the way back. A pair of
ByteBuffer/CharBuffer wrappers is always created for each conversion.
While the GC should be pretty good these days at cleaning up these
wrapper objects quickly, the "creating" and "cleaning" are still a waste
of CPU/memory resources when not really necessary.
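
For readers who have not looked at the buffer-based path, here is a
minimal sketch of what each conversion has to go through when only the
public CharsetDecoder API is available. The class name BufferPathSketch
is made up for illustration and this is not the StringCoding source; it
only shows where the two wrapper objects come from.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical illustration class, not part of the webrev.
class BufferPathSketch {
    static String decode(Charset cs, byte[] ba) {
        CharsetDecoder cd = cs.newDecoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);
        // Worst-case output size, so the decode below cannot overflow.
        char[] ca = new char[(int) (ba.length * (double) cd.maxCharsPerByte())];
        // The two wrapper objects that get allocated for every conversion.
        ByteBuffer bb = ByteBuffer.wrap(ba);
        CharBuffer cb = CharBuffer.wrap(ca);
        cd.decode(bb, cb, true);
        cd.flush(cb);
        return new String(ca, 0, cb.position());
    }

    public static void main(String[] args) {
        byte[] ba = {72, 105, (byte) 0xE9};
        System.out.println(decode(Charset.forName("ISO-8859-1"), ba));
    }
}
```

For small inputs the decoder and the two wrappers can cost more than the
actual byte-to-char loop, which is the overhead this change targets.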

Two "facts/details" that we can take advantage of:

(1) StringCoding always performs REPLACE on malformed or unmappable
input sequences.
(2) The input and output byte[]/char[] are totally under our "control",
so the de/encoding should never "overflow".

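To make the two facts concrete, here is a rough sketch of what they
allow for US-ASCII: a plain array loop with the replacement character
hard-wired and the output array sized up front. The class and method
names are invented for this example and the code in the webrev may look
quite different.

```java
// Hypothetical illustration class, not part of the webrev.
class AsciiArrayLoopSketch {
    static String decodeAscii(byte[] ba, int off, int len) {
        // Fact (2): we allocate the output ourselves, one char per byte,
        // so there is no overflow case to handle.
        char[] ca = new char[len];
        for (int i = 0; i < len; i++) {
            byte b = ba[off + i];
            // Fact (1): the action is always REPLACE, so a byte outside
            // the 7-bit range simply becomes U+FFFD.
            ca[i] = (b >= 0) ? (char) b : '\uFFFD';
        }
        return new String(ca);
    }

    public static void main(String[] args) {
        byte[] ba = {'a', (byte) 0x80, 'b'};
        System.out.println(decodeAscii(ba, 0, ba.length).length());
    }
}
```

No CoderResult needs to be inspected and no buffer positions need to be
tracked, because neither the error action nor the output size can vary.
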
Changes:

(1) Two new internal interfaces, sun.nio.cs.ArrayDecoder and
sun.nio.cs.ArrayEncoder, to provide a byte[] <-> char[] fastpath
alongside the otherwise "well-encapsulated, X-Buffer only"
CharsetDe/Encoder interface.

(2) US_ASCII/ISO_8859_1/SingleByte.Decoder/Encoder implement the above
interfaces.

(3) US_ASCII/ISO_8859_1/SingleByte.Decoder/Encoder also override
isLegalReplacement(), which significantly speeds up new CharsetEncoder()
and so has a big impact on getBytes(charset).

(4) StringCoding.java
  a) Use the ArrayDecoder/ArrayEncoder interface if possible (instanceof).
  b) Added an "isTrusted" field, set when the StringDe/Encoder is
     created, to indicate that the charset is from the system class
     loader. (Invoking cs.getClass().getClassLoader0() is "expensive";
     paying that cost every time len == ba.length when there is an SM
     installed is unnecessary. This helps the benchmark a lot when an
     SM is installed.)
  c) No longer create a StringDe/Encoder in the "param is charset"
     cases, and avoid the defensive copy when it is not necessary.
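
The instanceof dispatch in (4)a can be sketched roughly as below. This
is an illustration under assumptions: ArrayDecoderSketch, its decode()
signature, Latin1Decoder and FastPathDispatchSketch are all names made
up for this mail, and the real sun.nio.cs.ArrayDecoder in the webrev
may be shaped differently.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;

// Hypothetical stand-in for the internal array fastpath interface.
interface ArrayDecoderSketch {
    // Decodes len bytes starting at src[off] into dst; returns chars written.
    int decode(byte[] src, int off, int len, char[] dst);
}

// Hypothetical decoder offering both the buffer path and the array path.
class Latin1Decoder extends CharsetDecoder implements ArrayDecoderSketch {
    Latin1Decoder() {
        super(Charset.forName("ISO-8859-1"), 1.0f, 1.0f);
    }

    @Override
    protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
        while (in.hasRemaining()) {
            if (!out.hasRemaining()) return CoderResult.OVERFLOW;
            out.put((char) (in.get() & 0xff));
        }
        return CoderResult.UNDERFLOW;
    }

    @Override
    public int decode(byte[] src, int off, int len, char[] dst) {
        int n = Math.min(len, dst.length);
        for (int i = 0; i < n; i++) dst[i] = (char) (src[off + i] & 0xff);
        return n;
    }
}

class FastPathDispatchSketch {
    static String decode(CharsetDecoder cd, byte[] ba, int off, int len) {
        char[] ca = new char[(int) (len * (double) cd.maxCharsPerByte())];
        if (cd instanceof ArrayDecoderSketch) {
            // Fastpath: no ByteBuffer/CharBuffer wrappers are created.
            int n = ((ArrayDecoderSketch) cd).decode(ba, off, len, ca);
            return new String(ca, 0, n);
        }
        // Fallback: the ordinary buffer-based path.
        cd.onMalformedInput(CodingErrorAction.REPLACE)
          .onUnmappableCharacter(CodingErrorAction.REPLACE)
          .reset();
        CharBuffer cb = CharBuffer.wrap(ca);
        cd.decode(ByteBuffer.wrap(ba, off, len), cb, true);
        cd.flush(cb);
        return new String(ca, 0, cb.position());
    }
}
```

Decoders that do not implement the array interface keep working through
the fallback branch, so the fastpath stays purely an optimization.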