Review request for 6636323_6636319
Xueming Shen
Xueming.Shen at Sun.COM
Fri Mar 20 18:25:24 UTC 2009
The change has been (and is still being) reviewed by Alan and Ulf. I am
sending it to the alias to see if anyone else is interested in taking a
look (Ulf suggested we should go more open :-)
http://cr.openjdk.java.net/~sherman/6636323_6636319/webrev
(a) It's a pure performance improvement for new String(byte[], cs/csn)
and String.getBytes(cs/csn) when the data size is relatively small.
(We're not ready to add those methods to CharsetDe/Encoder for now.)
The preliminary micro-benchmark data is at
http://cr.openjdk.java.net/~sherman/6636323_6636319/benchmark.txt
(b) This currently covers ASCII, 8859-1 and all SingleByte-based
charsets. I have yet to find time to migrate the DB charsets.
Thanks,
Sherman
Simple writeup for the changes.
-------------------------------
Problem/Issue to solve:
StringCoding.java is "slow" and creates "too many" objects when doing
byte[]<->String conversion.
Root Cause:
There are "too many" layers and too much logic before the byte[]/char[]
can reach the real de/encoding code and come back. A pair of
ByteBuffer/CharBuffer wrappers is always created for each conversion.
While the GC does a pretty good job these days of cleaning up these
wrapper objects quickly, the "creating" and "cleaning" themselves are
still a waste of CPU/memory resources when they are not really
necessary.
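
To make the wrapper overhead concrete, here is a minimal sketch using
only the public java.nio.charset API. It is not the StringCoding code
itself; the class and method names (WrapperPathSketch, decodeViaBuffers,
decodeDirect) are made up for illustration, and ISO-8859-1 is chosen
only because its mapping is trivial.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of the webrev.
public class WrapperPathSketch {

    // Buffer-based path: a ByteBuffer and a CharBuffer wrapper are
    // allocated for every single conversion.
    static String decodeViaBuffers(byte[] ba) {
        CharsetDecoder cd = StandardCharsets.ISO_8859_1.newDecoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);
        char[] ca = new char[(int) (ba.length * (double) cd.maxCharsPerByte())];
        ByteBuffer bb = ByteBuffer.wrap(ba);   // wrapper object #1
        CharBuffer cb = CharBuffer.wrap(ca);   // wrapper object #2
        cd.decode(bb, cb, true);
        cd.flush(cb);
        return new String(ca, 0, cb.position());
    }

    // Array-based path: the same mapping done directly on the arrays,
    // with no wrapper objects in between.
    static String decodeDirect(byte[] ba) {
        char[] ca = new char[ba.length];
        for (int i = 0; i < ba.length; i++) {
            ca[i] = (char) (ba[i] & 0xff);     // ISO-8859-1 maps byte -> U+00xx
        }
        return new String(ca);
    }
}
```

Both methods should return the same String for any input; the second
one simply skips the buffer layer.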
Two "facts/details" that we can take advantage of:
(1) StringCoding always performs REPLACE when it encounters malformed
or unmappable input sequences.
(2) The input and output byte[]/char[] are totally under our "control",
so the de/encoding should never "overflow".
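
Both facts can be observed through the public String API. The sketch
below is only an illustration: ReplaceFactsSketch and its methods are
hypothetical names, and the hand-written loop is an assumption about
what an array-based ASCII decoder could look like, not the code in the
webrev.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of the webrev.
public class ReplaceFactsSketch {

    // Fact (1): String decoding replaces malformed input with U+FFFD.
    static String decodeAscii(byte[] ba) {
        return new String(ba, StandardCharsets.US_ASCII);
    }

    // Fact (1): String encoding replaces unmappable chars with '?'.
    static byte[] encodeAscii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    // Fact (2): for a single-byte charset the output needs exactly one
    // char per input byte, so a destination array sized by the caller
    // can never overflow and no overflow handling is required.
    static char[] decodeAsciiArray(byte[] src) {
        char[] dst = new char[src.length];
        for (int i = 0; i < src.length; i++) {
            byte b = src[i];
            dst[i] = (b >= 0) ? (char) b : '\uFFFD';  // REPLACE, never report
        }
        return dst;
    }
}
```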
Changes:
(1) 2 new internal interfaces, sun.nio.cs.ArrayDecoder and
sun.nio.cs.ArrayEncoder, to provide the byte[]<->char[] fastpath from
the otherwise "well-encapsulated-X-Buffer only" CharsetDe/Encoder
interface.
(2) US_ASCII/ISO_8859_1/SingleByte.Decoder/Encoder implement the above
interfaces.
(3) US_ASCII/ISO_8859_1/SingleByte.Decoder/Encoder also override
isLegalReplacement(), which speeds up new CharsetEncoder()
significantly, which in turn has a big impact on getBytes(charset).
(4) StringCoding.java
a) Use the ArrayDecoder/ArrayEncoder interface if possible (instanceof).
b) Added an "isTrusted" field to indicate whether the charset is from
the system class loader, set when creating the StringDe/Encoder.
(Invoking cs.getClass().getClassLoader0() is "expensive"; paying that
cost every time len==ba.length when there is a SM installed is
unnecessary. This helps the benchmark a lot when a SM is installed.)
c) No longer create a StringDe/Encoder in the "param is charset" cases,
and avoid the defensive copy if not necessary.
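
A rough sketch of how changes (1), (2) and (4a) fit together, for
anyone who does not want to read the whole webrev. Everything here is
an assumption made for illustration: FastPathSketch, its nested
ArrayDecoder interface and Latin1Decoder are stand-ins, not the actual
sun.nio.cs classes, and the method signature is a guess at the shape of
the fastpath rather than a copy of it.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of the webrev.
public class FastPathSketch {

    // Stand-in for the internal array-based interface (assumed shape).
    interface ArrayDecoder {
        int decode(byte[] src, int off, int len, char[] dst);
    }

    // A decoder that supports both the buffer API and the array fastpath.
    static final class Latin1Decoder extends CharsetDecoder implements ArrayDecoder {
        Latin1Decoder() {
            super(StandardCharsets.ISO_8859_1, 1.0f, 1.0f);
        }

        @Override
        protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
            while (in.hasRemaining()) {
                if (!out.hasRemaining()) return CoderResult.OVERFLOW;
                out.put((char) (in.get() & 0xff));
            }
            return CoderResult.UNDERFLOW;
        }

        @Override
        public int decode(byte[] src, int off, int len, char[] dst) {
            for (int i = 0; i < len; i++) {
                dst[i] = (char) (src[off + i] & 0xff);
            }
            return len;  // number of chars written
        }
    }

    // Dispatch: take the array fastpath when the decoder offers it,
    // otherwise fall back to wrapping the arrays in buffers.
    static String decode(CharsetDecoder cd, byte[] ba, int off, int len) {
        char[] ca = new char[(int) (len * (double) cd.maxCharsPerByte())];
        if (cd instanceof ArrayDecoder) {
            int n = ((ArrayDecoder) cd).decode(ba, off, len, ca);
            return new String(ca, 0, n);
        }
        cd.reset()
          .onMalformedInput(CodingErrorAction.REPLACE)
          .onUnmappableCharacter(CodingErrorAction.REPLACE);
        ByteBuffer bb = ByteBuffer.wrap(ba, off, len);
        CharBuffer cb = CharBuffer.wrap(ca);
        cd.decode(bb, cb, true);
        cd.flush(cb);
        return new String(ca, 0, cb.position());
    }
}
```

Decoders that do not implement the interface keep working through the
fallback branch, so the instanceof check only adds a fastpath and does
not change behavior for other charsets.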