Codereview request for 6653797: Reimplement JDK charset repository charsets.jar

Xueming Shen xueming.shen at oracle.com
Sun Jul 15 22:12:14 UTC 2012


Hi

This changeset includes the migration of our JIS0201/0208/0212 based
single/double-byte charsets to the new mapping based implementation. This
is the left-over of the effort we put into JDK7 to re-implement most of our
charsets in charsets.jar to (1) get better performance, (2) get a smaller
storage footprint and (3) ease the maintenance burden.

http://cr.openjdk.java.net/~sherman/6653797/webrev/

Notes on the implementation:

(1) jis0201/0208/0212 and their variants are now generated from the mapping
tables at build time (see the new .map, *.nr and *.c2b tables).

(2) EUC_JP/LINUX_OPEN, SJIS, PCK, ISO2022_JP and its variants now use these
new jis0201/0208/0212 charsets.

(3) Those in red (in the webrev) are the old implementation. Since no
charset uses them anymore, I removed them from the repository.

(4) There are two approaches for PCK and SJIS. PCK.java_v1 and SJIS.java_v1
follow the old implementation, which decodes/encodes based on the
jis0201/0208 (and variants) mappings via Ken's algorithm. This is known to
be slow and buggy (the algorithm does not take care of illegal sjis code
points, see #6653797 and
http://cr.openjdk.java.net/~sherman/6653797/Sjis2Jis.java).
So in this changeset I generated direct mapping tables for sjis and pck and
use the "general" DoubleByte base class for these two charsets. This results
in much faster decoding/encoding and correct mapping for all code points.
The downside of this approach is that it adds about 50K of uncompressed size
to the charsets.jar. But given that this change actually removes about 300K
from the rt.jar, we still get a net saving of about 250K, so I decided to go
with this approach for better performance.
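To illustrate why a direct mapping table handles illegal code points
naturally, here is a minimal sketch of a table-driven double-byte lookup.
It is not the actual DoubleByte code from the webrev: the class name
DirectTableSketch, its methods and the three-entry sample table are all
hypothetical, chosen only to show the idea. Any byte pair without a table
entry falls through to U+FFFD rather than being pushed through an
arithmetic conversion.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the JDK's DoubleByte implementation.
public class DirectTableSketch {
    static final char UNMAPPABLE = '\uFFFD';

    // One row per lead byte. A null row means no pair with that lead byte is mapped.
    private final char[][] b2c = new char[256][];

    // Registers a single (lead, trail) -> char mapping.
    void map(int lead, int trail, char c) {
        int l = lead & 0xff;
        if (b2c[l] == null) {
            b2c[l] = new char[256];
            Arrays.fill(b2c[l], UNMAPPABLE);
        }
        b2c[l][trail & 0xff] = c;
    }

    // Looks up a byte pair; unmapped or illegal pairs yield UNMAPPABLE.
    char decode(int lead, int trail) {
        char[] row = b2c[lead & 0xff];
        return row == null ? UNMAPPABLE : row[trail & 0xff];
    }

    // A tiny sample table with three Shift_JIS pairs (assumed for the demo).
    static DirectTableSketch sample() {
        DirectTableSketch t = new DirectTableSketch();
        t.map(0x81, 0x40, '\u3000'); // ideographic space
        t.map(0x82, 0xA0, '\u3042'); // hiragana A
        t.map(0x82, 0xA2, '\u3044'); // hiragana I
        return t;
    }

    public static void main(String[] args) {
        DirectTableSketch t = sample();
        System.out.println(Integer.toHexString(t.decode(0x82, 0xA0)));
        System.out.println(Integer.toHexString(t.decode(0x82, 0x7F)));
    }
}
```

The cost of this design is the table storage itself, which is the size
trade-off described in note (4).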

It appears to be lots of files (80+) in the webrev, but that number includes
the removed old implementation and the tests I put in to guarantee identical
de/encoding results from the old and new implementations (those OLD... test
cases); the change is actually not that big :-) So please help review. I can
then put this multi-year effort to rest.
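The old-versus-new comparison can be sketched as an exhaustive check over
all two-byte inputs and all non-surrogate BMP characters. This is only an
illustration of the idea, not the OLD... tests from the webrev: the class
OldNewCharsetCheck and its method names are hypothetical, and main() uses
two standard charsets as stand-ins, since the old implementation classes
are not available outside the test tree.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical harness; the real tests compare the OLD... classes instead.
public class OldNewCharsetCheck {

    // Counts two-byte inputs that the two charsets decode to different strings.
    static int decodeMismatches(Charset oldCs, Charset newCs) {
        int mismatches = 0;
        byte[] in = new byte[2];
        for (int b1 = 0; b1 < 256; b1++) {
            for (int b2 = 0; b2 < 256; b2++) {
                in[0] = (byte) b1;
                in[1] = (byte) b2;
                if (!new String(in, oldCs).equals(new String(in, newCs))) {
                    mismatches++;
                }
            }
        }
        return mismatches;
    }

    // Counts non-surrogate BMP characters that the two charsets encode differently.
    static int encodeMismatches(Charset oldCs, Charset newCs) {
        int mismatches = 0;
        for (int c = 0; c <= 0xFFFF; c++) {
            if (Character.isSurrogate((char) c)) {
                continue;
            }
            String s = String.valueOf((char) c);
            if (!Arrays.equals(s.getBytes(oldCs), s.getBytes(newCs))) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Stand-ins for an "old" and a "new" implementation of the same charset.
        Charset a = StandardCharsets.ISO_8859_1;
        Charset b = StandardCharsets.US_ASCII;
        System.out.println("decode mismatches: " + decodeMismatches(a, b));
        System.out.println("encode mismatches: " + encodeMismatches(a, b));
    }
}
```

For a faithful re-implementation both counts should be zero. Note that
String's constructor and getBytes() substitute replacement characters, so a
stricter check would use CharsetDecoder/CharsetEncoder with
CodingErrorAction.REPORT to compare error positions as well.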

-Sherman







