Codereview needed for HKSCS2008 support in JDK7
Xueming Shen
Xueming.Shen at Sun.COM
Fri Feb 26 00:10:25 UTC 2010
Alan, Martin, Masayoshi and anyone interested, please help comment and
review.
Bugs/RFEs:
6911753: NSN wants to add Big5 HKSCS-2004 support
6902790: Converting/displaying HKSCS characters issue on Vista and Windows7
6218752: Update HKSCS and GB18030 converters for Unicode 4.1
Webrev:
http://cr.openjdk.java.net/~sherman/hkscs2008/webrev/
http://cr.openjdk.java.net/~sherman/hkscs2008/test/webrev/
Background Info:
HK gov http://www.ogcio.gov.hk/ccli/eng/hkscs/introduction.html
Microsoft http://www.microsoft.com/hk/HKSCS/
Wiki http://en.wikipedia.org/wiki/HKSCS
HKSCS Versions:
0. HKSCS-2008
The HKSCS-2008 is an updated version of the Hong Kong Supplementary
Character Set-2004 (HKSCS-2004) published in May 2005. It includes
5,009 characters of which 68 are newly added. The HKSCS-2008 is aligned
technically with the ISO/IEC 10646:2003 and its Amendments 1 to 6
published in October 2009 by the International Organization for
Standardization (ISO).
1. HKSCS-2004
denotes the 123 characters that are newly included in the HKSCS-2004.
2. HKSCS-2001
denotes the 116 characters that are newly included in the HKSCS-2001.
3. HKSCS-1999
denotes characters that are included since the first version of the
HKSCS that was released in 1999, which contains 4,702 characters.
* HKSCS-2004 and later use Unicode 4.1 code-point/mapping.
JDK currently has two versions of the HKSCS charset in its charset repository.
1. Big5_HKSCS is built on HKSCS-2001. It is used as the default charset for
the Solaris zh_HK locale (there is no indication that Solaris will move
to a newer version anytime soon).
2. MS950_HKSCS is built on a mixed HKSCS-2001/1999 version.
Windows XP claims it is based on 2001, but its mapping table suggests it is
actually a pre-2001 version that does not use supplementary characters at
all. Our implementation matches what XP has.
Vista (and later) has now "moved on" to HKSCS-2004; it has native support
(in Unicode only) for HKSCS-2004 in its zh_HK locale. This is where the
upgrade requests come from.
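A quick way to see which of these charsets a given runtime actually ships is
to probe by name. This is only a sketch: the names "Big5-HKSCS" and
"x-MS950-HKSCS" are assumed to be the names under which Big5_HKSCS and
MS950_HKSCS are registered (they are not stated above), and the class
HkscsCharsetProbe is made up for illustration.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

class HkscsCharsetProbe {
    // Maps each candidate name to whether this runtime supports it.
    // Illegal names are reported as unsupported instead of throwing.
    static Map<String, Boolean> probe(String... names) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String name : names) {
            boolean supported;
            try {
                supported = Charset.isSupported(name);
            } catch (IllegalArgumentException e) {
                // IllegalCharsetNameException is a subclass of this.
                supported = false;
            }
            result.put(name, supported);
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed registration names; adjust to what your JDK build reports
        // in Charset.availableCharsets().
        System.out.println(probe("Big5-HKSCS", "x-MS950-HKSCS"));
    }
}
```

The result depends on the JDK build and on whether the extended charsets are
installed, so no particular output should be expected.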
Solution:
(1) Support HKSCS-2008 in JDK7 (instead of the requested HKSCS-2004).
The good thing about HKSCS-2008 is that it only adds 68 NEW characters to
the 2004 version, so it does not have any compatibility issue: you only get
68 more mappings, which would be "unmappable" in 2004. I don't see any
reason to do HKSCS-2004 when the latest version is out already. (HK
promised this would be the "last" version of HKSCS done in Big5 encoding;
from now on, they will ONLY add new characters with Unicode code points.)
-Big5_HKSCS and MS950_HKSCS charsets are now based on HKSCS-2008.
-MS950_HKSCS is going to be the default charset for the zh_HK locale on
Vista and beyond.
(2) Charset Big5_HKSCS_2001 is HKSCS-2001 based (it has exactly the same
mapping table as the current Big5_HKSCS, which is a 2001-based
implementation). This is going to be the default "hkscs" charset for the
Solaris zh_HK locale (as explained above, Solaris has no plan to upgrade
for now).
(3) Charset MS950_HKSCS_XP is the mixed 2001/1999-based hkscs for Windows
XP; it has the same mapping as the current MS950_HKSCS. This charset is
going to be the default "hkscs" charset for Windows XP.
(4) We also have sun.io.ByteToChar/CharToByteBig5/MS950_HKSCS.
So to make life easy,
-removed CharToByte/ByteToCharHKSCS/HKSCS_2001
-CharToByte/ByteToCharBig5/MS950_HKSCS are now based on HKSCS-2008
(a big bonus for sun.io.c2b/b2c users :-) )
(5) Update the b2c/c2b mapping at sun/nio/cs/mapping to correspond to the
changes in the charsets.
(6) Make the corresponding changes in the font.property configuration files.
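Item (1) says the 68 characters added in HKSCS-2008 would be "unmappable"
under the 2004 tables. One way to check whether a particular code point is
mappable in a given charset is CharsetEncoder.canEncode. The sketch below is
an illustration only: the class name MappableCheck is hypothetical, and the
two sample code points are arbitrary (one BMP, one supplementary), not taken
from the HKSCS tables.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

class MappableCheck {
    // True if the named charset exists on this runtime and has a mapping
    // for the given Unicode code point.
    static boolean canEncode(String charsetName, int codePoint) {
        if (!Charset.isSupported(charsetName)) {
            return false;
        }
        Charset cs = Charset.forName(charsetName);
        if (!cs.canEncode()) {
            return false; // decode-only charset
        }
        CharsetEncoder enc = cs.newEncoder();
        // A supplementary code point is a surrogate pair in UTF-16,
        // so test the String form rather than a single char.
        return enc.canEncode(new String(Character.toChars(codePoint)));
    }

    public static void main(String[] args) {
        int bmpSample = 0x4E00;            // arbitrary BMP ideograph
        int supplementarySample = 0x20000; // arbitrary supplementary code point
        // "Big5-HKSCS" is an assumed charset name; see the lead-in.
        for (String name : new String[] {"Big5-HKSCS", "UTF-8"}) {
            System.out.println(name + ": "
                + canEncode(name, bmpSample) + " "
                + canEncode(name, supplementarySample));
        }
    }
}
```

Running the same check against an old and a new JDK build would show which
code points gained a mapping with the upgrade.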
-----------------------------------------------------------------------------------------------------------
The changes below are not directly HKSCS related, but since the HKSCS
charsets are built on top of the Big5 charset, I included them in this
change as well. (These changes are at the bottom of the webrev page. I also
have a separate webrev for them per Martin's request; it's here:
http://cr.openjdk.java.net/~sherman/big5/webrev)
(7) Migrate the Big5 charset to the "new" mapping-based implementation
(generate the source from the Big5.map/nr mapping table during build time).
(8) Adjusted Big5_Solaris to use the new Big5 charset, building the
Big5_Solaris tables on top of the Big5 tables (which should make the coding
faster, at the price of a "little" extra runtime memory to hold its own
tables).
(9) House-cleaning in make/tools/src/build/tools/charsetmapping (rename,
move some pieces around).
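To illustrate the idea behind (7) and (8), here is a rough sketch of reading
a mapping table into a byte-to-char lookup and then building a derived table
on top of it. Everything here is an assumption made for illustration: the
line format ("0xA140 0x3000", with '#' comments), the class MappingTables,
and the override entry are hypothetical and are not the actual build tool or
the actual Big5_Solaris differences.

```java
import java.util.HashMap;
import java.util.Map;

class MappingTables {
    // Parses lines of the assumed form "<byte value> <code point>", both in
    // 0x-prefixed hex. Blank lines and lines starting with '#' are skipped.
    static Map<Integer, Integer> parse(String text) {
        Map<Integer, Integer> b2c = new HashMap<>();
        for (String line : text.split("\n")) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] fields = line.split("\\s+");
            b2c.put(Integer.decode(fields[0]), Integer.decode(fields[1]));
        }
        return b2c;
    }

    // Builds the derived charset's own table: a copy of the base table with
    // the overrides applied. Lookups then need a single probe (faster), at
    // the cost of the memory for the copy. The base table is left untouched.
    static Map<Integer, Integer> layer(Map<Integer, Integer> base,
                                       Map<Integer, Integer> overrides) {
        Map<Integer, Integer> derived = new HashMap<>(base);
        derived.putAll(overrides);
        return derived;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> base =
            parse("# sample entries\n0xA140 0x3000\n0xA141 0xFF0C\n");
        // Hypothetical override, only to show the layering.
        Map<Integer, Integer> derived = layer(base, parse("0xA141 0x0041\n"));
        System.out.println(Integer.toHexString(derived.get(0xA140)));
        System.out.println(Integer.toHexString(derived.get(0xA141)));
    }
}
```

The real implementation generates Java source with the tables baked in at
build time; the map-based version above only mirrors the data flow.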
So now the change is about 50+ files:-)
Sherman