JDK-8072773 should support more Charsets

Philippe Marschall philippe.marschall at gmail.com
Wed Jul 8 08:08:55 UTC 2015


Hi

I was following the discussion of JDK-8072773 and having a look at the
implementation. I was a bit surprised that ISO-8859-15 [1] is not
supported. It's the default on Windows in Western Europe, and the only
difference from ISO-8859-1 is that eight rarely used characters are
replaced, most notably by the euro sign. But that got me thinking: the
only real requirement of the algorithm is that CR LF is encoded as the
bytes 13 10.
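Both claims about ISO-8859-15 can be checked directly. The sketch below decodes every byte value with ISO-8859-1 and ISO-8859-15 and reports where they disagree, and tests whether a charset encodes "\r\n" as exactly 13 10. The class and method names are hypothetical and only for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.stream.IntStream;

public class Latin9Diff {
    // Byte values 0..255 that decode to different characters in
    // ISO-8859-1 and ISO-8859-15.
    static int[] differingBytes() {
        Charset latin1 = StandardCharsets.ISO_8859_1;
        Charset latin9 = Charset.forName("ISO-8859-15");
        return IntStream.range(0, 256)
                .filter(b -> {
                    byte[] single = {(byte) b};
                    return !new String(single, latin1).equals(new String(single, latin9));
                })
                .toArray();
    }

    // True if the charset can encode and turns "\r\n" into exactly the bytes 13 10.
    static boolean encodesCrLfAsAscii(Charset charset) {
        return charset.canEncode()
                && Arrays.equals("\r\n".getBytes(charset), new byte[] {13, 10});
    }

    public static void main(String[] args) {
        Charset latin9 = Charset.forName("ISO-8859-15");
        for (int b : differingBytes()) {
            byte[] single = {(byte) b};
            // Print code points rather than glyphs to avoid console encoding issues.
            System.out.printf("0x%02X: U+%04X -> U+%04X%n", b,
                    (int) new String(single, StandardCharsets.ISO_8859_1).charAt(0),
                    (int) new String(single, latin9).charAt(0));
        }
        System.out.println("CR LF is 13 10 in ISO-8859-15: " + encodesCrLfAsAscii(latin9));
    }
}
```

All eight differences fall in the 0xA4 to 0xBE range, well away from the C0 control codes, so CR and LF are untouched.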
So I ran the following code, which produced the list after it (on 8u45
on Mac OS X). I don't really know whether you want to hard-code this
list. On the other hand, you likely also don't want this code in a
static initializer, or to preload all these charsets: some of them have
quite large encoding tables, which adds up to about 5 MB of heap. So
maybe just take the Windows and ISO ones? Or is that just a Western
perspective?

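        import java.nio.charset.Charset;
        import java.util.Arrays;

        public class CrLfCharsets {
            public static void main(String[] args) {
                String crlfString = "\r\n";
                byte[] crlfBytes = {13, 10};
                for (Charset charset : Charset.availableCharsets().values()) {
                    // Decode-only charsets such as ISO-2022-CN and x-JISAutoDetect
                    // cannot encode; getBytes would throw UnsupportedOperationException.
                    if (!charset.canEncode()) {
                        continue;
                    }
                    if (Arrays.equals(crlfString.getBytes(charset), crlfBytes)) {
                        System.out.println(charset.name());
                    }
                }
            }
        }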

Big5
Big5-HKSCS
CESU-8
EUC-JP
EUC-KR
GB18030
GB2312
GBK
IBM00858
IBM437
IBM775
IBM850
IBM852
IBM855
IBM857
IBM860
IBM861
IBM862
IBM863
IBM864
IBM865
IBM866
IBM868
IBM869
ISO-2022-JP
ISO-2022-JP-2
ISO-2022-KR
ISO-8859-1
ISO-8859-13
ISO-8859-15
ISO-8859-2
ISO-8859-3
ISO-8859-4
ISO-8859-5
ISO-8859-6
ISO-8859-7
ISO-8859-8
ISO-8859-9
JIS_X0201
KOI8-R
KOI8-U
Shift_JIS
TIS-620
US-ASCII
UTF-8
windows-1250
windows-1251
windows-1252
windows-1253
windows-1254
windows-1255
windows-1256
windows-1257
windows-1258
windows-31j
x-Big5-HKSCS-2001
x-Big5-Solaris
x-euc-jp-linux
x-EUC-TW
x-eucJP-Open
x-IBM1006
x-IBM1046
x-IBM1098
x-IBM1124
x-IBM1381
x-IBM1383
x-IBM33722
x-IBM737
x-IBM856
x-IBM874
x-IBM921
x-IBM922
x-IBM942
x-IBM942C
x-IBM943
x-IBM943C
x-IBM948
x-IBM949
x-IBM949C
x-IBM950
x-IBM964
x-IBM970
x-ISCII91
x-ISO-2022-CN-CNS
x-ISO-2022-CN-GB
x-iso-8859-11
x-Johab
x-MacArabic
x-MacCentralEurope
x-MacCroatian
x-MacCyrillic
x-MacDingbat
x-MacGreek
x-MacHebrew
x-MacIceland
x-MacRoman
x-MacRomania
x-MacSymbol
x-MacThai
x-MacTurkish
x-MacUkraine
x-MS932_0213
x-MS950-HKSCS
x-MS950-HKSCS-XP
x-mswin-936
x-PCK
x-SJIS_0213
x-windows-50220
x-windows-50221
x-windows-874
x-windows-949
x-windows-950
x-windows-iso2022jp
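One possible way to avoid both hard-coding the list above and computing it eagerly in a static initializer is to test each charset lazily, the first time it is actually used, and cache the answer. This is only a sketch under that assumption; the class and method names are hypothetical and not part of the JDK-8072773 change.

```java
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: decides per charset, on first use, whether "\r\n"
// encodes to the bytes 13 10, and caches the result by charset name.
public final class CrLfCompatibility {
    private static final byte[] CRLF_BYTES = {13, 10};
    // Starts empty; only charsets that callers actually pass in get checked.
    private static final Map<String, Boolean> CACHE = new ConcurrentHashMap<>();

    private CrLfCompatibility() {}

    public static boolean isCompatible(Charset charset) {
        return CACHE.computeIfAbsent(charset.name(), name -> check(charset));
    }

    private static boolean check(Charset charset) {
        // Decode-only charsets (e.g. ISO-2022-CN, x-JISAutoDetect) cannot encode at all.
        if (!charset.canEncode()) {
            return false;
        }
        return Arrays.equals("\r\n".getBytes(charset), CRLF_BYTES);
    }

    public static void main(String[] args) {
        for (String name : new String[] {"ISO-8859-15", "UTF-8", "UTF-16"}) {
            System.out.println(name + ": " + isCompatible(Charset.forName(name)));
        }
    }
}
```

Running `main` should report true for ISO-8859-15 and UTF-8 and false for UTF-16, whose encoder emits a byte order mark and two bytes per character. The cost is one encode of a two-character string per distinct charset, and no charset tables are loaded unless someone actually uses that charset.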

 [1] https://en.wikipedia.org/wiki/ISO/IEC_8859-15

Cheers
Philippe


More information about the nio-dev mailing list