RFR: 8132459: ExceptionInInitializerError from 'java -version' on Linux under zh_CN.GB18030 locale

Thu Jul 30 05:21:48 UTC 2015

On 7/29/15 9:53 PM, David Holmes wrote:
> Hi Sherman,
>
> On 30/07/2015 1:54 PM, Xueming Shen wrote:
>> Here is the webrev to add those "missing" charsets. The assumption back
>> then was that the linux platform has successfully migrated to the 
>> "utf-8 default" world.
>
> This process seems somewhat ad-hoc, what are we using to determine 
> which charsets are "core" and which are not?

Hi David,

Each platform has a list of supported locales/encodings. All of these
encodings/charsets need to be in the base module for that particular
platform, so that the JVM can start (in a particular locale/encoding)
under the module system. The charsets in our repository can be
categorized into different groups: solaris/linux specific, windows
specific, IBM specific, and a couple that are shared by different
platforms. The idea here is to build all those platform-specific
charsets into the base module for that platform.
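
As a rough illustration only (this is not part of the webrev), the
sketch below asks a running JDK which of the encodings from Jonathan's
list it can resolve. The class name StartupCharsetCheck and the mapping
from locale codeset suffixes (iso88596, tis620, ...) to Java charset
names are assumptions made for the example. It has to be run under a
locale in which the JVM does start (for example LC_ALL=C), and it cannot
tell whether a charset comes from java.base or from jdk.charsets, only
whether it is resolvable at all.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StartupCharsetCheck {

    // True if the running JDK can resolve the given charset name or alias.
    static boolean isAvailable(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            return false; // the name is not even syntactically legal
        }
    }

    // Checks each name in order and records whether it is resolvable.
    static Map<String, Boolean> check(List<String> names) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String name : names) {
            result.put(name, isAvailable(name));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("default charset: " + Charset.defaultCharset().name());

        // Assumed Java names for the codesets seen in the failing locales.
        List<String> names = List.of(
                "ISO-8859-6", "ISO-8859-8", "ISO-8859-3",
                "TIS-620", "windows-1255", "GB18030", "EUC-TW");

        check(names).forEach((name, ok) ->
                System.out.println(name + " -> " + (ok ? "available" : "missing")));
    }
}
```

A charset reported as "missing" here would certainly not be usable as
the startup encoding; one reported as "available" may still live only
in jdk.charsets, which is the case this thread is about.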

Using the native platform de/encoding implementation/mechanism, for
example iconv, for startup only might be a better solution. Until we
have that (and the consensus that we go with that approach), the
current approach appears to be the only reasonable and simple choice.

Thanks,
-Sherman

>
> Thanks,
> David
>
>>
>> http://cr.openjdk.java.net/~sherman/8132459/
>>
>> thanks,
>> Sherman
>>
>> On 7/28/15 8:22 PM, Jonathan Lu wrote:
>>> Hi Alan, Sherman,
>>>
>>> Thanks for taking a look!
>>>
>>> I understand and totally agree with improving module separation.
>>> Another quick test was just done on my Linux box for all available
>>> locales, and it found several more that cause
>>> ExceptionInInitializerError on JDK9 but work with JDK8.
>>>
>>> ar_AE
>>> ar_AE.iso88596
>>> ar_BH
>>> ar_BH.iso88596
>>> ar_DZ
>>> ar_DZ.iso88596
>>> ar_EG
>>> ar_EG.iso88596
>>> ar_IQ
>>> ar_IQ.iso88596
>>> ar_JO
>>> ar_JO.iso88596
>>> ar_KW
>>> ar_KW.iso88596
>>> ar_LB
>>> ar_LB.iso88596
>>> ar_LY
>>> ar_LY.iso88596
>>> ar_MA
>>> ar_MA.iso88596
>>> ar_OM
>>> ar_OM.iso88596
>>> ar_QA
>>> ar_QA.iso88596
>>> ar_SA
>>> ar_SA.iso88596
>>> ar_SD
>>> ar_SD.iso88596
>>> ar_SY
>>> ar_SY.iso88596
>>> ar_TN
>>> ar_TN.iso88596
>>> ar_YE
>>> ar_YE.iso88596
>>> hebrew
>>> he_IL
>>> he_IL.iso88598
>>> iw_IL
>>> iw_IL.iso88598
>>> mt_MT
>>> mt_MT.iso88593
>>> thai
>>> th_TH
>>> th_TH.tis620
>>> yi_US
>>> yi_US.cp1255
>>> zh_CN.gb18030
>>> zh_TW.euctw
>>>
>>> @Sherman, so the other locales (those besides gb18030, such as euctw)
>>> are all supposed to wait for the general solution, right?
>>> From my reading of the scripts, it seems to be an
>>> implementation-specific decision.
>>>
>>> Thanks
>>> Jonathan
>>>
>>>
>>>> On Jul 29, 2015, at 4:56 AM, Xueming Shen <xueming.shen at oracle.com>
>>>> wrote:
>>>>
>>>> yes, gb18030 needs to be in linux/unix std-solaris/unix as well.
>>>>
>>>> -sherman
>>>>
>>>> On 07/28/2015 09:51 AM, Volker Simonis wrote:
>>>>> Hi Jonathan, Alan,
>>>>>
>>>>> this is a known problem and we've already discussed it intensively.
>>>>>
>>>>> Please have a look at:
>>>>>
>>>>> 8081674: EmptyStackException at startup if running with extended or
>>>>> unsupported charset
>>>>> https://bugs.openjdk.java.net/browse/JDK-8081674
>>>>>
>>>>> and:
>>>>>
>>>>> 8087161: Fails to start up initialize system class loader running on
>>>>> unsupported charset
>>>>> https://bugs.openjdk.java.net/browse/JDK-8087161
>>>>>
>>>>> 8081674 has a long discussion and also a detailed description of how
>>>>> this can be reproduced.
>>>>> @Jonathan: the problem with your test case is that it is not enough
>>>>> to only set the appropriate locale; you also have to make sure that
>>>>> the locale is installed (see bug discussion for more details).
>>>>> 8081674 ultimately fixed only part of the problem and left the rest
>>>>> for 8087161.
>>>>>
>>>>> The mailing list thread about this issue can be found here:
>>>>>
>>>>> http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-June/thread.html#33879 
>>>>>
>>>>>
>>>>>
>>>>> As your bug is an exact copy of 8087161 I've closed it as duplicate.
>>>>>
>>>>> Regards,
>>>>> Volker
>>>>>
>>>>>
>>>>> On Tue, Jul 28, 2015 at 3:48 PM, Alan Bateman
>>>>> <Alan.Bateman at oracle.com> wrote:
>>>>>> On 28/07/2015 10:50, 陆传胜(传胜) wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>>
>>>>>>> The issue was found on one of my Linux boxes, which uses locale
>>>>>>> zh_CN.GB18030 by default. A simple patch was made to fix it; may I
>>>>>>> have it reviewed?
>>>>>>>
>>>>>>>
>>>>>>> webrev: http://cr.openjdk.java.net/~luchsh/webrev-8132459/
>>>>>>>
>>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8132459
>>>>>>>
>>>>>> I hope Sherman will have time to look at this and say whether
>>>>>> GB18030 is supposed to be in java.base and so be listed in
>>>>>> jdk/make/data/charsetmapping/stdcs-linux.
>>>>>>
>>>>>> My concern with this change is that it's bringing back code that was
>>>>>> deliberately removed as part of JDK-8038310. We want clean
>>>>>> separation of the java.base and jdk.charsets modules, so that
>>>>>> charsets that are needed for startup in supported locales are in
>>>>>> java.base. Anything that is not needed in java.base goes to
>>>>>> jdk.charsets and is loaded via the extended charset provider.
>>>>>>
>>>>>> -Alan.
>>