Java 8 RFR 8011194: Apps launched via double-clicked .jars have file.encoding value of US-ASCII on Mac OS X

David Holmes david.holmes at oracle.com
Wed Jul 31 03:12:38 UTC 2013


If Mike endorses this approach then I step aside.

My point, as Scott noted, is that we get this info from the OS. So 
either the OS is at fault or we're asking the wrong question. Based on 
Mike's response we're asking the wrong question - but if the end result 
will be UTF-8 regardless then fine.

David

On 31/07/2013 8:53 AM, Mike Swingler wrote:
> Apple is highly unlikely to change the behavior of nl_langinfo().
>
> There is already code in the JDK that calls into JRSCopyPrimaryLanguage(), JRSCopyCanonicalLanguageForPrimaryLanguage(), and JRSSetDefaultLocalization() for exactly this purpose.
>
> Please proceed with setting the encoding to UTF-8. It is the de-facto standard for every Cocoa application I have ever seen. US-ASCII is always the wrong choice for a graphical app on OS X.
>
> Regards,
> Mike Swingler
> Apple Inc.
>
> On Jul 30, 2013, at 9:05 AM, Francis Devereux <francis at devrx.org> wrote:
>
>> I suspect that Apple might be unlikely to change the value that nl_langinfo returns when LANG is unset.
>>
>> However, it might be possible to fix this issue without second-guessing the character set reported by the OS by calling [NSLocale currentLocale] (or the CFLocale equivalent) instead of nl_langinfo. I think (although I haven't checked) that [NSLocale currentLocale] determines the current locale using a mechanism other than environment variables, because LANG is usually unset for GUI apps on OS X.
>>
>> On 30 Jul 2013, at 15:56, Scott Palmer <swpalmer at gmail.com> wrote:
>>
>>> Then shouldn't you be complaining to Apple that the value returned by
>>> nl_langinfo needs to be changed?
>>> David's point seems to be that second guessing the character set reported
>>> by the OS is likely to cause a different set of problems.
>>>
>>> Scott
>>>
>>>
>>> On Tue, Jul 30, 2013 at 10:14 AM, Johannes Schindelin <
>>> Johannes.Schindelin at gmx.de> wrote:
>>>
>>>> Hi,
>>>>
>>>> On Tue, 30 Jul 2013, David Holmes wrote:
>>>>
>>>>> On 30/07/2013 5:54 AM, Brent Christian wrote:
>>>>>> On 7/28/13 10:13 PM, David Holmes wrote:
>>>>>>> On 27/07/2013 3:53 AM, Brent Christian wrote:
>>>>>>>> Please review my fix for 8011194: "Apps launched via double-clicked
>>>>>>>> .jars have file.encoding value of US-ASCII on Mac OS X"
>>>>>>>>
>>>>>>>> http://bugs.sun.com/view_bug.do?bug_id=8011194
>>>>>>>>
>>>>>>>> In most cases of launching a Java app on Mac (from the cmdline, or
>>>>>>>> from a native .app bundle), reading and displaying UTF-8
>>>>>>>> characters beyond the standard ASCII range works fine.
>>>>>>>>
>>>>>>>> A notable exception is the launching of an app by double-clicking
>>>>>>>> a .jar file.  In this case, file.encoding defaults to US-ASCII,
>>>>>>>> and characters outside of the ASCII range show up as garbage.
>>>>>>>
>>>>>>> Why does this occur? What sets the encoding to US-ASCII?
>>>>>>
>>>>>> "US-ASCII" is the answer we get from nl_langinfo(CODESET) because no
>>>>>> values for LANG/LC* are set in the environment when double-clicking a
>>>>>> .jar.
>>>>>>
>>>>>> We get "UTF-8" when launching from the command line because the
>>>>>> default Terminal.app setup on Mac will set up LANG for you (to
>>>>>> "en_US.UTF-8" in the US).
>>>>>
>>>>> Sounds like a user environment error to me. This isn't my area but I'm
>>>>> not convinced we should be second guessing what we think the encoding
>>>>> should be.
>>>>
>>>> Except that that is not the case here, of course. The user did *not* set
>>>> any environment variable in this case.
>>>>
>>>> So we are not talking about "second guessing" or "user environment error"
>>>> but about a sensible default.
>>>>
>>>> As to US-ASCII, sorry to say: the seventies called and want their
>>>> character set back.
>>>>
>>>> There can be no question that UTF-8 is the best default character
>>>> encoding, or are you even going to question *that*?
>>>>
>>>>> What if someone intends for it to be US-ASCII?
>>>>
>>>> Then LANG would not be unset, would it.
>>>>
>>>> Hth,
>>>> Johannes
>>>>
>>>
>>
>
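
The default discussed in the thread above can be sketched in Java as follows. This is only an illustration of the decision being debated, not the actual JDK change: the class name DefaultEncodingSketch and the helpers localeEnvUnset and chooseEncoding are hypothetical, and the rule it encodes (fall back to UTF-8 only on Mac OS X, only when the OS reports US-ASCII, and only when none of LC_ALL, LC_CTYPE, or LANG is set) is an assumption drawn from the messages, not from the reviewed patch.

```java
import java.nio.charset.Charset;
import java.util.Locale;
import java.util.Map;

// Illustrative sketch only; the class and method names are hypothetical, not JDK code.
final class DefaultEncodingSketch {

    // Environment variables that determine the locale seen by nl_langinfo(CODESET).
    private static final String[] LOCALE_VARS = {"LC_ALL", "LC_CTYPE", "LANG"};

    // True when none of the locale variables has a non-empty value,
    // which is the situation described for a double-clicked .jar.
    static boolean localeEnvUnset(Map<String, String> env) {
        for (String name : LOCALE_VARS) {
            String value = env.get(name);
            if (value != null && !value.isEmpty()) {
                return false;
            }
        }
        return true;
    }

    // Assumed rule: override the reported codeset with UTF-8 only on Mac OS X,
    // only when the report is US-ASCII, and only when the user set no locale.
    // In every other case the OS-reported value is returned unchanged.
    static String chooseEncoding(String osName, Map<String, String> env, String reportedCodeset) {
        boolean isMac = osName.toLowerCase(Locale.ROOT).contains("mac");
        boolean isAscii = "US-ASCII".equalsIgnoreCase(reportedCodeset);
        if (isMac && isAscii && localeEnvUnset(env)) {
            return "UTF-8";
        }
        return reportedCodeset;
    }

    public static void main(String[] args) {
        // Show what the running JVM actually picked, then what the sketch would pick.
        System.out.println("file.encoding  = " + System.getProperty("file.encoding"));
        System.out.println("defaultCharset = " + Charset.defaultCharset());
        System.out.println("sketch result  = " + chooseEncoding(
                System.getProperty("os.name"), System.getenv(), Charset.defaultCharset().name()));
    }
}
```

Under this sketch, a user who explicitly sets LANG (for example to "C") keeps whatever codeset the OS reports, which corresponds to the point that an intentional US-ASCII choice would show up as a set variable.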



More information about the core-libs-dev mailing list