Java 8 RFR 8011194: Apps launched via double-clicked .jars have file.encoding value of US-ASCII on Mac OS X

David DeHaven david.dehaven at oracle.com
Tue Jul 30 23:06:42 UTC 2013


I was about to chime in that UTF-8 has been the preferred encoding for (stored) text on Mac OS X as long as I've been hacking on it (think "Rhapsody"), so why is this even an issue?

Judging from the docs, nl_langinfo seems like a Unix portability function (something more at home with ASCII in a terminal), not something a native Cocoa application should rely on.


<vote>
Set it to UTF-8 and forget about it
</vote>
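
For what it's worth, here is a minimal sketch of the fallback I have in mind. It is not the actual JDK patch: the class and method names are made up for illustration, and I am assuming that "LANG, LC_ALL and LC_CTYPE are all unset" is a good enough test for "the user never chose an encoding". An explicit user setting is left alone, which should cover the "what if someone intends US-ASCII" case.

```java
import java.util.Map;

// Hypothetical helper for illustration only; this is NOT code from the JDK.
final class DefaultEncodingSketch {

    // Returns the encoding to use as the default.
    //   osName          value of the os.name system property
    //   reportedCodeset what the platform lookup (e.g. nl_langinfo(CODESET)) reported
    //   env             the process environment
    static String chooseEncoding(String osName, String reportedCodeset,
                                 Map<String, String> env) {
        boolean isMac = osName != null && osName.startsWith("Mac");

        // Assumption: if none of these is set, the user expressed no preference.
        boolean localeInEnv = env.containsKey("LANG")
                || env.containsKey("LC_ALL")
                || env.containsKey("LC_CTYPE");

        // Only override the "nothing was configured" answer on the Mac.
        if (isMac && !localeInEnv && "US-ASCII".equals(reportedCodeset)) {
            return "UTF-8";
        }
        // Otherwise trust whatever the OS or the user told us.
        return reportedCodeset;
    }

    public static void main(String[] args) {
        String chosen = chooseEncoding(
                System.getProperty("os.name"),
                System.getProperty("file.encoding"),
                System.getenv());
        System.out.println("default encoding would be: " + chosen);
    }
}
```

With LANG unset and US-ASCII reported on the Mac, this picks UTF-8. If the user exported a locale themselves, or we are on another platform, the reported value passes through untouched.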

-DrD-

> Apple is highly unlikely to change the behavior of nl_langinfo().
> 
> There is already code in the JDK that calls into JRSCopyPrimaryLanguage(), JRSCopyCanonicalLanguageForPrimaryLanguage(), and JRSSetDefaultLocalization() for exactly this purpose.
> 
> Please proceed with setting the encoding to UTF-8. It is the de facto standard for every Cocoa application I have ever seen. US-ASCII is always the wrong choice for a graphical app on OS X.
> 
> Regards,
> Mike Swingler
> Apple Inc.
> 
> On Jul 30, 2013, at 9:05 AM, Francis Devereux <francis at devrx.org> wrote:
> 
>> I suspect that Apple might be unlikely to change the value that nl_langinfo returns when LANG is unset.
>> 
>> However, it might be possible to fix this issue without second-guessing the character set reported by the OS by calling [NSLocale currentLocale] (or the CFLocale equivalent) instead of nl_langinfo. I think (although I haven't checked) that [NSLocale currentLocale] determines the current locale using a mechanism other than environment variables, because LANG is usually unset for GUI apps on OS X.
>> 
>> On 30 Jul 2013, at 15:56, Scott Palmer <swpalmer at gmail.com> wrote:
>> 
>>> Then shouldn't you be complaining to Apple that the value returned by
>>> nl_langinfo needs to be changed?
>>> David's point seems to be that second guessing the character set reported
>>> by the OS is likely to cause a different set of problems.
>>> 
>>> Scott
>>> 
>>> 
>>> On Tue, Jul 30, 2013 at 10:14 AM, Johannes Schindelin <Johannes.Schindelin at gmx.de> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> On Tue, 30 Jul 2013, David Holmes wrote:
>>>> 
>>>>> On 30/07/2013 5:54 AM, Brent Christian wrote:
>>>>>> On 7/28/13 10:13 PM, David Holmes wrote:
>>>>>>> On 27/07/2013 3:53 AM, Brent Christian wrote:
>>>>>>>> Please review my fix for 8011194 : "Apps launched via
>>>>>>>> double-clicked .jars have file.encoding value of US-ASCII on
>>>>>>>> Mac OS X"
>>>>>>>> 
>>>>>>>> http://bugs.sun.com/view_bug.do?bug_id=8011194
>>>>>>>> 
>>>>>>>> In most cases of launching a Java app on Mac (from the cmdline, or
>>>>>>>> from a native .app bundle), reading and displaying UTF-8
>>>>>>>> characters beyond the standard ASCII range works fine.
>>>>>>>> 
>>>>>>>> A notable exception is the launching of an app by double-clicking
>>>>>>>> a .jar file.  In this case, file.encoding defaults to US-ASCII,
>>>>>>>> and characters outside of the ASCII range show up as garbage.
>>>>>>> 
>>>>>>> Why does this occur? What sets the encoding to US-ASCII?
>>>>>> 
>>>>>> "US-ASCII" is the answer we get from nl_langinfo(CODESET) because no
>>>>>> values for LANG/LC* are set in the environment when double-clicking a
>>>>>> .jar.
>>>>>> 
>>>>>> We get "UTF-8" when launching from the command line because the
>>>>>> default Terminal.app setup on Mac will set LANG for you (to
>>>>>> "en_US.UTF-8" in the US).
>>>>> 
>>>>> Sounds like a user environment error to me. This isn't my area but I'm
>>>>> not convinced we should be second guessing what we think the encoding
>>>>> should be.
>>>> 
>>>> Except that that is not the case here, of course. The user did *not* set
>>>> any environment variable in this case.
>>>> 
>>>> So we are not talking about "second guessing" or "user environment error"
>>>> but about a sensible default.
>>>> 
>>>> As to US-ASCII, sorry to say: the seventies called and want their
>>>> character set back.
>>>> 
>>>> There can be no question that UTF-8 is the best default character
>>>> encoding, or are you even going to question *that*?
>>>> 
>>>>> What if someone intends for it to be US-ASCII?
>>>> 
>>>> Then LANG would not be unset, would it.
>>>> 
>>>> Hth,
>>>> Johannes
>>>> 
>>> 
>> 
>