Java 8 RFR 8011194: Apps launched via double-clicked .jars have file.encoding value of US-ASCII on Mac OS X

Brent Christian brent.christian at oracle.com
Wed Jul 31 20:43:46 UTC 2013


On 7/30/13 4:06 PM, David DeHaven wrote:
>
> Judging from the docs, nl_langinfo seems like a Unix portability
> function (something more likely to be happier with ASCII in a
> terminal), not something to be used by a native Cocoa application.

Exactly - so I think it expects to be called from the command line, with a
shell-style surrounding environment and the LANG/LC* variables set.

David suggests that calling nl_langinfo() is "asking the wrong 
question."  In the particular context of double-click launching on Mac, 
you could say that's true (or at least asking the question in the wrong 
way).

But consider - the code in question is shared with other Unix platforms, 
and when running from the cmdline/shell scripts/etc, nl_langinfo() *is* 
the right way to ask the question.
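
For anyone following along, here is a minimal sketch in C of what that
command-line query looks like. It is an illustration only, not the JDK
source, and the helper name query_codeset is hypothetical. It adopts the
locale from the environment and then asks nl_langinfo() for the codeset name.

```c
#include <langinfo.h>
#include <locale.h>

/* Hypothetical helper: returns the codeset name of the environment's locale. */
const char *query_codeset(void)
{
    /* Adopt LANG/LC_* from the environment; if this fails, the "C"
       locale simply stays in effect. */
    setlocale(LC_ALL, "");

    /* CODESET names the character encoding of the current locale. */
    return nl_langinfo(CODESET);
}
```

As reported in this thread, on Mac this kind of query gives "UTF-8" in a
default Terminal.app session and "US-ASCII" when no LANG/LC* values are set.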

Asking the right question for this specific context on Mac OS X (via
NSLocale or CFLocale) would, I suspect, involve a fair amount of code
surgery, and the end result would be the same.  Given this, I think my
proposed change is a good one from a practical standpoint.
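
A minimal sketch in C of the kind of fallback under discussion. This is not
the actual patch: the function names are hypothetical, and the set of
variables checked (LC_ALL, LC_CTYPE, LANG) is an assumption. The idea is to
override US-ASCII only when the environment gave no locale hints at all.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical: nonzero if any of the usual locale variables is set
   and non-empty. The list of variables is an assumption. */
int locale_env_is_set(void)
{
    const char *names[] = { "LC_ALL", "LC_CTYPE", "LANG" };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        const char *value = getenv(names[i]);
        if (value != NULL && value[0] != '\0')
            return 1;
    }
    return 0;
}

/* Hypothetical: choose the default encoding given the codeset reported
   by nl_langinfo(CODESET) and whether the environment set a locale. */
const char *choose_default_encoding(const char *reported, int env_is_set)
{
    /* Assumption: treat a missing or empty answer like "no information". */
    if (reported == NULL || reported[0] == '\0')
        return "UTF-8";

    /* US-ASCII with no locale hints: assume UTF-8 instead. */
    if (!env_is_set && strcmp(reported, "US-ASCII") == 0)
        return "UTF-8";

    /* Otherwise trust what the OS reported. */
    return reported;
}
```

In the double-click case (nothing set), choose_default_encoding("US-ASCII", 0)
yields "UTF-8", while a user who explicitly sets a US-ASCII locale keeps it.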

Thank you, everyone, for your feedback.

-Brent

>> Apple is highly unlikely to change the behavior of nl_langinfo().
>>
>> There is already code in the JDK that calls into JRSCopyPrimaryLanguage(), JRSCopyCanonicalLanguageForPrimaryLanguage(), and JRSSetDefaultLocalization() for exactly this purpose.
>>
>> Please proceed with setting the encoding to UTF-8. It is the de-facto standard for every Cocoa application I have ever seen. US-ASCII is always the wrong choice for a graphical app on OS X.
>>
>> Regards,
>> Mike Swingler
>> Apple Inc.
>>
>> On Jul 30, 2013, at 9:05 AM, Francis Devereux <francis at devrx.org> wrote:
>>
>>> I suspect that Apple might be unlikely to change the value that nl_langinfo returns when LANG is unset.
>>>
>>> However, it might be possible to fix this issue without second-guessing the character set reported by the OS by calling [NSLocale currentLocale] (or the CFLocale equivalent) instead of nl_langinfo. I think (although I haven't checked) that [NSLocale currentLocale] determines the current locale using a mechanism other than environment variables, because LANG is usually unset for GUI apps on OS X.
>>>
>>> On 30 Jul 2013, at 15:56, Scott Palmer <swpalmer at gmail.com> wrote:
>>>
>>>> Then shouldn't you be complaining to Apple that the value returned by
>>>> nl_langinfo needs to be changed?
>>>> David's point seems to be that second guessing the character set reported
>>>> by the OS is likely to cause a different set of problems.
>>>>
>>>> Scott
>>>>
>>>>
>>>> On Tue, Jul 30, 2013 at 10:14 AM, Johannes Schindelin <Johannes.Schindelin at gmx.de> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> On Tue, 30 Jul 2013, David Holmes wrote:
>>>>>
>>>>>> On 30/07/2013 5:54 AM, Brent Christian wrote:
>>>>>>> On 7/28/13 10:13 PM, David Holmes wrote:
>>>>>>>> On 27/07/2013 3:53 AM, Brent Christian wrote:
>>>>>>>>> Please review my fix for 8011194: "Apps launched via
>>>>>>>>> double-clicked .jars have file.encoding value of US-ASCII on
>>>>>>>>> Mac OS X"
>>>>>>>>>
>>>>>>>>> http://bugs.sun.com/view_bug.do?bug_id=8011194
>>>>>>>>>
>>>>>>>>> In most cases of launching a Java app on Mac (from the cmdline, or
>>>>>>>>> from a native .app bundle), reading and displaying UTF-8
>>>>>>>>> characters beyond the standard ASCII range works fine.
>>>>>>>>>
>>>>>>>>> A notable exception is the launching of an app by double-clicking
>>>>>>>>> a .jar file.  In this case, file.encoding defaults to US-ASCII,
>>>>>>>>> and characters outside of the ASCII range show up as garbage.
>>>>>>>>
>>>>>>>> Why does this occur? What sets the encoding to US-ASCII?
>>>>>>>
>>>>>>> "US-ASCII" is the answer we get from nl_langinfo(CODESET) because no
>>>>>>> values for LANG/LC* are set in the environment when double-clicking a
>>>>>>> .jar.
>>>>>>>
>>>>>>> We get "UTF-8" when launching from the command line because the
>>>>>>> default Terminal.app setup on Mac will set up LANG for you (to
>>>>>>> "en_US.UTF-8" in the US).
>>>>>>
>>>>>> Sounds like a user environment error to me. This isn't my area but I'm
>>>>>> not convinced we should be second guessing what we think the encoding
>>>>>> should be.
>>>>>
>>>>> Except that that is not the case here, of course. The user did *not* set
>>>>> any environment variable in this case.
>>>>>
>>>>> So we are not talking about "second guessing" or "user environment error"
>>>>> but about a sensible default.
>>>>>
>>>>> As to US-ASCII, sorry to say: the seventies called and want their
>>>>> character set back.
>>>>>
>>>>> There can be no question that UTF-8 is the best default character
>>>>> encoding, or are you even going to question *that*?
>>>>>
>>>>>> What if someone intends for it to be US-ASCII?
>>>>>
>>>>> Then LANG would not be unset, would it.
>>>>>
>>>>> Hth,
>>>>> Johannes
>>>>>
>>>>
>>>
>>
>