Java 8 RFR 8011194: Apps launched via double-clicked .jars have file.encoding value of US-ASCII on Mac OS X

Francis Devereux francis at devrx.org
Tue Jul 30 16:05:16 UTC 2013


I suspect that Apple is unlikely to change the value that nl_langinfo returns when LANG is unset.

However, it might be possible to fix this issue without second-guessing the character set reported by the OS by calling [NSLocale currentLocale] (or the CFLocale equivalent) instead of nl_langinfo. I think (although I haven't checked) that [NSLocale currentLocale] determines the current locale using a mechanism other than environment variables, because LANG is usually unset for GUI apps on OS X.
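
For comparison, here is a minimal Java sketch of the kind of fallback being debated in the quoted thread below: keep whatever codeset the OS reports, unless it is US-ASCII and none of LC_ALL, LC_CTYPE or LANG is set. This is only an illustration under those assumptions, not the code under review; the names DefaultEncodingSketch, chooseEncoding and localeUnset are made up for the example.

```java
import java.nio.charset.Charset;
import java.util.Map;

// Hypothetical sketch; class and method names are illustrative, not JDK code.
final class DefaultEncodingSketch {

    // True when none of the POSIX locale variables has a non-empty value.
    static boolean localeUnset(Map<String, String> env) {
        for (String name : new String[] {"LC_ALL", "LC_CTYPE", "LANG"}) {
            String value = env.get(name);
            if (value != null && !value.isEmpty()) {
                return false;
            }
        }
        return true;
    }

    // Keep the reported codeset unless it is US-ASCII and no locale was set at all.
    static String chooseEncoding(String reportedCodeset, Map<String, String> env) {
        if ("US-ASCII".equals(reportedCodeset) && localeUnset(env)) {
            return "UTF-8"; // assumed default for the "nothing set" case
        }
        return reportedCodeset;
    }

    public static void main(String[] args) {
        // What the running JVM actually picked up (platform dependent).
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());
        // What the sketch would choose if the OS reported US-ASCII in this environment.
        System.out.println("sketch result   = " + chooseEncoding("US-ASCII", System.getenv()));
    }
}
```

Note that under this sketch an explicitly set locale such as LANG=C still yields US-ASCII, so someone who really intends US-ASCII can still get it.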

On 30 Jul 2013, at 15:56, Scott Palmer <swpalmer at gmail.com> wrote:

> Then shouldn't you be complaining to Apple that the value returned by
> nl_langinfo needs to be changed?
> David's point seems to be that second guessing the character set reported
> by the OS is likely to cause a different set of problems.
> 
> Scott
> 
> 
> On Tue, Jul 30, 2013 at 10:14 AM, Johannes Schindelin <Johannes.Schindelin at gmx.de> wrote:
> 
>> Hi,
>> 
>> On Tue, 30 Jul 2013, David Holmes wrote:
>> 
>>> On 30/07/2013 5:54 AM, Brent Christian wrote:
>>>> On 7/28/13 10:13 PM, David Holmes wrote:
>>>>> On 27/07/2013 3:53 AM, Brent Christian wrote:
>>>>>> Please review my fix for 8011194 : "Apps launched via double-clicked
>>>>>> .jars have file.encoding value of US-ASCII on Mac OS X"
>>>>>> 
>>>>>> http://bugs.sun.com/view_bug.do?bug_id=8011194
>>>>>> 
>>>>>> In most cases of launching a Java app on Mac (from the cmdline, or
>>>>>> from a native .app bundle), reading and displaying UTF-8
>>>>>> characters beyond the standard ASCII range works fine.
>>>>>> 
>>>>>> A notable exception is the launching of an app by double-clicking
>>>>>> a .jar file.  In this case, file.encoding defaults to US-ASCII,
>>>>>> and characters outside of the ASCII range show up as garbage.
>>>>> 
>>>>> Why does this occur? What sets the encoding to US-ASCII?
>>>> 
>>>> "US-ASCII" is the answer we get from nl_langinfo(CODESET) because no
>>>> values for LANG/LC* are set in the environment when double-clicking a
>>>> .jar.
>>>> 
>>>> We get "UTF-8" when launching from the command line because the
>>>> default Terminal.app setup on Mac will set up LANG for you (to
>>>> "en_US.UTF-8" in the US).
>>> 
>>> Sounds like a user environment error to me. This isn't my area but I'm
>>> not convinced we should be second guessing what we think the encoding
>>> should be.
>> 
>> Except that that is not the case here, of course. The user did *not* set
>> any environment variable in this case.
>> 
>> So we are not talking about "second guessing" or "user environment error"
>> but about a sensible default.
>> 
>> As to US-ASCII, sorry to say: the seventies called and want their
>> character set back.
>> 
>> There can be no question that UTF-8 is the best default character
>> encoding, or are you even going to question *that*?
>> 
>>> What if someone intends for it to be US-ASCII?
>> 
>> Then LANG would not be unset, would it.
>> 
>> Hth,
>> Johannes
>> 
> 




More information about the core-libs-dev mailing list