Codereview request for 4153167: separate between ANSI and OEM code pages on Windows

Ulf Zibis Ulf.Zibis at gmx.de
Mon Feb 13 21:20:50 UTC 2012


On 13.02.2012 19:35, Xueming Shen wrote:
> On 2/13/2012 10:15 AM, Ulf Zibis wrote:
>> Interesting issue, especially for us Germans!
>>
>> What about System.in, if one types some umlauts at the Windows console?
>
> System.in is an "InputStream", no charset is involved there; you build your own "reader"
> on top of it yourself.
Well, in the normal case one would use an InputStreamReader with the default charset. On the Windows 
console, the characters would then likely be decoded incorrectly.
So IMO there should be a mechanism by which e.g. InputStreamReader chooses the correct OEM charset, 
unless one is explicitly specified, when the underlying input stream System.in is reading directly 
from the Windows console.
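
Until such a mechanism exists, the reader has to be built by hand. A minimal sketch of what I mean, 
assuming the console encoding is available as a charset name from somewhere (for instance from the 
"sun.stdout.encoding" property discussed in this thread, which may well be unset). The class and 
method names are made up for illustration only:

```java
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

// Hypothetical helper, not part of the JDK.
public class ConsoleInput {

    // Resolve a charset name; fall back to the platform default if the
    // name is null, malformed, or not supported by this runtime.
    static Charset resolve(String name) {
        if (name != null) {
            try {
                return Charset.forName(name);
            } catch (IllegalArgumentException e) {
                // Covers IllegalCharsetNameException and
                // UnsupportedCharsetException; fall through to the default.
            }
        }
        return Charset.defaultCharset();
    }

    // Wrap an input stream in a reader that decodes with the console
    // charset if one is known, otherwise with the default charset.
    static BufferedReader reader(InputStream in, String consoleEncoding) {
        return new BufferedReader(new InputStreamReader(in, resolve(consoleEncoding)));
    }
}
```

It could be used as ConsoleInput.reader(System.in, System.getProperty("sun.stdout.encoding")). 
Note that this reuses the stdout property for stdin, which is only an assumption on my side, and 
it does not check whether System.in is actually attached to the console.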

>> Why are there theoretically different code pages for stdout and stderr?
>
> You can redirect stderr to a log file but keep stdout on the console, or redirect
> stdout but keep stderr on the console; in these scenarios stderr and stdout
> will use different code pages. Basically the approach is that if the output stream gets
> redirected, it keeps using the default charset (on the assumption that the rest of the
> world is using the Windows code page); if not, it uses the OEM code page from the console
> on Windows, to make sure System.out/err output the bits that the underlying
> console can understand.
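
As I read it, the rule quoted above is decided per stream and boils down to the following. This is 
only a sketch with made-up names, not the code from the webrev; whether a stream is redirected is 
passed in as a flag, because as far as I know plain Java cannot determine that separately for 
stdout and stderr:

```java
import java.nio.charset.Charset;

// Hypothetical illustration of the per-stream rule, not JDK code.
public class StreamEncodingChoice {

    // A redirected stream keeps the default (ANSI) charset;
    // a stream still attached to the console uses the OEM charset.
    static Charset choose(boolean redirected, Charset oemCharset, Charset defaultCharset) {
        return redirected ? defaultCharset : oemCharset;
    }
}
```

With stderr redirected to a file and stdout left on the console, choose(...) returns the default 
charset for stderr and the OEM charset for stdout, so the two streams end up with different 
encodings.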
Oops, I think you may have misunderstood me.
I mean, why are there 2 different properties:
     "sun.stdout.encoding"
     "sun.stderr.encoding"
Shouldn't something like
     "console.encoding"
be enough, as a counterpart to
     "file.encoding"
?

-Ulf



