<i18n dev> Code review request for 4153167: separate between ANSI and OEM code pages on Windows

Xueming Shen xueming.shen at oracle.com
Mon Feb 13 09:36:36 PST 2012


Hi

This is a long-standing Windows codepage support issue on the Java
platform (we probably have about 20 bugs/RFEs filed for this particular
issue, all closed as duplicates of 4153167). Windows supports two sets
of codepages: the ANSI (Windows) codepage and the OEM (IBM) codepage.
Windows uses the ANSI/Windows codepage almost "everywhere" except in its
DOS/command prompt window, which uses the OEM codepage. For example, on
a typical English Windows installation the default Windows codepage is
Cp1252 <http://msdn.microsoft.com/en-us/goglobal/cc305145> (Western
European Latin), while the OEM codepage used in the DOS/command prompt
is Cp437 <http://msdn.microsoft.com/en-us/goglobal/cc305156> (you can
use the chcp command to check or change the "active" codepage used in
your DOS/command prompt). The two codepages map certain characters,
such as the umlauts, to different byte values.
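To make the mismatch concrete, here is a small sketch (the class name
CodepageBytes is just for illustration) that prints the byte each umlaut
encodes to under Cp1252 and under Cp437. It assumes a JDK that ships the
extended charsets; Cp437 is not one of the charsets every Java
implementation is required to support.

```java
import java.nio.charset.Charset;

public class CodepageBytes {

    // Encodes a single character and returns its byte as an unsigned value (0-255).
    static int byteFor(char c, String charsetName) {
        byte[] b = String.valueOf(c).getBytes(Charset.forName(charsetName));
        return b[0] & 0xFF;
    }

    public static void main(String[] args) {
        String umlaut = "\u00f6\u00e4\u00fc\u00d6\u00c4\u00dc\u00df";
        // Print hex values only, so the output itself does not depend on
        // how the terminal decodes non-ASCII bytes.
        for (char c : umlaut.toCharArray()) {
            System.out.printf("U+%04X  Cp1252=0x%02X  Cp437=0x%02X%n",
                    (int) c, byteFor(c, "Cp1252"), byteFor(c, "Cp437"));
        }
    }
}
```

For example, U+00F6 (o-umlaut) is 0xF6 in Cp1252 but 0x94 in Cp437.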

The J2SE runtime chooses the ANSI/Windows codepage as its default
charset for character I/O (reading/writing), graphical text display,
etc., including System.out and System.err. This causes problems when
the ANSI codepage and the OEM codepage are not "compatible" and you need
to write those "incompatible" characters to the DOS/command prompt, as
shown in the following test case:

         String umlaut = "\u00f6\u00e4\u00fc\u00d6\u00c4\u00dc\u00df";
         PrintWriter ps = new PrintWriter(new OutputStreamWriter(System.out, "Cp437"), true);
         ps.println("ps.cp437: " + umlaut);
         System.out.println("sys.out : " + umlaut);
         System.err.println("sys.err : " + umlaut);

You will see the umlauts displayed correctly by the PrintWriter with the
explicit Cp437 encoding, but garbled by System.out and System.err,
because both use the default charset Cp1252, which is also used for all
the Unicode <-> Windows encoding conversions in that particular VM
instance.
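The garbling can be reproduced without a Windows console. The sketch
below (class and method names are illustrative) encodes the string with
Cp1252, as System.out does in the test case above, and then decodes the
bytes as Cp437, as the command prompt does. It assumes the JDK provides
both charsets.

```java
import java.nio.charset.Charset;

public class GarbledOutput {

    // Simulates what the console shows: the writer encodes with one charset,
    // the console decodes the same bytes with another.
    static String asSeenByConsole(String text, String writerCharset, String consoleCharset) {
        byte[] bytes = text.getBytes(Charset.forName(writerCharset));
        return new String(bytes, Charset.forName(consoleCharset));
    }

    public static void main(String[] args) {
        String umlaut = "\u00f6\u00e4\u00fc\u00d6\u00c4\u00dc\u00df";
        String garbled = asSeenByConsole(umlaut, "Cp1252", "Cp437");
        String correct = asSeenByConsole(umlaut, "Cp437", "Cp437");
        // Print code points rather than glyphs so the result does not depend
        // on how the terminal running this program decodes bytes.
        garbled.codePoints().forEach(cp -> System.out.printf("U+%04X ", cp));
        System.out.println();
        System.out.println(correct.equals(umlaut)); // true
    }
}
```

The first character comes back as U+00F7 (the division sign), which is
what Cp437 assigns to byte 0xF6.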

For years we have been debating whether, and how, we should fix this
issue: do we want to have two "default charsets" for I/O? In JDK 6 we
provided the java.io.Console class, which specifically uses the OEM
codepage when running in the Windows DOS/command prompt. However, the
feedback is that people still want System.out/err to work correctly in
the DOS/command prompt when the OEM codepage in use is not "compatible"
with the default Windows codepage.
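For comparison, this is roughly what application code has to do today
to get OEM-codepage output through java.io.Console. It is a sketch with
an illustrative class name: System.console() can return null (typically
when there is no interactive console, for example when output is
redirected), so the code falls back to System.out in that case.

```java
import java.io.Console;
import java.io.PrintWriter;

public class ConsoleFallback {

    // Prefer the Console's writer when one is available; otherwise fall back
    // to an auto-flushing PrintWriter over System.out (default charset).
    static PrintWriter consoleWriter() {
        Console con = System.console();
        return (con != null) ? con.writer() : new PrintWriter(System.out, true);
    }

    public static void main(String[] args) {
        PrintWriter out = consoleWriter();
        out.printf("umlauts: %s%n", "\u00f6\u00e4\u00fc\u00d6\u00c4\u00dc\u00df");
        out.flush();
    }
}
```

The need for this branch in every program is exactly why people keep
asking for System.out/err themselves to do the right thing.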

The proposed change is to use the OEM codepage for System.out/err when
the VM is started with its standard out/err not redirected to something
else, such as a file (this makes sure the OEM codepage is used only for
the DOS/command prompt). If the VM's standard out/err is redirected,
System.out/err continue to use the default charset (file.encoding). I
believe this approach solves the problem without breaking any existing
assumptions or usage scenarios.
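The selection rule can be summarized in a few lines of Java. This is a
minimal sketch of the rule only, not the patch itself: the class and
method names are hypothetical, the OEM codepage is hard-coded to Cp437
for illustration, and System.console() != null is used merely as a rough
stand-in for the VM's own check of whether stdout is attached to the
command prompt.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class StdoutCharsetChoice {

    // Hypothetical rule mirroring the proposal: use the OEM codepage only when
    // stdout goes to the command prompt; otherwise keep file.encoding.
    static String chooseEncoding(boolean stdoutIsConsole, String oemCodepage, String fileEncoding) {
        return (stdoutIsConsole && oemCodepage != null) ? oemCodepage : fileEncoding;
    }

    // Wraps the raw stdout file descriptor in an auto-flushing PrintStream
    // that encodes with the chosen charset.
    static PrintStream stdoutWith(String encoding) throws UnsupportedEncodingException {
        return new PrintStream(new FileOutputStream(FileDescriptor.out), true, encoding);
    }

    public static void main(String[] args) throws Exception {
        // Rough stand-in for "stdout is not redirected"; the real check is
        // not reproduced here.
        boolean isConsole = System.console() != null;
        String enc = chooseEncoding(isConsole, "Cp437", System.getProperty("file.encoding"));
        PrintStream out = stdoutWith(enc);
        out.println("using " + enc + ": \u00f6\u00e4\u00fc");
    }
}
```

Keeping file.encoding for the redirected case means files written via
"java Foo > out.txt" keep the same bytes they have always had.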

The webrev is at

http://cr.openjdk.java.net/~sherman/4153167/webrev

Here is a simple "manual" test case.

import java.io.Console;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;

public class HelloWorld {

     public static void main(String[] args) throws Exception {

         String umlaut = "\u00f6\u00e4\u00fc\u00d6\u00c4\u00dc\u00df";

         // Show which encodings the VM picked for file I/O and for stdout/stderr.
         System.out.println("file.encoding =" + System.getProperty("file.encoding"));
         System.out.println("stdout.encoding=" + System.getProperty("sun.stdout.encoding"));
         System.out.println("stderr.encoding=" + System.getProperty("sun.stderr.encoding"));
         System.out.println("-----------------------");

         // Explicit Cp437 writer: always correct in an English command prompt.
         PrintWriter ps = new PrintWriter(new OutputStreamWriter(System.out, "Cp437"), true);
         ps.println("ps.cp437: " + umlaut);
         System.out.println("sys.out : " + umlaut);
         System.err.println("sys.err : " + umlaut);
         Console con = System.console();
         if (con != null)
             con.printf("console : %s%n", umlaut);
     }
}

-Sherman