RFR 8124977 cmdline encoding challenges on Windows
Kirk Shoop
Kirk.Shoop at microsoft.com
Thu Aug  6 02:45:09 UTC 2015

> -----Original Message-----
> From: Xueming Shen [mailto:xueming.shen at oracle.com]
> Sent: Monday, July 20, 2015 11:50 AM
> 
> On 07/20/2015 10:22 AM, Kirk Shoop wrote:
> > So when the default system locale differs from the active one, we get
> > different behavior on Linux and Windows. The new options allow a Windows
> > user to select the same behavior that one would expect on Unix. The
> > switches can certainly be removed, if the compatibility impact is acceptable.
> 
> Kirk, on Windows file.encoding comes from the user locale and
> sun.jnu.encoding comes from the system locale setting. sun.jnu.encoding is
> purely for the text-encoding-sensitive JNU functions to communicate with
> the underlying Windows system API when the system locale and the user
> locale are set to different values. On Unix/Linux/OS X, these two are always
> set to the same value. Yes, there might be input/output issues if the
> encoding used by the console (the OEM code page) is not compatible with the
> encoding used by the "user locale" and you are trying to use
> System.in/out/err for input/output to the console.
> 
> Here is the original CCC request regarding the sun.jnu.encoding, which might
> provide some background info.
> 
> http://cr.openjdk.java.net/~sherman/4958170.html
> 
> If you/we are NOT going to change the encoding used by the underlying
> console, I don't think we need to (or should) change the "encoding" used by
> java.io.Console. As I suggested in my previous email, the
> Java_java_io_Console_encoding() implementation probably needs to be updated
> to return UTF-8 if cp == 65001 (that code was written 10 years ago; I'm not
> sure code page 65001 was really used back then). My understanding of the
> issue here is this: if you continue to use the "A" version of the API to
> parse/get the arguments, and try to solve the possible issue triggered by the
> "incompatibility" between the OEM encoding used by the console and the
> user-locale encoding used by System.in/out/err, it's fine to define a new
> system property to specify a preferred encoding for the launcher to use, but
> this "preferred" encoding should not be used by java.io.Console.
> But isn't it more reasonable to simply always use the "W" version for this
> purpose in the launcher?
> 
> -Sherman
> 
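Sherman's suggestion above is that the console-encoding query should report UTF-8 when the console code page is 65001. The sketch below illustrates only that naming decision in portable C, so it avoids Windows calls; on Windows the code page would typically come from GetConsoleCP(). The helper name is hypothetical and this is not the JDK's actual Console code; the "ms<N>" range for 874..950 versus "cp<N>" for everything else is an assumption made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not the JDK's Console code): writes the Java
 * charset name for a Windows console code page `cp` into `buf`.
 * On Windows, `cp` would typically come from GetConsoleCP().
 * Assumption for illustration: 874..950 use Microsoft "ms<N>" names
 * (e.g. ms932); all other pages use "cp<N>" (e.g. cp437). */
const char *console_charset_name(int cp, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    if (cp == 65001) {
        /* "chcp 65001" puts the console in UTF-8 mode. */
        snprintf(buf, len, "UTF-8");
    } else if (cp >= 874 && cp <= 950) {
        snprintf(buf, len, "ms%d", cp);
    } else {
        snprintf(buf, len, "cp%d", cp);
    }
    return buf;
}
```

For example, console_charset_name(65001, buf, sizeof buf) yields "UTF-8" and console_charset_name(437, buf, sizeof buf) yields "cp437". Only the 65001 branch corresponds to the change being discussed; the other branches stand in for whatever mapping the existing code uses.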
Thank you for the valuable feedback. We have vastly simplified the original patch.
The new webrev is here:
  http://cr.openjdk.java.net/~kshoop/8124977/webrev.02/
This webrev uses GetCommandLineW on Windows to retrieve the UTF-16 command line. It also supports the 65001 (UTF-8) code page (set by chcp 65001), so that when -Dsun.jnu.encoding="UTF-8" is supplied, console output (stdout and stderr) is in UTF-8.
There are no new command-line switches.
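GetCommandLineW returns the command line as a NUL-terminated UTF-16 (wide) string. On Windows, a launcher would typically split it with CommandLineToArgvW and convert each argument with WideCharToMultiByte(CP_UTF8, ...). To stay portable, the sketch below instead shows what that UTF-16 to UTF-8 conversion does, including surrogate pairs. The helper name and its error convention are hypothetical; this is not the code in the webrev.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not the webrev's code): converts a NUL-terminated
 * UTF-16 string, such as the one GetCommandLineW returns, to UTF-8.
 * Unpaired surrogates become U+FFFD. Returns the number of bytes written
 * (excluding the NUL), or (size_t)-1 if `out` is too small. */
size_t utf16_to_utf8(const uint16_t *in, char *out, size_t out_len)
{
    size_t n = 0;
    assert(in != NULL && out != NULL);
    while (*in != 0) {
        uint32_t cp = *in++;
        if (cp >= 0xD800 && cp <= 0xDBFF && *in >= 0xDC00 && *in <= 0xDFFF) {
            /* High surrogate followed by low surrogate: combine the pair. */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(*in++ - 0xDC00);
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD; /* lone surrogate */
        }
        /* Encode the code point as 1 to 4 UTF-8 bytes. */
        unsigned char tmp[4];
        size_t len;
        if (cp < 0x80) {
            tmp[0] = (unsigned char)cp;
            len = 1;
        } else if (cp < 0x800) {
            tmp[0] = (unsigned char)(0xC0 | (cp >> 6));
            tmp[1] = (unsigned char)(0x80 | (cp & 0x3F));
            len = 2;
        } else if (cp < 0x10000) {
            tmp[0] = (unsigned char)(0xE0 | (cp >> 12));
            tmp[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            tmp[2] = (unsigned char)(0x80 | (cp & 0x3F));
            len = 3;
        } else {
            tmp[0] = (unsigned char)(0xF0 | (cp >> 18));
            tmp[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            tmp[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            tmp[3] = (unsigned char)(0x80 | (cp & 0x3F));
            len = 4;
        }
        if (n + len + 1 > out_len) {
            return (size_t)-1; /* no room for these bytes plus the NUL */
        }
        memcpy(out + n, tmp, len);
        n += len;
    }
    if (out_len == 0) {
        return (size_t)-1;
    }
    out[n] = '\0';
    return n;
}
```

Working from the wide string avoids the lossy step of the "A" APIs, where characters outside the ANSI code page are already replaced before the launcher sees them.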
Please let us know if there is anything else that needs improvement.
Thanks!
Kirk and Valery
    
    
More information about the core-libs-dev mailing list