Windows command line processing
I've observed issues with passing non-ASCII characters through the command line to a Java program on Windows. Even though I can invoke java.exe through CreateProcess with the full range of Unicode characters, and even though the Java main method accepts Unicode strings, the launcher's C main function converts the Unicode command line to the local ANSI code page. Thus any characters not in the local code page are lost. This seems like a bug to me.

As a proof of concept, I changed jdk/src/share/bin/main.c to call GetCommandLineW instead of GetCommandLine, and then converted that string to UTF-8. For my test, I set sun.jnu.encoding to UTF-8 so that makePlatformString in LauncherHelper would just work.

-------------------------------------
Tom Salter | Software Engineer | Java & Middleware Development
Unisys | 2476 Swedesford Road | Malvern, PA 19355 | 610-648-2568 | N385-2568
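The loss described above can be reproduced in isolation. The following is a minimal sketch, not the launcher's code: the class `CodePageLoss` and its `roundTrip` helper are hypothetical names, and ISO-8859-1 is used only as a portable stand-in for a local ANSI code page. It encodes an argument to a single-byte charset and decodes it again, which is roughly what happens when a Unicode command line is forced through a narrow API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CodePageLoss {
    // Hypothetical helper: encode to the given charset and decode back,
    // imitating a trip through a narrow (non-Unicode) API.
    static String roundTrip(String arg, Charset codePage) {
        // String.getBytes(Charset) substitutes the charset's replacement
        // byte ('?' for ISO-8859-1) for every unmappable character.
        byte[] narrow = arg.getBytes(codePage);
        return new String(narrow, codePage);
    }

    public static void main(String[] args) {
        // Escapes keep the source file ASCII-only.
        String latin = "caf\u00e9";            // representable in ISO-8859-1
        String kanji = "\u65e5\u672c\u8a9e";   // not representable

        // Stand-in for the ANSI code page (an assumption for illustration).
        Charset ansi = StandardCharsets.ISO_8859_1;

        System.out.println(roundTrip(latin, ansi).equals(latin));
        System.out.println(roundTrip(kanji, ansi).equals(kanji));
        System.out.println(roundTrip(kanji, StandardCharsets.UTF_8).equals(kanji));
    }
}
```

Characters inside the stand-in code page survive the round trip, characters outside it come back as `?`, and the same string survives when UTF-8 is used as the intermediate encoding.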
Hello Thomas,

There are long-standing bugs in this area; it all depends on what arguments are being passed with Unicode code pages. This is somewhat convoluted, as the JNI invocation APIs are not Unicode friendly. Also, we have a common code base for the shared launcher logic between the *nixes and Windows. However, as you pointed out, we can pass parameters such as the main class and application arguments to the JVM as UTF-8.

We already have tests to exercise these conditions, please see jdk/test/tools/launcher; however, on Windows the Regional Settings must be set correctly for the locale.

Thanks
Kumar
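To illustrate why the receiving side has to agree on the encoding when arguments are handed over as UTF-8 bytes, here is a rough sketch. It assumes the Java side decodes the raw bytes with a charset chosen by name, in the way the thread implies for sun.jnu.encoding; `ArgDecodeSketch` and `decodeArg` are hypothetical names and this is not the LauncherHelper source.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ArgDecodeSketch {
    // Hypothetical helper: decode raw argument bytes with the named
    // encoding, falling back to the default charset if it is missing
    // or unsupported. Syntactically illegal names are not handled here.
    static String decodeArg(byte[] raw, String encodingName) {
        Charset cs = (encodingName != null && Charset.isSupported(encodingName))
                ? Charset.forName(encodingName)
                : Charset.defaultCharset();
        return new String(raw, cs);
    }

    public static void main(String[] args) {
        String original = "\u65e5\u672c\u8a9e.txt";
        // What a native launcher would hand over after converting to UTF-8.
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        // Matching encoding: the argument is recovered intact.
        System.out.println(decodeArg(utf8, "UTF-8").equals(original));
        // Mismatched encoding: the bytes are misread as other characters.
        System.out.println(decodeArg(utf8, "ISO-8859-1").equals(original));
    }
}
```

The argument is recovered only when the decoder uses the same encoding the bytes were produced in, which is why the proof of concept also had to set sun.jnu.encoding to UTF-8.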
One historical reason for this class of bugs was the support for the Windows 98 family, which made it much harder to switch to the correct "W" Unicode APIs. Today Windows 98 is no longer supported, so some things may appear easy, or at least easier.

On Thu, May 23, 2013 at 6:59 AM, Salter, Thomas A <Thomas.Salter@unisys.com> wrote:
I've observed issues with passing non-ASCII characters through the command line to a Java program on Windows. It seems that even though I can invoke java.exe through CreateProcess, passing a full range of Unicode characters, and even though the Java main accepts strings of Unicode characters, the launcher's C main function converts the Unicode to the local ANSI code page. Thus any characters not in the local code page are lost. This seems like a bug to me.
As a proof of concept, I changed jdk/src/share/bin/main.c to call GetCommandLineW instead of GetCommandLine, and then converted that string to UTF-8. For my test, I set sun.jnu.encoding to UTF-8 so that makePlatformString in LauncherHelper would just work.
------------------------------------- Tom Salter | Software Engineer | Java & Middleware Development Unisys | 2476 Swedesford Road | Malvern, PA 19355 | 610-648-2568| N385-2568
participants (3)
- Kumar Srinivasan
- Martin Buchholz
- Salter, Thomas A