RFR: 8195129: System.load() fails to load from unicode paths [v5]

Maxim Kartashev github.com+28651297+mkartashev at openjdk.java.net
Mon Jun 7 16:25:24 UTC 2021


On Fri, 4 Jun 2021 13:36:27 GMT, Maxim Kartashev <github.com+28651297+mkartashev at openjdk.org> wrote:

>> Character strings within the JVM are produced and consumed in several formats. Strings are passed to and from Java in the UTF8 format, and POSIX APIs (like fprintf() or dlopen()) also consume strings in UTF8. On Windows, however, the situation is far less simple: some newer APIs expect UTF16 (wide-character strings), while some older APIs can only work with strings in a "platform" format, in which not every character representable in UTF8 can be represented; which ones can depends on the current "code page".
>> 
>> This commit switches the Windows version of the native library loading code to the new UTF16 API `LoadLibraryW()` and attempts to streamline the use of the various string formats in the surrounding code.
>> 
>> Namely, exception messages are made to consume strings explicitly in the UTF8 format, while logging functions (which end up using the legacy Windows API) are made to consume "platform" strings in most cases. One exception is `JVM_LoadLibrary()` logging, where the UTF8 name of the library is logged; this could, of course, be fixed, but was considered not worth the additional code (NB: this isn't a new bug).
>> 
>> The test runs in a separate JVM in order to make NIO happy about non-ASCII characters in the file name: tests are executed with LC_ALL=C, which doesn't let NIO work with non-ASCII file names even on Linux or macOS.
>> 
>> Tested by running `test/hotspot/jtreg:tier1` on Linux and `jtreg:test/hotspot/jtreg/runtime` on Windows 10. The new test (`jtreg:test/hotspot/jtreg/runtime/jni/loadLibraryUnicode`) was also run explicitly on those platforms.
>> 
>> Results from Linux:
>> 
>> Test summary
>> ==============================
>>    TEST                                              TOTAL  PASS  FAIL ERROR   
>>    jtreg:test/hotspot/jtreg:tier1                     1784  1784     0     0   
>> ==============================
>> TEST SUCCESS
>> 
>> 
>> Building target 'run-test-only' in configuration 'linux-x86_64-server-release'
>> Test selection 'jtreg:test/hotspot/jtreg/runtime/jni/loadLibraryUnicode', will run:
>> * jtreg:test/hotspot/jtreg/runtime/jni/loadLibraryUnicode
>> 
>> Running test 'jtreg:test/hotspot/jtreg/runtime/jni/loadLibraryUnicode'
>> Passed: runtime/jni/loadLibraryUnicode/LoadLibraryUnicodeTest.java
>> Test results: passed: 1
>> 
>> 
>> Results from Windows 10:
>> 
>> Test summary
>> ==============================
>>    TEST                                              TOTAL  PASS  FAIL ERROR
>>    jtreg:test/hotspot/jtreg/runtime                    746   746     0     0
>> ==============================
>> TEST SUCCESS
>> Finished building target 'run-test-only' in configuration 'windows-x86_64-server-fastdebug'
>> 
>> 
>> Building target 'run-test-only' in configuration 'windows-x86_64-server-fastdebug'
>> Test selection 'test/hotspot/jtreg/runtime/jni/loadLibraryUnicode', will run:
>> * jtreg:test/hotspot/jtreg/runtime/jni/loadLibraryUnicode
>> 
>> Running test 'jtreg:test/hotspot/jtreg/runtime/jni/loadLibraryUnicode'
>> Passed: runtime/jni/loadLibraryUnicode/LoadLibraryUnicodeTest.java
>> Test results: passed: 1
>
> Maxim Kartashev has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Updated the test to run on Windows only and to use a character from the
>   supplementary plane in the path name.
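
For illustration only (this is not the code from the PR): before a UTF8 path can reach a wide-character API such as `LoadLibraryW()`, it has to be converted to UTF16, and a supplementary-plane character, like the one now used in the test, turns into a surrogate pair along the way. Real Windows code would normally do this with `MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, ...)`; the portable decoder below, under the hypothetical name `utf8_to_utf16`, is a stand-in so the sketch runs anywhere. It assumes standard UTF-8 input, so modified UTF-8 (NUL as `C0 80`, supplementary characters as two separately encoded surrogates) is rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: decode standard UTF-8 into UTF-16, as a path must be
// converted before it can be passed to a wide-character API like LoadLibraryW().
// Returns std::nullopt on malformed input instead of guessing.
std::optional<std::u16string> utf8_to_utf16(const std::string& in) {
    static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (b < 0x80)                { cp = b;        len = 1; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; len = 2; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; len = 3; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; len = 4; }
        else return std::nullopt;                      // stray continuation or invalid lead byte
        if (i + len > in.size()) return std::nullopt;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) return std::nullopt;  // expected a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong forms (including modified UTF-8's C0 80 for NUL),
        // encoded surrogates (as modified UTF-8 uses) and values past U+10FFFF.
        if (cp < min_for_len[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return std::nullopt;
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);              // BMP: one code unit
        } else {
            cp -= 0x10000;                                 // supplementary: surrogate pair
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
        i += len;
    }
    return out;
}
```

Rejecting malformed input rather than substituting a replacement character seems the safer choice for file names, since a silently altered path would point at a different file.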

I came to realize that changing `os::dll_load()` to accept UTF-8 (standard or modified) will break all the users of that function except `JVM_LoadLibrary()`. Consider `os::native_java_library()`, which still operates with the platform encoding on Windows and works correctly if CWD contains Latin-1 characters (assuming code page 1252). With this change, `java` will fail to start if its path name contains, say, Æ, because `os::dll_load()` will expect it to be encoded as `c3 86` (UTF-8) but will get `c6` (Latin-1) instead.
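
The Æ case at the byte level: a caller that holds a platform-encoded path would have to re-encode it before handing it to a UTF-8-only `os::dll_load()`. The sketch below assumes Latin-1 as a simplified stand-in for code page 1252 (the two agree on 0xC6); `latin1_to_utf8` is a hypothetical helper that only shows the byte-level difference, not proposed HotSpot code.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: re-encode a Latin-1 (ISO-8859-1) string as UTF-8.
// Latin-1 stands in for code page 1252 here; the two agree on 0xA0..0xFF
// (Æ is 0xC6 in both) but differ in 0x80..0x9F, which this sketch ignores.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    for (unsigned char b : in) {
        if (b < 0x80) {
            out += static_cast<char>(b);                    // ASCII is unchanged
        } else {
            out += static_cast<char>(0xC0 | (b >> 6));      // lead byte: 0xC2 or 0xC3
            out += static_cast<char>(0x80 | (b & 0x3F));    // continuation byte
        }
    }
    return out;
}
```

Every non-ASCII byte grows into two bytes, so a byte string that is correct in one encoding cannot simply be passed to a function that expects the other.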

One possible solution is to update all the call sites of `os::dll_load()` (quite laborious); another is to introduce `os::dll_load_utf8()` and change only `JVM_LoadLibrary()` for now.

Advice is welcome.

-------------

PR: https://git.openjdk.java.net/jdk/pull/4169

