RFR: 7904021: Parsing group files using non-UTF-8 encoding fails
Pasam Soujanya
duke at openjdk.org
Wed Sep 10 13:56:19 UTC 2025
On Tue, 8 Jul 2025 06:27:20 GMT, Christian Stein <cstein at openjdk.org> wrote:
>> We make use of jtreg to execute openjdk tests for JDK11/17/21 releases on non-UTF-8 returning platforms. We found latest jtreg code is using Files.newBufferedReader(path) to read group files data(TEST.GROUPS) from openjdk via GroupManager (https://github.com/openjdk/jtreg/blob/master/src/share/classes/com/sun/javatest/regtest/config/GroupManager.java#L102C44-L102C61).
>>
>> This code defaults to return BufferedReader as UTF-8 instance. We see discrepancies when using this version of jtreg on non-UTF-8 platforms where defaultCharset() is non-UTF-8(JDK11 and JDK17).
>>
>> Hence, we would like to propose a fix of using default.Charset() with Files.newBufferedWriter(Path path, Charset cs) instead of Files.newBufferedReader(path) and Files.readString(Path) to Files.readString(Path,Charset cs) in below jtreg files :
>> https://github.com/openjdk/jtreg/blob/master/src/share/classes/com/sun/javatest/regtest/config/GroupManager.java#L102C44-L102C61
>> https://github.com/openjdk/jtreg/blob/master/src/share/classes/com/sun/javatest/regtest/config/ExtraPropDefns.java#L309
>>
>> We've also tested this fix on OpenJDK supported platforms like Linux, Windows, MAC.
>>
>> ---------
>> ### Progress
>> - [ ] Change must be properly reviewed (1 review required, with at least 1 [Reviewer](https://openjdk.org/bylaws#reviewer))
>> - [x] Change must not contain extraneous whitespace
>> - [x] Commit message must refer to an issue
>>
>>
>>
>> ### Reviewing
>> <details><summary>Using <code>git</code></summary>
>>
>> Checkout this PR locally: \
>> `$ git fetch https://git.openjdk.org/jtreg.git pull/267/head:pull/267` \
>> `$ git checkout pull/267`
>>
>> Update a local copy of the PR: \
>> `$ git checkout pull/267` \
>> `$ git pull https://git.openjdk.org/jtreg.git pull/267/head`
>>
>> </details>
>> <details><summary>Using Skara CLI tools</summary>
>>
>> Checkout this PR locally: \
>> `$ git pr checkout 267`
>>
>> View PR using the GUI difftool: \
>> `$ git pr show -t 267`
>>
>> </details>
>> <details><summary>Using diff file</summary>
>>
>> Download this PR as a diff file: \
>> <a href="https://git.openjdk.org/jtreg/pull/267.diff">https://git.openjdk.org/jtreg/pull/267.diff</a>
>>
>> </details>
>> <details><summary>Using Webrev</summary>
>>
>> [Link to Webrev Comment](https://git.openjdk.org/jtreg/pull/267#issuecomment-2942734568)
>> </details>
>
> Right, using `Charset.defaultCharset()` would also enable support of `file.encoding`.
>
> And before the conversion to use NIO, the `Properties.load(InputStream)` method was used. It has:
>
>> Reads a property list (key and element pairs) from the input byte stream. The input stream is in a simple line-oriented format as specified in load(Reader) and is assumed to use the `ISO 8859-1` character encoding; that is each byte is one Latin1 character. Characters not in Latin1, and certain special characters, are represented in keys and elements using Unicode escapes as defined in section @jls 3.3 of The Java Language Specification
>
>
> Since jtreg 7, with [CODETOOLS-7903091](https://bugs.openjdk.org/browse/CODETOOLS-7903091) included, the `Properties.load(Reader)` is called - which gets the now UTF-8 encoded reader from `Files.newBufferedReader(file)`.
>
> Thus, all-in-all, your change resolves a regression. In the light of that, I'll approve and sponsor this pull request.
Hi @sormuras , I'm looking for this change to be present in jtreg 7.3.1 which is the minimum jtreg version for JDK11 and 17. Could you please help me in getting this PR merged to jtreg7.3.1 binaries with this fix. Thank you.
-------------
PR Comment: https://git.openjdk.org/jtreg/pull/267#issuecomment-3275088123
More information about the jtreg-dev
mailing list