RFR: 8282042: [testbug] FileEncodingTest.java depends on default encoding

Fri Feb 18 18:40:57 UTC 2022

On Thu, 17 Feb 2022 22:50:37 GMT, Tyler Steele <duke at openjdk.java.net> wrote:

> FileEncodingTest expects all non-Windows platforms will have `Charset.defaultCharset().name()` set to US-ASCII when file.encoding is set to COMPAT. This assumption does not hold for AIX where it is ISO-8859-1.
> 
> According to [JEP-400](https://openjdk.java.net/jeps/400), we should expect `Charset.defaultCharset().name()` to equal `System.getProperty("native.encoding")` whenever the COMPAT flag is set. From JEP-400: "... if file.encoding is set to COMPAT on the command line, then the run-time value of file.encoding will be the same as the run-time value of native.encoding...". So one way to resolve this is to choose the value for each system from the native.encoding property.
> 
> With these changes however, my test systems continue to fail. 
> 
> - AIX reports: Default Charset: ISO-8859-1, expected: ISO8859-1
> - Linux/Z reports: Default Charset: US-ASCII, expected: ANSI_X3.4-1968
> - Linux/PowerLE reports: Default Charset: US-ASCII, expected: ANSI_X3.4-1968
> 
> Note that the expected value is populated from native.encoding.
> 
> This implies more work to be done. It looks to me that some modification to java_props_md.c may be needed to ensure that the System properties for native.encoding return [canonical names](http://www.iana.org/assignments/character-sets). 
> 
> ---
> 
> A tempting alternative is to set the expected value for AIX to "ISO-8859-1" in the test explicitly, as was done for the Windows expected encoding prior to this proposed change. The main advantage of this alternative is that it is quick and easy. The disadvantages are that it fails to test that COMPAT behaves as specified in JEP-400, and that it does not scale well if other systems turn out to need their own special cases. I wonder if this is the reason non-English locales are excluded by the test.
> 
> Proceeding with this change and the work implied by the new failures it highlights goes beyond the scope of what I thought was a simple testbug. So I'm opening this up for some comments before proceeding down the rabbit hole of further changes. If there is generally positive support for this direction I'm happy to make the modifications necessary to populate native.encoding with canonical names. As I am new to OpenJDK, I am especially looking to ensure that changing the value returned by native.encoding will not have unintended consequences elsewhere in the project.

Thanks for your feedback, Naoto.

I agree it's a little odd to test the way I proposed above, as it introduces uncertainty as you mentioned, as well as other issues, such as both native.encoding and `Charset.defaultCharset()` being wrong, but wrong in the same way. My main motivation for testing this way was the quoted line from JEP-400 (of which I recognize you are an author). Maybe I'm being too literal; in my testing the encodings match, even if the names are aliases of the ones I expect. In addition, you have a good point about the purpose of the COMPAT flag being compatibility. I agree that it's not really appropriate to change the values of native.encoding to the canonical ones.
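To illustrate the alias point, here is a minimal sketch (not part of the PR) that compares charsets by resolving both names through `Charset.forName` rather than comparing the name strings. The class and method names are hypothetical; the name pairs are the ones from the failures reported above.

```java
import java.nio.charset.Charset;

// Hypothetical helper, for illustration only; not code from FileEncodingTest.
public class CharsetAliasCheck {

    // True when both names resolve to the same Charset, regardless of
    // whether each name is the canonical name or one of its aliases.
    static boolean sameCharset(String a, String b) {
        return Charset.forName(a).equals(Charset.forName(b));
    }

    public static void main(String[] args) {
        // native.encoding value vs. Charset.defaultCharset().name() on AIX
        System.out.println(sameCharset("ISO8859-1", "ISO-8859-1"));
        // native.encoding value vs. Charset.defaultCharset().name() on Linux
        System.out.println(sameCharset("ANSI_X3.4-1968", "US-ASCII"));
        // Resolving an alias and asking for name() yields the canonical name
        System.out.println(Charset.forName("ANSI_X3.4-1968").name());
    }
}
```

A comparison like this would pass on the systems listed, but as noted it would still not catch the case where both values are wrong in the same way.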

I was feeling torn between the proposed option and alternative, and your feedback definitely sways me towards the alternative. I'll change this PR to simply add an exception to the test for AIX.
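Roughly, the exception could look like the sketch below. This is only an outline of the idea, not the actual change to FileEncodingTest.java: the class and method names are hypothetical, and the Windows value shown is an assumption on my part rather than something taken from this thread.

```java
// Hypothetical sketch of selecting the expected COMPAT charset name per OS.
public class ExpectedCompatCharset {

    // Returns the charset name the test would expect for the given os.name.
    static String expectedFor(String osName) {
        if (osName.startsWith("Windows")) {
            return "windows-1252"; // assumption: value not stated in this thread
        }
        if (osName.equals("AIX")) {
            return "ISO-8859-1";   // the AIX exception discussed above
        }
        return "US-ASCII";         // what the test expects on other platforms
    }

    public static void main(String[] args) {
        String os = System.getProperty("os.name");
        System.out.println(os + " -> " + expectedFor(os));
    }
}
```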

-------------

PR: https://git.openjdk.java.net/jdk/pull/7525