RFR: 8322802: Add testing for ZipFile.getEntry respecting the 'Language encoding' flag [v3]

Tue Jan 2 07:45:48 UTC 2024

On Mon, 1 Jan 2024 14:29:50 GMT, Eirik Bjørsnøs <eirbjo at openjdk.org> wrote:

>> Please review this test-only PR which adds test coverage for `ZipFile.getEntry` under certain charset conditions. 
>> 
>> When `ZipFile.getEntry` is called for an entry which has the `Language encoding flag` general purpose bit flag set,  then `ZipCoder.UTF8` is used unconditionally, even when a different charset was supplied to the `ZipFile` constructor.
>> 
>> It turns out we do not have any testing for this particular case, as can be verified by commenting out the following line of code in `ZipFile.Source.getEntryPos`:
>> 
>> 
>> //ZipCoder zc = zipCoderForPos(pos);
>> ``` 
>> 
>> and then running `make test TEST="test/jdk/java/util/zip"`
>> 
>> The current test verifies that the correct ZipCoder is used by `ZipFile.entries()`, but does not exercise `ZipFile.getEntry` the same way.
>> 
>> Seeing that [JDK-7009069](https://bugs.openjdk.org/browse/JDK-7009069) was (accidentally?) fixed by [JDK-8243469](https://bugs.openjdk.org/browse/JDK-8243469), I think it is worthwhile to add explicit testing for this case to avoid regressions.
>> 
>> While visiting `ZipCoding.java`, I took the opportunity to convert it to JUnit 5. The conversion and modernization of the code is done in the first commit 1384850ed51ec845af06dd6d13616f20f8bbaa6a in this PR, while the second commit 1776b258b0fe8383709ae0c095f2631a4e6237f6 actually adds the code required to verify the `Language encoding flag` condition for `ZipFile.getEntry`.
>> 
>> Testing: Verified that the test indeed fails when `ZipFile.Source.getEntryPos` is updated to use the ZipFile's ZipCoder as suggested above.
>
> Eirik Bjørsnøs has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Add @bug tag for 8322802

test/jdk/java/util/zip/ZipCoding.java line 86:

> 84:                 // ZipOutputStream sets the 'Language encoding flag' when writing using UTF-8
> 85:                 // UTF-8 should be used for decoding, even when opening with a different charset
> 86:                 Arguments.of("utf-8", "iso-8859-1",

Hello Eirik, there are 3 streams of arguments above which use "utf-8" for writing out the entry. All those other 3 streams of arguments, use "utf-8" for reading too (which is OK). For each of those 3 streams of arguments, could you also add another argument stream which uses non-utf8 charset for reading, like you do here? I think that would give this a bit more coverage, especially for cases like the surrogates in comments and entry names.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17207#discussion_r1439211686