RFR: 8322802: Add testing for ZipFile.getEntry respecting the 'Language encoding' flag [v4]

Alan Bateman alanb at openjdk.org
Tue Jan 2 12:13:49 UTC 2024


On Tue, 2 Jan 2024 09:31:16 GMT, Eirik Bjørsnøs <eirbjo at openjdk.org> wrote:

>> Please review this test-only PR which adds test coverage for `ZipFile.getEntry` under certain charset conditions. 
>> 
>> When `ZipFile.getEntry` is called for an entry which has the `Language encoding flag` general purpose bit flag set,  then `ZipCoder.UTF8` is used unconditionally, even when a different charset was supplied to the `ZipFile` constructor.
>> 
>> It turns out we do not have any testing for this particular case, as can be verified by commenting out the following line of code in `ZipFile.Source.getEntryPos`:
>> 
>> ```
>> //ZipCoder zc = zipCoderForPos(pos);
>> ```
>> 
>> and then running `make test TEST="test/jdk/java/util/zip"`
>> 
>> The current test verifies that the correct ZipCoder is used by `ZipFile.entries()`, but does not exercise `ZipFile.getEntry` the same way.
>> 
>> Seeing that [JDK-7009069](https://bugs.openjdk.org/browse/JDK-7009069) was (accidentally?) fixed by [JDK-8243469](https://bugs.openjdk.org/browse/JDK-8243469), I think it is worthwhile to add explicit testing for this case to avoid regressions.
>> 
>> While visiting `ZipCoding.java`, I took the opportunity to convert it to JUnit 5. The conversion and modernization of the code is done in the first commit 1384850ed51ec845af06dd6d13616f20f8bbaa6a in this PR, while the second commit 1776b258b0fe8383709ae0c095f2631a4e6237f6 actually adds the code required to verify the `Language encoding flag` condition for `ZipFile.getEntry`.
>> 
>> Testing: Verified that the test indeed fails when `ZipFile.Source.getEntryPos` is updated to use the ZipFile's ZipCoder as suggested above.
>
> Eirik Bjørsnøs has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Add more cases for 'language encoding' bit set, opened with a different encoding

> The change to allow user/application specific arbitrary charsets to the `ZipFile` constructor seems to have been done long back in Java 1.7 days as part of JDK-4244499.

There is a lot of history in this area. ZIP dates from the days of MS-DOS, where IBM 437 was used for encoding the names of entries. That is different to Java, which uses UTF-8 for JAR files and also for non-JAR ZIP files. Up to that point (as in JDK 7) there were also issues with the UTF-8 decoding and some forms of supplementary characters. Sherman got things to a good place in JDK 7 and also added the constructors so that you can specify the encoding when you obtain it by some out-of-band means.
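
For anyone following along, the case under discussion can be sketched roughly as below. This is only an illustration, not the test in the PR: the class name `LanguageEncodingFlagSketch`, the helper names and the entry name are made up for the example. It assumes that `ZipOutputStream`, which uses UTF-8 by default, sets the 'Language encoding' flag on the entries it writes, and it opens the file with the `ZipFile(File, Charset)` constructor using ISO-8859-1 as the deliberately mismatched charset.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class LanguageEncodingFlagSketch {

    // Hypothetical entry name with one non-ASCII character (e with acute accent)
    static final String NAME = "caf\u00e9.txt";

    // Writes a one-entry ZIP file. ZipOutputStream encodes names as UTF-8 by
    // default, which is assumed here to set the 'Language encoding' flag.
    static Path writeUtf8Zip() throws IOException {
        Path zip = Files.createTempFile("lang-enc", ".zip");
        try (OutputStream out = Files.newOutputStream(zip);
             ZipOutputStream zos = new ZipOutputStream(out)) {
            zos.putNextEntry(new ZipEntry(NAME));
            zos.write("hello".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return zip;
    }

    // Returns the name of the first entry as reported by ZipFile.entries()
    // when the file is opened with ISO-8859-1 instead of UTF-8.
    static String firstEntryName(Path zip) throws IOException {
        try (ZipFile zf = new ZipFile(zip.toFile(), StandardCharsets.ISO_8859_1)) {
            return zf.entries().nextElement().getName();
        }
    }

    // Reports whether ZipFile.getEntry finds the entry by its name when the
    // file is opened with ISO-8859-1 instead of UTF-8.
    static boolean foundByGetEntry(Path zip) throws IOException {
        try (ZipFile zf = new ZipFile(zip.toFile(), StandardCharsets.ISO_8859_1)) {
            return zf.getEntry(NAME) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        Path zip = writeUtf8Zip();
        try {
            System.out.println("entries() name matches: " + NAME.equals(firstEntryName(zip)));
            System.out.println("getEntry() finds entry: " + foundByGetEntry(zip));
        } finally {
            Files.deleteIfExists(zip);
        }
    }
}
```

On a JDK that includes the JDK-8243469 change I would expect both lookups to succeed, since the flag takes precedence over the charset given to the constructor. On older releases affected by JDK-7009069 the `getEntry` lookup may well come back empty.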

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17207#issuecomment-1873950701

