RFR: 8347712: IllegalStateException on multithreaded ZipFile access with non-UTF8 charset [v2]

Jaikiran Pai jpai at openjdk.org
Wed Apr 30 12:11:53 UTC 2025


On Sun, 23 Mar 2025 10:45:28 GMT, Eirik Bjørsnøs <eirbjo at openjdk.org> wrote:

>> Hello Jaikiran,
>> 
>> Before this change, two ZipFile instances opened using different (non-UTF-8) charsets would have equal keys, and thus be backed by the same Source and ZipCoder instance, whichever ZipFile constructed first would "win".
>> 
>> This seems like a separate bug,  independent of the concurrency concerns described  JDK-8347712.
>> 
>> For the benefit of future maintainers, I think this independent bug should be described in a separate JBS issue.
>> 
>> The bug could be solved in a separate PR, however I feel it's also ok to fix it in this PR, since moving the ZipCoder instance to ZipFile seems to incidentally solve the issue as well.
>> 
>> WDYT?
>
>> This seems like a separate bug, independent of the concurrency concerns described JDK-8347712.
> 
> Consider the following test, which when added to `ZipCoding.java` fails in master but succeeds in this PR:
> 
> 
> /**
>  * Verifies that opening a ZipFile with an incorrect charset does not
>  * prevent a later opening of the same file using the correct charset.
>  *
>  * This may happen if the two ZipFile instances have identical
>  * ZipFile.Source.Key, thus mapping to the same Source instance.
>  *
>  * @throws IOException if an unexpected I/O error occurs
>  */
> @Test
> public void differentNonUTF8Charsets() throws IOException {
>     // Using ISO-8859-15 and ISO-8859-1 here, since they are similar
>     // encodings with different characters for some bytes
> 
>     // The byte 0xA4 is "Euro sign" in 8859-15, "Currency sign (generic)" in 8859-1
>     String euroSign = "\u20AC";
>     String currencySign = "\u00A4";
> 
>     // Create a ZIP, encoded in ISO-8859-15
>     byte[] zip = createZIP("ISO-8859-15", euroSign, euroSign);
>     Path f = Path.of("zf-non-utf.zip");
>     Files.write(f, zip);
> 
>     // Open a ZipFile instance using (incorrect) encoding ISO-8859-1
>     // Then open a ZipFile instance using (correct) encoding ISO-8859-15
>     try (ZipFile incorrect = new ZipFile(f.toFile(), StandardCharsets.ISO_8859_1);
>          ZipFile correct = new ZipFile(f.toFile(), Charset.forName("ISO-8859-15"))) {
> 
>         // Correct encoding should resolve euro sign
>         assertNotNull(correct.getEntry(euroSign));
>         // Correct encoding should not resolve currency sign
>         assertNull(correct.getEntry(currencySign));
> 
>         // Incorrect encoding should resolve currency sign
>         assertNotNull(incorrect.getEntry(currencySign));
>         // Incorrect encoding should not resolve euro sign
>         assertNull(incorrect.getEntry(euroSign));
>     }
> }

Hello Eirik,

> Before this change, two ZipFile instances opened using different (non-UTF-8) charsets would have equal keys, and thus be backed by the same Source and ZipCoder instance, whichever ZipFile constructed first would "win".
> This seems like a separate bug, independent of the concurrency concerns described JDK-8347712.
> For the benefit of future maintainers, I think this independent bug should be described in a separate JBS issue.

That's right - when fixing 8347712 I saw this issue with `Key` class and addressed it in this PR. You have a good point that having a JBS issue specifically for this will be useful. I have now created https://bugs.openjdk.org/browse/JDK-8355975 and linked that issue in this current PR. What helped was your test case example that uses `ISO-8859-15` and `ISO-8859-1` charsets to trigger this issue. I wasn't aware of that difference between `ISO-8859-15` and `ISO-8859-1` charsets. Thank you, I've now added that test in this PR.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23986#discussion_r2068528308


More information about the core-libs-dev mailing list