RFR: 8347712: IllegalStateException on multithreaded ZipFile access with non-UTF8 charset [v2]
Jaikiran Pai
jpai at openjdk.org
Wed Apr 30 12:11:53 UTC 2025
On Sun, 23 Mar 2025 10:45:28 GMT, Eirik Bjørsnøs <eirbjo at openjdk.org> wrote:
>> Hello Jaikiran,
>>
>> Before this change, two ZipFile instances opened using different (non-UTF-8) charsets would have equal keys, and thus be backed by the same Source and ZipCoder instance, whichever ZipFile constructed first would "win".
>>
>> This seems like a separate bug, independent of the concurrency concerns described JDK-8347712.
>>
>> For the benefit of future maintainers, I think this independent bug should be described in a separate JBS issue.
>>
>> The bug could be solved in a separate PR, however I feel it's also ok to fix it in this PR, since moving the ZipCoder instance to ZipFile seems to incidentally solve the issue as well.
>>
>> WDYT?
>
>> This seems like a separate bug, independent of the concurrency concerns described JDK-8347712.
>
> Consider the following test, which when added to `ZipCoding.java` fails in master but succeeds in this PR:
>
>
> /**
> * Verifies that opening a ZipFile with an incorrect charset does not
> * prevent a later opening of the same file using the correct charset.
> *
> * This may happen if the two ZipFile instances have identical
> * ZipFile.Source.Key, thus mapping to the same Source instance.
> *
> * @throws IOException if an unexpected I/O error occurs
> */
> @Test
> public void differentNonUTF8Charsets() throws IOException {
> // Using ISO-8859-15 and ISO-8859-1 here, since they are similar
> // encodings with different characters for some bytes
>
> // The byte 0xA4 is "Euro sign" in 8859-15, "Currency sign (generic)" in 8859-1
> String euroSign = "\u20AC";
> String currencySign = "\u00A4";
>
> // Create a ZIP, encoded in ISO-8859-15
> byte[] zip = createZIP("ISO-8859-15", euroSign, euroSign);
> Path f = Path.of("zf-non-utf.zip");
> Files.write(f, zip);
>
> // Open a ZipFile instance using (incorrect) encoding ISO-8859-1
> // Then open a ZipFile instance using (correct) encoding ISO-8859-15
> try (ZipFile incorrect = new ZipFile(f.toFile(), StandardCharsets.ISO_8859_1);
> ZipFile correct = new ZipFile(f.toFile(), Charset.forName("ISO-8859-15"))) {
>
> // Correct encoding should resolve euro sign
> assertNotNull(correct.getEntry(euroSign));
> // Correct encoding should not resolve currency sign
> assertNull(correct.getEntry(currencySign));
>
> // Incorrect encoding should resolve currency sign
> assertNotNull(incorrect.getEntry(currencySign));
> // Incorrect encoding should not resolve euro sign
> assertNull(incorrect.getEntry(euroSign));
> }
> }
Hello Eirik,
> Before this change, two ZipFile instances opened using different (non-UTF-8) charsets would have equal keys, and thus be backed by the same Source and ZipCoder instance, whichever ZipFile constructed first would "win".
> This seems like a separate bug, independent of the concurrency concerns described JDK-8347712.
> For the benefit of future maintainers, I think this independent bug should be described in a separate JBS issue.
That's right - when fixing 8347712 I saw this issue with `Key` class and addressed it in this PR. You have a good point that having a JBS issue specifically for this will be useful. I have now created https://bugs.openjdk.org/browse/JDK-8355975 and linked that issue in this current PR. What helped was your test case example that uses `ISO-8859-15` and `ISO-8859-1` charsets to trigger this issue. I wasn't aware of that difference between `ISO-8859-15` and `ISO-8859-1` charsets. Thank you, I've now added that test in this PR.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/23986#discussion_r2068528308
More information about the core-libs-dev
mailing list