RFR: 8347712: IllegalStateException on multithreaded ZipFile access with non-UTF8 charset [v4]

Eirik Bjørsnøs eirbjo at openjdk.org
Thu Mar 13 19:51:57 UTC 2025


On Wed, 12 Mar 2025 06:50:08 GMT, Jaikiran Pai <jpai at openjdk.org> wrote:

>> Can I please get a review of this change which proposes to fix an issue `java.util.zip.ZipFile` which would cause failures when multiple instances of `ZipFile` using non-UTF8 `Charset` were operating against the same underlying ZIP file? This addresses https://bugs.openjdk.org/browse/JDK-8347712.
>> 
>> ZIP file specification allows for ZIP entries to mark a `UTF-8` flag to indicate that the entry name and comment are encoded using UTF8. A `java.util.zip.ZipFile` can be constructed by passing it a `Charset`. This `Charset` (which defaults to UTF-8) gets used for decoding entry names and comments for non-UTF8 entries.
>> 
>> The internal implementation of `ZipFile` uses a `ZipCoder` (backed by `java.nio.charset.CharsetEncoder/CharsetDecoder` instance) for the given `Charset`. Except for UTF8 `ZipCoder`, other `ZipCoder`s are not thread safe.
>> 
>> The internal implementation of `ZipFile` maintains a cache of `ZipFile$Source`. A `Source` corresponds to the underlying ZIP file and during construction, uses a `ZipCoder` for parsing the ZIP entries and once constructed holds on to the parsed ZIP structure. Multiple instances of a `ZipFile` which all correspond to the same ZIP file on the filesystem, share a single instance of `Source` (after the `Source` has been constructed and cached). Although `ZipFile` instances aren't expected to be thread-safe, the fact that multiple different instances of `ZipFile` could be sharing the same instance of `Source` in concurrent threads, mandates that the `Source` must be thread-safe.
>> 
>> In Java 15, we did a performance optimization through https://bugs.openjdk.org/browse/JDK-8243469. As part of that change, we started holding on to the `ZipCoder` instance (corresponding to the `Charset` provided during `ZipFile` construction) in the `Source`. This stored `ZipCoder` was then used for `ZipFile` operations when working with the ZIP entries. As noted previously, any non-UTF8 `ZipCoder` is not thread-safe and as a result, any usages of `ZipCoder` in the `Source` makes `Source` not thread-safe too. That effectively violates the requirement that `Source` must be thread-safe to allow for its usage in multiple different `ZipFile` instances concurrently. This then causes `ZipFile` usages to fail in unexpected ways like the one shown in the linked https://bugs.openjdk.org/browse/JDK-8347712.
>> 
>> The commit in this PR addresses the issue by not maintaining `ZipCoder` as a instance field of `Source`. Instead the `ZipCoder` is now mainta...
>
> Jaikiran Pai has updated the pull request incrementally with one additional commit since the last revision:
> 
>   add code comment

I made a pass over the test, adding some comments.

test/jdk/java/util/zip/ZipFile/ZipFileSharedSourceTest.java line 78:

> 76: 
> 77:     /**
> 78:      * Creates multiple instances of java.util.zip.ZipFile with the given {@code entryNameCharset}

This comment could be interpreted as saying that the construction of the ZipFile instances happens up-front, on the main test thread. However, looking at the actual code, the construction of the ZipFile instance AND the enumeration happens concurrently.

Perhaps we could trim the "how" part here, and instad just say that the test verifies that concurrent construction and enumeration of  java.util.zip.ZipFile instances backed by a the same ZIP file does not cause unexpected encoding-related concurrency failures?

test/jdk/java/util/zip/ZipFile/ZipFileSharedSourceTest.java line 86:

> 84:     @ParameterizedTest
> 85:     @MethodSource("charsets")
> 86:     void testMultipleZipFileInstances(final Charset entryNameCommentCharset) throws Exception {

The Charset parameter could be renamed to `charset`.

test/jdk/java/util/zip/ZipFile/ZipFileSharedSourceTest.java line 111:

> 109:         private final CountDownLatch startLatch;
> 110: 
> 111:         private ZipEntryIteratingTask(final Path file, final Charset entryNameCharset,

The Charset parameter/field could be renamed to `charset`.

test/jdk/java/util/zip/ZipFile/ZipFileSharedSourceTest.java line 127:

> 125:             try (final ZipFile zf = new ZipFile(this.file.toFile(), this.entryNameCharset)) {
> 126:                 final var entries = zf.entries();
> 127:                 while (entries.hasMoreElements()) {

I think it would be good to include a call to `ZipFile::getEntry` here as well, since that exercises a separate code path.

-------------

PR Review: https://git.openjdk.org/jdk/pull/23986#pullrequestreview-2683167953
PR Review Comment: https://git.openjdk.org/jdk/pull/23986#discussion_r1994186951
PR Review Comment: https://git.openjdk.org/jdk/pull/23986#discussion_r1994208216
PR Review Comment: https://git.openjdk.org/jdk/pull/23986#discussion_r1994206617
PR Review Comment: https://git.openjdk.org/jdk/pull/23986#discussion_r1994204441


More information about the core-libs-dev mailing list