RFR: 8347712: IllegalStateException on multithreaded ZipFile access with non-UTF8 charset [v7]

Lance Andersen lancea at openjdk.org
Wed Apr 30 13:49:59 UTC 2025


On Wed, 30 Apr 2025 12:02:25 GMT, Jaikiran Pai <jpai at openjdk.org> wrote:

>> Can I please get a review of this change which proposes to fix an issue in `java.util.zip.ZipFile` which would cause failures when multiple instances of `ZipFile` using a non-UTF8 `Charset` were operating against the same underlying ZIP file? This addresses https://bugs.openjdk.org/browse/JDK-8347712.
>> 
>> The ZIP file specification allows ZIP entries to set a `UTF-8` flag to indicate that the entry name and comment are encoded using UTF8. A `java.util.zip.ZipFile` can be constructed by passing it a `Charset`. This `Charset` (which defaults to UTF-8) gets used for decoding entry names and comments for non-UTF8 entries.
>> 
>> The internal implementation of `ZipFile` uses a `ZipCoder` (backed by `java.nio.charset.CharsetEncoder/CharsetDecoder` instance) for the given `Charset`. Except for UTF8 `ZipCoder`, other `ZipCoder`s are not thread safe.
>> 
>> The internal implementation of `ZipFile` maintains a cache of `ZipFile$Source`. A `Source` corresponds to the underlying ZIP file and during construction, uses a `ZipCoder` for parsing the ZIP entries and once constructed holds on to the parsed ZIP structure. Multiple instances of a `ZipFile` which all correspond to the same ZIP file on the filesystem, share a single instance of `Source` (after the `Source` has been constructed and cached). Although `ZipFile` instances aren't expected to be thread-safe, the fact that multiple different instances of `ZipFile` could be sharing the same instance of `Source` in concurrent threads, mandates that the `Source` must be thread-safe.
>> 
>> In Java 15, we did a performance optimization through https://bugs.openjdk.org/browse/JDK-8243469. As part of that change, we started holding on to the `ZipCoder` instance (corresponding to the `Charset` provided during `ZipFile` construction) in the `Source`. This stored `ZipCoder` was then used for `ZipFile` operations when working with the ZIP entries. As noted previously, any non-UTF8 `ZipCoder` is not thread-safe and as a result, any usage of `ZipCoder` in the `Source` makes `Source` not thread-safe too. That effectively violates the requirement that `Source` must be thread-safe to allow for its usage in multiple different `ZipFile` instances concurrently. This then causes `ZipFile` usages to fail in unexpected ways like the one shown in the linked https://bugs.openjdk.org/browse/JDK-8347712.
>> 
>> The commit in this PR addresses the issue by not maintaining `ZipCoder` as an instance field of `Source`. Instead the `ZipCoder` is now mainta...
>
> Jaikiran Pai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 21 additional commits since the last revision:
> 
>  - Eirik's review about code comments
>  - fix comment typo
>  - merge latest from master branch
>  - 8355975: introduce a test for 8355975
>  - merge latest from master branch
>  - merge latest from master branch
>  - merge latest from master branch
>  - merge latest from master branch
>  - merge latest from master branch
>  - improve code comment for ZipFile.zipCoder
>  - ... and 11 more: https://git.openjdk.org/jdk/compare/c0f4b933...9a29b960
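
For readers following the quoted description: the underlying constraint is that a `java.nio.charset.CharsetDecoder` carries internal state and is not safe for concurrent use, while a `Charset` is. The sketch below is only a generic illustration of that constraint, not the change made in this PR; `PerCallNameDecoder` is a hypothetical helper name. It shares only the `Charset` and creates a fresh decoder for every call, so no decoder state is ever visible to two threads.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper, not part of java.util.zip: it shares only the
// Charset (safe for concurrent use) and never shares a CharsetDecoder.
final class PerCallNameDecoder {
    private final Charset charset;

    PerCallNameDecoder(Charset charset) {
        this.charset = charset;
    }

    // Decodes raw bytes with a decoder created for this call only, so
    // concurrent callers cannot corrupt each other's decoder state.
    String decode(byte[] rawName) {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(rawName)).toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException(
                    "bytes are not valid in " + charset, e);
        }
    }
}
```

Creating a decoder per call trades some allocation for simplicity; keeping one decoder per owning object (rather than per shared structure) is another way to avoid sharing, under the same assumption that the owner itself is not used concurrently.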

Hi Jai,

A few minor comments on the last set of changes.  Going to make another pass through the previous changes.

test/jdk/java/util/zip/ZipFile/ZipFileCharsetTest.java line 59:

> 57:      * A ZipFile can be constructed by passing a Charset which will be used to
> 58:      * decode the entry names (and comment) in a ZIP file.
> 59:      * The test here verifies that when multiple ZipFile instances are

Minor nit

'The test here verifies' -> 'The test verifies'

test/jdk/java/util/zip/ZipFile/ZipFileCharsetTest.java line 68:

> 66:         // ISO-8859-15 is not a standard charset in Java. We skip this test
> 67:         // when it is unavailable
> 68:         assumeTrue(Charset.availableCharsets().containsKey(ISO_8859_15_NAME),

I would suggest throwing SkippedException instead; otherwise, if I understand correctly, JUnit throws org.opentest4j.TestAbortedException when the assumption fails.
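
A rough sketch of what that could look like, assuming the intent is to check availability up front and throw explicitly. The nested `SkippedException` here is a hypothetical stand-in so the snippet compiles on its own; in the actual test it would presumably be the `SkippedException` from the JDK test library. `CharsetAvailability` and `requireCharset` are made-up names for illustration.

```java
import java.nio.charset.Charset;

final class CharsetAvailability {
    // Hypothetical stand-in for the test library's SkippedException,
    // declared here only to keep the sketch self-contained.
    static final class SkippedException extends RuntimeException {
        SkippedException(String message) {
            super(message);
        }
    }

    // Returns the named charset, or signals a skip when this runtime
    // does not provide it.
    static Charset requireCharset(String name) {
        if (!Charset.isSupported(name)) {
            throw new SkippedException("charset " + name + " is unavailable");
        }
        return Charset.forName(name);
    }
}
```

The test method would then call something like `requireCharset(ISO_8859_15_NAME)` in place of the `assumeTrue(...)` check.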

-------------

PR Review: https://git.openjdk.org/jdk/pull/23986#pullrequestreview-2807011149
PR Review Comment: https://git.openjdk.org/jdk/pull/23986#discussion_r2068641256
PR Review Comment: https://git.openjdk.org/jdk/pull/23986#discussion_r2068637944

