RFR: 8347712: IllegalStateException on multithreaded ZipFile access with non-UTF8 charset [v5]

Jaikiran Pai jpai at openjdk.org
Wed Apr 30 15:16:50 UTC 2025


On Sun, 23 Mar 2025 08:55:33 GMT, Eirik Bjørsnøs <eirbjo at openjdk.org> wrote:

>> src/java.base/share/classes/java/util/zip/ZipFile.java line 384:
>> 
>>> 382:      * @param fallback the fallback ZipCoder to return if the entry doesn't require UTF-8
>>> 383:      */
>>> 384:     private static ZipCoder zipCoderFor(final byte[] cen, final int pos, final ZipCoder fallback) {
>> 
>> Have you tried putting an instance method on ZipFile instead as it has access to the zip coder. That would give you better abstraction and avoid needing to introduce "fallback" as that is confusing to see here.
>
> I agree with Alan that an instance method would provide better abstraction and avoid the somewhat clumsy "fallback" parameter introduced in my previous patch.
> 
> Here's a patch exploring the instance method:
> 
> 
> Index: src/java.base/share/classes/java/util/zip/ZipFile.java
> IDEA additional info:
> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
> <+>UTF-8
> ===================================================================
> diff --git a/src/java.base/share/classes/java/util/zip/ZipFile.java b/src/java.base/share/classes/java/util/zip/ZipFile.java
> --- a/src/java.base/share/classes/java/util/zip/ZipFile.java	(revision 83ac59fcd292c21bbff4e0ac243a6bf07b4b21dc)
> +++ b/src/java.base/share/classes/java/util/zip/ZipFile.java	(date 1742719028475)
> @@ -202,7 +202,7 @@
>          long t0 = System.nanoTime();
>  
>          this.zipCoder = ZipCoder.get(charset);
> -        this.res = new CleanableResource(this, zipCoder, file, mode);
> +        this.res = new CleanableResource(this, file, mode);
>  
>          PerfCounter.getZipFileOpenTime().addElapsedTimeFrom(t0);
>          PerfCounter.getZipFileCount().increment();
> @@ -292,7 +292,7 @@
>              // Look up the name and CEN header position of the entry.
>              // The resolved name may include a trailing slash.
>              // See Source::getEntryPos for details.
> -            EntryPos pos = res.zsrc.getEntryPos(name, true, zipCoder);
> +            EntryPos pos = res.zsrc.getEntryPos(name, true, this);
>              if (pos != null) {
>                  entry = getZipEntry(pos.name, pos.pos);
>              }
> @@ -332,7 +332,7 @@
>              if (Objects.equals(lastEntryName, entry.name)) {
>                  pos = lastEntryPos;
>              } else {
> -                EntryPos entryPos = zsrc.getEntryPos(entry.name, false, zipCoder);
> +                EntryPos entryPos = zsrc.getEntryPos(entry.name, false, this);
>                  if (entryPos != null) {
>                      pos = entryPos.pos;
>                  } else {
> @@ -374,26 +374,25 @@
>       * <p>
>       * A ZIP entry's name and comment fields may be encoded using UTF-8, in
>       * which case this method returns a UTF-8 capable {@code ZipCoder}. If the
> -     * entry doesn't require UTF-8, then this method returns the {@code fallback}
> -     * {@code ZipCoder}.
> +     * entry doesn't require UTF-8, then this method returns the
> +     * {@code ZipCoder} of the ZipFile.
>       *
>       * @param cen the CEN
>       * @param pos the ZIP entry's position in CEN
> -     * @param fallback the fallbac...

Hello Alan,

> Have you tried putting an instance method on ZipFile instead as it has access to the zip coder. That would give you better abstraction and avoid needing to introduce "fallback" as that is confusing to see here.

When I experimented with a fix for this issue, several weeks back, my initial version included a variant of what Eirik shows as a diff in his comment. I decided to shelve that in favour of what I currently have in my PR. The root of the issue is that we use the `ZipCoder` at two categorically different moments in the lifecycle of a `ZipFile` instance. Once when the `ZipFile` is still being constructed and once after the `ZipFile` instance is fully constructed.

In the constructor of the `ZipFile`, the code leads to the static `Source` class. A `ZipCoder` that's an instance member/method of `ZipFile` can be used but that would then mean that various code paths that spawn out of the `Source` class, while parsing the CEN, may end up operating on a `ZipFile` instance (while obtaining the `ZipCoder`) that isn't fully constructed yet. It would not be immediately obvious while reading or updating such code that the `ZipFile` instance may not yet have been fully constructed. So to reduce such usage I decided to pursue this alternate way where using these static methods would prevent such access to the `ZipFile` instance. It has it's own drawbacks like you note about the confusing method parameters. But I felt that this is better of the two approaches.

If you think we should consider the instance method approach, then let me know and I'll experiment more and see if any issues related to that can be minimized.

Maybe in some larger future work this entire code path can be redone in a manner that a ZipFile instance that isn't yet constructed, doesn't have such tight integration with the Source/cache class.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23986#discussion_r2068893458


More information about the core-libs-dev mailing list