RFR: 8347712: IllegalStateException on multithreaded ZipFile access with non-UTF8 charset [v2]

Jaikiran Pai jpai at openjdk.org
Thu Mar 13 11:06:07 UTC 2025


On Wed, 12 Mar 2025 18:22:57 GMT, Eirik Bjørsnøs <eirbjo at openjdk.org> wrote:

>>> This seems mostly used to determine which ZipCoder to pick. Could we fold this into the method, calling it zipCoderForPos and make it just return the ZipCoder, like we had in Source?
>> 
>> I intentionally removed the `zipCoderForPos()` method and instead introduced this static method to avoid the case where the code ends up calling this method and then stores the returned `ZipCoder` (like was being done in `Source`). Instead with this new method, it now allows the call sites to determine if a UTF8 `ZipCoder` is needed or a non-UTF8 one and then decide where to get the non-UTF8 `ZipCoder` from. If the call site is a instance method in `ZipFile`, then it was use the `ZipFile`'s `zipCoder` field and if the call site is within `Source`, when the `Source` is being instantiated then it constructs a non-UTF8 `ZipCoder` afresh. That level of detail is better dealt at the call site instead of a method like `zipCoderForPos()`.
>
> Hello Jaikiran,
> 
> I'm not convinced dropping `zipCoderForPos` is a step forward:
> 
> * `zipCoderForPos` included a short-circuit optimization logic which skipped reading the CENFLG field in the case where the opening charset was UTF-8 (which it commonly is, and always for JarFile). This optimization seems lost in the currrent state of this PR and could impact lookup performance. It would be strange to repeat this optimization at all call sites.
> * The only thing differing between callsites seems to be which ZipCoder to use as a "fallback" when it's not UTF-8. This ZipCoder instance can easilly be passed in as a parameter, and will de-duplicate logic at the call sites.
> * The only "cumbersome" call site seems to be `initCEN`, caused by the lazy initialization of the non-UTF-8 ZipCoder. But that lazy initialization seems questionable: First, ZipCoder is already lazy in the way it initializes encoders and decoders. An unused ZipCoder is just a thin wrapper around a Charset and should not incur significant cost until it gets used. Second, the `ZipFile` constructor actually constructs a `ZipCoder` instance, would it not be better to just pass this down to initCEN as a parameter? That would avoid the cost of initializing encoders and decoders twice for the common case of single-threaded `ZipFile` access, once for initCEN, then again on the first lookup.  
> 
> Here's a patch for a quick implementation of the above on top of your PR. (Code speeks better than words some times :-)
> 
> This retains the CENFLG optimization, simplifies logic at call sites and prevents duplicated initialization of ZipCoder beteween initCEN and lookups:
> 
> 
> Index: src/java.base/share/classes/java/util/zip/ZipFile.java
> IDEA additional info:
> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
> <+>UTF-8
> ===================================================================
> diff --git a/src/java.base/share/classes/java/util/zip/ZipFile.java b/src/java.base/share/classes/java/util/zip/ZipFile.java
> --- a/src/java.base/share/classes/java/util/zip/ZipFile.java	(revision 935b04ed00522aa92105292baa0693c55b1356ae)
> +++ b/src/java.base/share/classes/java/util/zip/ZipFile.java	(date 1741800197915)
> @@ -201,8 +201,8 @@
>          this.fileName = file.getName();
>          long t0 = System.nanoTime();
>  
> -        this.res = new CleanableResource(this, charset, file, mode);
>          this.zipCoder = ZipCoder.get(charset);
> +        this.res = new CleanableResource(this, charset, zipCoder, file, mode);
>  
>          PerfCounter...

Hello Eirik,

> zipCoderForPos included a short-circuit optimization logic which skipped reading the CENFLG field in the case where the opening charset was UTF-8 (which it commonly is, and always for JarFile). This optimization seems lost in the currrent state of this PR

That's a good point. I'll update this PR shortly, borrowing from your proposed diff/patch.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23986#discussion_r1993283080


More information about the core-libs-dev mailing list