RFR: 8347712: IllegalStateException on multithreaded ZipFile access with non-UTF8 charset [v2]

Wed Mar 19 14:37:34 UTC 2025

On Thu, 13 Mar 2025 11:03:00 GMT, Jaikiran Pai <jpai at openjdk.org> wrote:

>> Hello Jaikiran,
>> 
>> I'm not convinced dropping `zipCoderForPos` is a step forward:
>> 
>> * `zipCoderForPos` included a short-circuit optimization logic which skipped reading the CENFLG field in the case where the opening charset was UTF-8 (which it commonly is, and always for JarFile). This optimization seems lost in the currrent state of this PR and could impact lookup performance. It would be strange to repeat this optimization at all call sites.
>> * The only thing differing between callsites seems to be which ZipCoder to use as a "fallback" when it's not UTF-8. This ZipCoder instance can easilly be passed in as a parameter, and will de-duplicate logic at the call sites.
>> * The only "cumbersome" call site seems to be `initCEN`, caused by the lazy initialization of the non-UTF-8 ZipCoder. But that lazy initialization seems questionable: First, ZipCoder is already lazy in the way it initializes encoders and decoders. An unused ZipCoder is just a thin wrapper around a Charset and should not incur significant cost until it gets used. Second, the `ZipFile` constructor actually constructs a `ZipCoder` instance, would it not be better to just pass this down to initCEN as a parameter? That would avoid the cost of initializing encoders and decoders twice for the common case of single-threaded `ZipFile` access, once for initCEN, then again on the first lookup.  
>> 
>> Here's a patch for a quick implementation of the above on top of your PR. (Code speeks better than words some times :-)
>> 
>> This retains the CENFLG optimization, simplifies logic at call sites and prevents duplicated initialization of ZipCoder beteween initCEN and lookups:
>> 
>> 
>> Index: src/java.base/share/classes/java/util/zip/ZipFile.java
>> IDEA additional info:
>> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
>> <+>UTF-8
>> ===================================================================
>> diff --git a/src/java.base/share/classes/java/util/zip/ZipFile.java b/src/java.base/share/classes/java/util/zip/ZipFile.java
>> --- a/src/java.base/share/classes/java/util/zip/ZipFile.java	(revision 935b04ed00522aa92105292baa0693c55b1356ae)
>> +++ b/src/java.base/share/classes/java/util/zip/ZipFile.java	(date 1741800197915)
>> @@ -201,8 +201,8 @@
>>          this.fileName = file.getName();
>>          long t0 = System.nanoTime();
>>  
>> -        this.res = new CleanableResource(this, charset, file, mode);
>>          this.zipCoder = ZipCoder.get(charset);
>> +        this.res = new CleanableResource(this...
>
> Hello Eirik,
> 
>> zipCoderForPos included a short-circuit optimization logic which skipped reading the CENFLG field in the case where the opening charset was UTF-8 (which it commonly is, and always for JarFile). This optimization seems lost in the currrent state of this PR
> 
> That's a good point. I'll update this PR shortly, borrowing from your proposed diff/patch.

Hello Eirik, I have now updated the PR to borrow a major part of the diff that you provided. This now brings back the `zipCoderFor()` method and at the same time, we don't end up holding on to the `ZipCoder` in the the `Source` class, which is expected to be thread safe.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23986#discussion_r2003488113