<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: 'Calibri Light', 'Helvetica Light', sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
Hello Eirik,</div>
<div style="font-family: 'Calibri Light', 'Helvetica Light', sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
I strongly agree with your proposal. Such a change looks low-risk, given that ZipCoder is an internal class.</div>
<div style="font-family: 'Calibri Light', 'Helvetica Light', sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
<br>
</div>
<div style="font-family: 'Calibri Light', 'Helvetica Light', sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
Regards,</div>
<div style="font-family: 'Calibri Light', 'Helvetica Light', sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
Chen</div>
<div style="font-family: 'Calibri Light', 'Helvetica Light', sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
<br>
</div>
<hr style="display: inline-block; width: 98%;">
<div id="divRplyFwdMsg">
<div style="direction: ltr; font-family: Calibri, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
<b>From:</b> core-libs-dev &lt;core-libs-dev-retn@openjdk.org&gt; on behalf of Eirik Bjørsnøs &lt;eirbjo@gmail.com&gt;<br>
<b>Sent:</b> Wednesday, January 28, 2026 3:26 AM<br>
<b>To:</b> core-libs-dev &lt;core-libs-dev@openjdk.org&gt;<br>
<b>Subject:</b> RFD: Reorganize ZipCoder such that UTF8 is handled by the base class</div>
<div style="direction: ltr;"> </div>
</div>
<div style="direction: ltr;">Hi,</div>
<div style="direction: ltr;"><br>
</div>
<div style="direction: ltr;">Bringing this up on core-libs-dev so that the motivation can be explained and discussed here, and any future PR can focus on the actual code changes.</div>
<div style="direction: ltr;"><br>
</div>
<div style="direction: ltr;">Summary:</div>
<div style="direction: ltr;"><br>
</div>
<div style="direction: ltr;">Reorganize the ZipCoder class hierarchy so that the base class handles UTF-8 and a subclass handles arbitrary Charsets. This makes the design better match the ZIP specification and how ZIP files are used in the real world, and it additionally
brings some benefits in code quality and performance.</div>
<div style="direction: ltr;"><br>
</div>
<div style="direction: ltr;">Motivation:</div>
<div style="direction: ltr;"><br>
</div>
<div style="direction: ltr;">The ZipCoder class has been central to many ZipFile performance improvements in recent years. Many optimizations are encoding-specific and encapsulating these concerns makes a lot of sense.</div>
<div style="direction: ltr;"><br>
</div>
<div style="direction: ltr;">Currently, the base ZipCoder class supports any given Charset, while a subclass, UTF8ZipCoder, provides higher-performance optimizations specific to UTF-8.</div>
<div style="direction: ltr;"><br>
</div>
<div style="direction: ltr;">However, real-world use of the ZipFile API defaults to UTF-8. The ZIP specification long ago introduced a flag to explicitly indicate that entries are encoded using UTF-8. The JAR specification has mandated UTF-8 since the beginning.
Any use of non-UTF-8 ZIP files is increasingly niche and belongs in the legacy zone.</div>
<div style="direction: ltr;"><br>
</div>
<div style="direction: ltr;">The current UTF8ZipCoder is stateless and documented as thread safe, while the base class ZipCoder is not. As a subclass of ZipCoder, UTF8ZipCoder does, however, inherit the CharsetEncoder and CharsetDecoder state fields from its
superclass, and it needs to pass a UTF-8 Charset to its parent without really using it. This makes state and thread safety harder to reason about.</div>
<div style="direction: ltr;"><br>
</div>
<div style="direction: ltr;">Since UTF8ZipCoder is always needed, the JVM must always load it along with the base class ZipCoder. Apart from loading an extra class, this prevents the JVM from seeing calls to ZipCoder methods as monomorphic.</div>
<div style="direction: ltr;"><br>
</div>
<div style="direction: ltr;">A draft implementation of this change indicates a ~3% performance win on ZipFile lookups in ZipFileGetEntry, probably explained by the compiler seeing only one ZipCoder implementation being loaded.</div>
<div style="direction: ltr;"><br>
</div>
<div style="direction: ltr;">Solution:</div>
<div style="direction: ltr;"><br>
</div>
<div style="direction: ltr;">Switch the class hierarchy of ZipCoder around so that the base class handles UTF-8. Introduce a new subclass, CharsetZipCoder, to handle legacy non-UTF-8 ZIP files. Move the Charset, CharsetEncoder, and CharsetDecoder fields to this subclass.
Update code comments to reflect the changes.</div>
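<div style="direction: ltr;"><br>
</div>
<div style="direction: ltr;">To illustrate the shape being proposed, here is a minimal sketch. It is not the JDK's ZipCoder code: the class names SketchZipCoder and SketchCharsetZipCoder, the methods get, decode, encode and isUTF8, and the error handling are all assumptions made up for this example. It only shows a stateless UTF-8 base class next to a subclass that owns the Charset, CharsetEncoder and CharsetDecoder state.</div>
<pre>
```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the JDK's ZipCoder: the base class is the UTF-8 coder.
class SketchZipCoder {
    // Shared instance with no fields, so it can be used from several threads.
    static final SketchZipCoder UTF8 = new SketchZipCoder();

    // Assumed factory: UTF-8 maps to the base class, anything else to the subclass.
    static SketchZipCoder get(Charset charset) {
        if (StandardCharsets.UTF_8.equals(charset)) {
            return UTF8;
        }
        return new SketchCharsetZipCoder(charset);
    }

    // Note: new String(...) replaces malformed input instead of reporting it;
    // real code would need to decide how malformed names are handled.
    String decode(byte[] bytes, int off, int len) {
        return new String(bytes, off, len, StandardCharsets.UTF_8);
    }

    byte[] encode(String name) {
        return name.getBytes(StandardCharsets.UTF_8);
    }

    boolean isUTF8() {
        return true;
    }
}

// Legacy path: all Charset state lives here, so instances are not thread safe.
final class SketchCharsetZipCoder extends SketchZipCoder {
    private final Charset charset;
    private CharsetDecoder decoder; // created lazily on first decode
    private CharsetEncoder encoder; // created lazily on first encode

    SketchCharsetZipCoder(Charset charset) {
        this.charset = charset;
    }

    @Override
    String decode(byte[] bytes, int off, int len) {
        if (decoder == null) {
            decoder = charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT);
        }
        try {
            return decoder.decode(ByteBuffer.wrap(bytes, off, len)).toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    @Override
    byte[] encode(String name) {
        if (encoder == null) {
            encoder = charset.newEncoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT);
        }
        try {
            ByteBuffer bb = encoder.encode(CharBuffer.wrap(name));
            byte[] out = new byte[bb.remaining()];
            bb.get(out);
            return out;
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    @Override
    boolean isUTF8() {
        return false;
    }
}
```
</pre>
<div style="direction: ltr;">In this sketch, a program that only ever opens UTF-8 ZIP files never instantiates the subclass, which is the situation the monomorphic-call argument above relies on.</div>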
<div style="direction: ltr;"><br>
</div>
<div style="direction: ltr;">Risks:</div>
<div style="direction: ltr;"><br>
</div>
<div style="direction: ltr;">This should be a pure refactoring, mostly moving code around. Most changes can be performed in place, such that a side-by-side review will mostly reflect indentation changes. We have good test coverage for UTF-8 and non-UTF-8 ZIP files
to help us catch regressions.</div>
<div style="direction: ltr;"><br>
</div>
<div style="direction: ltr;">If I see support for this proposal, I'll be happy to submit a PR with the actual changes.</div>
<div style="direction: ltr;"><br>
</div>
<div style="direction: ltr;">Cheers,</div>
<div style="direction: ltr;">Eirik :-)</div>
<div style="direction: ltr;"> </div>
</body>
</html>