RFR: 8303866: Allow ZipInputStream.readEnd to parse small Zip64 ZIP files [v9]

Mon Jan 8 15:46:27 UTC 2024

On Mon, 8 Jan 2024 14:47:48 GMT, Jaikiran Pai <jpai at openjdk.org> wrote:

>  With the current proposed change, we not only rely on the Inflater for this decision but also rely on ZipEntry itself to tell us whether to read 8 bytes or 4 bytes each.

The proposed change resides solely in one single internal/private method called `readEnd(ZipEntry e)` of the `java.util.zip.ZipInputStream`. This method gets called in the internal implementation of the `ZipInputStream` when the contents of a "current" `ZipEntry` are being read and this `readEnd()` method gets passed that "current" `ZipEntry`. The passed `ZipEntry`'s `getExtra()` method is then called by the new proposed change to get hold of the byte[] representing the extra blocks of the zip entry and that byte[] is parsed to decide whether we need to read 8 bytes for compressed/uncompressed fields in the data descriptor. It's important to note that before this proposed change, this `readEnd()` method doesn't read anything out of the passed `ZipEntry` and instead only updates it, so before this change, the `ZipEntry` itself doesn't play a role in what data this `readEnd()` method processes.

The crucial part here is that the "current" `ZipEntry` is the one which was created by a previous call to `ZipInputStream.getNextEntry()` method, which is a public overridable method. Furthermore, the `ZipEntry.getExtra()` method too is overridable. So in theory, the byte[] data returned by the `ZipEntry.getExtra()` isn't controlled or validated and can be anything that the overridden method might return. So at a minimum, the newly introduced `hasZip64Extra()` private method which parses this byte[] shouldn't expect this to be parsable and thus should handle any exceptions this might generate and thus the recommendation of catching those exceptions and returning `false`.

The bigger question however is, should `readEnd()` now be entrusted the responsibility of reading the `ZipEntry.getExtra()` byte[] contents that can be anything and even if the content of the byte[] seem to represent a zip64 entry (i.e. the newly introduced `hasZip64Extra()` method returning `true`), should that decision be trusted to further lead the `readEnd()` method to read the compressed/uncompressed sizes in data descriptor as 8 bytes? Would this need additional checks of some form?

Alan and Lance have more experience in this area and I would prefer hearing their thoughts.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/12524#issuecomment-1881275708