RFR: 8303866: Allow ZipInputStream.readEnd to parse small Zip64 ZIP files [v9]

Tue Jan 9 16:11:29 UTC 2024

On Mon, 8 Jan 2024 15:43:34 GMT, Jaikiran Pai <jpai at openjdk.org> wrote:

> The crucial part here is that the "current" `ZipEntry` is the one which was created by a previous call to `ZipInputStream.getNextEntry()` method, which is a public overridable method. Furthermore, the `ZipEntry.getExtra()` method too is overridable. So in theory, the byte[] data returned by the `ZipEntry.getExtra()` isn't controlled or validated and can be anything that the overridden method might return. 

Yes, we cannot rely on the extra field contents to be well-formed and valid. Because of this, `hasZip64Extra` must be coded in a defensive fashion, meaning it returns true only when it finds a valid Zip64 extra field, false otherwise. It should also never throw exceptions. This is the case regardless of ZipEntry concerns, since the input ZIP data may already be malformed.

> The bigger question however is, should `readEnd()` now be entrusted the responsibility of reading the `ZipEntry.getExtra()` byte[] contents that can be anything and even if the content of the byte[] seem to represent a zip64 entry (i.e. the newly introduced `hasZip64Extra()` method returning `true`), should that decision be trusted to further lead the `readEnd()` method to read the compressed/uncompressed sizes in data descriptor as 8 bytes? Would this need additional checks of some form?

To avoid readEND relying on "untrusted" ZipEntry contents, I moved the call to `hasZip64Extra` from `readEND` to `readLOC`. This allows `hasZip64Extra` to work on the raw extra data.

 For this to work, I needed to add a new internal boolean field `ZipInputStream.expectEightBitDataDescriptor`. When true, this flag indicates to `readEND` that the next data descriptor has 8-byte size fields.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/12524#issuecomment-1883339627