RFR: 8303866: Allow ZipInputStream.readEnd to parse small Zip64 ZIP files [v11]

Wed Jan 10 13:23:44 UTC 2024

> ZipInputStream.readEnd currently assumes a Zip64 data descriptor if the number of compressed or uncompressed bytes read from the inflater is larger than the Zip64 magic value.
> 
> While the ZIP format mandates that the data descriptor `SHOULD be stored in ZIP64 format (as 8 byte values) when a file's size exceeds 0xFFFFFFFF`, it also states that `ZIP64 format MAY be used regardless of the size of a file`. For such small entries, the above assumption does not hold.
> 
> This PR augments ZipInputStream.readEnd to also assume 8-byte sizes if the ZipEntry includes a Zip64 extra information field. This brings ZipInputStream into alignment with the APPNOTE format spec:
> 
> 
> When extracting, if the zip64 extended information extra 
> field is present for the file the compressed and 
> uncompressed sizes will be 8 byte values.
> 
> 
> While small Zip64 files with 8-byte data descriptors are not commonly found in the wild, it is possible to create one using the Info-ZIP command line `-fd` flag:
> 
> `echo hello | zip -fd > hello.zip`
> 
> The PR also adds a test verifying that such a small Zip64 file can be parsed by ZipInputStream.
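
For readers less familiar with the layout in question, here is a minimal sketch of how a data descriptor might be decoded once the reader knows whether to expect 4-byte or 8-byte sizes. The class `DataDescriptorSketch` and its methods are hypothetical names invented for illustration; this is not the code in ZipInputStream.readEnd, and it assumes the whole descriptor is already in a byte array.

```java
// Illustrative only: hypothetical helper, not part of java.util.zip.
final class DataDescriptorSketch {
    // Optional data descriptor signature from the APPNOTE spec.
    static final long EXTSIG = 0x08074b50L;

    /**
     * Decodes a data descriptor into {crc, compressedSize, uncompressedSize}.
     * When zip64 is true the two size fields are read as 8-byte values,
     * otherwise as 4-byte values. A leading signature is skipped if present.
     */
    static long[] parse(byte[] buf, boolean zip64) {
        int off = 0;
        // The signature is optional; this sketch simply skips it when it matches.
        if (buf.length >= 4 && readLE(buf, 0, 4) == EXTSIG) {
            off = 4;
        }
        int width = zip64 ? 8 : 4;
        if (buf.length < off + 4 + 2 * width) {
            throw new IllegalArgumentException("truncated data descriptor");
        }
        long crc = readLE(buf, off, 4);
        long csize = readLE(buf, off + 4, width);
        long size = readLE(buf, off + 4 + width, width);
        return new long[] { crc, csize, size };
    }

    // Reads an n-byte little-endian unsigned value starting at off.
    static long readLE(byte[] b, int off, int n) {
        long v = 0;
        for (int i = n - 1; i >= 0; i--) {
            v = (v << 8) | (b[off + i] & 0xFF);
        }
        return v;
    }
}
```

Reading a Zip64 descriptor with `zip64` set to false would consume only 8 of the 16 size bytes and misinterpret the rest, which is the kind of misparse the PR description is concerned with for small Zip64 entries.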

Eirik Bjørsnøs has updated the pull request incrementally with eight additional commits since the last revision:

 - Minor tweaks to improve comments
 - Tighten 64-bit data descriptor checking by requiring that the LOC's 'compressed size' and 'uncompressed size' fields must both be 0xFFFFFFFF and that the Zip64 field must have both 'Original Size' and 'Compressed Size' fields present and set to zero.
 - Add test verifying that the data descriptor is read with 32-bit values if neither the 'compressed size' nor 'uncompressed size' is set to the Zip64 magic marker value.
 - Rename hasZip64Extra to expect64BitDataDescriptor, make it return false if the LOC is not in streaming mode, if none of the LOC size fields have the Zip64 magic value set, or if the Zip64 data block size does not match the size computed from looking for markers in the LOC fields.
 - Move the call to hasZip64Extra from readEnd to readLOC
 - Add a test verifying that a truncated Zip64 field (one with fewer than 4 bytes) is ignored and does not cause an ArrayIndexOutOfBoundsException.
 - When verifying that invalid Zip64 fields are ignored, use a separate ZIP with regular 4-byte data descriptors.
 - Avoid ArrayIndexOutOfBoundsException in the case where the LOC extra field ends with a truncated Zip64 field
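
The checks described in the commits above could be sketched roughly as follows. This is an illustration only, not the code in the PR: the class name, method signature and constants are assumptions, and the rules are taken from the commit messages (streaming mode, both LOC size fields set to 0xFFFFFFFF, a 16-byte Zip64 field with both sizes zero, truncated fields ignored).

```java
// Illustrative only: hypothetical helper mirroring the checks listed above.
final class Zip64DescriptorSketch {
    static final int FLAG_DATA_DESCRIPTOR = 0x08; // general purpose bit 3 (streaming mode)
    static final long ZIP64_MAGIC = 0xFFFFFFFFL;  // marker value in the LOC size fields
    static final int ZIP64_HEADER_ID = 0x0001;    // Zip64 extended information header ID

    static boolean expect64BitDataDescriptor(int flag, long csize, long size, byte[] extra) {
        // Only streaming-mode entries carry a data descriptor at all.
        if ((flag & FLAG_DATA_DESCRIPTOR) == 0) return false;
        // Both LOC size fields must carry the Zip64 magic value.
        if (csize != ZIP64_MAGIC || size != ZIP64_MAGIC) return false;
        if (extra == null) return false;

        int off = 0;
        // Each extra field is: 2-byte header ID, 2-byte data size, then the data.
        while (off + 4 <= extra.length) {
            int id = readLE16(extra, off);
            int len = readLE16(extra, off + 2);
            int dataStart = off + 4;
            // A field whose data runs past the end of the array is ignored.
            if (dataStart + len > extra.length) return false;
            if (id == ZIP64_HEADER_ID) {
                // Expect exactly 'Original Size' and 'Compressed Size', both zero.
                return len == 16
                        && readLE64(extra, dataStart) == 0
                        && readLE64(extra, dataStart + 8) == 0;
            }
            off = dataStart + len;
        }
        // No Zip64 field found, or fewer than 4 bytes left for a field header.
        return false;
    }

    static int readLE16(byte[] b, int off) {
        return (b[off] & 0xFF) | ((b[off + 1] & 0xFF) << 8);
    }

    static long readLE64(byte[] b, int off) {
        long v = 0;
        for (int i = 7; i >= 0; i--) {
            v = (v << 8) | (b[off + i] & 0xFF);
        }
        return v;
    }
}
```

The bounds checks before each read are what keep a truncated trailing field from triggering an ArrayIndexOutOfBoundsException in this sketch.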

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/12524/files
  - new: https://git.openjdk.org/jdk/pull/12524/files/900c5b19..8a44feb5

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=12524&range=10
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12524&range=09-10

  Stats: 196 lines in 2 files changed: 125 ins; 17 del; 54 mod
  Patch: https://git.openjdk.org/jdk/pull/12524.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/12524/head:pull/12524

PR: https://git.openjdk.org/jdk/pull/12524