RFR: 8303866: Allow ZipInputStream.readEnd to parse small Zip64 ZIP files [v12]

Jaikiran Pai jpai at openjdk.org
Tue Jan 16 13:56:30 UTC 2024


On Wed, 10 Jan 2024 13:39:52 GMT, Eirik Bjørsnøs <eirbjo at openjdk.org> wrote:

>> ZipInputStream.readEnd currently assumes a Zip64 data descriptor if the number of compressed or uncompressed bytes read from the inflater is larger than the Zip64 magic value.
>> 
>> While the ZIP format mandates that the data descriptor `SHOULD be stored in ZIP64 format (as 8 byte values) when a file's size exceeds 0xFFFFFFFF`, it also states that `ZIP64 format MAY be used regardless of the size of a file`. For such small entries, the above assumption does not hold.
>> 
>> This PR augments ZipInputStream.readEnd to also assume 8-byte sizes if the ZipEntry includes a Zip64 extra information field AND the 'compressed size' and 'uncompressed size' have the expected Zip64 "magic" value 0xFFFFFFFF. This brings ZipInputStream into alignment with the APPNOTE format spec:
>> 
>> 
>> When extracting, if the zip64 extended information extra 
>> field is present for the file the compressed and 
>> uncompressed sizes will be 8 byte values.
>> 
>> 
>> While small Zip64 files with 8-byte data descriptors are not commonly found in the wild, it is possible to create one using the Info-ZIP command line `-fd` flag:
>> 
>> `echo hello | zip -fd > hello.zip`
>> 
>> The PR also adds a test verifying that such a small Zip64 file can be parsed by ZipInputStream.
>
> Eirik Bjørsnøs has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Remove trailing whitespace
>  - Remove trailing whitespace
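
For readers following along, the condition described in the quoted PR text can be sketched roughly as below. This is an illustration only, not the actual ZipInputStream code: the class and method names are hypothetical, and the constants merely mirror the ZIP64_MAGICVAL name from the diff and the 0x0001 header ID that APPNOTE assigns to the Zip64 extended information extra field.

```java
// Hypothetical helper, not part of the JDK: illustrates the rule
// "Zip64 extra field present AND both LOC sizes are 0xFFFFFFFF".
final class Zip64DescriptorCheck {
    // Same value as the ZIP64_MAGICVAL constant named in the diff under review.
    static final long ZIP64_MAGICVAL = 0xFFFFFFFFL;
    // Header ID of the Zip64 extended information extra field (APPNOTE 4.5.3).
    static final int ZIP64_EXTID = 0x0001;

    // Reads an unsigned little-endian 16-bit value.
    static int u16(byte[] b, int off) {
        return (b[off] & 0xFF) | ((b[off + 1] & 0xFF) << 8);
    }

    // Walks the extra field blocks (2-byte ID, 2-byte length, data)
    // looking for a Zip64 block.
    static boolean hasZip64Extra(byte[] extra) {
        if (extra == null) {
            return false;
        }
        int off = 0;
        while (off + 4 <= extra.length) {
            int id = u16(extra, off);
            int len = u16(extra, off + 2);
            if (off + 4 + len > extra.length) {
                return false; // truncated block, stop scanning
            }
            if (id == ZIP64_EXTID) {
                return true;
            }
            off += 4 + len;
        }
        return false;
    }

    // True when the data descriptor would be read with 8-byte sizes
    // under the rule described in the PR text.
    static boolean expect8ByteSizes(byte[] extra, long csize, long size) {
        return csize == ZIP64_MAGICVAL
                && size == ZIP64_MAGICVAL
                && hasZip64Extra(extra);
    }
}
```

Here `extra` stands for the raw extra field bytes of the entry, and `csize` and `size` for the 4-byte size fields as read from the LOC header.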

src/java.base/share/classes/java/util/zip/ZipInputStream.java line 664:

> 662: 
> 663:         // The LOC's 'compressed size' and 'uncompressed size' must both be marked for Zip64
> 664:         if (csize != ZIP64_MAGICVAL || size != ZIP64_MAGICVAL) {

The spec says something different here:

>
> 4.4.4 general purpose bit flag:
> ...
>    Bit 3: If this bit is set, the fields crc-32, compressed size and uncompressed size are set to zero in the local header.  The correct values are put in the data descriptor immediately following the compressed data.  

So when the data descriptor bit is set, the spec expects the compressed and uncompressed sizes in the LOC to be zero, rather than the Zip64 magic value 0xFFFFFFFF that this check requires.
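
One way to see what a writer that follows this part of the spec produces is to look at the LOC that java.util.zip.ZipOutputStream emits for a streamed DEFLATED entry. The sketch below is only an illustration: the class name and helpers are made up, and the offsets (6 for the general purpose flag, 18 and 22 for the compressed and uncompressed sizes) are the standard LOC layout from APPNOTE. It assumes the entry is written without preset sizes, so that a data descriptor is used.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical inspection helper, not part of the JDK.
final class LocHeaderPeek {
    // Reads an unsigned little-endian 16-bit value.
    static int u16(byte[] b, int off) {
        return (b[off] & 0xFF) | ((b[off + 1] & 0xFF) << 8);
    }

    // Reads an unsigned little-endian 32-bit value.
    static long u32(byte[] b, int off) {
        return u16(b, off) | ((long) u16(b, off + 2) << 16);
    }

    // Writes a single DEFLATED entry without preset sizes, so the
    // writer has to use a data descriptor (general purpose bit 3).
    static byte[] streamedZip(String content) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zo = new ZipOutputStream(out)) {
            zo.putNextEntry(new ZipEntry("hello.txt"));
            zo.write(content.getBytes(StandardCharsets.UTF_8));
            zo.closeEntry();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = streamedZip("hello\n");
        int flag = u16(zip, 6);     // general purpose bit flag
        long csize = u32(zip, 18);  // LOC compressed size
        long size = u32(zip, 22);   // LOC uncompressed size
        System.out.printf("bit 3 set: %b, csize: 0x%X, size: 0x%X%n",
                (flag & 8) != 0, csize, size);
    }
}
```

For such an entry the LOC sizes should come out as zero with bit 3 set, which is the case the quoted spec text describes. An archive produced by another tool, such as the Info-ZIP example from the PR description, would need to be inspected separately.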

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/12524#discussion_r1453460177

