RFR: 8303866: Allow ZipInputStream.readEnd to parse small Zip64 ZIP files [v12]
Jaikiran Pai
jpai at openjdk.org
Tue Jan 16 14:32:53 UTC 2024
On Wed, 10 Jan 2024 13:39:52 GMT, Eirik Bjørsnøs <eirbjo at openjdk.org> wrote:
>> ZipInputStream.readEnd currently assumes a Zip64 data descriptor if the number of compressed or uncompressed bytes read from the inflater is larger than the Zip64 magic value.
>>
>> While the ZIP format mandates that the data descriptor `SHOULD be stored in ZIP64 format (as 8 byte values) when a file's size exceeds 0xFFFFFFFF`, it also states that `ZIP64 format MAY be used regardless of the size of a file`. For such small entries, the above assumption does not hold.
>>
>> This PR augments ZipInputStream.readEnd to also assume 8-byte sizes if the ZipEntry includes a Zip64 extra information field AND the 'compressed size' and 'uncompressed size' have the expected Zip64 "magic" value 0xFFFFFFFF. This brings ZipInputStream into alignment with the APPNOTE format spec:
>>
>>
>> When extracting, if the zip64 extended information extra
>> field is present for the file the compressed and
>> uncompressed sizes will be 8 byte values.
>>
>>
>> While small Zip64 files with 8-byte data descriptors are not commonly found in the wild, it is possible to create one using the Info-ZIP command line `-fd` flag:
>>
>> `echo hello | zip -fd > hello.zip`
>>
>> The PR also adds a test verifying that such a small Zip64 file can be parsed by ZipInputStream.
>
> Eirik Bjørsnøs has updated the pull request incrementally with two additional commits since the last revision:
>
> - Remove trailing whitespace
> - Remove trailing whitespace
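The rule described in the PR description can be sketched as follows. This is a hypothetical illustration, not the JDK's `ZipInputStream` code. The class name, the `expect64Bit` and `readDescriptor` helpers, and the `Sizes` record are invented for this example. It assumes the APPNOTE data-descriptor layout: an optional `0x08074b50` signature, the CRC-32, and then the compressed and uncompressed sizes, all little-endian. The sizes are 4 bytes each, or 8 bytes each in Zip64 form.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Hypothetical sketch of the data-descriptor rule described in the PR;
 * this is not the actual java.util.zip.ZipInputStream implementation.
 */
public class DataDescriptorSketch {
    static final long ZIP64_MAGICVAL = 0xFFFFFFFFL; // 32-bit "use Zip64" marker
    static final long EXTSIG = 0x08074b50L;         // optional data descriptor signature

    record Sizes(long crc, long csize, long size) {}

    /**
     * Returns true if the data descriptor should hold 8-byte sizes: either the
     * inflater saw more than 0xFFFFFFFF bytes (the existing heuristic), or the
     * LOC header has a Zip64 extra field and both 32-bit sizes are the magic value.
     */
    static boolean expect64Bit(long bytesRead, long bytesWritten,
                               boolean hasZip64Extra, long locCsize, long locSize) {
        if (bytesRead > ZIP64_MAGICVAL || bytesWritten > ZIP64_MAGICVAL) {
            return true;
        }
        return hasZip64Extra && locCsize == ZIP64_MAGICVAL && locSize == ZIP64_MAGICVAL;
    }

    /** Parses a little-endian data descriptor with 4- or 8-byte size fields. */
    static Sizes readDescriptor(byte[] buf, boolean zip64) {
        ByteBuffer bb = ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN);
        long first = Integer.toUnsignedLong(bb.getInt());
        // The signature is optional; if it is absent, the first 4 bytes are the CRC-32.
        long crc = (first == EXTSIG) ? Integer.toUnsignedLong(bb.getInt()) : first;
        long csize = zip64 ? bb.getLong() : Integer.toUnsignedLong(bb.getInt());
        long size = zip64 ? bb.getLong() : Integer.toUnsignedLong(bb.getInt());
        return new Sizes(crc, csize, size);
    }

    public static void main(String[] args) {
        // A small entry with an 8-byte data descriptor; CRC and sizes are illustrative.
        ByteBuffer bb = ByteBuffer.allocate(24).order(ByteOrder.LITTLE_ENDIAN);
        bb.putInt((int) EXTSIG).putInt(0x12345678).putLong(8).putLong(6);
        boolean zip64 = expect64Bit(8, 6, true, ZIP64_MAGICVAL, ZIP64_MAGICVAL);
        System.out.println(zip64 + " " + readDescriptor(bb.array(), zip64));
    }
}
```

With only the inflater-count heuristic, a small entry like this one would be read with 4-byte sizes, which misparses its 8-byte descriptor. The second condition in `expect64Bit` is what the PR adds.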
src/java.base/share/classes/java/util/zip/ZipInputStream.java line 706:
> 704: * @return true if the extra field is a Zip64 extra field compatible with data descriptors
> 705: */
> 706: private static boolean isZip64DataDescriptorField(int headerId, byte[] extra, int blockStart, int blockSize) {
I understand the goal of this method: it assures the caller that the extra field/block actually is a Zip64 extra block. The caller then relies on that assurance to read the data descriptor's size fields as 8-byte values.
However, I think this proposed implementation does a bit too much. Specifically, I don't think we should check which values have been stamped in the "Original size" and "Compressed size" fields of this Zip64 block. Those values, whether present or absent, shouldn't play a role in deciding whether to read the data descriptor's size fields as 8 bytes. Checking them would, I think, open up more permutations of which ZIP entries get processed with 8-byte data descriptors.
Given the context in which this method is used, I think the only check this method should do is to verify that the header ID is `ZIP64_EXTID`.
Perhaps then `isZip64DataDescriptorField(...)` won't be needed at all, and we can simply inline the `headerId == ZIP64_EXTID` check in the implementation of the `expect64BitDataDescriptor(...)` method.
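The suggestion can be sketched roughly like this. `expect64BitDataDescriptor` borrows the method name from the PR, but this signature and body are assumptions for illustration only. The sketch walks the extra field's blocks (2-byte header ID, 2-byte data size, data) and accepts any block whose header ID is `ZIP64_EXTID` (0x0001). It never reads that block's "Original size" or "Compressed size" values. It keeps the PR's separate requirement that both LOC sizes equal 0xFFFFFFFF.

```java
/**
 * Hypothetical sketch of the review suggestion, not the PR's code:
 * only the extra block's header ID is checked, never the sizes inside it.
 */
public class Zip64ExtraCheckSketch {
    static final int ZIP64_EXTID = 0x0001;          // Zip64 extended information extra field
    static final long ZIP64_MAGICVAL = 0xFFFFFFFFL; // 32-bit "see Zip64" marker

    /** Reads an unsigned little-endian 16-bit value. */
    static int get16(byte[] b, int off) {
        return (b[off] & 0xff) | ((b[off + 1] & 0xff) << 8);
    }

    /** Assumed signature; the real method in the PR may differ. */
    static boolean expect64BitDataDescriptor(byte[] extra, long locCsize, long locSize) {
        // Keep the PR's condition that both LOC sizes carry the magic value.
        if (extra == null || locCsize != ZIP64_MAGICVAL || locSize != ZIP64_MAGICVAL) {
            return false;
        }
        int off = 0;
        // Each block: 2-byte header ID, 2-byte data size, then that many data bytes.
        while (off + 4 <= extra.length) {
            int headerId = get16(extra, off);
            int blockSize = get16(extra, off + 2);
            if (off + 4 + blockSize > extra.length) {
                break; // truncated block: treat the extra field as malformed and stop
            }
            if (headerId == ZIP64_EXTID) {
                return true; // the header ID alone decides; block contents are ignored
            }
            off += 4 + blockSize;
        }
        return false;
    }

    public static void main(String[] args) {
        // Zip64 block (ID 0x0001) with 16 zero data bytes.
        byte[] extra = new byte[20];
        extra[0] = 0x01;
        extra[2] = 16;
        System.out.println(expect64BitDataDescriptor(extra, ZIP64_MAGICVAL, ZIP64_MAGICVAL));
    }
}
```

Because the block's contents are ignored, even a Zip64 block with no data bytes selects 8-byte sizes. Whatever values a writer stamped into the block's size fields cannot change the outcome, which keeps the decision to a single condition on the extra field.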
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/12524#discussion_r1453508949
More information about the core-libs-dev mailing list