RFR: 8303866: Allow ZipInputStream.readEnd to parse small Zip64 ZIP files [v13]

Jaikiran Pai jpai at openjdk.org
Tue Jan 16 14:37:31 UTC 2024


On Tue, 16 Jan 2024 14:32:51 GMT, Eirik Bjørsnøs <eirbjo at openjdk.org> wrote:

>> ZipInputStream.readEnd currently assumes a Zip64 data descriptor if the number of compressed or uncompressed bytes read from the inflater is larger than the Zip64 magic value.
>> 
>> While the ZIP format mandates that the data descriptor `SHOULD be stored in ZIP64 format (as 8 byte values) when a file's size exceeds 0xFFFFFFFF`, it also states that `ZIP64 format MAY be used regardless of the size of a file`. For such small entries, the above assumption does not hold.
>> 
>> This PR augments ZipInputStream.readEnd to also assume 8-byte sizes if the ZipEntry includes a Zip64 extra information field AND the 'compressed size' and 'uncompressed size' have the expected Zip64 "magic" value 0xFFFFFFFF. This brings ZipInputStream into alignment with the APPNOTE format spec:
>> 
>> 
>> When extracting, if the zip64 extended information extra 
>> field is present for the file the compressed and 
>> uncompressed sizes will be 8 byte values.
>> 
>> 
>> While small Zip64 files with 8-byte data descriptors are not commonly found in the wild, it is possible to create one using the Info-ZIP command line `-fd` flag:
>> 
>> `echo hello | zip -fd > hello.zip`
>> 
>> The PR also adds a test verifying that such a small Zip64 file can be parsed by ZipInputStream.
>
> Eirik Bjørsnøs has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Remove extra whitespace
>   
>   Co-authored-by: Andrey Turbanov <turbanoff at gmail.com>

On a general note: thank you for updating this PR from the previously proposed approach. The current approach, which relies solely on data from within the stream to decide whether the data descriptor's compressed/uncompressed size fields are 8 bytes wide, looks right to me. It avoids the problems that come from basing this decision on application-controlled or manipulated data, which may not match the stream content.
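
To make the stream-only decision concrete, here is a minimal sketch of the check as the PR description states it. It is not the code in the PR: the class and method names (`Zip64DescriptorSketch`, `hasZip64Extra`, `expect8ByteSizes`) are hypothetical, and the sketch assumes the caller has already pulled the 'compressed size', 'uncompressed size' and extra field bytes out of the local file header it read from the stream.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative only; names are hypothetical and not taken from the JDK sources.
final class Zip64DescriptorSketch {
    // Value stored in the 4-byte size fields when the real sizes live elsewhere.
    static final long ZIP64_MAGIC = 0xFFFFFFFFL;
    // Header ID of the Zip64 extended information extra field per APPNOTE.
    static final int ZIP64_EXTRA_ID = 0x0001;

    // Walks the extra field blocks (2-byte id, 2-byte length, data; little-endian)
    // and reports whether a Zip64 extended information block is present.
    static boolean hasZip64Extra(byte[] extra) {
        if (extra == null) {
            return false;
        }
        ByteBuffer buf = ByteBuffer.wrap(extra).order(ByteOrder.LITTLE_ENDIAN);
        while (buf.remaining() >= 4) {
            int id = Short.toUnsignedInt(buf.getShort());
            int len = Short.toUnsignedInt(buf.getShort());
            if (len > buf.remaining()) {
                return false; // malformed block: stop rather than guess
            }
            if (id == ZIP64_EXTRA_ID) {
                return true;
            }
            buf.position(buf.position() + len); // skip this block's data
        }
        return false;
    }

    // Decides whether the data descriptor sizes should be read as 8-byte values.
    // locCsize, locSize and locExtra come from the local file header in the stream;
    // inflaterIn and inflaterOut are the byte counts reported by the inflater.
    static boolean expect8ByteSizes(long locCsize, long locSize, byte[] locExtra,
                                    long inflaterIn, long inflaterOut) {
        // Pre-existing rule: sizes beyond the magic value cannot fit in 4 bytes.
        if (inflaterIn > ZIP64_MAGIC || inflaterOut > ZIP64_MAGIC) {
            return true;
        }
        // Rule described in this PR: both size fields carry the magic value
        // AND the entry has a Zip64 extra field.
        return locCsize == ZIP64_MAGIC
                && locSize == ZIP64_MAGIC
                && hasZip64Extra(locExtra);
    }
}
```

Every input to `expect8ByteSizes` is either read from the stream or counted while inflating it, so nothing an application sets on a `ZipEntry` can change the outcome.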

-------------

PR Comment: https://git.openjdk.org/jdk/pull/12524#issuecomment-1893873146


More information about the core-libs-dev mailing list