RFR: 8303866: Allow ZipInputStream.readEnd to parse small Zip64 ZIP files [v9]
Jaikiran Pai
jpai at openjdk.org
Mon Jan 8 15:16:31 UTC 2024
On Fri, 22 Dec 2023 07:55:24 GMT, Eirik Bjørsnøs <eirbjo at openjdk.org> wrote:
>> ZipInputStream.readEnd currently assumes a Zip64 data descriptor if the number of compressed or uncompressed bytes read from the inflater is larger than the Zip64 magic value.
>>
>> While the ZIP format mandates that the data descriptor `SHOULD be stored in ZIP64 format (as 8 byte values) when a file's size exceeds 0xFFFFFFFF`, it also states that `ZIP64 format MAY be used regardless of the size of a file`. For such small entries, the above assumption does not hold.
>>
>> This PR augments ZipInputStream.readEnd to also assume 8-byte sizes if the ZipEntry includes a Zip64 extra information field. This brings ZipInputStream into alignment with the APPNOTE format spec:
>>
>>
>> When extracting, if the zip64 extended information extra
>> field is present for the file the compressed and
>> uncompressed sizes will be 8 byte values.
>>
>>
>> While small Zip64 files with 8-byte data descriptors are not commonly found in the wild, it is possible to create one using the Info-ZIP command line `-fd` flag:
>>
>> `echo hello | zip -fd > hello.zip`
>>
>> The PR also adds a test verifying that such a small Zip64 file can be parsed by ZipInputStream.
>
> Eirik Bjørsnøs has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 33 commits:
>
> - Merge branch 'master' into data-descriptor
> - Extract ZIP64_BLOCK_SIZE_OFFSET as a constant
> - A Zip64 extra field used in a LOC header must include both the uncompressed and compressed size fields, and does not include local header offset or disk start number fields. Consequently, a valid LOC Zip64 block must always be 16 bytes long.
> - Document better the zip command and options used to generate the test vector ZIP
> - Fix spelling of "presence"
> - Add a @bug reference in the test
> - Use the term "block size" when referring to the size of a Zip64 extra field data block
> - Update comment to reflect that a Zip64 extended field in a LOC header has only two valid block sizes
> - Convert test from testNG to JUnit
> - Fix the check that the size of an extra field block size must not grow past the total extra field length
> - ... and 23 more: https://git.openjdk.org/jdk/compare/e2042421...ddff130f
src/java.base/share/classes/java/util/zip/ZipInputStream.java line 659:
> 657: if (extra != null && extra.length > fixedSize) {
> 658: for (int i = 0; i < extra.length;) {
> 659: int id = get16(extra, i);
This call, and the similar calls in this new method, can throw an `ArrayIndexOutOfBoundsException`, because we can't trust the `byte[]` returned by `ZipEntry.getExtra()` to actually contain the right amount of extra block data. The private `hasMagic` method in `java.util.jar.JarOutputStream` is an example where we catch the `ArrayIndexOutOfBoundsException` when dealing with the extra data; we should do something similar here and return `false`.
(More on the trustworthiness of that byte[] value returned by `ZipEntry.getExtra()` as a separate comment)
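
To illustrate the suggestion, here is a minimal standalone sketch, not the actual code in the PR. The class name `Zip64ExtraScan` and the method name `hasZip64Extra` are hypothetical, and the `get16` here is a local stand-in for the package-private helper used in `java.util.zip`. It only checks that a Zip64 block is present and fits in the array; it does not validate the block size any further.

```java
public class Zip64ExtraScan {
    // Header ID of the Zip64 extended information extra field.
    static final int ZIP64_EXTID = 0x0001;

    // Local stand-in: reads an unsigned 16-bit little-endian value.
    // Throws ArrayIndexOutOfBoundsException if off or off + 1 is out of range.
    static int get16(byte[] b, int off) {
        return (b[off] & 0xff) | ((b[off + 1] & 0xff) << 8);
    }

    // Hypothetical helper: returns true only if a complete Zip64 block is
    // found. Malformed or truncated extra data yields false, not an exception.
    static boolean hasZip64Extra(byte[] extra) {
        if (extra == null) {
            return false;
        }
        try {
            for (int i = 0; i < extra.length;) {
                int id = get16(extra, i);        // may read past the end
                int size = get16(extra, i + 2);  // may read past the end
                if (id == ZIP64_EXTID && i + 4 + size <= extra.length) {
                    return true;
                }
                i += 4 + size;                   // skip header and data block
            }
        } catch (ArrayIndexOutOfBoundsException e) {
            // The extra data is not well formed: treat it as "no Zip64 field".
        }
        return false;
    }
}
```

With this shape, a caller such as `readEnd` would fall back to the existing size-based check whenever the extra data cannot be parsed.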
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/12524#discussion_r1444806425