RFR: 8328995: Launcher can't open jar files where the offset of the manifest is >4GB [v8]

Fri Jul 19 14:18:36 UTC 2024

On Thu, 18 Jul 2024 21:36:54 GMT, Liam Miller-Cushon <cushon at openjdk.org> wrote:

>> This change fixes a zip64 bug in the launcher that is prevent it from reading the manifest of jars where the 'relative offset of local header' field in the central directory entry is >4GB. As described in APPNOTE.TXT 4.5.3, the offset is too large to be stored in the central directory it is stored in a 'Zip64 Extended Information Extra Field'.
>
> Liam Miller-Cushon has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision:
> 
>  - Add a missing `break`
>  - Merge branch 'master' into JDK-8328995
>  - Merge remote-tracking branch 'origin/master' into JDK-8328995
>  - Move test to test/jdk/tools/launcher
>  - Add some more comments
>  - Maximum Zip64 extra field length is 32
>  - Make cendsk an unsigned short
>  - Fix disk number size
>  - Improvements
>    
>    * don't rely on variable length arrays
>    * only run the test of 64 bit machines, since it requires >4GB of heap
>  - 8328995: launcher can't open jar files where the offset of the manifest is >4GB

src/java.base/share/native/libjli/parse_manifest.c line 507:

> 505:                   || censiz == ZIP64_MAGICVAL
> 506:                   || cenoff == ZIP64_MAGICVAL)
> 507:                 && cenext > 0) {

I went through these changes and here's my first round of review.

Before the changes proposed in this PR, this part of the code which is responsible for finding and returning a constructed zip entry instance would blindly use the local header offset value from the central directory of the entry being looked up. It would then "seek" to that position and read the metadata of that entry (details like uncompressed length, compressed length, the compression method) and return back that entry instance. Clearly this isn't right when zip64 entries are involved since various details of such an entry can reside in the zip64 extra field block of that entry instead of being in the central directory.

This change proposes that this part of the code first identify that a zip64 extra block exists for a particular entry and then read that zip64 block, validate some parts of the zip64 block and if that validation succeeds, then use the entry metadata from the zip64 block for constructing and returning the entry instance.

The idea to identify and support zip64 entries in this part of the code I think is agreed as the right one.
Coming to the implementation, I think we need to come to an agreement on what we want to do here. Specifically:

- What logic do we use to decide when to look for zip64 extra block for an entry? The changes in this PR (at this line here) proposes that we look for zip64 extra block for an entry if any of the compressed size, uncompressed size or the local header offset value is `0xFFFFFFFFL` and the extra field size noted in this central directory entry is greater than 0. This however doesn't match with what we do in the implementation of `java.util.zip.ZipFile` which too does similar checks for zip64 entry presence when parsing the central directory. Very specifically, in the ZipFile where this logic is implemented is here https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/zip/ZipFile.java#L1254. That code in the ZipFile has gone through the necessary vetting for dealing with various possibilities with the zip files. I think we should implement that same logic here while checking for zip64 entries.
- The next one to decide is what kind of validations do we want to do in this code for zip64 extra field block. I think here too we should match whatever w currently do in the `java.util.zip.ZipFile` implementation, specifically what's being done in the `checkExtraFields` and `checkZip64ExtraFieldValues` methods of that class.
- Next, what do we do when these validations fail. Right now in this proposed implementation, we appear to consider some validation failures as the entry being absent. My opinion is that we should be more stricter and fail the jar like we do in the implementation of `ZipFile` for such validation failures.

One additional thing that this proposed change does is that it validates that the local header offset determined for an entry (either through the central directory or through the zip64 extra field block) does indeed point to a local header for an entry with the same file name. I don't think we do that in the ZipFile implementation. But I think doing that check in this code is fine.

There are few other implementation details that I haven't commented about because they may not matter depending on what conclusions we arrive at for the above few points.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18479#discussion_r1684450259