JDK7's java.util.zip breakage with very large files

Alexander Sack pisymbol at gmail.com
Thu Feb 7 16:54:56 UTC 2013


Folks:

What I am trying to do is generate Zip64 extensions within a JAR file
and then dissect the zip contents (end of central directory records,
file headers, etc.).
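Because the goal is to dissect the end records, here is a minimal sketch of the fixed 22-byte "end of central directory" layout as PKWARE's APPNOTE describes it. The layout is a 4-byte signature, four 16-bit disk and entry-count fields, a 32-bit central directory size and offset, and a 16-bit comment length, all little-endian. Following the APPNOTE convention, a field set to all ones (0xFFFF or 0xFFFFFFFF) means the real value lives in the Zip64 record. The class name `EocdRecord` is hypothetical and is not part of java.util.zip.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Fixed 22-byte part of a ZIP end of central directory record (illustrative only). */
class EocdRecord {
    static final int SIGNATURE = 0x06054B50; // appears on disk as the bytes 50 4B 05 06
    static final int SIZE = 22;

    final int diskNumber;    // number of this disk
    final int cdStartDisk;   // disk on which the central directory starts
    final int entriesOnDisk; // central directory entries on this disk
    final int totalEntries;  // total central directory entries
    final long cdSize;       // size of the central directory in bytes
    final long cdOffset;     // offset of the central directory from the start of the archive
    final int commentLength; // length of the archive comment that follows the record

    private EocdRecord(int diskNumber, int cdStartDisk, int entriesOnDisk, int totalEntries,
                       long cdSize, long cdOffset, int commentLength) {
        this.diskNumber = diskNumber;
        this.cdStartDisk = cdStartDisk;
        this.entriesOnDisk = entriesOnDisk;
        this.totalEntries = totalEntries;
        this.cdSize = cdSize;
        this.cdOffset = cdOffset;
        this.commentLength = commentLength;
    }

    /** Parses the record starting at data[off]; every field is unsigned little-endian. */
    static EocdRecord parse(byte[] data, int off) {
        ByteBuffer bb = ByteBuffer.wrap(data, off, SIZE).order(ByteOrder.LITTLE_ENDIAN);
        if (bb.getInt() != SIGNATURE) {
            throw new IllegalArgumentException("no EOCD signature at offset " + off);
        }
        int disk = bb.getShort() & 0xFFFF;
        int cdDisk = bb.getShort() & 0xFFFF;
        int onDisk = bb.getShort() & 0xFFFF;
        int total = bb.getShort() & 0xFFFF;
        long size = bb.getInt() & 0xFFFFFFFFL;
        long offset = bb.getInt() & 0xFFFFFFFFL;
        int comment = bb.getShort() & 0xFFFF;
        return new EocdRecord(disk, cdDisk, onDisk, total, size, offset, comment);
    }

    /** An all-ones field means "look in the Zip64 end of central directory record". */
    boolean needsZip64() {
        return diskNumber == 0xFFFF || cdStartDisk == 0xFFFF || entriesOnDisk == 0xFFFF
            || totalEntries == 0xFFFF || cdSize == 0xFFFFFFFFL || cdOffset == 0xFFFFFFFFL;
    }
}
```

For an archive without a comment, the record is simply the last 22 bytes, so `EocdRecord.parse(bytes, bytes.length - 22)` is enough for a quick look.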

However, when I use jar, or a small program I wrote that uses
java.util.zip, to zip up a very large file (>12G), I do not get the
expected output.

Although jar succeeds, the zip file it creates has no End of
Directory (EoD) record at all! (as if ZipOutputStream.finish() had
never been called).
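For comparison, a minimal sketch of the kind of java.util.zip program described above. The original program is not shown, so its shape here is an assumption: one entry named after the input file, streamed through ZipOutputStream, with finish() called explicitly. The class name `ZipOneFile` and the default paths `data` and `data.zip` are placeholders.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class ZipOneFile {
    /** Deflates one file into a new archive holding a single entry named after the file. */
    static void zip(Path src, Path dest) throws IOException {
        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(dest))) {
            zos.putNextEntry(new ZipEntry(src.getFileName().toString()));
            // With the default DEFLATED method the sizes and CRC are not known up front,
            // so ZipOutputStream writes them in a data descriptor after the compressed data.
            Files.copy(src, zos);
            zos.closeEntry();
            // finish() writes the central directory and the end record(s).
            // close() calls finish() as well, so the explicit call is only belt and braces.
            zos.finish();
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Paths.get(args.length > 0 ? args[0] : "data");
        Path dest = Paths.get(args.length > 1 ? args[1] : "data.zip");
        zip(src, dest);
    }
}
```

Run it as `java ZipOneFile data data.zip`. Because try-with-resources closes the stream, the end records should be written even if the explicit finish() were left out.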

I am able to extract the large file and verify its MD5 which is correct.

So I am doing this (data is 12G):

- md5sum data
- jar cvf data.jar data
[wait a while; output is around 2.3G, return code is 0]
- bvi data.jar (look for the EoD record at the end of the jar file,
magic 0x06054B50, or even the Zip64 EoD locator/record signatures)
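The hex-editor search can also be scripted. A minimal sketch: the signatures are stored little-endian, so 0x06054B50 shows up in bvi as the bytes `50 4B 05 06` ("PK\5\6"). Since the EOCD record is 22 bytes plus a comment of at most 65535 bytes, only that much of the tail needs scanning. If a 20-byte Zip64 locator (0x07064B50) sits directly before the EOCD, its 8-byte offset field at position 8 points at the Zip64 EOCD record (0x06064B50). The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class FindEocd {
    // Signatures as 32-bit values; on disk they are little-endian, e.g. 50 4B 05 06.
    static final int EOCD_SIG = 0x06054B50;          // end of central directory record
    static final int ZIP64_LOCATOR_SIG = 0x07064B50; // Zip64 end of central directory locator
    static final int ZIP64_EOCD_SIG = 0x06064B50;    // Zip64 end of central directory record
    static final int EOCD_SIZE = 22;                 // fixed part of the EOCD record
    static final int LOCATOR_SIZE = 20;              // the locator has a fixed size
    static final int MAX_TAIL = EOCD_SIZE + 0xFFFF;  // EOCD plus the longest allowed comment

    /** Returns the file offset of the EOCD signature nearest the end, or -1 if there is none. */
    static long findEocd(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long len = raf.length();
            int tail = (int) Math.min(len, MAX_TAIL);
            byte[] buf = new byte[tail];
            raf.seek(len - tail);
            raf.readFully(buf);
            ByteBuffer bb = ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN);
            // Start at the last position where a 22-byte record still fits and move backwards.
            for (int i = tail - EOCD_SIZE; i >= 0; i--) {
                if (bb.getInt(i) == EOCD_SIG) {
                    return len - tail + i;
                }
            }
            return -1;
        }
    }

    /** If a Zip64 locator precedes the EOCD, returns the Zip64 EOCD record offset, else -1. */
    static long findZip64Record(String path, long eocdOffset) throws IOException {
        if (eocdOffset < LOCATOR_SIZE) {
            return -1;
        }
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            byte[] loc = new byte[LOCATOR_SIZE];
            raf.seek(eocdOffset - LOCATOR_SIZE);
            raf.readFully(loc);
            ByteBuffer bb = ByteBuffer.wrap(loc).order(ByteOrder.LITTLE_ENDIAN);
            if (bb.getInt(0) != ZIP64_LOCATOR_SIG) {
                return -1;
            }
            long recOffset = bb.getLong(8); // relative offset of the Zip64 EOCD record
            if (recOffset < 0 || recOffset + 4 > raf.length()) {
                return -1;
            }
            byte[] sig = new byte[4];
            raf.seek(recOffset);
            raf.readFully(sig);
            int s = ByteBuffer.wrap(sig).order(ByteOrder.LITTLE_ENDIAN).getInt();
            return s == ZIP64_EOCD_SIG ? recOffset : -1;
        }
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "data.jar";
        long eocd = findEocd(path);
        if (eocd < 0) {
            System.out.println("no EOCD signature in the last " + MAX_TAIL + " bytes");
            return;
        }
        System.out.println("EOCD at offset " + eocd);
        long zip64 = findZip64Record(path, eocd);
        System.out.println(zip64 < 0 ? "no Zip64 locator/record" : "Zip64 EOCD record at offset " + zip64);
    }
}
```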

Not found! (bummer)

Extract:

- jar tvf data.jar -> I see the correct size, which means jar is
reading the 64-bit sizes correctly; with earlier builds (<b55, I
think) I would see -1.
- jar xvf data.jar
- md5sum data
- Matches original data
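The md5sum round trip can also be done from Java without extracting to disk. A minimal sketch: it digests the original file and the entry as read back through java.util.zip.ZipFile, then compares the hex strings. ZipFile finds entries through the central directory, which it locates from the end record, so it makes a useful cross-check against what the hex editor shows. The entry name `data`, the default paths and the class name are assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

class VerifyEntryMd5 {
    /** Digests a stream with MD5 and returns lowercase hex, the same format md5sum prints. */
    static String md5Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buf = new byte[1 << 16];
        int n;
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b & 0xFF));
        }
        return hex.toString();
    }

    /** True if the named entry in the archive has the same MD5 as the original file. */
    static boolean sameContent(Path original, Path archive, String entryName)
            throws IOException, NoSuchAlgorithmException {
        String expected;
        try (InputStream in = Files.newInputStream(original)) {
            expected = md5Hex(in);
        }
        // ZipFile reads the central directory, which it locates via the end record.
        try (ZipFile zf = new ZipFile(archive.toFile())) {
            ZipEntry entry = zf.getEntry(entryName);
            if (entry == null) {
                return false;
            }
            try (InputStream in = zf.getInputStream(entry)) {
                return expected.equals(md5Hex(in));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Path original = Paths.get(args.length > 0 ? args[0] : "data");
        Path archive = Paths.get(args.length > 1 ? args[1] : "data.jar");
        String entryName = args.length > 2 ? args[2] : "data";
        System.out.println(sameContent(original, archive, entryName) ? "MD5 matches" : "MD5 differs or entry missing");
    }
}
```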

I noticed that after the deflate-compressed blocks, the file is
padded with a lot of zeros (I originally thought it had been
truncated, but the extraction test above shows that is not the case).
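To put a number on that run of zeros, here is a small sketch that counts how many 0x00 bytes end the file and reports where the run starts. It reads backwards in 64 KiB chunks, so it stays cheap on a multi-gigabyte jar. The class name and the default path `data.jar` are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

class TrailingZeros {
    /** Counts the consecutive 0x00 bytes at the very end of the file. */
    static long countTrailingZeros(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            byte[] buf = new byte[1 << 16];
            long pos = raf.length();
            long zeros = 0;
            while (pos > 0) {
                int n = (int) Math.min(buf.length, pos);
                pos -= n;
                raf.seek(pos);
                raf.readFully(buf, 0, n);
                // Walk the chunk from its last byte towards its first.
                for (int i = n - 1; i >= 0; i--) {
                    if (buf[i] != 0) {
                        return zeros;
                    }
                    zeros++;
                }
            }
            return zeros; // empty file, or nothing but zeros
        }
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "data.jar";
        long zeros = countTrailingZeros(path);
        long length = new File(path).length();
        System.out.println(path + ": " + zeros + " trailing zero bytes, run starts at offset " + (length - zeros));
    }
}
```

The reported start offset can then be compared with where the last deflate block ends in the hex dump.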

This is on an x86-64 Fedora 13 system using yesterday's JDK7 build
tree (I downloaded the build infrastructure and set it to download
bundles during the build; I had no build failures).

Why does jar (java.util.zip) output a non-standard zip file for very
large files, i.e. one with no EoD record and friends?

I have just begun to look at the actual code to see whether this is
pilot error on my part or something else afoot (my code calls
zos.finish() explicitly, which has no effect; I am not sure yet where
jar calls ZipOutputStream.finish()).

Thanks!

-aps



More information about the core-libs-dev mailing list