RFD: Invalid CRC in ZipFile, ZipFileSystem
Eirik Bjørsnøs
eirbjo at gmail.com
Fri Feb 24 08:43:38 UTC 2023
On Fri, Feb 24, 2023 at 9:22 AM Alan Bateman <Alan.Bateman at oracle.com>
wrote:
> As a general point, the ZIP format can have redundant metadata and there
> can be cases where the CRC-32 isn't available when writing a LOC header.
>
ZipInputStream throws an exception in both cases, whether the CRC-32 is in
the data descriptor or in the LOC. If bit 3 of the general purpose bit flag
is set, the CRC is set to zero in the LOC, and the actual CRC is put in the
data descriptor immediately following the compressed data. With this
format, the exception is thrown in ZipInputStream.readEnd:
https://github.com/openjdk/jdk/blob/8f7c4969c28c58ae4b9adeed822707b28be16dd0/src/java.base/share/classes/java/util/zip/ZipInputStream.java#L624-L626
If the CRC-32 value is in the LOC, the exception is thrown when the read
reaches the end of the data, in ZipInputStream.read:
https://github.com/openjdk/jdk/blob/8f7c4969c28c58ae4b9adeed822707b28be16dd0/src/java.base/share/classes/java/util/zip/ZipInputStream.java#L624-L626
(The test I linked to covers both of these cases.)
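To make the two cases concrete, here is a minimal sketch (not the linked test). The class and method names are hypothetical, and the byte offsets assume a single-entry archive written by ZipOutputStream with no archive comment and no Zip64 records. It flips one bit of the CRC-32, either in the LOC (STORED entry) or in the data descriptor (DEFLATED entry), and checks whether ZipInputStream rejects the archive.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipException;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, not part of the JDK or its tests.
final class ZipStreamCrcDemo {
    static final byte[] DATA = "hello, zip".getBytes(StandardCharsets.UTF_8);

    /** Builds a one-entry ZIP in memory. stored=true puts the CRC-32 in the LOC;
     *  stored=false (DEFLATED) makes ZipOutputStream emit a data descriptor. */
    static byte[] buildZip(boolean stored) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(out)) {
            ZipEntry e = new ZipEntry("entry.txt");
            if (stored) {
                CRC32 crc = new CRC32();
                crc.update(DATA);
                e.setMethod(ZipEntry.STORED);
                e.setSize(DATA.length);
                e.setCompressedSize(DATA.length);
                e.setCrc(crc.getValue());
            }
            zos.putNextEntry(e);
            zos.write(DATA);
            zos.closeEntry();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return out.toByteArray();
    }

    /** Returns a copy of the archive with one bit of the CRC-32 flipped. */
    static byte[] corruptCrc(byte[] zip, boolean stored) {
        byte[] copy = zip.clone();
        if (stored) {
            // The CRC-32 field sits at offset 14 of the first LOC header.
            copy[14] ^= 0x01;
        } else {
            // Assumption: the END header is the last 22 bytes (no comment);
            // its field at offset 16 holds the offset of the CEN.
            int end = copy.length - 22;
            int cenOffset = ByteBuffer.wrap(copy)
                    .order(ByteOrder.LITTLE_ENDIAN).getInt(end + 16);
            // The 16-byte data descriptor (signature, crc, csize, size) ends
            // where the CEN begins, so its CRC is 12 bytes before the CEN.
            copy[cenOffset - 12] ^= 0x01;
        }
        return copy;
    }

    /** Reads every entry to its end; true if a ZipException was thrown. */
    static boolean rejects(byte[] zip) {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            while (zis.getNextEntry() != null) {
                zis.readAllBytes();
            }
            return false;
        } catch (ZipException ex) {
            return true;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        for (boolean stored : new boolean[] { true, false }) {
            byte[] zip = buildZip(stored);
            System.out.println((stored ? "LOC CRC" : "data descriptor CRC")
                    + ": intact rejected=" + rejects(zip)
                    + ", corrupted rejected=" + rejects(corruptCrc(zip, stored)));
        }
    }
}
```

The entry has to be read to its end for the check to fire, since that is the point where ZipInputStream compares its running CRC against the recorded value.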
> At the same time, the APIs work differently in that ZipFile opens a ZIP
> file so it has access to the CEN whereas ZipInputStream is working on a
> stream of ZIP entries and does not read the CEN. So some inconsistencies
> in the handling is not too surprising.
>
Indeed, but I found it a bit amusing that ZipFile (and ZipFileSystem),
which both see the "full picture", are actually the ones that do not
enforce the CRC. It does not make complete sense to me from a purely
technical point of view.
Perhaps the CRC in the CEN is less trustworthy across implementations than
the one found in the LOC/Data Descriptor.
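For the ZipFile side, a companion sketch under similar assumptions (hypothetical class and method names, single-entry archive, no archive comment, no Zip64 records). It flips one bit of the CRC-32 in the CEN, which is the copy ZipFile reads, and then compares the CRC that ZipFile reports with the CRC of the bytes it actually returns. On a JDK that behaves as described in this thread the read completes without complaint; a JDK that enforced the CRC would throw from the read instead.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, not part of the JDK or its tests.
final class ZipFileCrcDemo {
    static final byte[] DATA = "hello, zip".getBytes(StandardCharsets.UTF_8);

    /** Returns { CRC reported by ZipFile, CRC computed over the bytes read }
     *  for an archive whose CEN CRC-32 has had one bit flipped. */
    static long[] reportedAndActualCrc() {
        Path zip = null;
        try {
            zip = Files.createTempFile("bad-cen-crc", ".zip");
            try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(zip))) {
                zos.putNextEntry(new ZipEntry("entry.txt"));
                zos.write(DATA);
                zos.closeEntry();
            }

            byte[] bytes = Files.readAllBytes(zip);
            // Assumption: the END header is the last 22 bytes (no comment);
            // its field at offset 16 holds the offset of the CEN.
            int end = bytes.length - 22;
            int cenOffset = ByteBuffer.wrap(bytes)
                    .order(ByteOrder.LITTLE_ENDIAN).getInt(end + 16);
            // The CRC-32 field sits at offset 16 of a CEN header.
            bytes[cenOffset + 16] ^= 0x01;
            Files.write(zip, bytes);

            try (ZipFile zf = new ZipFile(zip.toFile())) {
                ZipEntry e = zf.getEntry("entry.txt");
                try (InputStream in = zf.getInputStream(e)) {
                    CRC32 crc = new CRC32();
                    crc.update(in.readAllBytes());
                    return new long[] { e.getCrc(), crc.getValue() };
                }
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        } finally {
            if (zip != null) {
                try {
                    Files.deleteIfExists(zip);
                } catch (IOException ignored) {
                    // Best-effort cleanup of the temp file.
                }
            }
        }
    }

    public static void main(String[] args) {
        long[] crcs = reportedAndActualCrc();
        System.out.println("reported=0x" + Long.toHexString(crcs[0])
                + " actual=0x" + Long.toHexString(crcs[1]));
    }
}
```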
Cheers,
Eirik.