"loc: wrong sig" in ZipFileSystem on a valid ZIP file (extra data field of exactly 5 bytes)
Dawid Weiss
dawid.weiss at gmail.com
Sat Nov 28 14:34:47 UTC 2020
I know what the problem is and I know what the fix should be (and tried to
convey it) but I don't have a patch. Writing the test case is not a problem
but it'll be a lengthy test on slower drives (has to create a zip file that
exceeds 4gb of disk space). I don't think there is a way to fake the file
system by using a sparse file, sadly. Would this be acceptable? If so then
sure - I can provide a reproducible test case that will fail for the
current implementation.
D.
On Fri, Nov 27, 2020 at 8:37 PM Lance Andersen <LANCE.ANDERSEN at oracle.com>
wrote:
> Hi Dawid,
>
> If you believe you have a a fix, please submit a patch along with a JTREG
> test case. Ideally the test case would create zip file as part of the test.
>
> Best
> Lance
>
> On Nov 27, 2020, at 7:37 AM, Dawid Weiss <dawid.weiss at gmail.com> wrote:
>
> Just for the archives - I'm not sure if this ended up being filed to
> the bug system - the repro below demonstrates the bug on all JDKs up
> to the newest one.
>
> # create a single random file of 250 megabytes
> head -c 250M < /dev/urandom > rnd.bin
> # create a bunch of hardlinks to the same file.
> for i in `seq -w 1 25`; do ln rnd.bin rnd-$i.bin; done
> # create a zip archive exceeding 4gb (in stored mode so that it's faster)
> zip -0 archive.zip rnd*.bin
> # show the content of the archive - for all entries beyond 4gb this should
> # show extra data blocks like below (note the order of extra data
> blocks - this is important).
> #
> # The central-directory extra field contains:
> # - A subfield with ID 0x5455 (universal time) and 5 data bytes.
> # The local extra field has UTC/GMT modification/access times.
> # - A subfield with ID 0x7875 (Unix UID/GID (any size)) and 11 data bytes:
> # 01 04 ea 03 00 00 04 ea 03 00 00.
> # - A subfield with ID 0x0001 (PKWARE 64-bit sizes) and 8 data bytes:
> # a4 06 a0 86 01 00 00 00.
> #
> zipinfo -v archive.zip
>
> # Bug repro code:
> cat > Test.java <<EOF
> import java.io.*;
> import java.nio.*;
> import java.nio.file.*;
> import java.nio.file.attribute.*;
>
> public class Test {
> public static void main(String[]args) throws IOException {
> try (FileSystem fs =
> FileSystems.newFileSystem(Paths.get("archive.zip"))) {
> for (Path root : fs.getRootDirectories()) {
> Files.walkFileTree(root, new SimpleFileVisitor<>() {
> public FileVisitResult visitFile(Path file,
> BasicFileAttributes attrs) throws IOException {
> return FileVisitResult.CONTINUE;
> }
> });
> }
> }
> }
> }
> EOF
> # and run it.
> java Test.java
>
> The above ends with:
>
> Exception in thread "main" java.util.zip.ZipException: loc: wrong sig
> ->841cd111
> at
> jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.readExtra(ZipFileSystem.java:2899)
> at
> jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.readCEN(ZipFileSystem.java:2600)
> at
> jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.<init>(ZipFileSystem.java:2536)
> at
> jdk.zipfs/jdk.nio.zipfs.ZipFileSystem.getFileAttributes(ZipFileSystem.java:532)
> at jdk.zipfs/jdk.nio.zipfs.ZipPath.readAttributes(ZipPath.java:767)
> at jdk.zipfs/jdk.nio.zipfs.ZipPath.readAttributes(ZipPath.java:777)
> at
> jdk.zipfs/jdk.nio.zipfs.ZipFileSystemProvider.readAttributes(ZipFileSystemProvider.java:276)
> at java.base/java.nio.file.Files.readAttributes(Files.java:1843)
> at
> java.base/java.nio.file.FileTreeWalker.getAttributes(FileTreeWalker.java:219)
> at
> java.base/java.nio.file.FileTreeWalker.visit(FileTreeWalker.java:276)
> at
> java.base/java.nio.file.FileTreeWalker.next(FileTreeWalker.java:373)
> at java.base/java.nio.file.Files.walkFileTree(Files.java:2840)
> at java.base/java.nio.file.Files.walkFileTree(Files.java:2876)
> at Test.main(Test.java:10)
>
> The fix is relatively simple I think - the current code assumes entry
> offset is < 4GB when parsing extra data for
> ID 0x5455. This should be evaluated lazily after all extra data blocks
> are parsed to ensure the file offset is correctly
> updated based on block ID 0x0001, regardless of its ordering within
> other extra data blocks.
>
>
>
> On Fri, Oct 23, 2020 at 9:12 AM Dawid Weiss <dawid.weiss at gmail.com> wrote:
>
>
>
> This looks like a legitimate bug to me (the zip file system implementation
> is sensitive to the order of extra parameters and may throw an unexpected
> error
> on zip64 archives exceeding 4 gigabytes). I can't file a Jira issue - no
> permissions - but I think it'd be worth adding one?
>
> Dawid
>
> On Wed, Oct 21, 2020 at 8:36 PM Dawid Weiss <dawid.weiss at gmail.com> wrote:
>
>
> Hi Lance,
>
> Yes, this is exactly the point where the problem is in JDK code. It's
> directly
> related to the zip entry beyond max int offset. The code tries to read data
> from a local zip entry header at locoff, here:
>
> if (zipfs.readFullyAt(buf, 0, buf.length , locoff)
>
> but the locoff is set to ~0 as per the spec for files exceeding
> 4 gigabytes, which says:
>
> 4.4.16 relative offset of local header: (4 bytes)
>
> This is the offset from the start of the first disk on
> which this file appears, to where the local header SHOULD
> be found. If an archive is in ZIP64 format and the value
> in this field is 0xFFFFFFFF, the size will be in the
> corresponding 8 byte zip64 extended information extra field.
>
> A proper fix would be to read the local header from zip64 extra data
> first. I don't know how this interferes with the rest of the code though -
> didn't have enough time to look at it. As it stands, zip entries
> beyond 4GB cause
> an unchecked exception while attribute-scanning. This simple snippet
> is enough to demonstrate it, given a ZIP entry beyond 4GB range:
>
> try (FileSystem fs =
> FileSystems.newFileSystem(Paths.get("zipWithEntryBeyond4Gb.zip"))) {
> for (Path root : fs.getRootDirectories()) {
> Files.walkFileTree(root, new SimpleFileVisitor<>() {
> @Override
> public FileVisitResult visitFile(Path file, BasicFileAttributes
> attrs) throws IOException {
> return FileVisitResult.CONTINUE;
> }
> });
> }
> }
>
> The walkFileTree method is key here as it attempts to harvest
> attributes (why we use it is another story -- this saves a lot of time
> on large, network-mounted regular directories when you're collecting
> file metadata).
>
>
> Dawid
>
>
> On Wed, Oct 21, 2020 at 6:39 PM Lance Andersen
> <LANCE.ANDERSEN at oracle.com> wrote:
>
>
> Hi David,
>
> From a quick look at ZipFileSystem this appears to be an optimization to
> avoid looking at the LOC extended Timestamp Extra field
>
> If a Info-ZIP Extended Timestamp (0x5455)is found then:
>
> If the "zipinfo-time" entry was set to “false” in the Map specified when
> creating the Zip FileSystem,
>
> FileSystems.newFileSystem(zipFile, Map.of("zipinfo-time", "false")
>
> and the data size of the CEN extended Timestamp is 5 (flag + mod time)
>
> The modified time is used from the CEN Extended Timestamp extra field
>
> Otherwise get the modified time, creation time, and access time from the
> LOC Extended Timestamp extra field.
>
>
> —————
>
> Extended Timestamp Extra Field:
>
> ==============================
>
> The following is the layout of the extended-timestamp extra block.
> (Last Revision 19970118)
>
> Local-header version:
>
> Value Size Description
> ----- ---- -----------
> (time) 0x5455 Short tag for this extra block type ("UT")
> TSize Short total data size for this block
> Flags Byte info bits
> (ModTime) Long time of last modification (UTC/GMT)
> (AcTime) Long time of last access (UTC/GMT)
> (CrTime) Long time of original creation (UTC/GMT)
>
> Central-header version:
>
> Value Size Description
> ----- ---- -----------
> (time) 0x5455 Short tag for this extra block type ("UT")
> TSize Short total data size for this block
> Flags Byte info bits (refers to local header!)
> (ModTime) Long time of last modification (UTC/GMT)
>
> The central-header extra field contains the modification time
> only,
> or no timestamp at all. TSize is used to flag its presence or
> absence. But note:
>
> If "Flags" indicates that Modtime is present in the local
> header
> field, it MUST be present in the central header field, too!
> This correspondence is required because the modification time
> value may be used to support trans-timezone freshening and
> updating operations with zip archives.
>
> The time values are in standard Unix signed-long format,
> indicating
> the number of seconds since 1 January 1970 00:00:00. The times
> are relative to Coordinated Universal Time (UTC), also sometimes
> referred to as Greenwich Mean Time (GMT). To convert to local
> time,
> the software must know the local timezone offset from UTC/GMT.
>
> The lower three bits of Flags in both headers indicate which time-
> stamps are present in the LOCAL extra field:
>
> bit 0 if set, modification time is present
> bit 1 if set, access time is present
> bit 2 if set, creation time is present
> bits 3-7 reserved for additional timestamps; not set
>
> Those times that are present will appear in the order indicated,
> but
> any combination of times may be omitted. (Creation time may be
> present without access time, for example.) TSize should equal
> (1 + 4*(number of set bits in Flags)), as the block is currently
> defined. Other timestamps may be added in the future.
>
>
> --------------
>
> It's hard to comment on why you received the error that you did but it is
> possible the tool that was used for writing the entry did something
> unexpected.
>
> Out of curiosity, have you tried using ZipFile/ZipEntry to access the
> entry?
>
>
> Best,
> Lance
>
>
> On Oct 21, 2020, at 4:55 AM, Dawid Weiss <dawid.weiss at gmail.com> wrote:
>
> Hello,
>
> We've encountered a seemingly valid ZIP file (zipinfo -v shows all its
> entries are intact) that causes a runtime exception when scanning its
> contents with ZipFileSystem. The exception indicates an invalid
> signature when parsing EXTID_EXTT. I don't quite understand this
> comment in the code:
>
> case EXTID_EXTT:
> // spec says the Extened timestamp in cen only has mtime
> // need to read the loc to get the extra a/ctime, if flag
> // "zipinfo-time" is not specified to false;
> // there is performance cost (move up to loc and read) to
> // access the loc table foreach entry;
> if (zipfs.noExtt) {
> if (sz == 5)
> mtime = unixToJavaTime(LG(extra, pos + 1));
> break;
> }
> ...
>
> but this ZIP file has the extra data block of exactly 5 bytes, as
> indicated by zipinfo:
>
> Central directory entry #6:
> ---------------------------
> ...
> file system or operating system of origin: Unix
> version of encoding software: 3.0
> minimum file system compatibility required: MS-DOS, OS/2 or NT FAT
> minimum software version required to extract: 2.0
> compression method: deflated
> ...
> extended local header: no
> file last modified on (DOS date/time): 2018 Mar 1 04:56:20
> file last modified on (UT extra field modtime): 2018 Mar 1 05:56:19 local
> file last modified on (UT extra field modtime): 2018 Mar 1 04:56:19 UTC
> ...
> Unix file attributes (100000 octal): ----------
> MS-DOS file attributes (01 hex): read-only
>
> The central-directory extra field contains:
> - A subfield with ID 0x5455 (universal time) and 5 data bytes.
> The local extra field has UTC/GMT modification/access times.
>
> The above conditional block checking for length == 5 would have worked
> in ZipFileSystem but it's surrounded by a condition over an
> externally-provided property - zipfs.noExtt is only set to true if:
>
> this.noExtt = "false".equals(env.get("zipinfo-time"));
>
> I can't share this particular ZIP file with you and I don't know how
> it was created but it seems like that "zipinfo-time" flag could be
> omitted if the length of the extra data field is exactly 5?
>
> Dawid
>
>
>
> Best
> Lance
> ------------------
>
>
>
> Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
> Oracle Java Engineering
> 1 Network Drive
> Burlington, MA 01803
> Lance.Andersen at oracle.com
>
>
>
>
>
>
> Best
> Lance
> ------------------
>
>
>
> Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
> Oracle Java Engineering
> 1 Network Drive
> Burlington, MA 01803
> Lance.Andersen at oracle.com
>
>
>
>
>
More information about the core-libs-dev
mailing list