"loc: wrong sig" in ZipFileSystem on a valid ZIP file (extra data field of exactly 5 bytes)
Dawid Weiss
dawid.weiss at gmail.com
Fri Oct 23 07:12:34 UTC 2020
This looks like a legitimate bug to me (the zip file system implementation
is sensitive to the order of extra parameters and may throw an unexpected
error
on zip64 archives exceeding 4 gigabytes). I can't file a Jira issue - no
permissions - but I think it'd be worth adding one?
Dawid
On Wed, Oct 21, 2020 at 8:36 PM Dawid Weiss <dawid.weiss at gmail.com> wrote:
> Hi Lance,
>
> Yes, this is exactly the point where the problem is in JDK code. It's
> directly
> related to the zip entry beyond max int offset. The code tries to read data
> from a local zip entry header at locoff, here:
>
> if (zipfs.readFullyAt(buf, 0, buf.length , locoff)
>
> but the locoff is set to ~0 as per the spec for files exceeding
> 4 gigabytes, which says:
>
> 4.4.16 relative offset of local header: (4 bytes)
>
> This is the offset from the start of the first disk on
> which this file appears, to where the local header SHOULD
> be found. If an archive is in ZIP64 format and the value
> in this field is 0xFFFFFFFF, the size will be in the
> corresponding 8 byte zip64 extended information extra field.
>
> A proper fix would be to read the local header from zip64 extra data
> first. I don't know how this interferes with the rest of the code though -
> didn't have enough time to look at it. As it stands, zip entries
> beyond 4GB cause
> an unchecked exception while attribute-scanning. This simple snippet
> is enough to demonstrate it, given a ZIP entry beyond 4GB range:
>
> try (FileSystem fs =
> FileSystems.newFileSystem(Paths.get("zipWithEntryBeyond4Gb.zip"))) {
> for (Path root : fs.getRootDirectories()) {
> Files.walkFileTree(root, new SimpleFileVisitor<>() {
> @Override
> public FileVisitResult visitFile(Path file, BasicFileAttributes
> attrs) throws IOException {
> return FileVisitResult.CONTINUE;
> }
> });
> }
> }
>
> The walkFileTree method is key here as it attempts to harvest
> attributes (why we use it is another story -- this saves a lot of time
> on large, network-mounted regular directories when you're collecting
> file metadata).
>
>
> Dawid
>
>
> On Wed, Oct 21, 2020 at 6:39 PM Lance Andersen
> <LANCE.ANDERSEN at oracle.com> wrote:
> >
> > Hi David,
> >
> > From a quick look at ZipFileSystem this appears to be an optimization
> to avoid looking at the LOC extended Timestamp Extra field
> >
> > If a Info-ZIP Extended Timestamp (0x5455)is found then:
> >
> > If the "zipinfo-time" entry was set to “false” in the Map specified
> when creating the Zip FileSystem,
> >
> > FileSystems.newFileSystem(zipFile, Map.of("zipinfo-time", "false")
> >
> > and the data size of the CEN extended Timestamp is 5 (flag + mod time)
> >
> > The modified time is used from the CEN Extended Timestamp extra field
> >
> > Otherwise get the modified time, creation time, and access time from the
> LOC Extended Timestamp extra field.
> >
> >
> > —————
> >
> > Extended Timestamp Extra Field:
> >
> > ==============================
> >
> > The following is the layout of the extended-timestamp extra
> block.
> > (Last Revision 19970118)
> >
> > Local-header version:
> >
> > Value Size Description
> > ----- ---- -----------
> > (time) 0x5455 Short tag for this extra block type ("UT")
> > TSize Short total data size for this block
> > Flags Byte info bits
> > (ModTime) Long time of last modification (UTC/GMT)
> > (AcTime) Long time of last access (UTC/GMT)
> > (CrTime) Long time of original creation (UTC/GMT)
> >
> > Central-header version:
> >
> > Value Size Description
> > ----- ---- -----------
> > (time) 0x5455 Short tag for this extra block type ("UT")
> > TSize Short total data size for this block
> > Flags Byte info bits (refers to local header!)
> > (ModTime) Long time of last modification (UTC/GMT)
> >
> > The central-header extra field contains the modification time
> only,
> > or no timestamp at all. TSize is used to flag its presence or
> > absence. But note:
> >
> > If "Flags" indicates that Modtime is present in the local
> header
> > field, it MUST be present in the central header field, too!
> > This correspondence is required because the modification
> time
> > value may be used to support trans-timezone freshening and
> > updating operations with zip archives.
> >
> > The time values are in standard Unix signed-long format,
> indicating
> > the number of seconds since 1 January 1970 00:00:00. The times
> > are relative to Coordinated Universal Time (UTC), also
> sometimes
> > referred to as Greenwich Mean Time (GMT). To convert to local
> time,
> > the software must know the local timezone offset from UTC/GMT.
> >
> > The lower three bits of Flags in both headers indicate which
> time-
> > stamps are present in the LOCAL extra field:
> >
> > bit 0 if set, modification time is present
> > bit 1 if set, access time is present
> > bit 2 if set, creation time is present
> > bits 3-7 reserved for additional timestamps; not
> set
> >
> > Those times that are present will appear in the order
> indicated, but
> > any combination of times may be omitted. (Creation time may be
> > present without access time, for example.) TSize should equal
> > (1 + 4*(number of set bits in Flags)), as the block is
> currently
> > defined. Other timestamps may be added in the future.
> >
> >
> > --------------
> >
> > It's hard to comment on why you received the error that you did but it
> is possible the tool that was used for writing the entry did something
> unexpected.
> >
> > Out of curiosity, have you tried using ZipFile/ZipEntry to access the
> entry?
> >
> >
> > Best,
> > Lance
> >
> >
> > On Oct 21, 2020, at 4:55 AM, Dawid Weiss <dawid.weiss at gmail.com> wrote:
> >
> > Hello,
> >
> > We've encountered a seemingly valid ZIP file (zipinfo -v shows all its
> > entries are intact) that causes a runtime exception when scanning its
> > contents with ZipFileSystem. The exception indicates an invalid
> > signature when parsing EXTID_EXTT. I don't quite understand this
> > comment in the code:
> >
> > case EXTID_EXTT:
> > // spec says the Extened timestamp in cen only has mtime
> > // need to read the loc to get the extra a/ctime, if flag
> > // "zipinfo-time" is not specified to false;
> > // there is performance cost (move up to loc and read) to
> > // access the loc table foreach entry;
> > if (zipfs.noExtt) {
> > if (sz == 5)
> > mtime = unixToJavaTime(LG(extra, pos + 1));
> > break;
> > }
> > ...
> >
> > but this ZIP file has the extra data block of exactly 5 bytes, as
> > indicated by zipinfo:
> >
> > Central directory entry #6:
> > ---------------------------
> > ...
> > file system or operating system of origin: Unix
> > version of encoding software: 3.0
> > minimum file system compatibility required: MS-DOS, OS/2 or NT FAT
> > minimum software version required to extract: 2.0
> > compression method: deflated
> > ...
> > extended local header: no
> > file last modified on (DOS date/time): 2018 Mar 1 04:56:20
> > file last modified on (UT extra field modtime): 2018 Mar 1 05:56:19
> local
> > file last modified on (UT extra field modtime): 2018 Mar 1 04:56:19 UTC
> > ...
> > Unix file attributes (100000 octal): ----------
> > MS-DOS file attributes (01 hex): read-only
> >
> > The central-directory extra field contains:
> > - A subfield with ID 0x5455 (universal time) and 5 data bytes.
> > The local extra field has UTC/GMT modification/access times.
> >
> > The above conditional block checking for length == 5 would have worked
> > in ZipFileSystem but it's surrounded by a condition over an
> > externally-provided property - zipfs.noExtt is only set to true if:
> >
> > this.noExtt = "false".equals(env.get("zipinfo-time"));
> >
> > I can't share this particular ZIP file with you and I don't know how
> > it was created but it seems like that "zipinfo-time" flag could be
> > omitted if the length of the extra data field is exactly 5?
> >
> > Dawid
> >
> >
> >
> > Best
> > Lance
> > ------------------
> >
> >
> >
> > Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
> > Oracle Java Engineering
> > 1 Network Drive
> > Burlington, MA 01803
> > Lance.Andersen at oracle.com
> >
> >
> >
> >
>
More information about the core-libs-dev
mailing list