"loc: wrong sig" in ZipFileSystem on a valid ZIP file (extra data field of exactly 5 bytes)

Dawid Weiss dawid.weiss at gmail.com
Wed Oct 21 18:36:27 UTC 2020


Hi Lance,

Yes, this is exactly the point where the problem is in JDK code. It's directly
related to the zip entry beyond max int offset. The code tries to read data
from a local zip entry header at locoff, here:

if (zipfs.readFullyAt(buf, 0, buf.length , locoff)

but the locoff is set to ~0 as per the spec for files exceeding
4 gigabytes, which says:

   4.4.16 relative offset of local header: (4 bytes)

       This is the offset from the start of the first disk on
       which this file appears, to where the local header SHOULD
       be found.  If an archive is in ZIP64 format and the value
       in this field is 0xFFFFFFFF, the size will be in the
       corresponding 8 byte zip64 extended information extra field.

A proper fix would be to read the local header from zip64 extra data
first. I don't know how this interferes with the rest of the code though -
didn't have enough time to look at it. As it stands, zip entries
beyond 4GB cause
an unchecked exception while attribute-scanning. This simple snippet
is enough to demonstrate it, given a ZIP entry beyond 4GB range:

try (FileSystem fs =
FileSystems.newFileSystem(Paths.get("zipWithEntryBeyond4Gb.zip"))) {
  for (Path root : fs.getRootDirectories()) {
    Files.walkFileTree(root, new SimpleFileVisitor<>() {
      @Override
      public FileVisitResult visitFile(Path file, BasicFileAttributes
attrs) throws IOException {
        return FileVisitResult.CONTINUE;
      }
    });
  }
}

The walkFileTree method is key here as it attempts to harvest
attributes (why we use it is another story -- this saves a lot of time
on large, network-mounted regular directories when you're collecting
file metadata).


Dawid


On Wed, Oct 21, 2020 at 6:39 PM Lance Andersen
<LANCE.ANDERSEN at oracle.com> wrote:
>
> Hi David,
>
> From  a quick look at ZipFileSystem this appears to be an optimization to avoid looking at the LOC extended Timestamp Extra field
>
> If a Info-ZIP Extended Timestamp (0x5455)is found then:
>
>   If the "zipinfo-time" entry was set  to “false” in the Map specified when creating the Zip FileSystem,
>
> FileSystems.newFileSystem(zipFile, Map.of("zipinfo-time", "false")
>
> and the  data size of the CEN extended Timestamp is 5 (flag + mod time)
>
> The modified time is used from the CEN Extended Timestamp extra field
>
> Otherwise get the modified time, creation time, and access time from the LOC Extended Timestamp extra field.
>
>
> —————
>
>  Extended Timestamp Extra Field:
>
>           ==============================
>
>           The following is the layout of the extended-timestamp extra block.
>           (Last Revision 19970118)
>
>           Local-header version:
>
>           Value         Size        Description
>           -----         ----        -----------
>   (time)  0x5455        Short       tag for this extra block type ("UT")
>           TSize         Short       total data size for this block
>           Flags         Byte        info bits
>           (ModTime)     Long        time of last modification (UTC/GMT)
>           (AcTime)      Long        time of last access (UTC/GMT)
>           (CrTime)      Long        time of original creation (UTC/GMT)
>
>           Central-header version:
>
>           Value         Size        Description
>           -----         ----        -----------
>   (time)  0x5455        Short       tag for this extra block type ("UT")
>           TSize         Short       total data size for this block
>           Flags         Byte        info bits (refers to local header!)
>           (ModTime)     Long        time of last modification (UTC/GMT)
>
>           The central-header extra field contains the modification time only,
>           or no timestamp at all.  TSize is used to flag its presence or
>           absence.  But note:
>
>               If "Flags" indicates that Modtime is present in the local header
>               field, it MUST be present in the central header field, too!
>               This correspondence is required because the modification time
>               value may be used to support trans-timezone freshening and
>               updating operations with zip archives.
>
>           The time values are in standard Unix signed-long format, indicating
>           the number of seconds since 1 January 1970 00:00:00.  The times
>           are relative to Coordinated Universal Time (UTC), also sometimes
>           referred to as Greenwich Mean Time (GMT).  To convert to local time,
>           the software must know the local timezone offset from UTC/GMT.
>
>           The lower three bits of Flags in both headers indicate which time-
>           stamps are present in the LOCAL extra field:
>
>                 bit 0           if set, modification time is present
>                 bit 1           if set, access time is present
>                 bit 2           if set, creation time is present
>                 bits 3-7        reserved for additional timestamps; not set
>
>           Those times that are present will appear in the order indicated, but
>           any combination of times may be omitted.  (Creation time may be
>           present without access time, for example.)  TSize should equal
>           (1 + 4*(number of set bits in Flags)), as the block is currently
>           defined.  Other timestamps may be added in the future.
>
>
> --------------
>
> It's hard to comment on why you received the error that you did but it is possible the tool that was used for writing the entry  did something unexpected.
>
> Out of curiosity, have you tried using ZipFile/ZipEntry to access the entry?
>
>
> Best,
> Lance
>
>
> On Oct 21, 2020, at 4:55 AM, Dawid Weiss <dawid.weiss at gmail.com> wrote:
>
> Hello,
>
> We've encountered a seemingly valid ZIP file (zipinfo -v shows all its
> entries are intact) that causes a runtime exception when scanning its
> contents with ZipFileSystem. The exception indicates an invalid
> signature when parsing EXTID_EXTT. I don't quite understand this
> comment in the code:
>
> case EXTID_EXTT:
>    // spec says the Extened timestamp in cen only has mtime
>    // need to read the loc to get the extra a/ctime, if flag
>    // "zipinfo-time" is not specified to false;
>    // there is performance cost (move up to loc and read) to
>    // access the loc table foreach entry;
>    if (zipfs.noExtt) {
>        if (sz == 5)
>            mtime = unixToJavaTime(LG(extra, pos + 1));
>         break;
>    }
> ...
>
> but this ZIP file has the extra data block of exactly 5 bytes, as
> indicated by zipinfo:
>
> Central directory entry #6:
> ---------------------------
> ...
>  file system or operating system of origin:      Unix
>  version of encoding software:                   3.0
>  minimum file system compatibility required:     MS-DOS, OS/2 or NT FAT
>  minimum software version required to extract:   2.0
>  compression method:                             deflated
> ...
>  extended local header:                          no
>  file last modified on (DOS date/time):          2018 Mar 1 04:56:20
>  file last modified on (UT extra field modtime): 2018 Mar 1 05:56:19 local
>  file last modified on (UT extra field modtime): 2018 Mar 1 04:56:19 UTC
> ...
>  Unix file attributes (100000 octal):            ----------
>  MS-DOS file attributes (01 hex):                read-only
>
>  The central-directory extra field contains:
>  - A subfield with ID 0x5455 (universal time) and 5 data bytes.
>    The local extra field has UTC/GMT modification/access times.
>
> The above conditional block checking for length == 5 would have worked
> in ZipFileSystem but it's surrounded by a condition over an
> externally-provided property - zipfs.noExtt is only set to true if:
>
> this.noExtt = "false".equals(env.get("zipinfo-time"));
>
> I can't share this particular ZIP file with you and I don't know how
> it was created but it seems like that "zipinfo-time" flag could be
> omitted if the length of the extra data field is exactly 5?
>
> Dawid
>
>
>
> Best
> Lance
> ------------------
>
>
>
> Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
> Oracle Java Engineering
> 1 Network Drive
> Burlington, MA 01803
> Lance.Andersen at oracle.com
>
>
>
>


More information about the core-libs-dev mailing list