"loc: wrong sig" in ZipFileSystem on a valid ZIP file (extra data field of exactly 5 bytes)

Lance Andersen LANCE.ANDERSEN at ORACLE.COM
Fri Nov 27 19:37:41 UTC 2020


Hi Dawid,

If you believe you have a a fix, please submit a patch along with a JTREG test case.  Ideally the test case would create zip file as part of the test.

Best
Lance

> On Nov 27, 2020, at 7:37 AM, Dawid Weiss <dawid.weiss at gmail.com> wrote:
> 
> Just for the archives - I'm not sure if this ended up being filed to
> the bug system - the repro below demonstrates the bug on all JDKs up
> to the newest one.
> 
> # create a single random file of 250 megabytes
> head -c 250M < /dev/urandom > rnd.bin
> # create a bunch of hardlinks to the same file.
> for i in `seq -w 1 25`; do ln rnd.bin rnd-$i.bin; done
> # create a zip archive exceeding 4gb (in stored mode so that it's faster)
> zip -0 archive.zip rnd*.bin
> # show the content of the archive - for all entries beyond 4gb this should
> # show extra data blocks like below (note the order of extra data
> blocks - this is important).
> #
> #  The central-directory extra field contains:
> # - A subfield with ID 0x5455 (universal time) and 5 data bytes.
> #   The local extra field has UTC/GMT modification/access times.
> # - A subfield with ID 0x7875 (Unix UID/GID (any size)) and 11 data bytes:
> #   01 04 ea 03 00 00 04 ea 03 00 00.
> # - A subfield with ID 0x0001 (PKWARE 64-bit sizes) and 8 data bytes:
> #  a4 06 a0 86 01 00 00 00.
> #
> zipinfo -v archive.zip
> 
> # Bug repro code:
> cat > Test.java <<EOF
> import java.io.*;
> import java.nio.*;
> import java.nio.file.*;
> import java.nio.file.attribute.*;
> 
> public class Test {
>  public static void main(String[]args) throws IOException {
>    try (FileSystem fs = FileSystems.newFileSystem(Paths.get("archive.zip"))) {
>      for (Path root : fs.getRootDirectories()) {
>        Files.walkFileTree(root, new SimpleFileVisitor<>() {
>          public FileVisitResult visitFile(Path file,
> BasicFileAttributes attrs) throws IOException {
>            return FileVisitResult.CONTINUE;
>          }
>        });
>      }
>    }
>  }
> }
> EOF
> # and run it.
> java Test.java
> 
> The above ends with:
> 
> Exception in thread "main" java.util.zip.ZipException: loc: wrong sig ->841cd111
>        at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.readExtra(ZipFileSystem.java:2899)
>        at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.readCEN(ZipFileSystem.java:2600)
>        at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.<init>(ZipFileSystem.java:2536)
>        at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem.getFileAttributes(ZipFileSystem.java:532)
>        at jdk.zipfs/jdk.nio.zipfs.ZipPath.readAttributes(ZipPath.java:767)
>        at jdk.zipfs/jdk.nio.zipfs.ZipPath.readAttributes(ZipPath.java:777)
>        at jdk.zipfs/jdk.nio.zipfs.ZipFileSystemProvider.readAttributes(ZipFileSystemProvider.java:276)
>        at java.base/java.nio.file.Files.readAttributes(Files.java:1843)
>        at java.base/java.nio.file.FileTreeWalker.getAttributes(FileTreeWalker.java:219)
>        at java.base/java.nio.file.FileTreeWalker.visit(FileTreeWalker.java:276)
>        at java.base/java.nio.file.FileTreeWalker.next(FileTreeWalker.java:373)
>        at java.base/java.nio.file.Files.walkFileTree(Files.java:2840)
>        at java.base/java.nio.file.Files.walkFileTree(Files.java:2876)
>        at Test.main(Test.java:10)
> 
> The fix is relatively simple I think - the current code assumes entry
> offset is < 4GB when parsing extra data for
> ID 0x5455. This should be evaluated lazily after all extra data blocks
> are parsed to ensure the file offset is correctly
> updated based on block ID 0x0001, regardless of its ordering within
> other extra data blocks.
> 
> 
> 
> On Fri, Oct 23, 2020 at 9:12 AM Dawid Weiss <dawid.weiss at gmail.com> wrote:
>> 
>> 
>> This looks like a legitimate bug to me (the zip file system implementation is sensitive to the order of extra parameters and may throw an unexpected error
>> on zip64 archives exceeding 4 gigabytes). I can't file a Jira issue - no permissions - but I think it'd be worth adding one?
>> 
>> Dawid
>> 
>> On Wed, Oct 21, 2020 at 8:36 PM Dawid Weiss <dawid.weiss at gmail.com> wrote:
>>> 
>>> Hi Lance,
>>> 
>>> Yes, this is exactly the point where the problem is in JDK code. It's directly
>>> related to the zip entry beyond max int offset. The code tries to read data
>>> from a local zip entry header at locoff, here:
>>> 
>>> if (zipfs.readFullyAt(buf, 0, buf.length , locoff)
>>> 
>>> but the locoff is set to ~0 as per the spec for files exceeding
>>> 4 gigabytes, which says:
>>> 
>>>   4.4.16 relative offset of local header: (4 bytes)
>>> 
>>>       This is the offset from the start of the first disk on
>>>       which this file appears, to where the local header SHOULD
>>>       be found.  If an archive is in ZIP64 format and the value
>>>       in this field is 0xFFFFFFFF, the size will be in the
>>>       corresponding 8 byte zip64 extended information extra field.
>>> 
>>> A proper fix would be to read the local header from zip64 extra data
>>> first. I don't know how this interferes with the rest of the code though -
>>> didn't have enough time to look at it. As it stands, zip entries
>>> beyond 4GB cause
>>> an unchecked exception while attribute-scanning. This simple snippet
>>> is enough to demonstrate it, given a ZIP entry beyond 4GB range:
>>> 
>>> try (FileSystem fs =
>>> FileSystems.newFileSystem(Paths.get("zipWithEntryBeyond4Gb.zip"))) {
>>>  for (Path root : fs.getRootDirectories()) {
>>>    Files.walkFileTree(root, new SimpleFileVisitor<>() {
>>>      @Override
>>>      public FileVisitResult visitFile(Path file, BasicFileAttributes
>>> attrs) throws IOException {
>>>        return FileVisitResult.CONTINUE;
>>>      }
>>>    });
>>>  }
>>> }
>>> 
>>> The walkFileTree method is key here as it attempts to harvest
>>> attributes (why we use it is another story -- this saves a lot of time
>>> on large, network-mounted regular directories when you're collecting
>>> file metadata).
>>> 
>>> 
>>> Dawid
>>> 
>>> 
>>> On Wed, Oct 21, 2020 at 6:39 PM Lance Andersen
>>> <LANCE.ANDERSEN at oracle.com> wrote:
>>>> 
>>>> Hi David,
>>>> 
>>>> From  a quick look at ZipFileSystem this appears to be an optimization to avoid looking at the LOC extended Timestamp Extra field
>>>> 
>>>> If a Info-ZIP Extended Timestamp (0x5455)is found then:
>>>> 
>>>>  If the "zipinfo-time" entry was set  to “false” in the Map specified when creating the Zip FileSystem,
>>>> 
>>>> FileSystems.newFileSystem(zipFile, Map.of("zipinfo-time", "false")
>>>> 
>>>> and the  data size of the CEN extended Timestamp is 5 (flag + mod time)
>>>> 
>>>> The modified time is used from the CEN Extended Timestamp extra field
>>>> 
>>>> Otherwise get the modified time, creation time, and access time from the LOC Extended Timestamp extra field.
>>>> 
>>>> 
>>>> —————
>>>> 
>>>> Extended Timestamp Extra Field:
>>>> 
>>>>          ==============================
>>>> 
>>>>          The following is the layout of the extended-timestamp extra block.
>>>>          (Last Revision 19970118)
>>>> 
>>>>          Local-header version:
>>>> 
>>>>          Value         Size        Description
>>>>          -----         ----        -----------
>>>>  (time)  0x5455        Short       tag for this extra block type ("UT")
>>>>          TSize         Short       total data size for this block
>>>>          Flags         Byte        info bits
>>>>          (ModTime)     Long        time of last modification (UTC/GMT)
>>>>          (AcTime)      Long        time of last access (UTC/GMT)
>>>>          (CrTime)      Long        time of original creation (UTC/GMT)
>>>> 
>>>>          Central-header version:
>>>> 
>>>>          Value         Size        Description
>>>>          -----         ----        -----------
>>>>  (time)  0x5455        Short       tag for this extra block type ("UT")
>>>>          TSize         Short       total data size for this block
>>>>          Flags         Byte        info bits (refers to local header!)
>>>>          (ModTime)     Long        time of last modification (UTC/GMT)
>>>> 
>>>>          The central-header extra field contains the modification time only,
>>>>          or no timestamp at all.  TSize is used to flag its presence or
>>>>          absence.  But note:
>>>> 
>>>>              If "Flags" indicates that Modtime is present in the local header
>>>>              field, it MUST be present in the central header field, too!
>>>>              This correspondence is required because the modification time
>>>>              value may be used to support trans-timezone freshening and
>>>>              updating operations with zip archives.
>>>> 
>>>>          The time values are in standard Unix signed-long format, indicating
>>>>          the number of seconds since 1 January 1970 00:00:00.  The times
>>>>          are relative to Coordinated Universal Time (UTC), also sometimes
>>>>          referred to as Greenwich Mean Time (GMT).  To convert to local time,
>>>>          the software must know the local timezone offset from UTC/GMT.
>>>> 
>>>>          The lower three bits of Flags in both headers indicate which time-
>>>>          stamps are present in the LOCAL extra field:
>>>> 
>>>>                bit 0           if set, modification time is present
>>>>                bit 1           if set, access time is present
>>>>                bit 2           if set, creation time is present
>>>>                bits 3-7        reserved for additional timestamps; not set
>>>> 
>>>>          Those times that are present will appear in the order indicated, but
>>>>          any combination of times may be omitted.  (Creation time may be
>>>>          present without access time, for example.)  TSize should equal
>>>>          (1 + 4*(number of set bits in Flags)), as the block is currently
>>>>          defined.  Other timestamps may be added in the future.
>>>> 
>>>> 
>>>> --------------
>>>> 
>>>> It's hard to comment on why you received the error that you did but it is possible the tool that was used for writing the entry  did something unexpected.
>>>> 
>>>> Out of curiosity, have you tried using ZipFile/ZipEntry to access the entry?
>>>> 
>>>> 
>>>> Best,
>>>> Lance
>>>> 
>>>> 
>>>> On Oct 21, 2020, at 4:55 AM, Dawid Weiss <dawid.weiss at gmail.com> wrote:
>>>> 
>>>> Hello,
>>>> 
>>>> We've encountered a seemingly valid ZIP file (zipinfo -v shows all its
>>>> entries are intact) that causes a runtime exception when scanning its
>>>> contents with ZipFileSystem. The exception indicates an invalid
>>>> signature when parsing EXTID_EXTT. I don't quite understand this
>>>> comment in the code:
>>>> 
>>>> case EXTID_EXTT:
>>>>   // spec says the Extened timestamp in cen only has mtime
>>>>   // need to read the loc to get the extra a/ctime, if flag
>>>>   // "zipinfo-time" is not specified to false;
>>>>   // there is performance cost (move up to loc and read) to
>>>>   // access the loc table foreach entry;
>>>>   if (zipfs.noExtt) {
>>>>       if (sz == 5)
>>>>           mtime = unixToJavaTime(LG(extra, pos + 1));
>>>>        break;
>>>>   }
>>>> ...
>>>> 
>>>> but this ZIP file has the extra data block of exactly 5 bytes, as
>>>> indicated by zipinfo:
>>>> 
>>>> Central directory entry #6:
>>>> ---------------------------
>>>> ...
>>>> file system or operating system of origin:      Unix
>>>> version of encoding software:                   3.0
>>>> minimum file system compatibility required:     MS-DOS, OS/2 or NT FAT
>>>> minimum software version required to extract:   2.0
>>>> compression method:                             deflated
>>>> ...
>>>> extended local header:                          no
>>>> file last modified on (DOS date/time):          2018 Mar 1 04:56:20
>>>> file last modified on (UT extra field modtime): 2018 Mar 1 05:56:19 local
>>>> file last modified on (UT extra field modtime): 2018 Mar 1 04:56:19 UTC
>>>> ...
>>>> Unix file attributes (100000 octal):            ----------
>>>> MS-DOS file attributes (01 hex):                read-only
>>>> 
>>>> The central-directory extra field contains:
>>>> - A subfield with ID 0x5455 (universal time) and 5 data bytes.
>>>>   The local extra field has UTC/GMT modification/access times.
>>>> 
>>>> The above conditional block checking for length == 5 would have worked
>>>> in ZipFileSystem but it's surrounded by a condition over an
>>>> externally-provided property - zipfs.noExtt is only set to true if:
>>>> 
>>>> this.noExtt = "false".equals(env.get("zipinfo-time"));
>>>> 
>>>> I can't share this particular ZIP file with you and I don't know how
>>>> it was created but it seems like that "zipinfo-time" flag could be
>>>> omitted if the length of the extra data field is exactly 5?
>>>> 
>>>> Dawid
>>>> 
>>>> 
>>>> 
>>>> Best
>>>> Lance
>>>> ------------------
>>>> 
>>>> 
>>>> 
>>>> Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
>>>> Oracle Java Engineering
>>>> 1 Network Drive
>>>> Burlington, MA 01803
>>>> Lance.Andersen at oracle.com
>>>> 
>>>> 
>>>> 
>>>> 


Best
Lance
------------------




Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
Oracle Java Engineering 
1 Network Drive 
Burlington, MA 01803
Lance.Andersen at oracle.com






More information about the core-libs-dev mailing list