"loc: wrong sig" in ZipFileSystem on a valid ZIP file (extra data field of exactly 5 bytes)

Lance Andersen LANCE.ANDERSEN at ORACLE.COM
Sat Nov 28 15:32:05 UTC 2020


Hi Dawid,

Thank you for the follow up.

> On Nov 28, 2020, at 9:34 AM, Dawid Weiss <dawid.weiss at gmail.com> wrote:
> 
> 
> I know what the problem is and I know what the fix should be (and tried to convey it) but I don't have a patch.

Given you believe you know what the fix would be,  it would be great to create a patch for ZipFileSystem and see if it addresses the issue and then contribute the fix back to the OpenJDK community :-).


> Writing the test case is not a problem but it'll be a lengthy test on slower drives (has to create a zip file that exceeds 4gb of disk space).

The test case would be configured as a manual test and would look similar to test/jdk/jdk/nio/zipfs/TestLocOffsetFromZip64EF.java <https://github.com/openjdk/jdk/pull/987#diff-73b401685399c9d59851cda49e9c0d377093086119555708253c8df99649e3c9> which also creates a 4GB zip.

> I don't think there is a way to fake the file system by using a sparse file, sadly. Would this be acceptable? If so then sure - I can provide a reproducible test case that will fail for the current implementation.

The above test runs in a reasonable time so I would expect this test case would as well.

I will look to reproduce the issue over the weekend or early next week and log a bug with what you provided in your earlier email.


Best
Lance
> 
> D.
> 
> On Fri, Nov 27, 2020 at 8:37 PM Lance Andersen <LANCE.ANDERSEN at oracle.com <mailto:LANCE.ANDERSEN at oracle.com>> wrote:
> Hi Dawid,
> 
> If you believe you have a a fix, please submit a patch along with a JTREG test case.  Ideally the test case would create zip file as part of the test.
> 
> Best
> Lance
> 
>> On Nov 27, 2020, at 7:37 AM, Dawid Weiss <dawid.weiss at gmail.com <mailto:dawid.weiss at gmail.com>> wrote:
>> 
>> Just for the archives - I'm not sure if this ended up being filed to
>> the bug system - the repro below demonstrates the bug on all JDKs up
>> to the newest one.
>> 
>> # create a single random file of 250 megabytes
>> head -c 250M < /dev/urandom > rnd.bin
>> # create a bunch of hardlinks to the same file.
>> for i in `seq -w 1 25`; do ln rnd.bin rnd-$i.bin; done
>> # create a zip archive exceeding 4gb (in stored mode so that it's faster)
>> zip -0 archive.zip rnd*.bin
>> # show the content of the archive - for all entries beyond 4gb this should
>> # show extra data blocks like below (note the order of extra data
>> blocks - this is important).
>> #
>> #  The central-directory extra field contains:
>> # - A subfield with ID 0x5455 (universal time) and 5 data bytes.
>> #   The local extra field has UTC/GMT modification/access times.
>> # - A subfield with ID 0x7875 (Unix UID/GID (any size)) and 11 data bytes:
>> #   01 04 ea 03 00 00 04 ea 03 00 00.
>> # - A subfield with ID 0x0001 (PKWARE 64-bit sizes) and 8 data bytes:
>> #  a4 06 a0 86 01 00 00 00.
>> #
>> zipinfo -v archive.zip
>> 
>> # Bug repro code:
>> cat > Test.java <<EOF
>> import java.io <https://urldefense.com/v3/__http://java.io__;!!GqivPVa7Brio!KhvWnawkon4E5sVeCpHmwDPJusjNPLMtqPh-ZBleD-U4Dv6tmWKB3cq3sT68yelqdA$>.*;
>> import java.nio.*;
>> import java.nio.file.*;
>> import java.nio.file.attribute.*;
>> 
>> public class Test {
>>  public static void main(String[]args) throws IOException {
>>    try (FileSystem fs = FileSystems.newFileSystem(Paths.get("archive.zip"))) {
>>      for (Path root : fs.getRootDirectories()) {
>>        Files.walkFileTree(root, new SimpleFileVisitor<>() {
>>          public FileVisitResult visitFile(Path file,
>> BasicFileAttributes attrs) throws IOException {
>>            return FileVisitResult.CONTINUE;
>>          }
>>        });
>>      }
>>    }
>>  }
>> }
>> EOF
>> # and run it.
>> java Test.java
>> 
>> The above ends with:
>> 
>> Exception in thread "main" java.util.zip.ZipException: loc: wrong sig ->841cd111
>>        at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.readExtra(ZipFileSystem.java:2899)
>>        at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.readCEN(ZipFileSystem.java:2600)
>>        at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.<init>(ZipFileSystem.java:2536)
>>        at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem.getFileAttributes(ZipFileSystem.java:532)
>>        at jdk.zipfs/jdk.nio.zipfs.ZipPath.readAttributes(ZipPath.java:767)
>>        at jdk.zipfs/jdk.nio.zipfs.ZipPath.readAttributes(ZipPath.java:777)
>>        at jdk.zipfs/jdk.nio.zipfs.ZipFileSystemProvider.readAttributes(ZipFileSystemProvider.java:276)
>>        at java.base/java.nio.file.Files.readAttributes(Files.java:1843)
>>        at java.base/java.nio.file.FileTreeWalker.getAttributes(FileTreeWalker.java:219)
>>        at java.base/java.nio.file.FileTreeWalker.visit(FileTreeWalker.java:276)
>>        at java.base/java.nio.file.FileTreeWalker.next(FileTreeWalker.java:373)
>>        at java.base/java.nio.file.Files.walkFileTree(Files.java:2840)
>>        at java.base/java.nio.file.Files.walkFileTree(Files.java:2876)
>>        at Test.main(Test.java:10)
>> 
>> The fix is relatively simple I think - the current code assumes entry
>> offset is < 4GB when parsing extra data for
>> ID 0x5455. This should be evaluated lazily after all extra data blocks
>> are parsed to ensure the file offset is correctly
>> updated based on block ID 0x0001, regardless of its ordering within
>> other extra data blocks.
>> 
>> 
>> 
>> On Fri, Oct 23, 2020 at 9:12 AM Dawid Weiss <dawid.weiss at gmail.com <mailto:dawid.weiss at gmail.com>> wrote:
>>> 
>>> 
>>> This looks like a legitimate bug to me (the zip file system implementation is sensitive to the order of extra parameters and may throw an unexpected error
>>> on zip64 archives exceeding 4 gigabytes). I can't file a Jira issue - no permissions - but I think it'd be worth adding one?
>>> 
>>> Dawid
>>> 
>>> On Wed, Oct 21, 2020 at 8:36 PM Dawid Weiss <dawid.weiss at gmail.com <mailto:dawid.weiss at gmail.com>> wrote:
>>>> 
>>>> Hi Lance,
>>>> 
>>>> Yes, this is exactly the point where the problem is in JDK code. It's directly
>>>> related to the zip entry beyond max int offset. The code tries to read data
>>>> from a local zip entry header at locoff, here:
>>>> 
>>>> if (zipfs.readFullyAt(buf, 0, buf.length , locoff)
>>>> 
>>>> but the locoff is set to ~0 as per the spec for files exceeding
>>>> 4 gigabytes, which says:
>>>> 
>>>>   4.4.16 relative offset of local header: (4 bytes)
>>>> 
>>>>       This is the offset from the start of the first disk on
>>>>       which this file appears, to where the local header SHOULD
>>>>       be found.  If an archive is in ZIP64 format and the value
>>>>       in this field is 0xFFFFFFFF, the size will be in the
>>>>       corresponding 8 byte zip64 extended information extra field.
>>>> 
>>>> A proper fix would be to read the local header from zip64 extra data
>>>> first. I don't know how this interferes with the rest of the code though -
>>>> didn't have enough time to look at it. As it stands, zip entries
>>>> beyond 4GB cause
>>>> an unchecked exception while attribute-scanning. This simple snippet
>>>> is enough to demonstrate it, given a ZIP entry beyond 4GB range:
>>>> 
>>>> try (FileSystem fs =
>>>> FileSystems.newFileSystem(Paths.get("zipWithEntryBeyond4Gb.zip"))) {
>>>>  for (Path root : fs.getRootDirectories()) {
>>>>    Files.walkFileTree(root, new SimpleFileVisitor<>() {
>>>>      @Override
>>>>      public FileVisitResult visitFile(Path file, BasicFileAttributes
>>>> attrs) throws IOException {
>>>>        return FileVisitResult.CONTINUE;
>>>>      }
>>>>    });
>>>>  }
>>>> }
>>>> 
>>>> The walkFileTree method is key here as it attempts to harvest
>>>> attributes (why we use it is another story -- this saves a lot of time
>>>> on large, network-mounted regular directories when you're collecting
>>>> file metadata).
>>>> 
>>>> 
>>>> Dawid
>>>> 
>>>> 
>>>> On Wed, Oct 21, 2020 at 6:39 PM Lance Andersen
>>>> <LANCE.ANDERSEN at oracle.com <mailto:LANCE.ANDERSEN at oracle.com>> wrote:
>>>>> 
>>>>> Hi David,
>>>>> 
>>>>> From  a quick look at ZipFileSystem this appears to be an optimization to avoid looking at the LOC extended Timestamp Extra field
>>>>> 
>>>>> If a Info-ZIP Extended Timestamp (0x5455)is found then:
>>>>> 
>>>>>  If the "zipinfo-time" entry was set  to “false” in the Map specified when creating the Zip FileSystem,
>>>>> 
>>>>> FileSystems.newFileSystem(zipFile, Map.of("zipinfo-time", "false")
>>>>> 
>>>>> and the  data size of the CEN extended Timestamp is 5 (flag + mod time)
>>>>> 
>>>>> The modified time is used from the CEN Extended Timestamp extra field
>>>>> 
>>>>> Otherwise get the modified time, creation time, and access time from the LOC Extended Timestamp extra field.
>>>>> 
>>>>> 
>>>>> —————
>>>>> 
>>>>> Extended Timestamp Extra Field:
>>>>> 
>>>>>          ==============================
>>>>> 
>>>>>          The following is the layout of the extended-timestamp extra block.
>>>>>          (Last Revision 19970118)
>>>>> 
>>>>>          Local-header version:
>>>>> 
>>>>>          Value         Size        Description
>>>>>          -----         ----        -----------
>>>>>  (time)  0x5455        Short       tag for this extra block type ("UT")
>>>>>          TSize         Short       total data size for this block
>>>>>          Flags         Byte        info bits
>>>>>          (ModTime)     Long        time of last modification (UTC/GMT)
>>>>>          (AcTime)      Long        time of last access (UTC/GMT)
>>>>>          (CrTime)      Long        time of original creation (UTC/GMT)
>>>>> 
>>>>>          Central-header version:
>>>>> 
>>>>>          Value         Size        Description
>>>>>          -----         ----        -----------
>>>>>  (time)  0x5455        Short       tag for this extra block type ("UT")
>>>>>          TSize         Short       total data size for this block
>>>>>          Flags         Byte        info bits (refers to local header!)
>>>>>          (ModTime)     Long        time of last modification (UTC/GMT)
>>>>> 
>>>>>          The central-header extra field contains the modification time only,
>>>>>          or no timestamp at all.  TSize is used to flag its presence or
>>>>>          absence.  But note:
>>>>> 
>>>>>              If "Flags" indicates that Modtime is present in the local header
>>>>>              field, it MUST be present in the central header field, too!
>>>>>              This correspondence is required because the modification time
>>>>>              value may be used to support trans-timezone freshening and
>>>>>              updating operations with zip archives.
>>>>> 
>>>>>          The time values are in standard Unix signed-long format, indicating
>>>>>          the number of seconds since 1 January 1970 00:00:00.  The times
>>>>>          are relative to Coordinated Universal Time (UTC), also sometimes
>>>>>          referred to as Greenwich Mean Time (GMT).  To convert to local time,
>>>>>          the software must know the local timezone offset from UTC/GMT.
>>>>> 
>>>>>          The lower three bits of Flags in both headers indicate which time-
>>>>>          stamps are present in the LOCAL extra field:
>>>>> 
>>>>>                bit 0           if set, modification time is present
>>>>>                bit 1           if set, access time is present
>>>>>                bit 2           if set, creation time is present
>>>>>                bits 3-7        reserved for additional timestamps; not set
>>>>> 
>>>>>          Those times that are present will appear in the order indicated, but
>>>>>          any combination of times may be omitted.  (Creation time may be
>>>>>          present without access time, for example.)  TSize should equal
>>>>>          (1 + 4*(number of set bits in Flags)), as the block is currently
>>>>>          defined.  Other timestamps may be added in the future.
>>>>> 
>>>>> 
>>>>> --------------
>>>>> 
>>>>> It's hard to comment on why you received the error that you did but it is possible the tool that was used for writing the entry  did something unexpected.
>>>>> 
>>>>> Out of curiosity, have you tried using ZipFile/ZipEntry to access the entry?
>>>>> 
>>>>> 
>>>>> Best,
>>>>> Lance
>>>>> 
>>>>> 
>>>>> On Oct 21, 2020, at 4:55 AM, Dawid Weiss <dawid.weiss at gmail.com <mailto:dawid.weiss at gmail.com>> wrote:
>>>>> 
>>>>> Hello,
>>>>> 
>>>>> We've encountered a seemingly valid ZIP file (zipinfo -v shows all its
>>>>> entries are intact) that causes a runtime exception when scanning its
>>>>> contents with ZipFileSystem. The exception indicates an invalid
>>>>> signature when parsing EXTID_EXTT. I don't quite understand this
>>>>> comment in the code:
>>>>> 
>>>>> case EXTID_EXTT:
>>>>>   // spec says the Extened timestamp in cen only has mtime
>>>>>   // need to read the loc to get the extra a/ctime, if flag
>>>>>   // "zipinfo-time" is not specified to false;
>>>>>   // there is performance cost (move up to loc and read) to
>>>>>   // access the loc table foreach entry;
>>>>>   if (zipfs.noExtt) {
>>>>>       if (sz == 5)
>>>>>           mtime = unixToJavaTime(LG(extra, pos + 1));
>>>>>        break;
>>>>>   }
>>>>> ...
>>>>> 
>>>>> but this ZIP file has the extra data block of exactly 5 bytes, as
>>>>> indicated by zipinfo:
>>>>> 
>>>>> Central directory entry #6:
>>>>> ---------------------------
>>>>> ...
>>>>> file system or operating system of origin:      Unix
>>>>> version of encoding software:                   3.0
>>>>> minimum file system compatibility required:     MS-DOS, OS/2 or NT FAT
>>>>> minimum software version required to extract:   2.0
>>>>> compression method:                             deflated
>>>>> ...
>>>>> extended local header:                          no
>>>>> file last modified on (DOS date/time):          2018 Mar 1 04:56:20
>>>>> file last modified on (UT extra field modtime): 2018 Mar 1 05:56:19 local
>>>>> file last modified on (UT extra field modtime): 2018 Mar 1 04:56:19 UTC
>>>>> ...
>>>>> Unix file attributes (100000 octal):            ----------
>>>>> MS-DOS file attributes (01 hex):                read-only
>>>>> 
>>>>> The central-directory extra field contains:
>>>>> - A subfield with ID 0x5455 (universal time) and 5 data bytes.
>>>>>   The local extra field has UTC/GMT modification/access times.
>>>>> 
>>>>> The above conditional block checking for length == 5 would have worked
>>>>> in ZipFileSystem but it's surrounded by a condition over an
>>>>> externally-provided property - zipfs.noExtt is only set to true if:
>>>>> 
>>>>> this.noExtt = "false".equals(env.get("zipinfo-time"));
>>>>> 
>>>>> I can't share this particular ZIP file with you and I don't know how
>>>>> it was created but it seems like that "zipinfo-time" flag could be
>>>>> omitted if the length of the extra data field is exactly 5?
>>>>> 
>>>>> Dawid
>>>>> 
>>>>> 
>>>>> 
>>>>> Best
>>>>> Lance
>>>>> ------------------
>>>>> 
>>>>> 
>>>>> 
>>>>> Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
>>>>> Oracle Java Engineering
>>>>> 1 Network Drive
>>>>> Burlington, MA 01803
>>>>> Lance.Andersen at oracle.com <mailto:Lance.Andersen at oracle.com>
>>>>> 
>>>>> 
>>>>> 
>>>>> 
> 
> 
> Best
> Lance
> ------------------
> 
> <oracle_sig_logo.gif>
> 
> 
> Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
> Oracle Java Engineering 
> 1 Network Drive 
> Burlington, MA 01803
> Lance.Andersen at oracle.com <mailto:Lance.Andersen at oracle.com>

Best
Lance
------------------




Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
Oracle Java Engineering 
1 Network Drive 
Burlington, MA 01803
Lance.Andersen at oracle.com






More information about the core-libs-dev mailing list