"loc: wrong sig" in ZipFileSystem on a valid ZIP file (extra data field of exactly 5 bytes)

Lance Andersen LANCE.ANDERSEN at ORACLE.COM
Sun Nov 29 22:22:15 UTC 2020


Hi Dawid

Thank you for getting back to me.

> On Nov 29, 2020, at 5:00 PM, Dawid Weiss <dawid.weiss at gmail.com> wrote:
> 
> 
> Hi Lance,
> 
> I looked at the test you pointed to... This is exactly the test for the problem I originally reported on October 21st... The bug has been filed
> for this on October 25th (by you) -- https://bugs.openjdk.java.net/browse/JDK-8255380 <https://bugs.openjdk.java.net/browse/JDK-8255380> and the fix is in commit 6606e0909cbda9. Weird.
> 
> I think that test is slightly broken in that it passes "zipinfo-time=false" -- this is exactly the workaround that makes it work everywhere. The
> test should *not* have that flag set?

The ZipFileSystem check is case sensitive:

———
this.noExtt = "false".equals(env.get("zipinfo-time"));
————

> 
>         try (FileSystem fs =
>                      FileSystems.newFileSystem(Paths.get(ZIP_FILE_NAME), Map.of("zipinfo-time", "False"))) {
> 

You are right though, it would be clearer if I change it to Map.of(); 

Thank you for pointing that out

Best
Lance
> Anyway, the problem is fixed. I think it'd be good to backport to LTS releases too.
> 
> Dawid
> 
> On Sat, Nov 28, 2020 at 4:32 PM Lance Andersen <LANCE.ANDERSEN at oracle.com <mailto:LANCE.ANDERSEN at oracle.com>> wrote:
> Hi Dawid,
> 
> Thank you for the follow up.
> 
>> On Nov 28, 2020, at 9:34 AM, Dawid Weiss <dawid.weiss at gmail.com <mailto:dawid.weiss at gmail.com>> wrote:
>> 
>> 
>> I know what the problem is and I know what the fix should be (and tried to convey it) but I don't have a patch.
> 
> Given you believe you know what the fix would be,  it would be great to create a patch for ZipFileSystem and see if it addresses the issue and then contribute the fix back to the OpenJDK community :-).
> 
> 
>> Writing the test case is not a problem but it'll be a lengthy test on slower drives (has to create a zip file that exceeds 4gb of disk space).
> 
> The test case would be configured as a manual test and would look similar to test/jdk/jdk/nio/zipfs/TestLocOffsetFromZip64EF.java <https://urldefense.com/v3/__https://github.com/openjdk/jdk/pull/987*diff-73b401685399c9d59851cda49e9c0d377093086119555708253c8df99649e3c9__;Iw!!GqivPVa7Brio!O7QI8ncxtRbHUjGhW-GmeXZnM3KYdut-r74YfTVxu6etbBcluUHLqDp6DaLsampu2w$> which also creates a 4GB zip.
> 
>> I don't think there is a way to fake the file system by using a sparse file, sadly. Would this be acceptable? If so then sure - I can provide a reproducible test case that will fail for the current implementation.
> 
> The above test runs in a reasonable time so I would expect this test case would as well.
> 
> I will look to reproduce the issue over the weekend or early next week and log a bug with what you provided in your earlier email.
> 
> 
> Best
> Lance
>> 
>> D.
>> 
>> On Fri, Nov 27, 2020 at 8:37 PM Lance Andersen <LANCE.ANDERSEN at oracle.com <mailto:LANCE.ANDERSEN at oracle.com>> wrote:
>> Hi Dawid,
>> 
>> If you believe you have a a fix, please submit a patch along with a JTREG test case.  Ideally the test case would create zip file as part of the test.
>> 
>> Best
>> Lance
>> 
>>> On Nov 27, 2020, at 7:37 AM, Dawid Weiss <dawid.weiss at gmail.com <mailto:dawid.weiss at gmail.com>> wrote:
>>> 
>>> Just for the archives - I'm not sure if this ended up being filed to
>>> the bug system - the repro below demonstrates the bug on all JDKs up
>>> to the newest one.
>>> 
>>> # create a single random file of 250 megabytes
>>> head -c 250M < /dev/urandom > rnd.bin
>>> # create a bunch of hardlinks to the same file.
>>> for i in `seq -w 1 25`; do ln rnd.bin rnd-$i.bin; done
>>> # create a zip archive exceeding 4gb (in stored mode so that it's faster)
>>> zip -0 archive.zip rnd*.bin
>>> # show the content of the archive - for all entries beyond 4gb this should
>>> # show extra data blocks like below (note the order of extra data
>>> blocks - this is important).
>>> #
>>> #  The central-directory extra field contains:
>>> # - A subfield with ID 0x5455 (universal time) and 5 data bytes.
>>> #   The local extra field has UTC/GMT modification/access times.
>>> # - A subfield with ID 0x7875 (Unix UID/GID (any size)) and 11 data bytes:
>>> #   01 04 ea 03 00 00 04 ea 03 00 00.
>>> # - A subfield with ID 0x0001 (PKWARE 64-bit sizes) and 8 data bytes:
>>> #  a4 06 a0 86 01 00 00 00.
>>> #
>>> zipinfo -v archive.zip
>>> 
>>> # Bug repro code:
>>> cat > Test.java <<EOF
>>> import java.io <https://urldefense.com/v3/__http://java.io__;!!GqivPVa7Brio!KhvWnawkon4E5sVeCpHmwDPJusjNPLMtqPh-ZBleD-U4Dv6tmWKB3cq3sT68yelqdA$>.*;
>>> import java.nio.*;
>>> import java.nio.file.*;
>>> import java.nio.file.attribute.*;
>>> 
>>> public class Test {
>>>  public static void main(String[]args) throws IOException {
>>>    try (FileSystem fs = FileSystems.newFileSystem(Paths.get("archive.zip"))) {
>>>      for (Path root : fs.getRootDirectories()) {
>>>        Files.walkFileTree(root, new SimpleFileVisitor<>() {
>>>          public FileVisitResult visitFile(Path file,
>>> BasicFileAttributes attrs) throws IOException {
>>>            return FileVisitResult.CONTINUE;
>>>          }
>>>        });
>>>      }
>>>    }
>>>  }
>>> }
>>> EOF
>>> # and run it.
>>> java Test.java
>>> 
>>> The above ends with:
>>> 
>>> Exception in thread "main" java.util.zip.ZipException: loc: wrong sig ->841cd111
>>>        at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.readExtra(ZipFileSystem.java:2899)
>>>        at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.readCEN(ZipFileSystem.java:2600)
>>>        at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.<init>(ZipFileSystem.java:2536)
>>>        at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem.getFileAttributes(ZipFileSystem.java:532)
>>>        at jdk.zipfs/jdk.nio.zipfs.ZipPath.readAttributes(ZipPath.java:767)
>>>        at jdk.zipfs/jdk.nio.zipfs.ZipPath.readAttributes(ZipPath.java:777)
>>>        at jdk.zipfs/jdk.nio.zipfs.ZipFileSystemProvider.readAttributes(ZipFileSystemProvider.java:276)
>>>        at java.base/java.nio.file.Files.readAttributes(Files.java:1843)
>>>        at java.base/java.nio.file.FileTreeWalker.getAttributes(FileTreeWalker.java:219)
>>>        at java.base/java.nio.file.FileTreeWalker.visit(FileTreeWalker.java:276)
>>>        at java.base/java.nio.file.FileTreeWalker.next(FileTreeWalker.java:373)
>>>        at java.base/java.nio.file.Files.walkFileTree(Files.java:2840)
>>>        at java.base/java.nio.file.Files.walkFileTree(Files.java:2876)
>>>        at Test.main(Test.java:10)
>>> 
>>> The fix is relatively simple I think - the current code assumes entry
>>> offset is < 4GB when parsing extra data for
>>> ID 0x5455. This should be evaluated lazily after all extra data blocks
>>> are parsed to ensure the file offset is correctly
>>> updated based on block ID 0x0001, regardless of its ordering within
>>> other extra data blocks.
>>> 
>>> 
>>> 
>>> On Fri, Oct 23, 2020 at 9:12 AM Dawid Weiss <dawid.weiss at gmail.com <mailto:dawid.weiss at gmail.com>> wrote:
>>>> 
>>>> 
>>>> This looks like a legitimate bug to me (the zip file system implementation is sensitive to the order of extra parameters and may throw an unexpected error
>>>> on zip64 archives exceeding 4 gigabytes). I can't file a Jira issue - no permissions - but I think it'd be worth adding one?
>>>> 
>>>> Dawid
>>>> 
>>>> On Wed, Oct 21, 2020 at 8:36 PM Dawid Weiss <dawid.weiss at gmail.com <mailto:dawid.weiss at gmail.com>> wrote:
>>>>> 
>>>>> Hi Lance,
>>>>> 
>>>>> Yes, this is exactly the point where the problem is in JDK code. It's directly
>>>>> related to the zip entry beyond max int offset. The code tries to read data
>>>>> from a local zip entry header at locoff, here:
>>>>> 
>>>>> if (zipfs.readFullyAt(buf, 0, buf.length , locoff)
>>>>> 
>>>>> but the locoff is set to ~0 as per the spec for files exceeding
>>>>> 4 gigabytes, which says:
>>>>> 
>>>>>   4.4.16 relative offset of local header: (4 bytes)
>>>>> 
>>>>>       This is the offset from the start of the first disk on
>>>>>       which this file appears, to where the local header SHOULD
>>>>>       be found.  If an archive is in ZIP64 format and the value
>>>>>       in this field is 0xFFFFFFFF, the size will be in the
>>>>>       corresponding 8 byte zip64 extended information extra field.
>>>>> 
>>>>> A proper fix would be to read the local header from zip64 extra data
>>>>> first. I don't know how this interferes with the rest of the code though -
>>>>> didn't have enough time to look at it. As it stands, zip entries
>>>>> beyond 4GB cause
>>>>> an unchecked exception while attribute-scanning. This simple snippet
>>>>> is enough to demonstrate it, given a ZIP entry beyond 4GB range:
>>>>> 
>>>>> try (FileSystem fs =
>>>>> FileSystems.newFileSystem(Paths.get("zipWithEntryBeyond4Gb.zip"))) {
>>>>>  for (Path root : fs.getRootDirectories()) {
>>>>>    Files.walkFileTree(root, new SimpleFileVisitor<>() {
>>>>>      @Override
>>>>>      public FileVisitResult visitFile(Path file, BasicFileAttributes
>>>>> attrs) throws IOException {
>>>>>        return FileVisitResult.CONTINUE;
>>>>>      }
>>>>>    });
>>>>>  }
>>>>> }
>>>>> 
>>>>> The walkFileTree method is key here as it attempts to harvest
>>>>> attributes (why we use it is another story -- this saves a lot of time
>>>>> on large, network-mounted regular directories when you're collecting
>>>>> file metadata).
>>>>> 
>>>>> 
>>>>> Dawid
>>>>> 
>>>>> 
>>>>> On Wed, Oct 21, 2020 at 6:39 PM Lance Andersen
>>>>> <LANCE.ANDERSEN at oracle.com <mailto:LANCE.ANDERSEN at oracle.com>> wrote:
>>>>>> 
>>>>>> Hi David,
>>>>>> 
>>>>>> From  a quick look at ZipFileSystem this appears to be an optimization to avoid looking at the LOC extended Timestamp Extra field
>>>>>> 
>>>>>> If a Info-ZIP Extended Timestamp (0x5455)is found then:
>>>>>> 
>>>>>>  If the "zipinfo-time" entry was set  to “false” in the Map specified when creating the Zip FileSystem,
>>>>>> 
>>>>>> FileSystems.newFileSystem(zipFile, Map.of("zipinfo-time", "false")
>>>>>> 
>>>>>> and the  data size of the CEN extended Timestamp is 5 (flag + mod time)
>>>>>> 
>>>>>> The modified time is used from the CEN Extended Timestamp extra field
>>>>>> 
>>>>>> Otherwise get the modified time, creation time, and access time from the LOC Extended Timestamp extra field.
>>>>>> 
>>>>>> 
>>>>>> —————
>>>>>> 
>>>>>> Extended Timestamp Extra Field:
>>>>>> 
>>>>>>          ==============================
>>>>>> 
>>>>>>          The following is the layout of the extended-timestamp extra block.
>>>>>>          (Last Revision 19970118)
>>>>>> 
>>>>>>          Local-header version:
>>>>>> 
>>>>>>          Value         Size        Description
>>>>>>          -----         ----        -----------
>>>>>>  (time)  0x5455        Short       tag for this extra block type ("UT")
>>>>>>          TSize         Short       total data size for this block
>>>>>>          Flags         Byte        info bits
>>>>>>          (ModTime)     Long        time of last modification (UTC/GMT)
>>>>>>          (AcTime)      Long        time of last access (UTC/GMT)
>>>>>>          (CrTime)      Long        time of original creation (UTC/GMT)
>>>>>> 
>>>>>>          Central-header version:
>>>>>> 
>>>>>>          Value         Size        Description
>>>>>>          -----         ----        -----------
>>>>>>  (time)  0x5455        Short       tag for this extra block type ("UT")
>>>>>>          TSize         Short       total data size for this block
>>>>>>          Flags         Byte        info bits (refers to local header!)
>>>>>>          (ModTime)     Long        time of last modification (UTC/GMT)
>>>>>> 
>>>>>>          The central-header extra field contains the modification time only,
>>>>>>          or no timestamp at all.  TSize is used to flag its presence or
>>>>>>          absence.  But note:
>>>>>> 
>>>>>>              If "Flags" indicates that Modtime is present in the local header
>>>>>>              field, it MUST be present in the central header field, too!
>>>>>>              This correspondence is required because the modification time
>>>>>>              value may be used to support trans-timezone freshening and
>>>>>>              updating operations with zip archives.
>>>>>> 
>>>>>>          The time values are in standard Unix signed-long format, indicating
>>>>>>          the number of seconds since 1 January 1970 00:00:00.  The times
>>>>>>          are relative to Coordinated Universal Time (UTC), also sometimes
>>>>>>          referred to as Greenwich Mean Time (GMT).  To convert to local time,
>>>>>>          the software must know the local timezone offset from UTC/GMT.
>>>>>> 
>>>>>>          The lower three bits of Flags in both headers indicate which time-
>>>>>>          stamps are present in the LOCAL extra field:
>>>>>> 
>>>>>>                bit 0           if set, modification time is present
>>>>>>                bit 1           if set, access time is present
>>>>>>                bit 2           if set, creation time is present
>>>>>>                bits 3-7        reserved for additional timestamps; not set
>>>>>> 
>>>>>>          Those times that are present will appear in the order indicated, but
>>>>>>          any combination of times may be omitted.  (Creation time may be
>>>>>>          present without access time, for example.)  TSize should equal
>>>>>>          (1 + 4*(number of set bits in Flags)), as the block is currently
>>>>>>          defined.  Other timestamps may be added in the future.
>>>>>> 
>>>>>> 
>>>>>> --------------
>>>>>> 
>>>>>> It's hard to comment on why you received the error that you did but it is possible the tool that was used for writing the entry  did something unexpected.
>>>>>> 
>>>>>> Out of curiosity, have you tried using ZipFile/ZipEntry to access the entry?
>>>>>> 
>>>>>> 
>>>>>> Best,
>>>>>> Lance
>>>>>> 
>>>>>> 
>>>>>> On Oct 21, 2020, at 4:55 AM, Dawid Weiss <dawid.weiss at gmail.com <mailto:dawid.weiss at gmail.com>> wrote:
>>>>>> 
>>>>>> Hello,
>>>>>> 
>>>>>> We've encountered a seemingly valid ZIP file (zipinfo -v shows all its
>>>>>> entries are intact) that causes a runtime exception when scanning its
>>>>>> contents with ZipFileSystem. The exception indicates an invalid
>>>>>> signature when parsing EXTID_EXTT. I don't quite understand this
>>>>>> comment in the code:
>>>>>> 
>>>>>> case EXTID_EXTT:
>>>>>>   // spec says the Extened timestamp in cen only has mtime
>>>>>>   // need to read the loc to get the extra a/ctime, if flag
>>>>>>   // "zipinfo-time" is not specified to false;
>>>>>>   // there is performance cost (move up to loc and read) to
>>>>>>   // access the loc table foreach entry;
>>>>>>   if (zipfs.noExtt) {
>>>>>>       if (sz == 5)
>>>>>>           mtime = unixToJavaTime(LG(extra, pos + 1));
>>>>>>        break;
>>>>>>   }
>>>>>> ...
>>>>>> 
>>>>>> but this ZIP file has the extra data block of exactly 5 bytes, as
>>>>>> indicated by zipinfo:
>>>>>> 
>>>>>> Central directory entry #6:
>>>>>> ---------------------------
>>>>>> ...
>>>>>> file system or operating system of origin:      Unix
>>>>>> version of encoding software:                   3.0
>>>>>> minimum file system compatibility required:     MS-DOS, OS/2 or NT FAT
>>>>>> minimum software version required to extract:   2.0
>>>>>> compression method:                             deflated
>>>>>> ...
>>>>>> extended local header:                          no
>>>>>> file last modified on (DOS date/time):          2018 Mar 1 04:56:20
>>>>>> file last modified on (UT extra field modtime): 2018 Mar 1 05:56:19 local
>>>>>> file last modified on (UT extra field modtime): 2018 Mar 1 04:56:19 UTC
>>>>>> ...
>>>>>> Unix file attributes (100000 octal):            ----------
>>>>>> MS-DOS file attributes (01 hex):                read-only
>>>>>> 
>>>>>> The central-directory extra field contains:
>>>>>> - A subfield with ID 0x5455 (universal time) and 5 data bytes.
>>>>>>   The local extra field has UTC/GMT modification/access times.
>>>>>> 
>>>>>> The above conditional block checking for length == 5 would have worked
>>>>>> in ZipFileSystem but it's surrounded by a condition over an
>>>>>> externally-provided property - zipfs.noExtt is only set to true if:
>>>>>> 
>>>>>> this.noExtt = "false".equals(env.get("zipinfo-time"));
>>>>>> 
>>>>>> I can't share this particular ZIP file with you and I don't know how
>>>>>> it was created but it seems like that "zipinfo-time" flag could be
>>>>>> omitted if the length of the extra data field is exactly 5?
>>>>>> 
>>>>>> Dawid
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Best
>>>>>> Lance
>>>>>> ------------------
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
>>>>>> Oracle Java Engineering
>>>>>> 1 Network Drive
>>>>>> Burlington, MA 01803
>>>>>> Lance.Andersen at oracle.com <mailto:Lance.Andersen at oracle.com>
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>> 
>> 
>> Best
>> Lance
>> ------------------
>> 
>> <oracle_sig_logo.gif>
>> 
>> 
>> Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
>> Oracle Java Engineering 
>> 1 Network Drive 
>> Burlington, MA 01803
>> Lance.Andersen at oracle.com <mailto:Lance.Andersen at oracle.com>
> 
> Best
> Lance
> ------------------
> 
> <oracle_sig_logo.gif>
> 
> 
> Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
> Oracle Java Engineering 
> 1 Network Drive 
> Burlington, MA 01803
> Lance.Andersen at oracle.com <mailto:Lance.Andersen at oracle.com>
> 
> 
> 
> 


Best
Lance
------------------




Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
Oracle Java Engineering 
1 Network Drive 
Burlington, MA 01803
Lance.Andersen at oracle.com






More information about the core-libs-dev mailing list