"loc: wrong sig" in ZipFileSystem on a valid ZIP file (extra data field of exactly 5 bytes)
Lance Andersen
LANCE.ANDERSEN at ORACLE.COM
Sun Nov 29 22:22:15 UTC 2020
Hi Dawid
Thank you for getting back to me.
> On Nov 29, 2020, at 5:00 PM, Dawid Weiss <dawid.weiss at gmail.com> wrote:
>
>
> Hi Lance,
>
> I looked at the test you pointed to... This is exactly the test for the problem I originally reported on October 21st... The bug has been filed
> for this on October 25th (by you) -- https://bugs.openjdk.java.net/browse/JDK-8255380 <https://bugs.openjdk.java.net/browse/JDK-8255380> and the fix is in commit 6606e0909cbda9. Weird.
>
> I think that test is slightly broken in that it passes "zipinfo-time=false" -- this is exactly the workaround that makes it work everywhere. The
> test should *not* have that flag set?
The ZipFileSystem check is case sensitive:
———
this.noExtt = "false".equals(env.get("zipinfo-time"));
————
>
> try (FileSystem fs =
> FileSystems.newFileSystem(Paths.get(ZIP_FILE_NAME), Map.of("zipinfo-time", "False"))) {
>
You are right though, it would be clearer if I change it to Map.of();
Thank you for pointing that out
Best
Lance
> Anyway, the problem is fixed. I think it'd be good to backport to LTS releases too.
>
> Dawid
>
> On Sat, Nov 28, 2020 at 4:32 PM Lance Andersen <LANCE.ANDERSEN at oracle.com <mailto:LANCE.ANDERSEN at oracle.com>> wrote:
> Hi Dawid,
>
> Thank you for the follow up.
>
>> On Nov 28, 2020, at 9:34 AM, Dawid Weiss <dawid.weiss at gmail.com <mailto:dawid.weiss at gmail.com>> wrote:
>>
>>
>> I know what the problem is and I know what the fix should be (and tried to convey it) but I don't have a patch.
>
> Given you believe you know what the fix would be, it would be great to create a patch for ZipFileSystem and see if it addresses the issue and then contribute the fix back to the OpenJDK community :-).
>
>
>> Writing the test case is not a problem but it'll be a lengthy test on slower drives (has to create a zip file that exceeds 4gb of disk space).
>
> The test case would be configured as a manual test and would look similar to test/jdk/jdk/nio/zipfs/TestLocOffsetFromZip64EF.java <https://urldefense.com/v3/__https://github.com/openjdk/jdk/pull/987*diff-73b401685399c9d59851cda49e9c0d377093086119555708253c8df99649e3c9__;Iw!!GqivPVa7Brio!O7QI8ncxtRbHUjGhW-GmeXZnM3KYdut-r74YfTVxu6etbBcluUHLqDp6DaLsampu2w$> which also creates a 4GB zip.
>
>> I don't think there is a way to fake the file system by using a sparse file, sadly. Would this be acceptable? If so then sure - I can provide a reproducible test case that will fail for the current implementation.
>
> The above test runs in a reasonable time so I would expect this test case would as well.
>
> I will look to reproduce the issue over the weekend or early next week and log a bug with what you provided in your earlier email.
>
>
> Best
> Lance
>>
>> D.
>>
>> On Fri, Nov 27, 2020 at 8:37 PM Lance Andersen <LANCE.ANDERSEN at oracle.com <mailto:LANCE.ANDERSEN at oracle.com>> wrote:
>> Hi Dawid,
>>
>> If you believe you have a a fix, please submit a patch along with a JTREG test case. Ideally the test case would create zip file as part of the test.
>>
>> Best
>> Lance
>>
>>> On Nov 27, 2020, at 7:37 AM, Dawid Weiss <dawid.weiss at gmail.com <mailto:dawid.weiss at gmail.com>> wrote:
>>>
>>> Just for the archives - I'm not sure if this ended up being filed to
>>> the bug system - the repro below demonstrates the bug on all JDKs up
>>> to the newest one.
>>>
>>> # create a single random file of 250 megabytes
>>> head -c 250M < /dev/urandom > rnd.bin
>>> # create a bunch of hardlinks to the same file.
>>> for i in `seq -w 1 25`; do ln rnd.bin rnd-$i.bin; done
>>> # create a zip archive exceeding 4gb (in stored mode so that it's faster)
>>> zip -0 archive.zip rnd*.bin
>>> # show the content of the archive - for all entries beyond 4gb this should
>>> # show extra data blocks like below (note the order of extra data
>>> blocks - this is important).
>>> #
>>> # The central-directory extra field contains:
>>> # - A subfield with ID 0x5455 (universal time) and 5 data bytes.
>>> # The local extra field has UTC/GMT modification/access times.
>>> # - A subfield with ID 0x7875 (Unix UID/GID (any size)) and 11 data bytes:
>>> # 01 04 ea 03 00 00 04 ea 03 00 00.
>>> # - A subfield with ID 0x0001 (PKWARE 64-bit sizes) and 8 data bytes:
>>> # a4 06 a0 86 01 00 00 00.
>>> #
>>> zipinfo -v archive.zip
>>>
>>> # Bug repro code:
>>> cat > Test.java <<EOF
>>> import java.io <https://urldefense.com/v3/__http://java.io__;!!GqivPVa7Brio!KhvWnawkon4E5sVeCpHmwDPJusjNPLMtqPh-ZBleD-U4Dv6tmWKB3cq3sT68yelqdA$>.*;
>>> import java.nio.*;
>>> import java.nio.file.*;
>>> import java.nio.file.attribute.*;
>>>
>>> public class Test {
>>> public static void main(String[]args) throws IOException {
>>> try (FileSystem fs = FileSystems.newFileSystem(Paths.get("archive.zip"))) {
>>> for (Path root : fs.getRootDirectories()) {
>>> Files.walkFileTree(root, new SimpleFileVisitor<>() {
>>> public FileVisitResult visitFile(Path file,
>>> BasicFileAttributes attrs) throws IOException {
>>> return FileVisitResult.CONTINUE;
>>> }
>>> });
>>> }
>>> }
>>> }
>>> }
>>> EOF
>>> # and run it.
>>> java Test.java
>>>
>>> The above ends with:
>>>
>>> Exception in thread "main" java.util.zip.ZipException: loc: wrong sig ->841cd111
>>> at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.readExtra(ZipFileSystem.java:2899)
>>> at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.readCEN(ZipFileSystem.java:2600)
>>> at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.<init>(ZipFileSystem.java:2536)
>>> at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem.getFileAttributes(ZipFileSystem.java:532)
>>> at jdk.zipfs/jdk.nio.zipfs.ZipPath.readAttributes(ZipPath.java:767)
>>> at jdk.zipfs/jdk.nio.zipfs.ZipPath.readAttributes(ZipPath.java:777)
>>> at jdk.zipfs/jdk.nio.zipfs.ZipFileSystemProvider.readAttributes(ZipFileSystemProvider.java:276)
>>> at java.base/java.nio.file.Files.readAttributes(Files.java:1843)
>>> at java.base/java.nio.file.FileTreeWalker.getAttributes(FileTreeWalker.java:219)
>>> at java.base/java.nio.file.FileTreeWalker.visit(FileTreeWalker.java:276)
>>> at java.base/java.nio.file.FileTreeWalker.next(FileTreeWalker.java:373)
>>> at java.base/java.nio.file.Files.walkFileTree(Files.java:2840)
>>> at java.base/java.nio.file.Files.walkFileTree(Files.java:2876)
>>> at Test.main(Test.java:10)
>>>
>>> The fix is relatively simple I think - the current code assumes entry
>>> offset is < 4GB when parsing extra data for
>>> ID 0x5455. This should be evaluated lazily after all extra data blocks
>>> are parsed to ensure the file offset is correctly
>>> updated based on block ID 0x0001, regardless of its ordering within
>>> other extra data blocks.
>>>
>>>
>>>
>>> On Fri, Oct 23, 2020 at 9:12 AM Dawid Weiss <dawid.weiss at gmail.com <mailto:dawid.weiss at gmail.com>> wrote:
>>>>
>>>>
>>>> This looks like a legitimate bug to me (the zip file system implementation is sensitive to the order of extra parameters and may throw an unexpected error
>>>> on zip64 archives exceeding 4 gigabytes). I can't file a Jira issue - no permissions - but I think it'd be worth adding one?
>>>>
>>>> Dawid
>>>>
>>>> On Wed, Oct 21, 2020 at 8:36 PM Dawid Weiss <dawid.weiss at gmail.com <mailto:dawid.weiss at gmail.com>> wrote:
>>>>>
>>>>> Hi Lance,
>>>>>
>>>>> Yes, this is exactly the point where the problem is in JDK code. It's directly
>>>>> related to the zip entry beyond max int offset. The code tries to read data
>>>>> from a local zip entry header at locoff, here:
>>>>>
>>>>> if (zipfs.readFullyAt(buf, 0, buf.length , locoff)
>>>>>
>>>>> but the locoff is set to ~0 as per the spec for files exceeding
>>>>> 4 gigabytes, which says:
>>>>>
>>>>> 4.4.16 relative offset of local header: (4 bytes)
>>>>>
>>>>> This is the offset from the start of the first disk on
>>>>> which this file appears, to where the local header SHOULD
>>>>> be found. If an archive is in ZIP64 format and the value
>>>>> in this field is 0xFFFFFFFF, the size will be in the
>>>>> corresponding 8 byte zip64 extended information extra field.
>>>>>
>>>>> A proper fix would be to read the local header from zip64 extra data
>>>>> first. I don't know how this interferes with the rest of the code though -
>>>>> didn't have enough time to look at it. As it stands, zip entries
>>>>> beyond 4GB cause
>>>>> an unchecked exception while attribute-scanning. This simple snippet
>>>>> is enough to demonstrate it, given a ZIP entry beyond 4GB range:
>>>>>
>>>>> try (FileSystem fs =
>>>>> FileSystems.newFileSystem(Paths.get("zipWithEntryBeyond4Gb.zip"))) {
>>>>> for (Path root : fs.getRootDirectories()) {
>>>>> Files.walkFileTree(root, new SimpleFileVisitor<>() {
>>>>> @Override
>>>>> public FileVisitResult visitFile(Path file, BasicFileAttributes
>>>>> attrs) throws IOException {
>>>>> return FileVisitResult.CONTINUE;
>>>>> }
>>>>> });
>>>>> }
>>>>> }
>>>>>
>>>>> The walkFileTree method is key here as it attempts to harvest
>>>>> attributes (why we use it is another story -- this saves a lot of time
>>>>> on large, network-mounted regular directories when you're collecting
>>>>> file metadata).
>>>>>
>>>>>
>>>>> Dawid
>>>>>
>>>>>
>>>>> On Wed, Oct 21, 2020 at 6:39 PM Lance Andersen
>>>>> <LANCE.ANDERSEN at oracle.com <mailto:LANCE.ANDERSEN at oracle.com>> wrote:
>>>>>>
>>>>>> Hi David,
>>>>>>
>>>>>> From a quick look at ZipFileSystem this appears to be an optimization to avoid looking at the LOC extended Timestamp Extra field
>>>>>>
>>>>>> If a Info-ZIP Extended Timestamp (0x5455)is found then:
>>>>>>
>>>>>> If the "zipinfo-time" entry was set to “false” in the Map specified when creating the Zip FileSystem,
>>>>>>
>>>>>> FileSystems.newFileSystem(zipFile, Map.of("zipinfo-time", "false")
>>>>>>
>>>>>> and the data size of the CEN extended Timestamp is 5 (flag + mod time)
>>>>>>
>>>>>> The modified time is used from the CEN Extended Timestamp extra field
>>>>>>
>>>>>> Otherwise get the modified time, creation time, and access time from the LOC Extended Timestamp extra field.
>>>>>>
>>>>>>
>>>>>> —————
>>>>>>
>>>>>> Extended Timestamp Extra Field:
>>>>>>
>>>>>> ==============================
>>>>>>
>>>>>> The following is the layout of the extended-timestamp extra block.
>>>>>> (Last Revision 19970118)
>>>>>>
>>>>>> Local-header version:
>>>>>>
>>>>>> Value Size Description
>>>>>> ----- ---- -----------
>>>>>> (time) 0x5455 Short tag for this extra block type ("UT")
>>>>>> TSize Short total data size for this block
>>>>>> Flags Byte info bits
>>>>>> (ModTime) Long time of last modification (UTC/GMT)
>>>>>> (AcTime) Long time of last access (UTC/GMT)
>>>>>> (CrTime) Long time of original creation (UTC/GMT)
>>>>>>
>>>>>> Central-header version:
>>>>>>
>>>>>> Value Size Description
>>>>>> ----- ---- -----------
>>>>>> (time) 0x5455 Short tag for this extra block type ("UT")
>>>>>> TSize Short total data size for this block
>>>>>> Flags Byte info bits (refers to local header!)
>>>>>> (ModTime) Long time of last modification (UTC/GMT)
>>>>>>
>>>>>> The central-header extra field contains the modification time only,
>>>>>> or no timestamp at all. TSize is used to flag its presence or
>>>>>> absence. But note:
>>>>>>
>>>>>> If "Flags" indicates that Modtime is present in the local header
>>>>>> field, it MUST be present in the central header field, too!
>>>>>> This correspondence is required because the modification time
>>>>>> value may be used to support trans-timezone freshening and
>>>>>> updating operations with zip archives.
>>>>>>
>>>>>> The time values are in standard Unix signed-long format, indicating
>>>>>> the number of seconds since 1 January 1970 00:00:00. The times
>>>>>> are relative to Coordinated Universal Time (UTC), also sometimes
>>>>>> referred to as Greenwich Mean Time (GMT). To convert to local time,
>>>>>> the software must know the local timezone offset from UTC/GMT.
>>>>>>
>>>>>> The lower three bits of Flags in both headers indicate which time-
>>>>>> stamps are present in the LOCAL extra field:
>>>>>>
>>>>>> bit 0 if set, modification time is present
>>>>>> bit 1 if set, access time is present
>>>>>> bit 2 if set, creation time is present
>>>>>> bits 3-7 reserved for additional timestamps; not set
>>>>>>
>>>>>> Those times that are present will appear in the order indicated, but
>>>>>> any combination of times may be omitted. (Creation time may be
>>>>>> present without access time, for example.) TSize should equal
>>>>>> (1 + 4*(number of set bits in Flags)), as the block is currently
>>>>>> defined. Other timestamps may be added in the future.
>>>>>>
>>>>>>
>>>>>> --------------
>>>>>>
>>>>>> It's hard to comment on why you received the error that you did but it is possible the tool that was used for writing the entry did something unexpected.
>>>>>>
>>>>>> Out of curiosity, have you tried using ZipFile/ZipEntry to access the entry?
>>>>>>
>>>>>>
>>>>>> Best,
>>>>>> Lance
>>>>>>
>>>>>>
>>>>>> On Oct 21, 2020, at 4:55 AM, Dawid Weiss <dawid.weiss at gmail.com <mailto:dawid.weiss at gmail.com>> wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> We've encountered a seemingly valid ZIP file (zipinfo -v shows all its
>>>>>> entries are intact) that causes a runtime exception when scanning its
>>>>>> contents with ZipFileSystem. The exception indicates an invalid
>>>>>> signature when parsing EXTID_EXTT. I don't quite understand this
>>>>>> comment in the code:
>>>>>>
>>>>>> case EXTID_EXTT:
>>>>>> // spec says the Extened timestamp in cen only has mtime
>>>>>> // need to read the loc to get the extra a/ctime, if flag
>>>>>> // "zipinfo-time" is not specified to false;
>>>>>> // there is performance cost (move up to loc and read) to
>>>>>> // access the loc table foreach entry;
>>>>>> if (zipfs.noExtt) {
>>>>>> if (sz == 5)
>>>>>> mtime = unixToJavaTime(LG(extra, pos + 1));
>>>>>> break;
>>>>>> }
>>>>>> ...
>>>>>>
>>>>>> but this ZIP file has the extra data block of exactly 5 bytes, as
>>>>>> indicated by zipinfo:
>>>>>>
>>>>>> Central directory entry #6:
>>>>>> ---------------------------
>>>>>> ...
>>>>>> file system or operating system of origin: Unix
>>>>>> version of encoding software: 3.0
>>>>>> minimum file system compatibility required: MS-DOS, OS/2 or NT FAT
>>>>>> minimum software version required to extract: 2.0
>>>>>> compression method: deflated
>>>>>> ...
>>>>>> extended local header: no
>>>>>> file last modified on (DOS date/time): 2018 Mar 1 04:56:20
>>>>>> file last modified on (UT extra field modtime): 2018 Mar 1 05:56:19 local
>>>>>> file last modified on (UT extra field modtime): 2018 Mar 1 04:56:19 UTC
>>>>>> ...
>>>>>> Unix file attributes (100000 octal): ----------
>>>>>> MS-DOS file attributes (01 hex): read-only
>>>>>>
>>>>>> The central-directory extra field contains:
>>>>>> - A subfield with ID 0x5455 (universal time) and 5 data bytes.
>>>>>> The local extra field has UTC/GMT modification/access times.
>>>>>>
>>>>>> The above conditional block checking for length == 5 would have worked
>>>>>> in ZipFileSystem but it's surrounded by a condition over an
>>>>>> externally-provided property - zipfs.noExtt is only set to true if:
>>>>>>
>>>>>> this.noExtt = "false".equals(env.get("zipinfo-time"));
>>>>>>
>>>>>> I can't share this particular ZIP file with you and I don't know how
>>>>>> it was created but it seems like that "zipinfo-time" flag could be
>>>>>> omitted if the length of the extra data field is exactly 5?
>>>>>>
>>>>>> Dawid
>>>>>>
>>>>>>
>>>>>>
>>>>>> Best
>>>>>> Lance
>>>>>> ------------------
>>>>>>
>>>>>>
>>>>>>
>>>>>> Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
>>>>>> Oracle Java Engineering
>>>>>> 1 Network Drive
>>>>>> Burlington, MA 01803
>>>>>> Lance.Andersen at oracle.com <mailto:Lance.Andersen at oracle.com>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>
>>
>> Best
>> Lance
>> ------------------
>>
>> <oracle_sig_logo.gif>
>>
>>
>> Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
>> Oracle Java Engineering
>> 1 Network Drive
>> Burlington, MA 01803
>> Lance.Andersen at oracle.com <mailto:Lance.Andersen at oracle.com>
>
> Best
> Lance
> ------------------
>
> <oracle_sig_logo.gif>
>
>
> Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
> Oracle Java Engineering
> 1 Network Drive
> Burlington, MA 01803
> Lance.Andersen at oracle.com <mailto:Lance.Andersen at oracle.com>
>
>
>
>
Best
Lance
------------------
Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
Oracle Java Engineering
1 Network Drive
Burlington, MA 01803
Lance.Andersen at oracle.com
More information about the core-libs-dev
mailing list