"loc: wrong sig" in ZipFileSystem on a valid ZIP file (extra data field of exactly 5 bytes)
Lance Andersen
LANCE.ANDERSEN at ORACLE.COM
Sat Nov 28 15:32:05 UTC 2020
Hi Dawid,
Thank you for the follow up.
> On Nov 28, 2020, at 9:34 AM, Dawid Weiss <dawid.weiss at gmail.com> wrote:
>
>
> I know what the problem is and I know what the fix should be (and tried to convey it) but I don't have a patch.
Given you believe you know what the fix would be, it would be great to create a patch for ZipFileSystem and see if it addresses the issue and then contribute the fix back to the OpenJDK community :-).
> Writing the test case is not a problem but it'll be a lengthy test on slower drives (has to create a zip file that exceeds 4gb of disk space).
The test case would be configured as a manual test and would look similar to test/jdk/jdk/nio/zipfs/TestLocOffsetFromZip64EF.java <https://github.com/openjdk/jdk/pull/987#diff-73b401685399c9d59851cda49e9c0d377093086119555708253c8df99649e3c9> which also creates a 4GB zip.
> I don't think there is a way to fake the file system by using a sparse file, sadly. Would this be acceptable? If so then sure - I can provide a reproducible test case that will fail for the current implementation.
The above test runs in a reasonable time so I would expect this test case would as well.
I will look to reproduce the issue over the weekend or early next week and log a bug with what you provided in your earlier email.
Best
Lance
>
> D.
>
> On Fri, Nov 27, 2020 at 8:37 PM Lance Andersen <LANCE.ANDERSEN at oracle.com <mailto:LANCE.ANDERSEN at oracle.com>> wrote:
> Hi Dawid,
>
> If you believe you have a a fix, please submit a patch along with a JTREG test case. Ideally the test case would create zip file as part of the test.
>
> Best
> Lance
>
>> On Nov 27, 2020, at 7:37 AM, Dawid Weiss <dawid.weiss at gmail.com <mailto:dawid.weiss at gmail.com>> wrote:
>>
>> Just for the archives - I'm not sure if this ended up being filed to
>> the bug system - the repro below demonstrates the bug on all JDKs up
>> to the newest one.
>>
>> # create a single random file of 250 megabytes
>> head -c 250M < /dev/urandom > rnd.bin
>> # create a bunch of hardlinks to the same file.
>> for i in `seq -w 1 25`; do ln rnd.bin rnd-$i.bin; done
>> # create a zip archive exceeding 4gb (in stored mode so that it's faster)
>> zip -0 archive.zip rnd*.bin
>> # show the content of the archive - for all entries beyond 4gb this should
>> # show extra data blocks like below (note the order of extra data
>> blocks - this is important).
>> #
>> # The central-directory extra field contains:
>> # - A subfield with ID 0x5455 (universal time) and 5 data bytes.
>> # The local extra field has UTC/GMT modification/access times.
>> # - A subfield with ID 0x7875 (Unix UID/GID (any size)) and 11 data bytes:
>> # 01 04 ea 03 00 00 04 ea 03 00 00.
>> # - A subfield with ID 0x0001 (PKWARE 64-bit sizes) and 8 data bytes:
>> # a4 06 a0 86 01 00 00 00.
>> #
>> zipinfo -v archive.zip
>>
>> # Bug repro code:
>> cat > Test.java <<EOF
>> import java.io <https://urldefense.com/v3/__http://java.io__;!!GqivPVa7Brio!KhvWnawkon4E5sVeCpHmwDPJusjNPLMtqPh-ZBleD-U4Dv6tmWKB3cq3sT68yelqdA$>.*;
>> import java.nio.*;
>> import java.nio.file.*;
>> import java.nio.file.attribute.*;
>>
>> public class Test {
>> public static void main(String[]args) throws IOException {
>> try (FileSystem fs = FileSystems.newFileSystem(Paths.get("archive.zip"))) {
>> for (Path root : fs.getRootDirectories()) {
>> Files.walkFileTree(root, new SimpleFileVisitor<>() {
>> public FileVisitResult visitFile(Path file,
>> BasicFileAttributes attrs) throws IOException {
>> return FileVisitResult.CONTINUE;
>> }
>> });
>> }
>> }
>> }
>> }
>> EOF
>> # and run it.
>> java Test.java
>>
>> The above ends with:
>>
>> Exception in thread "main" java.util.zip.ZipException: loc: wrong sig ->841cd111
>> at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.readExtra(ZipFileSystem.java:2899)
>> at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.readCEN(ZipFileSystem.java:2600)
>> at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.<init>(ZipFileSystem.java:2536)
>> at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem.getFileAttributes(ZipFileSystem.java:532)
>> at jdk.zipfs/jdk.nio.zipfs.ZipPath.readAttributes(ZipPath.java:767)
>> at jdk.zipfs/jdk.nio.zipfs.ZipPath.readAttributes(ZipPath.java:777)
>> at jdk.zipfs/jdk.nio.zipfs.ZipFileSystemProvider.readAttributes(ZipFileSystemProvider.java:276)
>> at java.base/java.nio.file.Files.readAttributes(Files.java:1843)
>> at java.base/java.nio.file.FileTreeWalker.getAttributes(FileTreeWalker.java:219)
>> at java.base/java.nio.file.FileTreeWalker.visit(FileTreeWalker.java:276)
>> at java.base/java.nio.file.FileTreeWalker.next(FileTreeWalker.java:373)
>> at java.base/java.nio.file.Files.walkFileTree(Files.java:2840)
>> at java.base/java.nio.file.Files.walkFileTree(Files.java:2876)
>> at Test.main(Test.java:10)
>>
>> The fix is relatively simple I think - the current code assumes entry
>> offset is < 4GB when parsing extra data for
>> ID 0x5455. This should be evaluated lazily after all extra data blocks
>> are parsed to ensure the file offset is correctly
>> updated based on block ID 0x0001, regardless of its ordering within
>> other extra data blocks.
>>
>>
>>
>> On Fri, Oct 23, 2020 at 9:12 AM Dawid Weiss <dawid.weiss at gmail.com <mailto:dawid.weiss at gmail.com>> wrote:
>>>
>>>
>>> This looks like a legitimate bug to me (the zip file system implementation is sensitive to the order of extra parameters and may throw an unexpected error
>>> on zip64 archives exceeding 4 gigabytes). I can't file a Jira issue - no permissions - but I think it'd be worth adding one?
>>>
>>> Dawid
>>>
>>> On Wed, Oct 21, 2020 at 8:36 PM Dawid Weiss <dawid.weiss at gmail.com <mailto:dawid.weiss at gmail.com>> wrote:
>>>>
>>>> Hi Lance,
>>>>
>>>> Yes, this is exactly the point where the problem is in JDK code. It's directly
>>>> related to the zip entry beyond max int offset. The code tries to read data
>>>> from a local zip entry header at locoff, here:
>>>>
>>>> if (zipfs.readFullyAt(buf, 0, buf.length , locoff)
>>>>
>>>> but the locoff is set to ~0 as per the spec for files exceeding
>>>> 4 gigabytes, which says:
>>>>
>>>> 4.4.16 relative offset of local header: (4 bytes)
>>>>
>>>> This is the offset from the start of the first disk on
>>>> which this file appears, to where the local header SHOULD
>>>> be found. If an archive is in ZIP64 format and the value
>>>> in this field is 0xFFFFFFFF, the size will be in the
>>>> corresponding 8 byte zip64 extended information extra field.
>>>>
>>>> A proper fix would be to read the local header from zip64 extra data
>>>> first. I don't know how this interferes with the rest of the code though -
>>>> didn't have enough time to look at it. As it stands, zip entries
>>>> beyond 4GB cause
>>>> an unchecked exception while attribute-scanning. This simple snippet
>>>> is enough to demonstrate it, given a ZIP entry beyond 4GB range:
>>>>
>>>> try (FileSystem fs =
>>>> FileSystems.newFileSystem(Paths.get("zipWithEntryBeyond4Gb.zip"))) {
>>>> for (Path root : fs.getRootDirectories()) {
>>>> Files.walkFileTree(root, new SimpleFileVisitor<>() {
>>>> @Override
>>>> public FileVisitResult visitFile(Path file, BasicFileAttributes
>>>> attrs) throws IOException {
>>>> return FileVisitResult.CONTINUE;
>>>> }
>>>> });
>>>> }
>>>> }
>>>>
>>>> The walkFileTree method is key here as it attempts to harvest
>>>> attributes (why we use it is another story -- this saves a lot of time
>>>> on large, network-mounted regular directories when you're collecting
>>>> file metadata).
>>>>
>>>>
>>>> Dawid
>>>>
>>>>
>>>> On Wed, Oct 21, 2020 at 6:39 PM Lance Andersen
>>>> <LANCE.ANDERSEN at oracle.com <mailto:LANCE.ANDERSEN at oracle.com>> wrote:
>>>>>
>>>>> Hi David,
>>>>>
>>>>> From a quick look at ZipFileSystem this appears to be an optimization to avoid looking at the LOC extended Timestamp Extra field
>>>>>
>>>>> If a Info-ZIP Extended Timestamp (0x5455)is found then:
>>>>>
>>>>> If the "zipinfo-time" entry was set to “false” in the Map specified when creating the Zip FileSystem,
>>>>>
>>>>> FileSystems.newFileSystem(zipFile, Map.of("zipinfo-time", "false")
>>>>>
>>>>> and the data size of the CEN extended Timestamp is 5 (flag + mod time)
>>>>>
>>>>> The modified time is used from the CEN Extended Timestamp extra field
>>>>>
>>>>> Otherwise get the modified time, creation time, and access time from the LOC Extended Timestamp extra field.
>>>>>
>>>>>
>>>>> —————
>>>>>
>>>>> Extended Timestamp Extra Field:
>>>>>
>>>>> ==============================
>>>>>
>>>>> The following is the layout of the extended-timestamp extra block.
>>>>> (Last Revision 19970118)
>>>>>
>>>>> Local-header version:
>>>>>
>>>>> Value Size Description
>>>>> ----- ---- -----------
>>>>> (time) 0x5455 Short tag for this extra block type ("UT")
>>>>> TSize Short total data size for this block
>>>>> Flags Byte info bits
>>>>> (ModTime) Long time of last modification (UTC/GMT)
>>>>> (AcTime) Long time of last access (UTC/GMT)
>>>>> (CrTime) Long time of original creation (UTC/GMT)
>>>>>
>>>>> Central-header version:
>>>>>
>>>>> Value Size Description
>>>>> ----- ---- -----------
>>>>> (time) 0x5455 Short tag for this extra block type ("UT")
>>>>> TSize Short total data size for this block
>>>>> Flags Byte info bits (refers to local header!)
>>>>> (ModTime) Long time of last modification (UTC/GMT)
>>>>>
>>>>> The central-header extra field contains the modification time only,
>>>>> or no timestamp at all. TSize is used to flag its presence or
>>>>> absence. But note:
>>>>>
>>>>> If "Flags" indicates that Modtime is present in the local header
>>>>> field, it MUST be present in the central header field, too!
>>>>> This correspondence is required because the modification time
>>>>> value may be used to support trans-timezone freshening and
>>>>> updating operations with zip archives.
>>>>>
>>>>> The time values are in standard Unix signed-long format, indicating
>>>>> the number of seconds since 1 January 1970 00:00:00. The times
>>>>> are relative to Coordinated Universal Time (UTC), also sometimes
>>>>> referred to as Greenwich Mean Time (GMT). To convert to local time,
>>>>> the software must know the local timezone offset from UTC/GMT.
>>>>>
>>>>> The lower three bits of Flags in both headers indicate which time-
>>>>> stamps are present in the LOCAL extra field:
>>>>>
>>>>> bit 0 if set, modification time is present
>>>>> bit 1 if set, access time is present
>>>>> bit 2 if set, creation time is present
>>>>> bits 3-7 reserved for additional timestamps; not set
>>>>>
>>>>> Those times that are present will appear in the order indicated, but
>>>>> any combination of times may be omitted. (Creation time may be
>>>>> present without access time, for example.) TSize should equal
>>>>> (1 + 4*(number of set bits in Flags)), as the block is currently
>>>>> defined. Other timestamps may be added in the future.
>>>>>
>>>>>
>>>>> --------------
>>>>>
>>>>> It's hard to comment on why you received the error that you did but it is possible the tool that was used for writing the entry did something unexpected.
>>>>>
>>>>> Out of curiosity, have you tried using ZipFile/ZipEntry to access the entry?
>>>>>
>>>>>
>>>>> Best,
>>>>> Lance
>>>>>
>>>>>
>>>>> On Oct 21, 2020, at 4:55 AM, Dawid Weiss <dawid.weiss at gmail.com <mailto:dawid.weiss at gmail.com>> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> We've encountered a seemingly valid ZIP file (zipinfo -v shows all its
>>>>> entries are intact) that causes a runtime exception when scanning its
>>>>> contents with ZipFileSystem. The exception indicates an invalid
>>>>> signature when parsing EXTID_EXTT. I don't quite understand this
>>>>> comment in the code:
>>>>>
>>>>> case EXTID_EXTT:
>>>>> // spec says the Extened timestamp in cen only has mtime
>>>>> // need to read the loc to get the extra a/ctime, if flag
>>>>> // "zipinfo-time" is not specified to false;
>>>>> // there is performance cost (move up to loc and read) to
>>>>> // access the loc table foreach entry;
>>>>> if (zipfs.noExtt) {
>>>>> if (sz == 5)
>>>>> mtime = unixToJavaTime(LG(extra, pos + 1));
>>>>> break;
>>>>> }
>>>>> ...
>>>>>
>>>>> but this ZIP file has the extra data block of exactly 5 bytes, as
>>>>> indicated by zipinfo:
>>>>>
>>>>> Central directory entry #6:
>>>>> ---------------------------
>>>>> ...
>>>>> file system or operating system of origin: Unix
>>>>> version of encoding software: 3.0
>>>>> minimum file system compatibility required: MS-DOS, OS/2 or NT FAT
>>>>> minimum software version required to extract: 2.0
>>>>> compression method: deflated
>>>>> ...
>>>>> extended local header: no
>>>>> file last modified on (DOS date/time): 2018 Mar 1 04:56:20
>>>>> file last modified on (UT extra field modtime): 2018 Mar 1 05:56:19 local
>>>>> file last modified on (UT extra field modtime): 2018 Mar 1 04:56:19 UTC
>>>>> ...
>>>>> Unix file attributes (100000 octal): ----------
>>>>> MS-DOS file attributes (01 hex): read-only
>>>>>
>>>>> The central-directory extra field contains:
>>>>> - A subfield with ID 0x5455 (universal time) and 5 data bytes.
>>>>> The local extra field has UTC/GMT modification/access times.
>>>>>
>>>>> The above conditional block checking for length == 5 would have worked
>>>>> in ZipFileSystem but it's surrounded by a condition over an
>>>>> externally-provided property - zipfs.noExtt is only set to true if:
>>>>>
>>>>> this.noExtt = "false".equals(env.get("zipinfo-time"));
>>>>>
>>>>> I can't share this particular ZIP file with you and I don't know how
>>>>> it was created but it seems like that "zipinfo-time" flag could be
>>>>> omitted if the length of the extra data field is exactly 5?
>>>>>
>>>>> Dawid
>>>>>
>>>>>
>>>>>
>>>>> Best
>>>>> Lance
>>>>> ------------------
>>>>>
>>>>>
>>>>>
>>>>> Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
>>>>> Oracle Java Engineering
>>>>> 1 Network Drive
>>>>> Burlington, MA 01803
>>>>> Lance.Andersen at oracle.com <mailto:Lance.Andersen at oracle.com>
>>>>>
>>>>>
>>>>>
>>>>>
>
>
> Best
> Lance
> ------------------
>
> <oracle_sig_logo.gif>
>
>
> Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
> Oracle Java Engineering
> 1 Network Drive
> Burlington, MA 01803
> Lance.Andersen at oracle.com <mailto:Lance.Andersen at oracle.com>
Best
Lance
------------------
Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
Oracle Java Engineering
1 Network Drive
Burlington, MA 01803
Lance.Andersen at oracle.com
More information about the core-libs-dev
mailing list