"loc: wrong sig" in ZipFileSystem on a valid ZIP file (extra data field of exactly 5 bytes)
Lance Andersen
LANCE.ANDERSEN at ORACLE.COM
Fri Nov 27 19:37:41 UTC 2020
Hi Dawid,
If you believe you have a a fix, please submit a patch along with a JTREG test case. Ideally the test case would create zip file as part of the test.
Best
Lance
> On Nov 27, 2020, at 7:37 AM, Dawid Weiss <dawid.weiss at gmail.com> wrote:
>
> Just for the archives - I'm not sure if this ended up being filed to
> the bug system - the repro below demonstrates the bug on all JDKs up
> to the newest one.
>
> # create a single random file of 250 megabytes
> head -c 250M < /dev/urandom > rnd.bin
> # create a bunch of hardlinks to the same file.
> for i in `seq -w 1 25`; do ln rnd.bin rnd-$i.bin; done
> # create a zip archive exceeding 4gb (in stored mode so that it's faster)
> zip -0 archive.zip rnd*.bin
> # show the content of the archive - for all entries beyond 4gb this should
> # show extra data blocks like below (note the order of extra data
> blocks - this is important).
> #
> # The central-directory extra field contains:
> # - A subfield with ID 0x5455 (universal time) and 5 data bytes.
> # The local extra field has UTC/GMT modification/access times.
> # - A subfield with ID 0x7875 (Unix UID/GID (any size)) and 11 data bytes:
> # 01 04 ea 03 00 00 04 ea 03 00 00.
> # - A subfield with ID 0x0001 (PKWARE 64-bit sizes) and 8 data bytes:
> # a4 06 a0 86 01 00 00 00.
> #
> zipinfo -v archive.zip
>
> # Bug repro code:
> cat > Test.java <<EOF
> import java.io.*;
> import java.nio.*;
> import java.nio.file.*;
> import java.nio.file.attribute.*;
>
> public class Test {
> public static void main(String[]args) throws IOException {
> try (FileSystem fs = FileSystems.newFileSystem(Paths.get("archive.zip"))) {
> for (Path root : fs.getRootDirectories()) {
> Files.walkFileTree(root, new SimpleFileVisitor<>() {
> public FileVisitResult visitFile(Path file,
> BasicFileAttributes attrs) throws IOException {
> return FileVisitResult.CONTINUE;
> }
> });
> }
> }
> }
> }
> EOF
> # and run it.
> java Test.java
>
> The above ends with:
>
> Exception in thread "main" java.util.zip.ZipException: loc: wrong sig ->841cd111
> at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.readExtra(ZipFileSystem.java:2899)
> at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.readCEN(ZipFileSystem.java:2600)
> at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.<init>(ZipFileSystem.java:2536)
> at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem.getFileAttributes(ZipFileSystem.java:532)
> at jdk.zipfs/jdk.nio.zipfs.ZipPath.readAttributes(ZipPath.java:767)
> at jdk.zipfs/jdk.nio.zipfs.ZipPath.readAttributes(ZipPath.java:777)
> at jdk.zipfs/jdk.nio.zipfs.ZipFileSystemProvider.readAttributes(ZipFileSystemProvider.java:276)
> at java.base/java.nio.file.Files.readAttributes(Files.java:1843)
> at java.base/java.nio.file.FileTreeWalker.getAttributes(FileTreeWalker.java:219)
> at java.base/java.nio.file.FileTreeWalker.visit(FileTreeWalker.java:276)
> at java.base/java.nio.file.FileTreeWalker.next(FileTreeWalker.java:373)
> at java.base/java.nio.file.Files.walkFileTree(Files.java:2840)
> at java.base/java.nio.file.Files.walkFileTree(Files.java:2876)
> at Test.main(Test.java:10)
>
> The fix is relatively simple I think - the current code assumes entry
> offset is < 4GB when parsing extra data for
> ID 0x5455. This should be evaluated lazily after all extra data blocks
> are parsed to ensure the file offset is correctly
> updated based on block ID 0x0001, regardless of its ordering within
> other extra data blocks.
>
>
>
> On Fri, Oct 23, 2020 at 9:12 AM Dawid Weiss <dawid.weiss at gmail.com> wrote:
>>
>>
>> This looks like a legitimate bug to me (the zip file system implementation is sensitive to the order of extra parameters and may throw an unexpected error
>> on zip64 archives exceeding 4 gigabytes). I can't file a Jira issue - no permissions - but I think it'd be worth adding one?
>>
>> Dawid
>>
>> On Wed, Oct 21, 2020 at 8:36 PM Dawid Weiss <dawid.weiss at gmail.com> wrote:
>>>
>>> Hi Lance,
>>>
>>> Yes, this is exactly the point where the problem is in JDK code. It's directly
>>> related to the zip entry beyond max int offset. The code tries to read data
>>> from a local zip entry header at locoff, here:
>>>
>>> if (zipfs.readFullyAt(buf, 0, buf.length , locoff)
>>>
>>> but the locoff is set to ~0 as per the spec for files exceeding
>>> 4 gigabytes, which says:
>>>
>>> 4.4.16 relative offset of local header: (4 bytes)
>>>
>>> This is the offset from the start of the first disk on
>>> which this file appears, to where the local header SHOULD
>>> be found. If an archive is in ZIP64 format and the value
>>> in this field is 0xFFFFFFFF, the size will be in the
>>> corresponding 8 byte zip64 extended information extra field.
>>>
>>> A proper fix would be to read the local header from zip64 extra data
>>> first. I don't know how this interferes with the rest of the code though -
>>> didn't have enough time to look at it. As it stands, zip entries
>>> beyond 4GB cause
>>> an unchecked exception while attribute-scanning. This simple snippet
>>> is enough to demonstrate it, given a ZIP entry beyond 4GB range:
>>>
>>> try (FileSystem fs =
>>> FileSystems.newFileSystem(Paths.get("zipWithEntryBeyond4Gb.zip"))) {
>>> for (Path root : fs.getRootDirectories()) {
>>> Files.walkFileTree(root, new SimpleFileVisitor<>() {
>>> @Override
>>> public FileVisitResult visitFile(Path file, BasicFileAttributes
>>> attrs) throws IOException {
>>> return FileVisitResult.CONTINUE;
>>> }
>>> });
>>> }
>>> }
>>>
>>> The walkFileTree method is key here as it attempts to harvest
>>> attributes (why we use it is another story -- this saves a lot of time
>>> on large, network-mounted regular directories when you're collecting
>>> file metadata).
>>>
>>>
>>> Dawid
>>>
>>>
>>> On Wed, Oct 21, 2020 at 6:39 PM Lance Andersen
>>> <LANCE.ANDERSEN at oracle.com> wrote:
>>>>
>>>> Hi David,
>>>>
>>>> From a quick look at ZipFileSystem this appears to be an optimization to avoid looking at the LOC extended Timestamp Extra field
>>>>
>>>> If a Info-ZIP Extended Timestamp (0x5455)is found then:
>>>>
>>>> If the "zipinfo-time" entry was set to “false” in the Map specified when creating the Zip FileSystem,
>>>>
>>>> FileSystems.newFileSystem(zipFile, Map.of("zipinfo-time", "false")
>>>>
>>>> and the data size of the CEN extended Timestamp is 5 (flag + mod time)
>>>>
>>>> The modified time is used from the CEN Extended Timestamp extra field
>>>>
>>>> Otherwise get the modified time, creation time, and access time from the LOC Extended Timestamp extra field.
>>>>
>>>>
>>>> —————
>>>>
>>>> Extended Timestamp Extra Field:
>>>>
>>>> ==============================
>>>>
>>>> The following is the layout of the extended-timestamp extra block.
>>>> (Last Revision 19970118)
>>>>
>>>> Local-header version:
>>>>
>>>> Value Size Description
>>>> ----- ---- -----------
>>>> (time) 0x5455 Short tag for this extra block type ("UT")
>>>> TSize Short total data size for this block
>>>> Flags Byte info bits
>>>> (ModTime) Long time of last modification (UTC/GMT)
>>>> (AcTime) Long time of last access (UTC/GMT)
>>>> (CrTime) Long time of original creation (UTC/GMT)
>>>>
>>>> Central-header version:
>>>>
>>>> Value Size Description
>>>> ----- ---- -----------
>>>> (time) 0x5455 Short tag for this extra block type ("UT")
>>>> TSize Short total data size for this block
>>>> Flags Byte info bits (refers to local header!)
>>>> (ModTime) Long time of last modification (UTC/GMT)
>>>>
>>>> The central-header extra field contains the modification time only,
>>>> or no timestamp at all. TSize is used to flag its presence or
>>>> absence. But note:
>>>>
>>>> If "Flags" indicates that Modtime is present in the local header
>>>> field, it MUST be present in the central header field, too!
>>>> This correspondence is required because the modification time
>>>> value may be used to support trans-timezone freshening and
>>>> updating operations with zip archives.
>>>>
>>>> The time values are in standard Unix signed-long format, indicating
>>>> the number of seconds since 1 January 1970 00:00:00. The times
>>>> are relative to Coordinated Universal Time (UTC), also sometimes
>>>> referred to as Greenwich Mean Time (GMT). To convert to local time,
>>>> the software must know the local timezone offset from UTC/GMT.
>>>>
>>>> The lower three bits of Flags in both headers indicate which time-
>>>> stamps are present in the LOCAL extra field:
>>>>
>>>> bit 0 if set, modification time is present
>>>> bit 1 if set, access time is present
>>>> bit 2 if set, creation time is present
>>>> bits 3-7 reserved for additional timestamps; not set
>>>>
>>>> Those times that are present will appear in the order indicated, but
>>>> any combination of times may be omitted. (Creation time may be
>>>> present without access time, for example.) TSize should equal
>>>> (1 + 4*(number of set bits in Flags)), as the block is currently
>>>> defined. Other timestamps may be added in the future.
>>>>
>>>>
>>>> --------------
>>>>
>>>> It's hard to comment on why you received the error that you did but it is possible the tool that was used for writing the entry did something unexpected.
>>>>
>>>> Out of curiosity, have you tried using ZipFile/ZipEntry to access the entry?
>>>>
>>>>
>>>> Best,
>>>> Lance
>>>>
>>>>
>>>> On Oct 21, 2020, at 4:55 AM, Dawid Weiss <dawid.weiss at gmail.com> wrote:
>>>>
>>>> Hello,
>>>>
>>>> We've encountered a seemingly valid ZIP file (zipinfo -v shows all its
>>>> entries are intact) that causes a runtime exception when scanning its
>>>> contents with ZipFileSystem. The exception indicates an invalid
>>>> signature when parsing EXTID_EXTT. I don't quite understand this
>>>> comment in the code:
>>>>
>>>> case EXTID_EXTT:
>>>> // spec says the Extened timestamp in cen only has mtime
>>>> // need to read the loc to get the extra a/ctime, if flag
>>>> // "zipinfo-time" is not specified to false;
>>>> // there is performance cost (move up to loc and read) to
>>>> // access the loc table foreach entry;
>>>> if (zipfs.noExtt) {
>>>> if (sz == 5)
>>>> mtime = unixToJavaTime(LG(extra, pos + 1));
>>>> break;
>>>> }
>>>> ...
>>>>
>>>> but this ZIP file has the extra data block of exactly 5 bytes, as
>>>> indicated by zipinfo:
>>>>
>>>> Central directory entry #6:
>>>> ---------------------------
>>>> ...
>>>> file system or operating system of origin: Unix
>>>> version of encoding software: 3.0
>>>> minimum file system compatibility required: MS-DOS, OS/2 or NT FAT
>>>> minimum software version required to extract: 2.0
>>>> compression method: deflated
>>>> ...
>>>> extended local header: no
>>>> file last modified on (DOS date/time): 2018 Mar 1 04:56:20
>>>> file last modified on (UT extra field modtime): 2018 Mar 1 05:56:19 local
>>>> file last modified on (UT extra field modtime): 2018 Mar 1 04:56:19 UTC
>>>> ...
>>>> Unix file attributes (100000 octal): ----------
>>>> MS-DOS file attributes (01 hex): read-only
>>>>
>>>> The central-directory extra field contains:
>>>> - A subfield with ID 0x5455 (universal time) and 5 data bytes.
>>>> The local extra field has UTC/GMT modification/access times.
>>>>
>>>> The above conditional block checking for length == 5 would have worked
>>>> in ZipFileSystem but it's surrounded by a condition over an
>>>> externally-provided property - zipfs.noExtt is only set to true if:
>>>>
>>>> this.noExtt = "false".equals(env.get("zipinfo-time"));
>>>>
>>>> I can't share this particular ZIP file with you and I don't know how
>>>> it was created but it seems like that "zipinfo-time" flag could be
>>>> omitted if the length of the extra data field is exactly 5?
>>>>
>>>> Dawid
>>>>
>>>>
>>>>
>>>> Best
>>>> Lance
>>>> ------------------
>>>>
>>>>
>>>>
>>>> Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
>>>> Oracle Java Engineering
>>>> 1 Network Drive
>>>> Burlington, MA 01803
>>>> Lance.Andersen at oracle.com
>>>>
>>>>
>>>>
>>>>
Best
Lance
------------------
Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
Oracle Java Engineering
1 Network Drive
Burlington, MA 01803
Lance.Andersen at oracle.com
More information about the core-libs-dev
mailing list