"loc: wrong sig" in ZipFileSystem on a valid ZIP file (extra data field of exactly 5 bytes)
Dawid Weiss
dawid.weiss at gmail.com
Sun Nov 29 22:00:32 UTC 2020
Hi Lance,
I looked at the test you pointed to... This is exactly the test for the
problem I originally reported on October 21st... The bug has been filed
for this on October 25th (by you) --
https://bugs.openjdk.java.net/browse/JDK-8255380 and the fix is in commit
6606e0909cbda9. Weird.
I think that test is slightly broken in that it passes "zipinfo-time=false"
-- this is exactly the workaround that makes it work everywhere. The
test should *not* have that flag set?
try (FileSystem fs =
FileSystems.newFileSystem(Paths.get(ZIP_FILE_NAME),
Map.of("zipinfo-time", "False"))) {
Anyway, the problem is fixed. I think it'd be good to backport to LTS
releases too.
Dawid
On Sat, Nov 28, 2020 at 4:32 PM Lance Andersen <LANCE.ANDERSEN at oracle.com>
wrote:
> Hi Dawid,
>
> Thank you for the follow up.
>
> On Nov 28, 2020, at 9:34 AM, Dawid Weiss <dawid.weiss at gmail.com> wrote:
>
>
> I know what the problem is and I know what the fix should be (and tried to
> convey it) but I don't have a patch.
>
>
> Given you believe you know what the fix would be, it would be great to
> create a patch for ZipFileSystem and see if it addresses the issue and then
> contribute the fix back to the OpenJDK community :-).
>
>
> Writing the test case is not a problem but it'll be a lengthy test on
> slower drives (has to create a zip file that exceeds 4gb of disk space).
>
>
> The test case would be configured as a manual test and would look similar
> to test/jdk/jdk/nio/zipfs/TestLocOffsetFromZip64EF.java
> <https://github.com/openjdk/jdk/pull/987#diff-73b401685399c9d59851cda49e9c0d377093086119555708253c8df99649e3c9> which
> also creates a 4GB zip.
>
> I don't think there is a way to fake the file system by using a sparse
> file, sadly. Would this be acceptable? If so then sure - I can provide a
> reproducible test case that will fail for the current implementation.
>
>
> The above test runs in a reasonable time so I would expect this test case
> would as well.
>
> I will look to reproduce the issue over the weekend or early next week and
> log a bug with what you provided in your earlier email.
>
>
> Best
> Lance
>
>
> D.
>
> On Fri, Nov 27, 2020 at 8:37 PM Lance Andersen <LANCE.ANDERSEN at oracle.com>
> wrote:
>
>> Hi Dawid,
>>
>> If you believe you have a a fix, please submit a patch along with a JTREG
>> test case. Ideally the test case would create zip file as part of the test.
>>
>> Best
>> Lance
>>
>> On Nov 27, 2020, at 7:37 AM, Dawid Weiss <dawid.weiss at gmail.com> wrote:
>>
>> Just for the archives - I'm not sure if this ended up being filed to
>> the bug system - the repro below demonstrates the bug on all JDKs up
>> to the newest one.
>>
>> # create a single random file of 250 megabytes
>> head -c 250M < /dev/urandom > rnd.bin
>> # create a bunch of hardlinks to the same file.
>> for i in `seq -w 1 25`; do ln rnd.bin rnd-$i.bin; done
>> # create a zip archive exceeding 4gb (in stored mode so that it's faster)
>> zip -0 archive.zip rnd*.bin
>> # show the content of the archive - for all entries beyond 4gb this should
>> # show extra data blocks like below (note the order of extra data
>> blocks - this is important).
>> #
>> # The central-directory extra field contains:
>> # - A subfield with ID 0x5455 (universal time) and 5 data bytes.
>> # The local extra field has UTC/GMT modification/access times.
>> # - A subfield with ID 0x7875 (Unix UID/GID (any size)) and 11 data bytes:
>> # 01 04 ea 03 00 00 04 ea 03 00 00.
>> # - A subfield with ID 0x0001 (PKWARE 64-bit sizes) and 8 data bytes:
>> # a4 06 a0 86 01 00 00 00.
>> #
>> zipinfo -v archive.zip
>>
>> # Bug repro code:
>> cat > Test.java <<EOF
>> import java.io
>> <https://urldefense.com/v3/__http://java.io__;!!GqivPVa7Brio!KhvWnawkon4E5sVeCpHmwDPJusjNPLMtqPh-ZBleD-U4Dv6tmWKB3cq3sT68yelqdA$>
>> .*;
>> import java.nio.*;
>> import java.nio.file.*;
>> import java.nio.file.attribute.*;
>>
>> public class Test {
>> public static void main(String[]args) throws IOException {
>> try (FileSystem fs =
>> FileSystems.newFileSystem(Paths.get("archive.zip"))) {
>> for (Path root : fs.getRootDirectories()) {
>> Files.walkFileTree(root, new SimpleFileVisitor<>() {
>> public FileVisitResult visitFile(Path file,
>> BasicFileAttributes attrs) throws IOException {
>> return FileVisitResult.CONTINUE;
>> }
>> });
>> }
>> }
>> }
>> }
>> EOF
>> # and run it.
>> java Test.java
>>
>> The above ends with:
>>
>> Exception in thread "main" java.util.zip.ZipException: loc: wrong sig
>> ->841cd111
>> at
>> jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.readExtra(ZipFileSystem.java:2899)
>> at
>> jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.readCEN(ZipFileSystem.java:2600)
>> at
>> jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.<init>(ZipFileSystem.java:2536)
>> at
>> jdk.zipfs/jdk.nio.zipfs.ZipFileSystem.getFileAttributes(ZipFileSystem.java:532)
>> at jdk.zipfs/jdk.nio.zipfs.ZipPath.readAttributes(ZipPath.java:767)
>> at jdk.zipfs/jdk.nio.zipfs.ZipPath.readAttributes(ZipPath.java:777)
>> at
>> jdk.zipfs/jdk.nio.zipfs.ZipFileSystemProvider.readAttributes(ZipFileSystemProvider.java:276)
>> at java.base/java.nio.file.Files.readAttributes(Files.java:1843)
>> at
>> java.base/java.nio.file.FileTreeWalker.getAttributes(FileTreeWalker.java:219)
>> at
>> java.base/java.nio.file.FileTreeWalker.visit(FileTreeWalker.java:276)
>> at
>> java.base/java.nio.file.FileTreeWalker.next(FileTreeWalker.java:373)
>> at java.base/java.nio.file.Files.walkFileTree(Files.java:2840)
>> at java.base/java.nio.file.Files.walkFileTree(Files.java:2876)
>> at Test.main(Test.java:10)
>>
>> The fix is relatively simple I think - the current code assumes entry
>> offset is < 4GB when parsing extra data for
>> ID 0x5455. This should be evaluated lazily after all extra data blocks
>> are parsed to ensure the file offset is correctly
>> updated based on block ID 0x0001, regardless of its ordering within
>> other extra data blocks.
>>
>>
>>
>> On Fri, Oct 23, 2020 at 9:12 AM Dawid Weiss <dawid.weiss at gmail.com>
>> wrote:
>>
>>
>>
>> This looks like a legitimate bug to me (the zip file system
>> implementation is sensitive to the order of extra parameters and may throw
>> an unexpected error
>> on zip64 archives exceeding 4 gigabytes). I can't file a Jira issue - no
>> permissions - but I think it'd be worth adding one?
>>
>> Dawid
>>
>> On Wed, Oct 21, 2020 at 8:36 PM Dawid Weiss <dawid.weiss at gmail.com>
>> wrote:
>>
>>
>> Hi Lance,
>>
>> Yes, this is exactly the point where the problem is in JDK code. It's
>> directly
>> related to the zip entry beyond max int offset. The code tries to read
>> data
>> from a local zip entry header at locoff, here:
>>
>> if (zipfs.readFullyAt(buf, 0, buf.length , locoff)
>>
>> but the locoff is set to ~0 as per the spec for files exceeding
>> 4 gigabytes, which says:
>>
>> 4.4.16 relative offset of local header: (4 bytes)
>>
>> This is the offset from the start of the first disk on
>> which this file appears, to where the local header SHOULD
>> be found. If an archive is in ZIP64 format and the value
>> in this field is 0xFFFFFFFF, the size will be in the
>> corresponding 8 byte zip64 extended information extra field.
>>
>> A proper fix would be to read the local header from zip64 extra data
>> first. I don't know how this interferes with the rest of the code though -
>> didn't have enough time to look at it. As it stands, zip entries
>> beyond 4GB cause
>> an unchecked exception while attribute-scanning. This simple snippet
>> is enough to demonstrate it, given a ZIP entry beyond 4GB range:
>>
>> try (FileSystem fs =
>> FileSystems.newFileSystem(Paths.get("zipWithEntryBeyond4Gb.zip"))) {
>> for (Path root : fs.getRootDirectories()) {
>> Files.walkFileTree(root, new SimpleFileVisitor<>() {
>> @Override
>> public FileVisitResult visitFile(Path file, BasicFileAttributes
>> attrs) throws IOException {
>> return FileVisitResult.CONTINUE;
>> }
>> });
>> }
>> }
>>
>> The walkFileTree method is key here as it attempts to harvest
>> attributes (why we use it is another story -- this saves a lot of time
>> on large, network-mounted regular directories when you're collecting
>> file metadata).
>>
>>
>> Dawid
>>
>>
>> On Wed, Oct 21, 2020 at 6:39 PM Lance Andersen
>> <LANCE.ANDERSEN at oracle.com> wrote:
>>
>>
>> Hi David,
>>
>> From a quick look at ZipFileSystem this appears to be an optimization to
>> avoid looking at the LOC extended Timestamp Extra field
>>
>> If a Info-ZIP Extended Timestamp (0x5455)is found then:
>>
>> If the "zipinfo-time" entry was set to “false” in the Map specified
>> when creating the Zip FileSystem,
>>
>> FileSystems.newFileSystem(zipFile, Map.of("zipinfo-time", "false")
>>
>> and the data size of the CEN extended Timestamp is 5 (flag + mod time)
>>
>> The modified time is used from the CEN Extended Timestamp extra field
>>
>> Otherwise get the modified time, creation time, and access time from the
>> LOC Extended Timestamp extra field.
>>
>>
>> —————
>>
>> Extended Timestamp Extra Field:
>>
>> ==============================
>>
>> The following is the layout of the extended-timestamp extra
>> block.
>> (Last Revision 19970118)
>>
>> Local-header version:
>>
>> Value Size Description
>> ----- ---- -----------
>> (time) 0x5455 Short tag for this extra block type ("UT")
>> TSize Short total data size for this block
>> Flags Byte info bits
>> (ModTime) Long time of last modification (UTC/GMT)
>> (AcTime) Long time of last access (UTC/GMT)
>> (CrTime) Long time of original creation (UTC/GMT)
>>
>> Central-header version:
>>
>> Value Size Description
>> ----- ---- -----------
>> (time) 0x5455 Short tag for this extra block type ("UT")
>> TSize Short total data size for this block
>> Flags Byte info bits (refers to local header!)
>> (ModTime) Long time of last modification (UTC/GMT)
>>
>> The central-header extra field contains the modification time
>> only,
>> or no timestamp at all. TSize is used to flag its presence or
>> absence. But note:
>>
>> If "Flags" indicates that Modtime is present in the local
>> header
>> field, it MUST be present in the central header field, too!
>> This correspondence is required because the modification time
>> value may be used to support trans-timezone freshening and
>> updating operations with zip archives.
>>
>> The time values are in standard Unix signed-long format,
>> indicating
>> the number of seconds since 1 January 1970 00:00:00. The times
>> are relative to Coordinated Universal Time (UTC), also sometimes
>> referred to as Greenwich Mean Time (GMT). To convert to local
>> time,
>> the software must know the local timezone offset from UTC/GMT.
>>
>> The lower three bits of Flags in both headers indicate which
>> time-
>> stamps are present in the LOCAL extra field:
>>
>> bit 0 if set, modification time is present
>> bit 1 if set, access time is present
>> bit 2 if set, creation time is present
>> bits 3-7 reserved for additional timestamps; not set
>>
>> Those times that are present will appear in the order indicated,
>> but
>> any combination of times may be omitted. (Creation time may be
>> present without access time, for example.) TSize should equal
>> (1 + 4*(number of set bits in Flags)), as the block is currently
>> defined. Other timestamps may be added in the future.
>>
>>
>> --------------
>>
>> It's hard to comment on why you received the error that you did but it is
>> possible the tool that was used for writing the entry did something
>> unexpected.
>>
>> Out of curiosity, have you tried using ZipFile/ZipEntry to access the
>> entry?
>>
>>
>> Best,
>> Lance
>>
>>
>> On Oct 21, 2020, at 4:55 AM, Dawid Weiss <dawid.weiss at gmail.com> wrote:
>>
>> Hello,
>>
>> We've encountered a seemingly valid ZIP file (zipinfo -v shows all its
>> entries are intact) that causes a runtime exception when scanning its
>> contents with ZipFileSystem. The exception indicates an invalid
>> signature when parsing EXTID_EXTT. I don't quite understand this
>> comment in the code:
>>
>> case EXTID_EXTT:
>> // spec says the Extened timestamp in cen only has mtime
>> // need to read the loc to get the extra a/ctime, if flag
>> // "zipinfo-time" is not specified to false;
>> // there is performance cost (move up to loc and read) to
>> // access the loc table foreach entry;
>> if (zipfs.noExtt) {
>> if (sz == 5)
>> mtime = unixToJavaTime(LG(extra, pos + 1));
>> break;
>> }
>> ...
>>
>> but this ZIP file has the extra data block of exactly 5 bytes, as
>> indicated by zipinfo:
>>
>> Central directory entry #6:
>> ---------------------------
>> ...
>> file system or operating system of origin: Unix
>> version of encoding software: 3.0
>> minimum file system compatibility required: MS-DOS, OS/2 or NT FAT
>> minimum software version required to extract: 2.0
>> compression method: deflated
>> ...
>> extended local header: no
>> file last modified on (DOS date/time): 2018 Mar 1 04:56:20
>> file last modified on (UT extra field modtime): 2018 Mar 1 05:56:19 local
>> file last modified on (UT extra field modtime): 2018 Mar 1 04:56:19 UTC
>> ...
>> Unix file attributes (100000 octal): ----------
>> MS-DOS file attributes (01 hex): read-only
>>
>> The central-directory extra field contains:
>> - A subfield with ID 0x5455 (universal time) and 5 data bytes.
>> The local extra field has UTC/GMT modification/access times.
>>
>> The above conditional block checking for length == 5 would have worked
>> in ZipFileSystem but it's surrounded by a condition over an
>> externally-provided property - zipfs.noExtt is only set to true if:
>>
>> this.noExtt = "false".equals(env.get("zipinfo-time"));
>>
>> I can't share this particular ZIP file with you and I don't know how
>> it was created but it seems like that "zipinfo-time" flag could be
>> omitted if the length of the extra data field is exactly 5?
>>
>> Dawid
>>
>>
>>
>> Best
>> Lance
>> ------------------
>>
>>
>>
>> Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
>> Oracle Java Engineering
>> 1 Network Drive
>> Burlington, MA 01803
>> Lance.Andersen at oracle.com
>>
>>
>>
>>
>>
>>
>> Best
>> Lance
>> ------------------
>>
>> <oracle_sig_logo.gif>
>>
>>
>> Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
>> Oracle Java Engineering
>> 1 Network Drive
>> Burlington, MA 01803
>> Lance.Andersen at oracle.com
>>
>
>
> Best
> Lance
> ------------------
>
>
>
> Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
> Oracle Java Engineering
> 1 Network Drive
> Burlington, MA 01803
> Lance.Andersen at oracle.com
>
>
>
>
>
More information about the core-libs-dev
mailing list