RFR: 8336025: Improve ZipOutputSream validation of MAX CEN Header field limits [v2]

Sun Sep 22 18:06:37 UTC 2024

On Mon, 16 Sep 2024 11:42:46 GMT, Lance Andersen <lancea at openjdk.org> wrote:

>> I'm curious why the combined header length validation is being placed so late. In general I would assume failing fast is better?
>> 
>> Also, since the combined header length clause applies to "any directory record", it also applies to LOC?
>> 
>> So why is this happening late in `writeCEN`, as opposed to early in `putNextEntry`?
>> 
>> Edit: I'm aware that moving it to `putNextEntry` means you probably need to repeat it in writeCEN, just in case the `ZipEntry` was updated meanwhile. 
>> 
>> Some comments inline.
>
>> I'm curious why the combined header length validation is being placed so late. In general I would assume failing fast is better?
>> 
>> Also, since the combined header length clause applies to "any directory record", it also applies to LOC?
>> 
>> So why is this happening late in `writeCEN`, as opposed to early in `putNextEntry`?
>> 
>> Edit: I'm aware that moving it to `putNextEntry` means you probably need to repeat it in writeCEN, just in case the `ZipEntry` was updated meanwhile.
>> 
>> Some comments inline.
> 
> As this is really a corner case at best, I decided to keep the changes to a minimum and the validation in writeCEN given that is where the encoded comment bytes are obtained and written out.

@LanceAndersen 

I had a look at your update of the `CenSizeTooLarge` test.

This test uses artificially large extra fields to produce a CEN size exceeding the implementation limit. Since this otherwise would create a lot of IO and a huge file, the test writes a sparse file until the last CEN record is written, after which real bytes are written out for the "ZIP64 End of Central Directory" and "End of Central Directory" records are written.

The current extra field size of 0xFFFF is too large to pass the combined length validation introduced by this PR. Because of this, the field size is reduced to account for the CENHDR length, the length of the entry name and the length of the comment (which is added to the last entry only)

Since the comment is only added to the last entry (as a hint to SparseOutputStream),  the CEN_HEADER_SIZE no longer reflects the actual size of the CEN headers written. While the test still passes (by increasing NUM_ENTRIES as needed), this makes the test somewhat confusing for future maintainers.

Could we perhaps skip the comment altogether, and instead use equal-sized CEN records for all entries? Instead of using a comment as a signal to `SparseOutputStream` we could use a separate extra byte array on the last entry, as suggested in this diff on top of your PR:


Index: test/jdk/java/util/zip/ZipFile/CenSizeTooLarge.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================

diff --git a/test/jdk/java/util/zip/ZipFile/CenSizeTooLarge.java b/test/jdk/java/util/zip/ZipFile/CenSizeTooLarge.java
--- a/test/jdk/java/util/zip/ZipFile/CenSizeTooLarge.java	(revision e250340ba4c8da6143f2c0032fed517d3feb8440)
+++ b/test/jdk/java/util/zip/ZipFile/CenSizeTooLarge.java	(date 1727026889012)
@@ -34,9 +34,7 @@
 import java.nio.ByteBuffer;
 import java.nio.ByteOrder;
 import java.nio.channels.FileChannel;
-import java.nio.charset.StandardCharsets;
 import java.time.LocalDateTime;
-import java.util.Arrays;
 import java.util.zip.ZipEntry;
 import java.util.zip.ZipException;
 import java.util.zip.ZipFile;
@@ -47,11 +45,6 @@
 
 public class CenSizeTooLarge {
 
-    // Helps SparseOutputStream detect write of the last CEN entry
-    private static final String LAST_CEN_COMMENT = "LastCEN";
-    private static final byte[] LAST_CEN_COMMENT_BYTES =
-            LAST_CEN_COMMENT.getBytes(StandardCharsets.UTF_8);
-
     // Entry names produced in this test are fixed-length
     public static final int NAME_LENGTH = 10;
 
@@ -71,8 +64,7 @@
      *.
      * Create a maximum extra field which does not exceed 65,535 bytes
      */
-    static final int MAX_EXTRA_FIELD_SIZE =
-            65_535 - ZipFile.CENHDR - NAME_LENGTH - LAST_CEN_COMMENT.length();
+    static final int MAX_EXTRA_FIELD_SIZE = 65_535 - ZipFile.CENHDR - NAME_LENGTH;
 
     // Data size (unsigned short)
     // Field size minus the leading header 'tag' and 'data size' fields (2 bytes each)
@@ -96,6 +88,10 @@
     // Zip file to create for testing
     private File hugeZipFile;
 
+    private static final byte[] EXTRA_BYTES = makeLargeExtraField();
+    // Helps SparseOutputStream detect write of the last CEN entry
+    private static final byte[] LAST_EXTRA_BYTES = makeLargeExtraField();
+
     /**
      * Create a zip file with a CEN size which does not fit within a Java byte array
      */
@@ -127,10 +123,6 @@
                 // Set the time/date field for faster processing
                 entry.setTimeLocal(TIME_LOCAL);
 
-                if (i == NUM_ENTRIES -1) {
-                    // Help SparseOutputStream detect the last CEN entry write
-                    entry.setComment(LAST_CEN_COMMENT);
-                }
                 // Add the entry
                 zip.putNextEntry(entry);
 
@@ -140,10 +132,11 @@
             zip.closeEntry();
 
             // Before the CEN headers are written, set the extra data on each entry
-            byte[] extra = makeLargeExtraField();
             for (ZipEntry entry : entries) {
-                entry.setExtra(extra);
+                entry.setExtra(EXTRA_BYTES);
             }
+            // Help SparseOutputSream detect the last entry
+            entries[entries.length-1].setExtra(LAST_EXTRA_BYTES);
         }
     }
 
@@ -167,7 +160,7 @@
      * Data Size  (Two byte short)
      * Data Block (Contents depend on field type)
      */
-    private byte[] makeLargeExtraField() {
+    private static byte[] makeLargeExtraField() {
         // Make a maximally sized extra field
         byte[] extra = new byte[MAX_EXTRA_FIELD_SIZE];
         // Little-endian ByteBuffer for updating the header fields
@@ -205,7 +198,7 @@
                 // but instead simply advance the position, creating a sparse file
                 channel.position(position);
                 // Check for last CEN record
-                if (Arrays.equals(LAST_CEN_COMMENT_BYTES, 0, LAST_CEN_COMMENT_BYTES.length, b, off, len)) {
+                if (b == LAST_EXTRA_BYTES) {
                     // From here on, write actual bytes
                     sparse = false;
                 }

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21003#issuecomment-2366897313