RFR: 8336025: Improve ZipOutputStream validation of MAX CEN Header field limits [v2]

Lance Andersen lancea at openjdk.org
Mon Sep 23 11:04:24 UTC 2024


On Mon, 16 Sep 2024 11:42:46 GMT, Lance Andersen <lancea at openjdk.org> wrote:

>> I'm curious why the combined header length validation is being placed so late. In general I would assume failing fast is better?
>> 
>> Also, since the combined header length clause applies to "any directory record", it also applies to LOC?
>> 
>> So why is this happening late in `writeCEN`, as opposed to early in `putNextEntry`?
>> 
>> Edit: I'm aware that moving it to `putNextEntry` means you probably need to repeat it in writeCEN, just in case the `ZipEntry` was updated meanwhile.
>> 
>> Some comments inline.
> 
> As this is really a corner case at best, I decided to keep the changes to a minimum and the validation in writeCEN given that is where the encoded comment bytes are obtained and written out.
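
The combined-length rule being discussed can be sketched as follows. This is an illustrative standalone check, not the JDK's actual implementation: per the ZIP APPNOTE, the name, extra, and comment lengths are 16-bit CEN fields, and the combined length of any directory record should not exceed 65,535 bytes. The class name, constants, and method are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipException;

public class CenHeaderCheck {
    static final int CENHDR = 46;          // fixed portion of a CEN header
    static final int MAX_COMBINED = 0xFFFF; // APPNOTE combined-record limit

    // Reject an entry whose CEN record (fixed header plus the three
    // variable-length fields) would exceed the combined limit.
    static void checkCombinedLength(byte[] name, byte[] extra, byte[] comment)
            throws ZipException {
        long total = (long) CENHDR + name.length
                + (extra == null ? 0 : extra.length)
                + (comment == null ? 0 : comment.length);
        if (total > MAX_COMBINED) {
            throw new ZipException("CEN header size " + total
                    + " exceeds maximum " + MAX_COMBINED);
        }
    }

    public static void main(String[] args) throws ZipException {
        byte[] name = "entry.txt".getBytes(StandardCharsets.UTF_8);
        checkCombinedLength(name, new byte[100], null); // within the limit
        try {
            checkCombinedLength(name, new byte[0xFFFF], null);
        } catch (ZipException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Performing this check where the encoded comment bytes are produced (as the PR does in `writeCEN`) means the validated lengths are the ones actually written out, even if the `ZipEntry` was mutated after `putNextEntry`.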

> @LanceAndersen
> 
> I had a look at your update of the `CenSizeTooLarge` test.
> 
> This test uses artificially large extra fields to produce a CEN size exceeding the implementation limit. Since writing that much data would otherwise create a lot of IO and a huge file, the test writes a sparse file until the last CEN record, after which real bytes are written out for the "ZIP64 End of Central Directory" and "End of Central Directory" records.
> 
> The current extra field size of 0xFFFF is too large to pass the combined length validation introduced by this PR. Because of this, the field size is reduced to account for the CENHDR length, the length of the entry name, and the length of the comment (which is added to the last entry only).
> 
> Since the comment is only added to the last entry (as a hint to SparseOutputStream), the CEN_HEADER_SIZE no longer reflects the actual size of the CEN headers written. While the test still passes (by increasing NUM_ENTRIES as needed), this makes the test somewhat confusing for future maintainers.
> 
> Could we perhaps skip the comment altogether, and instead use equal-sized CEN records for all entries? Instead of using a comment as a signal to `SparseOutputStream` we could use a separate extra byte array on the last entry, as suggested in this diff on top of your PR:
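
The size adjustment described above amounts to simple arithmetic: shrink the per-entry extra field so that the fixed CEN header plus the name (plus the comment, where present) stays within the 0xFFFF combined limit. A hypothetical sketch, with an assumed entry-name pattern (the class, method, and name format are illustrative, not the test's actual code):

```java
public class MaxExtraSize {
    static final int CENHDR = 46;           // fixed portion of a CEN header
    static final int MAX_COMBINED = 0xFFFF; // combined-record limit

    // Largest extra-field size that keeps CENHDR + name + extra + comment
    // at or below the combined limit.
    static int maxExtraSize(int nameLen, int commentLen) {
        return MAX_COMBINED - CENHDR - nameLen - commentLen;
    }

    public static void main(String[] args) {
        int nameLen = "entry-0000001".length(); // assumed fixed-width names
        System.out.println(maxExtraSize(nameLen, 0)); // ordinary entries
        System.out.println(maxExtraSize(nameLen, 7)); // last entry w/ comment
    }
}
```

Using equal-sized CEN records for every entry, as suggested, would let the test drop the `commentLen` term entirely and keep a single per-entry size that NUM_ENTRIES can be derived from.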


Thanks for the suggestions for the test. They look fine. I have done a run across our internal Mach5 systems with no issues (as expected).

The PR has been updated with these changes.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21003#issuecomment-2367883851


More information about the core-libs-dev mailing list