RFR: 8321156: Improve the handling of invalid UTF-8 byte sequences for ZipInputStream::getNextEntry and ZipFile::getComment

Eirik Bjørsnøs eirbjo at openjdk.org
Sat Feb 24 17:17:52 UTC 2024


On Sat, 24 Feb 2024 14:56:01 GMT, Lance Andersen <lancea at openjdk.org> wrote:

> Please review this PR which addresses the handling of invalid UTF-8 byte sequences in the entry name of a LOC file header and a Zip file comment which is returned via ZipFile::getComment.
> 
> As part of the change, `ZipFile::getComment` will now return `null` if an invalid UTF-8 byte sequence is encountered while converting the byte array to a String.  The CSR for this change has also been approved.
> 
> Mach5 tiers 1-3 are clean with this change.

Since the CSR is already approved, I'll add a question here:

`ZipFile` performs a lot of validation while opening ZIP files, including throwing `ZipException` for invalid entry names or comments. Why handle the ZIP file comment differently (lazily)? If this comment were also validated by the constructor, then the API change for `ZipFile::getComment` would not be needed.

Do we have reason to believe the encoding quality of ZIP file comments is less reliable than that of ZIP entry comments? Or is there some other reason this validation is done lazily?
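
To make the two options concrete, here is a minimal sketch of lazy versus eager handling of a comment's bytes. It is not the JDK's actual implementation: the class `CommentDecodingSketch` and the helpers `decodeStrict`, `lazyComment`, and `validateEagerly` are hypothetical names, and the sketch assumes the comment is decoded as UTF-8 with a strict `CharsetDecoder`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipException;

// Hypothetical illustration only; none of these helpers exist in java.util.zip.
class CommentDecodingSketch {

    // Decode strictly: malformed UTF-8 raises CharacterCodingException
    // instead of being silently replaced with U+FFFD.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    // Lazy option: decode when the comment is requested and
    // return null if the bytes are not valid UTF-8.
    static String lazyComment(byte[] bytes) {
        if (bytes == null) {
            return null;
        }
        try {
            return decodeStrict(bytes);
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    // Eager option: validate while opening the file and
    // reject the whole file with a ZipException.
    static void validateEagerly(byte[] bytes) throws ZipException {
        try {
            decodeStrict(bytes);
        } catch (CharacterCodingException e) {
            throw new ZipException("invalid ZIP file comment");
        }
    }

    public static void main(String[] args) {
        // 0xC3 must be followed by a continuation byte (0x80..0xBF); 0x28 is not one.
        byte[] invalid = {(byte) 0xC3, (byte) 0x28};
        System.out.println(lazyComment(invalid));
        try {
            validateEagerly(invalid);
        } catch (ZipException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With the lazy variant a caller only learns about the bad bytes through a `null` return; with the eager variant the file fails to open, so the return contract of the getter would not need to change.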

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17995#issuecomment-1962426366

