RFR: 4890732: GZIPOutputStream doesn't support optional GZIP fields [v3]
Lance Andersen
lance.andersen at oracle.com
Sun Apr 11 11:10:01 UTC 2021
Hi Lin
On Apr 10, 2021, at 11:16 PM, Lin Zang <lzang at openjdk.java.net<mailto:lzang at openjdk.java.net>> wrote:
Dear @AlanBateman and @LanceAndersen,
Thanks a lot for your review and comments!
We should look to see if it makes sense to use some of the more recent java features such as Record.
If we are adding a specific class to hold the header fields, perhaps using the builder pattern should be considered vs constructors.
I agree, we should finalize the API before moving forward. I am not quite fimilar with Record, I will do some investigation, trying to use it in this PR and discuss with you later.
Please look at:
The Jep https://openjdk.java.net/jeps/395
A simple tutorial
https://oracle.github.io/learning-library/oci-library/oci-hol/oci-java-app/workshops/freetier/?customTrackingParam=:ow:lp:pt:::RC_WWMK200728P00009:OCIJava_HOL&elqTrackId=7e168170255b485a917a5e6602097868&elqaid=100601&elqat=2&source=:ow:lp:cpo:::RC_WWMK200728P00012:LPD400088742+:ow:lp:cpo::&lab=lab-6-records#STEP1:AsimpleRecord
Chris and Julia’s video is also very good:
https://www.youtube.com/watch?v=pxiHwrEMEtI
Between the use of Record and/or the Builder pattern, assembling the header values can be done quite nicely without constructor overload.
If we are writing out these additional fields, what should we do with them when reading a gzip file back?
IMO, generally there should be check of header crc, and some other checks like the header format, magic number etc. May be we could implement it like the gnu gzip tool, which ingore contents of most header fields but verified the general header info.
Sorry, that what not my point Your current proposal provides no wayto access these fields similar to for example ZipEntry. If we are going to set these additional fields then we should we should provided a means to access the fields
Have you looked at other gzip api implementations to see what they provide in this area?
Please look at api’s such as commons-compress. They provide access to these fields via the API in addition to being able to set the fields. This is what I was referring to above.
I have looked at the gzip implementation at https://github.com/linzang/gzip/blob/distrotech-gzip/gzip.c#L1281, this method is use to generate header crc value for check. And there is some legal header check in this method. And `file name` is used to save original file name in gzip when it is truncated. refer to https://github.com/linzang/gzip/blob/distrotech-gzip/zip.c#L37
In JDK, there is a usage of `file comment` in the gzip heap dump file generated by jcmd `jmap -dump` command. at https://github.com/openjdk/jdk/blob/ff52f2989fd60ec8251eaf76f4c4b78f10d3e048/src/hotspot/share/services/heapDumperCompression.cpp#L127 , and this info is used in testing like an hint for uncompression, please refer https://github.com/openjdk/jdk/blob/ff52f2989fd60ec8251eaf76f4c4b78f10d3e048/test/lib/jdk/test/lib/hprof/parser/GzipRandomAccess.java#L280.
And IMHO the behavior of adding information in `file comments` seems like a general way to transfer extra information between compressor and uncompressor.
Is there anyone who relies on this additional header info? As I mentioned in an earlier comment, we should validate the changes against another implementation to ensure that we can read the data back correctly and that the additional header data that we write can be read by other tools.
May be we could have similar behavior? which just do some general header info check, calculate CRC of the header and ignore the real contents of most header fields. Do you think it is reasonable?
Adding additional support for these fields is fine, we just need to agree on the API changes to move forward prior to updating the PR
Thanks!
Lin
-------------
PR: https://git.openjdk.java.net/jdk/pull/3072
[cid:E1C4E2F0-ECD0-4C9D-ADB4-B16CA7BCB7FC at home]
Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
Oracle Java Engineering
1 Network Drive
Burlington, MA 01803
Lance.Andersen at oracle.com<mailto:Lance.Andersen at oracle.com>
More information about the core-libs-dev
mailing list