RFR: 8276400: openjdk image Jars, Zips and Jmods built from the same source are not reproducible

Andrew Leonard aleonard at openjdk.java.net
Fri Nov 5 12:27:09 UTC 2021


On Fri, 5 Nov 2021 11:31:34 GMT, Andrew Leonard <aleonard at openjdk.org> wrote:

>> This PR enables reproducible Jars, Jmods and openjdk image zip files (eg.src.zip).
>> It provides support for SOURCE_DATE_EPOCH for Jar, Jmod and underlying ZipOutputStream's.
>> It fixes the following keys issues relating to reproducibility:
>> - Jar and ZipOutputStream are not SOURCE_DATE_EPOCH aware
>>   - Jar and ZipOutputStream now detect SOURCE_DATE_EPOCH environment setting
>> - Jar and Jmod file content ordering was non-determinsitic
>>   - Fixes to Jar and Jmod main's to ensure sorted classes content ordering
>> - openjdk image zip file generation used "zip" which is non-determinsitic
>>   - New openjdk build tool "GenerateZip" which produces the final determinsitic zip files as part of the build and also detects SOURCE_DATE_EPOCH
>> 
>> Signed-off-by: Andrew Leonard <anleonar at redhat.com>
>
>> I think this is going to require discussion, a PR may be too premature. Is your goal to get the JDK build and run-time images creates with jlink to use the SOURCE_DATE_EPOCH? Are you looking for project builds that use the jar/jmod/etc. tools to use it? Or are you looking to have every library and application that uses the java.util.zip APIs to read it? If it includes the latter, and the patch in the PR suggests you are, then it will require analysis to work through the API spec changes.
>> 
>> One suggestion is to look at JDK-8231640 and PR 5372 [1]. The original proposal was to use SOURCE_DATE_EPOCH. After a lengthy discussion we converged on the system property java.properties.date rather than the env variable. I suspect that much of that discussion applies here.
>> 
>> -Alan
>> 
>> [1] https://github.com/[/pull/5372](https://github.com/openjdk/jdk/pull/5372)
> 
> @AlanBateman Hi Alan, thank you for the comments. So yes, totally agree this has a way to go, I thought creating a PR would kick things off...! So just to clarify the initial objectives which you raise above. My current aim is actually purely an OpenJDK JDK image perspective, building on the work @magicus has been doing with SOURCE_DATE_EPOCH in the build, hence my natural approach to this PR. So from that perspective, it's purely a "build" change. However, the openjdk build obviously uses openjdk Jar, Jmod tooling to build itself, hence the direction I took in this PR.
> 
> Magnus's PR #5372 is useful thank you, I had not discovered that change, as you say the direction there is similar to mine, and I understand the potential pitfalls of adding doPrivileged's etc. So that resolved to a new system property java.properties.date, which makes sense.
> 
> So with that in mind, and given the objective of JDK image reproducibility, this would imply presumably an approach of a new cmd line option to Jar and Jmod tools (eg.--source-date <date>), since these tools are cmds?
> 
> One of the key changes in this PR is to ZipOutputStream where it sets the ZipEntry.xdostime if not set, to the current time. Now I am thinking given the "objective", if we fix all upstream tooling to ensure they set ZipEntry times, using "--source-date" (or whatever mechanism), then we can then avoid changing ZipOutputStream I believe. JDK tools like jar, jmod generate/update extra files like Manifests, and in fact I think the current jmod implementation writes all its content using the current time.
> 
> So something along these lines:
>     jar -cf myjar.jar --source-date "<date>" * 
>     jmod create --source-date "<date>" ... my.jmod
> The jar, jmod,.. tooling then ensure all ZipEntry times are set to "source-date".
> 
> I chose "source-date" name based on being synonymous with SOURCE_DATE_EPOCH, implying a similar "use case".
> 
> So taking this approach for JDK tooling should achieve the reproducible JDK image objective, without a java API behavior change I am hoping.
> Thoughts?
> 
> Regards
> Andrew

> @andrew-m-leonard Just to be clear: changing arguments to the command line utilities is still an API change and will need a CSR.
> 
> My gut feeling is that using a system property that affect all users of ZipStream is preferable. That way a user could run like `java -Djava.properties.date=$(SOURCE_DATE_EPOCH) -Djava.zipfile.date=$(SOURCE_DATE_EPOCH) ...` (or whatever) and get reproducible behavior in all their code, for places that could otherwise be hard to fix.

Thanks @magicus 
Yes, I understand a CSR is required, I was trying to state there would not be a java.util.zip API behavior change necessarily.

Just to state again, and although I need to check this, I think if all JDK tooling ensures ZipEntry's had date set to whatever <source-date> prior to calling ZipOutputStream.putNextEntry(e), then I think no change to ZipOutputStream would be needed (I think).
However, you do elude to the benefit to "others" of changing ZipOutputStream to use (say -Djava.zipfile.date), in that it makes it easier for consumers to make reproducible builds rather than tracking down all the code that calls it in their project.

GenerateZip as you say is slightly separate, and bespoke to the openjdk build. I did start by trying to create my own "zip", but that is actually quite involved due to how openjdk build makes zip files, and how it leverages zip functionality for things like --includes & --excludes. It is far easier actually to just construct the final deterministic zip file using this class at the end. But I agree, if someone could write a determinisitic "zip", that would be great please!
I will move GenerateZip to a separate PR, and make it take "timestamp" just like CreateSymbols already does.

Thanks again
Andrew

-------------

PR: https://git.openjdk.java.net/jdk/pull/6268


More information about the compiler-dev mailing list