RFR: 8276743: Make openjdk build Zip Archive generation "reproducible"

Erik Joelsson erikj at openjdk.java.net
Tue Nov 9 17:28:45 UTC 2021


On Tue, 9 Nov 2021 14:55:52 GMT, Andrew Leonard <aleonard at openjdk.org> wrote:

>> make/common/ZipArchive.gmk line 178:
>> 
>>> 176: 	    (cd $$(SUPPORT_OUTPUTDIR)/ziptmp/$1/files && \
>>> 177: 	     $(RM) $$@ && \
>>> 178: 	     $(UNZIP) -q $$(SUPPORT_OUTPUTDIR)/ziptmp/$1/tmp.zip && \
>> 
>> Having to explode the zip here is unfortunate. This means we are creating an almost full copy of the whole src tree in the build directory, something I tried to avoid by leveraging the include/exclude functionality of zip, instead of generating make rules for copying the files I wanted into a source tree to run zip on. This may be a small overhead on Linux, but I'm pretty sure it will be very noticeable on Windows.
>> 
>> When reading about your tool at first, I assumed it would read the intermediate zip file directly when rebuilding the zip. I don't think modifying it to do that would be too complicated: basically reading and processing ZipEntry objects instead of walking the file system.
>
> @erikj79 I had a bit of a think, and part of the reason for unzipping and then regenerating is to avoid having to load all the entries into memory. You can't guarantee the order "zip" has created them in, so realistically I'd have to read all the ZipEntry objects into memory and then rewrite them, which we can do. src.zip is only 55MB or so, so memory requirements won't be huge, given that src.zip is currently the only target here.

You are already keeping all the filenames in memory for sorting, so reading the ZipEntry objects isn't that much more data, just some extra metadata for each entry. The actual file contents are not part of the ZipEntry object. When actually copying the files, you can use the ZipFile class to access the ZipEntry objects in arbitrary order and read their contents as InputStreams.
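
To illustrate the approach, here is a minimal sketch, not the tool from the PR. It assumes a hypothetical class name (SortedZipRewriter) and an arbitrary fixed timestamp chosen only for the example. It keeps just the ZipEntry metadata in memory, sorts by name, and then streams each entry's contents from the ZipFile in sorted order:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for illustration; not the class used in the PR.
public class SortedZipRewriter {

    // Assumption: any fixed timestamp would do; this value is arbitrary.
    static final LocalDateTime FIXED_TIME = LocalDateTime.of(2000, 1, 1, 0, 0, 0);

    public static void rewrite(Path in, Path out) throws IOException {
        try (ZipFile zip = new ZipFile(in.toFile());
             ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(out))) {

            // Only entry metadata is held in memory, not the file contents.
            List<ZipEntry> entries = new ArrayList<>(Collections.list(zip.entries()));
            entries.sort(Comparator.comparing(ZipEntry::getName));

            for (ZipEntry e : entries) {
                // Create a fresh entry so no metadata from the input leaks through.
                ZipEntry fresh = new ZipEntry(e.getName());
                fresh.setTimeLocal(FIXED_TIME);
                zos.putNextEntry(fresh);
                if (!e.isDirectory()) {
                    // ZipFile allows random access, so entries can be read
                    // in sorted order regardless of their order in the input.
                    try (InputStream is = zip.getInputStream(e)) {
                        is.transferTo(zos);
                    }
                }
                zos.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        rewrite(Path.of(args[0]), Path.of(args[1]));
    }
}
```

Run as, for example, "java SortedZipRewriter.java tmp.zip src.zip". Since the contents are streamed one entry at a time, peak memory is governed by the number of entries rather than the size of the archive.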

-------------

PR: https://git.openjdk.java.net/jdk/pull/6311


