Reproducible Properties

Magnus Ihse Bursie magnus.ihse.bursie at oracle.com
Mon Sep 30 09:51:12 UTC 2019


On 2019-09-29 18:46, Robert Scholte wrote:
> On Sun, 29 Sep 2019 16:57:33 +0200, Alan Bateman 
> <Alan.Bateman at oracle.com> wrote:
>
>> On 29/09/2019 14:08, Robert Scholte wrote:
>>> Hi,
>>>
>>> the Maven team gets quite some requests regarding reproducible builds.
>>> Depending on the source we're able to fix it ourselves.
>>> However, in case of writing properties via Properties.store() we'll 
>>> get unreproducible properties, because it includes the current 
>>> Date() [1]. The only option we have right now is writing our own 
>>> Properties writer.
>>> I'm kind of surprised not to find any related issue, but does it 
>>> make sense to make the inclusion of the date optional?
>> The Properties.store methods have always been specified to write a 
>> comment line with the current date and time. It wouldn't be 
>> unreasonable to look at changing it to not specify a date/time 
>> comment when the comment is provided by the user of the API but 
>> changing it after 20+ years would need effort to understand the 
>> impact. I think the bigger issue with reproducibility is that the 
>> ordering that the properties are written is not specified. 
>> Reproducible builds are important but maybe it needs Maven tooling or 
>> plugin to do smart comparisons. I think this is something that the 
>> Skara project was looking into at one point.
>>
>> -Alan
>
> The order of properties and the line-endings are indeed important too, 
> but these are less hard to solve.
> As far as I understand, reproducible builds are not about comparison, 
> but about being able to get the same checksum value because the content 
> will stay exactly the same for one specific build, byte for byte. This 
> is a challenge not just for Maven, but for any tool that generates output.
>
> I wasn't expecting a complete replacement of the Date(), but more a 
> way to either set the date or skip the date in the comment (like 
> setCommentDate(Date date)).
> As one might see in the code, writing properties is not that easy. My 
> concern is that any third party attempt to do this on its own will not 
> be as good as the original implementation.

Being in the situation of trying to get reproducible builds for OpenJDK 
itself, I understand your pain and fully support the idea that it should 
be at least *possible* for the standard methods to produce reproducible 
output. Adding an additional property to ignore the date, or to set it to 
a given fixed date, seems like a very good idea to me.
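
To illustrate what callers are left with until such an option exists, 
here is a minimal sketch of a workaround that still uses the standard 
Properties.store() escaping logic and only post-processes its output. 
The class and method names (DatelessProperties, storeWithoutDate) are 
made up for this example and are not part of any JDK API. It assumes no 
user comment is passed, so the date is the only "#" line, and it relies 
on store() escaping a leading '#' in keys, so no entry line can start 
with '#'.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not a JDK class.
final class DatelessProperties {

    /** Serializes props with the standard store(), then drops the date comment. */
    static String storeWithoutDate(Properties props) {
        StringWriter buffer = new StringWriter();
        try {
            // With a null comment, store() writes only the "#<date>" comment line
            // followed by one "key=value" line per entry.
            props.store(buffer, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        StringBuilder out = new StringBuilder();
        // \R matches any line break, so the result no longer depends on the
        // platform line separator that store() uses.
        for (String line : buffer.toString().split("\\R")) {
            if (!line.isEmpty() && !line.startsWith("#")) {
                out.append(line).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("version", "1.0");
        System.out.print(storeWithoutDate(props));
    }
}
```

This removes the timestamp and normalizes line endings, but it does 
nothing about the order in which entries are written.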

Ordering is important as well, though. If properties files are written 
in the order found using an internal structure of a hash map, then you 
cannot rely on reproducibility. I've found that when doing that, even 
for very simple programs, *most of the time* the file stays the same, 
but on some occasions, the structure changes. I've spent some time 
trying to figure out what makes this non-deterministic, but failed. One 
could be excused for believing that a really trivial, non-concurrent 
program that just adds a few entries to a hash map would have 
deterministic output, but that is not the case. Possibly, the internal 
structure of the hash map is ultimately dependent on available memory, 
or some external factor like that.

So I'd strongly recommend adding a second property to enforce sorted (or 
"stable" by some other definition) output, to be fully certain to 
achieve reproducibility.
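
As a rough sketch of what "sorted" output could look like from the 
caller's side today: since store() escapes line breaks inside keys and 
values, every entry occupies exactly one line, so the serialized lines 
can simply be sorted. The names below (SortedProperties, storeSorted) 
are hypothetical and not part of the JDK. Note that this sorts by the 
escaped "key=value" text rather than by the raw key, which is one 
possible definition of "stable" and not necessarily the one a JDK 
option would choose. Comment lines, including the date, are dropped.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not a JDK class.
final class SortedProperties {

    /** Serializes props with store(), then emits the entry lines in sorted order. */
    static String storeSorted(Properties props) {
        StringWriter buffer = new StringWriter();
        try {
            props.store(buffer, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> entries = new ArrayList<>();
        for (String line : buffer.toString().split("\\R")) {
            // Skip the date comment; keep the escaped "key=value" lines.
            if (!line.isEmpty() && !line.startsWith("#")) {
                entries.add(line);
            }
        }
        // Natural String ordering is independent of hash map iteration order.
        Collections.sort(entries);
        StringBuilder out = new StringBuilder();
        for (String entry : entries) {
            out.append(entry).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("b", "2");
        props.setProperty("a", "1");
        props.setProperty("c", "3");
        System.out.print(storeSorted(props));
    }
}
```

Because the result depends only on the set of entries, two runs with 
the same properties should produce byte-identical text regardless of 
insertion order.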

/Magnus

>
> Robert
