Proposal: JDK-8231640 - (prop) Canonical property storage

Daniel Fuchs daniel.fuchs at oracle.com
Fri Aug 27 14:35:25 UTC 2021


Hi Jaikiran,

What about writing the keys in natural ordering?
Has that been abandoned, or will the implementation silently do it,
or will it require a new API?

AFAIU there are two problems that are hostile to reproducible builds:
   - date comments differ from store to store
   - property ordering may change from VM to VM due to differences
     in hashing

best regards,

-- daniel

On 27/08/2021 15:16, Jaikiran Pai wrote:
> 
> On 26/08/21 5:06 pm, Alan Bateman wrote:
>> On 25/08/2021 15:57, Jaikiran Pai wrote:
>>> :
>>>
>>> Introducing an overloaded "store" which takes the timestamp or 
>>> implicitly using the SOURCE_DATE_EPOCH environment variable would 
>>> mean that the Properties.store(...) APIs will continue to always 
>>> write out a Date representation into the outputstream. That 
>>> effectively means that there will continue to be no way to disable 
>>> writing out a (date) comment. More specifically, even if a user 
>>> doesn't explicitly specify a comment, we would end up writing a 
>>> default (date) comment. Do we want that? Or while we are doing these 
>>> changes, should we introduce this ability of disabling writing out 
>>> these date comments by default? That, IMO, can only be done by 
>>> introducing new APIs instead of trying to slightly change the spec 
>>> and the implementation of the current "store" APIs. After all, if any 
>>> callers do want a date (and a reproducible one at that), they could 
>>> always do so by passing that as a value for the "comment" parameter 
>>> of these (new) "storeXXX" APIs.
>>
>> Properties save/store has always emitted a comment line with the 
>> current date/time, it goes back to JDK 1.0. It's unfortunate that 
>> newer store methods didn't re-examine this, also unfortunate that it 
>> continued with 8859-1. In the discussion on jdk-dev about 
>> reproducibility then I think we concluded that we don't want to change 
>> the behavior of existing methods, hence the discussion on introducing 
>> a new method.
>>
>> An new overload of store would give the maximum flexibility to new 
>> code but it would require programs used in builds to use it 
>> consistently. It's possible that libraries or tools that are using 
>> Properties::store have no idea that they will ultimately be used in a 
>> system that is trying to produce reproducible builds. So I have some 
>> sympathy to the argument that there should a way to omit the date or 
>> emit it as the Unix epoch time or a fixed value. This would mean 
>> changing the spec to allow for some implementation means to do this, 
>> and maybe an implNote that documents that it reads SOURCE_DATE_EPOCH. 
>> I think Roger has misgivings about this.
>>
>> So let's list the options and the pros/cons and see if we can converge 
>> on an approach.
>>
> Keeping aside the discussion about whether to introduce a new API or 
> change the spec of the existing API, just for a moment, I think the main 
> question is whether the current date comment that gets added by the 
> store(...) is being used by anything out there, other than maybe for 
> visual aspects while reading the properties file. From the discussions I 
> have seen so far in the threads on openjdk and other mailing lists, I 
> don't think anyone is using that comment for anything. So if we are 
> indeed willing to change the spec of the store(...) API would it be too 
> big/unacceptable a change to disable writing this date comment, at least 
> in certain specific cases? With my limited usage of this API, my guess 
> is that it's higly unlikely that it will break anything - after all it's 
> a comment that was being added. The only potential breakages perhaps 
> would be some scripts? But even those, I don't know how it would break 
> them since anyway the date comment wasn't a constant value. Having said 
> that, I looked at the patch that you pointed out[1] that got integrated 
> into the JDK shipped in Ubuntu, for introducing reproducibility. I'm a 
> bit surprised that the patched implementation decided to write out a 
> formatted reproducible date instead of patching it to just not write the 
> date. After all that IMO would have been much simpler and it would 
> anyway have changed (removed) the exact same line in the code that was 
> patched. So maybe I'm underestimating the usage of this date comment.
> 
> So now coming to the API part of it. The potential ways to tackle this, 
> that I could think of are as follows:
> 
> 1. No new APIs to be added. Instead, the existing "store(...)" APIs 
> specification will be changed to allow for reproducibility of the 
> content that gets stored. The possible specification changes that we 
> could attempt are:
> 
>      1a. The store(...) APIs by default will continue to write out a 
> date comment, which represents the current date. In the case where 
> SOURCE_DATE_EPOCH environment variable is set, the store(...) APIs will 
> use that value to construct a (UTC) Date and write out the date comment.
> 
>          Pros:
>          ----
>          - Existing callers of these APIs won't have to change their 
> code to use this new semantics.
>          - Tools/environments where reproducibility is required, can 
> configure their environment (and they probably already would have) to 
> set the SOURCE_DATE_EPOCH value.
> 
>          Cons:
>          ----
>          - Behaviour of the same API will change dynamically based on 
> the current environment of the process. However, this change in 
> behaviour will not have functional impact on the generated output, since 
> it's just a comment that changes.
>          - There is no way to disable writing out comments to the 
> output. i.e. even if user passes a null "comments" value to the APIs, 
> there will always be the date comment in the output.
>          - Requires spec change
> 
>      OR
> 
>      1b. The store(...) APIs by default will continue to write out a 
> date comment, which represents the current date. In the case where 
> SOURCE_DATE_EPOCH environment variable is set, the store(...) APIs will 
> _not_ write out any date comment. This is unlike 1a where we use the 
> value of the environment variable to compute the Date. This alternate 
> approach of not using the value but instead considering the presence of 
> the environment variable being set, to not write out any date comment 
> will still provide reproducibility and also will get rid of the 
> unnecessary date comment.
> 
>          Pros:
>          ----
>          - Existing callers of these APIs won't have to change their 
> code to use this new semantics.
>          - Tools/environments where reproducibility is required, can 
> configure their environment (and they probably already would have) to 
> set the SOURCE_DATE_EPOCH value.
>          - The date comment will no longer end up in the output in 
> environments where SOURCE_DATE_EPOCH is set (to any value).
> 
>          Cons:
>          ----
>          - Just like 1a, the API output is determined by the current 
> environment of the process.
>          - Requires spec change.
>          - The only way to generate a output without any comments would 
> require setting the SOURCE_DATE_EPOCH even when reproducibility isn't a 
> necessity.
>          - When SOURCE_DATE_EPOCH is set, it will potentially (even if 
> low probability) will breaks existing tools, code(?) or scripts that 
> might be looking for some/any date in the comments of the output.
> 
>      OR
> 
>      1c. The store(...) APIs which both currently take a "comments" 
> parameter will write out the date comment only if the "comments" 
> parameter is null. If the "comments" value is non-null, then, only that 
> user passed comment will be written out to the output.
> 
>          Pros:
>          ----
>          - There will now be a way for the callers to decide what exact 
> comment appears in the output. It will either be their explicit comment 
> or a default date comment.
>          - For reproducibility, the callers can pass any reproducible 
> comment of their choice and it no longer will be the responsibility of 
> the store(...) API to decide what values for the comments would provide 
> reproducibility of the output.
> 
>          (Note that the store(...) APIs will still take it upon 
> themselves to write out the property keys in a reproducible order, 
> that's now been decided in this discussion).
> 
>          Cons:
>          ----
>          - This "may" require source code changes in callers' code. If 
> callers are already passing a comment, they will no longer see the 
> additional default date comment being added. However, if callers are not 
> passing any comment, then they will continue to see the default date 
> comment being added. In essence, callers who aren't passing any comment 
> to the APIs, MUST change their code, if they want to stop seeing the 
> date comment.
>          - There is no way to disable writing out the comment to the 
> output. Every single store(...) output will have a comment, either user 
> supplied or the default date comment.
>          - Requires spec change.
>          - It will potentially (even if low probability) will breaks 
> existing tools, code(?) or scripts that might be looking for some/any 
> date in the comments of the output.
> 
>      OR
> 
>      1d. Since we are thinking of changing the spec, I will include this 
> option as well. The store(...) APIs which both current take the 
> "comments" parameter will write out the caller passed comment only if 
> the comment is non-null. No default comment will be written out. If the 
> caller passed comment is null, then only the property key and values 
> will be written out and the output will have no comments.
> 
>          Pros:
>          ----
>          - No code changes needed in callers' code.
>          - Gets rid of the (IMO, unnecessary) date comment from the output.
>          - Callers now have full control over what comment gets written 
> out.
>          - There's now a way to disable writing out any comments to the 
> output. If callers provide no comment, then only the property key values 
> will be written out.
>          - These APIs will now feel much more natural without any "if 
> this, else that" or any "magic/default" comments appearing in the output.
>          - Reproducibility of the comments is now decided by the callers 
> and it no longer will be the responsibility of these store(...) APIs to 
> come up with reproducible comments.
> 
>          (Note that the store(...) APIs will still take it upon 
> themselves to write out the property keys in a reproducible order, 
> that's now been decided in this discussion).
> 
>          Cons:
>          ----
>          - Requires spec change.
>          - It will potentially (even if low probability) will breaks 
> existing tools, code(?) or scripts that might be looking for some/any 
> date in the comments of the output.
> 
> 2. Introduce a new (and only one) overloaded API as follows:
> 
>      public void store(Writer writer, String comments, java.time.Instant 
> dateComment) throws IOException;
> 
>      This new API will write out an optional date comment, if the 
> java.time.Instant parameter value is not null. In such a case where the 
> Instant value is non-null, a java.time.ZonedDateTime will be computed 
> out of the Instant value, for the "UTC" ZoneId and the string 
> representation will be written out. In the case where the Instant 
> parameter value is null, no date comment will be added in the output.
>      If the user passes the "comments" value, then that will be written 
> out as the comments in the output.
> 
>      Pros:
>      ----
>      - No spec change needed
>      - Callers now have control over what date gets written out as a 
> comment
>      - Callers now have control to completely disable writing out any 
> comments, by passing null for both "comments" and the Instant parameters
>      - Reproducibility of the date comment will not be a concern of this 
> new API and instead is controlled by the callers.
> 
>      Cons:
>      ----
>      - Requires code changes in callers' code. Potentially in much more 
> places than any of other proposed options, since this is a completely 
> new API.
>      - The new API still allows the Date comment to be written out, 
> which IMO adds no value. Furthermore, this now introduces 2 ways to add 
> a comment - one by passing value to the "comments" parameter and the 
> other by passing a value to the "java.time.Instant" parameter.
>      - Introduces/continues with, avoidable, Instant parsing and date 
> formatting logic/complexity into the new API.
> 
> Let me know if I might have missed any other potential option.
> 
> [1] 
> https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=914278;filename=reproducible-properties-timestamp.patch;msg=5 
> 
> 
> -Jaikiran
> 
> 



More information about the core-libs-dev mailing list