Proposal: JDK-8231640 - (prop) Canonical property storage

Jaikiran Pai jai.forums2013 at gmail.com
Fri Aug 27 14:16:55 UTC 2021


On 26/08/21 5:06 pm, Alan Bateman wrote:
> On 25/08/2021 15:57, Jaikiran Pai wrote:
>> :
>>
>> Introducing an overloaded "store" which takes the timestamp or 
>> implicitly using the SOURCE_DATE_EPOCH environment variable would 
>> mean that the Properties.store(...) APIs will continue to always 
>> write out a Date representation into the outputstream. That 
>> effectively means that there will continue to be no way to disable 
>> writing out a (date) comment. More specifically, even if a user 
>> doesn't explicitly specify a comment, we would end up writing a 
>> default (date) comment. Do we want that? Or while we are doing these 
>> changes, should we introduce this ability of disabling writing out 
>> these date comments by default? That, IMO, can only be done by 
>> introducing new APIs instead of trying to slightly change the spec 
>> and the implementation of the current "store" APIs. After all, if any 
>> callers do want a date (and a reproducible one at that), they could 
>> always do so by passing that as a value for the "comment" parameter 
>> of these (new) "storeXXX" APIs.
>
> Properties save/store has always emitted a comment line with the 
> current date/time, it goes back to JDK 1.0. It's unfortunate that 
> newer store methods didn't re-examine this, also unfortunate that it 
> continued with 8859-1. In the discussion on jdk-dev about 
> reproducibility then I think we concluded that we don't want to change 
> the behavior of existing methods, hence the discussion on introducing 
> a new method.
>
> An new overload of store would give the maximum flexibility to new 
> code but it would require programs used in builds to use it 
> consistently. It's possible that libraries or tools that are using 
> Properties::store have no idea that they will ultimately be used in a 
> system that is trying to produce reproducible builds. So I have some 
> sympathy to the argument that there should a way to omit the date or 
> emit it as the Unix epoch time or a fixed value. This would mean 
> changing the spec to allow for some implementation means to do this, 
> and maybe an implNote that documents that it reads SOURCE_DATE_EPOCH. 
> I think Roger has misgivings about this.
>
> So let's list the options and the pros/cons and see if we can converge 
> on an approach.
>
Keeping aside the discussion about whether to introduce a new API or 
change the spec of the existing API, just for a moment, I think the main 
question is whether the current date comment that gets added by the 
store(...) is being used by anything out there, other than maybe for 
visual aspects while reading the properties file. From the discussions I 
have seen so far in the threads on openjdk and other mailing lists, I 
don't think anyone is using that comment for anything. So if we are 
indeed willing to change the spec of the store(...) API would it be too 
big/unacceptable a change to disable writing this date comment, at least 
in certain specific cases? With my limited usage of this API, my guess 
is that it's higly unlikely that it will break anything - after all it's 
a comment that was being added. The only potential breakages perhaps 
would be some scripts? But even those, I don't know how it would break 
them since anyway the date comment wasn't a constant value. Having said 
that, I looked at the patch that you pointed out[1] that got integrated 
into the JDK shipped in Ubuntu, for introducing reproducibility. I'm a 
bit surprised that the patched implementation decided to write out a 
formatted reproducible date instead of patching it to just not write the 
date. After all that IMO would have been much simpler and it would 
anyway have changed (removed) the exact same line in the code that was 
patched. So maybe I'm underestimating the usage of this date comment.

So now coming to the API part of it. The potential ways to tackle this, 
that I could think of are as follows:

1. No new APIs to be added. Instead, the existing "store(...)" APIs 
specification will be changed to allow for reproducibility of the 
content that gets stored. The possible specification changes that we 
could attempt are:

     1a. The store(...) APIs by default will continue to write out a 
date comment, which represents the current date. In the case where 
SOURCE_DATE_EPOCH environment variable is set, the store(...) APIs will 
use that value to construct a (UTC) Date and write out the date comment.

         Pros:
         ----
         - Existing callers of these APIs won't have to change their 
code to use this new semantics.
         - Tools/environments where reproducibility is required, can 
configure their environment (and they probably already would have) to 
set the SOURCE_DATE_EPOCH value.

         Cons:
         ----
         - Behaviour of the same API will change dynamically based on 
the current environment of the process. However, this change in 
behaviour will not have functional impact on the generated output, since 
it's just a comment that changes.
         - There is no way to disable writing out comments to the 
output. i.e. even if user passes a null "comments" value to the APIs, 
there will always be the date comment in the output.
         - Requires spec change

     OR

     1b. The store(...) APIs by default will continue to write out a 
date comment, which represents the current date. In the case where 
SOURCE_DATE_EPOCH environment variable is set, the store(...) APIs will 
_not_ write out any date comment. This is unlike 1a where we use the 
value of the environment variable to compute the Date. This alternate 
approach of not using the value but instead considering the presence of 
the environment variable being set, to not write out any date comment 
will still provide reproducibility and also will get rid of the 
unnecessary date comment.

         Pros:
         ----
         - Existing callers of these APIs won't have to change their 
code to use this new semantics.
         - Tools/environments where reproducibility is required, can 
configure their environment (and they probably already would have) to 
set the SOURCE_DATE_EPOCH value.
         - The date comment will no longer end up in the output in 
environments where SOURCE_DATE_EPOCH is set (to any value).

         Cons:
         ----
         - Just like 1a, the API output is determined by the current 
environment of the process.
         - Requires spec change.
         - The only way to generate a output without any comments would 
require setting the SOURCE_DATE_EPOCH even when reproducibility isn't a 
necessity.
         - When SOURCE_DATE_EPOCH is set, it will potentially (even if 
low probability) will breaks existing tools, code(?) or scripts that 
might be looking for some/any date in the comments of the output.

     OR

     1c. The store(...) APIs which both currently take a "comments" 
parameter will write out the date comment only if the "comments" 
parameter is null. If the "comments" value is non-null, then, only that 
user passed comment will be written out to the output.

         Pros:
         ----
         - There will now be a way for the callers to decide what exact 
comment appears in the output. It will either be their explicit comment 
or a default date comment.
         - For reproducibility, the callers can pass any reproducible 
comment of their choice and it no longer will be the responsibility of 
the store(...) API to decide what values for the comments would provide 
reproducibility of the output.

         (Note that the store(...) APIs will still take it upon 
themselves to write out the property keys in a reproducible order, 
that's now been decided in this discussion).

         Cons:
         ----
         - This "may" require source code changes in callers' code. If 
callers are already passing a comment, they will no longer see the 
additional default date comment being added. However, if callers are not 
passing any comment, then they will continue to see the default date 
comment being added. In essence, callers who aren't passing any comment 
to the APIs, MUST change their code, if they want to stop seeing the 
date comment.
         - There is no way to disable writing out the comment to the 
output. Every single store(...) output will have a comment, either user 
supplied or the default date comment.
         - Requires spec change.
         - It will potentially (even if low probability) will breaks 
existing tools, code(?) or scripts that might be looking for some/any 
date in the comments of the output.

     OR

     1d. Since we are thinking of changing the spec, I will include this 
option as well. The store(...) APIs which both current take the 
"comments" parameter will write out the caller passed comment only if 
the comment is non-null. No default comment will be written out. If the 
caller passed comment is null, then only the property key and values 
will be written out and the output will have no comments.

         Pros:
         ----
         - No code changes needed in callers' code.
         - Gets rid of the (IMO, unnecessary) date comment from the output.
         - Callers now have full control over what comment gets written out.
         - There's now a way to disable writing out any comments to the 
output. If callers provide no comment, then only the property key values 
will be written out.
         - These APIs will now feel much more natural without any "if 
this, else that" or any "magic/default" comments appearing in the output.
         - Reproducibility of the comments is now decided by the callers 
and it no longer will be the responsibility of these store(...) APIs to 
come up with reproducible comments.

         (Note that the store(...) APIs will still take it upon 
themselves to write out the property keys in a reproducible order, 
that's now been decided in this discussion).

         Cons:
         ----
         - Requires spec change.
         - It will potentially (even if low probability) will breaks 
existing tools, code(?) or scripts that might be looking for some/any 
date in the comments of the output.

2. Introduce a new (and only one) overloaded API as follows:

     public void store(Writer writer, String comments, java.time.Instant 
dateComment) throws IOException;

     This new API will write out an optional date comment, if the 
java.time.Instant parameter value is not null. In such a case where the 
Instant value is non-null, a java.time.ZonedDateTime will be computed 
out of the Instant value, for the "UTC" ZoneId and the string 
representation will be written out. In the case where the Instant 
parameter value is null, no date comment will be added in the output.
     If the user passes the "comments" value, then that will be written 
out as the comments in the output.

     Pros:
     ----
     - No spec change needed
     - Callers now have control over what date gets written out as a comment
     - Callers now have control to completely disable writing out any 
comments, by passing null for both "comments" and the Instant parameters
     - Reproducibility of the date comment will not be a concern of this 
new API and instead is controlled by the callers.

     Cons:
     ----
     - Requires code changes in callers' code. Potentially in much more 
places than any of other proposed options, since this is a completely 
new API.
     - The new API still allows the Date comment to be written out, 
which IMO adds no value. Furthermore, this now introduces 2 ways to add 
a comment - one by passing value to the "comments" parameter and the 
other by passing a value to the "java.time.Instant" parameter.
     - Introduces/continues with, avoidable, Instant parsing and date 
formatting logic/complexity into the new API.

Let me know if I might have missed any other potential option.

[1] 
https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=914278;filename=reproducible-properties-timestamp.patch;msg=5

-Jaikiran




More information about the core-libs-dev mailing list