Proposal: JDK-8231640 - (prop) Canonical property storage
Daniel Fuchs
daniel.fuchs at oracle.com
Fri Aug 27 14:35:25 UTC 2021
Hi Jaikiran,
What about writing the keys in natural ordering?
Has that been abandoned, or will the implementation silently do it,
or will it require a new API?
AFAIU there are two problems that are hostile to reproducible builds:
- date comments differ from store to store
- property ordering may change from VM to VM due to differences
in hashing
best regards,
-- daniel
On 27/08/2021 15:16, Jaikiran Pai wrote:
>
> On 26/08/21 5:06 pm, Alan Bateman wrote:
>> On 25/08/2021 15:57, Jaikiran Pai wrote:
>>> :
>>>
>>> Introducing an overloaded "store" which takes the timestamp or
>>> implicitly using the SOURCE_DATE_EPOCH environment variable would
>>> mean that the Properties.store(...) APIs will continue to always
>>> write out a Date representation into the outputstream. That
>>> effectively means that there will continue to be no way to disable
>>> writing out a (date) comment. More specifically, even if a user
>>> doesn't explicitly specify a comment, we would end up writing a
>>> default (date) comment. Do we want that? Or while we are doing these
>>> changes, should we introduce this ability of disabling writing out
>>> these date comments by default? That, IMO, can only be done by
>>> introducing new APIs instead of trying to slightly change the spec
>>> and the implementation of the current "store" APIs. After all, if any
>>> callers do want a date (and a reproducible one at that), they could
>>> always do so by passing that as a value for the "comment" parameter
>>> of these (new) "storeXXX" APIs.
>>
>> Properties save/store has always emitted a comment line with the
>> current date/time, it goes back to JDK 1.0. It's unfortunate that
>> newer store methods didn't re-examine this, also unfortunate that it
>> continued with 8859-1. In the discussion on jdk-dev about
>> reproducibility then I think we concluded that we don't want to change
>> the behavior of existing methods, hence the discussion on introducing
>> a new method.
>>
>> An new overload of store would give the maximum flexibility to new
>> code but it would require programs used in builds to use it
>> consistently. It's possible that libraries or tools that are using
>> Properties::store have no idea that they will ultimately be used in a
>> system that is trying to produce reproducible builds. So I have some
>> sympathy to the argument that there should a way to omit the date or
>> emit it as the Unix epoch time or a fixed value. This would mean
>> changing the spec to allow for some implementation means to do this,
>> and maybe an implNote that documents that it reads SOURCE_DATE_EPOCH.
>> I think Roger has misgivings about this.
>>
>> So let's list the options and the pros/cons and see if we can converge
>> on an approach.
>>
> Keeping aside the discussion about whether to introduce a new API or
> change the spec of the existing API, just for a moment, I think the main
> question is whether the current date comment that gets added by the
> store(...) is being used by anything out there, other than maybe for
> visual aspects while reading the properties file. From the discussions I
> have seen so far in the threads on openjdk and other mailing lists, I
> don't think anyone is using that comment for anything. So if we are
> indeed willing to change the spec of the store(...) API would it be too
> big/unacceptable a change to disable writing this date comment, at least
> in certain specific cases? With my limited usage of this API, my guess
> is that it's higly unlikely that it will break anything - after all it's
> a comment that was being added. The only potential breakages perhaps
> would be some scripts? But even those, I don't know how it would break
> them since anyway the date comment wasn't a constant value. Having said
> that, I looked at the patch that you pointed out[1] that got integrated
> into the JDK shipped in Ubuntu, for introducing reproducibility. I'm a
> bit surprised that the patched implementation decided to write out a
> formatted reproducible date instead of patching it to just not write the
> date. After all that IMO would have been much simpler and it would
> anyway have changed (removed) the exact same line in the code that was
> patched. So maybe I'm underestimating the usage of this date comment.
>
> So now coming to the API part of it. The potential ways to tackle this,
> that I could think of are as follows:
>
> 1. No new APIs to be added. Instead, the existing "store(...)" APIs
> specification will be changed to allow for reproducibility of the
> content that gets stored. The possible specification changes that we
> could attempt are:
>
> 1a. The store(...) APIs by default will continue to write out a
> date comment, which represents the current date. In the case where
> SOURCE_DATE_EPOCH environment variable is set, the store(...) APIs will
> use that value to construct a (UTC) Date and write out the date comment.
>
> Pros:
> ----
> - Existing callers of these APIs won't have to change their
> code to use this new semantics.
> - Tools/environments where reproducibility is required, can
> configure their environment (and they probably already would have) to
> set the SOURCE_DATE_EPOCH value.
>
> Cons:
> ----
> - Behaviour of the same API will change dynamically based on
> the current environment of the process. However, this change in
> behaviour will not have functional impact on the generated output, since
> it's just a comment that changes.
> - There is no way to disable writing out comments to the
> output. i.e. even if user passes a null "comments" value to the APIs,
> there will always be the date comment in the output.
> - Requires spec change
>
> OR
>
> 1b. The store(...) APIs by default will continue to write out a
> date comment, which represents the current date. In the case where
> SOURCE_DATE_EPOCH environment variable is set, the store(...) APIs will
> _not_ write out any date comment. This is unlike 1a where we use the
> value of the environment variable to compute the Date. This alternate
> approach of not using the value but instead considering the presence of
> the environment variable being set, to not write out any date comment
> will still provide reproducibility and also will get rid of the
> unnecessary date comment.
>
> Pros:
> ----
> - Existing callers of these APIs won't have to change their
> code to use this new semantics.
> - Tools/environments where reproducibility is required, can
> configure their environment (and they probably already would have) to
> set the SOURCE_DATE_EPOCH value.
> - The date comment will no longer end up in the output in
> environments where SOURCE_DATE_EPOCH is set (to any value).
>
> Cons:
> ----
> - Just like 1a, the API output is determined by the current
> environment of the process.
> - Requires spec change.
> - The only way to generate a output without any comments would
> require setting the SOURCE_DATE_EPOCH even when reproducibility isn't a
> necessity.
> - When SOURCE_DATE_EPOCH is set, it will potentially (even if
> low probability) will breaks existing tools, code(?) or scripts that
> might be looking for some/any date in the comments of the output.
>
> OR
>
> 1c. The store(...) APIs which both currently take a "comments"
> parameter will write out the date comment only if the "comments"
> parameter is null. If the "comments" value is non-null, then, only that
> user passed comment will be written out to the output.
>
> Pros:
> ----
> - There will now be a way for the callers to decide what exact
> comment appears in the output. It will either be their explicit comment
> or a default date comment.
> - For reproducibility, the callers can pass any reproducible
> comment of their choice and it no longer will be the responsibility of
> the store(...) API to decide what values for the comments would provide
> reproducibility of the output.
>
> (Note that the store(...) APIs will still take it upon
> themselves to write out the property keys in a reproducible order,
> that's now been decided in this discussion).
>
> Cons:
> ----
> - This "may" require source code changes in callers' code. If
> callers are already passing a comment, they will no longer see the
> additional default date comment being added. However, if callers are not
> passing any comment, then they will continue to see the default date
> comment being added. In essence, callers who aren't passing any comment
> to the APIs, MUST change their code, if they want to stop seeing the
> date comment.
> - There is no way to disable writing out the comment to the
> output. Every single store(...) output will have a comment, either user
> supplied or the default date comment.
> - Requires spec change.
> - It will potentially (even if low probability) will breaks
> existing tools, code(?) or scripts that might be looking for some/any
> date in the comments of the output.
>
> OR
>
> 1d. Since we are thinking of changing the spec, I will include this
> option as well. The store(...) APIs which both current take the
> "comments" parameter will write out the caller passed comment only if
> the comment is non-null. No default comment will be written out. If the
> caller passed comment is null, then only the property key and values
> will be written out and the output will have no comments.
>
> Pros:
> ----
> - No code changes needed in callers' code.
> - Gets rid of the (IMO, unnecessary) date comment from the output.
> - Callers now have full control over what comment gets written
> out.
> - There's now a way to disable writing out any comments to the
> output. If callers provide no comment, then only the property key values
> will be written out.
> - These APIs will now feel much more natural without any "if
> this, else that" or any "magic/default" comments appearing in the output.
> - Reproducibility of the comments is now decided by the callers
> and it no longer will be the responsibility of these store(...) APIs to
> come up with reproducible comments.
>
> (Note that the store(...) APIs will still take it upon
> themselves to write out the property keys in a reproducible order,
> that's now been decided in this discussion).
>
> Cons:
> ----
> - Requires spec change.
> - It will potentially (even if low probability) will breaks
> existing tools, code(?) or scripts that might be looking for some/any
> date in the comments of the output.
>
> 2. Introduce a new (and only one) overloaded API as follows:
>
> public void store(Writer writer, String comments, java.time.Instant
> dateComment) throws IOException;
>
> This new API will write out an optional date comment, if the
> java.time.Instant parameter value is not null. In such a case where the
> Instant value is non-null, a java.time.ZonedDateTime will be computed
> out of the Instant value, for the "UTC" ZoneId and the string
> representation will be written out. In the case where the Instant
> parameter value is null, no date comment will be added in the output.
> If the user passes the "comments" value, then that will be written
> out as the comments in the output.
>
> Pros:
> ----
> - No spec change needed
> - Callers now have control over what date gets written out as a
> comment
> - Callers now have control to completely disable writing out any
> comments, by passing null for both "comments" and the Instant parameters
> - Reproducibility of the date comment will not be a concern of this
> new API and instead is controlled by the callers.
>
> Cons:
> ----
> - Requires code changes in callers' code. Potentially in much more
> places than any of other proposed options, since this is a completely
> new API.
> - The new API still allows the Date comment to be written out,
> which IMO adds no value. Furthermore, this now introduces 2 ways to add
> a comment - one by passing value to the "comments" parameter and the
> other by passing a value to the "java.time.Instant" parameter.
> - Introduces/continues with, avoidable, Instant parsing and date
> formatting logic/complexity into the new API.
>
> Let me know if I might have missed any other potential option.
>
> [1]
> https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=914278;filename=reproducible-properties-timestamp.patch;msg=5
>
>
> -Jaikiran
>
>
More information about the core-libs-dev
mailing list