Proposal: JDK-8231640 - (prop) Canonical property storage
Jaikiran Pai
jai.forums2013 at gmail.com
Fri Aug 27 14:16:55 UTC 2021
On 26/08/21 5:06 pm, Alan Bateman wrote:
> On 25/08/2021 15:57, Jaikiran Pai wrote:
>> :
>>
>> Introducing an overloaded "store" which takes the timestamp or
>> implicitly using the SOURCE_DATE_EPOCH environment variable would
>> mean that the Properties.store(...) APIs will continue to always
>> write out a Date representation into the outputstream. That
>> effectively means that there will continue to be no way to disable
>> writing out a (date) comment. More specifically, even if a user
>> doesn't explicitly specify a comment, we would end up writing a
>> default (date) comment. Do we want that? Or while we are doing these
>> changes, should we introduce this ability of disabling writing out
>> these date comments by default? That, IMO, can only be done by
>> introducing new APIs instead of trying to slightly change the spec
>> and the implementation of the current "store" APIs. After all, if any
>> callers do want a date (and a reproducible one at that), they could
>> always do so by passing that as a value for the "comment" parameter
>> of these (new) "storeXXX" APIs.
>
> Properties save/store has always emitted a comment line with the
> current date/time, it goes back to JDK 1.0. It's unfortunate that
> newer store methods didn't re-examine this, also unfortunate that it
> continued with 8859-1. In the discussion on jdk-dev about
> reproducibility then I think we concluded that we don't want to change
> the behavior of existing methods, hence the discussion on introducing
> a new method.
>
> An new overload of store would give the maximum flexibility to new
> code but it would require programs used in builds to use it
> consistently. It's possible that libraries or tools that are using
> Properties::store have no idea that they will ultimately be used in a
> system that is trying to produce reproducible builds. So I have some
> sympathy to the argument that there should a way to omit the date or
> emit it as the Unix epoch time or a fixed value. This would mean
> changing the spec to allow for some implementation means to do this,
> and maybe an implNote that documents that it reads SOURCE_DATE_EPOCH.
> I think Roger has misgivings about this.
>
> So let's list the options and the pros/cons and see if we can converge
> on an approach.
>
Keeping aside the discussion about whether to introduce a new API or
change the spec of the existing API, just for a moment, I think the main
question is whether the current date comment that gets added by the
store(...) is being used by anything out there, other than maybe for
visual aspects while reading the properties file. From the discussions I
have seen so far in the threads on openjdk and other mailing lists, I
don't think anyone is using that comment for anything. So if we are
indeed willing to change the spec of the store(...) API would it be too
big/unacceptable a change to disable writing this date comment, at least
in certain specific cases? With my limited usage of this API, my guess
is that it's higly unlikely that it will break anything - after all it's
a comment that was being added. The only potential breakages perhaps
would be some scripts? But even those, I don't know how it would break
them since anyway the date comment wasn't a constant value. Having said
that, I looked at the patch that you pointed out[1] that got integrated
into the JDK shipped in Ubuntu, for introducing reproducibility. I'm a
bit surprised that the patched implementation decided to write out a
formatted reproducible date instead of patching it to just not write the
date. After all that IMO would have been much simpler and it would
anyway have changed (removed) the exact same line in the code that was
patched. So maybe I'm underestimating the usage of this date comment.
So now coming to the API part of it. The potential ways to tackle this,
that I could think of are as follows:
1. No new APIs to be added. Instead, the existing "store(...)" APIs
specification will be changed to allow for reproducibility of the
content that gets stored. The possible specification changes that we
could attempt are:
1a. The store(...) APIs by default will continue to write out a
date comment, which represents the current date. In the case where
SOURCE_DATE_EPOCH environment variable is set, the store(...) APIs will
use that value to construct a (UTC) Date and write out the date comment.
Pros:
----
- Existing callers of these APIs won't have to change their
code to use this new semantics.
- Tools/environments where reproducibility is required, can
configure their environment (and they probably already would have) to
set the SOURCE_DATE_EPOCH value.
Cons:
----
- Behaviour of the same API will change dynamically based on
the current environment of the process. However, this change in
behaviour will not have functional impact on the generated output, since
it's just a comment that changes.
- There is no way to disable writing out comments to the
output. i.e. even if user passes a null "comments" value to the APIs,
there will always be the date comment in the output.
- Requires spec change
OR
1b. The store(...) APIs by default will continue to write out a
date comment, which represents the current date. In the case where
SOURCE_DATE_EPOCH environment variable is set, the store(...) APIs will
_not_ write out any date comment. This is unlike 1a where we use the
value of the environment variable to compute the Date. This alternate
approach of not using the value but instead considering the presence of
the environment variable being set, to not write out any date comment
will still provide reproducibility and also will get rid of the
unnecessary date comment.
Pros:
----
- Existing callers of these APIs won't have to change their
code to use this new semantics.
- Tools/environments where reproducibility is required, can
configure their environment (and they probably already would have) to
set the SOURCE_DATE_EPOCH value.
- The date comment will no longer end up in the output in
environments where SOURCE_DATE_EPOCH is set (to any value).
Cons:
----
- Just like 1a, the API output is determined by the current
environment of the process.
- Requires spec change.
- The only way to generate a output without any comments would
require setting the SOURCE_DATE_EPOCH even when reproducibility isn't a
necessity.
- When SOURCE_DATE_EPOCH is set, it will potentially (even if
low probability) will breaks existing tools, code(?) or scripts that
might be looking for some/any date in the comments of the output.
OR
1c. The store(...) APIs which both currently take a "comments"
parameter will write out the date comment only if the "comments"
parameter is null. If the "comments" value is non-null, then, only that
user passed comment will be written out to the output.
Pros:
----
- There will now be a way for the callers to decide what exact
comment appears in the output. It will either be their explicit comment
or a default date comment.
- For reproducibility, the callers can pass any reproducible
comment of their choice and it no longer will be the responsibility of
the store(...) API to decide what values for the comments would provide
reproducibility of the output.
(Note that the store(...) APIs will still take it upon
themselves to write out the property keys in a reproducible order,
that's now been decided in this discussion).
Cons:
----
- This "may" require source code changes in callers' code. If
callers are already passing a comment, they will no longer see the
additional default date comment being added. However, if callers are not
passing any comment, then they will continue to see the default date
comment being added. In essence, callers who aren't passing any comment
to the APIs, MUST change their code, if they want to stop seeing the
date comment.
- There is no way to disable writing out the comment to the
output. Every single store(...) output will have a comment, either user
supplied or the default date comment.
- Requires spec change.
- It will potentially (even if low probability) will breaks
existing tools, code(?) or scripts that might be looking for some/any
date in the comments of the output.
OR
1d. Since we are thinking of changing the spec, I will include this
option as well. The store(...) APIs which both current take the
"comments" parameter will write out the caller passed comment only if
the comment is non-null. No default comment will be written out. If the
caller passed comment is null, then only the property key and values
will be written out and the output will have no comments.
Pros:
----
- No code changes needed in callers' code.
- Gets rid of the (IMO, unnecessary) date comment from the output.
- Callers now have full control over what comment gets written out.
- There's now a way to disable writing out any comments to the
output. If callers provide no comment, then only the property key values
will be written out.
- These APIs will now feel much more natural without any "if
this, else that" or any "magic/default" comments appearing in the output.
- Reproducibility of the comments is now decided by the callers
and it no longer will be the responsibility of these store(...) APIs to
come up with reproducible comments.
(Note that the store(...) APIs will still take it upon
themselves to write out the property keys in a reproducible order,
that's now been decided in this discussion).
Cons:
----
- Requires spec change.
- It will potentially (even if low probability) will breaks
existing tools, code(?) or scripts that might be looking for some/any
date in the comments of the output.
2. Introduce a new (and only one) overloaded API as follows:
public void store(Writer writer, String comments, java.time.Instant
dateComment) throws IOException;
This new API will write out an optional date comment, if the
java.time.Instant parameter value is not null. In such a case where the
Instant value is non-null, a java.time.ZonedDateTime will be computed
out of the Instant value, for the "UTC" ZoneId and the string
representation will be written out. In the case where the Instant
parameter value is null, no date comment will be added in the output.
If the user passes the "comments" value, then that will be written
out as the comments in the output.
Pros:
----
- No spec change needed
- Callers now have control over what date gets written out as a comment
- Callers now have control to completely disable writing out any
comments, by passing null for both "comments" and the Instant parameters
- Reproducibility of the date comment will not be a concern of this
new API and instead is controlled by the callers.
Cons:
----
- Requires code changes in callers' code. Potentially in much more
places than any of other proposed options, since this is a completely
new API.
- The new API still allows the Date comment to be written out,
which IMO adds no value. Furthermore, this now introduces 2 ways to add
a comment - one by passing value to the "comments" parameter and the
other by passing a value to the "java.time.Instant" parameter.
- Introduces/continues with, avoidable, Instant parsing and date
formatting logic/complexity into the new API.
Let me know if I might have missed any other potential option.
[1]
https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=914278;filename=reproducible-properties-timestamp.patch;msg=5
-Jaikiran
More information about the core-libs-dev
mailing list