Proposal: JDK-8231640 - (prop) Canonical property storage
The java.util.Properties class allows the properties to be written out to a stream or through a writer. In its current form, the specification of these APIs state that a comment comprising of the current date is always written out. The spec doesn't make any guarantees about the order of the properties when they are written out. There have been requests asking to make these APIs more deterministic. These requests, from what I have seen, mainly ask for: - A way to disable writing out the date comment - A way to write out the properties in a deterministic and reproducible way There have been discussions in the mailing list in the past which have been captured in JDK-8231640[1]. In these discussions, there has been an inclination to not touch the current existing API implementations and instead introduce new API(s) to achieve the proposed use cases. Before starting off with an implementation, I wanted to try and get some inputs on what the new API(s) would look like and what the scope of such a work should be. Right now, the Properties class has 2 "store" APIs: public void store(Writer writer, String comments) throws IOException public void store(OutputStream out, String comments) throws IOException For the sake of this discussion we won't be talking about the "save" API, in that same class, which has been deprecated. As part of this enhancement, one of the proposed API name is "storeCanonical". As a start, I plan to use this name for the new API. Is there any different suggestion for the name? As for the API signature, I plan to introduce 2 new APIs: public void storeCanonical(Writer writer, String comments) throws IOException public void storeCanonical(OutputStream out, String comments) throws IOException to match what the "store" variants currently provide. The first difference between these "storeCanonical" and the "store" will be the trivial part where these new "storeCanonical" implementations will no longer write out the date comment. Users are allowed to pass optional comments to these new APIs. If any comment is passed to these APIs, these will continue to be written out first, before the actual properties are written out. Speaking of optional comments, should the APIs accept an instance of java.util.Optional for the comments parameter. Perhaps: public void storeCanonical(Writer writer, Optional<String> comments) throws IOException public void storeCanonical(OutputStream out, Optional<String> comments) throws IOException Coming to the part where we write out the properties, these APIs will write out the properties in the lexicographical order of the property keys. An additional enhancement perhaps could be to allow users to pass in an optional java.util.Comparator instance to provide for application specific ordering of the property keys while being written out by these APIs. I am not too sure if we should introduce that. Any inputs? If we do introduce it, we would end up with 4 new APIs: public void storeCanonical(Writer writer, Optional<String> comments) throws IOException public void storeCanonical(OutputStream out, Optional<String> comments) throws IOException public void storeCanonical(Writer writer, Optional<String> comments, Comparator<String> keyOrderer) throws IOException public void storeCanonical(OutputStream out, Optional<String> comments, Comparator<String> keyOrderer) throws IOException Is that worth it? Finally, the other semantics, like the property key value separators, how/where newlines are inserted, what character encoding is used etc... will continue to match with the current semantics of the "store" APIs. [1] https://bugs.openjdk.java.net/browse/JDK-8231640 -Jaikiran
Hi Jaikiran, Thanks for taking this on and getting it started. One use case of canonical storage is repeatable builds. It would be useful to identify the uses in the JDK that would need to be changed to use the new function. On 8/24/21 10:07 AM, Jaikiran Pai wrote:
The java.util.Properties class allows the properties to be written out to a stream or through a writer. In its current form, the specification of these APIs state that a comment comprising of the current date is always written out. The spec doesn't make any guarantees about the order of the properties when they are written out.
There have been requests asking to make these APIs more deterministic. These requests, from what I have seen, mainly ask for:
- A way to disable writing out the date comment - A way to write out the properties in a deterministic and reproducible way
There have been discussions in the mailing list in the past which have been captured in JDK-8231640[1]. In these discussions, there has been an inclination to not touch the current existing API implementations and instead introduce new API(s) to achieve the proposed use cases.
Before starting off with an implementation, I wanted to try and get some inputs on what the new API(s) would look like and what the scope of such a work should be.
Right now, the Properties class has 2 "store" APIs:
public void store(Writer writer, String comments) throws IOException public void store(OutputStream out, String comments) throws IOException
I don't think two methods are needed, its easy enough for the caller to adapt an OutputStream to a Writer (OutputStreamWriter) and take control of the encoding, so the OutputStream version is not essential.
For the sake of this discussion we won't be talking about the "save" API, in that same class, which has been deprecated.
As part of this enhancement, one of the proposed API name is "storeCanonical". As a start, I plan to use this name for the new API. Is there any different suggestion for the name?
As for the API signature, I plan to introduce 2 new APIs:
public void storeCanonical(Writer writer, String comments) throws IOException public void storeCanonical(OutputStream out, String comments) throws IOException
to match what the "store" variants currently provide. The first difference between these "storeCanonical" and the "store" will be the trivial part where these new "storeCanonical" implementations will no longer write out the date comment. Users are allowed to pass optional comments to these new APIs. If any comment is passed to these APIs, these will continue to be written out first, before the actual properties are written out.
'storeCanonical' captures the semantics.
Speaking of optional comments, should the APIs accept an instance of java.util.Optional for the comments parameter. Perhaps:
public void storeCanonical(Writer writer, Optional<String> comments) throws IOException public void storeCanonical(OutputStream out, Optional<String> comments) throws IOException
Optional is overkill here, using null for no comment is conventional and matches the current usage in the store(..) methods.
Coming to the part where we write out the properties, these APIs will write out the properties in the lexicographical order of the property keys. An additional enhancement perhaps could be to allow users to pass in an optional java.util.Comparator instance to provide for application specific ordering of the property keys while being written out by these APIs. I am not too sure if we should introduce that. Any inputs? If we do introduce it, we would end up with 4 new APIs:
public void storeCanonical(Writer writer, Optional<String> comments) throws IOException public void storeCanonical(OutputStream out, Optional<String> comments) throws IOException public void storeCanonical(Writer writer, Optional<String> comments, Comparator<String> keyOrderer) throws IOException public void storeCanonical(OutputStream out, Optional<String> comments, Comparator<String> keyOrderer) throws IOException
Canonical usually already means a non-variable encoding, that seems the inconsistent with providing a Comparator. However, it should be a goal that properties stored with storeCanonical can be loaded with load().
Is that worth it?
Finally, the other semantics, like the property key value separators, how/where newlines are inserted, what character encoding is used etc... will continue to match with the current semantics of the "store" APIs.
If a client has the need for a custom format, its quite easy to iterate over the contents, sorting if desires and writing the format itself. A custom format would not be usable with Properties.load. Simpler is better, I think a single new method should suffice. Thanks, Roger
[1] https://bugs.openjdk.java.net/browse/JDK-8231640
-Jaikiran
Hello Roger, On 24/08/21 8:14 pm, Roger Riggs wrote:
Hi Jaikiran,
Thanks for taking this on and getting it started.
One use case of canonical storage is repeatable builds. It would be useful to identify the uses in the JDK that would need to be changed to use the new function.
Surprisingly, the usage of the current "store" APIs within the JDK appears to be very minimal. I'm hoping I didn't miss them, but from what I see the Properties.store(...) which takes the Writer as a param isn't used in anywhere within source of the JDK. The Properties.store(...) which takes an OutputStream as a parameter gets used in the following few places: 1. jdk.internal.vm.VMSupport#serializePropertiesToByteArray - Looking at the implementation in this method and as the name suggests, the Properties is being written out a byte array. This method gets called from native code, from what I can see. I'll need a bit of help on this one to understand whether that usage in the native code would benefit from switching to the new API (when it is introduced) for reproducibility when it comes to this call in VMSupport#serializePropertiesToByteArray. 2. sun.net.www.MimeTable#saveAsProperties - This is a "protected" method and from what I could see, doesn't get called anywhere in the JDK code. Furthermore, This class resides in an internal package (sun.net.www) and although it implements the public java.net.FileNameMap interface, the "saveAsProperties" isn't a method exposed on that interface. So I think this existing code/method wouldn't need to be updated to use the new API. 3. sun.font.FcFontConfiguration#writeFcInfo - This method writes out a "cache file" which appears to be an internal implementation detail of the FcFontConfiguration. More specifically, the contents of this file apparently is "JDK Font Configuration", going by the comment it adds to that file. This file gets written out to the "user.home" directory and as far as I can see it, is only used/relevant to this FcFontConfiguration. In context of reproducibility and the new API, I think using that new API here could have a potential benefit of reproducibility on the same system. However, someone with more experience of this code would have to review if at all this reproducibility is desirable. These are the only references I could find in the JDK code (I didn't include test cases). One thing I do remember is the JDK build (through the make files) would have certain Java code it would call to do some build steps. Is there a easy way to find all such build related Java files within the JDK? I would like to see if there are any references/calls to this method from those build files. I am doing some searches myself, but knowing where to search will give me more confidence that I haven't missed out any. -Jaikiran
On 25/08/2021 11:51, Jaikiran Pai wrote:
:
1. jdk.internal.vm.VMSupport#serializePropertiesToByteArray - Looking at the implementation in this method and as the name suggests, the Properties is being written out a byte array. This method gets called from native code, from what I can see. I'll need a bit of help on this one to understand whether that usage in the native code would benefit from switching to the new API (when it is introduced) for reproducibility when it comes to this call in VMSupport#serializePropertiesToByteArray.
This is the VM side of the attach mechanism. The API to look at is com.sun.tools.attach.VirtualMachine::getSystemProperties. When this method is invoked (by a tool) then response is a byte stream of the property list (8859-1 encoded). I wouldn't expect you should need to touch this one. -Alan
Hi Jaikiran, I'm glad to see this issue finally getting some love and attention! :) I don't fully support those "inclinations" that say that the old API should not change. I think keeping the old random order of store() would mean a missed chance to do good, otherwise a lot of Java programs will never get reproducible output of property files. We do have a chance of helping the community, in a single stroke, to make lots of application (more) reproducible without any programmer effort. There is a growing community pushing for reproducible builds (see e.g. https://reproducible-builds.org). Java programs tend to have reproducibility issues due to several cases of non-determinism in the Java platform, and I really think we should try to rectify that. The specification for store() does not say anything about the order properties are written, so I think the implementation is free to change this to a deterministic order. As long as files written by an updated version of store() can be read by old versions (and vice versa) there will be no compatibility issue in Java programs. And since the old (current...) order is non-deterministic, it seems exceedingly unlikely that any external program depends on the order currently generated by store(). The problem is with the time stamp, which the spec states should be present. I understand why changing this might need a new method. But I think we should try to steer users to this new method. Otherwise it is likely not to be used by those who should use it (the programmers) to the detrimental effect of users who get properties file which change for no good reason. Having an "attractive" name is definitely part of that. The name should scream "use me as a first hand choice for storing properties to a file". I don't really get that feeling from storeCanonical().
One thing I do remember is the JDK build (through the make files) would have certain Java code it would call to do some build steps. Is there a easy way to find all such build related Java files within the JDK? I would like to see if there are any references/calls to this method from those build files. I am doing some searches myself, but knowing where to search will give me more confidence that I haven't missed out any.
The java buildtool sources are located in the "make" directory, more specifically in "make/jdk/src", "make/langtools/src" and "make/src" (yeah, I know -- a cleanup is way overdue). I did a quick search now but could not find any references to Properties.store(). I'm trying to remember if we create properties during the build... We have several instances where .properties files in the Java source code are converted to hard-coded classes (for performance), and other where .properties files are copied verbatim. Ah, right, they are "cleaned" beforehand. We used to do this by a Java program, but nowadays they are mangled by sed. I think replacing that sed script with a trivial Java program doing storeCanonical() would be on the list of things to do. I can assist with that, when you are starting to get your implementation done. We might also write something as part of the jlink process that gets embedded in the resulting jimage; not quite sure about that. You should find that code in the normal src/ codebase though. /Magnus
On 2021-08-25 14:03, Magnus Ihse Bursie wrote:
Hi Jaikiran,
I'm glad to see this issue finally getting some love and attention! :)
I don't fully support those "inclinations" that say that the old API should not change. I think keeping the old random order of store() would mean a missed chance to do good, otherwise a lot of Java programs will never get reproducible output of property files. We do have a chance of helping the community, in a single stroke, to make lots of application (more) reproducible without any programmer effort. There is a growing community pushing for reproducible builds (see e.g. https://reproducible-builds.org). Java programs tend to have reproducibility issues due to several cases of non-determinism in the Java platform, and I really think we should try to rectify that.
The specification for store() does not say anything about the order properties are written, so I think the implementation is free to change this to a deterministic order. As long as files written by an updated version of store() can be read by old versions (and vice versa) there will be no compatibility issue in Java programs. And since the old (current...) order is non-deterministic, it seems exceedingly unlikely that any external program depends on the order currently generated by store().
The problem is with the time stamp, which the spec states should be present. I understand why changing this might need a new method.
This got me thinking a bit. If we are willing to change the spec slightly, we could use the standard `SOURCE_DATE_EPOCH` environment variable. Many tools support this already. The meaning of this variable is that, if present, tools that generate timestamps should use that value instead. [1] It seems unlikely that a user running Java with `SOURCE_DATE_EPOCH` should get upset if the behavior of Properties.store() changed. Without this variable, store() would generate timestamps as usual of the current time. /Magnus [1] https://reproducible-builds.org/docs/source-date-epoch/
But I think we should try to steer users to this new method. Otherwise it is likely not to be used by those who should use it (the programmers) to the detrimental effect of users who get properties file which change for no good reason. Having an "attractive" name is definitely part of that. The name should scream "use me as a first hand choice for storing properties to a file". I don't really get that feeling from storeCanonical().
One thing I do remember is the JDK build (through the make files) would have certain Java code it would call to do some build steps. Is there a easy way to find all such build related Java files within the JDK? I would like to see if there are any references/calls to this method from those build files. I am doing some searches myself, but knowing where to search will give me more confidence that I haven't missed out any.
The java buildtool sources are located in the "make" directory, more specifically in "make/jdk/src", "make/langtools/src" and "make/src" (yeah, I know -- a cleanup is way overdue). I did a quick search now but could not find any references to Properties.store().
I'm trying to remember if we create properties during the build... We have several instances where .properties files in the Java source code are converted to hard-coded classes (for performance), and other where .properties files are copied verbatim. Ah, right, they are "cleaned" beforehand. We used to do this by a Java program, but nowadays they are mangled by sed. I think replacing that sed script with a trivial Java program doing storeCanonical() would be on the list of things to do. I can assist with that, when you are starting to get your implementation done.
We might also write something as part of the jlink process that gets embedded in the resulting jimage; not quite sure about that. You should find that code in the normal src/ codebase though.
/Magnus
On 25/08/21 5:33 pm, Magnus Ihse Bursie wrote:
...
The problem is with the time stamp, which the spec states should be present. I understand why changing this might need a new method. But I think we should try to steer users to this new method. Otherwise it is likely not to be used by those who should use it (the programmers) to the detrimental effect of users who get properties file which change for no good reason. Having an "attractive" name is definitely part of that. The name should scream "use me as a first hand choice for storing properties to a file". I don't really get that feeling from storeCanonical().
I'm not saying we should do this, but one other way to scream, is to deprecate (but not for removal) the old "store" APIs and point the users to the new ones (if we introduce them). Similar to what "save" API in that same class currently does. -Jaikiran
Hello Magnus, On 25/08/21 5:33 pm, Magnus Ihse Bursie wrote:
...
One thing I do remember is the JDK build (through the make files) would have certain Java code it would call to do some build steps. Is there a easy way to find all such build related Java files within the JDK? I would like to see if there are any references/calls to this method from those build files. I am doing some searches myself, but knowing where to search will give me more confidence that I haven't missed out any.
The java buildtool sources are located in the "make" directory, more specifically in "make/jdk/src", "make/langtools/src" and "make/src" (yeah, I know -- a cleanup is way overdue). I did a quick search now but could not find any references to Properties.store().
Thank you for that. I went ahead and verified it again and like you note, I don't see any usages in this code.
I'm trying to remember if we create properties during the build... We have several instances where .properties files in the Java source code are converted to hard-coded classes (for performance), and other where .properties files are copied verbatim. Ah, right, they are "cleaned" beforehand. We used to do this by a Java program, but nowadays they are mangled by sed. I think replacing that sed script with a trivial Java program doing storeCanonical() would be on the list of things to do. I can assist with that, when you are starting to get your implementation done.
Thank you. Once the initial draft version is ready, I will contact you.
We might also write something as part of the jlink process that gets embedded in the resulting jimage; not quite sure about that. You should find that code in the normal src/ codebase though.
I had a look at the the jlink and image building code and I haven't found any references/usages in this area. -Jaikiran
Hello Roger, Sorry, I just read the top part of your reply the last time and didn't realize there were inline comments. I just noticed them. Replying inline. On 24/08/21 8:14 pm, Roger Riggs wrote:
Hi Jaikiran,
Thanks for taking this on and getting it started.
One use case of canonical storage is repeatable builds. It would be useful to identify the uses in the JDK that would need to be changed to use the new function.
On 8/24/21 10:07 AM, Jaikiran Pai wrote:
The java.util.Properties class allows the properties to be written out to a stream or through a writer. In its current form, the specification of these APIs state that a comment comprising of the current date is always written out. The spec doesn't make any guarantees about the order of the properties when they are written out.
There have been requests asking to make these APIs more deterministic. These requests, from what I have seen, mainly ask for:
- A way to disable writing out the date comment - A way to write out the properties in a deterministic and reproducible way
There have been discussions in the mailing list in the past which have been captured in JDK-8231640[1]. In these discussions, there has been an inclination to not touch the current existing API implementations and instead introduce new API(s) to achieve the proposed use cases.
Before starting off with an implementation, I wanted to try and get some inputs on what the new API(s) would look like and what the scope of such a work should be.
Right now, the Properties class has 2 "store" APIs:
public void store(Writer writer, String comments) throws IOException public void store(OutputStream out, String comments) throws IOException
I don't think two methods are needed, its easy enough for the caller to adapt an OutputStream to a Writer (OutputStreamWriter) and take control of the encoding, so the OutputStream version is not essential.
That's a good point and makes sense.
Speaking of optional comments, should the APIs accept an instance of java.util.Optional for the comments parameter. Perhaps:
public void storeCanonical(Writer writer, Optional<String> comments) throws IOException public void storeCanonical(OutputStream out, Optional<String> comments) throws IOException
Optional is overkill here, using null for no comment is conventional and matches the current usage in the store(..) methods.
Okay. Not using Optional sounds fine.
Coming to the part where we write out the properties, these APIs will write out the properties in the lexicographical order of the property keys. An additional enhancement perhaps could be to allow users to pass in an optional java.util.Comparator instance to provide for application specific ordering of the property keys while being written out by these APIs. I am not too sure if we should introduce that. Any inputs? If we do introduce it, we would end up with 4 new APIs:
public void storeCanonical(Writer writer, Optional<String> comments) throws IOException public void storeCanonical(OutputStream out, Optional<String> comments) throws IOException public void storeCanonical(Writer writer, Optional<String> comments, Comparator<String> keyOrderer) throws IOException public void storeCanonical(OutputStream out, Optional<String> comments, Comparator<String> keyOrderer) throws IOException
Canonical usually already means a non-variable encoding, that seems the inconsistent with providing a Comparator.
From the inputs received so far, there hasn't been a real use case where a custom user provided Comparator would be of genuine help. So I don't plan to look more into this aspect.
However, it should be a goal that properties stored with storeCanonical can be loaded with load().
Agreed.
Is that worth it?
Finally, the other semantics, like the property key value separators, how/where newlines are inserted, what character encoding is used etc... will continue to match with the current semantics of the "store" APIs.
If a client has the need for a custom format, its quite easy to iterate over the contents, sorting if desires and writing the format itself. A custom format would not be usable with Properties.load.
Simpler is better,
Agreed. -Jaikiran
On 24/08/2021 15:07, Jaikiran Pai wrote:
The java.util.Properties class allows the properties to be written out to a stream or through a writer. In its current form, the specification of these APIs state that a comment comprising of the current date is always written out. The spec doesn't make any guarantees about the order of the properties when they are written out.
There have been requests asking to make these APIs more deterministic. These requests, from what I have seen, mainly ask for:
- A way to disable writing out the date comment - A way to write out the properties in a deterministic and reproducible way
There have been discussions in the mailing list in the past which have been captured in JDK-8231640[1]. In these discussions, there has been an inclination to not touch the current existing API implementations and instead introduce new API(s) to achieve the proposed use cases.
Before starting off with an implementation, I wanted to try and get some inputs on what the new API(s) would look like and what the scope of such a work should be.
Another possibility is to add an overload of store that adds a java.time.Instant argument for the timestamp. It could be specified as Instant.EPOCH where reproducibility is required. Magnus brings up the possibility of specifying that the store methods look for SOURCE_DATE_EPOCH. As it happens, someone tried that [1]. With a spec change and the correct implementation then it should be feasible option to explore. We touched on the ordering issue in the jdk-dev discussion in 2019. I think that can be changed to write in sorted key order without a spec change, although it could be a useful property to specify. -Alan [1] https://bugs.openjdk.java.net/browse/JDK-8272157
Hi, For the occasional human reader of a property file, a deterministic order would make them easier to use as well. The natural sort order of the keys as String would be suitable. I'd support changing the order written without changing the API. Load and other readers of property files have no expectation of an order; there is almost no compatibility risk. Providing an alternate timestamp (or no timestamp) would need a new method. An overload of store(...) would be consistent with current uses. To produce a compatible format, including a timezone, it would need to be either java.util.Date or java.time.ZonedDateTime. (I'm not a fan of system properties that change API behavior). Regards, Roger On 8/25/21 8:51 AM, Alan Bateman wrote:
On 24/08/2021 15:07, Jaikiran Pai wrote:
The java.util.Properties class allows the properties to be written out to a stream or through a writer. In its current form, the specification of these APIs state that a comment comprising of the current date is always written out. The spec doesn't make any guarantees about the order of the properties when they are written out.
There have been requests asking to make these APIs more deterministic. These requests, from what I have seen, mainly ask for:
- A way to disable writing out the date comment - A way to write out the properties in a deterministic and reproducible way
There have been discussions in the mailing list in the past which have been captured in JDK-8231640[1]. In these discussions, there has been an inclination to not touch the current existing API implementations and instead introduce new API(s) to achieve the proposed use cases.
Before starting off with an implementation, I wanted to try and get some inputs on what the new API(s) would look like and what the scope of such a work should be.
Another possibility is to add an overload of store that adds a java.time.Instant argument for the timestamp. It could be specified as Instant.EPOCH where reproducibility is required.
Magnus brings up the possibility of specifying that the store methods look for SOURCE_DATE_EPOCH. As it happens, someone tried that [1]. With a spec change and the correct implementation then it should be feasible option to explore.
We touched on the ordering issue in the jdk-dev discussion in 2019. I think that can be changed to write in sorted key order without a spec change, although it could be a useful property to specify.
-Alan
Hi, Not all properties file might benefit from being sorted in key ordering. Think for instance about the logging.properties file - where you would certainly want to keep the original ordering. That said - we rarely store() a logging properties file, and if we stored it, key might appear in random order anyway (due to HashMap hashing) - so maybe that's OK after all... As long as the build doesn't fiddle with these... :-) cheers, -- daniel On 25/08/2021 15:06, Roger Riggs wrote:
Hi,
For the occasional human reader of a property file, a deterministic order would make them easier to use as well. The natural sort order of the keys as String would be suitable.
I'd support changing the order written without changing the API. Load and other readers of property files have no expectation of an order; there is almost no compatibility risk.
Providing an alternate timestamp (or no timestamp) would need a new method. An overload of store(...) would be consistent with current uses. To produce a compatible format, including a timezone, it would need to be either java.util.Date or java.time.ZonedDateTime.
(I'm not a fan of system properties that change API behavior).
Regards, Roger
On 25/08/21 6:21 pm, Alan Bateman wrote:
On 24/08/2021 15:07, Jaikiran Pai wrote:
The java.util.Properties class allows the properties to be written out to a stream or through a writer. In its current form, the specification of these APIs state that a comment comprising of the current date is always written out. The spec doesn't make any guarantees about the order of the properties when they are written out.
There have been requests asking to make these APIs more deterministic. These requests, from what I have seen, mainly ask for:
- A way to disable writing out the date comment - A way to write out the properties in a deterministic and reproducible way
There have been discussions in the mailing list in the past which have been captured in JDK-8231640[1]. In these discussions, there has been an inclination to not touch the current existing API implementations and instead introduce new API(s) to achieve the proposed use cases.
Before starting off with an implementation, I wanted to try and get some inputs on what the new API(s) would look like and what the scope of such a work should be.
Another possibility is to add an overload of store that adds a java.time.Instant argument for the timestamp. It could be specified as Instant.EPOCH where reproducibility is required.
Magnus brings up the possibility of specifying that the store methods look for SOURCE_DATE_EPOCH. As it happens, someone tried that [1]. With a spec change and the correct implementation then it should be feasible option to explore.
Introducing an overloaded "store" which takes the timestamp or implicitly using the SOURCE_DATE_EPOCH environment variable would mean that the Properties.store(...) APIs will continue to always write out a Date representation into the outputstream. That effectively means that there will continue to be no way to disable writing out a (date) comment. More specifically, even if a user doesn't explicitly specify a comment, we would end up writing a default (date) comment. Do we want that? Or while we are doing these changes, should we introduce this ability of disabling writing out these date comments by default? That, IMO, can only be done by introducing new APIs instead of trying to slightly change the spec and the implementation of the current "store" APIs. After all, if any callers do want a date (and a reproducible one at that), they could always do so by passing that as a value for the "comment" parameter of these (new) "storeXXX" APIs.
We touched on the ordering issue in the jdk-dev discussion in 2019. I think that can be changed to write in sorted key order without a spec change, although it could be a useful property to specify.
I think, based on the discussion/inputs so far, there's clarity on this part that changing the current implementation of "store" to write out the property keys in a specific order would be a good thing, irrespective of whether or not we introduce new APIs to deal with the date comment aspect. Daniel had an interesting point of logging.properties file and the order that would need to be maintained for those, but like he noted in that same reply, I don't think that's a use case to consider for the "store" APIs, since they never previously supported/guaranteed that use case. -Jaikiran
On 25/08/2021 15:57, Jaikiran Pai wrote:
:
Introducing an overloaded "store" which takes the timestamp or implicitly using the SOURCE_DATE_EPOCH environment variable would mean that the Properties.store(...) APIs will continue to always write out a Date representation into the outputstream. That effectively means that there will continue to be no way to disable writing out a (date) comment. More specifically, even if a user doesn't explicitly specify a comment, we would end up writing a default (date) comment. Do we want that? Or while we are doing these changes, should we introduce this ability of disabling writing out these date comments by default? That, IMO, can only be done by introducing new APIs instead of trying to slightly change the spec and the implementation of the current "store" APIs. After all, if any callers do want a date (and a reproducible one at that), they could always do so by passing that as a value for the "comment" parameter of these (new) "storeXXX" APIs.
Properties save/store has always emitted a comment line with the current date/time, it goes back to JDK 1.0. It's unfortunate that newer store methods didn't re-examine this, also unfortunate that it continued with 8859-1. In the discussion on jdk-dev about reproducibility then I think we concluded that we don't want to change the behavior of existing methods, hence the discussion on introducing a new method. An new overload of store would give the maximum flexibility to new code but it would require programs used in builds to use it consistently. It's possible that libraries or tools that are using Properties::store have no idea that they will ultimately be used in a system that is trying to produce reproducible builds. So I have some sympathy to the argument that there should a way to omit the date or emit it as the Unix epoch time or a fixed value. This would mean changing the spec to allow for some implementation means to do this, and maybe an implNote that documents that it reads SOURCE_DATE_EPOCH. I think Roger has misgivings about this. So let's list the options and the pros/cons and see if we can converge on an approach.
I think, based on the discussion/inputs so far, there's clarity on this part that changing the current implementation of "store" to write out the property keys in a specific order would be a good thing, irrespective of whether or not we introduce new APIs to deal with the date comment aspect. Daniel had an interesting point of logging.properties file and the order that would need to be maintained for those, but like he noted in that same reply, I don't think that's a use case to consider for the "store" APIs, since they never previously supported/guaranteed that use case. Yes, I think we should be okay there. I suspect the iteration order has changed once or twice already, e.g. when Properties was re-implemented to use a CHM.
-Alan
On 26/08/21 5:06 pm, Alan Bateman wrote:
On 25/08/2021 15:57, Jaikiran Pai wrote:
:
Introducing an overloaded "store" which takes the timestamp or implicitly using the SOURCE_DATE_EPOCH environment variable would mean that the Properties.store(...) APIs will continue to always write out a Date representation into the outputstream. That effectively means that there will continue to be no way to disable writing out a (date) comment. More specifically, even if a user doesn't explicitly specify a comment, we would end up writing a default (date) comment. Do we want that? Or while we are doing these changes, should we introduce this ability of disabling writing out these date comments by default? That, IMO, can only be done by introducing new APIs instead of trying to slightly change the spec and the implementation of the current "store" APIs. After all, if any callers do want a date (and a reproducible one at that), they could always do so by passing that as a value for the "comment" parameter of these (new) "storeXXX" APIs.
Properties save/store has always emitted a comment line with the current date/time, it goes back to JDK 1.0. It's unfortunate that newer store methods didn't re-examine this, also unfortunate that it continued with 8859-1. In the discussion on jdk-dev about reproducibility then I think we concluded that we don't want to change the behavior of existing methods, hence the discussion on introducing a new method.
An new overload of store would give the maximum flexibility to new code but it would require programs used in builds to use it consistently. It's possible that libraries or tools that are using Properties::store have no idea that they will ultimately be used in a system that is trying to produce reproducible builds. So I have some sympathy to the argument that there should a way to omit the date or emit it as the Unix epoch time or a fixed value. This would mean changing the spec to allow for some implementation means to do this, and maybe an implNote that documents that it reads SOURCE_DATE_EPOCH. I think Roger has misgivings about this.
So let's list the options and the pros/cons and see if we can converge on an approach.
Keeping aside the discussion about whether to introduce a new API or change the spec of the existing API, just for a moment, I think the main question is whether the current date comment that gets added by the store(...) is being used by anything out there, other than maybe for visual aspects while reading the properties file. From the discussions I have seen so far in the threads on openjdk and other mailing lists, I don't think anyone is using that comment for anything. So if we are indeed willing to change the spec of the store(...) API would it be too big/unacceptable a change to disable writing this date comment, at least in certain specific cases? With my limited usage of this API, my guess is that it's higly unlikely that it will break anything - after all it's a comment that was being added. The only potential breakages perhaps would be some scripts? But even those, I don't know how it would break them since anyway the date comment wasn't a constant value. Having said that, I looked at the patch that you pointed out[1] that got integrated into the JDK shipped in Ubuntu, for introducing reproducibility. I'm a bit surprised that the patched implementation decided to write out a formatted reproducible date instead of patching it to just not write the date. After all that IMO would have been much simpler and it would anyway have changed (removed) the exact same line in the code that was patched. So maybe I'm underestimating the usage of this date comment. So now coming to the API part of it. The potential ways to tackle this, that I could think of are as follows: 1. No new APIs to be added. Instead, the existing "store(...)" APIs specification will be changed to allow for reproducibility of the content that gets stored. The possible specification changes that we could attempt are: 1a. The store(...) APIs by default will continue to write out a date comment, which represents the current date. In the case where SOURCE_DATE_EPOCH environment variable is set, the store(...) APIs will use that value to construct a (UTC) Date and write out the date comment. Pros: ---- - Existing callers of these APIs won't have to change their code to use this new semantics. - Tools/environments where reproducibility is required, can configure their environment (and they probably already would have) to set the SOURCE_DATE_EPOCH value. Cons: ---- - Behaviour of the same API will change dynamically based on the current environment of the process. However, this change in behaviour will not have functional impact on the generated output, since it's just a comment that changes. - There is no way to disable writing out comments to the output. i.e. even if user passes a null "comments" value to the APIs, there will always be the date comment in the output. - Requires spec change OR 1b. The store(...) APIs by default will continue to write out a date comment, which represents the current date. In the case where SOURCE_DATE_EPOCH environment variable is set, the store(...) APIs will _not_ write out any date comment. This is unlike 1a where we use the value of the environment variable to compute the Date. This alternate approach of not using the value but instead considering the presence of the environment variable being set, to not write out any date comment will still provide reproducibility and also will get rid of the unnecessary date comment. Pros: ---- - Existing callers of these APIs won't have to change their code to use this new semantics. - Tools/environments where reproducibility is required, can configure their environment (and they probably already would have) to set the SOURCE_DATE_EPOCH value. - The date comment will no longer end up in the output in environments where SOURCE_DATE_EPOCH is set (to any value). Cons: ---- - Just like 1a, the API output is determined by the current environment of the process. - Requires spec change. - The only way to generate a output without any comments would require setting the SOURCE_DATE_EPOCH even when reproducibility isn't a necessity. - When SOURCE_DATE_EPOCH is set, it will potentially (even if low probability) will breaks existing tools, code(?) or scripts that might be looking for some/any date in the comments of the output. OR 1c. The store(...) APIs which both currently take a "comments" parameter will write out the date comment only if the "comments" parameter is null. If the "comments" value is non-null, then, only that user passed comment will be written out to the output. Pros: ---- - There will now be a way for the callers to decide what exact comment appears in the output. It will either be their explicit comment or a default date comment. - For reproducibility, the callers can pass any reproducible comment of their choice and it no longer will be the responsibility of the store(...) API to decide what values for the comments would provide reproducibility of the output. (Note that the store(...) APIs will still take it upon themselves to write out the property keys in a reproducible order, that's now been decided in this discussion). Cons: ---- - This "may" require source code changes in callers' code. If callers are already passing a comment, they will no longer see the additional default date comment being added. However, if callers are not passing any comment, then they will continue to see the default date comment being added. In essence, callers who aren't passing any comment to the APIs, MUST change their code, if they want to stop seeing the date comment. - There is no way to disable writing out the comment to the output. Every single store(...) output will have a comment, either user supplied or the default date comment. - Requires spec change. - It will potentially (even if low probability) will breaks existing tools, code(?) or scripts that might be looking for some/any date in the comments of the output. OR 1d. Since we are thinking of changing the spec, I will include this option as well. The store(...) APIs which both current take the "comments" parameter will write out the caller passed comment only if the comment is non-null. No default comment will be written out. If the caller passed comment is null, then only the property key and values will be written out and the output will have no comments. Pros: ---- - No code changes needed in callers' code. - Gets rid of the (IMO, unnecessary) date comment from the output. - Callers now have full control over what comment gets written out. - There's now a way to disable writing out any comments to the output. If callers provide no comment, then only the property key values will be written out. - These APIs will now feel much more natural without any "if this, else that" or any "magic/default" comments appearing in the output. - Reproducibility of the comments is now decided by the callers and it no longer will be the responsibility of these store(...) APIs to come up with reproducible comments. (Note that the store(...) APIs will still take it upon themselves to write out the property keys in a reproducible order, that's now been decided in this discussion). Cons: ---- - Requires spec change. - It will potentially (even if low probability) will breaks existing tools, code(?) or scripts that might be looking for some/any date in the comments of the output. 2. Introduce a new (and only one) overloaded API as follows: public void store(Writer writer, String comments, java.time.Instant dateComment) throws IOException; This new API will write out an optional date comment, if the java.time.Instant parameter value is not null. In such a case where the Instant value is non-null, a java.time.ZonedDateTime will be computed out of the Instant value, for the "UTC" ZoneId and the string representation will be written out. In the case where the Instant parameter value is null, no date comment will be added in the output. If the user passes the "comments" value, then that will be written out as the comments in the output. Pros: ---- - No spec change needed - Callers now have control over what date gets written out as a comment - Callers now have control to completely disable writing out any comments, by passing null for both "comments" and the Instant parameters - Reproducibility of the date comment will not be a concern of this new API and instead is controlled by the callers. Cons: ---- - Requires code changes in callers' code. Potentially in much more places than any of other proposed options, since this is a completely new API. - The new API still allows the Date comment to be written out, which IMO adds no value. Furthermore, this now introduces 2 ways to add a comment - one by passing value to the "comments" parameter and the other by passing a value to the "java.time.Instant" parameter. - Introduces/continues with, avoidable, Instant parsing and date formatting logic/complexity into the new API. Let me know if I might have missed any other potential option. [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=914278;filename=repr... -Jaikiran
Hi Jaikiran, What about writing the keys in natural ordering? Has that been abandoned, or will the implementation silently do it, or will it require a new API? AFAIU there are two problems that are hostile to reproducible builds: - date comments differ from store to store - property ordering may change from VM to VM due to differences in hashing best regards, -- daniel On 27/08/2021 15:16, Jaikiran Pai wrote:
On 26/08/21 5:06 pm, Alan Bateman wrote:
On 25/08/2021 15:57, Jaikiran Pai wrote:
:
Introducing an overloaded "store" which takes the timestamp or implicitly using the SOURCE_DATE_EPOCH environment variable would mean that the Properties.store(...) APIs will continue to always write out a Date representation into the outputstream. That effectively means that there will continue to be no way to disable writing out a (date) comment. More specifically, even if a user doesn't explicitly specify a comment, we would end up writing a default (date) comment. Do we want that? Or while we are doing these changes, should we introduce this ability of disabling writing out these date comments by default? That, IMO, can only be done by introducing new APIs instead of trying to slightly change the spec and the implementation of the current "store" APIs. After all, if any callers do want a date (and a reproducible one at that), they could always do so by passing that as a value for the "comment" parameter of these (new) "storeXXX" APIs.
Properties save/store has always emitted a comment line with the current date/time, it goes back to JDK 1.0. It's unfortunate that newer store methods didn't re-examine this, also unfortunate that it continued with 8859-1. In the discussion on jdk-dev about reproducibility then I think we concluded that we don't want to change the behavior of existing methods, hence the discussion on introducing a new method.
An new overload of store would give the maximum flexibility to new code but it would require programs used in builds to use it consistently. It's possible that libraries or tools that are using Properties::store have no idea that they will ultimately be used in a system that is trying to produce reproducible builds. So I have some sympathy to the argument that there should a way to omit the date or emit it as the Unix epoch time or a fixed value. This would mean changing the spec to allow for some implementation means to do this, and maybe an implNote that documents that it reads SOURCE_DATE_EPOCH. I think Roger has misgivings about this.
So let's list the options and the pros/cons and see if we can converge on an approach.
Keeping aside the discussion about whether to introduce a new API or change the spec of the existing API, just for a moment, I think the main question is whether the current date comment that gets added by the store(...) is being used by anything out there, other than maybe for visual aspects while reading the properties file. From the discussions I have seen so far in the threads on openjdk and other mailing lists, I don't think anyone is using that comment for anything. So if we are indeed willing to change the spec of the store(...) API would it be too big/unacceptable a change to disable writing this date comment, at least in certain specific cases? With my limited usage of this API, my guess is that it's higly unlikely that it will break anything - after all it's a comment that was being added. The only potential breakages perhaps would be some scripts? But even those, I don't know how it would break them since anyway the date comment wasn't a constant value. Having said that, I looked at the patch that you pointed out[1] that got integrated into the JDK shipped in Ubuntu, for introducing reproducibility. I'm a bit surprised that the patched implementation decided to write out a formatted reproducible date instead of patching it to just not write the date. After all that IMO would have been much simpler and it would anyway have changed (removed) the exact same line in the code that was patched. So maybe I'm underestimating the usage of this date comment.
So now coming to the API part of it. The potential ways to tackle this, that I could think of are as follows:
1. No new APIs to be added. Instead, the existing "store(...)" APIs specification will be changed to allow for reproducibility of the content that gets stored. The possible specification changes that we could attempt are:
1a. The store(...) APIs by default will continue to write out a date comment, which represents the current date. In the case where SOURCE_DATE_EPOCH environment variable is set, the store(...) APIs will use that value to construct a (UTC) Date and write out the date comment.
Pros: ---- - Existing callers of these APIs won't have to change their code to use this new semantics. - Tools/environments where reproducibility is required, can configure their environment (and they probably already would have) to set the SOURCE_DATE_EPOCH value.
Cons: ---- - Behaviour of the same API will change dynamically based on the current environment of the process. However, this change in behaviour will not have functional impact on the generated output, since it's just a comment that changes. - There is no way to disable writing out comments to the output. i.e. even if user passes a null "comments" value to the APIs, there will always be the date comment in the output. - Requires spec change
OR
1b. The store(...) APIs by default will continue to write out a date comment, which represents the current date. In the case where SOURCE_DATE_EPOCH environment variable is set, the store(...) APIs will _not_ write out any date comment. This is unlike 1a where we use the value of the environment variable to compute the Date. This alternate approach of not using the value but instead considering the presence of the environment variable being set, to not write out any date comment will still provide reproducibility and also will get rid of the unnecessary date comment.
Pros: ---- - Existing callers of these APIs won't have to change their code to use this new semantics. - Tools/environments where reproducibility is required, can configure their environment (and they probably already would have) to set the SOURCE_DATE_EPOCH value. - The date comment will no longer end up in the output in environments where SOURCE_DATE_EPOCH is set (to any value).
Cons: ---- - Just like 1a, the API output is determined by the current environment of the process. - Requires spec change. - The only way to generate a output without any comments would require setting the SOURCE_DATE_EPOCH even when reproducibility isn't a necessity. - When SOURCE_DATE_EPOCH is set, it will potentially (even if low probability) will breaks existing tools, code(?) or scripts that might be looking for some/any date in the comments of the output.
OR
1c. The store(...) APIs which both currently take a "comments" parameter will write out the date comment only if the "comments" parameter is null. If the "comments" value is non-null, then, only that user passed comment will be written out to the output.
Pros: ---- - There will now be a way for the callers to decide what exact comment appears in the output. It will either be their explicit comment or a default date comment. - For reproducibility, the callers can pass any reproducible comment of their choice and it no longer will be the responsibility of the store(...) API to decide what values for the comments would provide reproducibility of the output.
(Note that the store(...) APIs will still take it upon themselves to write out the property keys in a reproducible order, that's now been decided in this discussion).
Cons: ---- - This "may" require source code changes in callers' code. If callers are already passing a comment, they will no longer see the additional default date comment being added. However, if callers are not passing any comment, then they will continue to see the default date comment being added. In essence, callers who aren't passing any comment to the APIs, MUST change their code, if they want to stop seeing the date comment. - There is no way to disable writing out the comment to the output. Every single store(...) output will have a comment, either user supplied or the default date comment. - Requires spec change. - It will potentially (even if low probability) will breaks existing tools, code(?) or scripts that might be looking for some/any date in the comments of the output.
OR
1d. Since we are thinking of changing the spec, I will include this option as well. The store(...) APIs which both current take the "comments" parameter will write out the caller passed comment only if the comment is non-null. No default comment will be written out. If the caller passed comment is null, then only the property key and values will be written out and the output will have no comments.
Pros: ---- - No code changes needed in callers' code. - Gets rid of the (IMO, unnecessary) date comment from the output. - Callers now have full control over what comment gets written out. - There's now a way to disable writing out any comments to the output. If callers provide no comment, then only the property key values will be written out. - These APIs will now feel much more natural without any "if this, else that" or any "magic/default" comments appearing in the output. - Reproducibility of the comments is now decided by the callers and it no longer will be the responsibility of these store(...) APIs to come up with reproducible comments.
(Note that the store(...) APIs will still take it upon themselves to write out the property keys in a reproducible order, that's now been decided in this discussion).
Cons: ---- - Requires spec change. - It will potentially (even if low probability) will breaks existing tools, code(?) or scripts that might be looking for some/any date in the comments of the output.
2. Introduce a new (and only one) overloaded API as follows:
public void store(Writer writer, String comments, java.time.Instant dateComment) throws IOException;
This new API will write out an optional date comment, if the java.time.Instant parameter value is not null. In such a case where the Instant value is non-null, a java.time.ZonedDateTime will be computed out of the Instant value, for the "UTC" ZoneId and the string representation will be written out. In the case where the Instant parameter value is null, no date comment will be added in the output. If the user passes the "comments" value, then that will be written out as the comments in the output.
Pros: ---- - No spec change needed - Callers now have control over what date gets written out as a comment - Callers now have control to completely disable writing out any comments, by passing null for both "comments" and the Instant parameters - Reproducibility of the date comment will not be a concern of this new API and instead is controlled by the callers.
Cons: ---- - Requires code changes in callers' code. Potentially in much more places than any of other proposed options, since this is a completely new API. - The new API still allows the Date comment to be written out, which IMO adds no value. Furthermore, this now introduces 2 ways to add a comment - one by passing value to the "comments" parameter and the other by passing a value to the "java.time.Instant" parameter. - Introduces/continues with, avoidable, Instant parsing and date formatting logic/complexity into the new API.
Let me know if I might have missed any other potential option.
[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=914278;filename=repr...
-Jaikiran
Hello Daniel, On 27/08/21 8:05 pm, Daniel Fuchs wrote:
Hi Jaikiran,
What about writing the keys in natural ordering? Has that been abandoned, or will the implementation silently do it, or will it require a new API?
In the discussion so far, there has been no opposition to changing the implementation of these existing store(...) APIs to use natural sorting order for writing out the keys. In fact, there seems to be an agreement that the current 2 store(...) API implementations and any new API that we might introduce to tackle the date comment issue, must use natural key ordering for writing out the properties. Whether or not this implementation detail will be noted/mentioned in the spec of these APIs hasn't yet been decided. So it's just the date comment which requires some kind of agreement on how we deal with it. -Jaikiran
On 27/08/21 8:12 pm, Jaikiran Pai wrote:
Hello Daniel,
On 27/08/21 8:05 pm, Daniel Fuchs wrote:
Hi Jaikiran,
What about writing the keys in natural ordering? Has that been abandoned, or will the implementation silently do it, or will it require a new API?
In the discussion so far, there has been no opposition to changing the implementation of these existing store(...) APIs to use natural sorting order for writing out the keys.
Just to avoid any kind of confusion - when I say natural sorting order of keys, I mean the implementation will be changed to lexicographically compare the keys as noted in the compareTo[1] method of java.lang.String. [1] https://docs.oracle.com/en/java/javase/16/docs/api/java.base/java/lang/Strin...) -Jaikiran
Hi, It would seem useful in the long term to move the inclusion of the date to be part of the comment and it would be up to the caller of store() to include a date (as a string) where ever appropriate. And drop the writing hardcoded date; but its not clear how to get there given compatibility concerns. One more option would be to add a separate method on a Properties instance to set or omit the date written by store(). It would not have a global affect on all Properties.store calls but would be specific to an instance. It would still require callers to have access to the instance but not have to modify the immediate caller of store(). Without a better understanding of the uses that affect reproducible builds, this might not be able to address them all. There is likely no good (clean) solution since the reproducible build goal implies a global modification of behavior and all the alternatives that are specific to Properties have only a local effect. $.02, Roger On 8/27/21 10:16 AM, Jaikiran Pai wrote:
On 26/08/21 5:06 pm, Alan Bateman wrote:
On 25/08/2021 15:57, Jaikiran Pai wrote:
:
Introducing an overloaded "store" which takes the timestamp or implicitly using the SOURCE_DATE_EPOCH environment variable would mean that the Properties.store(...) APIs will continue to always write out a Date representation into the outputstream. That effectively means that there will continue to be no way to disable writing out a (date) comment. More specifically, even if a user doesn't explicitly specify a comment, we would end up writing a default (date) comment. Do we want that? Or while we are doing these changes, should we introduce this ability of disabling writing out these date comments by default? That, IMO, can only be done by introducing new APIs instead of trying to slightly change the spec and the implementation of the current "store" APIs. After all, if any callers do want a date (and a reproducible one at that), they could always do so by passing that as a value for the "comment" parameter of these (new) "storeXXX" APIs.
Properties save/store has always emitted a comment line with the current date/time, it goes back to JDK 1.0. It's unfortunate that newer store methods didn't re-examine this, also unfortunate that it continued with 8859-1. In the discussion on jdk-dev about reproducibility then I think we concluded that we don't want to change the behavior of existing methods, hence the discussion on introducing a new method.
An new overload of store would give the maximum flexibility to new code but it would require programs used in builds to use it consistently. It's possible that libraries or tools that are using Properties::store have no idea that they will ultimately be used in a system that is trying to produce reproducible builds. So I have some sympathy to the argument that there should a way to omit the date or emit it as the Unix epoch time or a fixed value. This would mean changing the spec to allow for some implementation means to do this, and maybe an implNote that documents that it reads SOURCE_DATE_EPOCH. I think Roger has misgivings about this.
So let's list the options and the pros/cons and see if we can converge on an approach.
Keeping aside the discussion about whether to introduce a new API or change the spec of the existing API, just for a moment, I think the main question is whether the current date comment that gets added by the store(...) is being used by anything out there, other than maybe for visual aspects while reading the properties file. From the discussions I have seen so far in the threads on openjdk and other mailing lists, I don't think anyone is using that comment for anything. So if we are indeed willing to change the spec of the store(...) API would it be too big/unacceptable a change to disable writing this date comment, at least in certain specific cases? With my limited usage of this API, my guess is that it's higly unlikely that it will break anything - after all it's a comment that was being added. The only potential breakages perhaps would be some scripts? But even those, I don't know how it would break them since anyway the date comment wasn't a constant value. Having said that, I looked at the patch that you pointed out[1] that got integrated into the JDK shipped in Ubuntu, for introducing reproducibility. I'm a bit surprised that the patched implementation decided to write out a formatted reproducible date instead of patching it to just not write the date. After all that IMO would have been much simpler and it would anyway have changed (removed) the exact same line in the code that was patched. So maybe I'm underestimating the usage of this date comment.
So now coming to the API part of it. The potential ways to tackle this, that I could think of are as follows:
1. No new APIs to be added. Instead, the existing "store(...)" APIs specification will be changed to allow for reproducibility of the content that gets stored. The possible specification changes that we could attempt are:
1a. The store(...) APIs by default will continue to write out a date comment, which represents the current date. In the case where SOURCE_DATE_EPOCH environment variable is set, the store(...) APIs will use that value to construct a (UTC) Date and write out the date comment.
Pros: ---- - Existing callers of these APIs won't have to change their code to use this new semantics. - Tools/environments where reproducibility is required, can configure their environment (and they probably already would have) to set the SOURCE_DATE_EPOCH value.
Cons: ---- - Behaviour of the same API will change dynamically based on the current environment of the process. However, this change in behaviour will not have functional impact on the generated output, since it's just a comment that changes. - There is no way to disable writing out comments to the output. i.e. even if user passes a null "comments" value to the APIs, there will always be the date comment in the output. - Requires spec change
OR
1b. The store(...) APIs by default will continue to write out a date comment, which represents the current date. In the case where SOURCE_DATE_EPOCH environment variable is set, the store(...) APIs will _not_ write out any date comment. This is unlike 1a where we use the value of the environment variable to compute the Date. This alternate approach of not using the value but instead considering the presence of the environment variable being set, to not write out any date comment will still provide reproducibility and also will get rid of the unnecessary date comment.
Pros: ---- - Existing callers of these APIs won't have to change their code to use this new semantics. - Tools/environments where reproducibility is required, can configure their environment (and they probably already would have) to set the SOURCE_DATE_EPOCH value. - The date comment will no longer end up in the output in environments where SOURCE_DATE_EPOCH is set (to any value).
Cons: ---- - Just like 1a, the API output is determined by the current environment of the process. - Requires spec change. - The only way to generate a output without any comments would require setting the SOURCE_DATE_EPOCH even when reproducibility isn't a necessity. - When SOURCE_DATE_EPOCH is set, it will potentially (even if low probability) will breaks existing tools, code(?) or scripts that might be looking for some/any date in the comments of the output.
OR
1c. The store(...) APIs which both currently take a "comments" parameter will write out the date comment only if the "comments" parameter is null. If the "comments" value is non-null, then, only that user passed comment will be written out to the output.
Pros: ---- - There will now be a way for the callers to decide what exact comment appears in the output. It will either be their explicit comment or a default date comment. - For reproducibility, the callers can pass any reproducible comment of their choice and it no longer will be the responsibility of the store(...) API to decide what values for the comments would provide reproducibility of the output.
(Note that the store(...) APIs will still take it upon themselves to write out the property keys in a reproducible order, that's now been decided in this discussion).
Cons: ---- - This "may" require source code changes in callers' code. If callers are already passing a comment, they will no longer see the additional default date comment being added. However, if callers are not passing any comment, then they will continue to see the default date comment being added. In essence, callers who aren't passing any comment to the APIs, MUST change their code, if they want to stop seeing the date comment. - There is no way to disable writing out the comment to the output. Every single store(...) output will have a comment, either user supplied or the default date comment. - Requires spec change. - It will potentially (even if low probability) will breaks existing tools, code(?) or scripts that might be looking for some/any date in the comments of the output.
OR
1d. Since we are thinking of changing the spec, I will include this option as well. The store(...) APIs which both current take the "comments" parameter will write out the caller passed comment only if the comment is non-null. No default comment will be written out. If the caller passed comment is null, then only the property key and values will be written out and the output will have no comments.
Pros: ---- - No code changes needed in callers' code. - Gets rid of the (IMO, unnecessary) date comment from the output. - Callers now have full control over what comment gets written out. - There's now a way to disable writing out any comments to the output. If callers provide no comment, then only the property key values will be written out. - These APIs will now feel much more natural without any "if this, else that" or any "magic/default" comments appearing in the output. - Reproducibility of the comments is now decided by the callers and it no longer will be the responsibility of these store(...) APIs to come up with reproducible comments.
(Note that the store(...) APIs will still take it upon themselves to write out the property keys in a reproducible order, that's now been decided in this discussion).
Cons: ---- - Requires spec change. - It will potentially (even if low probability) will breaks existing tools, code(?) or scripts that might be looking for some/any date in the comments of the output.
2. Introduce a new (and only one) overloaded API as follows:
public void store(Writer writer, String comments, java.time.Instant dateComment) throws IOException;
This new API will write out an optional date comment, if the java.time.Instant parameter value is not null. In such a case where the Instant value is non-null, a java.time.ZonedDateTime will be computed out of the Instant value, for the "UTC" ZoneId and the string representation will be written out. In the case where the Instant parameter value is null, no date comment will be added in the output. If the user passes the "comments" value, then that will be written out as the comments in the output.
Pros: ---- - No spec change needed - Callers now have control over what date gets written out as a comment - Callers now have control to completely disable writing out any comments, by passing null for both "comments" and the Instant parameters - Reproducibility of the date comment will not be a concern of this new API and instead is controlled by the callers.
Cons: ---- - Requires code changes in callers' code. Potentially in much more places than any of other proposed options, since this is a completely new API. - The new API still allows the Date comment to be written out, which IMO adds no value. Furthermore, this now introduces 2 ways to add a comment - one by passing value to the "comments" parameter and the other by passing a value to the "java.time.Instant" parameter. - Introduces/continues with, avoidable, Instant parsing and date formatting logic/complexity into the new API.
Let me know if I might have missed any other potential option.
[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=914278;filename=repr...
-Jaikiran
participants (5)
-
Alan Bateman
-
Daniel Fuchs
-
Jaikiran Pai
-
Magnus Ihse Bursie
-
Roger Riggs