RFR: 8276766: Enable jar and jmod to produce deterministic timestamped content [v9]
Andrew Leonard
aleonard at openjdk.java.net
Tue Nov 30 12:05:07 UTC 2021
On Tue, 30 Nov 2021 08:53:03 GMT, John Neffenger <jgneff at openjdk.org> wrote:
>> @andrew-m-leonard Thank you, Andrew, for your bold solution to Mark's request -- one that even solves the problem with the current `ZipEntry` API!
>>
>> Would you be willing to consider a minor change to your implementation?
>>
>> As I mentioned earlier, if we are going to allow an ISO 8601 zoned date and time, we need either to **truncate, restrict, or convert** its value. The current implementation **truncates,** discarding the time zone information. The better option might be to **convert.** That is, parse the ISO 8601 string with `Instant.parse`, get the seconds since the epoch with `getEpochSecond`, and then proceed as before. Treat the value of `--date` as an instant on the time line, just like `SOURCE_DATE_EPOCH`, and always store its value in UTC.
>>
>> This also solves the problem of storing dates before 1980 as a local date and time in UTC while storing those after 1980 as a local date and time with a discarded time zone. By converting instead of truncating, the value is always stored as the local date and time in UTC regardless of whether or not it's within the range of the "MS-DOS date and time" field.
>>
>> Those of us using the timestamp of the last commit can still get the value directly from `git`.
>>
>> For `SOURCE_DATE_EPOCH`, run:
>>
>>
>> $ git log -1 --pretty=%ct
>> 1638207366
>>
>>
>> For the `--date` option of the `jar` and `jmod` tools, run:
>>
>>
>> $ git log -1 --pretty=%cI
>> 2021-11-29T09:36:06-08:00
>>
>>
>> Specifically, that `git` ISO 8601 string, and even the sample `date` command that Mark shows in his comment, are what break the current implementation by creating archives that depend on the time zone of the build machine again.
>>
>> By the way, the `jar` utility displays a time zone in the output of its `--list` and `--verbose` options even when there is no time zone information in the file. (In other words, it lies, or at least embellishes.) Use the verbose output of `zipinfo` to see the true values of the timestamps in an archive.
>
>> Would you be willing to consider a minor change to your implementation?
>
> Just to be clear, this change should let us postpone any additions to the `ZipEntry` API for another day, if at all, and maybe even get this pull request integrated for JDK 18.
@jgneff Hi John, many thanks for the suggestions
So what you suggest sounds reasonable, although it would be a "behaviour change" in that whereas now if the date is between 1980->2106 only a xdostime is stored, we would now always be storing the additional "mtime" field which is a FileTime persisted as an int("seconds from epoch") in the zip file. I was trying to not change the binary semantics, but I actually think your suggestion makes sense.
I've done a bit of digging as to why the extended mtime is only stored outside the xdos date range, and found this bug: https://bugs.openjdk.java.net/browse/JDK-8073497
Which basically I think changed ZipEntry so it did not construct Date() objects from ZipEntries loaded during JVM startup, which it "implies" (although I can't see the evidence) improves startup performance.
There is an interesting comment here: https://github.com/openjdk/jdk/blame/739769c8fc4b496f08a92225a12d07414537b6c0/src/java.base/share/classes/java/util/zip/ZipEntry.java#L83
* This establish an approximate high-bound value for DOS times in
* milliseconds since epoch, used to enable an efficient but
* sufficient bounds check to avoid generating extended last modified
* time entries.
*
* Calculating the exact number is locale dependent, would require loading
* TimeZone data eagerly, and would make little practical sense. Since DOS
* times theoretically go to 2107 - with compatibility not guaranteed
* after 2099 - setting this to a time that is before but near 2099
* should be sufficient.
So if we were to always set the extended mtime field, then FileTime.from(long) would potentially cause the impact JDK-8073497 was trying to avoid...?
However, with this bug we seem to have not thought about deterministic behavior..."Calculating the exact number is locale dependent, would require loading TimeZone data eagerly, and would make little practical sense"
Thoughts?
@AlanBateman @LanceAndersen fyi
-------------
PR: https://git.openjdk.java.net/jdk/pull/6481
More information about the compiler-dev
mailing list