Version-string schemes for the Java SE Platform and the JDK
mark.reinhold at oracle.com
Thu Oct 19 15:08:36 UTC 2017
(This is a long note on a complex topic that's inherently difficult to
discuss. If you wish to reply, please first read all the way through
to the end.)
In my proposal to adopt a strict six-month release cadence I suggested
that, going forward, the version strings of feature releases be of the
form $YEAR.$MONTH [1][2]. Thus next year's March release would be 18.3,
the September release would be 18.9, and so on each year.
Not everyone likes this proposal, which isn't surprising -- discussions
of version-string schemes, much like those of language syntax, tend
to degenerate into bike-sheds [3][4]. That's due, in part, to the use of
version strings -- across the software industry and for several decades
now -- to encode multiple not-quite-orthogonal axes of information, which
can answer different but often related questions:
- Compatibility -- "Will my code break if I upgrade to this release?"
- Significance -- "How different is this release from what I have now?"
- Security -- "Does this release contain new security fixes?"
- Support -- "For how long will this release be supported?"
- Identity -- "Against exactly which build was this bug reported?"
- Time -- "When did this release ship? How far behind am I?"
Convention dictates that the principal part of a version string, i.e.,
the version number, be a sequence of numerals separated by period
characters. (Let's ignore, for now, optional information such as
pre-release status and build numbers.) Convention also dictates that
version numbers be pointwise totally ordered, that they increase
monotonically over time, and that the version number of a feature
release be a prefix of the version numbers of its update releases.
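As a rough illustration of these conventions, here is a minimal sketch that treats a version number as a list of numerals. The class and method names (`VersionOrder`, `compare`, `isPrefix`) are hypothetical, chosen only for this example, and are not part of any JDK API.

```java
import java.util.List;

// Hypothetical helper, for illustration only.
final class VersionOrder {

    // Compare two version numbers numeral by numeral. If one is a
    // prefix of the other, the shorter one sorts first.
    static int compare(List<Integer> a, List<Integer> b) {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a.get(i), b.get(i));
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.size(), b.size());
    }

    // True if the feature release's version number is a prefix of the
    // update release's version number.
    static boolean isPrefix(List<Integer> feature, List<Integer> update) {
        return feature.size() <= update.size()
            && update.subList(0, feature.size()).equals(feature);
    }
}
```

Under this ordering a feature release sorts before each of its own update releases, and every update of one feature release sorts before the next feature release.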
Given these conventions and a strict, time-based release cadence, which
of the above axes are both important and appropriate to encode into
version numbers? Which are practical to encode into version numbers?
Which should have more weight, i.e., be encoded in the earlier numerals
of version numbers, and which should have less weight, i.e., be encoded
in the later numerals?
Some considerations for each axis, in turn:
- Compatibility is obviously important -- it's one of the core values
of the Java Platform, after all -- but it's problematic in at least
two respects and hence not a sound basis for version numbers.
First: Compatibility is, itself, multi-dimensional and therefore
difficult to encode into a simple sequence of numerals. What counts
as an incompatible change?
Some cases are obvious, e.g., a language change after which some old
source files no longer compile, a JVM change after which some old
class files are no longer valid, or an API change that removes an
existing module, package, type, or element thereof.
Many cases are, however, less than obvious, e.g., a language change
after which some previously-rejected source files do compile, a bug
fix that changes the element order of an array returned by an API,
an enhancement that allows a command-line option to accept some
previously-rejected arguments, or an optimization that removes an
internal API.
It might be practical to encode compatibility information into a two-
or three-numeral version number for something as simple as a single
library whose only interface with the outside world is its API [5].
It's far from clear how to do that, though, in a way that's easy for
everyone to understand for something as complex as the Java Platform
itself, and implementations thereof.
Second: The compatibility of a particular release with any of its
predecessors depends upon the set of features in that release. In a
time-based release model, however, the set of features is not known
until late in each release cycle, after the final feature is merged.
This complicates discussions of any specific release and the tracking
of changes in JIRA and related systems. If, e.g., we use the leading
numerals of version numbers to encode compatibility in the usual way,
with the first numeral increasing only when incompatible changes are
made, then would the March 2018 release be version 9.1, or 10? We
can't know until some time in December 2017, when the release closes
for stabilization.
We could address this problem by establishing secondary, time-based
labels for releases, but that would be awkward and could lead to even
more confusion.
- Significance is even harder to measure than compatibility, and like
compatibility it depends upon the set of features in a release and
hence can't be known until late in a release cycle. The best we can
do for significance is insist that, over time, differences in version
numbers roughly reflect differences in release content. An increment
of the first numeral of a version number should indicate a greater
amount of change than increments of later numerals.
- Security is important, but the security level can't be encoded in
one of the earlier numerals of a version number since it evolves at
a rate that's unpredictable relative to all the other axes and would
therefore violate the monotonicity constraint.
(JEP 223 [6] solved this problem by using the third numeral of a
version number to record the security level of a release within a
particular major-release family, resetting that number only at the
next major release. That scheme was, however, designed under the
assumption of multi-year major releases, each of which could have
several simultaneous update-release lines. If security fixes are
routinely delivered in one stream of update releases per feature
release, as envisioned in this new model [7], then there's less
reason to encode the security level in the version number.)
- The support lifetime of a release is useful information, but it's not
appropriate to encode that into the version number of the Java SE
Platform or the JDK. The version number should be identical in all
implementations of a given release, but the support lifetime of a
release may vary from implementor to implementor. Oracle might
choose, e.g., to offer support to its customers for twenty years on
releases three years apart, but another implementor might offer
support for ten years on releases two years apart.
- Identity is important, especially for use in bug reports, but it need
not be encoded in the version number itself. It's reasonable to ask
that bug reports include the full version string, so it suffices to
include a build identifier or other implementation information after
the version number itself (e.g., 9+181, the full version string of
JDK 9 GA).
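To illustrate the identity point, the following sketch uses the `Runtime.Version` API from JEP 223 (JDK 9 or later) to separate the version number from the build number in a full version string. The wrapper class `FullVersionString` and its helper methods are hypothetical names for this example.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical wrapper around Runtime.Version, for illustration only.
final class FullVersionString {

    // The numerals of the version number, e.g. the "9" in "9+181".
    static List<Integer> numerals(String versionString) {
        return Runtime.Version.parse(versionString).version();
    }

    // The build number that follows the '+', if there is one.
    static Optional<Integer> build(String versionString) {
        return Runtime.Version.parse(versionString).build();
    }

    public static void main(String[] args) {
        String ga = "9+181";  // the full version string of JDK 9 GA
        System.out.println("numerals: " + numerals(ga));
        System.out.println("build:    " + build(ga));
        // The full version string of the running JDK, suitable for a
        // bug report:
        System.out.println("running:  " + Runtime.version());
    }
}
```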
These considerations leave us with the final axis, time, as the leading
candidate for the primary basis of Java SE and JDK version numbers.
This would be a departure from past releases, in which we've used version
numbers that roughly encode both compatibility and significance. It is,
however, a better fit for a strict, time-based release model since the
version number of any particular release is known well in advance. The
compatibility level of a release would still be indicated by the length
of its version number, since we'll continue our long-standing practice
of making obviously-incompatible changes only in feature releases. The
security level of an update release would, similarly, be reflected in its
version number as a whole, since a later release will always contain more
security fixes than its predecessor.
The main remaining question, then, is that of how to encode time in
version numbers: As an absolute value, derived from the date of the
release, or as a relative value, measuring the amount of time since
the previous release of the same type?
In the abstract, absolute times have three attractive properties:
- Absolute times reflect release dates, so they make it clear to all
involved -- both developers of the JDK and users of the JDK -- that
these are time-based releases. There can be no question of delaying
a release in order to add "just one more feature" to it.
- Absolute times make it easy to figure out how old a release is, so
that as a user you can understand how far behind you are. Relative
times require you to know what the time units are, and when these
time-based version numbers were adopted.
- Absolute times are independent of the release cadence. If in a few
years we switch to an even faster cadence, say every three months,
then an absolute scheme would need no change but a relative scheme
would need to be revised with a new time unit and starting point.
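The second property can be made concrete with a small sketch: given an absolute version number of the form $YY.$M, the age of a release follows from the number alone. The class `ReleaseAge` and its methods are hypothetical, and the code assumes that a two-digit year means a year in the 2000s.

```java
import java.time.YearMonth;
import java.time.temporal.ChronoUnit;

// Hypothetical helper, for illustration only.
final class ReleaseAge {

    // Decode "$YY.$M" (e.g. "18.3") into a year and month.
    // Assumption: $YY refers to the years 2000 through 2099.
    static YearMonth releaseMonth(String version) {
        String[] parts = version.split("\\.");
        int year = 2000 + Integer.parseInt(parts[0]);
        int month = Integer.parseInt(parts[1]);
        return YearMonth.of(year, month);
    }

    // Whole months between the release and the given month.
    static long monthsBehind(String version, YearMonth now) {
        return ChronoUnit.MONTHS.between(releaseMonth(version), now);
    }
}
```

A user running 18.3 in September 2019 would, for example, call `ReleaseAge.monthsBehind("18.3", YearMonth.of(2019, 9))` and find the release to be 18 months old. A relative number such as 10 allows no such calculation without also knowing the time unit and the starting point.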
Now, at last, for some concrete alternatives:
(A) Absolute: $YY$MM, padding the month number with a zero as needed,
and $YY$MM.$AGE for update releases, where $AGE is the number of
months since $YY$MM.
(B) Absolute: $YY.$M as proposed, without padding the month number,
and $YY.$M.$AGE for update releases, where $AGE is as above.
(C) Relative: $N, where $N is the number of half-years since JDK 9 GA
(September 2017) plus nine, and $N.$AGE for update releases, where
$AGE is as above. ($AGE is more useful than another incrementing
counter since it leaves room for emergency update releases without
having to renumber subsequent update releases that are already in
development.)
Examples of these alternatives, for the next two feature releases and
their first two update releases:
                               (A)       (B)       (C)
    GA (March 2018)            1803      18.3      10
    First update (April)       1803.1    18.3.1    10.1
    Second update (July)       1803.4    18.3.4    10.4
    GA (September 2018)        1809      18.9      11
    First update (October)     1809.1    18.9.1    11.1
    Second update (January)    1809.4    18.9.4    11.4
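The rules behind these examples can be sketched as follows. The class `VersionSchemes` and its methods are hypothetical names for this illustration. The calculation for (C) assumes feature releases ship only in March and September, from September 2017 onward.

```java
import java.util.Locale;

// Hypothetical helper, for illustration only.
final class VersionSchemes {

    // (A) $YY$MM, with the month zero-padded.
    static String schemeA(int year, int month) {
        return String.format(Locale.ROOT, "%02d%02d", year % 100, month);
    }

    // (B) $YY.$M, with no padding of the month.
    static String schemeB(int year, int month) {
        return (year % 100) + "." + month;
    }

    // (C) $N: the number of half-years since JDK 9 GA (September 2017),
    // plus nine. Assumes a release date no earlier than September 2017.
    static String schemeC(int year, int month) {
        int monthsSinceJdk9 = (year - 2017) * 12 + (month - 9);
        return Integer.toString(9 + monthsSinceJdk9 / 6);
    }

    // Update release: append $AGE, the number of months since the
    // feature release shipped.
    static String update(String feature, int ageInMonths) {
        return feature + "." + ageInMonths;
    }
}
```

For the July 2018 update of the March 2018 release, for example, `update(schemeA(2018, 3), 4)` gives 1803.4, matching the table.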
Some pros (+) and cons (-) of each alternative (some of these points are
subjective, but so it goes with this topic):
(A) $YY$MM, with $YY$MM.$AGE for updates
(+) Has the three advantages of absolute times.
(+) The `Runtime.Version` API introduced by JEP 223 can be adapted
fairly easily. Code that parses raw version strings would need
little or no change (as long as it already does so correctly!).
(-) On the surface 1803 is an enormous leap from 9, is likely to
cause confusion, and has connotations of being very old [8].
(B) $YY.$M, with $YY.$M.$AGE for updates
(+) Has the three advantages of absolute times.
(+) Similar to some other significant platforms, e.g., Ubuntu Linux,
and less shocking in appearance than (A).
(-) People unfamiliar with the scheme could conflate 18.3 and 18.9 as
being minor releases of JDK 18, which isn't the case. There is
some evidence of similar confusion around Ubuntu releases [9].
(-) The logical "major" version number is now a pair of numbers, year
and month. We could mitigate this in the `Runtime.Version` API
by encoding the year and month as $YY$MM in the existing major
number, and adding new methods that return the year and month.
Code that parses raw version strings will likely require change,
including code not just in the JDK itself but in existing tools
and CI systems [a].
(C) $N, with $N.$AGE for updates
(+) The most straightforward and least-surprising option, and
familiar from other rapidly-evolving projects such as Firefox
and Chrome.
(+) The `Runtime.Version` API can be adapted very easily, and code
that (correctly) parses raw version strings would need no change.
(-) Lacks the three advantages of absolute times.
(-) If we ever switch to an even faster cadence then we could
eventually have very large version numbers, as in (A). In the
limit we could wind up in a situation like that of CoreOS, whose
latest stable release is numbered 1520.6 [b].
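The mitigation mentioned under (B), encoding the year and month as $YY$MM in a single major number, might look roughly like the sketch below. This is not the actual `Runtime.Version` API. The class `YearMonthMajor` and its methods are hypothetical and show only the arithmetic.

```java
// Hypothetical helper, for illustration only; not part of
// Runtime.Version.
final class YearMonthMajor {

    // Pack a two-digit year and a month into one number, e.g.
    // (18, 3) -> 1803.
    static int encode(int yy, int month) {
        if (yy < 0 || yy > 99 || month < 1 || month > 12) {
            throw new IllegalArgumentException("bad year or month");
        }
        return yy * 100 + month;
    }

    // Recover the two-digit year from the packed number.
    static int year(int major) {
        return major / 100;
    }

    // Recover the month from the packed number.
    static int month(int major) {
        return major % 100;
    }
}
```

Packed this way, the major numbers for 18.3 and 18.9 are 1803 and 1809, which still compare in release order as plain integers.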
These are three plausible alternatives; there are countless others, but
I suspect that many if not most are minor variants of these three. To
mention just two examples:
- We could simplify our grandchildren's lives and represent the year
with four digits rather than two. That would, however, lead to even
longer version numbers.
- We could zero-pad the month number in (B) so as to be exactly like
Ubuntu ($YY.$MM) which might make it a bit more obvious that JDK
18.03 isn't an update release of JDK 18. This would only work,
though, so long as we never ship a feature release after September
in any particular year. (Ubuntu ships in April (04) and October
(10), so zero-padding really only helps them half the time.)
* * *
If you've read this far, my question to you now is not the question that
you might expect. Please don't say which version-number scheme you
prefer for Java SE and the JDK. Instead, please only communicate any
additional information that's relevant to the choice of such a scheme.
In particular:
- Are there additional pros and cons to the alternatives listed above?
- Are there additional alternatives worth considering, and if so what
are their pros and cons?
- Are there specific experiences with other projects or products that
can inform this choice?
In order to discourage this from devolving into another version-numbering
bike-shed discussion I'll give much greater weight to your first reply to
this message than to any other, so please think and write carefully
before you post. I'll also ignore replies-to-replies -- if you really
want to argue with someone else about one scheme vs. another then I won't
stop you, but I don't think that's a good use of most readers' time.
Finally, I'll heavily discount replies that quote more text from this
message than add new text of their own, so please quote just the text
that's actually needed to provide context for your reply.
In a week or so I'll summarize any new information received, and then
make a specific proposal.
- Mark
[1] https://mreinhold.org/blog/forward-faster
[2] http://mail.openjdk.java.net/pipermail/discuss/2017-September/004281.html
[3] http://bikeshed.org/
[4] https://wiki.haskell.org/Wadler's_Law
[5] http://semver.org
[6] http://openjdk.java.net/jeps/223
[7] https://mreinhold.org/blog/forward-faster#Proposal
[8] https://en.wikipedia.org/wiki/1803
[9] http://mail.openjdk.java.net/pipermail/discuss/2017-September/004429.html
[a] http://mail.openjdk.java.net/pipermail/discuss/2017-September/004352.html
[b] https://coreos.com/releases