Version-string schemes for the Java SE Platform and the JDK

Thu Oct 19 15:08:36 UTC 2017

(This is a long note on a complex topic that's inherently difficult to
 discuss.  If you wish to reply, please first read all the way through
 to the end.)

In my proposal to adopt a strict six-month release cadence I suggested
that, going forward, the version strings of feature releases be of the
form $YEAR.$MONTH [1][2].  Thus next year's March release would be 18.3,
the September release would be 18.9, and so on each year.

Not everyone likes this proposal, which isn't surprising -- discussions
of version-string schemes, much like those of language syntax, tend to
degenerate into bike-sheds [3][4].  That's due, in part, to the use of
version strings -- across the software industry and for several decades
now -- to encode multiple not-quite-orthogonal axes of information, which
can answer different but often related questions:

  - Compatibility -- "Will my code break if I upgrade to this release?"

  - Significance -- "How different is this release from what I have now?"

  - Security -- "Does this release contain new security fixes?"

  - Support -- "For how long will this release be supported?"

  - Identity -- "Against exactly which build was this bug reported?"

  - Time -- "When did this release ship?  How far behind am I?"

Convention dictates that the principal part of a version string, i.e.,
the version number, be a sequence of numerals separated by period
characters.  (Let's ignore, for now, optional information such as
pre-release status and build numbers.)  Convention also dictates that
version numbers be pointwise totally ordered, that they increase
monotonically over time, and that the version number of a feature
release be a prefix of the version numbers of its update releases.
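
These conventions can be sketched in a few lines of Java.  This is only
an illustration: the class `VersionNumbers` and its methods are
hypothetical names, not part of the JDK, and the parser assumes a string
that contains nothing but numerals and periods.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not a JDK class: illustrates the ordering conventions. */
final class VersionNumbers {

    /** Parses "18.3.1" into [18, 3, 1]; assumes only numerals and periods. */
    static List<Integer> parse(String s) {
        List<Integer> numerals = new ArrayList<>();
        for (String part : s.split("\\.")) {
            numerals.add(Integer.parseInt(part));
        }
        return numerals;
    }

    /** Compares numeral by numeral; a proper prefix sorts before its extensions. */
    static int compare(List<Integer> a, List<Integer> b) {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a.get(i), b.get(i));
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.size(), b.size());
    }

    /** True if the feature release's numerals are a prefix of the update's. */
    static boolean isPrefix(List<Integer> feature, List<Integer> update) {
        return feature.size() <= update.size()
            && feature.equals(update.subList(0, feature.size()));
    }

    public static void main(String[] args) {
        // Numerals compare as numbers, not as text: 9 < 10 in the second position.
        System.out.println(compare(parse("9.9.1"), parse("9.10.0")) < 0);  // true
        System.out.println(isPrefix(parse("18.3"), parse("18.3.1")));      // true
        System.out.println(isPrefix(parse("18.3"), parse("18.9.1")));      // false
    }
}
```

Comparing numerals as integers rather than as text is what keeps 9.9.1
ahead of 9.10.0 in the ordering.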

Given these conventions and a strict, time-based release cadence, which
of the above axes are both important and appropriate to encode into
version numbers?  Which are practical to encode into version numbers?
Which should have more weight, i.e., be encoded in the earlier numerals
of version numbers, and which should have less weight, i.e., be encoded
in the later numerals?

Some considerations for each axis, in turn:

  - Compatibility is obviously important -- it's one of the core values
    of the Java Platform, after all -- but it's problematic in at least
    two respects and hence not a sound basis for version numbers.

    First: Compatibility is, itself, multi-dimensional and therefore
    difficult to encode into a simple sequence of numerals.  What counts
    as an incompatible change?

    Some cases are obvious, e.g., a language change after which some old
    source files no longer compile, a JVM change after which some old
    class files are no longer valid, or an API change that removes an
    existing module, package, type, or element thereof.

    Many cases are, however, less than obvious, e.g., a language change
    after which some previously-rejected source files do compile, a bug
    fix that changes the element order of an array returned by an API,
    an enhancement that allows a command-line option to accept some
    previously-rejected arguments, or an optimization that removes an
    internal API.

    It might be practical to encode compatibility information into a two-
    or three-numeral version number for something as simple as a single
    library whose only interface with the outside world is its API [5].
    It's far from clear how to do that, though, in a way that's easy for
    everyone to understand for something as complex as the Java Platform
    itself, and implementations thereof.

    Second: The compatibility of a particular release with any of its
    predecessors depends upon the set of features in that release.  In a
    time-based release model, however, the set of features is not known
    until late in each release cycle, after the final feature is merged.
    This complicates discussions of any specific release and the tracking
    of changes in JIRA and related systems.  If, e.g., we use the leading
    numerals of version numbers to encode compatibility in the usual way,
    with the first numeral increasing only when incompatible changes are
    made, then would the March 2018 release be version 9.1, or 10?  We
    can't know until some time in December 2017, when the release closes
    for stabilization.

    We could address this problem by establishing secondary, time-based
    labels for releases, but that would be awkward and could lead to even
    more confusion.

  - Significance is even harder to measure than compatibility, and like
    compatibility it depends upon the set of features in a release and
    hence can't be known until late in a release cycle.  The best we can
    do for significance is insist that, over time, differences in version
    numbers roughly reflect differences in release content.  An increment
    of the first numeral of a version number should indicate a greater
    amount of change than increments of later numerals.

  - Security is important, but the security level can't be encoded in
    one of the earlier numerals of a version number since it evolves at
    a rate that's unpredictable relative to all the other axes and would
    therefore violate the monotonicity constraint.

    (JEP 223 [6] solved this problem by using the third numeral of a
     version number to record the security level of a release within a
     particular major-release family, resetting that number only at the
     next major release.  That scheme was, however, designed under the
     assumption of multi-year major releases, each of which could have
     several simultaneous update-release lines.  If security fixes are
     routinely delivered in one stream of update releases per feature
     release, as envisioned in this new model [7], then there's less
     reason to encode the security level in the version number.)

  - The support lifetime of a release is useful information, but it's not
    appropriate to encode that into the version number of the Java SE
    Platform or the JDK.  The version number should be identical in all
    implementations of a given release, but the support lifetime of a
    release may vary from implementor to implementor.  Oracle might
    choose, e.g., to offer support to its customers for twenty years on
    releases three years apart, but another implementor might offer
    support for ten years on releases two years apart.

  - Identity is important, especially for use in bug reports, but it need
    not be encoded in the version number itself.  It's reasonable to ask
    that bug reports include the full version string, so it suffices to
    include a build identifier or other implementation information after
    the version number itself (e.g., 9+181, the full version string of
    JDK 9 GA).
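
To illustrate the identity point, here is a minimal sketch that separates
the version number from the build identifier in a full version string,
using the existing `Runtime.Version` API (JDK 9 or later).  The class name
`FullVersionString` and the output format are my own, chosen only for
illustration.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical helper: splits a full version string into its parts. */
final class FullVersionString {

    /** Describes the version-number and build parts of a full version string. */
    static String describe(String versionString) {
        Runtime.Version v = Runtime.Version.parse(versionString);
        List<Integer> numerals = v.version();   // the version number proper
        Optional<Integer> build = v.build();    // the build number, if present
        return "number=" + numerals
            + " build=" + build.map(String::valueOf).orElse("none");
    }

    public static void main(String[] args) {
        System.out.println(describe("9+181"));  // number=[9] build=181
        System.out.println(describe("9"));      // number=[9] build=none
    }
}
```

A bug report that carries only "9" cannot be tied to a build; one that
carries "9+181" can.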

These considerations leave us with the final axis, time, as the leading
candidate for the primary basis of Java SE and JDK version numbers.

This would be a departure from past releases, in which we've used version
numbers that roughly encode both compatibility and significance.  It is,
however, a better fit for a strict, time-based release model since the
version number of any particular release is known well in advance.  The
compatibility level of a release would still be indicated by the length
of its version number, since we'll continue our long-standing practice
of making obviously-incompatible changes only in feature releases.  The
security level of an update release would, similarly, be reflected in its
version number as a whole, since a later release will always contain
every security fix that its predecessor contains.

The main remaining question, then, is that of how to encode time in
version numbers: As an absolute value, derived from the date of the
release, or as a relative value, measuring the amount of time since
the previous release of the same type?

In the abstract, absolute times have three attractive properties:

  - Absolute times reflect release dates, so they make it clear to all
    involved -- both developers of the JDK and users of the JDK -- that
    these are time-based releases.  There can be no question of delaying
    a release in order to add "just one more feature" to it.

  - Absolute times make it easy to figure out how old a release is, so
    that as a user you can understand how far behind you are.  Relative
    times require you to know what the time units are, and when these
    time-based version numbers were adopted.

  - Absolute times are independent of the release cadence.  If in a few
    years we switch to an even faster cadence, say every three months,
    then an absolute scheme would need no change but a relative scheme
    would need to be revised with a new time unit and starting point.

Now, at last, for some concrete alternatives:

  (A) Absolute: $YY$MM, padding the month number with a zero as needed,
      and $YY$MM.$AGE for update releases, where $AGE is the number of
      months since $YY$MM.

  (B) Absolute: $YY.$M as proposed, without padding the month number,
      and $YY.$M.$AGE for update releases, where $AGE is as above.

  (C) Relative: $N, where $N is the number of half-years since JDK 9 GA
      (September 2017) plus nine, and $N.$AGE for update releases, where
      $AGE is as above.  ($AGE is more useful than another incrementing
      counter since it leaves room for emergency update releases without
      having to renumber subsequent update releases that are already in
      development.)

Examples of these alternatives, for the next two feature releases and
their first two update releases:

                               (A)       (B)       (C)
    GA (March 2018)            1803      18.3      10
    First update (April)       1803.1    18.3.1    10.1
    Second update (July)       1803.4    18.3.4    10.4
    GA (September 2018)        1809      18.9      11
    First update (October)     1809.1    18.9.1    11.1
    Second update (January)    1809.4    18.9.4    11.4
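
The table can be reproduced mechanically.  The following is a sketch
under the assumptions stated in the alternatives above: $AGE is counted
in whole months from the feature release's GA, and (C) counts half-years
from JDK 9 GA in September 2017.  The class `VersionSchemes` and its
method names are hypothetical.

```java
import java.time.YearMonth;
import java.time.temporal.ChronoUnit;
import java.util.Locale;

/** Hypothetical helper: computes version numbers under alternatives (A), (B), (C). */
final class VersionSchemes {

    /** JDK 9 GA, the reference point for the relative scheme (C). */
    static final YearMonth JDK9_GA = YearMonth.of(2017, 9);

    /** Appends ".$AGE" for an update release; the GA release itself has no suffix. */
    private static String withAge(String feature, YearMonth ga, YearMonth release) {
        long age = ChronoUnit.MONTHS.between(ga, release);
        return age == 0 ? feature : feature + "." + age;
    }

    /** (A) $YY$MM, with the month zero-padded. */
    static String schemeA(YearMonth ga, YearMonth release) {
        String feature = String.format(Locale.ROOT, "%02d%02d",
                                       ga.getYear() % 100, ga.getMonthValue());
        return withAge(feature, ga, release);
    }

    /** (B) $YY.$M, with no padding. */
    static String schemeB(YearMonth ga, YearMonth release) {
        String feature = (ga.getYear() % 100) + "." + ga.getMonthValue();
        return withAge(feature, ga, release);
    }

    /** (C) $N, the number of half-years since JDK 9 GA, plus nine. */
    static String schemeC(YearMonth ga, YearMonth release) {
        long n = ChronoUnit.MONTHS.between(JDK9_GA, ga) / 6 + 9;
        return withAge(Long.toString(n), ga, release);
    }

    public static void main(String[] args) {
        YearMonth march = YearMonth.of(2018, 3);
        YearMonth september = YearMonth.of(2018, 9);
        YearMonth[][] rows = {
            { march, march }, { march, YearMonth.of(2018, 4) }, { march, YearMonth.of(2018, 7) },
            { september, september }, { september, YearMonth.of(2018, 10) },
            { september, YearMonth.of(2019, 1) },
        };
        for (YearMonth[] r : rows) {
            System.out.printf("%s  %-8s  %-8s  %s%n", r[1],
                              schemeA(r[0], r[1]), schemeB(r[0], r[1]), schemeC(r[0], r[1]));
        }
    }
}
```

Note that only (C) needs a fixed reference point and a fixed time unit;
(A) and (B) depend on nothing but the release date itself.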

Some pros (+) and cons (-) of each alternative (some of these points are
subjective, but so it goes with this topic):

  (A) $YY$MM, with $YY$MM.$AGE for updates

    (+) Has the three advantages of absolute times.

    (+) The `Runtime.Version` API introduced by JEP 223 can be adapted
        fairly easily.  Code that parses raw version strings would need
        little or no change (as long as it already does so correctly!).

    (-) On the surface 1803 is an enormous leap from 9, is likely to
        cause confusion, and has connotations of being very old [8].

  (B) $YY.$M, with $YY.$M.$AGE for updates

    (+) Has the three advantages of absolute times.

    (+) Similar to some other significant platforms, e.g., Ubuntu Linux,
        and less shocking in appearance than (A).

    (-) People unfamiliar with the scheme could mistake 18.3 and 18.9 for
        minor releases of JDK 18, which isn't the case.  There is
        some evidence of similar confusion around Ubuntu releases [9].

    (-) The logical "major" version number is now a pair of numbers, year
        and month.  We could mitigate this in the `Runtime.Version` API
        by encoding the year and month as $YY$MM in the existing major
        number, and adding new methods that return the year and month.
        Code that parses raw version strings will likely require change,
        including code not just in the JDK itself but in existing tools
        and CI systems [a].

  (C) $N, with $N.$AGE for updates

    (+) The most straightforward and least-surprising option, and
        familiar from other rapidly-evolving projects such as Firefox
        and Chrome.

    (+) The `Runtime.Version` API can be adapted very easily, and code
        that (correctly) parses raw version strings would need no change.

    (-) Lacks the three advantages of absolute times.

    (-) If we ever switch to an even faster cadence then we could
        eventually have very large version numbers, as in (A).  In the
        limit we could wind up in a situation like that of CoreOS, whose
        latest stable release is numbered 1520.6 [b].
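
The mitigation mentioned under (B), packing the year and month into the
existing major number as $YY$MM, might look something like the sketch
below.  Everything here is hypothetical: `YearMonthMajor` and its methods
are illustrative names, not proposed additions to `Runtime.Version`.

```java
/** Hypothetical helper: packs a two-digit year and a month into one major number. */
final class YearMonthMajor {

    /** Encodes, e.g., year 18 and month 3 as 1803. */
    static int encode(int yy, int month) {
        if (yy < 0 || yy > 99 || month < 1 || month > 12) {
            throw new IllegalArgumentException("yy=" + yy + ", month=" + month);
        }
        return yy * 100 + month;
    }

    /** Recovers the two-digit year from an encoded major number. */
    static int year(int major) {
        return major / 100;
    }

    /** Recovers the month from an encoded major number. */
    static int month(int major) {
        return major % 100;
    }

    public static void main(String[] args) {
        int major = encode(18, 3);
        System.out.println(major);         // 1803
        System.out.println(year(major));   // 18
        System.out.println(month(major));  // 3
    }
}
```

The encoded value preserves ordering, so existing code that compares
major numbers as integers would continue to sort releases correctly.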

These are three plausible alternatives; there are countless others, but
I suspect that many if not most are minor variants of these three.  To
mention just two examples:

  - We could simplify our grandchildren's lives and represent the year
    with four digits rather than two.  That would, however, lead to even
    longer version numbers.

  - We could zero-pad the month number in (B) so as to be exactly like
    Ubuntu ($YY.$MM) which might make it a bit more obvious that JDK
    18.03 isn't an update release of JDK 18.  This would only work,
    though, so long as we never ship a feature release after September
    in any particular year.  (Ubuntu ships in April (04) and October
    (10), so zero-padding really only helps them half the time.)

                                  * * *

If you've read this far, my question to you now is not the question that
you might expect.  Please don't say which version-number scheme you
prefer for Java SE and the JDK.  Instead, please only communicate any
additional information that's relevant to the choice of such a scheme.
In particular:

  - Are there additional pros and cons to the alternatives listed above?

  - Are there additional alternatives worth considering, and if so what
    are their pros and cons?

  - Are there specific experiences with other projects or products that
    can inform this choice?

In order to keep this from devolving into another version-numbering
bike-shed discussion I'll give much greater weight to your first reply to
this message than to any other, so please think and write carefully
before you post.  I'll also ignore replies-to-replies -- if you really
want to argue with someone else about one scheme vs. another then I won't
stop you, but I don't think that's a good use of most readers' time.
Finally, I'll heavily discount replies that quote more text from this
message than they add of their own, so please quote just the text
that's actually needed to provide context for your reply.

In a week or so I'll summarize any new information received, and then
make a specific proposal.

- Mark

[1] https://mreinhold.org/blog/forward-faster
[2] http://mail.openjdk.java.net/pipermail/discuss/2017-September/004281.html
[3] http://bikeshed.org/
[4] https://wiki.haskell.org/Wadler's_Law
[5] http://semver.org
[6] http://openjdk.java.net/jeps/223
[7] https://mreinhold.org/blog/forward-faster#Proposal
[8] https://en.wikipedia.org/wiki/1803
[9] http://mail.openjdk.java.net/pipermail/discuss/2017-September/004429.html
[a] http://mail.openjdk.java.net/pipermail/discuss/2017-September/004352.html
[b] https://coreos.com/releases