Missing issue: Version string format
David M. Lloyd
david.lloyd at redhat.com
Mon Mar 21 13:49:22 UTC 2016
On 03/11/2016 04:13 PM, David M. Lloyd wrote:
> The current java.lang.module.ModuleDescriptor.Version class contains the
> comment:
>
> "Vaguely Debian-like version strings, for now.
> "This will, eventually, change."
>
> At some point the syntax and semantics of version designators has to be
> worked out and agreed upon. Ideally the scheme would be compatible with
> as many existing widely deployed schemes as possible in terms of allowed
> syntax, and as much as possible, collation order (at least within the
> context of other modules from the same versioning scheme).
Judging from the lack of response, I assume that nobody has done any
work on this, so I have a proposal.
*** PLEASE review this in detail and post responses and criticisms ASAP.
I am interpreting silence as agreement/approval! ***
In particular, the syntax and collation rules could use some discussion!
•Version Requirements•
• Versions must abide by a consistent, easily describable syntax.
• Versions must support as many widely-used versioning schemes as
  possible, in a manner which is as interoperable as possible.
• Versions must collate in a manner consistent with expectations in terms
  of existing systems, to the maximum extent possible.
•Version Syntax•
I propose that a version conform to the following EBNF syntax:
    alpha     = ? all Unicode letters (open for discussion) ?
    number    = ? all Unicode digits (open for discussion) ?
    separator = "-" | "+" | "_" | "." | ? alpha-to-number transition ?
              | ? number-to-alpha transition ?
    part      = number { number } | alpha { alpha }
    version   = part { separator part }
The special transitions mean that strings such as "8u12" will count as a
three-part version "8" (sep) "u" (sep) "12" and would collate as such.
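To make the transition rule concrete, here is a minimal tokenizer
sketch. It is an illustration only, not the proposed implementation:
the class name VersionTokenizerSketch, the use of an empty string to
represent a transition, and the choice of Character.isDigit and
Character.isLetter for "all Unicode digits/letters" (which is still
open for discussion) are all my assumptions. It also assumes parts are
non-empty.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any existing API.
final class VersionTokenizerSketch {

    /** Returns alternating parts and separators; "" marks a transition. */
    static List<String> tokenize(String version) {
        List<String> tokens = new ArrayList<>();
        final int n = version.length();
        int i = 0;
        while (true) {
            // A part is a maximal run of digits or a maximal run of letters.
            int start = i;
            i = skipRun(version, i, true);
            if (i == start) i = skipRun(version, i, false);
            if (i == start) {
                throw new IllegalArgumentException("Expected a part at index " + i);
            }
            tokens.add(version.substring(start, i));
            if (i == n) return tokens;

            int cp = version.codePointAt(i);
            if (cp == '-' || cp == '+' || cp == '_' || cp == '.') {
                tokens.add(String.valueOf((char) cp)); // explicit separator
                i++;
            } else if (Character.isLetterOrDigit(cp)) {
                tokens.add(""); // digit/letter transition, consumes no character
            } else {
                throw new IllegalArgumentException("Unexpected character at index " + i);
            }
        }
    }

    /** Advances past a run of digits (digits == true) or letters. */
    private static int skipRun(String s, int i, boolean digits) {
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            boolean matches = digits ? Character.isDigit(cp) : Character.isLetter(cp);
            if (!matches) break;
            i += Character.charCount(cp);
        }
        return i;
    }

    public static void main(String[] args) {
        // "8u12" yields three parts with two transitions between them.
        System.out.println(tokenize("8u12"));
    }
}
```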
•Unicode considerations•
All version components would be normalized in NFKC form, in order to
ensure consistent collation.
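As a small illustration of what NFKC normalization would do to a
version string, the sketch below uses the standard
java.text.Normalizer class. The wrapper class and method names are
hypothetical; where in the parsing pipeline normalization should occur
is not settled by this proposal.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of any existing API.
final class VersionNormalizationSketch {

    /** Normalizes a raw version string to NFKC before parsing or comparing. */
    static String normalize(String rawVersion) {
        return Normalizer.normalize(rawVersion, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // Fullwidth "1", fullwidth full stop, fullwidth "8".
        String fullwidth = "\uFF11\uFF0E\uFF18";
        // After NFKC the compatibility characters become plain ASCII.
        System.out.println(normalize(fullwidth).equals("1.8"));
    }
}
```

Without this step, two strings that look identical to a user could
tokenize and collate differently.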
•Collation•
Versions shall abide by the following collation rules.
Each part and separator in the version contributes to collation order.
Since a version consists of strictly alternating parts and separators,
a part is never compared against a separator, so no collation order
between the two is needed or defined.
Number parts shall sort before alpha parts.
The sort order for separators should be as follows:
• transitions sort highest (first)
• underscores "_" sort next
• pluses "+" sort next
• hyphens "-" sort next
• dots "." sort lowest (last)
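The rules above could be sketched roughly as follows. Several things
here are assumptions on my part and not settled by the text: that two
number parts compare numerically, that two alpha parts compare by
plain String ordering, and that "" stands for a transition. The sketch
deliberately exposes only the listed position of each separator (0 for
transitions through 4 for dots) and leaves open whether an earlier
position should compare as lesser or greater. All names are
hypothetical.

```java
import java.math.BigInteger;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of any existing API.
final class VersionCollationSketch {

    // Separators in the order listed above; "" stands for a transition.
    private static final List<String> LISTED_ORDER =
            Arrays.asList("", "_", "+", "-", ".");

    /** Position of a separator in the listed order (0 = transition). */
    static int listedPosition(String separator) {
        int index = LISTED_ORDER.indexOf(separator);
        if (index < 0) {
            throw new IllegalArgumentException("Not a separator: " + separator);
        }
        return index;
    }

    /** Compares two non-empty parts; number parts sort before alpha parts. */
    static int compareParts(String a, String b) {
        boolean aIsNumber = Character.isDigit(a.codePointAt(0));
        boolean bIsNumber = Character.isDigit(b.codePointAt(0));
        if (aIsNumber != bIsNumber) {
            return aIsNumber ? -1 : 1; // rule from the proposal
        }
        if (aIsNumber) {
            // Assumption: numeric comparison, so "9" sorts before "10".
            return new BigInteger(a).compareTo(new BigInteger(b));
        }
        // Assumption: plain code-unit ordering for alpha parts.
        return a.compareTo(b);
    }
}
```

A full version comparison would walk two token sequences in lockstep,
applying compareParts and the separator ranking alternately; how to
order versions of different lengths is not addressed here.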
•Compatibility•
OpenJDK and Oracle JDK versions follow a few different mildly complex
schemes but can be more simply characterized by a few examples which are
valid in different contexts:
• 1.3.0
• 1.3.1-beta
• 1.3.1_05-ea
• 1.8.0_66-b17
• 8u66
• 9-ea
All of these examples will parse and collate in a manner that seems
consistent with expectations.
OSGi versions are in the form: number "." number "." number [ "." ? any
string ? ]. Due to the arbitrary nature of the optional final
(qualifier) segment, there exists a set of OSGi versions which are not
strictly compatible with this scheme, and a set of OSGi versions which
are compatible but whose collation order might be affected by this scheme.
Maven versions are highly under-specified, but using the
org.sonatype.aether.util.version.GenericVersion class as a reference
indicates that Maven is employing a similar scheme, including empty
"transition" separators, with the exception that all separators appear
to be considered equal. This may cause certain projects to collate
differently, for example in the event that the separator was switched
from "-" to "." along a branch's development lifecycle. In addition,
certain strings such as "alpha", "beta" etc. are specially detected and
ordered. However, other than the "ga" or "final" string, these strings
already collate naturally, and it is a fairly common practice to rely on
natural collation regardless, which may mitigate interoperability issues.
Debian versions allow ":" and "~" characters, and also allow parts to be
empty, both of which are extensions that could be applied to this scheme
if desired, as long as collation rules could be worked out for them.
•Implementation•
Two implementation approaches seem obvious.
The first approach uses an internal linked list composed of alternating
segments of parts and separators. Parts and separators have collation
methods which consider the current part or separator, then fall back to
the next link (if any). This approach is simple and elegant, but it
has substantial memory overhead due to the number of objects required
(for example, the string "1.8.0_66-b17" requires six parts and five
separators for a total of 11 objects, which seems excessive).
The second approach simply stores the content as a string, and uses an
internal tokenizer to parse, validate, and collate. This approach may
be slightly more verbose in implementation but should be far more
memory-efficient, generally requiring one or two temporary object
allocations per parse/collate operation, and otherwise only requiring
the memory necessary to hold the String object of the version plus the
memory requirements of the Version object itself.
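A rough sketch of what the string-backed approach might look like is
below. It only validates and counts parts; collation would reuse the
same cursor-style scan over two strings. The class name, the partCount
method, and the use of Character.isDigit/isLetter are assumptions for
illustration, not the planned implementation.

```java
// Hypothetical sketch, not part of any existing API.
final class StringBackedVersionSketch {

    private final String text; // the only per-version state

    StringBackedVersionSketch(String text) {
        scan(text); // validate eagerly; throws on malformed input
        this.text = text;
    }

    /** Re-scans the string on demand; no token objects are retained. */
    int partCount() {
        return scan(text);
    }

    /** Walks the string once with an index cursor and returns the number of parts. */
    private static int scan(String s) {
        final int n = s.length();
        int i = 0;
        int parts = 0;
        while (true) {
            int start = i;
            i = skipRun(s, i, true);
            if (i == start) i = skipRun(s, i, false);
            if (i == start) {
                throw new IllegalArgumentException("Expected a part at index " + i);
            }
            parts++;
            if (i == n) return parts;

            int cp = s.codePointAt(i);
            if (cp == '-' || cp == '+' || cp == '_' || cp == '.') {
                i++; // explicit separator
            } else if (!Character.isLetterOrDigit(cp)) {
                throw new IllegalArgumentException("Unexpected character at index " + i);
            }
            // Otherwise this is a digit/letter transition; nothing to consume.
        }
    }

    /** Advances past a run of digits (digits == true) or letters. */
    private static int skipRun(String s, int i, boolean digits) {
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            boolean matches = digits ? Character.isDigit(cp) : Character.isLetter(cp);
            if (!matches) break;
            i += Character.charCount(cp);
        }
        return i;
    }

    @Override
    public String toString() {
        return text;
    }
}
```

For "1.8.0_66-b17" this reports six parts (and therefore five
separators) while holding only the one String.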
The existing Version class is designed more for simplicity than
efficiency, using Lists of boxed objects internally and so forth. While
this is adequate for prototyping, I think the latter String/tokenizer-based
design is a better long-term solution, and that is what I will
pursue unless there is a strong argument otherwise.
Looking forward to discussion,
--
- DML