Missing issue: Version string format

Mon Mar 21 15:43:52 UTC 2016

On 03/21/2016 08:49 AM, David M. Lloyd wrote:
> On 03/11/2016 04:13 PM, David M. Lloyd wrote:
>> The current java.lang.module.ModuleDescriptor.Version class contains the
>> comment:
>>
>> "Vaguely Debian-like version strings, for now.
>> "This will, eventually, change."
>>
>> At some point the syntax and semantics of version designators has to be
>> worked out and agreed upon.  Ideally the scheme would be compatible with
>> as many existing widely deployed schemes as possible in terms of allowed
>> syntax, and as much as possible, collation order (at least within the
>> context of other modules from the same versioning scheme).
>
> Judging from the lack of response, I assume that nobody has done any
> work on this, so I have a proposal.

I just updated to the latest code and it looks like last week the scheme 
was updated.

I would reframe this discussion as a proposal for a few changes to the 
existing code.

• Addition of a few more separator characters: the new code supports 
"+", "-", and ".", and also the transition types.  It does not support 
"_" which I believe to be in fairly wide use.  I would propose that this 
character be added as a valid separator.
• The current code uses multiple "classes" of separators, whose meaning 
also depends on their location within the version string.  I suggest 
that, if possible, a version sequence be agnostic to scheme, or else 
that versions be made layer-specific (similar to the 
#ModuleNameCharacters issue); I think the scheme I outlined would 
support all of the currently supported schemes with little or no 
adjustment.
• The implementation is probably considerably heavier than it needs to 
be in terms of objects.  A zero-object representation is possible, as is 
a one- or zero-object tokenizer/validator.

Unless someone has an early strong disagreement, I will prepare a 
prototype to illustrate the changes.

>
> *** PLEASE review this in detail and post responses and criticisms ASAP.
>   I am interpreting silence as agreement/approval! ***
>
> In particular, the syntax and collation rules could use some discussion!
>
> •Version Requirements•
>
> Versions must abide by a consistent, easily describable syntax.
>
> Versions must support as many widely-used versioning schemes as
> possible, in a manner which is as interoperable as possible.
>
> Versions must collate in a manner consistent with expectations in terms
> of existing systems, to the maximum extent possible.
>
> •Version Syntax•
>
> I propose that a version conform to the following EBNF syntax:
>
>     alpha = ? all Unicode letters (open for discussion) ?
>     number = ? all Unicode digits (open for discussion) ?
>     separator = "-" | "+" | "_" | "."
>               | ? alpha-to-number transition ?
>               | ? number-to-alpha transition ?
>     part = number* | alpha*
>     version = part { separator part }
>
> The special transitions mean that strings such as "8u12" will count as a
> three-part version "8" (sep) "u" (sep) "12" and would collate as such.
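
The grammar above can be illustrated with a minimal tokenizer sketch in
Java.  This is not the prototype discussed in this thread; the class and
method names are hypothetical.  It assumes that parts must be non-empty,
that "alpha" and "number" mean Character.isLetter and Character.isDigit,
and that a transition is recorded as an empty string.  It works on chars,
so supplementary code points are not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: splits a version string into alternating parts and
// separators following the proposed EBNF. Not the actual JDK implementation.
final class VersionTokenizerSketch {

    /** Returns part, separator, part, ... where "" stands for a transition. */
    static List<String> tokenize(String version) {
        List<String> tokens = new ArrayList<>();
        final int n = version.length();
        int i = 0;
        while (true) {
            int start = i;
            // Assumption: a part is a non-empty run of digits or of letters.
            if (i < n && Character.isDigit(version.charAt(i))) {
                while (i < n && Character.isDigit(version.charAt(i))) i++;
            } else if (i < n && Character.isLetter(version.charAt(i))) {
                while (i < n && Character.isLetter(version.charAt(i))) i++;
            } else {
                throw new IllegalArgumentException("Expected a part at index " + i);
            }
            tokens.add(version.substring(start, i));
            if (i == n) {
                return tokens;
            }
            char c = version.charAt(i);
            if (c == '-' || c == '+' || c == '_' || c == '.') {
                tokens.add(String.valueOf(c)); // explicit separator
                i++;
            } else if (Character.isLetterOrDigit(c)) {
                tokens.add("");                // alpha/number transition
            } else {
                throw new IllegalArgumentException("Invalid character at index " + i);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenize("8u12"));
        System.out.println(tokenize("1.8.0_66-b17"));
    }
}
```

With these assumptions, "8u12" yields the three parts "8", "u", and "12"
with a transition between each pair, and "1.8.0_66-b17" yields eleven
tokens in total.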
>
> •Unicode considerations•
>
> All version components would be normalized in NFKC form, in order to
> ensure consistent collation.
>
> •Collation•
>
> Versions shall abide by the following collation rules.
>
> Each part and separator in the version contributes to collation order.
> Since a version is composed of strictly alternating parts and
> separators, a part is never compared against a separator, so no
> collation order between parts and separators is defined.
>
> Number parts shall sort before alpha parts.
>
> The sort order for separators should be as follows:
>    • transitions sort highest (first)
>    • underscores "_" sort next
>    • pluses "+" sort next
>    • hyphens "-" sort next
>    • dots "." sort lowest (last)
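
The separator ordering listed above could be captured as a simple rank
table.  The following Java sketch uses hypothetical names and makes one
loud assumption: it orders separators by their position in the list
(transition first, dot last).  Whether "first" should compare as lower
or higher is not settled in the proposal; calling reversed() on the
comparator flips it.  How two parts of the same kind compare is likewise
left out, since the proposal does not specify it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the proposed separator ranking; "" is a transition.
final class SeparatorOrderSketch {

    /** Position of the separator in the proposal's list (0 = listed first). */
    static int rank(String separator) {
        switch (separator) {
            case "":  return 0; // alpha/number transition
            case "_": return 1;
            case "+": return 2;
            case "-": return 3;
            case ".": return 4;
            default:
                throw new IllegalArgumentException("Not a separator: " + separator);
        }
    }

    /** Assumption: "sorts first" is modelled as "compares lower". */
    static final Comparator<String> LISTED_ORDER =
            Comparator.comparingInt(SeparatorOrderSketch::rank);

    /** "Number parts shall sort before alpha parts": 0 for number, 1 for alpha. */
    static int partKindRank(String part) {
        return Character.isDigit(part.charAt(0)) ? 0 : 1;
    }

    public static void main(String[] args) {
        List<String> separators = new ArrayList<>(List.of(".", "-", "", "+", "_"));
        separators.sort(LISTED_ORDER);
        System.out.println(separators);
    }
}
```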
>
> •Compatibility•
>
> OpenJDK and Oracle JDK versions follow a few different mildly complex
> schemes but can be more simply characterized by a few examples which are
> valid in different contexts:
>   • 1.3.0
>   • 1.3.1-beta
>   • 1.3.1_05-ea
>   • 1.8.0_66-b17
>   • 8u66
>   • 9-ea
>
> All of these examples will parse and collate in a manner that seems
> consistent with expectations.
>
> OSGi versions are in the form: number "." number "." number [ "." ? any
> string ? ].  Due to the arbitrary nature of the optional final
> (qualifier) segment, there exists a set of OSGi versions which are not
> strictly compatible with this scheme, and a set of OSGi versions which
> are compatible but whose collation order might be affected by this scheme.
>
> Maven versions are highly under-specified, but the
> org.sonatype.aether.util.version.GenericVersion class, taken as a
> reference, indicates that Maven employs a similar scheme, including
> empty "transition" separators, with the exception that all separators appear
> to be considered equal.  This may cause certain projects to collate
> differently, for example in the event that the separator was switched
> from "-" to "." along a branch's development lifecycle.  In addition,
> certain strings such as "alpha", "beta" etc. are specially detected and
> ordered.  However, other than the "ga" or "final" string, these strings
> already collate naturally, and it is a fairly common practice to rely on
> natural collation regardless, which may mitigate interoperability issues.
>
> Debian versions allow ":" and "~" characters, and also allow parts to be
> empty, both of which are extensions that could be applied to this scheme
> if desired, as long as collation rules could be worked out for them.
>
> •Implementation•
>
> Two implementation approaches seem obvious.
>
> The first approach uses an internal linked list comprised of alternating
> segments of parts and separators.  Parts and separators have collation
> methods which consider the current part or separator, then fall back to
> the next link (if any).  This approach is simple and elegant, but
> has substantial memory overhead due to the number of objects required
> (for example, the string "1.8.0_66-b17" requires six parts and five
> separators for a total of 11 objects, which seems excessive).
>
> The second approach simply stores the content as a string, and uses an
> internal tokenizer to parse, validate, and collate.  This approach may
> be slightly more verbose in implementation but should be far more
> memory-efficient, generally requiring one or two temporary object
> allocations per parse/collate operation, and otherwise only requiring
> the memory necessary to hold the String object of the version plus the
> memory requirements of the Version object itself.
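
As a rough illustration of the String-backed approach, the Java sketch
below keeps only the original string and validates it in a single scan
without allocating any intermediate objects.  It is not the existing
Version class; all names are hypothetical, parts are assumed to be
non-empty, and collation is omitted.

```java
// Hypothetical sketch of a String-backed version: the only state is the
// string itself, and validation is a single allocation-free scan.
final class StringBackedVersionSketch {
    private final String value;

    private StringBackedVersionSketch(String value) {
        this.value = value;
    }

    static StringBackedVersionSketch parse(String s) {
        if (!isValid(s)) {
            throw new IllegalArgumentException("Invalid version: " + s);
        }
        return new StringBackedVersionSketch(s);
    }

    /** True if s is letters/digits joined by single "-", "+", "_" or "." separators. */
    static boolean isValid(String s) {
        boolean needPart = true; // true at the start and right after a separator
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                needPart = false; // transitions need no special handling here
            } else if (!needPart && (c == '-' || c == '+' || c == '_' || c == '.')) {
                needPart = true;
            } else {
                return false;
            }
        }
        return !needPart;
    }

    /** Counts parts by rescanning the string; a digit/letter switch starts a new part. */
    int partCount() {
        int count = 0;
        int previousKind = 0; // 0 = start or separator, 1 = digit, 2 = letter
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            int kind = Character.isDigit(c) ? 1 : Character.isLetter(c) ? 2 : 0;
            if (kind != 0 && kind != previousKind) {
                count++;
            }
            previousKind = kind;
        }
        return count;
    }

    @Override
    public String toString() {
        return value;
    }

    public static void main(String[] args) {
        System.out.println(parse("1.8.0_66-b17").partCount());
        System.out.println(isValid("1..2"));
    }
}
```

For "1.8.0_66-b17" the scan counts six parts, matching the figure given
for the linked-list approach, while holding a single String.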
>
> The existing Version class is designed more for simplicity than
> efficiency, using Lists of boxed objects internally and so forth.  While
> this is adequate for prototyping, I think the latter String/tokenizer
> based design is a better long-term solution and that is what I will
> pursue unless there is a strong argument otherwise.
>
> Looking forward to discussion,

-- 
- DML