A couple of questions on 8160954: (spec) Runtime.Version regex and $PRE/$OPT issues

Pavel Rappo pavel.rappo at oracle.com
Thu Mar 23 21:22:05 UTC 2017


[This is not an RFR]

I’ve been investigating JDK-8160954 [1] which mentions two separate problems
within the specification of the Runtime.Version class.

The first is about boundary matchers (^ and $) in the regular expression
that defines the version number ($VNUM):

   ^[1-9][0-9]*(((\.0)*\.[1-9][0-9]*)*)*$

The reporter claims there is a discrepancy between this regex and the actual
behaviour of the Version.parse method, namely that:

    Version.parse("1.1.1" + System.lineSeparator());            (*)

throws an exception even though it should not. I've repeated the exercise: the
call does throw, but I could not confirm that this contradicts the regex.

The behaviour of Version.parse is consistent with the $VNUM regex:

    System.out.println(
            ("1.1.1" + System.lineSeparator())
                    .matches("^[1-9][0-9]*(((\\.0)*\\.[1-9][0-9]*)*)*$")
    );

outputs:

    false

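My reading of the javadoc for java.util.regex.Pattern explains exactly why:
String.matches requires the entire input to match the pattern, and although $
(outside MULTILINE mode) may match just before a final line terminator, it does
not consume that terminator.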
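A small self-contained sketch of that point, contrasting Matcher.find() with
Matcher.matches() on the anchored $VNUM pattern. It uses "\n" rather than
System.lineSeparator() so the result does not depend on the platform; the class
name AnchorDemo is only for illustration.

```java
import java.util.regex.Pattern;

public class AnchorDemo {
    // The $VNUM pattern exactly as it currently appears in the Runtime.Version javadoc
    static final Pattern VNUM_ANCHORED =
            Pattern.compile("^[1-9][0-9]*(((\\.0)*\\.[1-9][0-9]*)*)*$");

    public static void main(String[] args) {
        // "\n" instead of System.lineSeparator() keeps the demo platform-independent
        String input = "1.1.1\n";

        // find() only needs some match; without MULTILINE, '$' may match
        // just before a final line terminator, so this succeeds
        boolean found = VNUM_ANCHORED.matcher(input).find();

        // matches() (which String.matches delegates to) must consume the
        // whole input; '$' does not consume the trailing '\n', so this fails
        boolean matched = VNUM_ANCHORED.matcher(input).matches();

        System.out.println("find:    " + found);   // true
        System.out.println("matches: " + matched); // false
    }
}
```

So the anchored regex "accepts" a trailing line terminator only under find()
semantics, which is not how a version string is matched.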

However, I would suggest we still tweak the $VNUM definition. The reason is
that the regular expression used in $VNUM is more elaborate than is actually
required to specify $VNUM. Maybe we can state it a bit more clearly:

     /**
      * A representation of a version string for an implementation of the
-     * Java SE Platform.  A version string contains a version number
+     * Java SE Platform.  A version string consists of a version number
      * optionally followed by pre-release and build information.
      *
      * <h2><a name="verNum">Version numbers</a></h2>
@@ -960,7 +960,7 @@
      * </p>
      *
      * <blockquote><pre>
-     *     ^[1-9][0-9]*(((\.0)*\.[1-9][0-9]*)*)*$
+     *     [1-9][0-9]*(((\.0)*\.[1-9][0-9]*)*)*
      * </pre></blockquote>
      *
      * <p> The sequence may be of arbitrary length but the first three

My reading is that none of the components of the version string ($VNUM, $PRE,
$BUILD and $OPT) is meant to span multiple lines. Thus, specifying boundary
matchers (^ and $) inside the pattern for a single component such as $VNUM looks
a bit odd. If there's any place in the spec that could possibly benefit from
these matchers, it is the definition of $VSTR:

    $VNUM(-$PRE)?(\+($BUILD)?(-$OPT)?)?

But even there it's pretty clear that the whole input is matched against this
regex.
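As a quick sanity check of the proposed diff: under whole-input matching, the
anchored and unanchored forms of $VNUM should accept exactly the same strings.
The sketch below compares them with Pattern.matches on a handful of samples; the
sample list and the class name AnchorsRedundant are my own, chosen only for
illustration.

```java
import java.util.List;
import java.util.regex.Pattern;

public class AnchorsRedundant {
    // Current javadoc form, with boundary matchers
    static final String ANCHORED   = "^[1-9][0-9]*(((\\.0)*\\.[1-9][0-9]*)*)*$";
    // Proposed form, without boundary matchers
    static final String UNANCHORED = "[1-9][0-9]*(((\\.0)*\\.[1-9][0-9]*)*)*";

    // Pattern.matches requires the whole input to match the pattern
    static boolean anchored(String s)   { return Pattern.matches(ANCHORED, s); }
    static boolean unanchored(String s) { return Pattern.matches(UNANCHORED, s); }

    public static void main(String[] args) {
        List<String> samples = List.of(
                "9", "1.1.1", "1.0.1", "10.0.0.1", // accepted by the $VNUM regex
                "1.0", "01.1", "1..1", "",          // rejected by the $VNUM regex
                "1.1.1\n");                         // trailing line terminator
        for (String s : samples) {
            System.out.println(s.replace("\n", "\\n")
                    + " -> anchored=" + anchored(s)
                    + ", unanchored=" + unanchored(s));
        }
    }
}
```

Every line prints the same value for both forms: when the whole input must
match, ^ always sits at the start of the input and $ can only succeed at its
end, so dropping them does not change the set of accepted strings.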

The second problem is about the optional build number ($BUILD) and the '+'
separator as defined in the format for the version string ($VSTR):

    $VNUM(-$PRE)?(\+($BUILD)?(-$OPT)?)?

The reporter claims that both of these calls

    Version.parse("1.1.1-ea+--abc");
    Version.parse("1.1.1-ea+");

throw IllegalArgumentException even though $VSTR permits them. Once again
I've repeated the exercise, and this time I *could* confirm the behaviour.

From the quasi-regex above, the '+' separator is expected to be allowed to
appear without a build number after it, and that makes perfect sense.
Otherwise a version string such as "1.1.1-pqr" would be ambiguous: is "pqr"
a $PRE or an $OPT component?
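To make this reading of $VSTR concrete, here is a literal transcription of it
into a java.util.regex pattern, with named groups so the parsed components are
visible. The patterns for $PRE, $BUILD and $OPT are not quoted in this message;
the ones below ([a-zA-Z0-9]+, 0|[1-9][0-9]*, [-a-zA-Z0-9.]+) are my assumption
of what the javadoc defines. VstrSketch is only an illustrative name, not the
actual implementation behind Version.parse.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VstrSketch {
    // ASSUMPTION: component patterns as recalled from the Runtime.Version
    // javadoc; they are not quoted in this message.
    static final String VNUM  = "[1-9][0-9]*(((\\.0)*\\.[1-9][0-9]*)*)*";
    static final String PRE   = "[a-zA-Z0-9]+";
    static final String BUILD = "0|[1-9][0-9]*";
    static final String OPT   = "[-a-zA-Z0-9.]+";

    // Literal transcription of $VNUM(-$PRE)?(\+($BUILD)?(-$OPT)?)?
    static final Pattern VSTR = Pattern.compile(
            "(?<vnum>" + VNUM + ")"
            + "(-(?<pre>" + PRE + "))?"
            + "(\\+(?<build>" + BUILD + ")?(-(?<opt>" + OPT + "))?)?");

    // Matches the whole input and reports which components were captured
    static String describe(String s) {
        Matcher m = VSTR.matcher(s);
        if (!m.matches()) {
            return s + " -> no match";
        }
        return s + " -> vnum=" + m.group("vnum") + ", pre=" + m.group("pre")
                + ", build=" + m.group("build") + ", opt=" + m.group("opt");
    }

    public static void main(String[] args) {
        // The reporter's two cases: both are permitted by $VSTR
        System.out.println(describe("1.1.1-ea+--abc"));
        System.out.println(describe("1.1.1-ea+"));
        // Without a '+', a trailing "-pqr" can only be $PRE ...
        System.out.println(describe("1.1.1-pqr"));
        // ... and after a '+', "-pqr" can only be $OPT
        System.out.println(describe("1.1.1+-pqr"));
    }
}
```

Under these assumptions both of the reporter's strings match ("1.1.1-ea+--abc"
yields pre=ea, no build, opt=-abc), "1.1.1-pqr" yields pre=pqr, and
"1.1.1+-pqr" yields opt=pqr: the '+' is what disambiguates the two.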

So I suspect this means the "regex" for $VSTR is correct and the
implementation of the parse method is not. However, if that is the case, I don't
particularly like how Version.toString behaves:

    /**
     * Returns a string representation of this version.
     *
     * @return  The version string
     */
    @Override
    public String toString() {
        StringBuilder sb
            = new StringBuilder(version.stream()
                .map(Object::toString)
                .collect(Collectors.joining(".")));

        pre.ifPresent(v -> sb.append("-").append(v));

        if (build.isPresent()) {
            sb.append("+").append(build.get());
            if (optional.isPresent())
                sb.append("-").append(optional.get());
        } else {
            if (optional.isPresent()) {
                sb.append(pre.isPresent() ? "-" : "+-");       // (**)
                sb.append(optional.get());
            }
        }

        return sb.toString();
    }

The line marked (**) means that if $PRE and $OPT are present but $BUILD is
absent, toString produces something like this:

    1.1.1-pqr-stu

which is not a valid version string according to $VSTR. In this case we
should either clearly state that toString's output cannot always be parsed back
as a version string (which would be super odd), or fix toString.
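One possible fix, sketched as a hypothetical standalone method: format and its
parameters are my stand-ins for the version, pre, build and optional fields, not
existing API. The idea is simply to emit the '+' whenever $BUILD or $OPT
follows, regardless of whether $PRE is present.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

public class ToStringSketch {
    // HYPOTHETICAL helper mirroring Version.toString; the parameters stand in
    // for the version, pre, build and optional fields.
    static String format(List<Integer> version, Optional<String> pre,
                         Optional<Integer> build, Optional<String> optional) {
        StringBuilder sb = new StringBuilder(version.stream()
                .map(Object::toString)
                .collect(Collectors.joining(".")));

        pre.ifPresent(v -> sb.append('-').append(v));

        // Emit '+' whenever $BUILD or $OPT follows, even if $PRE is present,
        // so that $OPT can never be read back as $PRE
        if (build.isPresent() || optional.isPresent()) {
            sb.append('+');
            build.ifPresent(b -> sb.append(b));
            optional.ifPresent(v -> sb.append('-').append(v));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Integer> v = List.of(1, 1, 1);
        // The (**) case: $PRE and $OPT present, $BUILD absent
        System.out.println(format(v, Optional.of("pqr"), Optional.empty(), Optional.of("stu"))); // 1.1.1-pqr+-stu
        // Other combinations come out as before
        System.out.println(format(v, Optional.empty(), Optional.empty(), Optional.of("stu")));   // 1.1.1+-stu
        System.out.println(format(v, Optional.of("ea"), Optional.of(42), Optional.of("stu")));   // 1.1.1-ea+42-stu
    }
}
```

Compared with the current toString, this changes the output only in the (**)
case, which becomes 1.1.1-pqr+-stu and so matches $VSTR; every other
combination yields the same string as before.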

I would appreciate any comments on this.

Thanks,
-Pavel

--------------------------------------------------------------------------------
[1] https://bugs.openjdk.java.net/browse/JDK-8160954



More information about the core-libs-dev mailing list