[10] RFR 8134512 : provide Alpha-Numeric (logical) Comparator

Roger Riggs Roger.Riggs at Oracle.com
Mon Jul 24 14:42:56 UTC 2017


Hi Ivan,

Thanks for bringing this up again.

Some initial comments, not a full review.

Though the enhancement says that no consideration is given to sign 
characters they
may produce puzzling results for strings like "-2000" and "-2001" where 
the strings appear
to be signed numbers but the comparison will be for the "-" prefix and 
the rest unsigned.
Including the word 'unsigned' in the description might help reinforce 
the semantics.

Also, I don't see test cases for strings like:  "abc200123" and 
"abc20000123" which share
a prefix part of which is numeric but differ in the number of "leading" 
zeros after the prefix.

What about strings that include more than 1 numeric segment: 
abc2003def0001 and abc02003def001
in the leading zero case?

Though the test adds the @key randomness, it would be useful to use the 
test library
mechanisms to report the seed and be able to run the test with a seed, 
if case they fail.
(As might be provided by the test library jdk.test.lib.RandomFactory).


Comparator.java:
576: "the common prefix is skipped" is problematic when considering 
leading zeros.
    The common prefix may contain non-zero digits.
585: it is not clear whether the "different number of leading zeros" is 
applied regardless
    of the common prefix or only starting after the common prefix.

550, 586: for many readers, it is easier to read 'for example' than 
"e.g." or "i.e.".

562, 598:  editorial: "is to compare" -> "compares"

Comparators.java:

192: @param for param o is missing;  (the param name "o" usually refers 
to an object, not a string).
200:   Can skipLeadingZeros be coded to correctly work if cnt == 0; it 
would be more robust
      SkipLeading zeros works correctly only if pos is given the first 
numeric digit in the subsequence
      so the numStart1 and numStart2 must be first digit in each string.

compare():

Line 223: if strings typically have non-numeric prefixes, it might 
perform slightly better
to detect the end of the common prefix and then scan back to find the 
beginning of the numeric part.
(Avoiding checking every char for isDigit).

Line 224: If assigned for every digit, it will hold the last equal digit 
in the common prefix, not the first digit.

239, 240:  chars at o1(index) and o2(index) are already known to be 
digits, can it be avoided checking them twice
by starting at index+1?

$.02, Roger


On 7/19/2017 4:41 AM, Ivan Gerasimov wrote:
> Hello!
>
> It is a proposal to provide a String comparator, which will pay 
> attention to the numbers embedded into the strings (should they present).
>
> This proposal was initially discussed back in 2014 and seemed to bring 
> some interest from the community:
> http://mail.openjdk.java.net/pipermail/core-libs-dev/2014-December/030343.html 
>
>
> In the latest webrev two methods are added to the public API:
> j.u.Comparator.comparingNumerically() and
> j.u.Comparator.comparingNumericallyLeadingZerosAhead().
>
> The regression test is extended to exercise this new comparator.
>
> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8134512
> WEBREV: http://cr.openjdk.java.net/~igerasim/8134512/01/webrev/
>
> Comments, suggestions are very welcome!
>



More information about the core-libs-dev mailing list