[10] RFR 8134512 : provide Alpha-Numeric (logical) Comparator
Roger Riggs
Roger.Riggs at Oracle.com
Mon Jul 24 14:42:56 UTC 2017
Hi Ivan,
Thanks for bringing this up again.
Some initial comments, not a full review.
Though the enhancement says that no consideration is given to sign
characters they
may produce puzzling results for strings like "-2000" and "-2001" where
the strings appear
to be signed numbers but the comparison will be for the "-" prefix and
the rest unsigned.
Including the word 'unsigned' in the description might help reinforce
the semantics.
Also, I don't see test cases for strings like: "abc200123" and
"abc20000123" which share
a prefix part of which is numeric but differ in the number of "leading"
zeros after the prefix.
What about strings that include more than 1 numeric segment:
abc2003def0001 and abc02003def001
in the leading zero case?
Though the test adds the @key randomness, it would be useful to use the
test library
mechanisms to report the seed and be able to run the test with a seed,
if case they fail.
(As might be provided by the test library jdk.test.lib.RandomFactory).
Comparator.java:
576: "the common prefix is skipped" is problematic when considering
leading zeros.
The common prefix may contain non-zero digits.
585: it is not clear whether the "different number of leading zeros" is
applied regardless
of the common prefix or only starting after the common prefix.
550, 586: for many readers, it is easier to read 'for example' than
"e.g." or "i.e.".
562, 598: editorial: "is to compare" -> "compares"
Comparators.java:
192: @param for param o is missing; (the param name "o" usually refers
to an object, not a string).
200: Can skipLeadingZeros be coded to correctly work if cnt == 0; it
would be more robust
SkipLeading zeros works correctly only if pos is given the first
numeric digit in the subsequence
so the numStart1 and numStart2 must be first digit in each string.
compare():
Line 223: if strings typically have non-numeric prefixes, it might
perform slightly better
to detect the end of the common prefix and then scan back to find the
beginning of the numeric part.
(Avoiding checking every char for isDigit).
Line 224: If assigned for every digit, it will hold the last equal digit
in the common prefix, not the first digit.
239, 240: chars at o1(index) and o2(index) are already known to be
digits, can it be avoided checking them twice
by starting at index+1?
$.02, Roger
On 7/19/2017 4:41 AM, Ivan Gerasimov wrote:
> Hello!
>
> It is a proposal to provide a String comparator, which will pay
> attention to the numbers embedded into the strings (should they present).
>
> This proposal was initially discussed back in 2014 and seemed to bring
> some interest from the community:
> http://mail.openjdk.java.net/pipermail/core-libs-dev/2014-December/030343.html
>
>
> In the latest webrev two methods are added to the public API:
> j.u.Comparator.comparingNumerically() and
> j.u.Comparator.comparingNumericallyLeadingZerosAhead().
>
> The regression test is extended to exercise this new comparator.
>
> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8134512
> WEBREV: http://cr.openjdk.java.net/~igerasim/8134512/01/webrev/
>
> Comments, suggestions are very welcome!
>
More information about the core-libs-dev
mailing list