RFR 8134512 : provide Alpha-Numeric (logical) Comparator
peter.levart at gmail.com
Sat Jul 29 17:22:33 UTC 2017
On 07/28/2017 06:25 PM, Ivan Gerasimov wrote:
> Hi Peter!
> Thank a lot for looking into this!
> On 7/28/17 7:32 AM, Peter Levart wrote:
>> Hi Ivan,
>> In the light of what Stuart Marks wrote, then what do you think about
>> a parameterized comparator (would be sub-optimal, I know) where one
>> would supply
>> 2 Comparator(s) which would be used to compare Ax and Nx
>> sub-sequences respectively as described below...
> Yes. Inspired by what Stuart suggested I made a draft of such a
> comparator (see below).
> It works, but as you've said it's not that efficient (mostly due to
> expensive substrings) and a bit harder to use in a simple case.
> Now I need to think about how to combine two approaches.
There is a bug in MyComparator. Comparing "1" with "2" gives 0 (equal).
You must not return "equal" when both alpha prefixes are empty.
>> For Nx sub-sequences, one would need a comparator comparing the
>> reversed sequence lexicographically,
> I'm not sure I understand why they need to be reversed.
Scrap that. I got confused. :-o
>> but for Ax sub-sequences, one could choose from a plethora of
>> case-(in)sensitive comparator(s) and collators already available on
>> the platform.
> Yes. In the example below I used compareToIgnoreCase to compare alpha
Substrings are by copy. There is another approach. Originally you
created the NumericComparator to compare CharSequence(s) - not String(s)
as you did with MyComparator. So why not keeping that?
While talking about types, you don't need to have a generic parameter:
NumericComparator<T extends CharSequence> implements CharSequence<T>
NumericComparator implements Comparator<CharSequence>
is suitable for comparing any subtype of CharSequence as use-sites
should accept Comparator<? super String> or Comparator<? super
CharSequence>. For example:
Stream<T> sorted(Comparator<? super T> comparator);
You can use Comparator<CharSequence> to sort Stream<String> as well as
Stream<StringBuilder> or Stream<CharSequence>...
I tried an approach where sub-sequences are CharSequence objects by
delegation and use sub-comparators that don't convert CharSequence(s) to
String(s). Here's what this looks like:
with supporting sub-comparator classes:
I created a JMH benchmark comparing your 02/webrev
with a modified variant of your MyComparator that works correctly
and my (peter.AlphadecimalComparator). The later two are tested with
case-sensitive (CS) and case-insensitive (CIS) variants for comparing
Here are the results:
With -prof gc, it can be seen that JIT is able to optimize-away
allocation of sub-sequence CharSequence(s). When the hot code is JIT-ed,
at least with provided helper sub-comparators, no allocation is taking
place (not so with MyComparator which creates substrings):
The performance of AlphadecimalComparator is about 50% slower than your
specialized NumericComparator, but it allows custom parametrization.
Perhaps the decimal part of variations could be hidden behind a boolean
or an enum parameter (there is a leading-zeroes-greater,
leading-zeroes-less, but also a leading-zeroes-equal variant that makes
sense). I tried to do that by making DecimalComparator an enum and
restricting the decimalComparato parameter to that type, but maybe
specializing code around a boolean flag might yield better performance
and DecimalComparator as Comparator<CharSequence> doesn't make much
sense outside the AlphadecimalComparator.
So what do you think? Also, what do you think about the name?
More information about the core-libs-dev