[10] RFR 8134512 : provide Alpha-Numeric (logical) Comparator

Peter Levart peter.levart at gmail.com
Sat Jul 29 17:22:33 UTC 2017

Hi Ivan,

On 07/28/2017 06:25 PM, Ivan Gerasimov wrote:
> Hi Peter!
> Thank a lot for looking into this!
> On 7/28/17 7:32 AM, Peter Levart wrote:
>> Hi Ivan,
>> In the light of what Stuart Marks wrote, then what do you think about 
>> a parameterized comparator (would be sub-optimal, I know) where one 
>> would supply
>> 2 Comparator(s) which would be used to compare Ax and Nx 
>> sub-sequences respectively as described below...
> Yes.  Inspired by what Stuart suggested I made a draft of such a 
> comparator (see below).
> It works, but as you've said it's not that efficient (mostly due to 
> expensive substrings) and a bit harder to use in a simple case.
> Now I need to think about how to combine two approaches.

There is a bug in MyComparator. Comparing "1" with "2" gives 0 (equal). 
You must not return "equal" when both alpha prefixes are empty.

>> For Nx sub-sequences, one would need a comparator comparing the 
>> reversed sequence lexicographically,
> I'm not sure I understand why they need to be reversed.

Scrap that. I got confused.  :-o

>> but for Ax sub-sequences, one could choose from a plethora of 
>> case-(in)sensitive comparator(s) and collators already available on 
>> the platform.
> Yes. In the example below I used compareToIgnoreCase to compare alpha 
> subsequences.
> Arrays.sort(str,new 
> MyComparator(Comparator.comparing(String::toString,String::compareToIgnoreCase),Comparator.comparing(String::toString,String::compareTo)));

Substrings are by copy. There is another approach. Originally you 
created the NumericComparator to compare CharSequence(s) - not String(s) 
as you did with MyComparator. So why not keeping that?

While talking about types, you don't need to have a generic parameter:

     NumericComparator<T extends CharSequence> implements CharSequence<T>


     NumericComparator implements Comparator<CharSequence>

is suitable for comparing any subtype of CharSequence as use-sites 
should accept Comparator<? super String> or Comparator<? super 
CharSequence>. For example:

     Stream<T> sorted(Comparator<? super T> comparator);

You can use Comparator<CharSequence> to sort Stream<String> as well as 
Stream<StringBuilder> or Stream<CharSequence>...

I tried an approach where sub-sequences are CharSequence objects by 
delegation and use sub-comparators that don't convert CharSequence(s) to 
String(s). Here's what this looks like:


with supporting sub-comparator classes:


I created a JMH  benchmark comparing your 02/webrev 


with a modified variant of your MyComparator that works correctly 


and my (peter.AlphadecimalComparator). The later two are tested with 
case-sensitive (CS) and case-insensitive (CIS) variants for comparing 
alpha sub-sequences.

Here are the results:


With -prof gc, it can be seen that JIT is able to optimize-away 
allocation of sub-sequence CharSequence(s). When the hot code is JIT-ed, 
at least with provided helper sub-comparators, no allocation is taking 
place (not so with MyComparator which creates substrings):


The performance of AlphadecimalComparator is about 50% slower than your 
specialized NumericComparator, but it allows custom parametrization. 
Perhaps the decimal part of variations could be hidden behind a boolean 
or an enum parameter (there is a leading-zeroes-greater, 
leading-zeroes-less, but also a leading-zeroes-equal variant that makes 
sense). I tried to do that by making DecimalComparator an enum and 
restricting the decimalComparato parameter to that type, but maybe 
specializing code around a boolean flag might yield better performance 
and DecimalComparator as Comparator<CharSequence> doesn't make much 
sense outside the AlphadecimalComparator.

So what do you think? Also, what do you think about the name?

Regards, Peter

More information about the core-libs-dev mailing list