RFR 8071477: Better Spliterator implementations for String.chars() and String.codePoints()

Fri Jan 23 18:22:10 UTC 2015

On 01/23/2015 09:00 AM, Paul Sandoz wrote:
> Hi,
>
> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8071477-String-spliterators/webrev/
>
> This patch implements better spliterators for String/Buffer/Builder.chars/codePoints than those provided by the default methods on CharSequence.
>
> The test java/lang/CharSequence/DefaultTest.java is removed as i now pass the spliterators through the grinder of the spliterator-and-traversing tests.
>
> Thanks,
> Paul.

I'm a little confused at following logic.

2997             // Mid-point is a high-surrogate
2998             // Or mid-point and the previous are low-surrogates
2999             if (Character.isHighSurrogate(array[mid]) ||
3000                 Character.isLowSurrogate(array[midOneLess = (mid - 1)]))
3001                 return new CodePointsSpliterator(array, lo, index = mid, cs);

Shouldn't it be something like

if (!Character.isLowSurrogate(array[mid]) ||
     !Character.isHighSurrogate(array[midOneLess = (mid -1)])) {
     return new CodePointsSpliterator(array, lo, index = mid, cs);
}

For example, in case both "mid" and "midOneLess" are normal non-surrogate
character, I would assume the trySplit() should return [lo, index=mid) as wekk?

or something like

...
if (Character.isLowSurrogate(array[mid]) ||
     Character.isHighSurrogate(array[midOneLess = (mid -1)])) {
     if (lo >= midOneLess)
         return null;
     return new CodePointsSpliterator(array, lo, index = midOneless, cs);
}
return new CodePointsSpliterator(array, lo, index = mid, cs);
...

means, we only return [lo, midOneless), if mid is in the middle of a surrogate
pair (midOneLess is hiSurr, mid is hoSurr)?

btw, is it worth having a "nextCodePoint()" to be shared by forEachRemaining
and tryAdvance()?

-sherman