RFR 8071477: Better Spliterator implementations for String.chars() and String.codePoints()

Fri Jan 23 20:26:49 UTC 2015

The webrev looks good.

However I have a question regarding the spec of CharsSequence.chars/codePoints(). It says
      *
      * <p>If the sequence is mutated while the stream is being read, the
      * result is undefined.

But it looks like ABS.chars/codePoints() "predefine/fix" the size of the resulting stream when
the stream is being created/initialized. The possible corner case is that the "size/length" of
ABS is changed between the stream is created  and read?

I would assume the CharSequence default implementation will go after the real size, even
a "length()" is passed in as an estimated size during creation, but the optimized implementation
will only return the length/size of char/cp when the stream is created. A behavior change?

-Sherman

On 01/23/2015 11:19 AM, Paul Sandoz wrote:
> On Jan 23, 2015, at 6:22 PM, Xueming Shen<xueming.shen at oracle.com>  wrote:
>
>> On 01/23/2015 09:00 AM, Paul Sandoz wrote:
>>> Hi,
>>>
>>> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8071477-String-spliterators/webrev/
>>>
>>> This patch implements better spliterators for String/Buffer/Builder.chars/codePoints than those provided by the default methods on CharSequence.
>>>
>>> The test java/lang/CharSequence/DefaultTest.java is removed as i now pass the spliterators through the grinder of the spliterator-and-traversing tests.
>>>
>>> Thanks,
>>> Paul.
>> I'm a little confused at following logic.
>>
>> 2997             // Mid-point is a high-surrogate
>> 2998             // Or mid-point and the previous are low-surrogates
>> 2999             if (Character.isHighSurrogate(array[mid]) ||
>> 3000                 Character.isLowSurrogate(array[midOneLess = (mid - 1)]))
>> 3001                 return new CodePointsSpliterator(array, lo, index = mid, cs);
>>
>>
>> Shouldn't it be something like
>>
>> if (!Character.isLowSurrogate(array[mid]) ||
>>     !Character.isHighSurrogate(array[midOneLess = (mid -1)])) {
>>     return new CodePointsSpliterator(array, lo, index = mid, cs);
>> }
>>
>> For example, in case both "mid" and "midOneLess" are normal non-surrogate
>> character, I would assume the trySplit() should return [lo, index=mid) as wekk?
>>
>> or something like
>>
>> ...
>> if (Character.isLowSurrogate(array[mid]) ||
> &&
>
>>     Character.isHighSurrogate(array[midOneLess = (mid -1)])) {
>>     if (lo>= midOneLess)
>>         return null;
>>     return new CodePointsSpliterator(array, lo, index = midOneless, cs);
>> }
>> return new CodePointsSpliterator(array, lo, index = mid, cs);
>> ...
>>
>> means, we only return [lo, midOneless), if mid is in the middle of a surrogate
>> pair (midOneLess is hiSurr, mid is hoSurr)?
>>
> Doh! thanks i dunno what i was thinking (isLowSurrogate was the negation of isHighSurrogate perhaps...)
>
>
>> btw, is it worth having a "nextCodePoint()" to be shared by forEachRemaining
>> and tryAdvance()?
>>
> Nice suggestion. I created a static method "advance".
>
> I also noticed that test/TEST.groups needs updating to remove a reference to the removed char sequence test.
>
> Webrev updated in place.
>
> Thanks,
> Paul.
>