Regex Point Lambdafication Patch

Mon Mar 11 10:49:29 PDT 2013

I've tried this:

    private static class MatcherIterator implements Iterator<CharSequence> {
        private final Matcher curMatcher;
        private final CharSequence input;
        private int current = 0;

        MatcherIterator(CharSequence in, Matcher m) {
            input = in;
            curMatcher = m;
        }

        public CharSequence next() {
            CharSequence nextChunk = input.subSequence(current, curMatcher.start());
            current = curMatcher.end();
            return nextChunk;
        }

        public boolean hasNext() {
            return curMatcher.find();
        }
    }

    public Stream<CharSequence> splitAsStream(final CharSequence input) {
        return Streams.stream(Spliterators.spliteratorUnknownSize(
                new MatcherIterator(input, matcher(input)), Spliterator.ORDERED));
    }

But it seems to lead to an off-by-one error in my tests - and I'm not
sure why (maybe just inexperience with the Spliterators methods).

Any ideas?
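
A guess at the cause, offered as a sketch rather than a verified fix: the
iterator above stops as soon as find() fails, so the text after the last
match (the "c" in "a,b,c") is never returned, and hasNext() advances the
matcher every time it is called. The version below buffers one chunk so
that hasNext() is idempotent, and emits the trailing chunk exactly once.
SplitIteratorSketch is a made-up name, the code assumes the released Java 8
API (StreamSupport.stream in place of Streams.stream), and it does not try
to reproduce Pattern.split's removal of trailing empty strings.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical holder class for the sketch; not part of java.util.regex.
class SplitIteratorSketch {

    static final class MatcherIterator implements Iterator<CharSequence> {
        private final Matcher matcher;
        private final CharSequence input;
        private int current = 0;          // start index of the next chunk
        private CharSequence nextChunk;   // found by hasNext(), not yet returned
        private boolean done = false;     // true once the trailing chunk was produced

        MatcherIterator(CharSequence input, Matcher matcher) {
            this.input = input;
            this.matcher = matcher;
        }

        @Override
        public boolean hasNext() {
            if (nextChunk != null) return true;  // idempotent: no extra find()
            if (done) return false;
            if (matcher.find()) {
                nextChunk = input.subSequence(current, matcher.start());
                current = matcher.end();
            } else {
                // No more delimiters: emit whatever follows the last match.
                nextChunk = input.subSequence(current, input.length());
                done = true;
            }
            return true;
        }

        @Override
        public CharSequence next() {
            if (!hasNext()) throw new NoSuchElementException();
            CharSequence chunk = nextChunk;
            nextChunk = null;
            return chunk;
        }
    }

    // Assumption: takes the Pattern as a parameter instead of living inside Pattern.
    static Stream<CharSequence> splitAsStream(Pattern pattern, CharSequence input) {
        Iterator<CharSequence> it = new MatcherIterator(input, pattern.matcher(input));
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED), false);
    }
}
```

With this, splitting "a,b,c" on "," should give three chunks rather than
two. An input ending in the delimiter, such as "a,b,", would still produce
a trailing empty chunk, which Pattern.split drops.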

Thanks,

Ben

On Mon, Mar 11, 2013 at 4:22 PM, Brian Goetz <brian.goetz at oracle.com> wrote:
> Even simpler:
>
> Spliterator<String> s
>     = Spliterators.spliteratorUnknownSize(new CharSeqSpliterator(),
>                                           ORDERED);
>
> Just write the iterator and leave the spliterating to the library.
>
>
> On 3/11/2013 12:19 PM, Ben Evans wrote:
>>
>> On Sun, Mar 10, 2013 at 6:26 PM, Brian Goetz <brian.goetz at oracle.com> wrote:
>>>
>>>
>>> Because you're not overriding trySplit, this spliterator will not allow
>>> streams to be parallelized at all, even if the downstream operations
>>> could benefit from it, as in:
>>>
>>>    pattern.splitAsStream(bigString)
>>>           .parallel()
>>>           .map(expensiveTransform)
>>>           .forEach(...);
>>>
>>> Even though the above pipeline will be limited by the sequential regex
>>> splitting at its source, if the downstream operations are expensive
>>> enough, they could still benefit from parallelization.  But the
>>> spliterator, as written, won't permit that -- it is strictly sequential.
>>
>>
>> OK, makes sense.
>>
>>> You can fix this easily by creating an Iterator<String> and wrapping that
>>> with Spliterators.spliteratorUnknownSize(iterator, characteristics)
>>> instead of writing a sequential-only spliterator.
>>
>>
>> This part I'm not sure I fully understand.
>>
>> Did you mean an implementation something like this:
>>
>>      private static class CharSequenceSpliterator implements Spliterator<String> {
>>         private CharSequence input;
>>         private final HelperIterator it;
>>         private int current = 0;
>>
>>         CharSequenceSpliterator(CharSequence in, Matcher m) {
>>             input = in;
>>             it = new HelperIterator(m);
>>         }
>>
>>         private class HelperIterator implements Iterator<String> {
>>             private final Matcher curMatcher;
>>
>>             HelperIterator(Matcher m) {
>>                 curMatcher = m;
>>             }
>>
>>             public String next() {
>>                 String nextChunk =
>>                     input.subSequence(current, curMatcher.start()).toString();
>>                 current = curMatcher.end();
>>                 return nextChunk;
>>             }
>>
>>             public boolean hasNext() {
>>                 return curMatcher.find();
>>             }
>>         }
>>
>>         public boolean tryAdvance(Consumer<? super String> action) {
>>             if (it.hasNext()) {
>>                 action.accept(it.next());
>>                 // Match the behaviour of Pattern::split
>>                 if (current == input.length()) return false;
>>                 return true;
>>             }
>>
>>             action.accept(
>>                 input.subSequence(current, input.length()).toString());
>>             return false;
>>         }
>>
>>         public Spliterator<String> trySplit() {
>>             return Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED);
>>         }
>>
>>         public int characteristics() {
>>             return Spliterator.ORDERED;
>>         }
>>      }
>>
>> Note that as currently written, the HelperIterator in the above needs
>> to be able to see current in the enclosing class.
>>
>> Thanks,
>>
>> Ben
>>
>
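
To illustrate Brian's point about leaving the spliterating to the library,
here is a minimal sketch against the released Java 8 API. The names
ParallelWrapSketch, wrap, process and expensiveTransform are hypothetical
stand-ins. The spliterator returned by Spliterators.spliteratorUnknownSize
can split by copying batches of elements out of the iterator, so the
downstream map can run in parallel even though the source iterator itself
is consumed sequentially.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

// Hypothetical holder class for the sketch.
class ParallelWrapSketch {

    // Wrap any iterator; the library supplies trySplit by batching elements.
    static <T> Spliterator<T> wrap(Iterator<T> it) {
        return Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED);
    }

    // Stand-in for an expensive per-element operation.
    static String expensiveTransform(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    // Build a parallel stream over the wrapped iterator. Because the
    // spliterator reports ORDERED, the collected list keeps encounter order.
    static List<String> process(Iterator<String> chunks) {
        return StreamSupport.stream(wrap(chunks), true)
                .map(ParallelWrapSketch::expensiveTransform)
                .collect(Collectors.toList());
    }
}
```

By contrast, a hand-written spliterator whose trySplit returns null gives
the framework nothing to split, which is the strictly sequential case
described in the quoted message.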