Regex Point Lambdafication Patch

Mon Mar 11 10:54:54 PDT 2013

I think the problem is that hasNext calls find() unconditionally, so
calling hasNext more than once per element advances the matcher past
matches and skips chunks.  You need to maintain a boolean "found", set
it when find() finds something, and clear it when next() consumes it.
This is a typical iterator pitfall.
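A minimal sketch of the "found" flag described above, written against the released Java 8 API. It keeps the name MatcherIterator from Ben's code quoted below. The noMoreMatches and tailReturned fields are assumptions added by this sketch. They let the iterator also return the text after the last match, which Ben's version never emits. With the flag in place, any number of hasNext() calls before next() trigger at most one find().

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.regex.Matcher;

// Sketch: iterates over the chunks between regex matches. hasNext() is idempotent
// because a successful find() is remembered in 'found' until next() consumes it.
final class MatcherIterator implements Iterator<CharSequence> {
    private final CharSequence input;
    private final Matcher matcher;
    private int current = 0;               // start index of the next chunk
    private boolean found = false;         // find() succeeded and next() has not consumed it yet
    private boolean noMoreMatches = false; // assumption: stop calling find() once it returns false
    private boolean tailReturned = false;  // assumption: also emit the chunk after the last match

    MatcherIterator(CharSequence input, Matcher matcher) {
        this.input = input;
        this.matcher = matcher;
    }

    @Override
    public boolean hasNext() {
        if (found) {
            return true;                   // pending match: do NOT call find() again
        }
        if (!noMoreMatches) {
            found = matcher.find();        // set the flag when find() finds something
            if (found) {
                return true;
            }
            noMoreMatches = true;
        }
        return !tailReturned;              // only the trailing chunk can remain
    }

    @Override
    public CharSequence next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        if (found) {
            found = false;                 // clear the flag: this match is now consumed
            CharSequence chunk = input.subSequence(current, matcher.start());
            current = matcher.end();
            return chunk;
        }
        tailReturned = true;
        return input.subSequence(current, input.length());
    }
}
```

Unlike Pattern.split, this sketch keeps trailing empty strings: "a,b," split on "," yields "a", "b" and "".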

On 3/11/2013 1:49 PM, Ben Evans wrote:
> I've tried this:
>
>     private static class MatcherIterator implements Iterator<CharSequence> {
>         private final Matcher curMatcher;
>         private final CharSequence input;
>         private int current = 0;
>
>         MatcherIterator(CharSequence in, Matcher m) {
>             input = in;
>             curMatcher = m;
>         }
>
>         public CharSequence next() {
>             CharSequence nextChunk = input.subSequence(current, curMatcher.start());
>             current = curMatcher.end();
>             return nextChunk;
>         }
>
>         public boolean hasNext() {
>             return curMatcher.find();
>         }
>     }
>
>     public Stream<CharSequence> splitAsStream(final CharSequence input) {
>         return Streams.stream(Spliterators.spliteratorUnknownSize(
>                 new MatcherIterator(input, matcher(input)), Spliterator.ORDERED));
>     }
>
> But it seems to lead to an off-by-one error in my tests, and I'm not
> sure why (maybe just inexperience with the Spliterators methods).
>
> Any ideas?
>
> Thanks,
>
> Ben
>
> On Mon, Mar 11, 2013 at 4:22 PM, Brian Goetz <brian.goetz at oracle.com> wrote:
>> Even simpler:
>>
>> Spliterator<String> s
>>      = Spliterators.spliteratorUnknownSize(new CharSeqIterator(),
>>                                            Spliterator.ORDERED);
>>
>> Just write the iterator and leave the spliterating to the library.
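A sketch of that division of labour, assuming the released Java 8 API: any Iterator is handed to Spliterators.spliteratorUnknownSize, and java.util.stream.StreamSupport.stream turns the result into a Stream (the thread's Streams.stream is a pre-release name that did not ship). IteratorStreams and splitOnCommas are illustrative names. java.util.Scanner, which is an Iterator<String> over regex-delimited tokens, stands in for the hand-written CharSeqIterator; its handling of empty tokens is not meant to match Pattern.split.

```java
import java.util.Iterator;
import java.util.Scanner;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

final class IteratorStreams {
    // Wrap any hand-written Iterator in a Stream. The library's spliterator implements
    // trySplit() by copying batches of elements out of the iterator, so parallel()
    // has work to hand out even though the source itself is read sequentially.
    static <T> Stream<T> stream(Iterator<T> iterator, boolean parallel) {
        Spliterator<T> s = Spliterators.spliteratorUnknownSize(iterator, Spliterator.ORDERED);
        return StreamSupport.stream(s, parallel);
    }

    // Example source (a stand-in for CharSeqIterator): Scanner splits on a regex delimiter.
    static Stream<String> splitOnCommas(String input, boolean parallel) {
        Scanner scanner = new Scanner(input).useDelimiter(",");
        return stream(scanner, parallel).onClose(scanner::close);
    }
}
```

Because the spliterator reports ORDERED, collect(Collectors.toList()) keeps encounter order even on a parallel stream. Released Java 8 also provides Pattern.splitAsStream, so a wrapper like this is only needed for custom iterators.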
>>
>>
>> On 3/11/2013 12:19 PM, Ben Evans wrote:
>>>
>>> On Sun, Mar 10, 2013 at 6:26 PM, Brian Goetz <brian.goetz at oracle.com> wrote:
>>>>
>>>>
>>>> Because you're not overriding trySplit, this spliterator will not allow
>>>> streams to be parallelized at all, even if the downstream operations
>>>> could benefit from it, as in:
>>>>
>>>>     pattern.splitAsStream(bigString)
>>>>            .parallel()
>>>>            .map(expensiveTransform)
>>>>            .forEach(...);
>>>>
>>>> Even though the above pipeline will be limited by the sequential regex
>>>> splitting at its source, if the downstream operations are expensive
>>>> enough, they could still benefit from parallelization.  But the
>>>> spliterator, as written, won't permit that; it is strictly sequential.
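To make the difference concrete, here is a sketch that compares a spliterator whose trySplit() returns null (the strictly sequential case described above) with the one produced by Spliterators.spliteratorUnknownSize. SequentialOnly is a hypothetical class written only for this comparison, and it assumes the released Java 8 Spliterator interface.

```java
import java.util.Iterator;
import java.util.Spliterator;
import java.util.function.Consumer;

// Hypothetical strictly sequential spliterator: trySplit() always declines,
// so a parallel stream built on it processes every element in a single task.
final class SequentialOnly<T> implements Spliterator<T> {
    private final Iterator<T> it;

    SequentialOnly(Iterator<T> it) {
        this.it = it;
    }

    @Override
    public boolean tryAdvance(Consumer<? super T> action) {
        if (!it.hasNext()) {
            return false;
        }
        action.accept(it.next());
        return true;
    }

    @Override
    public Spliterator<T> trySplit() {
        return null;               // never hands part of the input to another task
    }

    @Override
    public long estimateSize() {
        return Long.MAX_VALUE;     // size unknown
    }

    @Override
    public int characteristics() {
        return ORDERED;
    }
}
```

By contrast, trySplit() on the spliteratorUnknownSize wrapper returns a buffered batch copied from the iterator. The source is still read sequentially, but the downstream operations on each batch can run in parallel.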
>>>
>>>
>>> OK, makes sense.
>>>
>>>> You can fix this easily by creating an Iterator<String> and wrapping that
>>>> with Spliterators.spliteratorUnknownSize(iterator, characteristics)
>>>> instead of writing a sequential-only spliterator.
>>>
>>>
>>> This part I'm not sure I fully understand.
>>>
>>> Did you mean an implementation like this:
>>>
>>>     private static class CharSequenceSpliterator implements Spliterator<String> {
>>>         private CharSequence input;
>>>         private final HelperIterator it;
>>>         private int current = 0;
>>>
>>>         CharSequenceSpliterator(CharSequence in, Matcher m) {
>>>             input = in;
>>>             it = new HelperIterator(m);
>>>         }
>>>
>>>         private class HelperIterator implements Iterator<String> {
>>>             private final Matcher curMatcher;
>>>
>>>             HelperIterator(Matcher m) {
>>>                 curMatcher = m;
>>>             }
>>>
>>>             public String next() {
>>>                 String nextChunk = input.subSequence(current, curMatcher.start()).toString();
>>>                 current = curMatcher.end();
>>>                 return nextChunk;
>>>             }
>>>
>>>             public boolean hasNext() {
>>>                 return curMatcher.find();
>>>             }
>>>         }
>>>
>>>         public boolean tryAdvance(Consumer<? super String> action) {
>>>             if (it.hasNext()) {
>>>                 action.accept(it.next());
>>>                 // Match the behaviour of Pattern::split
>>>                 if (current == input.length()) return false;
>>>                 return true;
>>>             }
>>>
>>>             action.accept(input.subSequence(current, input.length()).toString());
>>>             return false;
>>>         }
>>>
>>>         public Spliterator<String> trySplit() {
>>>             return Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED);
>>>         }
>>>
>>>         public int characteristics() {
>>>             return Spliterator.ORDERED;
>>>         }
>>>     }
>>>
>>> Note that as currently written, the HelperIterator in the above needs
>>> to be able to see current in the enclosing class.
>>>
>>> Thanks,
>>>
>>> Ben
>>>
>>