Regex Point Lambdafication Patch

Tue Mar 12 14:38:27 PDT 2013

Yes, this looks much better!

As a testing strategy, I would also compare the result of 
splitAsStream().toArray(String[]::new) with the result of split(). 
Being able to use existing behavior as a testing anchor raises confidence.

On 3/12/2013 5:33 PM, Ben Evans wrote:
> Thanks Brian.
>
> Does this webrev address the current comments?
>
> http://www.java7developer.com/webrev-kittylyst.002/
>
> By the way, I've noticed that something in the webrev process appears
> to be mangling the Unicode chars in some of my tests. Is this OK? Is
> it known behaviour?
>
> Thanks,
>
> Ben
>
> On Mon, Mar 11, 2013 at 5:54 PM, Brian Goetz <brian.goetz at oracle.com> wrote:
>> I think the problem is that hasNext calls find() unconditionally, so
>> multiple calls to hasNext can result in skipping?  You need to maintain a
>> boolean of "found", set it when find() finds something, and clear it when
>> next() consumes it.  This is a typical iterator pitfall.
>>
>>
>> On 3/11/2013 1:49 PM, Ben Evans wrote:
>>>
>>> I've tried this:
>>>
>>>       private static class MatcherIterator implements
>>> Iterator<CharSequence> {
>>>          private final Matcher curMatcher;
>>>           private final CharSequence input;
>>>          private int current = 0;
>>>
>>>          MatcherIterator(CharSequence in, Matcher m) {
>>>              input = in;
>>>              curMatcher = m;
>>>          }
>>>
>>>          public CharSequence next() {
>>>              CharSequence nextChunk = input.subSequence(current,
>>> curMatcher.start());
>>>              current = curMatcher.end();
>>>              return nextChunk;
>>>          }
>>>
>>>          public boolean hasNext() {
>>>              return curMatcher.find();
>>>          }
>>>       }
>>>
>>>       public Stream<CharSequence> splitAsStream(final CharSequence input) {
>>>          return Streams.stream(Spliterators.spliteratorUnknownSize(new
>>> MatcherIterator(input, matcher(input)), Spliterator.ORDERED));
>>>       }
>>>
>>> But it seems to lead to an off-by-one error in my tests - and I'm not
>>> sure why (maybe just inexperience with the Spliterators methods).
>>>
>>> Any ideas?
>>>
>>> Thanks,
>>>
>>> Ben
>>>
>>> On Mon, Mar 11, 2013 at 4:22 PM, Brian Goetz <brian.goetz at oracle.com>
>>> wrote:
>>>>
>>>> Even simpler:
>>>>
>>>> Spliterator<String> s
>>>>       = Spliterators.spliteratorUnknownSize(new CharSeqSpliterator(),
>>>>                                             ORDERED));
>>>>
>>>> Just write the iterator and leave the spliterating to the library.
>>>>
>>>>
>>>> On 3/11/2013 12:19 PM, Ben Evans wrote:
>>>>>
>>>>>
>>>>> On Sun, Mar 10, 2013 at 6:26 PM, Brian Goetz <brian.goetz at oracle.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> Because you're not overriding trySplit, this spliterator will not allow
>>>>>> streams to be parallelized at all, even if the downstream operations
>>>>>> could
>>>>>> benefit from such, such as in:
>>>>>>
>>>>>>      pattern.splitAsStream(bigString)
>>>>>>             .parallel()
>>>>>>             .map(expensiveTransform)
>>>>>>             .forEach(...);
>>>>>>
>>>>>> Even though the above pipeline will be limited by the sequential regex
>>>>>> splitting at its source, if the downstream operations are expensive
>>>>>> enough,
>>>>>> they could still benefit from parallelization.  But the spliterator, as
>>>>>> written, won't permit that -- it is strictly sequential.
>>>>>
>>>>>
>>>>>
>>>>> OK, makes sense.
>>>>>
>>>>>> You can fix this easily by creating an Iterator<String> and wrapping
>>>>>> that
>>>>>> with Spliterators.spliteratorUnknownSize(iterator, characteristics)
>>>>>> instead
>>>>>> of writing a sequential-only iterator.
>>>>>
>>>>>
>>>>>
>>>>> This part I'm not sure I fully understand.
>>>>>
>>>>> Did you mean an implementation something like this:
>>>>>
>>>>>        private static class CharSequenceSpliterator implements
>>>>> Spliterator<String> {
>>>>>            private CharSequence input;
>>>>>            private final HelperIterator it;
>>>>>           private int current = 0;
>>>>>
>>>>>           CharSequenceSpliterator(CharSequence in, Matcher m) {
>>>>>               input = in;
>>>>>               it = new HelperIterator(m);
>>>>>           }
>>>>>
>>>>>           private class HelperIterator implements Iterator<String> {
>>>>>               private final Matcher curMatcher;
>>>>>
>>>>>               HelperIterator(Matcher m) {
>>>>>                   curMatcher = m;
>>>>>               }
>>>>>
>>>>>               public String next() {
>>>>>                   String nextChunk = input.subSequence(current,
>>>>> curMatcher.start()).toString();
>>>>>                   current = curMatcher.end();
>>>>>                   return nextChunk;
>>>>>               }
>>>>>
>>>>>               public boolean hasNext() {
>>>>>                   return curMatcher.find();
>>>>>               }
>>>>>           }
>>>>>
>>>>>           public boolean tryAdvance(Consumer<? super String> action) {
>>>>>               if (it.hasNext()) {
>>>>>                   action.accept(it.next());
>>>>>                   // Match the behaviour of Pattern::split
>>>>>                   if (current == input.length()) return false;
>>>>>                   return true;
>>>>>               }
>>>>>
>>>>>               action.accept(input.subSequence(current,
>>>>> input.length()).toString());
>>>>>               return false;
>>>>>           }
>>>>>
>>>>>           public Spliterator<String> trySplit() {
>>>>>               return Spliterators.spliteratorUnknownSize(it,
>>>>> Spliterator.ORDERED);
>>>>>           }
>>>>>
>>>>>           public int characteristics() {
>>>>>               return Spliterator.ORDERED;
>>>>>           }
>>>>>        }
>>>>>
>>>>> Note that as currently written, the HelperIterator in the above needs
>>>>> to be able to see current in the enclosing class.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Ben
>>>>>
>>>>
>>