Regex Point Lambdafication Patch

Ben Evans benjamin.john.evans at gmail.com
Mon Mar 11 09:19:21 PDT 2013


On Sun, Mar 10, 2013 at 6:26 PM, Brian Goetz <brian.goetz at oracle.com> wrote:
>
> Because you're not overriding trySplit, this spliterator will not allow
> streams to be parallelized at all, even if the downstream operations could
> benefit from such, such as in:
>
>   pattern.splitAsStream(bigString)
>          .parallel()
>          .map(expensiveTransform)
>          .forEach(...);
>
> Even though the above pipeline will be limited by the sequential regex
> splitting at its source, if the downstream operations are expensive enough,
> they could still benefit from parallelization.  But the spliterator, as
> written, won't permit that -- it is strictly sequential.

OK, makes sense.

> You can fix this easily by creating an Iterator<String> and wrapping that
> with Spliterators.spliteratorUnknownSize(iterator, characteristics) instead
> of writing a sequential-only iterator.

This part I'm not sure I fully understand.

Did you mean an implementation along these lines?

    private static class CharSequenceSpliterator implements Spliterator<String> {
        private CharSequence input;
        private final HelperIterator it;
        // Index just past the most recent match
        private int current = 0;

        CharSequenceSpliterator(CharSequence in, Matcher m) {
            input = in;
            it = new HelperIterator(m);
        }

        private class HelperIterator implements Iterator<String> {
            private final Matcher curMatcher;

            HelperIterator(Matcher m) {
                curMatcher = m;
            }

            // Relies on a preceding successful hasNext() to position the matcher
            public String next() {
                String nextChunk = input.subSequence(current, curMatcher.start()).toString();
                current = curMatcher.end();
                return nextChunk;
            }

            // Advances the matcher as a side effect
            public boolean hasNext() {
                return curMatcher.find();
            }
        }

        public boolean tryAdvance(Consumer<? super String> action) {
            if (it.hasNext()) {
                action.accept(it.next());
                // Match the behaviour of Pattern::split
                if (current == input.length()) return false;
                return true;
            }

            action.accept(input.subSequence(current, input.length()).toString());
            return false;
        }

        public Spliterator<String> trySplit() {
            return Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED);
        }

        // Required by Spliterator; the number of chunks is not known up front
        public long estimateSize() {
            return Long.MAX_VALUE;
        }

        public int characteristics() {
            return Spliterator.ORDERED;
        }
    }

Note that, as currently written, HelperIterator needs access to the
current field of the enclosing class.
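
For comparison, here is a minimal sketch of the other way I can read the
suggestion: keep all of the state (including current) inside a plain
Iterator<String>, and let Spliterators.spliteratorUnknownSize supply the
Spliterator, so there is no hand-written trySplit at all. The names
SplitSketch and SplitIterator are made up for illustration, and the sketch
does not try to reproduce every edge case of Pattern::split (for example,
an empty input yields no elements here).

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical holder class, for illustration only.
final class SplitSketch {

    // Iterates over the chunks of the input that lie between matches.
    static final class SplitIterator implements Iterator<String> {
        private final CharSequence input;
        private final Matcher matcher;
        private int current = 0;   // start of the next chunk
        private String pending;    // chunk found by hasNext(), not yet returned
        private boolean done = false;

        SplitIterator(Pattern pattern, CharSequence input) {
            this.input = input;
            this.matcher = pattern.matcher(input);
        }

        @Override
        public boolean hasNext() {
            if (pending != null) return true;   // already buffered
            if (done) return false;
            if (matcher.find()) {
                pending = input.subSequence(current, matcher.start()).toString();
                current = matcher.end();
            } else {
                done = true;
                // Emit the remainder only if it is non-empty, so a trailing
                // delimiter does not produce a trailing empty string.
                if (current < input.length()) {
                    pending = input.subSequence(current, input.length()).toString();
                }
            }
            return pending != null;
        }

        @Override
        public String next() {
            if (!hasNext()) throw new NoSuchElementException();
            String chunk = pending;
            pending = null;
            return chunk;
        }
    }

    // Wraps the iterator; the library-provided spliterator handles splitting.
    static Stream<String> splitAsStream(Pattern pattern, CharSequence input) {
        Spliterator<String> s = Spliterators.spliteratorUnknownSize(
                new SplitIterator(pattern, input),
                Spliterator.ORDERED | Spliterator.NONNULL);
        return StreamSupport.stream(s, false);
    }

    public static void main(String[] args) {
        List<String> out = splitAsStream(Pattern.compile(","), "a,b,c,d")
                .parallel()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
        System.out.println(out);
    }
}
```

In this version hasNext() buffers the next chunk rather than calling find()
directly, so calling it twice in a row does not skip a match. That seems
safer once the iterator is handed to library code that decides for itself
when to call hasNext() and next().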

Thanks,

Ben


More information about the lambda-dev mailing list