Regex Point Lambdafication Patch
Ben Evans
benjamin.john.evans at gmail.com
Mon Mar 11 10:49:29 PDT 2013
I've tried this:
private static class MatcherIterator implements Iterator<CharSequence> {
    private final Matcher curMatcher;
    private final CharSequence input;
    private int current = 0;

    MatcherIterator(CharSequence in, Matcher m) {
        input = in;
        curMatcher = m;
    }

    public CharSequence next() {
        CharSequence nextChunk = input.subSequence(current, curMatcher.start());
        current = curMatcher.end();
        return nextChunk;
    }

    public boolean hasNext() {
        return curMatcher.find();
    }
}

public Stream<CharSequence> splitAsStream(final CharSequence input) {
    return Streams.stream(Spliterators.spliteratorUnknownSize(
        new MatcherIterator(input, matcher(input)), Spliterator.ORDERED));
}
But it seems to lead to an off-by-one error in my tests - and I'm not
sure why (maybe just inexperience with the Spliterators methods).
Any ideas?
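
One possible explanation, offered as a sketch rather than a confirmed
diagnosis: hasNext() returns false as soon as find() fails, so the text after
the last match is never handed out, and every extra call to hasNext() advances
the matcher again. The version below caches the pending piece so that
hasNext() is idempotent, and emits the trailing piece once. The class name
SplitSketch and the static helper signature are hypothetical, and
StreamSupport.stream is used on the assumption that the code runs against a
released JDK 8 or later rather than the Streams.stream call in the build used
above.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical holder class, for illustration only.
final class SplitSketch {

    /** Iterates over the pieces of input between matches, plus the trailing piece. */
    static final class MatcherIterator implements Iterator<CharSequence> {
        private final Matcher matcher;
        private final CharSequence input;
        private int current = 0;       // index where the next piece starts
        private CharSequence pending;  // piece located by hasNext(), not yet returned
        private boolean done = false;  // set once the trailing piece has been located

        MatcherIterator(CharSequence input, Matcher matcher) {
            this.input = input;
            this.matcher = matcher;
        }

        @Override
        public boolean hasNext() {
            if (pending != null) return true;  // repeated calls do not advance the matcher
            if (done) return false;
            if (matcher.find()) {
                pending = input.subSequence(current, matcher.start());
                current = matcher.end();
            } else {
                // No further match: the remainder of the input is the last piece.
                pending = input.subSequence(current, input.length());
                done = true;
            }
            return true;
        }

        @Override
        public CharSequence next() {
            if (!hasNext()) throw new NoSuchElementException();
            CharSequence piece = pending;
            pending = null;
            return piece;
        }
    }

    /** Wraps the iterator in an ordered, unknown-size spliterator and a sequential stream. */
    static Stream<CharSequence> splitAsStream(Pattern pattern, CharSequence input) {
        Iterator<CharSequence> it = new MatcherIterator(input, pattern.matcher(input));
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED), false);
    }
}
```

Note that this sketch keeps a trailing empty piece (splitting "a,b," on ","
yields "a", "b", ""), whereas Pattern.split drops trailing empty strings, and
it makes no special provision for zero-width matches. Both would need handling
before it matched Pattern.split exactly.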
Thanks,
Ben
On Mon, Mar 11, 2013 at 4:22 PM, Brian Goetz <brian.goetz at oracle.com> wrote:
> Even simpler:
>
> Spliterator<String> s
>     = Spliterators.spliteratorUnknownSize(new CharSeqSpliterator(),
>                                           ORDERED);
>
> Just write the iterator and leave the spliterating to the library.
>
>
> On 3/11/2013 12:19 PM, Ben Evans wrote:
>>
>> On Sun, Mar 10, 2013 at 6:26 PM, Brian Goetz <brian.goetz at oracle.com>
>> wrote:
>>>
>>>
>>> Because you're not overriding trySplit, this spliterator will not allow
>>> streams to be parallelized at all, even if the downstream operations
>>> could benefit from it, as in:
>>>
>>>     pattern.splitAsStream(bigString)
>>>            .parallel()
>>>            .map(expensiveTransform)
>>>            .forEach(...);
>>>
>>> Even though the above pipeline will be limited by the sequential regex
>>> splitting at its source, if the downstream operations are expensive
>>> enough, they could still benefit from parallelization. But the
>>> spliterator, as written, won't permit that -- it is strictly sequential.
>>
>>
>> OK, makes sense.
>>
>>> You can fix this easily by creating an Iterator<String> and wrapping that
>>> with Spliterators.spliteratorUnknownSize(iterator, characteristics)
>>> instead of writing a sequential-only spliterator.
>>
>>
>> This part I'm not sure I fully understand.
>>
>> Did you mean an implementation something like this:
>>
>> private static class CharSequenceSpliterator implements Spliterator<String> {
>>     private CharSequence input;
>>     private final HelperIterator it;
>>     private int current = 0;
>>
>>     CharSequenceSpliterator(CharSequence in, Matcher m) {
>>         input = in;
>>         it = new HelperIterator(m);
>>     }
>>
>>     private class HelperIterator implements Iterator<String> {
>>         private final Matcher curMatcher;
>>
>>         HelperIterator(Matcher m) {
>>             curMatcher = m;
>>         }
>>
>>         public String next() {
>>             String nextChunk = input.subSequence(current,
>>                     curMatcher.start()).toString();
>>             current = curMatcher.end();
>>             return nextChunk;
>>         }
>>
>>         public boolean hasNext() {
>>             return curMatcher.find();
>>         }
>>     }
>>
>>     public boolean tryAdvance(Consumer<? super String> action) {
>>         if (it.hasNext()) {
>>             action.accept(it.next());
>>             // Match the behaviour of Pattern::split
>>             if (current == input.length()) return false;
>>             return true;
>>         }
>>
>>         action.accept(input.subSequence(current,
>>                 input.length()).toString());
>>         return false;
>>     }
>>
>>     public Spliterator<String> trySplit() {
>>         return Spliterators.spliteratorUnknownSize(it,
>>                 Spliterator.ORDERED);
>>     }
>>
>>     public int characteristics() {
>>         return Spliterator.ORDERED;
>>     }
>> }
>>
>> Note that as currently written, the HelperIterator in the above needs
>> to be able to see current in the enclosing class.
>>
>> Thanks,
>>
>> Ben
>>
>