Regex Point Lambdafication Patch
Brian Goetz
brian.goetz at oracle.com
Mon Mar 11 10:54:54 PDT 2013
I think the problem is that hasNext calls find() unconditionally, so
multiple calls to hasNext can result in skipping? You need to maintain
a boolean of "found", set it when find() finds something, and clear it
when next() consumes it. This is a typical iterator pitfall.
On 3/11/2013 1:49 PM, Ben Evans wrote:
> I've tried this:
>
> private static class MatcherIterator implements Iterator<CharSequence> {
> private final Matcher curMatcher;
> private final CharSequence input;
> private int current = 0;
>
> MatcherIterator(CharSequence in, Matcher m) {
> input = in;
> curMatcher = m;
> }
>
> public CharSequence next() {
> CharSequence nextChunk = input.subSequence(current, curMatcher.start());
> current = curMatcher.end();
> return nextChunk;
> }
>
> public boolean hasNext() {
> return curMatcher.find();
> }
> }
>
> public Stream<CharSequence> splitAsStream(final CharSequence input) {
> return Streams.stream(Spliterators.spliteratorUnknownSize(new
> MatcherIterator(input, matcher(input)), Spliterator.ORDERED));
> }
>
> But it seems to lead to an off-by-one error in my tests - and I'm not
> sure why (maybe just inexperience with the Spliterators methods).
>
> Any ideas?
>
> Thanks,
>
> Ben
>
> On Mon, Mar 11, 2013 at 4:22 PM, Brian Goetz <brian.goetz at oracle.com> wrote:
>> Even simpler:
>>
>> Spliterator<String> s
>> = Spliterators.spliteratorUnknownSize(new CharSeqSpliterator(),
>> ORDERED));
>>
>> Just write the iterator and leave the spliterating to the library.
>>
>>
>> On 3/11/2013 12:19 PM, Ben Evans wrote:
>>>
>>> On Sun, Mar 10, 2013 at 6:26 PM, Brian Goetz <brian.goetz at oracle.com>
>>> wrote:
>>>>
>>>>
>>>> Because you're not overriding trySplit, this spliterator will not allow
>>>> streams to be parallelized at all, even if the downstream operations
>>>> could
>>>> benefit from such, such as in:
>>>>
>>>> pattern.splitAsStream(bigString)
>>>> .parallel()
>>>> .map(expensiveTransform)
>>>> .forEach(...);
>>>>
>>>> Even though the above pipeline will be limited by the sequential regex
>>>> splitting at its source, if the downstream operations are expensive
>>>> enough,
>>>> they could still benefit from parallelization. But the spliterator, as
>>>> written, won't permit that -- it is strictly sequential.
>>>
>>>
>>> OK, makes sense.
>>>
>>>> You can fix this easily by creating an Iterator<String> and wrapping that
>>>> with Spliterators.spliteratorUnknownSize(iterator, characteristics)
>>>> instead
>>>> of writing a sequential-only iterator.
>>>
>>>
>>> This part I'm not sure I fully understand.
>>>
>>> Did you mean an implementation something like this:
>>>
>>> private static class CharSequenceSpliterator implements
>>> Spliterator<String> {
>>> private CharSequence input;
>>> private final HelperIterator it;
>>> private int current = 0;
>>>
>>> CharSequenceSpliterator(CharSequence in, Matcher m) {
>>> input = in;
>>> it = new HelperIterator(m);
>>> }
>>>
>>> private class HelperIterator implements Iterator<String> {
>>> private final Matcher curMatcher;
>>>
>>> HelperIterator(Matcher m) {
>>> curMatcher = m;
>>> }
>>>
>>> public String next() {
>>> String nextChunk = input.subSequence(current,
>>> curMatcher.start()).toString();
>>> current = curMatcher.end();
>>> return nextChunk;
>>> }
>>>
>>> public boolean hasNext() {
>>> return curMatcher.find();
>>> }
>>> }
>>>
>>> public boolean tryAdvance(Consumer<? super String> action) {
>>> if (it.hasNext()) {
>>> action.accept(it.next());
>>> // Match the behaviour of Pattern::split
>>> if (current == input.length()) return false;
>>> return true;
>>> }
>>>
>>> action.accept(input.subSequence(current,
>>> input.length()).toString());
>>> return false;
>>> }
>>>
>>> public Spliterator<String> trySplit() {
>>> return Spliterators.spliteratorUnknownSize(it,
>>> Spliterator.ORDERED);
>>> }
>>>
>>> public int characteristics() {
>>> return Spliterator.ORDERED;
>>> }
>>> }
>>>
>>> Note that as currently written, the HelperIterator in the above needs
>>> to be able to see current in the enclosing class.
>>>
>>> Thanks,
>>>
>>> Ben
>>>
>>
More information about the lambda-dev
mailing list