Regex Point Lambdafication Patch
Ben Evans
benjamin.john.evans at gmail.com
Tue Mar 12 14:33:33 PDT 2013
Thanks Brian.
Does this webrev address the current comments?
http://www.java7developer.com/webrev-kittylyst.002/
By the way, I've noticed that something in the webrev process appears
to be mangling the Unicode chars in some of my tests. Is this OK? Is
it known behaviour?
Thanks,
Ben
On Mon, Mar 11, 2013 at 5:54 PM, Brian Goetz <brian.goetz at oracle.com> wrote:
> I think the problem is that hasNext calls find() unconditionally, so
> multiple calls to hasNext can result in skipping? You need to maintain a
> boolean of "found", set it when find() finds something, and clear it when
> next() consumes it. This is a typical iterator pitfall.
>
>
> On 3/11/2013 1:49 PM, Ben Evans wrote:
>>
>> I've tried this:
>>
>> private static class MatcherIterator implements
>> Iterator<CharSequence> {
>> private final Matcher curMatcher;
>> private final CharSequence input;
>> private int current = 0;
>>
>> MatcherIterator(CharSequence in, Matcher m) {
>> input = in;
>> curMatcher = m;
>> }
>>
>> public CharSequence next() {
>> CharSequence nextChunk = input.subSequence(current,
>> curMatcher.start());
>> current = curMatcher.end();
>> return nextChunk;
>> }
>>
>> public boolean hasNext() {
>> return curMatcher.find();
>> }
>> }
>>
>> public Stream<CharSequence> splitAsStream(final CharSequence input) {
>> return Streams.stream(Spliterators.spliteratorUnknownSize(new
>> MatcherIterator(input, matcher(input)), Spliterator.ORDERED));
>> }
>>
>> But it seems to lead to an off-by-one error in my tests - and I'm not
>> sure why (maybe just inexperience with the Spliterators methods).
>>
>> Any ideas?
>>
>> Thanks,
>>
>> Ben
>>
>> On Mon, Mar 11, 2013 at 4:22 PM, Brian Goetz <brian.goetz at oracle.com>
>> wrote:
>>>
>>> Even simpler:
>>>
>>> Spliterator<String> s
>>> = Spliterators.spliteratorUnknownSize(new CharSeqSpliterator(),
>>> ORDERED));
>>>
>>> Just write the iterator and leave the spliterating to the library.
>>>
>>>
>>> On 3/11/2013 12:19 PM, Ben Evans wrote:
>>>>
>>>>
>>>> On Sun, Mar 10, 2013 at 6:26 PM, Brian Goetz <brian.goetz at oracle.com>
>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>> Because you're not overriding trySplit, this spliterator will not allow
>>>>> streams to be parallelized at all, even if the downstream operations
>>>>> could
>>>>> benefit from such, such as in:
>>>>>
>>>>> pattern.splitAsStream(bigString)
>>>>> .parallel()
>>>>> .map(expensiveTransform)
>>>>> .forEach(...);
>>>>>
>>>>> Even though the above pipeline will be limited by the sequential regex
>>>>> splitting at its source, if the downstream operations are expensive
>>>>> enough,
>>>>> they could still benefit from parallelization. But the spliterator, as
>>>>> written, won't permit that -- it is strictly sequential.
>>>>
>>>>
>>>>
>>>> OK, makes sense.
>>>>
>>>>> You can fix this easily by creating an Iterator<String> and wrapping
>>>>> that
>>>>> with Spliterators.spliteratorUnknownSize(iterator, characteristics)
>>>>> instead
>>>>> of writing a sequential-only iterator.
>>>>
>>>>
>>>>
>>>> This part I'm not sure I fully understand.
>>>>
>>>> Did you mean an implementation something like this:
>>>>
>>>> private static class CharSequenceSpliterator implements
>>>> Spliterator<String> {
>>>> private CharSequence input;
>>>> private final HelperIterator it;
>>>> private int current = 0;
>>>>
>>>> CharSequenceSpliterator(CharSequence in, Matcher m) {
>>>> input = in;
>>>> it = new HelperIterator(m);
>>>> }
>>>>
>>>> private class HelperIterator implements Iterator<String> {
>>>> private final Matcher curMatcher;
>>>>
>>>> HelperIterator(Matcher m) {
>>>> curMatcher = m;
>>>> }
>>>>
>>>> public String next() {
>>>> String nextChunk = input.subSequence(current,
>>>> curMatcher.start()).toString();
>>>> current = curMatcher.end();
>>>> return nextChunk;
>>>> }
>>>>
>>>> public boolean hasNext() {
>>>> return curMatcher.find();
>>>> }
>>>> }
>>>>
>>>> public boolean tryAdvance(Consumer<? super String> action) {
>>>> if (it.hasNext()) {
>>>> action.accept(it.next());
>>>> // Match the behaviour of Pattern::split
>>>> if (current == input.length()) return false;
>>>> return true;
>>>> }
>>>>
>>>> action.accept(input.subSequence(current,
>>>> input.length()).toString());
>>>> return false;
>>>> }
>>>>
>>>> public Spliterator<String> trySplit() {
>>>> return Spliterators.spliteratorUnknownSize(it,
>>>> Spliterator.ORDERED);
>>>> }
>>>>
>>>> public int characteristics() {
>>>> return Spliterator.ORDERED;
>>>> }
>>>> }
>>>>
>>>> Note that as currently written, the HelperIterator in the above needs
>>>> to be able to see current in the enclosing class.
>>>>
>>>> Thanks,
>>>>
>>>> Ben
>>>>
>>>
>
More information about the lambda-dev
mailing list