Regex Point Lambdafication Patch
Brian Goetz
brian.goetz at oracle.com
Tue Mar 12 14:38:27 PDT 2013
Yes, this looks much better!
As a testing strategy, I would also compare the result of
splitAsStream().toArray(String[]::new) with the result of split().
Being able to use existing behavior as a testing anchor raises confidence.
On 3/12/2013 5:33 PM, Ben Evans wrote:
> Thanks Brian.
>
> Does this webrev address the current comments?
>
> http://www.java7developer.com/webrev-kittylyst.002/
>
> By the way, I've noticed that something in the webrev process appears
> to be mangling the Unicode chars in some of my tests. Is this OK? Is
> it known behaviour?
>
> Thanks,
>
> Ben
>
> On Mon, Mar 11, 2013 at 5:54 PM, Brian Goetz <brian.goetz at oracle.com> wrote:
>> I think the problem is that hasNext calls find() unconditionally, so
>> multiple calls to hasNext can result in skipping? You need to maintain a
>> boolean of "found", set it when find() finds something, and clear it when
>> next() consumes it. This is a typical iterator pitfall.
>>
>>
>> On 3/11/2013 1:49 PM, Ben Evans wrote:
>>>
>>> I've tried this:
>>>
>>> private static class MatcherIterator implements
>>> Iterator<CharSequence> {
>>> private final Matcher curMatcher;
>>> private final CharSequence input;
>>> private int current = 0;
>>>
>>> MatcherIterator(CharSequence in, Matcher m) {
>>> input = in;
>>> curMatcher = m;
>>> }
>>>
>>> public CharSequence next() {
>>> CharSequence nextChunk = input.subSequence(current,
>>> curMatcher.start());
>>> current = curMatcher.end();
>>> return nextChunk;
>>> }
>>>
>>> public boolean hasNext() {
>>> return curMatcher.find();
>>> }
>>> }
>>>
>>> public Stream<CharSequence> splitAsStream(final CharSequence input) {
>>> return Streams.stream(Spliterators.spliteratorUnknownSize(new
>>> MatcherIterator(input, matcher(input)), Spliterator.ORDERED));
>>> }
>>>
>>> But it seems to lead to an off-by-one error in my tests - and I'm not
>>> sure why (maybe just inexperience with the Spliterators methods).
>>>
>>> Any ideas?
>>>
>>> Thanks,
>>>
>>> Ben
>>>
>>> On Mon, Mar 11, 2013 at 4:22 PM, Brian Goetz <brian.goetz at oracle.com>
>>> wrote:
>>>>
>>>> Even simpler:
>>>>
>>>> Spliterator<String> s
>>>> = Spliterators.spliteratorUnknownSize(new CharSeqSpliterator(),
>>>> ORDERED));
>>>>
>>>> Just write the iterator and leave the spliterating to the library.
>>>>
>>>>
>>>> On 3/11/2013 12:19 PM, Ben Evans wrote:
>>>>>
>>>>>
>>>>> On Sun, Mar 10, 2013 at 6:26 PM, Brian Goetz <brian.goetz at oracle.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> Because you're not overriding trySplit, this spliterator will not allow
>>>>>> streams to be parallelized at all, even if the downstream operations
>>>>>> could
>>>>>> benefit from such, such as in:
>>>>>>
>>>>>> pattern.splitAsStream(bigString)
>>>>>> .parallel()
>>>>>> .map(expensiveTransform)
>>>>>> .forEach(...);
>>>>>>
>>>>>> Even though the above pipeline will be limited by the sequential regex
>>>>>> splitting at its source, if the downstream operations are expensive
>>>>>> enough,
>>>>>> they could still benefit from parallelization. But the spliterator, as
>>>>>> written, won't permit that -- it is strictly sequential.
>>>>>
>>>>>
>>>>>
>>>>> OK, makes sense.
>>>>>
>>>>>> You can fix this easily by creating an Iterator<String> and wrapping
>>>>>> that
>>>>>> with Spliterators.spliteratorUnknownSize(iterator, characteristics)
>>>>>> instead
>>>>>> of writing a sequential-only iterator.
>>>>>
>>>>>
>>>>>
>>>>> This part I'm not sure I fully understand.
>>>>>
>>>>> Did you mean an implementation something like this:
>>>>>
>>>>> private static class CharSequenceSpliterator implements
>>>>> Spliterator<String> {
>>>>> private CharSequence input;
>>>>> private final HelperIterator it;
>>>>> private int current = 0;
>>>>>
>>>>> CharSequenceSpliterator(CharSequence in, Matcher m) {
>>>>> input = in;
>>>>> it = new HelperIterator(m);
>>>>> }
>>>>>
>>>>> private class HelperIterator implements Iterator<String> {
>>>>> private final Matcher curMatcher;
>>>>>
>>>>> HelperIterator(Matcher m) {
>>>>> curMatcher = m;
>>>>> }
>>>>>
>>>>> public String next() {
>>>>> String nextChunk = input.subSequence(current,
>>>>> curMatcher.start()).toString();
>>>>> current = curMatcher.end();
>>>>> return nextChunk;
>>>>> }
>>>>>
>>>>> public boolean hasNext() {
>>>>> return curMatcher.find();
>>>>> }
>>>>> }
>>>>>
>>>>> public boolean tryAdvance(Consumer<? super String> action) {
>>>>> if (it.hasNext()) {
>>>>> action.accept(it.next());
>>>>> // Match the behaviour of Pattern::split
>>>>> if (current == input.length()) return false;
>>>>> return true;
>>>>> }
>>>>>
>>>>> action.accept(input.subSequence(current,
>>>>> input.length()).toString());
>>>>> return false;
>>>>> }
>>>>>
>>>>> public Spliterator<String> trySplit() {
>>>>> return Spliterators.spliteratorUnknownSize(it,
>>>>> Spliterator.ORDERED);
>>>>> }
>>>>>
>>>>> public int characteristics() {
>>>>> return Spliterator.ORDERED;
>>>>> }
>>>>> }
>>>>>
>>>>> Note that as currently written, the HelperIterator in the above needs
>>>>> to be able to see current in the enclosing class.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Ben
>>>>>
>>>>
>>
More information about the lambda-dev
mailing list