Getting rid of pull

Wed Dec 19 03:00:38 PST 2012

On 12/19/2012 05:19 AM, Brian Goetz wrote:
> Currently, stream pipelines can operate in one of two modes: push or 
> pull. (This distinction is invisible to the user; the choice is made 
> by the library based on a number of factors.)
>
> Given a pipeline:
>
>   stream.filter(...).map(...).reduce(...)
>
> A "pull" traversal involves taking an Iterator for the stream, 
> wrapping a filtering iterator around it, wrapping a mapping iterator 
> around that, and having the reduce loop pull elements from the 
> upstream iterator and accumulate the result.
>
> A "push" traversal involves creating a Sink for the reduction stage, 
> wrapping a mapping sink around that, wrapping a filtering sink around 
> that, and forEach'ing the elements from the source into the filtering 
> sink, and then asking the reducing sink at the end for its result.
>
> Currently, all operations are required to operated in both modes; to 
> wrap a Sink or an Iterator.
>
> Short-circuiting operations such as "findFirst" operate in "pull" 
> mode, so as to be usable and performant even on infinite streams.
>
> Most other operations operate in push mode, because it is more 
> efficient.  Iterators have more per-element overhead; you have to call 
> two methods (hasNext and next), which generally must be defensively 
> coded, requiring duplicated activity between these two methods.  Pull 
> mode also means more heap write traffic (and therefore bad 
> interactions with cardmarking) than push mode, since iterators are 
> nearly always stateful.  So the above chain in "pull" mode requires 
> three stateful iterators (and the accumulation is done in a local), 
> whereas in "push" mode is two stateless sinks and a stateful sink.  
> (Pipelines ending in forEach are fully stateless, though the forEach 
> target will have side-effects.)  So we try hard to use push mode 
> wherever possible.
>
> Over the past few months, we've made a series of simplifications in 
> the stream model, including restrictions against stream reuse. This 
> has eliminated some of the use cases where pull was required, such as:
>
>   Iterator<T> it = stream.iterator();
>   T first = it.next();
>   stream.forEach(block); // process cdr of stream
>
> This used to be legal, and forced the forEach into pull mode because 
> the source iterator has already been partially consumed. Now this case 
> (and many of its relatives, such as forked streams) is illegal, 
> eliminating one of the use cases for pull.  These simplifications 
> enable us to consider getting rid of direct support for pull 
> entirely.  If we eliminate support for push in the framework, huge 
> chunks of code dealing with Iterators simply goes away.  This is a 
> really desirable outcome -- iterator code is bulky and ugly and 
> error-prone.
>
>
> The remaining use cases for "pull" are:
>
>  - Short-circuit operations, such as findFirst/findAny.
>  - Escape-hatch operations, whereby the client code requests an 
> Iterator (or Spliterator) for the stream.
>
> Examples of "escape hatch" usage might be "give me every second 
> element" or "give me the third element after the first blue element."  
> While these are important use cases to continue to support, they are 
> likely to be a small percentage of actual traversals.
>
>
> If we got rid of support for iterator wrapping, we could still 
> simulate an Iterator as follows:
>
>  - Construct a terminal sink which accumulates elements into a buffer 
> (such as that used in flatMap)
>  - Construct a push-oriented sink chain which flows into this buffer
>  - Get an Iterator for the source
>  - When more elements are needed, pull elements from the source, push 
> them into the sink, and continue until out of elements or until 
> something flows into the terminal buffering sink.  Then consume 
> elements from the buffering sink until gone, and if there are more 
> elements remaining in the source, continue until the pipeline is dry.  
> (Think of this as a "pushme-pullyou adapter.")
>
> This has slightly higher overhead, but it seems pretty reasonable, and 
> doesn't get used that often.  And then, all the code having to do with 
> wrapping iterators can go away, and iterators are then only used as 
> sources and escape hatches (and internally within short-circuit 
> operations.)  Pretty nice.

Yes, get ride of the pull mode please,

Here is a small prototype that uses push only
http://igm.univ-mlv.fr/~forax/tmp/pushonlypipeline/src/java/util/stream/

as you can see you don't even need to propagate an exception to 
implement findFirst().

>
>
> We can even go one step farther, and eliminate iterators completely, 
> but it involves some ugliness.  The ugliness involves use of an 
> exception to signal from a forEach target through the forEach source 
> to the initiator of the forEach to about dispensing of elements.  (We 
> can eliminate most of the cost of the exception by not gathering the 
> stack trace, which is by far the most expensive part.)  Then, 
> Spliterator need not even support an iterator() method, just a forEach 
> method that doesn't fall apart if you throw an exception through it.  
> (Similarly, we don't expose an iterator() method, just a version of 
> forEach that lets the target say "no mas" which causes the exception 
> to be thrown.) This would eliminate even more low-value code (and 
> simplify the Spliterator spec) that the above approach.  But the 
> exception thing is kind of stinky.

You can not do that easily, because you need one forEach, find, and 
reduce by type specialization.

>
>
> I don't see any reason to not do at least the first one.
>

Rémi