Getting rid of pull

Howard Lovatt howard.lovatt at gmail.com
Wed Dec 19 02:39:26 PST 2012


I used pre-made exceptions, breakException and continueException, in my parallel collections library. Worked great, and really useful for converting existing code. 

Sent from my iPad

On 19/12/2012, at 3:19 PM, Brian Goetz <brian.goetz at oracle.com> wrote:

> Currently, stream pipelines can operate in one of two modes: push or pull.  (This distinction is invisible to the user; the choice is made by the library based on a number of factors.)
> 
> Given a pipeline:
> 
>  stream.filter(...).map(...).reduce(...)
> 
> A "pull" traversal involves taking an Iterator for the stream, wrapping a filtering iterator around it, wrapping a mapping iterator around that, and having the reduce loop pull elements from the upstream iterator and accumulate the result.
> 
> A "push" traversal involves creating a Sink for the reduction stage, wrapping a mapping sink around that, wrapping a filtering sink around that, and forEach'ing the elements from the source into the filtering sink, and then asking the reducing sink at the end for its result.
> 
> Currently, all operations are required to operated in both modes; to wrap a Sink or an Iterator.
> 
> Short-circuiting operations such as "findFirst" operate in "pull" mode, so as to be usable and performant even on infinite streams.
> 
> Most other operations operate in push mode, because it is more efficient.  Iterators have more per-element overhead; you have to call two methods (hasNext and next), which generally must be defensively coded, requiring duplicated activity between these two methods.  Pull mode also means more heap write traffic (and therefore bad interactions with cardmarking) than push mode, since iterators are nearly always stateful.  So the above chain in "pull" mode requires three stateful iterators (and the accumulation is done in a local), whereas in "push" mode is two stateless sinks and a stateful sink.  (Pipelines ending in forEach are fully stateless, though the forEach target will have side-effects.)  So we try hard to use push mode wherever possible.
> 
> Over the past few months, we've made a series of simplifications in the stream model, including restrictions against stream reuse.  This has eliminated some of the use cases where pull was required, such as:
> 
>  Iterator<T> it = stream.iterator();
>  T first = it.next();
>  stream.forEach(block); // process cdr of stream
> 
> This used to be legal, and forced the forEach into pull mode because the source iterator has already been partially consumed.  Now this case (and many of its relatives, such as forked streams) is illegal, eliminating one of the use cases for pull.  These simplifications enable us to consider getting rid of direct support for pull entirely.  If we eliminate support for push in the framework, huge chunks of code dealing with Iterators simply goes away.  This is a really desirable outcome -- iterator code is bulky and ugly and error-prone.
> 
> 
> The remaining use cases for "pull" are:
> 
> - Short-circuit operations, such as findFirst/findAny.
> - Escape-hatch operations, whereby the client code requests an Iterator (or Spliterator) for the stream.
> 
> Examples of "escape hatch" usage might be "give me every second element" or "give me the third element after the first blue element."  While these are important use cases to continue to support, they are likely to be a small percentage of actual traversals.
> 
> 
> If we got rid of support for iterator wrapping, we could still simulate an Iterator as follows:
> 
> - Construct a terminal sink which accumulates elements into a buffer (such as that used in flatMap)
> - Construct a push-oriented sink chain which flows into this buffer
> - Get an Iterator for the source
> - When more elements are needed, pull elements from the source, push them into the sink, and continue until out of elements or until something flows into the terminal buffering sink.  Then consume elements from the buffering sink until gone, and if there are more elements remaining in the source, continue until the pipeline is dry.  (Think of this as a "pushme-pullyou adapter.")
> 
> This has slightly higher overhead, but it seems pretty reasonable, and doesn't get used that often.  And then, all the code having to do with wrapping iterators can go away, and iterators are then only used as sources and escape hatches (and internally within short-circuit operations.)  Pretty nice.
> 
> 
> We can even go one step farther, and eliminate iterators completely, but it involves some ugliness.  The ugliness involves use of an exception to signal from a forEach target through the forEach source to the initiator of the forEach to about dispensing of elements.  (We can eliminate most of the cost of the exception by not gathering the stack trace, which is by far the most expensive part.)  Then, Spliterator need not even support an iterator() method, just a forEach method that doesn't fall apart if you throw an exception through it.  (Similarly, we don't expose an iterator() method, just a version of forEach that lets the target say "no mas" which causes the exception to be thrown.)  This would eliminate even more low-value code (and simplify the Spliterator spec) that the above approach.  But the exception thing is kind of stinky.
> 
> 
> I don't see any reason to not do at least the first one.
> 


More information about the lambda-libs-spec-observers mailing list