Characterizing stream operations

Fri Feb 15 12:04:14 PST 2013

We've divided stream operations as follows:

Intermediate operations.  Always lazy.  Always produce another stream.

Stateful operations.  A kind of intermediate operation.  Currently 
always transform to the same stream type (e.g., Stream<T> to 
Stream<T>), though this could conceivably change (we haven't found any 
that would need to, though).  Must provide their own parallel 
implementation.  Parallel pipelines containing stateful operations are 
implicitly "sliced" into segments at stateful operation boundaries, and 
executed segment by segment.
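
The boundary a stateful operation creates can be seen even in a 
sequential pipeline.  The sketch below is only an illustration, written 
against the java.util.stream API as it eventually shipped; the class 
name StatefulBarrierDemo is made up for the example, and it does not 
show the parallel slicing itself.  It records when elements pass the 
stages on either side of sorted():

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class StatefulBarrierDemo {
    // Records the order in which elements pass the stages before and
    // after a stateful operation (sorted) in a sequential pipeline.
    static List<String> run() {
        List<String> events = new ArrayList<>();
        Arrays.asList(3, 1, 2).stream()
              .peek(x -> events.add("pre:" + x))   // upstream of sorted()
              .sorted()                            // stateful: must see every element first
              .peek(x -> events.add("post:" + x))  // downstream of sorted()
              .forEach(x -> { });                  // terminal op starts the computation
        return events;
    }

    public static void main(String[] args) {
        // prints [pre:3, pre:1, pre:2, post:1, post:2, post:3]
        System.out.println(run());
    }
}
```

Every "pre" event precedes every "post" event, because sorted() cannot 
emit its first element until it has seen its last input.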

Terminal operations.  The only thing that kicks off stream computation. 
Produce a non-stream result (a value or side effects).
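
To make the lazy/terminal split concrete, here is a minimal sketch 
against the java.util.stream API as it eventually shipped (the class 
name LazyPipelineDemo is hypothetical).  It counts how many elements 
have flowed through the pipeline before and after the terminal 
operation is invoked:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyPipelineDemo {
    // Returns {elements seen before the terminal op, elements seen after it}.
    static int[] run() {
        List<String> seen = new ArrayList<>();
        Stream<String> pipeline = Arrays.asList("a", "b", "c").stream()
                .peek(seen::add)              // intermediate: only recorded, not run
                .map(String::toUpperCase);    // intermediate: only recorded, not run
        int before = seen.size();             // nothing has been computed yet
        pipeline.collect(Collectors.toList()); // terminal: runs the whole pipeline
        int after = seen.size();
        return new int[] { before, after };
    }

    public static void main(String[] args) {
        int[] counts = run();
        System.out.println("seen before terminal op: " + counts[0]); // 0
        System.out.println("seen after terminal op:  " + counts[1]); // 3
    }
}
```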

For each of these, once you perform an operation on a stream 
(intermediate or terminal), the stream is *consumed* and no more 
operations can be performed on that stream.  (Not entirely true, as the 
TCK team will almost certainly point out to us eventually; there are 
some ops that are no-ops and probably will succeed unless we add 
consumed checks.)
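
As an illustration of what "consumed" means in practice, the sketch 
below reuses a stream after an intermediate operation has already been 
applied to it.  It assumes the behavior of the java.util.stream API as 
it eventually shipped, where the second use is rejected with 
IllegalStateException; the draft under discussion here may behave 
differently.  The class name ConsumedStreamDemo is hypothetical.

```java
import java.util.Arrays;
import java.util.stream.Stream;

class ConsumedStreamDemo {
    // Returns true if a second operation on an already-used stream is rejected.
    static boolean secondUseFails() {
        Stream<String> s = Arrays.asList("a", "b").stream();
        Stream<String> upper = s.map(String::toUpperCase); // first operation links s
        try {
            s.filter(x -> true);   // second operation on the same stream object
            return false;          // no consumed check fired
        } catch (IllegalStateException e) {
            return true;           // stream was already operated upon
        }
    }

    public static void main(String[] args) {
        System.out.println("second use rejected: " + secondUseFails());
    }
}
```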

These names are fine from the perspective of the implementation; when 
implementing an operation, you will be implementing one of these three 
types, and conveniently there is a base type for each to subclass.

From the user perspective, though, they may not be as helpful as some 
alternative taxonomies, such as:

  - lazy operation -- what we now call intermediate operation
  - stateful lazy operation -- what we now call stateful
  - consuming operation -- what we now call terminal

These are good in that they keep a key characteristic -- when the 
computation happens -- in full view.  However, they also draw less 
clean boundaries.  For example, iterator() is a consuming operation from 
the perspective of the stream, but from the perspective of the user, it 
may be thought of as lazy.
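
The iterator() case can be sketched as follows, again against the 
java.util.stream API as it eventually shipped and with a made-up class 
name (IteratorLazinessDemo).  The call to iterator() consumes the 
stream, yet no element is pulled from the source until the caller asks 
for one:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicInteger;

class IteratorLazinessDemo {
    // Returns {elements pulled right after iterator(), elements pulled after one next()}.
    static int[] run() {
        AtomicInteger pulled = new AtomicInteger();
        Iterator<Integer> it = Arrays.asList(1, 2, 3).stream()
                .peek(x -> pulled.incrementAndGet()) // counts elements leaving the source
                .iterator();                         // terminal from the stream's point of view
        int afterIterator = pulled.get();            // nothing has been pulled yet
        it.next();                                   // the user drives the computation
        int afterFirstNext = pulled.get();
        return new int[] { afterIterator, afterFirstNext };
    }

    public static void main(String[] args) {
        int[] counts = run();
        System.out.println("pulled after iterator(): " + counts[0]);
        System.out.println("pulled after first next(): " + counts[1]);
    }
}
```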

Thoughts on how to adjust this naming to be more intuitive to users?