size computation in SkipOp

Sat Sep 22 07:53:18 PDT 2012

Thanks for reading carefully!

It's unfortunately not quite as simple as this.  While these things are 
still in flux (and poorly documented), the current interpretation of 
SIZED for an intermediate operation is not just "knows its size", but 
"preserves the upstream size."

Backing up, the reason it is valuable to know the size is so that you 
can exact-size the target in pipelines like:

   list.stream().map(...).toArray()

and avoid an array copy.  Since list knows its size, and map preserves 
that size, this is a possibility.  This trick also works in the parallel 
case if we have perfect information about the decomposition; if we know 
the exact offset and size of each chunk, we can allocate a big correctly 
sized array at the beginning of the operation, and instruct each chunk 
to write the mapped results into the exact right offset of the target 
array.  Big win.  And for common cases (such as arrays, ArrayList, and 
some balanced trees) we do have predictable decomposition.
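
To make the parallel case concrete, here is a rough sketch of the idea, not the actual library code. It assumes a plain int[] source split into equal ranges, and it uses the stream API names as they later shipped (IntStream, IntUnaryOperator); the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.function.IntUnaryOperator;
import java.util.stream.IntStream;

public class ExactSizedMap {
    // Maps src into a target array that is allocated once, at its final size.
    // Every chunk knows its exact offset and length, so chunks write into
    // disjoint regions of the target and no merge or copy step is needed.
    static int[] mapParallel(int[] src, IntUnaryOperator f, int chunks) {
        int n = src.length;
        int[] target = new int[n];
        IntStream.range(0, chunks).parallel().forEach(c -> {
            int from = (int) ((long) n * c / chunks);       // chunk start offset
            int to = (int) ((long) n * (c + 1) / chunks);   // chunk end (exclusive)
            for (int i = from; i < to; i++) {
                target[i] = f.applyAsInt(src[i]);
            }
        });
        return target;
    }

    public static void main(String[] args) {
        int[] out = mapParallel(new int[] {1, 2, 3, 4, 5}, x -> x * 10, 3);
        System.out.println(Arrays.toString(out)); // prints [10, 20, 30, 40, 50]
    }
}
```

This only works because the size of every chunk is known before any element is processed; an operation that changes the element count breaks that assumption unless it can report its resulting size up front.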

For the trick you suggest to work, there needs to be an additional way 
to ask the operation "what would your size be when done?"  We don't have 
that today, so we're restricting ourselves to interpreting SIZED as 
size-preserving.  This is something on our list to explore, but we're 
wary of introducing extra up-front computation into the pipeline setup, 
because that's all on the wrong side of Amdahl's Law (it increases the 
serial fraction).
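
For illustration only, such a query might look roughly like the sketch below. Nothing here exists in the current code: SizeMapping, outputSize, and pipelineSize are hypothetical names, and a negative size stands for "unknown", following the convention in the begin(int size) snippet quoted below.

```java
public class SizeQuerySketch {
    // Hypothetical per-operation query: given the upstream size,
    // what will the size be after this operation? Negative means unknown.
    interface SizeMapping {
        long outputSize(long upstreamSize);
    }

    // A size-preserving operation such as map.
    static SizeMapping sizePreserving() {
        return size -> size;
    }

    // skip(n): same arithmetic as the proposed begin() computation.
    static SizeMapping skip(long n) {
        return size -> size < 0 ? size : Math.max(size - n, 0);
    }

    // Walks the pipeline once, serially, during setup.
    static long pipelineSize(long sourceSize, SizeMapping... ops) {
        long size = sourceSize;
        for (SizeMapping op : ops) {
            size = op.outputSize(size);
        }
        return size;
    }

    public static void main(String[] args) {
        System.out.println(pipelineSize(10, sizePreserving(), skip(3))); // prints 7
        System.out.println(pipelineSize(2, skip(3)));                    // prints 0
        System.out.println(pipelineSize(-1, skip(3)));                   // prints -1
    }
}
```

Note that pipelineSize runs before any parallel work can start, which is exactly the kind of serial setup cost we're wary of adding.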

On 9/22/2012 8:28 AM, Arne Siegel wrote:
> I'd expect the following computation in SkipOp.wrapSink() makes some sense:
>
>          return new Sink.ChainedValue<T>(sink) {
> ...
>              @Override
>              public void begin(int size) {
>                  downstream.begin(size < 0 ? size : size >= skip ? size - skip : 0);
>              }
>
> If stream size gets computed in this way, FLAG_SIZED in SkipOp.getStreamFlags() doesn't
> need to be cleared.
>
> Similarly for MapSkipOp.
>
> Regards
> Arne Siegel
>