size computation in SkipOp
Brian Goetz
brian.goetz at oracle.com
Sat Sep 22 07:53:18 PDT 2012
Thanks for reading carefully!
It's unfortunately not quite as simple as this. While these things are
still in flux (and poorly documented), the current interpretation of
SIZED for an intermediate operation is not just "knows its size", but
"preserves the upstream size."
Backing up, the reason it is valuable to know the size is so that you
can exact-size the target in pipelines like:
list.stream().map(...).toArray()
and avoid an array copy. Since list knows its size, and map preserves
that size, this is a possibility. This trick also works in the parallel
case if we have perfect information about the decomposition; if we know
the exact offset and size of each chunk, we can allocate a big correctly
sized array at the beginning of the operation, and instruct each chunk
to write the mapped results into the exact right offset of the target
array. Big win. And for common cases (such as arrays, ArrayList, and
some balanced trees) we do have predictable decomposition.
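To make the parallel case concrete, here is a minimal sketch of the idea, not the actual library code. It assumes a random-access source such as ArrayList; the names ExactSizeMap, MapChunk, mapToArray, and THRESHOLD are hypothetical and chosen only for illustration. The target array is allocated once at its final size, and each chunk writes into exactly its own index range, so no intermediate buffers or final copy are needed.

```java
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.function.Function;

// Hypothetical illustration; not part of the lambda prototype.
final class ExactSizeMap {
    // Assumed cutoff below which a chunk is processed sequentially.
    static final int THRESHOLD = 1024;

    // Maps source[lo, hi) into the same index range of the shared target.
    static final class MapChunk<T> extends RecursiveAction {
        final List<T> source;
        final Object[] target;
        final Function<? super T, ?> mapper;
        final int lo, hi;

        MapChunk(List<T> source, Object[] target,
                 Function<? super T, ?> mapper, int lo, int hi) {
            this.source = source;
            this.target = target;
            this.mapper = mapper;
            this.lo = lo;
            this.hi = hi;
        }

        @Override
        protected void compute() {
            if (hi - lo <= THRESHOLD) {
                // Leaf: offset and size are known exactly, so write in place.
                for (int i = lo; i < hi; i++) {
                    target[i] = mapper.apply(source.get(i));
                }
            } else {
                // Predictable decomposition: split the index range in half.
                int mid = (lo + hi) >>> 1;
                invokeAll(new MapChunk<>(source, target, mapper, lo, mid),
                          new MapChunk<>(source, target, mapper, mid, hi));
            }
        }
    }

    static <T> Object[] mapToArray(List<T> source, Function<? super T, ?> mapper) {
        // The source knows its size and map preserves it, so allocate once.
        Object[] target = new Object[source.size()];
        ForkJoinPool.commonPool()
                    .invoke(new MapChunk<>(source, target, mapper, 0, source.size()));
        return target;
    }
}
```

The chunks never overlap, so they need no synchronization on the target array beyond the join at the end.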
For the trick you suggest to work, there needs to be an additional way
to ask the operation 'what would your size be when done'. We don't have
that yet, so for now we're restricting ourselves to interpreting SIZED
as size-preserving. This is something on our list to explore, but we're
wary of introducing extra up-front computation into the pipeline setup,
because that's all on the wrong side of Amdahl's Law (increasing the
serial fraction).
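As a rough sketch of what such a size query could look like (purely hypothetical; nothing like this exists in the current code), each operation could expose a function from upstream size to output size, with a negative value meaning "unknown". The names SizeQuerySketch, mapOp, skipOp, filterOp, and finalSize are assumptions made up for this example; the skip arithmetic follows the expression in the quoted message below.

```java
import java.util.List;
import java.util.function.LongUnaryOperator;

// Hypothetical illustration of a "what would your size be when done" query.
final class SizeQuerySketch {
    // Assumed convention: any negative size means the size is not known.
    static final long UNKNOWN = -1;

    // map preserves the upstream size.
    static LongUnaryOperator mapOp() {
        return size -> size;
    }

    // skip changes the size, but predictably when the upstream size is known.
    static LongUnaryOperator skipOp(long skip) {
        return size -> size < 0 ? size : Math.max(size - skip, 0);
    }

    // filter cannot predict its output size without running.
    static LongUnaryOperator filterOp() {
        return size -> UNKNOWN;
    }

    // Folds the source size through every stage. Note that this runs
    // serially during pipeline setup, before any parallel work starts.
    static long finalSize(long sourceSize, List<LongUnaryOperator> ops) {
        long size = sourceSize;
        for (LongUnaryOperator op : ops) {
            size = op.applyAsLong(size);
        }
        return size;
    }
}
```

For example, a source of 10 elements followed by map and skip(3) would report 7, while any pipeline containing a filter would report unknown. The loop in finalSize is exactly the kind of extra up-front step that adds to the serial fraction.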
On 9/22/2012 8:28 AM, Arne Siegel wrote:
> I'd expect the following computation in SkipOp.wrapSink() makes some sense:
>
>     return new Sink.ChainedValue<T>(sink) {
>         ...
>         @Override
>         public void begin(int size) {
>             downstream.begin(size < 0 ? size : size >= skip ? size - skip : 0);
>         }
>
> If stream size gets computed in this way, FLAG_SIZED in SkipOp.getStreamFlags() doesn't
> need to be cleared.
>
> Similarly for MapSkipOp.
>
> Regards
> Arne Siegel
>