Suggestion: buffer(int size) as the bulk operation

Aleksey Shipilev aleksey.shipilev at oracle.com
Tue Sep 18 13:06:22 PDT 2012


Hi Paul,

On 09/18/2012 04:33 PM, Paul Sandoz wrote:
> Is it a good idea for the developer to select a size? Will they ever
> select a good size?

Yes, I think it is a good idea. We wouldn't go out and recommend using
this in all pipelines, but it would come in handy in corner cases.

>> BTW, this suggestion really shines if we can then have limited
>> push traversal in the framework. That means a forEach which would
>> dump not the entire stream contents, but only the next $limit
>> elements. In this example we can then do limited forEach(10) pushes
>> into the buffer, and then let the buffer provide us "lazy" pulls,
>> doing a bulk forEach(10) each time it drains out.
>> 
> 
> Not sure I quite understand; can you provide a more explicit example?

Ok, here's what I was thinking.

 stream.map(op).into(...);

Imagine $op is rather hard to compute, but it can exploit some locality
if executed in bulk: e.g. it reuses a large chunk of memory, or even
queries an application-side LRU cache. Then we get the benefit of using
pushes and forEach() into the destination.
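
To make the locality claim concrete, here is a minimal sketch of such an
op (names and sizes are made up for illustration): it answers from a
small LRU cache, so invoking it in bulk, back to back, keeps the cache
warm, while single pulls spread far apart in time may not:

 import java.util.LinkedHashMap;
 import java.util.Map;

 // Hypothetical op that exploits temporal locality: consecutive
 // invocations hit a small LRU cache kept warm by bulk execution.
 class CachedOp {
     private static final int CACHE_SIZE = 1024;

     // access-ordered LinkedHashMap evicting the eldest entry = simple LRU
     private final Map<Long, Long> lru =
         new LinkedHashMap<Long, Long>(CACHE_SIZE, 0.75f, true) {
             @Override
             protected boolean removeEldestEntry(Map.Entry<Long, Long> eldest) {
                 return size() > CACHE_SIZE;
             }
         };

     long apply(long key) {
         Long cached = lru.get(key);
         if (cached != null) return cached;   // hot path: locality win
         long result = expensiveCompute(key); // cold path
         lru.put(key, result);
         return result;
     }

     // stand-in for the actual expensive computation
     private long expensiveCompute(long key) { return key * key; }
 }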

Now the tricky part. Assume you are dealing with a very large stream
(e.g. a network pipe, a large file, etc.), where you can't afford to
materialize the destination in memory. In this case, you want to go lazy:

 stream.map(op).iterator();

...and poll elements one by one. Now you have trouble exploiting the
locality in $op. What is suggested is having:

 stream.map(op).buffer($size).iterator();

...and the buffer will eagerly process chunks of $size elements, then
hand them to the iterator. Once the iterator drains all $size elements,
the buffer eagerly refills with $size new elements. This
buffer().iterator() subchain can probably be folded into a buffered
iterator.
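
For illustration, here is a minimal sketch of what that folded buffered
iterator could look like (plain Java, independent of the actual stream
machinery; BufferedIterator is a made-up name):

 import java.util.ArrayDeque;
 import java.util.Deque;
 import java.util.Iterator;
 import java.util.NoSuchElementException;

 // Wraps a lazy source iterator and refills an internal buffer with
 // up to 'size' elements at a time, so the upstream op runs in bursts
 // (and can exploit locality), while downstream still pulls one by one.
 class BufferedIterator<T> implements Iterator<T> {
     private final Iterator<T> source;
     private final int size;
     private final Deque<T> buffer = new ArrayDeque<T>();

     BufferedIterator(Iterator<T> source, int size) {
         this.source = source;
         this.size = size;
     }

     public boolean hasNext() {
         if (buffer.isEmpty()) refill();
         return !buffer.isEmpty();
     }

     public T next() {
         if (!hasNext()) throw new NoSuchElementException();
         return buffer.pollFirst();
     }

     // Eagerly drain up to 'size' elements from the source in one burst.
     private void refill() {
         for (int i = 0; i < size && source.hasNext(); i++) {
             buffer.addLast(source.next());
         }
     }

     public void remove() { throw new UnsupportedOperationException(); }
 }

With something like this, stream.map(op).buffer(N).iterator() would
behave roughly like new BufferedIterator<T>(stream.map(op).iterator(), N):
each refill() invokes $op for up to N elements back to back, which is
where the locality gets recovered.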

> How would one synchronise the push to pull?

Indeed, the push is probably not required.

-Aleksey.
