Two questions about forEach

Ryan Musgrave ryanm128 at gmail.com
Wed Aug 8 22:54:49 PDT 2012


In your example:

bar.stream()
   .forEach(f -> f.refresh())
   .filter(Foo::isActive)
   .findFirst();

I would expect the operations to apply in order. It shouldn't matter
whether they are lazy or eager, or whether some steps can be optimised
out, as long as the results are correct.

The important part is the semantics. A static analysis tool could
quite easily warn a developer that they are using an eager operation
in the middle of lazy operations on a stream.

I realise this is how C# does it, but I don't see why this needs to be
restricted.

The .into() example could be the same too:

foo.stream()
   .map(x -> x.convertToXml())
   .into(xmlResults)
   .map(XlstHelper::convertToFormatA)
   .filter(x -> x != null)
   .findFirst();

if(LOG.isDebug()) {
  xmlResults.LogResults();
}

Is the only reason this is prevented that the goal is to discourage
bad practice, such as iterating over a collection multiple times?
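
To make the question concrete, here is a minimal sketch of a lazy
mid-pipeline "tap". It assumes the java.util.stream API as it later
shipped in Java 8, where peek() plays this role; that API differs from
the into() discussed here. PeekTap, convertToXml and convertToFormatA
are hypothetical stand-ins for the names in the example above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class PeekTap {
    // Hypothetical stand-in for x.convertToXml() in the example above.
    static String convertToXml(String s) {
        return "<item>" + s + "</item>";
    }

    // Hypothetical stand-in for XlstHelper::convertToFormatA;
    // returns null when the input cannot be converted.
    static String convertToFormatA(String xml) {
        return xml.contains("beta") ? xml.toUpperCase() : null;
    }

    static Optional<String> firstFormatA(List<String> foo, List<String> xmlResults) {
        return foo.stream()
                .map(PeekTap::convertToXml)
                .peek(xmlResults::add)   // lazy tap: records only elements actually pulled through
                .map(PeekTap::convertToFormatA)
                .filter(x -> x != null)
                .findFirst();            // short-circuits at the first non-null result
    }

    public static void main(String[] args) {
        List<String> xmlResults = new ArrayList<>();
        Optional<String> first =
                firstFormatA(Arrays.asList("alpha", "beta", "gamma"), xmlResults);
        System.out.println(first.orElse("none")); // <ITEM>BETA</ITEM>
        System.out.println(xmlResults);           // [<item>alpha</item>, <item>beta</item>]
    }
}
```

Because the tap is lazy, xmlResults holds only the elements seen before
findFirst() stopped the pipeline ("gamma" is never converted). An eager
into() in the same position would have collected all three.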

Regards,
Ryan

On Thu, Aug 9, 2012 at 10:47 AM, Mike Duigou <mike.duigou at oracle.com> wrote:
>
> On Aug 8 2012, at 03:51 , Jose wrote:
>
>> 2. Why does forEach() return void? This way the pipeline is sealed at the
>> end, and no more pieces can be attached to it.
>>  It would be better to return the incoming iterable (its elements may be
>> modified by the lambda).
>
> The problem is that forEach(), like into(), is an eager operation. For consistency with other eager operations, forEach "consumes" the elements. If you find yourself wanting to put forEach() in the middle of a pipeline, you *probably* want map instead, which is properly lazy. forEach in the middle of a pipeline would remove the opportunity for lazy behaviour later in the pipeline.
>
> Consider (imagining that forEach returned the stream) :
>     Set<ServerConnection> bar;
>
>     ServerConnection server = bar.stream().forEach(f -> f.refresh()).filter(Foo::isActive).findFirst();
>
> This snippet starts with a pool of server connections, refreshes each connection, keeps only the active ones, and returns the first active one.
>
> Using forEach for the refresh() step means that *every* server is refreshed before even finding the first active server. This is probably a huge waste of effort. Rather than calling forEach it would be better to make the filter stage do the refresh (yes, it's a more complicated lambda, but the benefit is that only a smaller set of servers must be refresh()ed).
>
> If there are examples of forEach in the middle of a pipeline that do make sense to you, feel free to ask for refactoring suggestions. :-)
>
> Mike
>
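
Mike's suggestion of letting the filter stage do the refresh can be
sketched as follows. This is a minimal sketch, assuming the
java.util.stream API as it later shipped in Java 8 rather than the
prototype under discussion. LazyRefresh, the ServerConnection stand-in,
its refreshCount counter and pool() are all hypothetical, and a List is
used instead of a Set so that the iteration order is predictable.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class LazyRefresh {
    // Hypothetical stand-in for the ServerConnection type in the quoted example.
    static class ServerConnection {
        static int refreshCount = 0;   // counts refresh() calls, for demonstration only
        final String name;
        final boolean up;              // whether the server turns out to be reachable
        boolean active = false;

        ServerConnection(String name, boolean up) {
            this.name = name;
            this.up = up;
        }

        void refresh() { refreshCount++; active = up; }
        boolean isActive() { return active; }
    }

    // Eager variant: refresh every server, then look for the first active one.
    static Optional<ServerConnection> eager(List<ServerConnection> bar) {
        bar.forEach(ServerConnection::refresh);
        return bar.stream().filter(ServerConnection::isActive).findFirst();
    }

    // Lazy variant: the filter stage does the refresh, so the pipeline
    // stops refreshing as soon as findFirst() has its answer.
    static Optional<ServerConnection> lazy(List<ServerConnection> bar) {
        return bar.stream()
                .filter(f -> { f.refresh(); return f.isActive(); })
                .findFirst();
    }

    // Hypothetical pool: "a" is down, "b" and "c" are up, "d" is down.
    static List<ServerConnection> pool() {
        return Arrays.asList(
                new ServerConnection("a", false),
                new ServerConnection("b", true),
                new ServerConnection("c", true),
                new ServerConnection("d", false));
    }

    public static void main(String[] args) {
        ServerConnection.refreshCount = 0;
        System.out.println(eager(pool()).get().name + " " + ServerConnection.refreshCount); // b 4

        ServerConnection.refreshCount = 0;
        System.out.println(lazy(pool()).get().name + " " + ServerConnection.refreshCount);  // b 2
    }
}
```

Both variants return the same server, but the lazy one refreshes only
the connections up to and including the first active one.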


More information about the lambda-dev mailing list