Two questions about forEach

Thu Aug 9 00:22:47 PDT 2012

On Aug 8 2012, at 22:54 , Ryan Musgrave wrote:

> In your example:
> 
> bar.stream()
>   .forEach( { f -> f.refresh();})
>   .filter( Foo::IsActive )
>   .findFirst();
> 
> I would expect the operations to apply in order. It shouldn't matter
> whether or not they are lazy, eager, or if some steps can be optimised
> out as long as the results are correct.

Does every item get refreshed in order to complete findFirst? Do you really want findFirst to inspect every element? The semantics and efficiency are important. forEach is intended to provide the semantic which matches it's name, the lambda is executed for each item unconditionally and eagerly. I don't want to relie upon static analysis tools to "maybe" find this problem. There are other fluent alternatives to using forEach in the middle of a stream.

If you do have examples of situation you would use a lazy forEach (not a map or a filter) I'd like to hear them. In general though we're trying to structure the api such that both the sources of elements and the elements are immutable through all sequences of operations.

I would like to be able to return something from forEach but the source stream ain't it. Originally forEach did return the stream but that was only because methods with void returns make me "itchy"--methods should return *something* if they can do so cheaply rather than void. Returning the stream obscures the eager nature of forEach. Element count has been suggested since it would be quite cheap and potentially useful.  

Another option is to abuse map() by providing a mapper that returns the same object it is provided but also mutates them as desired much like what a lazy forEach might do.

> The important part is the semantics. A static analysis tool could
> quite easily warn a developer that they are using an eager operation
> in the middle of lazy operations on a stream.

Typing, ie. not returning a stream, has the same effect. forEach is as consistent as we can make it with the other eager ops.

> The .into() example could be the same too:
> 
> foo.stream()
>   .map(x -> x.convertToXml())
>   .into(xmlResults)
>   .map(XlstHelper::convertToFormatA)
>   .filter(x -> x != null)
>   .findFirst();
> if(LOG.isDebug()) {
>  xmlResults.LogResults();
> }
> 
> Is the only reason that this is prevented because the goal is to
> prevent bad practice/multiple iteration of a collection?

into()'s return result is intended for collecting the result of a new operation. So you say:

List<XmlDocument> docs = foo.stream()
  .map(x -> x.convertToXml())
  .into(ArrayList<>::new);

XmlDocument someDoc = docs.stream().map(XlstHelper::convertToFormatA)
  .filter(x -> x != null)
  .findFirst().getOrThrow(() -> new NoSuchElementException);

Having to do the ArrayList creation in a separate step before the stream may be slightly annoying. If you prefer a single long statement you can always do:

List<XmlDocument> docs = new ArrayList<>();
XmlDocument someDoc =  foo.stream()
  .map(x -> x.convertToXml())
  .into(docs)
  .stream() // turn result back into a stream.
  .map(XlstHelper::convertToFormatA)
  .filter(x -> x != null)
  .findFirst().getOrThrow(() -> new NoSuchElementException);

This form has "less meat, more bun" it's more verbose and still two statements. This will work but I don't imagine people will use it.

Mike

> 
> Regards,
> Ryan
> 
> On Thu, Aug 9, 2012 at 10:47 AM, Mike Duigou <mike.duigou at oracle.com> wrote:
>> 
>> On Aug 8 2012, at 03:51 , Jose wrote:
>> 
>>> 2. Why forEach() returns void?. In this way the pipeline is sealed at the
>>> end, no more pieces can be attached to it.
>>> It would better returning the incomming iterable (its elements maybe
>>> modified by the lambda)
>> 
>> The problem is that forEach(), like into(), is an eager operation. For consistency with other eager operations forEach "consumes" the elements. If you find you are wanting to put forEach() in the middle of a pipeline you *probably* want to be using map instead which is properly lazy. forEach in the middle of a pipeline would remove the opportunity for lazy behaviour later in the pipeline.
>> 
>> Consider (imagining that forEach returned the stream) :
>>    Set<ServerConnection> bar;
>> 
>>    ServerConnection server = bar.stream().forEach(f-> f.refresh();}).filter(Foo::isActive).findFirst();
>> 
>> This snippet starts with a pool of server connections, refreshes each connection, filters out the active ones, and returns the first active.
>> 
>> Using forEach for the refresh() step means that *every* server is refreshed before even finding the first active server. This is probably a huge waste of effort. Rather than calling forEach it would be better to make the filter stage do the refresh (yes, it's a more complicated lambda, but the benefit is that only a smaller set of servers must be refresh()ed).
>> 
>> If there are examples of forEach in the middle of a pipeline that do make sense to you, feel free to ask for refactoring suggestions. :-)
>> 
>> Mike
>>