Collection as collector

Paul Sandoz paul.sandoz at oracle.com
Fri Apr 19 03:42:26 PDT 2013


On Apr 19, 2013, at 11:55 AM, Jose <jgetino at telefonica.net> wrote:

> Brian, thanks for your comments and your time.
> 
> It's clear that the collector generated by the framework is safe for
> parallel processing and it should be the first choice when there is no
> collection around. 
> But on the other hand I feel that overprotecting the programmer from himself
> can cause tensions in the framework that make the API less fluent.
> 
> For this case in particular:
> 
> - I think it should be the programmer's responsibility to use a concurrent
> collection if he wants to collect a parallel stream. 
>  I already have to do it in many places in my code that have nothing to do
> with streams. The very moment I get a 
>  ConcurrentModificationException I know where the problem is.
> 

The framework *manages* the concurrency when collecting a parallel stream using reduction. That is quite an important thing to do. 

Consistent results are produced when collecting sequential and parallel pipelines, and the developer does not need to be concerned with threads and concurrency.
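
A minimal sketch of that point follows. The class and method names are illustrative only, and it assumes the Collectors.toList() factory as found in the released Java 8 API, which may differ from the lambda builds current at the time of this message:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class CollectDemo {
    // Hypothetical helper: collects the squares of 0..999 into a List,
    // evaluating the pipeline either sequentially or in parallel.
    static List<Integer> squares(boolean parallel) {
        IntStream range = IntStream.range(0, 1000);
        if (parallel) {
            range = range.parallel();
        }
        return range.map(i -> i * i)
                    .boxed()
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> seq = squares(false);
        List<Integer> par = squares(true);
        // Same elements in the same encounter order; no explicit
        // synchronization appears anywhere in the user code.
        System.out.println(seq.equals(par));
    }
}
```

The calling code is identical in both modes apart from the call to parallel(); the framework decides how the work is split and how the partial results are merged.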


> - All the limitations imposed on the API to prevent a buggy collection of
> the elements of the stream would fail in their objective if the user writes
> this
>  very simple code:
> 
>                     stream.forEach(myCollection::add);
> 

forEach is special and documented as such, so it has to be used with care. In this case it is the programmer's responsibility to do the right thing, as you stated above.

--

There are two general parallel evaluation concepts:

- reduction/folding (which we call collecting), which can safely operate on non-concurrent collections and preserve encounter order; and

- forEach'ing, whereby elements are concurrently reported, in no particular order, to the consumer.
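
The two concepts can be sketched side by side. This is an illustration under assumptions, not framework code: the class and method names are made up, and the three-argument collect(supplier, accumulator, combiner) is used as it appears in the released Java 8 API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.stream.IntStream;

class TwoStylesDemo {
    // Reduction: the framework creates separate ArrayLists (supplier),
    // fills each one from a single thread at a time (accumulator), and
    // merges the partial lists in encounter order (combiner). ArrayList
    // is not thread-safe, and does not need to be.
    static List<Integer> byCollect(int n) {
        return IntStream.range(0, n).parallel().boxed()
                .<ArrayList<Integer>>collect(
                        ArrayList::new, ArrayList::add, ArrayList::addAll);
    }

    // forEach: elements are reported to the consumer concurrently and in
    // no particular order, so the caller must supply a thread-safe target.
    static Queue<Integer> byForEach(int n) {
        Queue<Integer> target = new ConcurrentLinkedQueue<>();
        IntStream.range(0, n).parallel().boxed().forEach(target::add);
        return target;
    }
}
```

Here byCollect(n) returns 0..n-1 in encounter order, while byForEach(n) contains the same elements in whatever order the threads happened to deliver them. Replacing the ConcurrentLinkedQueue with a plain ArrayList in byForEach would be the unsafe variant discussed above.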

Paul.

> I want to note that I have nothing against the Collector interface; on the
> contrary, once I understood the concept I found it to be a fundamental
> abstraction, at the same level as Iterable. 
> Now, in my mental representation of a Collection, I see an Iterable to get
> elements from it and a Collector to put them inside.
> 
> 
> 
> 
> 
> 
> 
> -----Original Message-----
> From: Brian Goetz [mailto:brian.goetz at oracle.com] 
> Sent: Thursday, April 18, 2013 23:29
> To: Jose
> CC: lambda-dev at openjdk.java.net
> Subject: Re: Collection as collector
> 
> We started here (with into(collection)) and discovered that approach had
> many flaws.
> 
> To name one, there's no way to make it parallel without guessing at the
> thread-safety of the target.
> 
> A Collector embodies information as to how to *create* a target collection.
> In a parallel reduction, it may in fact create multiple smaller collections
> and then merge them into one, which can be done safely even if the
> collections are not thread-safe (due to serial thread-confinement).
> Collecting into an existing collection would be a sequential-only idiom,
> and we've worked very hard to make all the operations on streams work well
> either sequentially or in parallel.
> 
> On 4/18/2013 5:07 PM, Jose wrote:
>> 
>> I wonder why the Collection interface doesn't extend Collector using 
>> default methods. At least a collection is the most obvious Collector 
>> you can imagine.
>> 
>> This would allow adding elements to an existing collection using a 
>> straightforward idiom:
>> 
>> Collection myCollection=..
>> 
>> stream1.collect(myCollection);
>> stream1.collect(myCollection);
>> 
>> I have done this in custom classes that wrap a Collection and I feel 
>> it makes code more readable.
>> 
>> 
>> 
>> 
> 
> 


