Into

Brian Goetz brian.goetz at oracle.com
Wed Dec 26 10:38:54 PST 2012


Let's try to separate some things here.

There's been a lot of defending of into() because it is (a) useful and 
(b) safe.  That's all good.  But let's see if we can think of these more 
as functional requirements than as mandating a specific API (whether one 
that happens to be already implemented, like into(), or newly proposed 
ones like toEveryKindOfCollection()).

Into as currently implemented has many negatives, including:
  - Adds conceptual and API surface area -- destinations have to 
implement Destination, and the semantics of into are weird and unique to 
into
  - Will likely parallelize terribly
  - Doesn't give the user enough control over how the into'ing is done 
(sequential vs. parallel, order-sensitive vs. not)
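To make the first point concrete, here is a rough reconstruction of the 
shape being criticized.  The Destination interface and the into() idea 
are from the prototype under discussion, but the exact signatures below 
are my assumptions, not the real prototype code:

```java
import java.util.ArrayList;
import java.util.stream.Stream;

// Hypothetical reconstruction: the destination, not the stream, decides
// how elements are absorbed, which is why the semantics end up "weird
// and unique to into".
interface Destination<T> {
    void addAll(Stream<T> stream);
}

// A list that knows how to absorb a stream.  Note the sequential shovel:
// a parallel pipeline gains nothing here, illustrating the
// "will likely parallelize terribly" point.
class IntoSketch<T> extends ArrayList<T> implements Destination<T> {
    @Override
    public void addAll(Stream<T> stream) {
        stream.forEachOrdered(this::add);
    }
}
```

The user also has no way to choose sequential vs. concurrent absorption 
at the call site; that policy is buried in the destination.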

So let's step back and talk requirements.

I think the only clear functional requirement is that it should be easy 
to accumulate the result of a stream into a collection or similar 
container.  It should be easy to customize what kind of collection, but 
also easy to say "give me a reasonable default."  Additionally, the 
performance characteristics should be transparent; users should be able 
to figure out what's going to happen.
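As a sketch of what meeting this requirement could look like, here is 
the collect()/Collectors style that this discussion eventually converged 
on (the class name AccumulateDemo is mine; the library calls are from 
the java.util.stream API as it later shipped):

```java
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class AccumulateDemo {
    // "Give me a reasonable default": the library picks the List
    // implementation.
    static List<String> defaults() {
        return Stream.of("b", "a", "c").collect(Collectors.toList());
    }

    // Easy to customize: the user names the destination type explicitly.
    static TreeSet<String> custom() {
        return Stream.of("b", "a", "c")
                     .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        System.out.println(defaults()); // [b, a, c]
        System.out.println(custom());   // [a, b, c]
    }
}
```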

There are lots of other nice-to-haves, such as:
  - Minimize impact on Collection implementations
  - Minimize magic/guessing about the user's intent
  - Support destinations that aren't collections
  - Minimize tight coupling of Stream API to existing Collection APIs

The current into() fails on nearly all of these.

At the risk of being a broken record, there are really two cases here:

  - Reduce-like.  Aggregate values into groups at the leaves of the 
tree, and then combine the groups somehow.  This preserves encounter 
order, but has merging overhead.  Merging overhead ranges from small 
(build a conc-tree) to large (add the elements of the right subresult 
individually to the left subresult) depending on the chosen data 
structure.

  - Foreach-like.  Have each leaf shovel its values into a single shared 
concurrent container (imagine a ConcurrentVector class).  This ignores 
encounter order, but a well-written concurrent destination might be able 
to outperform the merging behavior.
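The two modes can be sketched with the stream API as it later shipped 
(the class name TwoModesDemo is mine; ConcurrentLinkedQueue stands in 
for the imagined ConcurrentVector):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.stream.IntStream;

class TwoModesDemo {
    // Reduce-like: each leaf builds a sub-list, and sub-lists are merged
    // pairwise.  Encounter order is preserved; merging is the cost.
    static List<Integer> reduceLike() {
        return IntStream.range(0, 100).parallel().boxed()
                .collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
    }

    // Foreach-like: every leaf shovels into one shared concurrent
    // container.  No merge cost, but encounter order is lost.
    static Queue<Integer> forEachLike() {
        Queue<Integer> q = new ConcurrentLinkedQueue<>();
        IntStream.range(0, 100).parallel().forEach(q::add);
        return q;
    }

    public static void main(String[] args) {
        System.out.println(reduceLike().size());  // 100, in encounter order
        System.out.println(forEachLike().size()); // 100, in arrival order
    }
}
```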

In earlier implementations we tried to guess between the two modes based 
on the ordering characteristics of the source and the order-preserving 
characteristics of the intermediate ops.  This is both risky and harder 
for the user to control (hence hacks like .unordered()).  I think this 
general approach is a loser for all but the most special cases.

Since we can't read the user's mind about whether they care about 
encounter order or not (e.g., they may use a List because there's no 
Multiset implementation handy), I think we need to provide ways of 
aggregating that let the user explicitly choose between order-preserving 
aggregation and concurrent aggregation.  I think having the word 
"concurrent" in the code somewhere isn't a bad clue.
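For what it's worth, this is exactly the route Java 8 ultimately took: 
paired ordered and explicitly-named concurrent collectors.  A sketch 
(the class name ExplicitChoiceDemo is mine):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ExplicitChoiceDemo {
    // Order-preserving aggregation: per-leaf maps are built and merged.
    static Map<Integer, Long> ordered() {
        return Stream.of("a", "bb", "cc", "ddd").parallel()
                .collect(Collectors.groupingBy(String::length,
                                               Collectors.counting()));
    }

    // Concurrent aggregation into one shared map.  The word "concurrent"
    // in the code is the user's explicit signal that encounter order is
    // being traded away.
    static ConcurrentMap<Integer, Long> concurrent() {
        return Stream.of("a", "bb", "cc", "ddd").parallel()
                .collect(Collectors.groupingByConcurrent(String::length,
                                                         Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(ordered());
        System.out.println(concurrent());
    }
}
```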



On 12/26/2012 1:02 PM, Remi Forax wrote:
> On 12/26/2012 06:07 PM, Doug Lea wrote:
>> On 12/26/12 11:52, Remi Forax wrote:
>>
>>> No, I think it's better to have only toList() and toSet(),
>>> the result of stream.sorted().toSet() will return a
>>> NavigableSet/SortedSet.
>>> The idea is that the method to* will choose the best implementation
>>> using the
>>> property of the pipeline.
>>>
>>> If you want a specific implementation, then use into().
>>
>> Sorry, I still don't buy it. If you want a specific implementation,
>> then my sense is that you will end up writing something like
>> the following anyway:
>>
>>   Stream s = ...;
>>   SomeCollection dest = ...
>>   // add s to dest via (par/seq) forEach or loop or whatever
>
> again, letting people do the copy themselves will create a lot of
> non-thread-safe code.
> I see forEach as a necessary evil, not as something that people should
> use every day.
>
>>
>> so why bother adding all the support code that people will probably
>> not use anyway in custom situations because, well, they are custom
>> situations. So to me, Reducers etc are in the maybe-nice-to-have
>> category.
>
> while I agree that custom reducers have to fly by themselves,
> we need to provide an operation that pulls all elements from a parallel
> stream and puts them in any collection in a thread-safe manner that
> doesn't require 10 eyeballs to look at the code.
>
>>
>> -Doug
>>
>>
>
> Rémi
>


More information about the lambda-libs-spec-observers mailing list