Into
Brian Goetz
brian.goetz at oracle.com
Wed Dec 26 10:38:54 PST 2012
Let's try to separate some things here.
There's lots of defending of into() because it is (a) useful and (b)
safe. That's all good. But let's see if we can think of these more as
functional requirements than as mandating a specific API (whether one
that happens to be already implemented, like into(), or the newly
proposed ones like toEveryKindOfCollection().)
Into as currently implemented has many negatives, including:
- Adds conceptual and API surface area -- destinations have to
implement Destination, the semantics of into are weird and unique to into
- Will likely parallelize terribly
- Doesn't provide the user enough control over how the into'ing is
done (seq vs par, order-sensitive vs not)
So let's step back and talk requirements.
I think the only clear functional requirement is that it should be easy
to accumulate the result of a stream into a collection or similar
container. It should be easy to customize what kind of collection, but
also easy to say "give me a reasonable default." Additionally, the
performance characteristics should be transparent; users should be able
to figure out what's going to happen.
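As a rough illustration of this requirement, the sketch below uses the java.util.stream.Collectors API as it later shipped in Java 8. That API postdates this message and is not what is being proposed here, and the class and method names are hypothetical. One call asks for a reasonable default; the other names the container explicitly.

```java
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, for illustration only.
class DefaultVersusChosenContainer {

    // "Give me a reasonable default": the library picks the List implementation.
    static List<String> toDefaultList(Stream<String> s) {
        return s.collect(Collectors.toList());
    }

    // Customized destination: the caller supplies the container factory.
    static TreeSet<String> toTreeSet(Stream<String> s) {
        return s.collect(Collectors.toCollection(TreeSet::new));
    }
}
```

Collectors.toList() makes no promise about the concrete List type, while toCollection leaves that choice with the caller.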
There are lots of other nice-to-haves, such as:
- Minimize impact on Collection implementations
- Minimize magic/guessing about the user's intent
- Support destinations that aren't collections
- Minimize tight coupling of Stream API to existing Collection APIs
The current into() fails on nearly all of these.
At the risk of being a broken record, there are really two cases here:
- Reduce-like. Aggregate values into groups at the leaves of the
tree, and then combine the groups somehow. This preserves encounter
order, but has merging overhead. Merging overhead ranges from small
(build a conc-tree) to large (add the elements of the right subresult
individually to the left subresult) depending on the chosen data
structure.
- Foreach-like. Have each leaf shovel its values into a single shared
concurrent container (imagine a ConcurrentVector class.) This ignores
encounter order, but a well-written concurrent destination might be able
to outperform the merging behavior.
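The two cases can be sketched as follows, borrowing the stream API as it eventually shipped in Java 8 (an assumption; it postdates this message). ConcurrentLinkedQueue stands in for the imagined ConcurrentVector, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.stream.IntStream;

// Hypothetical helper class, for illustration only.
class AccumulationModes {

    // Reduce-like: each leaf fills its own ArrayList, and sub-results are
    // merged with addAll. Encounter order is preserved, and the merge copies
    // the right sub-result element by element.
    static List<Integer> reduceLike(int n) {
        List<Integer> result = IntStream.range(0, n).parallel().boxed()
                .collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
        return result;
    }

    // Foreach-like: every leaf adds directly to one shared concurrent
    // container. There is no merge step, and encounter order is ignored.
    static Queue<Integer> forEachLike(int n) {
        Queue<Integer> shared = new ConcurrentLinkedQueue<>();
        IntStream.range(0, n).parallel().boxed().forEach(shared::add);
        return shared;
    }

    public static void main(String[] args) {
        System.out.println(reduceLike(5));   // always [0, 1, 2, 3, 4]
        System.out.println(forEachLike(5));  // same elements, order may vary
    }
}
```

reduceLike returns the elements in encounter order on every run; forEachLike returns the same elements in whatever order the leaves happened to add them.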
In earlier implementations we tried to guess between the two modes based
on the ordering characteristics of the source and the order-preserving
characteristics of the intermediate ops. This is both risky and harder
for the user to control (hence hacks like .unordered()). I think this
general approach is a loser for all but the most special cases.
Since we can't read the user's mind about whether they care about
encounter order or not (e.g., they may use a List because there's no
Multiset implementation handy), I think we need to provide ways of
aggregating that let the user explicitly choose between order-preserving
aggregation and concurrent aggregation. I think having the word
"concurrent" in the code somewhere isn't a bad clue.
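One way such an explicit choice could look, sketched with the collectors that later shipped in Java 8 (they postdate this message, and the helper names are hypothetical): the order-preserving and the concurrent variants are separate calls, and the concurrent one says so in its name.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;

// Hypothetical helper class, for illustration only.
class ExplicitAggregationChoice {

    // Order-preserving aggregation: per-leaf maps are merged, and each
    // group's list keeps the encounter order of the source.
    static Map<Integer, List<String>> byLengthOrdered(List<String> words) {
        return words.parallelStream()
                .collect(Collectors.groupingBy(String::length));
    }

    // Concurrent aggregation: all leaves insert into one shared
    // ConcurrentMap, so the order inside each group is not guaranteed.
    static ConcurrentMap<Integer, List<String>> byLengthConcurrent(List<String> words) {
        return words.parallelStream()
                .collect(Collectors.groupingByConcurrent(String::length));
    }
}
```

Both methods produce the same groups; they differ only in whether the order within a group is guaranteed, and the call site makes that difference visible.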
On 12/26/2012 1:02 PM, Remi Forax wrote:
> On 12/26/2012 06:07 PM, Doug Lea wrote:
>> On 12/26/12 11:52, Remi Forax wrote:
>>
>>> No, I think it's better to have only toList() and toSet(),
>>> the result of stream.sorted().toSet() will return a
>>> NavigableSet/SortedSet.
>>> The idea is that the to* methods will choose the best implementation
>>> using the properties of the pipeline.
>>>
>>> If you want a specific implementation, then use into().
>>
>> Sorry, I still don't buy it. If you want a specific implementation,
>> then my sense is that you will end up writing something like
>> the following anyway:
>>
>> Stream s = ...;
>> SomeCollection dest = ...
>> // add s to dest via (par/seq) forEach or loop or whatever
>
> again, letting people do the copy themselves will create a lot of
> non-thread-safe code.
> I see forEach as a necessary evil, not as something that people should
> use every day.
>
>>
>> so why bother adding all the support code that people will probably
>> not use anyway in custom situations because, well, they are custom
>> situations. So to me, Reducers etc are in the maybe-nice-to-have
>> category.
>
> while I agree that custom reducers have to fly by themselves,
> we need to provide an operation that pulls all elements from a parallel
> stream and puts them in any collection in a thread-safe manner that
> doesn't require 10 eyeballs to look at the code.
>
>>
>> -Doug
>>
>>
>
> Rémi
>
More information about the lambda-libs-spec-observers
mailing list