Re: Additional method on Stream

28 Apr 2015

      On Apr 28, 2015, at 10:22 AM, Kasper Nielsen <kasperni@gmail.com> wrote:
...
On Tue, Apr 28, 2015 at 9:16 AM, Paul Sandoz <paul.sandoz@oracle.com> wrote:
On Apr 27, 2015, at 10:34 PM, Kasper Nielsen <kasperni@gmail.com> wrote:
...
The other default function I would like to see is stream.toList() (I can
live with collectToList) which is short for s.collect(Collectors.toList()).
50 % of my terminal functions are s.collect(Collectors.toList()).
Can you live with a static import and:
s.collect(toList())
? which is rather close to "collectToList".
When designing j.u.s.Stream we made a conscious decision not to bind it to j.u collection types. A Stream could be integrated with other forms of collections (e.g. GS Collections).
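For readers following along, a minimal sketch of the static-import form, and of how the same collect(...) entry point targets a non-List type. The class and method names are made up for illustration; only the java.util.stream calls are real API.

```java
import static java.util.stream.Collectors.toCollection;
import static java.util.stream.Collectors.toList;

import java.util.ArrayDeque;
import java.util.List;
import java.util.stream.Stream;

public class StaticImportSketch {
    // With the static import, the terminal operation reads close to "collectToList".
    static List<String> upperCased(Stream<String> s) {
        return s.map(String::toUpperCase).collect(toList());
    }

    // The same collect(...) call works for other targets, which is why
    // Stream itself does not need to know about List.
    static ArrayDeque<String> asDeque(Stream<String> s) {
        return s.collect(toCollection(ArrayDeque::new));
    }

    public static void main(String[] args) {
        System.out.println(upperCased(Stream.of("a", "b", "c"))); // prints [A, B, C]
        System.out.println(asDeque(Stream.of("x", "y")));         // prints [x, y]
    }
}
```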
First, if you are using any kind of modern IDE you have some kind of intelligent completion which will suggest collectToList the moment you press the 'c' in stream.c...
For the other one you have to add an import and invoke a static method (yes, I know most IDEs make this easy as well).
But we are still talking about something like 1 second vs 5 seconds.
Not much in it then :-) perhaps IDEs might even be enhanced to suggest collect(toList()) upfront?

(After writing the above I saw your email with a link to IntelliJ Collector templates, neat!)
...
Second, s.collect(toList()) is just not natural for most users. Sure, readers on this list understand how the collect method works. But I don't think many novice/intermediate users will.
It's a trick that only has to be learnt once.
...
Third, yes, there are a lot of different collections that a stream can be integrated with. But we are talking about roughly 50 % of the usage.
And then someone wants Set, then Map etc. etc. We are also designing Stream for the future where perhaps we might have alternative collection types (not committing to that... :-)) and then someone wants that list type etc. etc.
...
Fourth, this actually has nothing to do with ease of use, but with performance.
Ah, so this is your actual motivation! masquerading in an ease-of-use disguise :-)

You raise a good point about the current limitations of Collectors not being able to leverage more information of the pipeline, such as known size.
...
But I have a very fast stream implementation where I would like to provide a fast (and easy) way to return the stream elements as a list. This is mainly in situations where I know the number of elements in the result (which is quite often if you don't use filters). By having a toList() method that I can implement myself, I can avoid the ArrayList resizing in s.collect(toList()).
This is actually also why I would prefer if it was called toList() and not collectToList(), as I think it is an implementation detail how the list generation is done.
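To make the pre-sizing point concrete, here is a rough sketch using only public API. The helper toListPresized is hypothetical (it is not part of java.util.stream); it asks the stream's spliterator for an exact size and allocates the ArrayList up front when one is reported. It traverses sequentially, so it says nothing about the parallel case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Stream;

public class PresizedToList {
    // Hypothetical helper: pre-size the result when the pipeline knows its size.
    static <T> List<T> toListPresized(Stream<T> stream) {
        Spliterator<T> sp = stream.spliterator();   // terminal operation
        long size = sp.getExactSizeIfKnown();       // -1 when the size is not known
        List<T> result = (size >= 0 && size <= Integer.MAX_VALUE - 8)
                ? new ArrayList<T>((int) size)      // no resizing while filling
                : new ArrayList<T>();               // fall back to default growth
        sp.forEachRemaining(result::add);
        return result;
    }

    public static void main(String[] args) {
        List<Integer> source = Arrays.asList(1, 2, 3, 4);
        // map keeps the element count; filter would make the size unknown.
        System.out.println(toListPresized(source.stream().map(i -> i * 10)));
        System.out.println(toListPresized(source.stream().filter(i -> i % 2 == 0)));
    }
}
```

The result is the same either way; only the allocation strategy differs depending on whether a size is reported.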
This does not mean we need to expose a Stream.toList. How about we improve Collectors instead? [*]

For parallel execution we need to work out how better to merge lists and maps. Propagating size information at the root and for each leaf is important for pre-sizing. We can use growable arrays to avoid resizing.
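As an illustration of where the merge cost comes from, here is a sketch of a list collector written against the public Collector.of factory. The name mergingToList is made up; the shape (a fresh ArrayList per leaf, addAll in the combiner) is an assumption about what a straightforward list collector looks like. Nothing here receives size information, so each leaf list grows by resizing and each merge copies elements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.IntStream;

public class MergingCollectorSketch {
    // Hypothetical collector: one growable list per leaf, merged pairwise.
    static <T> Collector<T, List<T>, List<T>> mergingToList() {
        return Collector.of(
                () -> new ArrayList<T>(),            // supplier: no size hint available
                (list, e) -> list.add(e),            // accumulator: may trigger resizing
                (left, right) -> {                   // combiner: copies right into left
                    left.addAll(right);
                    return left;
                });
    }

    public static void main(String[] args) {
        List<Integer> result = IntStream.range(0, 1000)
                .boxed()
                .parallel()
                .collect(mergingToList());
        System.out.println(result.size());           // prints 1000
    }
}
```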

--

Separately, the best way to expose a mechanism where you want to do your own stuff is to provide an operation SPI. We are just not ready to do that for 9 given what post 9 will bring.

Paul.

[*] We could of course take advantage of internal details, that might be a reasonable short-cut for 9, but it would be nice to solve this via a public API.