Additional method on Stream

Paul Sandoz paul.sandoz at oracle.com
Tue Apr 28 09:06:36 UTC 2015


On Apr 28, 2015, at 10:22 AM, Kasper Nielsen <kasperni at gmail.com> wrote:

> On Tue, Apr 28, 2015 at 9:16 AM, Paul Sandoz <paul.sandoz at oracle.com> wrote:
> On Apr 27, 2015, at 10:34 PM, Kasper Nielsen <kasperni at gmail.com> wrote:
> > The other default function I would like to see is stream.toList() (I can
> > live with collectToList) which is short for s.collect(Collectors.toList()).
> > 50 % of my terminal functions are s.collect(Collectors.toList()).
> 
> Can you live with a static import and:
> 
>   s.collect(toList())
> 
> ? which is rather close to "collectToList".
> 
> When designing j.u.s.Stream we made a conscious decision not to bind it to j.u collection types. A Stream could be integrated with other forms of collections (e.g. GS Collections).
> 
> First, if you are using any kind of modern IDE you have some kind of intelligent completion which will suggest collectToList the moment you press the 'c' in stream.c...
> With the other one you have to add an additional import and invoke a static method (yes, I know most IDEs make this easy as well).
> But we are still talking about something like 1 second vs 5 seconds.
> 

Not much in it then :-) perhaps IDEs might even be enhanced to suggest collect(toList()) upfront?

(After writing the above I saw your email with a link to IntelliJ Collector templates, neat!)
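
For readers following the thread, here is a minimal sketch of the static-import form discussed above. The class and method names are hypothetical and chosen only for illustration; the only library calls used are Stream.map, Stream.collect and Collectors.toList.

```java
import static java.util.stream.Collectors.toList;

import java.util.List;
import java.util.stream.Stream;

public class StaticImportToList {
    // Hypothetical helper: upper-cases each element and gathers the results
    // into a List via the statically imported Collectors.toList().
    static List<String> upperCased(Stream<String> s) {
        return s.map(String::toUpperCase).collect(toList());
    }

    public static void main(String[] args) {
        // prints [A, B, C]
        System.out.println(upperCased(Stream.of("a", "b", "c")));
    }
}
```

Without the static import the terminal call reads s.collect(Collectors.toList()), which is the longer form the thread is comparing against.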


> Second, s.collect(toList()) is just not natural for most users. Sure, readers on this list understand how the collect method works. But I don't think many novice/intermediate users will.
> 

It's a trick that only has to be learnt once.


> Third, Yes there are a lot of different collections that a stream can be integrated with. But we are talking about roughly 50 % of the usage.
> 

And then someone wants Set, then Map etc. etc. We are also designing Stream for the future where perhaps we might have alternative collection types (not committing to that... :-)) and then someone wants that list type etc. etc. 
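
To illustrate the point that one collect(...) entry point already covers the other targets, here is a small sketch using the existing Collectors.toSet, Collectors.toMap and Collectors.toCollection factories. The class and method names are hypothetical; ArrayDeque simply stands in for "some other collection type".

```java
import static java.util.stream.Collectors.toCollection;
import static java.util.stream.Collectors.toMap;
import static java.util.stream.Collectors.toSet;

import java.util.ArrayDeque;
import java.util.Map;
import java.util.Set;
import java.util.stream.Stream;

public class OtherCollectionTargets {
    // Gathers the elements into a Set (duplicates collapse).
    static Set<String> asSet(Stream<String> s) {
        return s.collect(toSet());
    }

    // Maps each word to its length. Note that this two-argument toMap throws
    // IllegalStateException if the stream contains duplicate keys.
    static Map<String, Integer> lengths(Stream<String> s) {
        return s.collect(toMap(w -> w, String::length));
    }

    // Any Collection with a no-arg constructor can be targeted the same way.
    static ArrayDeque<String> asDeque(Stream<String> s) {
        return s.collect(toCollection(ArrayDeque::new));
    }
}
```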


> Fourth, this actually has nothing to do with ease of use, but with performance.

Ah, so this is your actual motivation, masquerading in an ease-of-use disguise :-)

You raise a good point about the current limitation that Collectors cannot leverage more information about the pipeline, such as a known size.


> But I have a very fast stream implementation where I would like to provide a fast (and easy) way to return the stream elements as a list. This is mainly in situations where I know the number of elements in the result (which is quite often the case if you don't use filters). By having a toList() method that I can implement myself, I can avoid the ArrayList resizing in s.collect(toList()).
> This is actually also why I would prefer it to be called toList() and not collectToList(), as I think how the list is generated is an implementation detail.
> 

This does not mean we need to expose a Stream.toList.  How about we improve Collectors instead? [*]. 

For parallel execution we need to work out how better to merge lists and maps. Propagating size information at the root and for each leaf is important for pre-sizing. We can use growable arrays to avoid resizing.
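
As a rough illustration of what pre-sizing buys in the sequential case, the sketch below passes a caller-supplied size to Collectors.toCollection so the backing ArrayList is allocated once. This is only a workaround under the assumption that the caller already knows the element count; it is not the Collectors improvement being proposed, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PresizedCollect {
    // Collects into an ArrayList created with the given initial capacity, so
    // no resizing occurs as long as the stream yields at most expectedSize
    // elements. In a parallel stream the supplier is invoked once per leaf,
    // so every leaf would allocate the full capacity; treat this as a
    // sequential-only sketch.
    static <T> List<T> toListWithCapacity(Stream<T> s, int expectedSize) {
        return s.collect(Collectors.toCollection(() -> new ArrayList<T>(expectedSize)));
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("a", "bb", "ccc");
        // map() preserves the element count, so the source size is exact here.
        List<Integer> lengths =
            toListWithCapacity(source.stream().map(String::length), source.size());
        System.out.println(lengths); // prints [1, 2, 3]
    }
}
```

The caller has to thread the size through by hand, which is exactly the information a collector cannot currently obtain from the pipeline itself.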

--

Separately, the best way to expose a mechanism where you want to do your own stuff is to provide an operation SPI. We are just not ready to do that for 9 given what post 9 will bring.

Paul.

[*] We could of course take advantage of internal details, that might be a reasonable short-cut for 9, but it would be nice to solve this via a public API.


