RFR: 8180352: Add Stream.toList() method
Brian Goetz
brian.goetz at oracle.com
Fri Feb 5 15:27:49 UTC 2021
> I have been reading previous threads, the original bug request, and exploring the javadoc and implementation of toList() on Stream in JDK 16. I don’t want to waste time rehashing previous discussions, but I want to understand and prioritize the motivation of this change, and propose what I believe is a safer alternative name for this method based on the current implementation: Stream.toUnmodifiableList().
Big +1 to "let's not rehash previous discussions, but help us understand
the motivation." Stewarding the core libraries is a complex task, and
there are rarely hard-and-fast rules for doing the Right Thing.
Your question seems to have two main aspects:
- Why this method, why not others, and why now
- Why take such a strong anti-mutability position with this method
The desire for a Stream::toList method has a long history; when we first
did streams, it was one of the first convenience methods to be
"requested". We resisted then, for good reasons, but we knew this saga
was not over.
"Convenience" methods are a constant challenge in the JDK. On the one
hand, they are, well, convenient, and we want Java to be easy and
pleasant to program in. On the other, the number of potentially-useful
imaginable convenience methods is infinite, and the widespread
perception is that they are so easy, that all that is needed is for
someone to propose the idea. (The (admittedly soft) criteria we use for
judging whether a convenience method meets the bar make for an
interesting discussion, which we can have separately.)
There are basically two stable points with respect to convenience
methods in API design; zero tolerance, and "don't worry, be happy". In
the former, the methods of an API are like a basis (ideally, an
orthonormal one) of a vector space; the minimum number of API points
from which you can derive all possible usages. At the other extreme,
every reasonable combination of methods gets its own special form of
expression. Of course, both are extremes (Stream::count and
IntStream::sum are conveniences for reduce, and even Haskell's Monad has
multiple ways to represent bind), but APIs tend to align themselves in
one direction or another. And, as the JDK APIs go, Streams treats
sparsity and orthogonality as virtues to be striven for.
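To make the "conveniences for reduce" remark concrete, here is a minimal sketch of how count and sum can be derived from reduce. The class and method names (ReduceConveniences, countViaReduce, sumViaReduce) are hypothetical and exist only for illustration; this is not a claim about how the JDK implements either method.

```java
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Hypothetical illustration class; not part of the JDK.
class ReduceConveniences {
    // count() expressed as map-to-1 followed by a summing reduce.
    static long countViaReduce(Stream<?> stream) {
        return stream.mapToLong(e -> 1L).reduce(0L, Long::sum);
    }

    // sum() expressed as a reduce with identity 0.
    static int sumViaReduce(IntStream stream) {
        return stream.reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        System.out.println(countViaReduce(Stream.of("a", "b", "c")));
        System.out.println(sumViaReduce(IntStream.rangeClosed(1, 4)));
    }
}
```

Both helpers return what Stream::count and IntStream::sum would for the same input, which is the sense in which those methods are conveniences rather than new basis elements.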
Eclipse Collections chooses a different (and also valid!) philosophy:
completeness, and it walks the walk. (Having 81 (template-generated)
implementations of HashMap is proof.) Similarly, Tagir's StreamEx is an
example of an extension to Stream that takes the other approach. And
both are great! But also, they are not how the JDK rolls. Which is
fine; it's a big ecosystem, and there's room for multiple philosophies,
and each can find its fans and detractors.
The calls for a convenience for Stream::toList have come pretty much
continuously since we first resisted it (but, we knew even then that if
we had a lifetime budget for just one convenience method, it would end
up being toList.) We knew then that there would be questions to ask
about what the ideal dial settings would be for toList, and were not yet
ready to confront the question, nor did we want to add fuel to the
demands for more convenience methods ("No toSet? Inconsistent!")
When an API is new, and all things are possible, we tend to be in
"imagine everything we could put into it" mode, and streams was no
different. It is wise to resist this temptation -- and maybe even
over-rotate in the other direction -- to allow for some time for the
spirit of what you've built to make itself clear; even creators are not
always immediately clear on the nuances of their creation. So we tried
hard to resist the calls for unnecessary methods, knowing that they
could always be added, but not taken away, and also, allowing for the
true gaps to emerge from usage. (The first method to be added,
takeWhile(), was the very opposite of a convenience; it represented a
reasonable use case that the original design didn't support.)
So, why toList now? Well, a number of reasons. Collecting to a list is
one of the most common terminal operations, so any small irritant (like
a clumsy locution) adds up. And, as has been pointed out, it can be
more efficient if it is brought into the stream core rather than held at
arm's length through Collector. So if we're going to compromise our
principles in one place, after thinking about it for a long time, this
seemed a worthy candidate. (And still, we hesitated, because we knew it
would be firing the starting gun for the "But where's toSet?" arguments.)
So yes, there are lots of good reasons to continue to Just Say No to
conveniences, but, there are also reasonable times to make exceptions --
especially when it is not purely about convenience. And, data suggests
that toList is 5-10x more popular than the next most popular collector,
so there's a clear argument to say that toList is pretty special, and we
can stop there.
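For readers who have not compared the two spellings side by side, a small sketch follows. It assumes JDK 16 or later for Stream.toList(); the class name ToListSpellings and its methods are made up for this example.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration class; requires JDK 16+ for Stream.toList().
class ToListSpellings {
    // The long-standing locution: go through a Collector.
    static List<String> viaCollector() {
        return Stream.of("a", "b", "c")
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    // The new terminal operation on Stream itself.
    static List<String> viaToList() {
        return Stream.of("a", "b", "c")
                .map(String::toUpperCase)
                .toList();
    }

    public static void main(String[] args) {
        // Same elements in the same order, so List.equals reports true.
        System.out.println(viaCollector().equals(viaToList()));
    }
}
```

The two results hold the same elements in the same order; what differs is the kind of List each call is permitted to hand back.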
> List is a mutable interface.
This is true to an extent (though even the specification of List makes
it clear the mutative methods are strictly optional), but even if it
were absolutely true, I am still not sure how relevant it is to what
streams should do. When I wrote Collectors::toList, ArrayList was
indeed the obvious default implementation choice -- but it was also
obviously not a very good choice. We didn't have an efficient
unmodifiable collection at the time, and wrapping with unmodifiableList
seemed like taxing a lot of well-behaved users for the would-be sins of
the few. But if we had efficient unmodifiable collections then, I would
absolutely, positively have made that choice.
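As a rough picture of the wrapping option that was on the table, a Java 8 era sketch might look like the following. The class name WrappedToList and its methods are hypothetical; the point is only that every caller pays for the extra wrapper, whether or not they would ever have mutated the list.

```java
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration class; uses only Java 8 APIs.
class WrappedToList {
    // Collect to a list, then wrap it in an unmodifiable view.
    static List<String> names() {
        return Stream.of("ann", "bob")
                .collect(Collectors.collectingAndThen(
                        Collectors.toList(),
                        Collections::unmodifiableList));
    }

    // Returns true if the wrapper rejects mutation, as expected.
    static boolean rejectsAdd() {
        try {
            names().add("cat");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(names());
        System.out.println(rejectsAdd());
    }
}
```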
Streams is an API that takes functional principles to heart, sometimes
even in ways that are uncomfortable to Java developers. (For example, it
imposes constraints on the lambdas we pass to its methods, which are the
Java analogues of purity and side-effect freedom -- which are not
necessarily familiar constraints.) Data structures are about managing
and organizing data in memory, but streams are about capturing and
composing behavior, not data. (Obviously, streams consume and produce
data at their extreme points, but they try to make the fewest possible
assumptions about the form that data takes.) Where Stream meets List,
Stream is allowed to have an opinion about what kinds of lists it likes
better, and an unmodifiable list seems far more in the spirit of
Streams. And of course, collect(toCollection(f)) lets you collect to
whatever sort of collection you like.
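A short sketch of that escape hatch, with the supplier choosing the concrete collection. The class name ToCollectionSketch and the sample data are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration class.
class ToCollectionSketch {
    // The caller asks for an ArrayList and gets exactly that type.
    static ArrayList<String> asArrayList() {
        return Stream.of("c", "a", "b")
                .collect(Collectors.toCollection(ArrayList::new));
    }

    // The same pipeline shape, collected into a sorted, duplicate-free set.
    static TreeSet<String> asTreeSet() {
        return Stream.of("c", "a", "b", "a")
                .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        System.out.println(asArrayList());
        System.out.println(asTreeSet());
    }
}
```

Because the static return type is the concrete class, nothing about the result is left to the library's discretion.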
> A convention was established in 2014 with Collectors.toList() returning a mutable List (ArrayList).
I am having a hard time expressing just how much I disagree with the
sentiment behind this claim. I knew, when I was writing
Collectors::toList, that I would someday be having this discussion; my
best efforts to head this discussion off were memorialized in the
specification for Collectors::toList:
> There are no guarantees on the type, mutability, serializability, or
> thread-safety of the List returned; if more control over the
> returned List is required, use toCollection(Supplier)
> <https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collectors.html#toCollection-java.util.function.Supplier->.
I'd hope this would be interpreted as: "Dear developer who assumes that,
just because this returns ArrayList today, that somehow it is reasonable
to assume toList will always return an ArrayList: you are wrong, and I
hope you have the good sense to never make this argument out loud."
The reason that this "reasonable-seeming" assumption -- that what the
first implementation does is reasonable to take as normative, even when
the spec says otherwise -- is so toxic, is that it cripples the ability
of the platform to evolve. There's a reason we write specifications for
APIs; because implementations are intrinsically accidental and
contextual, and context changes out from under us. Even when writing
it, I was aware of the degree to which programmers would be
overwhelmingly tempted (despite what I hoped was their better judgment)
to count on the mutability of the returned list if that is what they
wanted. Saying `toCollection(ArrayList::new)`, which guarantees exactly
the characteristics such users would want, is Just Not That Hard. Sure,
saying toList() is easier, but the tradeoff there is you accept whatever
(compliant) List the library wants to serve up, and the library gets
some say in what that is, which might even vary from Tuesday to
Wednesday. A toList() method should try to balance the competing
concerns for what is the most reasonable default, and when the JDK
improves in a way that shifts that balance, or the context shifts, the
JDK should be able to improve with it.
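To illustrate the tradeoff, here is a sketch contrasting a caller who asks for mutability explicitly with one who accepts the default. It assumes JDK 16 or later, where Stream.toList() is specified to return an unmodifiable list; the class name MutabilityContrast and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration class; requires JDK 16+ for Stream.toList().
class MutabilityContrast {
    // Asking for ArrayList guarantees a mutable result.
    static List<Integer> mutableByRequest() {
        List<Integer> list = Stream.of(1, 2, 3)
                .collect(Collectors.toCollection(ArrayList::new));
        list.add(4); // fine: the caller asked for an ArrayList
        return list;
    }

    // Stream.toList() returns an unmodifiable list, so add() is rejected.
    static boolean toListRejectsAdd() {
        List<Integer> list = Stream.of(1, 2, 3).toList();
        try {
            list.add(4);
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(mutableByRequest());
        System.out.println(toListRejectsAdd());
    }
}
```

Code that needs to mutate the result says so at the collection site; code that does not gets the stricter default.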
So, this "establish a convention" claim is dangerous because it pushes
us towards the assumption that everything the JDK does, even the things
it *clearly specifies as implementation details that might change*, can
never change. Which means we would have to be *even more deliberate*
about anything we do, which means the rate at which we can move forward
is *even slower*.
But, you are making an even stronger claim than that! We're not trying
to change the implementation of Collectors::toList (which the spec makes
clear might happen). We're adding _another_ method with that
name, somewhere else. Which makes the above argument even more
dangerous -- essentially, it says "don't use a word in any API ever,
unless you are prepared to interpret it exactly the same way in all
future contexts." Surely, you see how this doesn't lead to a world we
want to live in.
So, what should `Stream::toList` mean? It should mean: return whatever
kind of list that Streams thinks is the best all-around default
implementation to use, based on the best understanding of what typical
users want. This involves balancing a lot of things, and that balance
can move over time.
We could call this toUnmodifiableList, and there's surely a certain
logic to that. But, this is likely to have unintended consequences.
First, the fact that the name is fussier makes it even less attractive
as a convenience, which is an argument to not do it at all. Users who
mostly count characters (which is sadly common) would be more likely to
continue to use collect(toList()), even if the new method is better in
multiple ways. If we have Stream::toUnmodifiableList, it is *even more
likely* to generate demands for other toXxxList conveniences. Worse, it
would likely generate arguments for a toList that works the same as
collect(toList()) -- which takes an existing "accidental mutability"
problem and guarantees that problem into the infinite future. It's bad
enough that collect(toList()) yields a mutable list -- it would be even
worse for Stream::toList to do the same. Most users don't need
mutability, and are better off not getting it if they don't need it;
they should ask for it if they need it.
> [1] Example usages of Eclipse Collections toList:
>
>
> // toList result is mutable for all of these usages with Eclipse Collections
> List list1 = mutableSet.toList();
> List list2 = mutableSet.asLazy().toList();
> List list3 = mutableSet.asParallel(Executors.newWorkStealingPool(), 10).toList();
> List list4 = mutableSet.stream().collect(Collectors.toList());
> List list5 = mutableSet.stream().collect(Collectors2.toList());
These are nice, but there's a subtle difference here that is salient.
Eclipse Collections attempts to integrate data management and behavioral
composition into a single library. This is a fine goal, but it does
mean that the behavioral methods have more responsibility to fit with
the data-management side of the story.
Streams took an almost opposite interpretation -- one reason NOT to do a
Stream::toList method was that it overly coupled Streams to
Collections. Laundering stream-to-List via a specific collector (which
is clearly more of a "plug in" than core functionality) seemed
preferable. We chose more of an arms-length relationship between Stream
and Collections. Again, different philosophies. (Adding Stream::toList
goes back on that a bit, after thinking about it for a bunch of years,
and deciding it was OK in this case.)
The primary cost here is a seeming "inconsistency", because people have
been able to convince themselves that `toList()` means "to ArrayList",
and now, there will be cases where that is not true. Given the choice
between catering to explicitly wrong assumptions (the spec even says
"don't make this assumption"!), and improving the platform over time, I
choose the latter. Consistency is a good baseline goal, but
consistencies can be taken to foolish extremes.