New convenience methods on Stream
Stuart Marks
stuart.marks at oracle.com
Wed May 5 06:12:10 UTC 2021
Hi Don,
When evaluating new APIs I always find it helpful and educational to look for cases
in real code where they might be applied, and what effect the API has at the call
site. The search need not be exhaustive, but it's probably sufficient to find a
number of representative examples. This does take some effort, though. For now I'll
take a look at some examples where your first item can be applied:
> 1. Stream contents into a mutable collection created by a Supplier.
>
> default <R extends Collection<T>> R toCollection(Supplier<R> supplier)
> {
> return this.collect(Collectors.toCollection(supplier));
> }
>
> Usage Examples:
>
> HashSet<String> set = stream.toCollection(HashSet::new);
> TreeSet<String> sortedSet = stream.toCollection(TreeSet::new);
> ArrayDeque<String> deque = stream.toCollection(ArrayDeque::new);
Since I have the JDK handy I searched it for 'toCollection('. There are around 60
hits -- but note that 2/3 of these are in java.sql.rowset and refer to an unrelated
API. Some are in comments, and some are the implementation of
Collectors.toCollection itself, which leaves just 14 cases. Let's look at a few.
# src/jdk.compiler/share/classes/com/sun/tools/javac/api/JavacTaskPool.java:164
List<String> opts =
StreamSupport.stream(options.spliterator(), false)
.collect(Collectors.toCollection(ArrayList::new));
Using the proposed API here would result in:
List<String> opts =
StreamSupport.stream(options.spliterator(), false)
.toCollection(ArrayList::new));
This makes the code a little bit nicer. A static import would also haved helped,
though not quite as much as the new API:
List<String> opts =
StreamSupport.stream(options.spliterator(), false)
.collect(toCollection(ArrayList::new)));
I also note that after some analysis of the usage of the resulting List, it's never
modified -- indeed, it's used as the key of a Map -- so this could be replaced with
the recently-added Stream::toList.
# src/jdk.compiler/share/classes/com/sun/tools/javac/main/Option.java:381
Set<String> platforms = StreamSupport.stream(providers.spliterator(),
false)
.flatMap(provider ->
StreamSupport.stream(provider.getSupportedPlatformNames()
.spliterator(),
false))
.collect(Collectors.toCollection(LinkedHashSet :: new));
(Sorry for line wrapping. This file has some long lines.) Again, using the proposal
API would shorten things a bit, but it doesn't really make much difference within
the overall context:
Set<String> platforms = StreamSupport.stream(providers.spliterator(),
false)
.flatMap(provider ->
StreamSupport.stream(provider.getSupportedPlatformNames()
.spliterator(),
false))
.toCollection(LinkedHashSet :: new));
# src/java.logging/share/classes/java/util/logging/LogManager.java:2138
final Map<String, TreeSet<String>> loggerConfigs =
allKeys.collect(Collectors.groupingBy(ConfigProperty::getLoggerName,
TreeMap::new,
Collectors.toCollection(TreeSet::new)));
This is an interesting case, as the toCollection() method is being used to produce a
"downstream" collector passed to groupingBy(). Since the proposed API is on stream,
it can't be used here. There are a few other cases like this where toCollection is
used as a downstream collector, not as an argument to Stream::collect.
#
src/jdk.javadoc/share/classes/jdk/javadoc/internal/doclets/formats/html/HtmlConfiguration.java:377
public List<DocPath> getAdditionalStylesheets() {
return options.additionalStylesheets().stream()
.map(ssf -> DocFile.createFileForInput(this, ssf))
.map(file -> DocPath.create(file.getName()))
.collect(Collectors.toCollection(ArrayList::new));
}
This is another place where the proposed API can be used straightforwardly:
public List<DocPath> getAdditionalStylesheets() {
return options.additionalStylesheets().stream()
.map(ssf -> DocFile.createFileForInput(this, ssf))
.map(file -> DocPath.create(file.getName()))
.toCollection(ArrayList::new));
}
#
src/jdk.javadoc/share/classes/jdk/javadoc/internal/doclets/toolkit/util/IndexBuilder.java:220
This is a slightly different case, as it uses a lambda to pass a comparator to a
constructor instead of using a constructor reference. Before:
public SortedSet<IndexItem> getItems(DocTree.Kind kind) {
Objects.requireNonNull(kind);
return itemsByCategory.getOrDefault(IndexItem.Category.TAGS,
Collections.emptySortedSet()).stream()
.filter(i -> i.isKind(kind))
.collect(Collectors.toCollection(() -> new TreeSet<>(mainComparator)));
}
After:
public SortedSet<IndexItem> getItems(DocTree.Kind kind) {
Objects.requireNonNull(kind);
return itemsByCategory.getOrDefault(IndexItem.Category.TAGS,
Collections.emptySortedSet()).stream()
.filter(i -> i.isKind(kind))
.toCollection(() -> new TreeSet<>(mainComparator)));
}
*****
There are a few other cases in the JDK but they don't seem to lead to any new insights.
Some observations.
- Using this API saves 19 characters compared to Collectors::toCollection, but it
saves only eight characters compared to Collectors::toCollection with a static import.
- Using this API doesn't relieve the calling code of any burden of tedious or
error-prone code. The code it replaces is quite straightforward.
- There don't appear to be any opportunities for optimization. In order to handle
the parallel case, this pretty much is required to delegate to the collector. I
suppose the serial case could be handled specially, but it boils down to
constructing the destination and then calling add() on it repeatedly, which is
pretty much what the collector ends up doing in the serial case anyway.
- Cases seem to occur quite infrequently compared to others such as
Collectors::toList and Stream::toList.
- Some cases of Collectors::toCollection are used as "downstream" collectors, that
is, passed to other collectors instead of Stream::collect. This narrows the range of
possible uses of the API still further.
- There is a recurring pattern
Collectors.toCollection(ArrayList::new)
This is useful in place of Collectors::toList for places where the code wants to
guarantee the result to be an ArrayList. (Even though Collectors::toList returns an
ArrayList, it isn't specified to do so.) But there are cases (such as the one I
looked at above) where the return list isn't actually modified -- and indeed it
would be an error if it were modified -- so Stream::toList could be used just as
well for those.
- The JDK is not necessarily a representative code base, but frequencies here do
seem to mirror what I've seen in open source: Collectors::toCollection is much less
frequent than Collectors::toList.
- There doesn't appear to be any semantic difference between the proposed
Stream::toCollection and the existing Collectors::toCollection.
Based on these observations I'm having a hard time mustering much enthusiasm for
this API.
You might ask, hasn't the JDK added other convenience APIs? There have probably been
a few one-liners, but we are really trying to keep them to a minimum. Mainly,
convenience APIs are indeed convenient, but in many cases they add a lot of value in
other ways as well. Here are some examples.
- Stream::toList. We discussed this recently, so I won't restate everything.
Briefly, though, this can be used as a replacement for Collectors::toList, which is
used VERY frequently, it provides stronger semantics, and it's faster because it
avoids extra array allocation and copying.
- String::repeat. Repeating a String is a simple for-loop. However, if you look at
the implementation [1] there really is a lot going on here. It's a lot faster than
the straightforward code, because it peels off a few special cases, it uses a clever
doubling algorithm to call arraycopy a minimal number of times, and it deals with
things at the byte level, so it can create a String without any copying or any
codeset conversion. In addition to convenience, the value here is that it's much
faster than a simple for-loop that one might write instead. It also leverages the
JDK-internal String representation, which means that it does less allocation and
copying than other utility libraries would.
[1]
https://github.com/openjdk/jdk16/blob/master/src/java.base/share/classes/java/lang/String.java#L3560
- InputStream::transferTo. This is mostly a straightforward loop [2], but the
details are devilishly hard to get right. If you look at user code that does copying
like this, it usually gets some edge case wrong, for example, not handling partial
reads. The value provided here is not that it's faster than the code it would
replace, but that it relieves the caller of the responsibility of writing a loop
that's easy to get wrong (or to form a dependency on another library that has this
utility method).
[2]
https://github.com/openjdk/jdk16/blob/master/src/java.base/share/classes/java/io/InputStream.java#L777
Anyway, this is the sort of analysis and justification that I'd like to see for
convenience APIs. Such APIs need to be more than just a shorthand for something a
bit longer.
s'marks
More information about the core-libs-dev
mailing list