New convenience methods on Stream

Stuart Marks stuart.marks at oracle.com
Wed May 5 06:12:10 UTC 2021


Hi Don,

When evaluating new APIs I always find it helpful and educational to look for cases 
in real code where they might be applied, and what effect the API has at the call 
site. The search need not be exhaustive, but it's probably sufficient to find a 
number of representative examples. This does take some effort, though. For now I'll 
take a look at some examples where your first item can be applied:

> 1. Stream contents into a mutable collection created by a Supplier. 
> 
> default <R extends Collection<T>> R toCollection(Supplier<R> supplier)
> {
>     return this.collect(Collectors.toCollection(supplier));
> }
> 
> Usage Examples:
> 
> HashSet<String> set = stream.toCollection(HashSet::new);
> TreeSet<String> sortedSet = stream.toCollection(TreeSet::new);
> ArrayDeque<String> deque = stream.toCollection(ArrayDeque::new);
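Since java.util.stream.Stream can't be extended outside the JDK, one way to try the proposed method is a static helper with the same body. This is only a sketch: the class name ToCollectionSketch and passing the stream as an ordinary argument are assumptions for illustration. Only the one-line delegation to Collectors.toCollection comes from the proposal.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.HashSet;
import java.util.TreeSet;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for the proposed default method Stream::toCollection.
// User code can't add methods to java.util.stream.Stream, so the receiver
// is passed as the first argument instead.
final class ToCollectionSketch {
    private ToCollectionSketch() {}

    static <T, R extends Collection<T>> R toCollection(Stream<T> stream, Supplier<R> supplier) {
        // Same body as the proposal: delegate to the existing collector.
        return stream.collect(Collectors.toCollection(supplier));
    }

    public static void main(String[] args) {
        HashSet<String> set = toCollection(Stream.of("b", "a", "b"), HashSet::new);
        TreeSet<String> sortedSet = toCollection(Stream.of("b", "a", "c"), TreeSet::new);
        ArrayDeque<String> deque = toCollection(Stream.of("x", "y"), ArrayDeque::new);

        System.out.println(set.size());        // prints 2 (duplicate removed)
        System.out.println(sortedSet);         // prints [a, b, c]
        System.out.println(deque.peekFirst()); // prints x (encounter order kept)
    }
}
```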

Since I have the JDK handy I searched it for 'toCollection('. There are around 60 
hits -- but note that 2/3 of these are in java.sql.rowset and refer to an unrelated 
API. Some are in comments, and some are the implementation of 
Collectors.toCollection itself, which leaves just 14 cases. Let's look at a few.


# src/jdk.compiler/share/classes/com/sun/tools/javac/api/JavacTaskPool.java:164


         List<String> opts =
                 StreamSupport.stream(options.spliterator(), false)
                              .collect(Collectors.toCollection(ArrayList::new));

Using the proposed API here would result in:

         List<String> opts =
                 StreamSupport.stream(options.spliterator(), false)
                              .toCollection(ArrayList::new);

This makes the code a little bit nicer. A static import would also have helped, 
though not quite as much as the new API:

         List<String> opts =
                 StreamSupport.stream(options.spliterator(), false)
                              .collect(toCollection(ArrayList::new));

I also note that, after some analysis of how the resulting List is used, it turns out 
never to be modified -- indeed, it's used as the key of a Map -- so this could be 
replaced with the recently added Stream::toList.
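Here's a small sketch of why Stream::toList (JDK 16 and later) fits that use: the List it returns is unmodifiable, and like any List it has content-based equals() and hashCode(), so it behaves well as a map key. The helper optionsKey, the options values, and the cache map are hypothetical stand-ins for the JavacTaskPool code, not its actual names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.StreamSupport;

final class ToListAsKeySketch {
    private ToListAsKeySketch() {}

    // Hypothetical helper: turn an Iterable of options into a List usable as a
    // map key. Stream::toList returns an unmodifiable List, so the key can't be
    // changed after it's been inserted into a map.
    static List<String> optionsKey(Iterable<String> options) {
        return StreamSupport.stream(options.spliterator(), false).toList();
    }

    public static void main(String[] args) {
        Map<List<String>, String> cache = new HashMap<>();
        List<String> key = optionsKey(List.of("-g", "-parameters"));
        cache.put(key, "cached task");

        // Lists compare by content, so an equal list finds the same entry.
        System.out.println(cache.get(List.of("-g", "-parameters"))); // prints: cached task

        try {
            key.add("-Werror");
        } catch (UnsupportedOperationException e) {
            System.out.println("key is unmodifiable");
        }
    }
}
```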


# src/jdk.compiler/share/classes/com/sun/tools/javac/main/Option.java:381


             Set<String> platforms = StreamSupport.stream(providers.spliterator(), false)
                                                  .flatMap(provider -> StreamSupport.stream(provider.getSupportedPlatformNames()
                                                                                                    .spliterator(),
                                                                                            false))
                                                  .collect(Collectors.toCollection(LinkedHashSet :: new));

(Sorry about the line lengths; this file has some long lines.) Again, using the 
proposed API would shorten things a bit, but it doesn't really make much difference 
in the overall context:

             Set<String> platforms = StreamSupport.stream(providers.spliterator(), false)
                                                  .flatMap(provider -> StreamSupport.stream(provider.getSupportedPlatformNames()
                                                                                                    .spliterator(),
                                                                                            false))
                                                  .toCollection(LinkedHashSet :: new);
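The LinkedHashSet in that example is what gives the result both de-duplication and first-seen ordering across providers. A minimal sketch, assuming a hypothetical Provider record in place of the real platform provider type (whose getSupportedPlatformNames() is treated here as returning an Iterable, as the spliterator() call suggests):

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

final class PlatformNamesSketch {
    private PlatformNamesSketch() {}

    // Hypothetical stand-in for a provider that reports platform names.
    record Provider(Iterable<String> supportedPlatformNames) {}

    static Set<String> platforms(Iterable<Provider> providers) {
        return StreamSupport.stream(providers.spliterator(), false)
                            .flatMap(p -> StreamSupport.stream(p.supportedPlatformNames().spliterator(), false))
                            // LinkedHashSet drops duplicates but keeps the order
                            // in which each name was first seen.
                            .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args) {
        List<Provider> providers = List.of(
                new Provider(List.of("11", "17")),
                new Provider(List.of("17", "8")));
        System.out.println(platforms(providers)); // prints [11, 17, 8]
    }
}
```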


# src/java.logging/share/classes/java/util/logging/LogManager.java:2138


             final Map<String, TreeSet<String>> loggerConfigs =
                 allKeys.collect(Collectors.groupingBy(ConfigProperty::getLoggerName,
                                                       TreeMap::new,
                                                       Collectors.toCollection(TreeSet::new)));

This is an interesting case, as the toCollection() method is being used to produce a 
"downstream" collector passed to groupingBy(). Since the proposed API is on Stream, 
it can't be used here. There are a few other cases like this where toCollection is 
used as a downstream collector, not as an argument to Stream::collect.
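To make the distinction concrete: groupingBy's downstream argument must be a Collector that is applied to each group, so a method on Stream has nothing to plug in there. This sketch assumes hypothetical "loggerName.property" strings in place of LogManager's ConfigProperty objects.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class DownstreamSketch {
    private DownstreamSketch() {}

    // Hypothetical key format "loggerName.property"; the logger name is
    // everything before the last dot.
    static String loggerName(String key) {
        return key.substring(0, key.lastIndexOf('.'));
    }

    static Map<String, TreeSet<String>> group(Stream<String> keys) {
        // toCollection(TreeSet::new) is a Collector run once per group, so a
        // Stream-level toCollection method could not stand in for it.
        return keys.collect(Collectors.groupingBy(DownstreamSketch::loggerName,
                                                  TreeMap::new,
                                                  Collectors.toCollection(TreeSet::new)));
    }

    public static void main(String[] args) {
        Map<String, TreeSet<String>> configs =
                group(Stream.of("a.b.level", "a.b.handlers", "c.level"));
        System.out.println(configs); // prints {a.b=[a.b.handlers, a.b.level], c=[c.level]}
    }
}
```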


# src/jdk.javadoc/share/classes/jdk/javadoc/internal/doclets/formats/html/HtmlConfiguration.java:377


     public List<DocPath> getAdditionalStylesheets() {
         return options.additionalStylesheets().stream()
                 .map(ssf -> DocFile.createFileForInput(this, ssf))
                 .map(file -> DocPath.create(file.getName()))
                 .collect(Collectors.toCollection(ArrayList::new));
     }

This is another place where the proposed API can be used straightforwardly:

     public List<DocPath> getAdditionalStylesheets() {
         return options.additionalStylesheets().stream()
                 .map(ssf -> DocFile.createFileForInput(this, ssf))
                 .map(file -> DocPath.create(file.getName()))
                 .toCollection(ArrayList::new);
     }


# src/jdk.javadoc/share/classes/jdk/javadoc/internal/doclets/toolkit/util/IndexBuilder.java:220


This is a slightly different case, as it uses a lambda to pass a comparator to a 
constructor instead of using a constructor reference. Before:

     public SortedSet<IndexItem> getItems(DocTree.Kind kind) {
         Objects.requireNonNull(kind);
         return itemsByCategory.getOrDefault(IndexItem.Category.TAGS, Collections.emptySortedSet()).stream()
                 .filter(i -> i.isKind(kind))
                 .collect(Collectors.toCollection(() -> new TreeSet<>(mainComparator)));
     }

After:

     public SortedSet<IndexItem> getItems(DocTree.Kind kind) {
         Objects.requireNonNull(kind);
         return itemsByCategory.getOrDefault(IndexItem.Category.TAGS, Collections.emptySortedSet()).stream()
                 .filter(i -> i.isKind(kind))
                 .toCollection(() -> new TreeSet<>(mainComparator));
     }
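The reason for the lambda is that Supplier takes no arguments: TreeSet::new in that position would select the no-arg constructor and silently drop the comparator, so the supplier has to capture it. A sketch, assuming a hypothetical MAIN_COMPARATOR (case-insensitive, then natural order) in place of IndexBuilder's mainComparator and a hypothetical "skip names starting with _" filter:

```java
import java.util.Comparator;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.stream.Collectors;

final class ComparatorSupplierSketch {
    private ComparatorSupplierSketch() {}

    // Hypothetical stand-in for IndexBuilder's mainComparator.
    static final Comparator<String> MAIN_COMPARATOR =
            String.CASE_INSENSITIVE_ORDER.thenComparing(Comparator.naturalOrder());

    static SortedSet<String> visible(List<String> items) {
        return items.stream()
                    .filter(s -> !s.startsWith("_"))
                    // The lambda captures the comparator; a constructor reference
                    // used as a Supplier could only call new TreeSet<>().
                    .collect(Collectors.toCollection(() -> new TreeSet<>(MAIN_COMPARATOR)));
    }

    public static void main(String[] args) {
        // Natural order would give [Beta, alpha]; the comparator gives [alpha, Beta].
        System.out.println(visible(List.of("Beta", "alpha", "_hidden")));
    }
}
```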


*****


There are a few other cases in the JDK but they don't seem to lead to any new insights.

Some observations.

  - Using this API saves 20 characters compared to Collectors::toCollection, but it 
saves only nine characters compared to Collectors::toCollection with a static import.

  - Using this API doesn't relieve the calling code of any burden of tedious or 
error-prone code. The code it replaces is quite straightforward.

  - There don't appear to be any opportunities for optimization. To handle the 
parallel case, the implementation pretty much has to delegate to the collector. I 
suppose the serial case could be handled specially, but it boils down to 
constructing the destination and then calling add() on it repeatedly, which is 
pretty much what the collector ends up doing in the serial case anyway.

  - Uses of Collectors::toCollection seem to occur quite infrequently compared to 
alternatives such as Collectors::toList and Stream::toList.

  - Some cases of Collectors::toCollection are used as "downstream" collectors, that 
is, passed to other collectors instead of Stream::collect. This narrows the range of 
possible uses of the API still further.

  - There is a recurring pattern

      Collectors.toCollection(ArrayList::new)

This is useful in place of Collectors::toList where the code needs a result that is 
guaranteed to be an ArrayList. (Even though Collectors::toList returns an ArrayList, 
it isn't specified to do so.) But there are cases (such as the one I looked at above) 
where the returned list isn't actually modified -- and indeed it would be an error if 
it were modified -- so Stream::toList could be used just as well for those.

  - The JDK is not necessarily a representative code base, but frequencies here do 
seem to mirror what I've seen in open source: Collectors::toCollection is much less 
frequent than Collectors::toList.

  - There doesn't appear to be any semantic difference between the proposed 
Stream::toCollection and the existing Collectors::toCollection.
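The points about optimization and semantics can be checked with a short sketch. The two helpers below are hypothetical: viaCollector delegates to the collector as the proposal does, while viaAdd is an imagined serial special case that constructs the destination and calls add() for each element. Both produce the same result, and the collector version also handles a parallel stream by giving each thread its own container and merging with addAll().

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

final class SerialVsCollectorSketch {
    private SerialVsCollectorSketch() {}

    // Delegating version, as in the proposal. Collectors.toCollection uses
    // add() to accumulate and addAll() to combine partial results.
    static <T, R extends Collection<T>> R viaCollector(Stream<T> s, Supplier<R> supplier) {
        return s.collect(Collectors.toCollection(supplier));
    }

    // Hypothetical serial special case: build the destination, then add()
    // each element in encounter order. forEachOrdered runs the actions one
    // after another, so this stays correct but gains nothing over the collector.
    static <T, R extends Collection<T>> R viaAdd(Stream<T> s, Supplier<R> supplier) {
        R result = supplier.get();
        s.forEachOrdered(result::add);
        return result;
    }

    public static void main(String[] args) {
        List<Integer> serialAdd = viaAdd(IntStream.range(0, 1_000).boxed(), ArrayList::new);
        List<Integer> serialCollect = viaCollector(IntStream.range(0, 1_000).boxed(), ArrayList::new);
        List<Integer> parallelCollect =
                viaCollector(IntStream.range(0, 1_000).boxed().parallel(), ArrayList::new);
        System.out.println(serialAdd.equals(serialCollect));   // true
        System.out.println(serialAdd.equals(parallelCollect)); // true: encounter order kept
    }
}
```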

Based on these observations I'm having a hard time mustering much enthusiasm for 
this API.

You might ask, hasn't the JDK added other convenience APIs? There have probably been 
a few one-liners, but we really are trying to keep them to a minimum. The convenience 
APIs we have added are indeed convenient, but in many cases they also add a lot of 
value in other ways. Here are some examples.

  - Stream::toList. We discussed this recently, so I won't restate everything. 
Briefly, though: it can replace Collectors::toList, which is used VERY frequently; it 
provides stronger semantics (the returned List is unmodifiable); and it's faster 
because it avoids extra array allocation and copying.

  - String::repeat. Repeating a String takes only a simple for-loop. However, if you 
look at the implementation [1], there really is a lot going on. It peels off a few 
special cases, it uses a clever doubling algorithm to call arraycopy a minimal number 
of times, and it deals with things at the byte level, so it can create a String 
without any copying or any codeset conversion. In addition to convenience, the value 
here is that it's much faster than a simple for-loop that one might write instead. It 
also leverages the JDK-internal String representation, which means that it does less 
allocation and copying than other utility libraries would.

[1] https://github.com/openjdk/jdk16/blob/master/src/java.base/share/classes/java/lang/String.java#L3560

  - InputStream::transferTo. This is mostly a straightforward loop [2], but the 
details are devilishly hard to get right. If you look at user code that does copying 
like this, it usually gets some edge case wrong, for example, not handling partial 
reads. The value provided here is not that it's faster than the code it would 
replace, but that it relieves the caller of the responsibility of writing a loop 
that's easy to get wrong (or of taking on a dependency on another library that has 
this utility method).

[2] https://github.com/openjdk/jdk16/blob/master/src/java.base/share/classes/java/io/InputStream.java#L777
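To make the two implementation points concrete, here is a plain-Java sketch of each idea; neither is the JDK's actual code. repeatByDoubling is a hypothetical, simplified illustration of the doubling strategy: it works on a char[] rather than the JDK-internal byte representation, so unlike String::repeat it still copies once in the String constructor. copyAll is a hypothetical copy loop that handles partial reads by writing only the count that read() returns, and TrickleInputStream is a hypothetical test stream that returns at most 3 bytes per read. String::repeat and InputStream::transferTo are the real JDK methods, used only for comparison.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

final class ConvenienceInternalsSketch {
    private ConvenienceInternalsSketch() {}

    // Doubling: copy the string once, then copy the filled prefix onto the
    // remainder, so the number of arraycopy calls grows with log2(count).
    static String repeatByDoubling(String s, int count) {
        if (count < 0) throw new IllegalArgumentException("count is negative: " + count);
        if (count == 0 || s.isEmpty()) return "";
        if (count == 1) return s;
        int len = s.length();
        char[] out = new char[Math.multiplyExact(len, count)];
        s.getChars(0, len, out, 0);
        int filled = len;
        while (filled < out.length) {
            int n = Math.min(filled, out.length - filled);
            System.arraycopy(out, 0, out, filled, n);
            filled += n;
        }
        return new String(out);
    }

    // read() may return fewer bytes than requested, so write only what was read.
    static long copyAll(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer, 0, buffer.length)) >= 0) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    // Hypothetical stream that returns at most 3 bytes per read(byte[], int, int).
    static final class TrickleInputStream extends InputStream {
        private final InputStream in;
        TrickleInputStream(InputStream in) { this.in = in; }
        @Override public int read() throws IOException { return in.read(); }
        @Override public int read(byte[] b, int off, int len) throws IOException {
            return in.read(b, off, Math.min(len, 3));
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(repeatByDoubling("ab", 5).equals("ab".repeat(5))); // true

        byte[] data = new byte[20_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = copyAll(new TrickleInputStream(new ByteArrayInputStream(data)), out);
        System.out.println(copied);                                 // 20000
        System.out.println(Arrays.equals(out.toByteArray(), data)); // true
    }
}
```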

Anyway, this is the sort of analysis and justification that I'd like to see for 
convenience APIs. Such APIs need to be more than just a shorthand for something a 
bit longer.

s'marks



More information about the core-libs-dev mailing list