New convenience methods on Stream
    dfranken.jdk at gmail.com 
    dfranken.jdk at gmail.com
       
    Sun May  9 17:33:47 UTC 2021
    
    
  
>From my own limited experience, I've seen .collect(Supplier) often when
an explicitly mutable collection is needed, such as with ArrayList::new
or HashSet::new.
Even though you could in theory use Stream.toList() for the ArrayList
version, I don't think this is advisable as toList isn't guaranteed to
return an ArrayList. It may even return a different type at runtime
based on the input size for instance, that's all up to the
implementation and we should not care about it.
Given these use cases where explicitly modifiable collections are
wanted, it could be more useful to add methods such as
.toModifiableList() and .toModifiableSet() rather than a shorthand
collector.
But these methods really belong to the Collectors API.
Note that for instance for Collectors.toList "there are no guarantees
on the type, mutability, serializability, or thread-safety of the List
returned" and there exists a Collectors.toUnmodifiableList() which does
guarantee an unmodifiable list. Adding explicit .toModifiableList() and
.toModifiableSet() with their respective guarantees might be helpful as
we would create more symmetry between modifiable and unmodifiable
collectors.
Adding these methods also allows returning more specialized
implementations, but it also frees up any optimizations we might want
to do for `Collectors.toList()` because we can now offer a more
explicit choice between modifiable and unmodifiable variants.
Kind regards,
Dave Franken
On Tue, 2021-05-04 at 23:12 -0700, Stuart Marks wrote:
> Hi Don,
> 
> When evaluating new APIs I always find it helpful and educational to
> look for cases 
> in real code where they might be applied, and what effect the API has
> at the call 
> site. The search need not be exhaustive, but it's probably sufficient
> to find a 
> number of representative examples. This does take some effort,
> though. For now I'll 
> take a look at some examples where your first item can be applied:
> 
> > 
> > default <R extends Collection<T>> R toCollection(Supplier<R>
> > supplier)
> > {
> >     return this.collect(Collectors.toCollection(supplier));
> > }
> > 
> > Usage Examples:
> > 
> > HashSet<String> set = stream.toCollection(HashSet::new);
> > TreeSet<String> sortedSet = stream.toCollection(TreeSet::new);
> > ArrayDeque<String> deque = stream.toCollection(ArrayDeque::new);
> 
> Since I have the JDK handy I searched it for 'toCollection('. There
> are around 60 
> hits -- but note that 2/3 of these are in java.sql.rowset and refer
> to an unrelated 
> API. Some are in comments, and some are the implementation of 
> Collectors.toCollection itself, which leaves just 14 cases. Let's
> look at a few.
> 
> 
> #
> src/jdk.compiler/share/classes/com/sun/tools/javac/api/JavacTaskPool.
> java:164
> 
> 
>          List<String> opts =
>                  StreamSupport.stream(options.spliterator(), false)
>                              
> .collect(Collectors.toCollection(ArrayList::new));
> 
> Using the proposed API here would result in:
> 
>          List<String> opts =
>                  StreamSupport.stream(options.spliterator(), false)
>                               .toCollection(ArrayList::new));
> 
> This makes the code a little bit nicer. A static import would also
> haved helped, 
> though not quite as much as the new API:
> 
>          List<String> opts =
>                  StreamSupport.stream(options.spliterator(), false)
>                              
> .collect(toCollection(ArrayList::new)));
> 
> I also note that after some analysis of the usage of the resulting
> List, it's never 
> modified -- indeed, it's used as the key of a Map -- so this could be
> replaced with 
> the recently-added Stream::toList.
> 
> 
> #
> src/jdk.compiler/share/classes/com/sun/tools/javac/main/Option.java:3
> 81
> 
> 
>              Set<String> platforms =
> StreamSupport.stream(providers.spliterator(), 
> false)
>                                                   .flatMap(provider -
> > 
> StreamSupport.stream(provider.getSupportedPlatformNames()
>  
>                  .spliterator(),
>  
>          false))
>  
> .collect(Collectors.toCollection(LinkedHashSet :: new));
> 
> (Sorry for line wrapping. This file has some long lines.) Again,
> using the proposal 
> API would shorten things a bit, but it doesn't really make much
> difference within 
> the overall context:
> 
>              Set<String> platforms =
> StreamSupport.stream(providers.spliterator(), 
> false)
>                                                   .flatMap(provider -
> > 
> StreamSupport.stream(provider.getSupportedPlatformNames()
>  
>                  .spliterator(),
>  
>          false))
>                                                  
> .toCollection(LinkedHashSet :: new));
> 
> 
> #
> src/java.logging/share/classes/java/util/logging/LogManager.java:2138
> 
> 
>              final Map<String, TreeSet<String>> loggerConfigs =
>  
> allKeys.collect(Collectors.groupingBy(ConfigProperty::getLoggerName,
>                                      TreeMap::new,
>                                     
> Collectors.toCollection(TreeSet::new)));
> 
> This is an interesting case, as the toCollection() method is being
> used to produce a 
> "downstream" collector passed to groupingBy(). Since the proposed API
> is on stream, 
> it can't be used here. There are a few other cases like this where
> toCollection is 
> used as a downstream collector, not as an argument to
> Stream::collect.
> 
> 
> # 
> src/jdk.javadoc/share/classes/jdk/javadoc/internal/doclets/formats/ht
> ml/HtmlConfiguration.java:377
> 
> 
>      public List<DocPath> getAdditionalStylesheets() {
>          return options.additionalStylesheets().stream()
>                  .map(ssf -> DocFile.createFileForInput(this, ssf))
>                  .map(file -> DocPath.create(file.getName()))
>                  .collect(Collectors.toCollection(ArrayList::new));
>      }
> 
> This is another place where the proposed API can be used
> straightforwardly:
> 
>      public List<DocPath> getAdditionalStylesheets() {
>          return options.additionalStylesheets().stream()
>                  .map(ssf -> DocFile.createFileForInput(this, ssf))
>                  .map(file -> DocPath.create(file.getName()))
>                  .toCollection(ArrayList::new));
>      }
> 
> 
> # 
> src/jdk.javadoc/share/classes/jdk/javadoc/internal/doclets/toolkit/ut
> il/IndexBuilder.java:220
> 
> 
> This is a slightly different case, as it uses a lambda to pass a
> comparator to a 
> constructor instead of using a constructor reference. Before:
> 
>      public SortedSet<IndexItem> getItems(DocTree.Kind kind) {
>          Objects.requireNonNull(kind);
> Collections.emptySortedSet()).stream()
>                  .filter(i -> i.isKind(kind))
>                  .collect(Collectors.toCollection(() -> new
> TreeSet<>(mainComparator)));
>      }
> 
> After:
> 
>      public SortedSet<IndexItem> getItems(DocTree.Kind kind) {
>          Objects.requireNonNull(kind);
> Collections.emptySortedSet()).stream()
>                  .filter(i -> i.isKind(kind))
>                  .toCollection(() -> new TreeSet<>(mainComparator)));
>      }
> 
> 
> *****
> 
> 
> There are a few other cases in the JDK but they don't seem to lead to
> any new insights.
> 
> Some observations.
> 
>   - Using this API saves 19 characters compared to
> Collectors::toCollection, but it 
> saves only eight characters compared to Collectors::toCollection with
> a static import.
> 
>   - Using this API doesn't relieve the calling code of any burden of
> tedious or 
> error-prone code. The code it replaces is quite straightforward.
> 
>   - There don't appear to be any opportunities for optimization. In
> order to handle 
> the parallel case, this pretty much is required to delegate to the
> collector. I 
> suppose the serial case could be handled specially, but it boils down
> to 
> constructing the destination and then calling add() on it repeatedly,
> which is 
> pretty much what the collector ends up doing in the serial case
> anyway.
> 
> Collectors::toList and Stream::toList.
> 
>   - Some cases of Collectors::toCollection are used as "downstream"
> collectors, that 
> is, passed to other collectors instead of Stream::collect. This
> narrows the range of 
> possible uses of the API still further.
> 
>   - There is a recurring pattern
> 
>       Collectors.toCollection(ArrayList::new)
> 
> This is useful in place of Collectors::toList for places where the
> code wants to 
> guarantee the result to be an ArrayList. (Even though
> Collectors::toList returns an 
> ArrayList, it isn't specified to do so.) But there are cases (such as
> the one I 
> looked at above) where the return list isn't actually modified -- and
> indeed it 
> would be an error if it were modified -- so Stream::toList could be
> used just as 
> well for those.
> 
>   - The JDK is not necessarily a representative code base, but
> frequencies here do 
> seem to mirror what I've seen in open source:
> Collectors::toCollection is much less 
> frequent than Collectors::toList.
> 
>   - There doesn't appear to be any semantic difference between the
> proposed 
> Stream::toCollection and the existing Collectors::toCollection.
> 
> Based on these observations I'm having a hard time mustering much
> enthusiasm for 
> this API.
> 
> You might ask, hasn't the JDK added other convenience APIs? There
> have probably been 
> a few one-liners, but we are really trying to keep them to a minimum.
> Mainly, 
> convenience APIs are indeed convenient, but in many cases they add a
> lot of value in 
> other ways as well. Here are some examples.
> 
>   - Stream::toList. We discussed this recently, so I won't restate
> everything. 
> Briefly, though, this can be used as a replacement for
> Collectors::toList, which is 
> used VERY frequently, it provides stronger semantics, and it's faster
> because it 
> avoids extra array allocation and copying.
> 
>   - String::repeat. Repeating a String is a simple for-loop. However,
> if you look at 
> the implementation [1] there really is a lot going on here. It's a
> lot faster than 
> the straightforward code, because it peels off a few special cases,
> it uses a clever 
> doubling algorithm to call arraycopy a minimal number of times, and
> it deals with 
> things at the byte level, so it can create a String without any
> copying or any 
> codeset conversion. In addition to convenience, the value here is
> that it's much 
> faster than a simple for-loop that one might write instead. It also
> leverages the 
> JDK-internal String representation, which means that it does less
> allocation and 
> copying than other utility libraries would.
> 
> [1] 
> https://github.com/openjdk/jdk16/blob/master/src/java.base/share/classes/java/lang/String.java#L3560
> 
>   - InputStream::transferTo. This is mostly a straightforward loop
> [2], but the 
> details are devilishly hard to get right. If you look at user code
> that does copying 
> like this, it usually gets some edge case wrong, for example, not
> handling partial 
> reads. The value provided here is not that it's faster than the code
> it would 
> replace, but that it relieves the caller of the responsibility of
> writing a loop 
> that's easy to get wrong (or to form a dependency on another library
> that has this 
> utility method).
> 
> [2] 
> https://github.com/openjdk/jdk16/blob/master/src/java.base/share/classes/java/io/InputStream.java#L777
> 
> Anyway, this is the sort of analysis and justification that I'd like
> to see for 
> convenience APIs. Such APIs need to be more than just a shorthand for
> something a 
> bit longer.
> 
> s'marks
> 
    
    
More information about the core-libs-dev
mailing list