Additional method on Stream

Mon Apr 27 19:04:38 UTC 2015

On 04/27/2015 05:23 PM, Paul Sandoz wrote:
> On Apr 27, 2015, at 4:56 PM, Stephen Colebourne <scolebourne at joda.org> wrote:
>> Obviously, this is yet another possible workaround. But it is a
>> workaround.
> I don't consider it "just a workaround" :-)
>
>
>> There really aren't that many rough edges with the set of
>> methods added with lambdas, but this is definitely one. That Guava
>> handled it specially is another good indication.
>>
> Tis conjecture, but perhaps it might have been different in post-lambda world?
>
> One issue is there are zillions of possible more specific convenience operations we could add. Everyone has their own favourite. Some static methods were recently added to Stream and Optional in preference to such operations.
>
> There has to be a really good reason to add new operations. I realize this use-case might be more common than others but i am still yet to be convinced that it has sufficient weight given flatMap + lambda + static method.

One reason might be that the workaround creates at least two new objects 
per included element of the stream and the overhead involved for 
executing the flat-map logic. A more general operation might be 
something like the following:

     /**
      * Returns a stream consisting of the non-null results of applying 
the given
      * function to the elements of this stream.
      */
     <R> Stream<R> filterMap(Function<? super T, ? extends R> mapper);

Stephen's example would then read:

     return input.stream()
         .filterMap(t -> t instanceof Foo ? (Foo) t : null)
         .someTerminalOperation();

Combining filtering and mapping in one operation might often be 
desirable to avoid duplicate work (for example when filtering and 
mapping needs to compute some common but costly intermediate result for 
each element). flatMap is admittedly suitable for that too, but has it's 
overhead. At what per-operation cost this overhead pays-off can be seen 
at the end...

I know that null values were a controversial topic when this API was 
being designed and that the decision was made to basically "ignore" 
their presence in stream elements. So making null part of the API 
contract might be out of the question right? So what about Optional? 
Could it be used to make flatMap a little more efficient for the 
combined filter/map case?

For example, could the following composition be written in a more 
concise way?

input.stream()
     .map(t -> t instanceof Foo ? Optional.of((Foo) t) : Optional.empty())
     .filter(Optional::isPresent)
     .map(Optional::get)

Maybe with operation like:

     /**
      * Returns a stream consisting of the "present" unwrapped results 
of applying the given
      * function to the elements of this stream.
      */
     <R> Stream<R> mapOptionally(Function<? super T, Optional<? extends 
R>> mapper);

But that's not what Stephen would like to see, and I personally don't 
mind being a little more verbose if it makes code execute faster. I 
would be pretty confident writing the following:

input.stream()
     .map(t -> t instanceof Foo ? (Foo)t : null)
     .filter(f -> f != null)

To quantify the overheads involved with various approaches, I created a 
little benchmark that shows the following results:

Benchmark                       (opCost)   Mode   Samples Score  Score 
error    Units

j.t.StreamBench.filterThenMap          0   avgt 10        1.186        
0.010    ms/op
j.t.StreamBench.filterThenMap         10   avgt 10        2.642        
0.205    ms/op
j.t.StreamBench.filterThenMap         20   avgt 10        5.254        
0.011    ms/op
j.t.StreamBench.filterThenMap         30   avgt 10        8.187        
0.165    ms/op
j.t.StreamBench.filterThenMap         40   avgt 10       11.525        
0.295    ms/op

j.t.StreamBench.flatMap                0   avgt 10        2.015        
0.188    ms/op
j.t.StreamBench.flatMap               10   avgt 10        3.287        
0.224    ms/op
j.t.StreamBench.flatMap               20   avgt 10        5.275        
0.638    ms/op
j.t.StreamBench.flatMap               30   avgt 10        7.033        
0.209    ms/op
j.t.StreamBench.flatMap               40   avgt 10        9.146        
0.281    ms/op

j.t.StreamBench.mapToNullable          0   avgt 10        1.185        
0.006    ms/op
j.t.StreamBench.mapToNullable         10   avgt 10        2.120        
0.392    ms/op
j.t.StreamBench.mapToNullable         20   avgt 10        3.677        
0.210    ms/op
j.t.StreamBench.mapToNullable         30   avgt 10        5.526        
0.126    ms/op
j.t.StreamBench.mapToNullable         40   avgt 10        7.884        
0.202    ms/op

j.t.StreamBench.mapToOptional          0   avgt 10        1.144        
0.121    ms/op
j.t.StreamBench.mapToOptional         10   avgt 10        2.322        
0.146    ms/op
j.t.StreamBench.mapToOptional         20   avgt 10        4.371        
0.270    ms/op
j.t.StreamBench.mapToOptional         30   avgt 10        6.215        
0.536    ms/op
j.t.StreamBench.mapToOptional         40   avgt 10        8.471        
0.554    ms/op

Comparing .filter(op).map(op) with .flatMap(op) where each operation has 
it's cost, we see there is a tripping point at opCost=20 where flatMap() 
starts to pay off if we can merge the two ops into one with equal cost. 
But we can also see that flatMap has it's cost too, compared to other 
two approaches (mapToNullable/mapToOptional) which is most obvious when 
the operation cost is low.

So the conclusion? No, I don't think we need a new Stream method. I just 
wanted to show that flatMap() is maybe the most universal but not always 
the best (fastest) answer for each problem.

Regards, Peter

P.S. The benchmark source:

package jdk.test;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Stream;

/**
  * Created by peter on 4/27/15.
  */
@BenchmarkMode(Mode.AverageTime)
@Fork(value = 1, warmups = 0)
@Warmup(iterations = 5)
@Measurement(iterations = 10)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
public class StreamBench {

     @Param({"0", "10", "20", "30", "40"})
     public long opCost;

     List<Object> objects;

     @Setup
     public void setup() {
         objects = new ArrayList<>(100000);
         ThreadLocalRandom tlr = ThreadLocalRandom.current();
         for (int i = 0; i < 100000; i++) {
             objects.add(tlr.nextBoolean() ? "123" : 123);
         }
     }

     <F, T> Function<F, T> withCost(Function<F, T> function) {
         return f -> {
             Blackhole.consumeCPU(opCost);
             return function.apply(f);
         };
     }

     <T> Predicate<T> withCost(Predicate<T> predicate) {
         return t -> {
             Blackhole.consumeCPU(opCost);
             return predicate.test(t);
         };
     }

     @Benchmark
     public long filterThenMap() {
         return objects.stream()
             .filter(withCost((Object o) -> o instanceof String))
             .map(withCost((Object o) -> (String) o))
             .count();
     }

     @Benchmark
     public long flatMap() {
         return objects.stream()
             .flatMap(withCost((Object o) -> o instanceof String
                 ? Stream.of((String) o) : Stream.empty()))
             .count();
     }

     @Benchmark
     public long mapToOptional() {
         return objects.stream()
             .map(withCost((Object o) -> o instanceof String
                 ? Optional.of((String) o) : Optional.empty()))
             .filter(Optional::isPresent)
             .map(Optional::get)
             .count();
     }

     @Benchmark
     public long mapToNullable() {
         return objects.stream()
             .map(withCost((Object o) -> o instanceof String
                 ? (String) o : null))
             .filter(s -> s != null)
             .count();
     }
}

>
>> BTW, I wait months before making this request to see if it really was
>> common enough a pattern, but I'm confident that it is now.
>>
> Were you aware of the pattern using flatMap during those months?
>
> Paul.
>