Additional method on Stream
Peter Levart
peter.levart at gmail.com
Mon Apr 27 19:04:38 UTC 2015
On 04/27/2015 05:23 PM, Paul Sandoz wrote:
> On Apr 27, 2015, at 4:56 PM, Stephen Colebourne <scolebourne at joda.org> wrote:
>> Obviously, this is yet another possible workaround. But it is a
>> workaround.
> I don't consider it "just a workaround" :-)
>
>
>> There really aren't that many rough edges with the set of
>> methods added with lambdas, but this is definitely one. That Guava
>> handled it specially is another good indication.
>>
> Tis conjecture, but perhaps it might have been different in post-lambda world?
>
> One issue is there are zillions of possible more specific convenience operations we could add. Everyone has their own favourite. Some static methods were recently added to Stream and Optional in preference to such operations.
>
> There has to be a really good reason to add new operations. I realize this use-case might be more common than others but i am still yet to be convinced that it has sufficient weight given flatMap + lambda + static method.
One reason might be that the workaround creates at least two new objects
per included element of the stream and the overhead involved for
executing the flat-map logic. A more general operation might be
something like the following:
/**
* Returns a stream consisting of the non-null results of applying
the given
* function to the elements of this stream.
*/
<R> Stream<R> filterMap(Function<? super T, ? extends R> mapper);
Stephen's example would then read:
return input.stream()
.filterMap(t -> t instanceof Foo ? (Foo) t : null)
.someTerminalOperation();
Combining filtering and mapping in one operation might often be
desirable to avoid duplicate work (for example when filtering and
mapping needs to compute some common but costly intermediate result for
each element). flatMap is admittedly suitable for that too, but has it's
overhead. At what per-operation cost this overhead pays-off can be seen
at the end...
I know that null values were a controversial topic when this API was
being designed and that the decision was made to basically "ignore"
their presence in stream elements. So making null part of the API
contract might be out of the question right? So what about Optional?
Could it be used to make flatMap a little more efficient for the
combined filter/map case?
For example, could the following composition be written in a more
concise way?
input.stream()
.map(t -> t instanceof Foo ? Optional.of((Foo) t) : Optional.empty())
.filter(Optional::isPresent)
.map(Optional::get)
Maybe with operation like:
/**
* Returns a stream consisting of the "present" unwrapped results
of applying the given
* function to the elements of this stream.
*/
<R> Stream<R> mapOptionally(Function<? super T, Optional<? extends
R>> mapper);
But that's not what Stephen would like to see, and I personally don't
mind being a little more verbose if it makes code execute faster. I
would be pretty confident writing the following:
input.stream()
.map(t -> t instanceof Foo ? (Foo)t : null)
.filter(f -> f != null)
To quantify the overheads involved with various approaches, I created a
little benchmark that shows the following results:
Benchmark (opCost) Mode Samples Score Score
error Units
j.t.StreamBench.filterThenMap 0 avgt 10 1.186
0.010 ms/op
j.t.StreamBench.filterThenMap 10 avgt 10 2.642
0.205 ms/op
j.t.StreamBench.filterThenMap 20 avgt 10 5.254
0.011 ms/op
j.t.StreamBench.filterThenMap 30 avgt 10 8.187
0.165 ms/op
j.t.StreamBench.filterThenMap 40 avgt 10 11.525
0.295 ms/op
j.t.StreamBench.flatMap 0 avgt 10 2.015
0.188 ms/op
j.t.StreamBench.flatMap 10 avgt 10 3.287
0.224 ms/op
j.t.StreamBench.flatMap 20 avgt 10 5.275
0.638 ms/op
j.t.StreamBench.flatMap 30 avgt 10 7.033
0.209 ms/op
j.t.StreamBench.flatMap 40 avgt 10 9.146
0.281 ms/op
j.t.StreamBench.mapToNullable 0 avgt 10 1.185
0.006 ms/op
j.t.StreamBench.mapToNullable 10 avgt 10 2.120
0.392 ms/op
j.t.StreamBench.mapToNullable 20 avgt 10 3.677
0.210 ms/op
j.t.StreamBench.mapToNullable 30 avgt 10 5.526
0.126 ms/op
j.t.StreamBench.mapToNullable 40 avgt 10 7.884
0.202 ms/op
j.t.StreamBench.mapToOptional 0 avgt 10 1.144
0.121 ms/op
j.t.StreamBench.mapToOptional 10 avgt 10 2.322
0.146 ms/op
j.t.StreamBench.mapToOptional 20 avgt 10 4.371
0.270 ms/op
j.t.StreamBench.mapToOptional 30 avgt 10 6.215
0.536 ms/op
j.t.StreamBench.mapToOptional 40 avgt 10 8.471
0.554 ms/op
Comparing .filter(op).map(op) with .flatMap(op) where each operation has
it's cost, we see there is a tripping point at opCost=20 where flatMap()
starts to pay off if we can merge the two ops into one with equal cost.
But we can also see that flatMap has it's cost too, compared to other
two approaches (mapToNullable/mapToOptional) which is most obvious when
the operation cost is low.
So the conclusion? No, I don't think we need a new Stream method. I just
wanted to show that flatMap() is maybe the most universal but not always
the best (fastest) answer for each problem.
Regards, Peter
P.S. The benchmark source:
package jdk.test;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Stream;
/**
* Created by peter on 4/27/15.
*/
@BenchmarkMode(Mode.AverageTime)
@Fork(value = 1, warmups = 0)
@Warmup(iterations = 5)
@Measurement(iterations = 10)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
public class StreamBench {
@Param({"0", "10", "20", "30", "40"})
public long opCost;
List<Object> objects;
@Setup
public void setup() {
objects = new ArrayList<>(100000);
ThreadLocalRandom tlr = ThreadLocalRandom.current();
for (int i = 0; i < 100000; i++) {
objects.add(tlr.nextBoolean() ? "123" : 123);
}
}
<F, T> Function<F, T> withCost(Function<F, T> function) {
return f -> {
Blackhole.consumeCPU(opCost);
return function.apply(f);
};
}
<T> Predicate<T> withCost(Predicate<T> predicate) {
return t -> {
Blackhole.consumeCPU(opCost);
return predicate.test(t);
};
}
@Benchmark
public long filterThenMap() {
return objects.stream()
.filter(withCost((Object o) -> o instanceof String))
.map(withCost((Object o) -> (String) o))
.count();
}
@Benchmark
public long flatMap() {
return objects.stream()
.flatMap(withCost((Object o) -> o instanceof String
? Stream.of((String) o) : Stream.empty()))
.count();
}
@Benchmark
public long mapToOptional() {
return objects.stream()
.map(withCost((Object o) -> o instanceof String
? Optional.of((String) o) : Optional.empty()))
.filter(Optional::isPresent)
.map(Optional::get)
.count();
}
@Benchmark
public long mapToNullable() {
return objects.stream()
.map(withCost((Object o) -> o instanceof String
? (String) o : null))
.filter(s -> s != null)
.count();
}
}
>
>> BTW, I wait months before making this request to see if it really was
>> common enough a pattern, but I'm confident that it is now.
>>
> Were you aware of the pattern using flatMap during those months?
>
> Paul.
>
More information about the core-libs-dev
mailing list