RFR: JDK-8277095 : Empty streams create too many objects
Michael Bien
duke at openjdk.java.net
Mon Nov 15 12:46:29 UTC 2021
On Sun, 7 Nov 2021 06:53:12 GMT, kabutz <duke at openjdk.java.net> wrote:
>>> The net effect of this change might depend on your workload. If you call stream() on empty collections that have cheap isEmpty(), this change will likely improve performance and reduce waste. However, this same change might do the opposite if some of your collections aren't empty or have costly isEmpty(). It would be good to have benchmarks for different workloads.
>>
>> Yes, I also thought about the cost of isEmpty() on concurrent collections. There are four concurrent collections that have a linear time cost size() method: CLQ, CLD, LTQ and CHM. However, in each of these cases, the isEmpty() method has constant time cost. There might be collections defined outside the JDK where this could be the case.
>>
>> However, I will extend the benchmark to include a few of those cases too, as well as different sizes and collection sizes.
>>
>> Thank you so much for your input.
>
>> wouldn't this make streams no longer lazy if the collection is empty?
>>
>> ```java
>> List<String> list = new ArrayList<>();
>> Stream<String> stream = list.stream();
>>
>> list.addAll(List.of("one", "two", "three"));
>>
>> stream.forEach(System.out::println); // prints one two three
>> ```
>
> I did not consider this case, thank you for bringing it up. I have always found this behaviour a bit strange and have never used it "in the real world". It is also not consistent between collections. Here is an example with four collections: ArrayList, CopyOnWriteArrayList, ConcurrentSkipListSet and ArrayBlockingQueue:
>
>
> import java.util.ArrayList;
> import java.util.Arrays;
> import java.util.Collection;
> import java.util.List;
> import java.util.Objects;
> import java.util.concurrent.ArrayBlockingQueue;
> import java.util.concurrent.ConcurrentSkipListSet;
> import java.util.concurrent.CopyOnWriteArrayList;
> import java.util.function.Supplier;
> import java.util.stream.IntStream;
>
> public class LazyStreamDemo {
>     public static void main(String... args) {
>         List<Supplier<Collection<String>>> suppliers =
>             List.of(ArrayList::new,                      // fail-fast
>                     CopyOnWriteArrayList::new,           // snapshot
>                     ConcurrentSkipListSet::new,          // weakly-consistent
>                     () -> new ArrayBlockingQueue<>(10)   // weakly-consistent
>             );
>         for (Supplier<Collection<String>> supplier : suppliers) {
>             Collection<String> c = supplier.get();
>             System.out.println(c.getClass());
>             IntStream stream = c.stream()
>                 .sorted()
>                 .filter(Objects::nonNull)
>                 .mapToInt(String::length)
>                 .sorted();
>
>             c.addAll(List.of("one", "two", "three", "four", "five"));
>             System.out.println("stream = " + Arrays.toString(stream.toArray()));
>         }
>     }
> }
>
>
> The output is:
>
>
> class java.util.ArrayList
> stream = [3, 3, 4, 4, 5]
> class java.util.concurrent.CopyOnWriteArrayList
> stream = []
> class java.util.concurrent.ConcurrentSkipListSet
> stream = []
> class java.util.concurrent.ArrayBlockingQueue
> stream = [3, 3, 4, 4, 5]
>
>
> At least with the EmptyStream we would consistently get [] as the output.
@kabutz I agree that I wouldn't consider it clean code to use a stream the way I did in the example. I only brought it up because it might break existing code, since I think streams are expected to be lazy. Interesting to see that they are in fact not lazy in all situations; I've put that into my notes.
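
To make the compatibility concern concrete, here is a minimal sketch of the difference. Note that `streamWithEmptyShortcut` and the class name are hypothetical; the helper only stands in for the proposed change (it is not the code from the PR) and assumes the emptiness check happens when the stream is requested:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class EmptyShortcutDemo {
    // Hypothetical stand-in for the proposed change, not the PR's code:
    // emptiness is decided at the moment the stream is requested.
    static <T> Stream<T> streamWithEmptyShortcut(Collection<T> c) {
        return c.isEmpty() ? Stream.empty() : c.stream();
    }

    // Current behaviour: ArrayList's spliterator is late-binding, so the
    // stream only looks at the list when the terminal operation runs.
    static List<String> lateBinding() {
        List<String> list = new ArrayList<>();
        Stream<String> stream = list.stream();
        list.addAll(List.of("one", "two", "three"));
        return stream.collect(Collectors.toList());
    }

    // With the shortcut, the list was empty when the stream was created,
    // so elements added afterwards are never seen.
    static List<String> withShortcut() {
        List<String> list = new ArrayList<>();
        Stream<String> stream = streamWithEmptyShortcut(list);
        list.addAll(List.of("one", "two", "three"));
        return stream.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("late-binding: " + lateBinding());   // [one, two, three]
        System.out.println("shortcut:     " + withShortcut());  // []
    }
}
```

Code that relies on the first result (intentionally or not) would silently get the second one, which is the kind of breakage I had in mind.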
-------------
PR: https://git.openjdk.java.net/jdk/pull/6275