RFR: JDK-8277095 : Empty streams create too many objects

kabutz duke at openjdk.java.net
Mon Nov 15 12:46:28 UTC 2021


On Sun, 7 Nov 2021 06:26:22 GMT, kabutz <duke at openjdk.java.net> wrote:

>> (immutable collections could override stream() instead, since they don't have that problem)
>
>> The net effect of this change might depend on your workload. If you call stream() on empty collections that have cheap isEmpty(), this change will likely improve performance and reduce waste. However, this same change might do the opposite if some of your collections aren't empty or have costly isEmpty(). It would be good to have benchmarks for different workloads.
> 
> Yes, I also thought about the cost of isEmpty() on concurrent collections. Four concurrent collections have a size() method with linear time cost: ConcurrentLinkedQueue (CLQ), ConcurrentLinkedDeque (CLD), LinkedTransferQueue (LTQ) and ConcurrentHashMap (CHM). However, in each of these cases the isEmpty() method has constant time cost. There might be collections defined outside the JDK whose isEmpty() is not constant time.
> 
> However, I will extend the benchmark to include a few of those cases too, as well as different collection sizes.
> 
> Thank you so much for your input.
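
To illustrate how a collection outside the JDK could end up with a costly isEmpty(), here is a minimal sketch. WalkingCollection is a hypothetical class invented for this example, not something from the PR or the JDK. It relies on the fact that AbstractCollection.isEmpty() is implemented as size() == 0, so a subclass with a linear-time size() that does not override isEmpty() inherits a linear-time isEmpty():

```java
import java.util.AbstractCollection;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

final class CostlyIsEmptySketch {
    /** Hypothetical collection whose size() walks every element. */
    static final class WalkingCollection<E> extends AbstractCollection<E> {
        private final List<E> elements;
        int elementsVisited; // counts the work done inside size()

        WalkingCollection(List<E> elements) {
            this.elements = elements;
        }

        @Override
        public Iterator<E> iterator() {
            return elements.iterator();
        }

        @Override
        public int size() {
            int n = 0;
            for (E ignored : elements) { // deliberately linear
                n++;
                elementsVisited++;
            }
            return n;
        }
        // isEmpty() is NOT overridden, so AbstractCollection's
        // default (size() == 0) is used.
    }

    /** Returns how many elements a single isEmpty() call visited. */
    static int visitsForIsEmpty(int n) {
        WalkingCollection<Integer> c =
                new WalkingCollection<>(Collections.nCopies(n, 0));
        c.isEmpty();
        return c.elementsVisited;
    }

    public static void main(String[] args) {
        System.out.println("elements visited by isEmpty(): "
                + visitsForIsEmpty(1000));
    }
}
```

For such a collection, an isEmpty() check inside stream() would add a full traversal to every call on a non-empty collection.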

> wouldn't this make streams no longer lazy if the collection is empty?
> 
> ```java
>         List<String> list = new ArrayList<>();
>         Stream<String> stream = list.stream();
> 
>         list.addAll(List.of("one", "two", "three"));
> 
>         stream.forEach(System.out::println); // prints one two three
> ```

I did not consider this case, thank you for bringing it up. I have always found this behaviour a bit strange and have never used it "in the real world". It is also not consistent between collections. Here is an example with four collections: ArrayList, CopyOnWriteArrayList, ConcurrentSkipListSet and ArrayBlockingQueue:


import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Objects;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Supplier;
import java.util.stream.IntStream;

public class LazyStreamDemo {
    public static void main(String... args) {
        List<Supplier<Collection<String>>> suppliers =
                List.of(ArrayList::new, // fail-fast
                        CopyOnWriteArrayList::new, // snapshot
                        ConcurrentSkipListSet::new, // weakly-consistent
                        () -> new ArrayBlockingQueue<>(10) // weakly-consistent
                );
        for (Supplier<Collection<String>> supplier : suppliers) {
            Collection<String> c = supplier.get();
            System.out.println(c.getClass());
            IntStream stream = c.stream()
                    .sorted()
                    .filter(Objects::nonNull)
                    .mapToInt(String::length)
                    .sorted();

            c.addAll(List.of("one", "two", "three", "four", "five"));
            System.out.println("stream = " + Arrays.toString(stream.toArray()));
        }
    }
}


The output is:


class java.util.ArrayList
stream = [3, 3, 4, 4, 5]
class java.util.concurrent.CopyOnWriteArrayList
stream = []
class java.util.concurrent.ConcurrentSkipListSet
stream = []
class java.util.concurrent.ArrayBlockingQueue
stream = [3, 3, 4, 4, 5]


At least with the EmptyStream, the output would be consistent: always [] for all four collections.
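
To make the difference concrete, here is a minimal sketch of what an up-front emptiness check does to the ArrayList case. The helper streamWithEmptyCheck is a hypothetical stand-in written for this mail, not the code in the PR. It only assumes that emptiness is decided when the stream is created rather than when the terminal operation runs:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class EagerEmptyCheckSketch {
    // Hypothetical helper (not the PR's implementation): checks
    // emptiness at stream creation time.
    static <T> Stream<T> streamWithEmptyCheck(Collection<T> c) {
        return c.isEmpty() ? Stream.empty() : c.stream();
    }

    // Current behaviour: ArrayList's spliterator is late-binding, so
    // elements added before the terminal operation are seen.
    static List<String> lateBound() {
        List<String> list = new ArrayList<>();
        Stream<String> stream = list.stream();
        list.addAll(List.of("one", "two", "three"));
        return stream.collect(Collectors.toList());
    }

    // With the up-front check, the stream is fixed as empty before
    // the elements are added.
    static List<String> eagerlyChecked() {
        List<String> list = new ArrayList<>();
        Stream<String> stream = streamWithEmptyCheck(list);
        list.addAll(List.of("one", "two", "three"));
        return stream.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("late-binding: " + lateBound());
        System.out.println("eager check:  " + eagerlyChecked());
    }
}
```

The first method returns all three elements and the second returns an empty list, which is the behavioural change raised in the question above.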

-------------

PR: https://git.openjdk.java.net/jdk/pull/6275


More information about the core-libs-dev mailing list