RFR: 8342513: Autoboxing Overhead & Inefficient Parallel Processing
wasif kirmani
duke at openjdk.org
Thu Oct 17 19:43:09 UTC 2024
On Thu, 17 Oct 2024 16:21:29 GMT, wasif kirmani <duke at openjdk.org> wrote:
> [JDK-8342513](https://bugs.openjdk.org/browse/JDK-8342513) : Autoboxing Overhead & Inefficient Parallel Processing
Thank you for your detailed feedback and the opportunity to clarify the proposed changes.
The issue was not that primitive streams perform unnecessary boxing, but how to efficiently handle large streams with more complex operations (e.g., filtering large datasets).
You're absolutely right: changing a sequential stream to a parallel one without explicit user consent introduces both observable behavior changes and possible performance degradation. Stream behavior (especially around parallelism) is something that users explicitly opt into, and this control should remain in the user's hands.
Plan for Resolution:
I will remove the automatic parallelization logic and ensure that users maintain control over when a stream becomes parallel. Any optimizations for parallel streams should only apply when the stream is explicitly parallelized by the user (via .parallel()).
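Here is a minimal sketch of the intended behavior. It assumes the optimization is an ordinary intermediate operation. The hypothetical helper `evensOnly` applies the filter without calling `.parallel()` or `.sequential()`, so `isParallel()` on the result reports whatever the caller chose.

```java
import java.util.stream.IntStream;

public class ParallelChoice {
    // Hypothetical helper: applies the filter but never calls
    // .parallel() or .sequential(), so the caller's choice is kept.
    static IntStream evensOnly(IntStream source) {
        return source.filter(n -> n % 2 == 0);
    }

    public static void main(String[] args) {
        IntStream seq = evensOnly(IntStream.range(0, 1_000_000));
        IntStream par = evensOnly(IntStream.range(0, 1_000_000).parallel());
        System.out.println(seq.isParallel()); // false: caller did not opt in
        System.out.println(par.isParallel()); // true: caller opted in via .parallel()
        System.out.println(par.count());      // 500000
    }
}
```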
Thank you for pointing this out. This issue was caused by my incorrect assumption about the spliterator() method. As you correctly noted, invoking spliterator() on a stream is a terminal operation: it consumes the stream, making it unusable for further operations.
The code split the stream based on its size and then applied the filter to the original stream. Because that stream had already been consumed, the filter call threw an IllegalStateException.
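The failure can be reproduced in isolation. This sketch is not the patch's code. The class and method names are hypothetical. It takes the spliterator to read the size and then tries to keep using the original stream.

```java
import java.util.Spliterator;
import java.util.stream.IntStream;

public class SpliteratorConsumes {
    // Mirrors the failing revision: read the size via spliterator(),
    // then try to continue the pipeline on the same stream.
    static boolean reuseAfterSpliteratorThrows() {
        IntStream stream = IntStream.range(0, 1_000_000);
        Spliterator.OfInt split = stream.spliterator(); // terminal operation: consumes 'stream'
        System.out.println("size = " + split.getExactSizeIfKnown()); // 1000000 for range()
        try {
            stream.filter(n -> n % 2 == 0).count();     // 'stream' was already consumed above
            return false;
        } catch (IllegalStateException e) {
            // message: "stream has already been operated upon or closed"
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(reuseAfterSpliteratorThrows()); // true
    }
}
```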
Plan for Resolution:
I will rework the logic to avoid prematurely consuming the stream by calling spliterator(). Instead of conditionally splitting the stream based on size, I will focus on optimizing the filtering operation while preserving the sequential or parallel mode chosen by the caller, as described above.
Additionally, I will review the stream lifecycle and ensure that no terminal operation is mistakenly invoked before all intermediate operations are applied.
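The source size may still be needed. One option is to capture the caller's parallel flag, take the spliterator once, and continue the pipeline on a new stream built with `StreamSupport.intStream(spliterator, parallel)`. This is a sketch, not the patch's code. The class `SizeAwareFilter` and method `filterEvens` are hypothetical, and the size-based decision is left as a placeholder.

```java
import java.util.Spliterator;
import java.util.stream.IntStream;
import java.util.stream.StreamSupport;

public class SizeAwareFilter {
    // Hypothetical helper: reads the source size once, then continues the
    // pipeline on a new stream built from the same spliterator.
    static IntStream filterEvens(IntStream source) {
        boolean parallel = source.isParallel();         // capture the caller's choice first
        Spliterator.OfInt split = source.spliterator(); // terminal: 'source' is consumed here
        long size = split.getExactSizeIfKnown();        // -1 if the size is not known up front
        // Placeholder: a size-based decision could use 'size' here,
        // but it must not change 'parallel'.
        return StreamSupport.intStream(split, parallel)
                .onClose(source::close)                 // keep any close handlers of the source
                .filter(n -> n % 2 == 0);
    }

    public static void main(String[] args) {
        System.out.println(filterEvens(IntStream.range(0, 1_000_000)).count());                 // 500000
        System.out.println(filterEvens(IntStream.range(0, 1_000_000).parallel()).isParallel()); // true
    }
}
```

After `filterEvens(s)` returns, only the returned stream may be used. `s` itself has been consumed.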
I verified the filter change by timing it on an IntStream and observed the following:
long startTime = System.nanoTime();
long count = optimizedIntStream(IntStream.range(0, 1_000_000))
        .filter(n -> n % 2 == 0)
        .count();
long endTime = System.nanoTime();
Java 23 filter count: 500000
Java 23 filter execution time: 10 ms
Optimized filter count: 500000
Optimized filter execution time: 5 ms
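The numbers above come from a single `System.nanoTime()` pair. For a one-off run, that measurement also includes JIT warm-up. Below is a minimal sketch of a fairer comparison. It assumes each variant can be supplied as a `Supplier<IntStream>`, runs the pipeline several times before timing, and keeps the best of several measured runs. `FilterTiming`, `bestMillis`, and the iteration counts are hypothetical. For publishable numbers, JMH is the usual tool.

```java
import java.util.function.Supplier;
import java.util.stream.IntStream;

public class FilterTiming {
    // Written after every run so the JIT cannot discard the pipeline's result.
    static volatile long sink;

    // Hypothetical harness: warms up the pipeline, then returns the
    // fastest of 'runs' measured executions, in milliseconds.
    static long bestMillis(Supplier<IntStream> source, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            sink = source.get().filter(n -> n % 2 == 0).count();
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            sink = source.get().filter(n -> n % 2 == 0).count();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best / 1_000_000;
    }

    public static void main(String[] args) {
        long ms = bestMillis(() -> IntStream.range(0, 1_000_000), 5, 10);
        System.out.println("Sequential filter count: " + sink);
        System.out.println("Best execution time: " + ms + " ms");
        // The patched variant would be measured the same way, e.g.
        // bestMillis(() -> optimizedIntStream(IntStream.range(0, 1_000_000)), 5, 10)
    }
}
```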
-------------
PR Comment: https://git.openjdk.org/jdk/pull/21566#issuecomment-2420381760