Stream.concat with varargs
Olexandr Rotan
rotanolexandr842 at gmail.com
Sun Sep 28 18:26:30 UTC 2025
After some further work, I have managed to come up with an implementation
<https://github.com/Evemose/nconcat/blob/master/src/main/java/nconcat/NConcatSpliterator.java>
that beats both Guava and the standard streams even in size-sensitive
operations, like toList. <https://github.com/Evemose/nconcat/blob/master/results.txt>
Overall, the theoretical complexity is as follows:
1. The new implementation has O(1) complexity for the tryAdvance and trySplit
methods, as opposed to O(n) for the reduction-style two-arg concat.
2. forEachRemaining has the same O(n) complexity in both implementations,
but the nested Stream.concat chain carries a constant factor of 2, so
effectively it is n vs. 2n element visits, giving the new implementation a
slight edge over reduction concat.
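For reference, the reduction-style baseline can be sketched roughly like this (a hypothetical helper of my own naming, not code from either library): folding n streams with the binary Stream.concat builds a left-nested chain of concat spliterators, so tryAdvance calls on the result have to pass through up to n wrapper levels.

```java
import java.util.List;
import java.util.stream.Stream;

public class ReductionConcat {
    // Sketch of the reduction-style n-ary concat used as the baseline:
    // each loop iteration wraps the accumulated stream in another binary
    // concat, producing a left-nested chain of n concat spliterators.
    // A tryAdvance on the result descends through that chain, which is
    // the O(n)-per-element cost described above.
    @SafeVarargs
    public static <T> Stream<T> concat(Stream<T>... streams) {
        Stream<T> result = Stream.empty();
        for (Stream<T> s : streams) {
            result = Stream.concat(result, s); // nesting grows by one level
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> merged =
                concat(Stream.of(1, 2), Stream.of(3), Stream.of(4, 5)).toList();
        System.out.println(merged); // [1, 2, 3, 4, 5]
    }
}
```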
I have created plots of average time for the standard concat, the Guava
implementation, and the new array-based implementation:
[image: image.png]
As you can see, the graphs (at least to the best of my knowledge) confirm my
analysis: for findFirst, the differences grow logarithmically, while for the
other operations approximately linearly.
As to the numbers, here is the comparison table (all times in ns/op):

Operation         Streams  NConcat            Guava              Standard           NConcat_Improvement  Guava_Improvement
Basic Concat      4        1860.9 ± 27.5      2018.5 ± 124.4     1910.3 ± 18.3      +2.6%                -5.7%
Basic Concat      8        3754.5 ± 36.0      3881.6 ± 34.2      3912.1 ± 43.6      +4.0%                +0.8%
Basic Concat      16       7687.6 ± 110.6     7828.5 ± 346.7     8343.8 ± 61.6      +7.9%                +6.2%
Basic Concat      32       15434.0 ± 200.3    16237.1 ± 1204.9   18091.9 ± 645.0    +14.7%               +10.3%
Basic Concat      64       32563.0 ± 349.2    33429.5 ± 774.2    43664.2 ± 603.8    +25.4%               +23.4%
FindFirst         4        92.1 ± 7.9         100.4 ± 6.8        123.8 ± 2.2        +25.6%               +18.9%
FindFirst         8        150.0 ± 2.8        162.3 ± 3.0        300.2 ± 3.3        +50.0%               +45.9%
FindFirst         16       275.2 ± 7.3        288.2 ± 5.4        877.9 ± 11.0       +68.7%               +67.2%
FindFirst         32       533.6 ± 10.9       539.2 ± 13.6       2858.9 ± 55.2      +81.3%               +81.1%
FindFirst         64       1069.1 ± 17.4      1057.2 ± 20.4      11695.7 ± 103.9    +90.9%               +91.0%
ToList            4        6565.2 ± 50.8      6899.2 ± 47.0      6756.1 ± 74.1      +2.8%                -2.1%
ToList            8        13198.9 ± 113.8    13624.0 ± 122.2    13661.3 ± 157.4    +3.4%                +0.3%
ToList            16       25234.2 ± 149.5    25822.8 ± 322.4    26686.5 ± 170.3    +5.4%                +3.2%
ToList            32       48063.4 ± 1866.3   46933.2 ± 414.2    53300.7 ± 2045.3   +9.8%                +11.9%
ToList            64       97838.0 ± 1574.1   96473.1 ± 635.4    104342.7 ± 1159.1  +6.2%                +7.5%
ToListWithFilter  4        9368.4 ± 239.8     12481.3 ± 321.7    9776.4 ± 212.0     +4.2%                -27.7%
ToListWithFilter  8        20166.3 ± 964.2    26528.9 ± 390.2    20158.0 ± 1482.1   -0.0%                -31.6%
ToListWithFilter  16       44067.1 ± 1090.0   55530.4 ± 2455.1   41009.5 ± 1464.2   -7.5%                -35.4%
ToListWithFilter  32       89627.8 ± 2522.7   109866.0 ± 2850.2  81724.4 ± 1279.5   -9.7%                -34.4%
ToListWithFilter  64       170779.8 ± 8213.1  220987.1 ± 4705.0  176520.6 ± 3400.0  +3.3%                -25.2%
Aside from the strange (seemingly artifact-related, I guess) results in the
filter + toList benchmark (the filter is used to drop the SIZED
characteristic), the benchmarks show a clear improvement for non-greedy
operations such as findFirst, as well as slight improvements for toList and
pretty significant ones for forEach, which internally uses forEachRemaining
instead of tryAdvance. In summary, tryAdvance-based operations get a
significant, log(n)-scale improvement from the new implementation, while
forEachRemaining-based ones show clear improvements in some cases (forEach)
and more debatable ones in others (toList).
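To make the shape of the new approach concrete, here is a heavily simplified, sequential-only sketch (my paraphrase, not the actual NConcatSpliterator code) of an array-backed concat spliterator: tryAdvance touches only the current sub-spliterator, and forEachRemaining drains each sub-spliterator exactly once.

```java
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Simplified sketch of an array-backed concat spliterator (not the actual
// NConcatSpliterator). tryAdvance is amortized O(1) in the number of
// streams, and forEachRemaining visits every element exactly once.
final class ArrayConcatSpliterator<T> implements Spliterator<T> {
    private final Spliterator<T>[] parts;
    private int cur; // index of the sub-spliterator currently being consumed

    ArrayConcatSpliterator(Spliterator<T>[] parts) {
        this.parts = parts;
    }

    @Override
    public boolean tryAdvance(Consumer<? super T> action) {
        while (cur < parts.length) {
            if (parts[cur].tryAdvance(action)) {
                return true; // element delivered from the current part
            }
            cur++; // current part exhausted; move to the next one
        }
        return false;
    }

    @Override
    public void forEachRemaining(Consumer<? super T> action) {
        for (; cur < parts.length; cur++) {
            parts[cur].forEachRemaining(action); // single pass per part
        }
    }

    @Override
    public Spliterator<T> trySplit() {
        return null; // sequential-only sketch; the real code splits the array
    }

    @Override
    public long estimateSize() {
        long total = 0;
        for (int i = cur; i < parts.length; i++) {
            long s = parts[i].estimateSize();
            if (s == Long.MAX_VALUE || (total += s) < 0) {
                return Long.MAX_VALUE; // unknown size or overflow
            }
        }
        return total;
    }

    @Override
    public int characteristics() {
        return ORDERED; // the real code would intersect sub-characteristics
    }

    @SafeVarargs
    @SuppressWarnings("unchecked")
    static <T> Stream<T> concat(Stream<? extends T>... streams) {
        Spliterator<T>[] parts = new Spliterator[streams.length];
        for (int i = 0; i < streams.length; i++) {
            parts[i] = (Spliterator<T>) streams[i].spliterator();
        }
        return StreamSupport.stream(new ArrayConcatSpliterator<>(parts), false);
    }
}
```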
Now I would like to, once again, reiterate the benefits of the proposed
addition:
1. DX improvement and improved consistency with existing APIs like Stream.of
(though a new inconsistency emerges with methods like Math.max/min).
2. Improved flexibility.
3. Significant performance improvements for tryAdvance-based operations,
and still noticeable ones for forEachRemaining-based operations as well.
4. Relatively small change size: the implementation of the new Spliterator
is completely independent, non-intrusive, and only takes 90 lines of code
as of now.
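To illustrate benefit 1 concretely, this is the call-site difference the overload would make (the varargs signature in the comment is the proposed, not yet existing, API; the class and method names are mine):

```java
import java.util.List;
import java.util.stream.Stream;

public class ConcatCallSite {
    // Today's shape: the binary Stream.concat forces nesting as soon as a
    // third source appears.
    static List<Long> mergedToday(List<Long> studentIds,
                                  List<Long> teacherIds,
                                  List<Long> partnerIds) {
        return Stream.concat(
                studentIds.stream(),
                Stream.concat(teacherIds.stream(), partnerIds.stream())
        ).toList();
    }

    // With the proposed (hypothetical) varargs overload, mirroring
    // Stream.of, the same merge would flatten to:
    //
    //     Stream.concat(studentIds.stream(),
    //                   teacherIds.stream(),
    //                   partnerIds.stream()).toList();
}
```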
Now that I have a relatively mature prototype, I would like to ask the
community for feedback: do you see this addition as valuable enough to be
present in the JDK?
Best regards
PS: The raw numbers can be found via the second link in this mail, or
manually in the root of the implementation repository.
On Wed, Sep 17, 2025 at 11:21 PM Éamonn McManus <emcmanus at google.com> wrote:
> Guava has a Streams.concat
> <https://github.com/google/guava/blob/e9ea5a982cad06ebd223ec6fdb5294eeb18654f6/guava/src/com/google/common/collect/Streams.java#L187> method
> that may suggest implementation possibilities.
>
> On Wed, 17 Sept 2025 at 13:16, Olexandr Rotan <rotanolexandr842 at gmail.com>
> wrote:
>
>> So I have played around a bit and managed to come up with an
>> implementation based on an array of streams; you can find it here:
>> https://github.com/Evemose/nconcat/blob/master/src/main/java/nconcat/NConcatSpliterator.java
>>
>> I have also added a small benchmark to the project, and the numbers are:
>>
>> Benchmark                                              (streamCount)  Mode  Cnt       Score       Error  Units
>> NConcatBenchmark.nConcatFindFirst                                  4  avgt   10     131.616 ±    15.474  ns/op
>> NConcatBenchmark.nConcatFindFirst                                  8  avgt   10     187.929 ±     6.544  ns/op
>> NConcatBenchmark.nConcatFindFirst                                 16  avgt   10     322.342 ±     6.940  ns/op
>> NConcatBenchmark.nConcatFindFirst                                 32  avgt   10     659.856 ±    85.509  ns/op
>> NConcatBenchmark.nConcatFindFirst                                 64  avgt   10    1214.133 ±    22.156  ns/op
>> NConcatBenchmark.nConcatMethod                                     4  avgt   10    1910.150 ±    25.269  ns/op
>> NConcatBenchmark.nConcatMethod                                     8  avgt   10    3865.364 ±   112.536  ns/op
>> NConcatBenchmark.nConcatMethod                                    16  avgt   10    7743.097 ±    74.655  ns/op
>> NConcatBenchmark.nConcatMethod                                    32  avgt   10   15840.551 ±   440.659  ns/op
>> NConcatBenchmark.nConcatMethod                                    64  avgt   10   32891.336 ±  1122.630  ns/op
>> NConcatBenchmark.nConcatToListWithFilter                           4  avgt   10    9527.120 ±   376.325  ns/op
>> NConcatBenchmark.nConcatToListWithFilter                           8  avgt   10   20260.027 ±   552.444  ns/op
>> NConcatBenchmark.nConcatToListWithFilter                          16  avgt   10   44724.856 ±  5040.069  ns/op
>> NConcatBenchmark.nConcatToListWithFilter                          32  avgt   10   82577.518 ±  2050.955  ns/op
>> NConcatBenchmark.nConcatToListWithFilter                          64  avgt   10  181460.219 ± 20809.669  ns/op
>> NConcatBenchmark.nconcatToList                                     4  avgt   10    9268.814 ±   712.883  ns/op
>> NConcatBenchmark.nconcatToList                                     8  avgt   10   18164.147 ±   786.803  ns/op
>> NConcatBenchmark.nconcatToList                                    16  avgt   10   35146.891 ±   966.871  ns/op
>> NConcatBenchmark.nconcatToList                                    32  avgt   10   68944.262 ±  5321.730  ns/op
>> NConcatBenchmark.nconcatToList                                    64  avgt   10  136845.984 ±  3491.562  ns/op
>> NConcatBenchmark.standardStreamConcat                              4  avgt   10    1951.522 ±    85.130  ns/op
>> NConcatBenchmark.standardStreamConcat                              8  avgt   10    3990.410 ±   190.517  ns/op
>> NConcatBenchmark.standardStreamConcat                             16  avgt   10    8599.869 ±   685.878  ns/op
>> NConcatBenchmark.standardStreamConcat                             32  avgt   10   17923.603 ±   361.874  ns/op
>> NConcatBenchmark.standardStreamConcat                             64  avgt   10   46797.408 ±  4458.069  ns/op
>> NConcatBenchmark.standardStreamConcatFindFirst                     4  avgt   10     125.192 ±     3.123  ns/op
>> NConcatBenchmark.standardStreamConcatFindFirst                     8  avgt   10     303.791 ±     8.670  ns/op
>> NConcatBenchmark.standardStreamConcatFindFirst                    16  avgt   10     907.429 ±    52.620  ns/op
>> NConcatBenchmark.standardStreamConcatFindFirst                    32  avgt   10    2964.749 ±   320.141  ns/op
>> NConcatBenchmark.standardStreamConcatFindFirst                    64  avgt   10   11749.653 ±   189.300  ns/op
>> NConcatBenchmark.standardStreamConcatToList                        4  avgt   10    7059.642 ±   740.735  ns/op
>> NConcatBenchmark.standardStreamConcatToList                        8  avgt   10   13714.980 ±   250.208  ns/op
>> NConcatBenchmark.standardStreamConcatToList                       16  avgt   10   27028.052 ±   565.047  ns/op
>> NConcatBenchmark.standardStreamConcatToList                       32  avgt   10   53537.731 ±   853.363  ns/op
>> NConcatBenchmark.standardStreamConcatToList                       64  avgt   10  105847.755 ±  3179.918  ns/op
>> NConcatBenchmark.standardStreamConcatToListWithFilter              4  avgt   10    9736.527 ±   154.817  ns/op
>> NConcatBenchmark.standardStreamConcatToListWithFilter              8  avgt   10   20607.061 ±   713.083  ns/op
>> NConcatBenchmark.standardStreamConcatToListWithFilter             16  avgt   10   41241.199 ±  1171.672  ns/op
>> NConcatBenchmark.standardStreamConcatToListWithFilter             32  avgt   10   83029.244 ±  1843.176  ns/op
>> NConcatBenchmark.standardStreamConcatToListWithFilter             64  avgt   10  182349.009 ± 11282.832  ns/op
>>
>> Basically, the conclusion is the following (guilty of using AI for
>> summarizing):
>>
>> The comprehensive benchmarks reveal that *NConcat significantly
>>> outperforms the standard library for processing-intensive operations*
>>> while trailing in simple collection scenarios. For short-circuit operations
>>> like findFirst(), NConcat delivers 38-90% better performance as stream
>>> count increases, reaching nearly 10x faster execution at 64 streams due to
>>> superior scaling (19ns/stream vs 184ns/stream). Full traversal operations
>>> like forEach consistently favor NConcat by 2-30%, with the advantage
>>> growing at scale. However, simple collection operations (toList())
>>> consistently run 22-24% faster with the standard library across all stream
>>> counts.
>>
>>
>>
>> I have tried multiple approaches to optimize toList with known sizes of
>> all the sub-streams (which is clearly the reason the standard
>> implementation wins here), and I am sure that there is still plenty of
>> room for improvement, especially in parallel. But the takeaway is: even a
>> naive implementation like mine could bring a significant performance
>> improvement to the table in early-short-circuiting and full-traversal
>> cases that do not depend on the size of the spliterator.
>>
>> Besides the performance part, of course, the most significant advantage
>> of my proposal, as I see it, is still developer experience, both reading
>> and writing stream code.
>>
>> Please let me know your thoughts on the results of the prototype and possible
>> ways forward.
>>
>> Best regards
>>
>> On Wed, Sep 17, 2025 at 6:04 PM Olexandr Rotan <
>> rotanolexandr842 at gmail.com> wrote:
>>
>>> Hello everyone! Thanks for your responses
>>>
>>> I will start off by answering Viktor.
>>>
>>> I guess a "simple" implementation of an N-ary concat could work, but it
>>>> would have performance implications (think a recursive use of
>>>> Stream.concat())
>>>
>>>
>>> I too find the mere addition of small reduction-performing sugar methods
>>> rather unsatisfactory and most certainly not bringing enough value to be
>>> considered a valuable addition. Moreover, I have not checked it myself, but
>>> I would dare to guess that popular utility libraries such as Guava or
>>> Apache Commons already provide this sort of functionality in their utility
>>> classes. Though, if this method could bring some significant performance
>>> benefits, I think it may be a valuable candidate to consider. To me as a
>>> user, though, the main value would be the uniformity of the API and ease
>>> of use and reading. The main reason I am writing about this in the first
>>> place is the unintuitive inconsistency with many other static factory
>>> methods that happily accept varargs.
>>>
>>> I may play around with the spliterator code you have linked to, to see
>>> if I could generalize it for arrays of streams.
>>>
>>> Now, answering to Pavel
>>>
>>>> Is it such a useful use case, though? I mean, it's no different from
>>>> SequenceInputStream(...) or Math.min/max for that matter. I very rarely
>>>> have to do Math.min(a, Math.min(b, c)) or some such.
>>>
>>>
>>> I certainly see your point, but I would dare to say that most
>>> applications rely on streams much more than on SequenceInputStream, the
>>> Math class, and their lookalikes. Stream.concat is primarily a way to
>>> merge a few data-source outputs into one for later uniform processing,
>>> which, in a nutshell, is one of the most common tasks in data-centric
>>> applications. Of course, not every such use case has characteristics that
>>> incline developers to use Stream.concat, such as a combination of
>>> Stream.of and Collection.stream() sources, and even if they do, not every
>>> case that fits the previous requirement needs to merge more than 2
>>> sources. However, for mid-to-large-scale apps, for which Java is best
>>> known, I would say it's fairly common. I went over our codebase and found
>>> at least 10+ usages of concat, and a few of them followed this kinda ugly
>>> pattern of nested concats.
>>>
>>> Separately, it's not just one method. Consider that `concat` is also
>>>> implemented in specialized streams such as IntStream, DoubleStream, and
>>>> LongStream.
>>>
>>>
>>> This is unfortunate, but I would dare to say that once the reference
>>> spliterator is implemented, the others may also be derived by analogy
>>> fairly quickly.
>>>
>>> And last but not least, answering Daniel
>>>
>>> Not immediately obvious but you can create a Stream<Stream<T>> using
>>>> Stream.of and reduce that using Stream::concat to obtain a Stream<T>.
>>>
>>> Something along those lines:
>>>
>>>> ```
>>>> var stream = Stream.of(Stream.of(1, 2, 3), Stream.of(4), Stream.of(5, 6,
>>>> 7, 8)).reduce(Stream.empty(), Stream::concat, Stream::concat);
>>>> ```
>>>
>>> This is what I meant by a "reduction-like" implementation, which is fairly
>>> straightforward, but just from the looks of it, one could assume that this
>>> solution will surely have performance consequences, even if using flatMap
>>> instead of reduce. I am not sure, though, how often people would want to
>>> use such an approach on an array of streams large enough for the
>>> performance difference to be noticeable, though I would assume that the
>>> time and resources consumed scale non-linearly with the length of the
>>> streams array due to the implementation of the concat method.
>>>
>>> Nevertheless, this is an acceptable workaround for such cases, even
>>> though not the most readable one. Even if this approach is accepted as
>>> sufficient for merging an n-sized array of streams, it would probably
>>> make sense to put a note about it in the docs of the concat method.
>>> Though, not having a concat(Stream...) overload would still remain
>>> unintuitive for many developers, including me.
>>>
>>> Thanks everybody for the answers again
>>>
>>> Best regards
>>>
>>> On Wed, Sep 17, 2025 at 5:15 PM Pavel Rappo <pavel.rappo at gmail.com>
>>> wrote:
>>>
>>>> > this would be a great quality of life improvement
>>>>
>>>> Is it such a useful use case, though? I mean, it's no different from
>>>> SequenceInputStream(...) or Math.min/max for that matter. I very
>>>> rarely have to do Math.min(a, Math.min(b, c)) or some such. And those
>>>> methods predate the streams API by more than a decade.
>>>>
>>>> Separately, it's not just one method. Consider that `concat` is also
>>>> implemented in specialized streams such as IntStream, DoubleStream,
>>>> and LongStream.
>>>>
>>>> On Wed, Sep 17, 2025 at 2:58 PM Olexandr Rotan
>>>> <rotanolexandr842 at gmail.com> wrote:
>>>> >
>>>> > Greetings to everyone on the list.
>>>> >
>>>> > When working on some routine tasks recently, I encountered a decision
>>>> in the design of the Stream.concat method that seems strange to me,
>>>> specifically the fact that it accepts exactly two streams. My concrete
>>>> example was something along the lines of
>>>> >
>>>> > var studentIds = ...;
>>>> > var teacherIds = ...;
>>>> > var partnerIds = ...;
>>>> >
>>>> > return Stream.concat(
>>>> > studentIds.stream(),
>>>> > teacherIds.stream(),
>>>> > partnerIds.stream() // oops, this one doesn't work
>>>> > )
>>>> >
>>>> > so I had to transform concat to a rather ugly
>>>> > Stream.concat(
>>>> > studentIds.stream(),
>>>> > Stream.concat(
>>>> > teacherIds.stream(),
>>>> > partnerIds.stream()
>>>> > )
>>>> > )
>>>> >
>>>> > Later on I had to add 4th stream of a single element (Stream.of), and
>>>> this one became even more ugly
>>>> >
>>>> > When I first wrote a third argument to concat and saw that the IDE
>>>> highlights it as an error, I was very surprised. This design seems
>>>> inconsistent not only with the whole Java stdlib, but even with the
>>>> Stream.of static method of the same class. Is there any particular
>>>> reason why concat takes exactly two arguments?
>>>> >
>>>> > I would say that, even if just in the form of a sugar method that
>>>> does a reduce on an array (varargs) of streams, this would be a great
>>>> quality-of-life improvement, but I'm sure there may also be some room
>>>> for performance improvement.
>>>> >
>>>> > Of course, there are workarounds like Stream.of + flatmap, but:
>>>> >
>>>> > 1. It gets messy when trying to concat streams of literal element
>>>> sets (Stream.of) and streams of collections or arrays
>>>> > 2. It certainly has significant performance overhead
>>>> > 3. It still doesn't explain the absence of a varargs overload of concat
>>>> >
>>>> > So, once again, is there any particular reason to restrict the argument
>>>> list to exactly two streams? If not, I would be happy to contribute
>>>> Stream.concat(Stream... streams) overload.
>>>> >
>>>> > Best regards
>>>> >
>>>> >
>>>> >
>>>>
>>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 180396 bytes
Desc: not available
URL: <https://mail.openjdk.org/pipermail/core-libs-dev/attachments/20250928/058e51ae/image-0001.png>