Stream.concat with varargs
Olexandr Rotan
rotanolexandr842 at gmail.com
Sun Sep 28 18:26:30 UTC 2025
After some further work, I have managed to come up with an implementation
<https://github.com/Evemose/nconcat/blob/master/src/main/java/nconcat/NConcatSpliterator.java>
that beats both Guava and the standard streams even in size-sensitive
operations, like toList. <https://github.com/Evemose/nconcat/blob/master/results.txt>
Overall, the theoretical complexity is as follows:
1. The new implementation has O(1) complexity for the tryAdvance and trySplit
methods, as opposed to O(n) for the reduction-style two-arg concat.
2. forEachRemaining has the same O(n) complexity in both implementations,
but the nested Stream.concat chain carries a constant factor of 2, so
effectively it is n vs. 2n element visits, giving the new implementation a
slight edge over reduction concat.
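For reference, the reduction-style baseline can be sketched roughly like this (a hypothetical helper of my own naming, not code from either library): folding n streams with the binary Stream.concat builds a left-nested chain of concat spliterators, so tryAdvance calls on the result have to pass through up to n wrapper levels.

```java
import java.util.List;
import java.util.stream.Stream;

public class ReductionConcat {
    // Sketch of the reduction-style n-ary concat used as the baseline:
    // each loop iteration wraps the accumulated stream in another binary
    // concat, producing a left-nested chain of n concat spliterators.
    // A tryAdvance on the result descends through that chain, which is
    // the O(n)-per-element cost described above.
    @SafeVarargs
    public static <T> Stream<T> concat(Stream<T>... streams) {
        Stream<T> result = Stream.empty();
        for (Stream<T> s : streams) {
            result = Stream.concat(result, s); // nesting grows by one level
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> merged =
                concat(Stream.of(1, 2), Stream.of(3), Stream.of(4, 5)).toList();
        System.out.println(merged); // [1, 2, 3, 4, 5]
    }
}
```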
I have created plots of average time for the standard concat, the Guava
implementation, and the new array-based implementation:
[image: image.png]
As you can see, the graphs (at least to the best of my knowledge) confirm my
analysis: for findFirst, the differences grow logarithmically, while for the
other operations approximately linearly.
As to the numbers, here is the comparison table (all times in ns/op):

Operation         Streams  NConcat            Guava              Standard           NConcat_Improvement  Guava_Improvement
Basic Concat      4        1860.9 ± 27.5      2018.5 ± 124.4     1910.3 ± 18.3      +2.6%                -5.7%
Basic Concat      8        3754.5 ± 36.0      3881.6 ± 34.2      3912.1 ± 43.6      +4.0%                +0.8%
Basic Concat      16       7687.6 ± 110.6     7828.5 ± 346.7     8343.8 ± 61.6      +7.9%                +6.2%
Basic Concat      32       15434.0 ± 200.3    16237.1 ± 1204.9   18091.9 ± 645.0    +14.7%               +10.3%
Basic Concat      64       32563.0 ± 349.2    33429.5 ± 774.2    43664.2 ± 603.8    +25.4%               +23.4%
FindFirst         4        92.1 ± 7.9         100.4 ± 6.8        123.8 ± 2.2        +25.6%               +18.9%
FindFirst         8        150.0 ± 2.8        162.3 ± 3.0        300.2 ± 3.3        +50.0%               +45.9%
FindFirst         16       275.2 ± 7.3        288.2 ± 5.4        877.9 ± 11.0       +68.7%               +67.2%
FindFirst         32       533.6 ± 10.9       539.2 ± 13.6       2858.9 ± 55.2      +81.3%               +81.1%
FindFirst         64       1069.1 ± 17.4      1057.2 ± 20.4      11695.7 ± 103.9    +90.9%               +91.0%
ToList            4        6565.2 ± 50.8      6899.2 ± 47.0      6756.1 ± 74.1      +2.8%                -2.1%
ToList            8        13198.9 ± 113.8    13624.0 ± 122.2    13661.3 ± 157.4    +3.4%                +0.3%
ToList            16       25234.2 ± 149.5    25822.8 ± 322.4    26686.5 ± 170.3    +5.4%                +3.2%
ToList            32       48063.4 ± 1866.3   46933.2 ± 414.2    53300.7 ± 2045.3   +9.8%                +11.9%
ToList            64       97838.0 ± 1574.1   96473.1 ± 635.4    104342.7 ± 1159.1  +6.2%                +7.5%
ToListWithFilter  4        9368.4 ± 239.8     12481.3 ± 321.7    9776.4 ± 212.0     +4.2%                -27.7%
ToListWithFilter  8        20166.3 ± 964.2    26528.9 ± 390.2    20158.0 ± 1482.1   -0.0%                -31.6%
ToListWithFilter  16       44067.1 ± 1090.0   55530.4 ± 2455.1   41009.5 ± 1464.2   -7.5%                -35.4%
ToListWithFilter  32       89627.8 ± 2522.7   109866.0 ± 2850.2  81724.4 ± 1279.5   -9.7%                -34.4%
ToListWithFilter  64       170779.8 ± 8213.1  220987.1 ± 4705.0  176520.6 ± 3400.0  +3.3%                -25.2%
Aside from the strange (seemingly artifact-related, I guess) results in the
filter + toList benchmark (the filter is used to drop the SIZED
characteristic), the benchmarks show a clear improvement for non-greedy
operations such as findFirst, as well as slight improvements for toList and
pretty significant ones for forEach, which internally uses forEachRemaining
instead of tryAdvance. In summary, tryAdvance-based operations get a
significant, log(n)-scale improvement from the new implementation, while
forEachRemaining-based ones show clear improvements in some cases (forEach)
and more debatable ones in others (toList).
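To make the shape of the new approach concrete, here is a heavily simplified, sequential-only sketch (my paraphrase, not the actual NConcatSpliterator code) of an array-backed concat spliterator: tryAdvance touches only the current sub-spliterator, and forEachRemaining drains each sub-spliterator exactly once.

```java
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Simplified sketch of an array-backed concat spliterator (not the actual
// NConcatSpliterator). tryAdvance is amortized O(1) in the number of
// streams, and forEachRemaining visits every element exactly once.
final class ArrayConcatSpliterator<T> implements Spliterator<T> {
    private final Spliterator<T>[] parts;
    private int cur; // index of the sub-spliterator currently being consumed

    ArrayConcatSpliterator(Spliterator<T>[] parts) {
        this.parts = parts;
    }

    @Override
    public boolean tryAdvance(Consumer<? super T> action) {
        while (cur < parts.length) {
            if (parts[cur].tryAdvance(action)) {
                return true; // element delivered from the current part
            }
            cur++; // current part exhausted; move to the next one
        }
        return false;
    }

    @Override
    public void forEachRemaining(Consumer<? super T> action) {
        for (; cur < parts.length; cur++) {
            parts[cur].forEachRemaining(action); // single pass per part
        }
    }

    @Override
    public Spliterator<T> trySplit() {
        return null; // sequential-only sketch; the real code splits the array
    }

    @Override
    public long estimateSize() {
        long total = 0;
        for (int i = cur; i < parts.length; i++) {
            long s = parts[i].estimateSize();
            if (s == Long.MAX_VALUE || (total += s) < 0) {
                return Long.MAX_VALUE; // unknown size or overflow
            }
        }
        return total;
    }

    @Override
    public int characteristics() {
        return ORDERED; // the real code would intersect sub-characteristics
    }

    @SafeVarargs
    @SuppressWarnings("unchecked")
    static <T> Stream<T> concat(Stream<? extends T>... streams) {
        Spliterator<T>[] parts = new Spliterator[streams.length];
        for (int i = 0; i < streams.length; i++) {
            parts[i] = (Spliterator<T>) streams[i].spliterator();
        }
        return StreamSupport.stream(new ArrayConcatSpliterator<>(parts), false);
    }
}
```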
Now I would like to, once again, reiterate the benefits of the proposed
addition:
1. DX improvement and improved consistency with existing APIs like Stream.of
(though a new inconsistency emerges with methods like Math.max/min).
2. Improved flexibility.
3. Significant performance improvements for tryAdvance-based operations,
and still noticeable ones for forEachRemaining-based operations as well.
4. Relatively small change size: the implementation of the new Spliterator
is completely independent, non-intrusive, and only takes 90 lines of code
as of now.
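To illustrate benefit 1 concretely, this is the call-site difference the overload would make (the varargs signature in the comment is the proposed, not yet existing, API; the class and method names are mine):

```java
import java.util.List;
import java.util.stream.Stream;

public class ConcatCallSite {
    // Today's shape: the binary Stream.concat forces nesting as soon as a
    // third source appears.
    static List<Long> mergedToday(List<Long> studentIds,
                                  List<Long> teacherIds,
                                  List<Long> partnerIds) {
        return Stream.concat(
                studentIds.stream(),
                Stream.concat(teacherIds.stream(), partnerIds.stream())
        ).toList();
    }

    // With the proposed (hypothetical) varargs overload, mirroring
    // Stream.of, the same merge would flatten to:
    //
    //     Stream.concat(studentIds.stream(),
    //                   teacherIds.stream(),
    //                   partnerIds.stream()).toList();
}
```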
Now that I have a relatively mature prototype, I would like to ask the
community for feedback: do you see this addition as valuable enough to be
present in the JDK?
Best regards
PS: The raw numbers can be found via the second link in this mail, or
manually in the root of the implementation repository.
On Wed, Sep 17, 2025 at 11:21 PM Éamonn McManus <emcmanus at google.com> wrote:
> Guava has a Streams.concat
> <https://github.com/google/guava/blob/e9ea5a982cad06ebd223ec6fdb5294eeb18654f6/guava/src/com/google/common/collect/Streams.java#L187> method
> that may suggest implementation possibilities.
>
> On Wed, 17 Sept 2025 at 13:16, Olexandr Rotan <rotanolexandr842 at gmail.com>
> wrote:
>
>> So I have played around a bit and managed to come up with an
>> implementation based on an array of streams; you can find it here:
>> https://github.com/Evemose/nconcat/blob/master/src/main/java/nconcat/NConcatSpliterator.java
>>
>> I have also added a small benchmark to the project, and the numbers are:
>>
>> Benchmark                                              (streamCount)  Mode  Cnt       Score       Error  Units
>> NConcatBenchmark.nConcatFindFirst                                  4  avgt   10     131.616 ±    15.474  ns/op
>> NConcatBenchmark.nConcatFindFirst                                  8  avgt   10     187.929 ±     6.544  ns/op
>> NConcatBenchmark.nConcatFindFirst                                 16  avgt   10     322.342 ±     6.940  ns/op
>> NConcatBenchmark.nConcatFindFirst                                 32  avgt   10     659.856 ±    85.509  ns/op
>> NConcatBenchmark.nConcatFindFirst                                 64  avgt   10    1214.133 ±    22.156  ns/op
>> NConcatBenchmark.nConcatMethod                                     4  avgt   10    1910.150 ±    25.269  ns/op
>> NConcatBenchmark.nConcatMethod                                     8  avgt   10    3865.364 ±   112.536  ns/op
>> NConcatBenchmark.nConcatMethod                                    16  avgt   10    7743.097 ±    74.655  ns/op
>> NConcatBenchmark.nConcatMethod                                    32  avgt   10   15840.551 ±   440.659  ns/op
>> NConcatBenchmark.nConcatMethod                                    64  avgt   10   32891.336 ±  1122.630  ns/op
>> NConcatBenchmark.nConcatToListWithFilter                           4  avgt   10    9527.120 ±   376.325  ns/op
>> NConcatBenchmark.nConcatToListWithFilter                           8  avgt   10   20260.027 ±   552.444  ns/op
>> NConcatBenchmark.nConcatToListWithFilter                          16  avgt   10   44724.856 ±  5040.069  ns/op
>> NConcatBenchmark.nConcatToListWithFilter                          32  avgt   10   82577.518 ±  2050.955  ns/op
>> NConcatBenchmark.nConcatToListWithFilter                          64  avgt   10  181460.219 ± 20809.669  ns/op
>> NConcatBenchmark.nconcatToList                                     4  avgt   10    9268.814 ±   712.883  ns/op
>> NConcatBenchmark.nconcatToList                                     8  avgt   10   18164.147 ±   786.803  ns/op
>> NConcatBenchmark.nconcatToList                                    16  avgt   10   35146.891 ±   966.871  ns/op
>> NConcatBenchmark.nconcatToList                                    32  avgt   10   68944.262 ±  5321.730  ns/op
>> NConcatBenchmark.nconcatToList                                    64  avgt   10  136845.984 ±  3491.562  ns/op
>> NConcatBenchmark.standardStreamConcat                              4  avgt   10    1951.522 ±    85.130  ns/op
>> NConcatBenchmark.standardStreamConcat                              8  avgt   10    3990.410 ±   190.517  ns/op
>> NConcatBenchmark.standardStreamConcat                             16  avgt   10    8599.869 ±   685.878  ns/op
>> NConcatBenchmark.standardStreamConcat                             32  avgt   10   17923.603 ±   361.874  ns/op
>> NConcatBenchmark.standardStreamConcat                             64  avgt   10   46797.408 ±  4458.069  ns/op
>> NConcatBenchmark.standardStreamConcatFindFirst                     4  avgt   10     125.192 ±     3.123  ns/op
>> NConcatBenchmark.standardStreamConcatFindFirst                     8  avgt   10     303.791 ±     8.670  ns/op
>> NConcatBenchmark.standardStreamConcatFindFirst                    16  avgt   10     907.429 ±    52.620  ns/op
>> NConcatBenchmark.standardStreamConcatFindFirst                    32  avgt   10    2964.749 ±   320.141  ns/op
>> NConcatBenchmark.standardStreamConcatFindFirst                    64  avgt   10   11749.653 ±   189.300  ns/op
>> NConcatBenchmark.standardStreamConcatToList                        4  avgt   10    7059.642 ±   740.735  ns/op
>> NConcatBenchmark.standardStreamConcatToList                        8  avgt   10   13714.980 ±   250.208  ns/op
>> NConcatBenchmark.standardStreamConcatToList                       16  avgt   10   27028.052 ±   565.047  ns/op
>> NConcatBenchmark.standardStreamConcatToList                       32  avgt   10   53537.731 ±   853.363  ns/op
>> NConcatBenchmark.standardStreamConcatToList                       64  avgt   10  105847.755 ±  3179.918  ns/op
>> NConcatBenchmark.standardStreamConcatToListWithFilter              4  avgt   10    9736.527 ±   154.817  ns/op
>> NConcatBenchmark.standardStreamConcatToListWithFilter              8  avgt   10   20607.061 ±   713.083  ns/op
>> NConcatBenchmark.standardStreamConcatToListWithFilter             16  avgt   10   41241.199 ±  1171.672  ns/op
>> NConcatBenchmark.standardStreamConcatToListWithFilter             32  avgt   10   83029.244 ±  1843.176  ns/op
>> NConcatBenchmark.standardStreamConcatToListWithFilter             64  avgt   10  182349.009 ± 11282.832  ns/op
>>
>> Basically, the conclusion is the following (guilty of using AI for
>> summarizing):
>>
>> The comprehensive benchmarks reveal that *NConcat significantly
>>> outperforms the standard library for processing-intensive operations*
>>> while trailing in simple collection scenarios. For short-circuit operations
>>> like findFirst(), NConcat delivers 38-90% better performance as stream
>>> count increases, reaching nearly 10x faster execution at 64 streams due to
>>> superior scaling (19ns/stream vs 184ns/stream). Full traversal operations
>>> like forEach consistently favor NConcat by 2-30%, with the advantage
>>> growing at scale. However, simple collection operations (toList())
>>> consistently run 22-24% faster with the standard library across all stream
>>> counts.
>>
>>
>>
>> I have tried multiple approaches to optimize toList with known sizes of
>> all the sub-streams (which is clearly the reason the standard
>> implementation wins here), and I am sure that there is still plenty of
>> room for improvement, especially in parallel. But the takeaway is: even a
>> naive implementation like mine could bring a significant performance
>> improvement to the table in early-short-circuiting and full-traversal
>> cases that do not depend on the size of the spliterator.
>>
>> Besides the performance part, of course, the most significant advantage
>> of my proposal, as I see it, is still developer experience, both reading
>> and writing stream code.
>>
>> Please let me know your thoughts on the results of the prototype and possible
>> ways forward.
>>
>> Best regards
>>
>> On Wed, Sep 17, 2025 at 6:04 PM Olexandr Rotan <
>> rotanolexandr842 at gmail.com> wrote:
>>
>>> Hello everyone! Thanks for your responses
>>>
>>> I will start off by answering Viktor.
>>>
>>> I guess a "simple" implementation of an N-ary concat could work, but it
>>>> would have performance implications (think a recursive use of
>>>> Stream.concat())
>>>
>>>
>>> I too find the mere addition of small reduction-performing sugar methods
>>> rather unsatisfactory and most certainly not bringing enough value to be
>>> considered a valuable addition. Moreover, I have not checked it myself, but
>>> I would dare to guess that popular utility libraries such as Guava or
>>> Apache Commons already provide this sort of functionality in their utility
>>> classes. Though, if this method could bring some significant performance
>>> benefits, I think it may be a valuable candidate to consider. To me as a
>>> user, though, the main value would be the uniformity of the API and ease
>>> of use and reading. The main reason I am writing about this in the first
>>> place is the unintuitive inconsistency with many other static factory
>>> methods that happily accept varargs.
>>>
>>> I may play around with the spliterator code you have linked to, to see
>>> if I could generalize it for arrays of streams.
>>>
>>> Now, answering to Pavel
>>>
>>>> Is it such a useful use case, though? I mean, it's no different from
>>>> SequenceInputStream(...) or Math.min/max for that matter. I very rarely
>>>> have to do Math.min(a, Math.min(b, c)) or some such.
>>>
>>>
>>> I certainly see your point, but I would dare to say that most
>>> applications rely on streams much more than on SequenceInputStream, the
>>> Math class, and their lookalikes. Stream.concat is primarily a way to
>>> merge a few data-source outputs into one for later uniform processing,
>>> which, in a nutshell, is one of the most common tasks in data-centric
>>> applications. Of course, not every such use case has characteristics that
>>> incline developers to use Stream.concat, such as a combination of
>>> Stream.of and Collection.stream() sources, and even if they do, not every
>>> case that fits the previous requirement needs to merge more than 2
>>> sources. However, for mid-to-large-scale apps, for which Java is best
>>> known, I would say it's fairly common. I went over our codebase and found
>>> at least 10+ usages of concat, and a few of them followed this kinda ugly
>>> pattern of nested concats.
>>>
>>> Separately, it's not just one method. Consider that `concat` is also
>>>> implemented in specialized streams such as IntStream, DoubleStream, and
>>>> LongStream.
>>>
>>>
>>> This is unfortunate, but I would dare to say that once the reference
>>> spliterator is implemented, the others may also be derived by analogy
>>> fairly quickly.
>>>
>>> And last but not least, answering Daniel
>>>
>>> Not immediately obvious but you can create a Stream<Stream<T>> using
>>>> Stream.of and reduce that using Stream::concat to obtain a Stream<T>.
>>>
>>> Something along those lines:
>>>
>>>> ```
>>>> var stream = Stream.of(Stream.of(1, 2, 3), Stream.of(4), Stream.of(5, 6,
>>>> 7, 8)).reduce(Stream.empty(), Stream::concat, Stream::concat);
>>>> ```
>>>
>>> This is what I meant by a "reduction-like" implementation, which is fairly
>>> straightforward, but just from the looks of it, one could assume that this
>>> solution will surely have performance consequences, even if using flatMap
>>> instead of reduce. I am not sure, though, how often people would want to
>>> use such an approach on an array of streams large enough for the
>>> performance difference to be noticeable, though I would assume that the
>>> time and resources consumed scale non-linearly with the length of the
>>> streams array due to the implementation of the concat method.
>>>
>>> Nevertheless, this is an acceptable workaround for such cases, even
>>> though not the most readable one. Even if this approach is accepted as
>>> sufficient for merging an n-sized array of streams, it would probably
>>> make sense to put a note about it in the docs of the concat method.
>>> Though, not having a concat(Stream...) overload would still remain
>>> unintuitive for many developers, including me.
>>>
>>> Thanks everybody for the answers again
>>>
>>> Best regards
>>>
>>> On Wed, Sep 17, 2025 at 5:15 PM Pavel Rappo <pavel.rappo at gmail.com>
>>> wrote:
>>>
>>>> > this would be a great quality of life improvement
>>>>
>>>> Is it such a useful use case, though? I mean, it's no different from
>>>> SequenceInputStream(...) or Math.min/max for that matter. I very
>>>> rarely have to do Math.min(a, Math.min(b, c)) or some such. And those
>>>> methods predate the streams API by more than a decade.
>>>>
>>>> Separately, it's not just one method. Consider that `concat` is also
>>>> implemented in specialized streams such as IntStream, DoubleStream,
>>>> and LongStream.
>>>>
>>>> On Wed, Sep 17, 2025 at 2:58 PM Olexandr Rotan
>>>> <rotanolexandr842 at gmail.com> wrote:
>>>> >
>>>> > Greetings to everyone on the list.
>>>> >
>>>> > When working on some routine tasks recently, I encountered a decision
>>>> in the design of the Stream.concat method that seems strange to me,
>>>> specifically the fact that it accepts exactly two streams. My concrete
>>>> example was something along the lines of
>>>> >
>>>> > var studentIds = ...;
>>>> > var teacherIds = ...;
>>>> > var partnerIds = ...;
>>>> >
>>>> > return Stream.concat(
>>>> > studentIds.stream(),
>>>> > teacherIds.stream(),
>>>> > partnerIds.stream() // oops, this one doesn't work
>>>> > )
>>>> >
>>>> > so I had to transform concat to a rather ugly
>>>> > Stream.concat(
>>>> > studentIds.stream(),
>>>> > Stream.concat(
>>>> > teacherIds.stream(),
>>>> > partnerIds.stream()
>>>> > )
>>>> > )
>>>> >
>>>> > Later on I had to add 4th stream of a single element (Stream.of), and
>>>> this one became even more ugly
>>>> >
>>>> > When I first wrote a third argument to concat and saw that the IDE
>>>> highlights it as an error, I was very surprised. This design seems
>>>> inconsistent not only with the whole Java stdlib, but even with the
>>>> Stream.of static method of the same class. Is there any particular
>>>> reason why concat takes exactly two arguments?
>>>> >
>>>> > I would say that, even if just in the form of a sugar method that
>>>> does a reduce on an array (varargs) of streams, this would be a great
>>>> quality-of-life improvement, but I'm sure there may also be some room
>>>> for performance improvement.
>>>> >
>>>> > Of course, there are workarounds like Stream.of + flatmap, but:
>>>> >
>>>> > 1. It gets messy when trying to concat streams of literal element
>>>> sets (Stream.of) and streams of collections or arrays
>>>> > 2. It certainly has significant performance overhead
>>>> > 3. It still doesn't explain the absence of a varargs overload of concat
>>>> >
>>>> > So, once again, is there any particular reason to restrict the argument
>>>> list to exactly two streams? If not, I would be happy to contribute
>>>> Stream.concat(Stream... streams) overload.
>>>> >
>>>> > Best regards
>>>> >
>>>> >
>>>> >
>>>>
>>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 180396 bytes
Desc: not available
URL: <https://mail.openjdk.org/pipermail/core-libs-dev/attachments/20250928/058e51ae/image-0001.png>