Question about Streams, Gatherers, and fetching too many elements

David Alayachew davidalayachew at gmail.com
Mon Nov 11 13:46:20 UTC 2024


Hello Core Libs Dev Team,

I was trying out Gatherers for a project at work, and ran into a rather sad
scenario.

I need to process a large file in batches. Each batch is small enough that
I can hold it in memory, but I cannot hold the entire file (and thus, all
of the batches) in memory at once.
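
For context, imagine the stream comes from something like Files.lines, so
the source itself is lazy (this is just a stand-in sketch, not my actual
code):

import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical setup -- the real file is far too large to hold in memory,
// but Files.lines reads it lazily, one line at a time.
try (Stream<String> stream = Files.lines(Path.of("large-input.txt"))) {
    // batching pipeline shown below
}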

Looking at the Gatherers API, I saw windowFixed and thought that it would
be a great match for my use case.
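
To illustrate what I mean, here is a tiny sequential sketch (not my real
code) showing the windowFixed behavior I was counting on:

import java.util.List;
import java.util.stream.Gatherers;
import java.util.stream.Stream;

// windowFixed(3) groups consecutive elements into lists of 3;
// the final window may be smaller.
List<List<Integer>> windows = Stream.of(1, 2, 3, 4, 5, 6, 7)
    .gather(Gatherers.windowFixed(3))
    .toList();
System.out.println(windows); // prints [[1, 2, 3], [4, 5, 6], [7]]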

However, when trying it out, I was disappointed to see that it ran out of
memory very quickly. Here is my attempt at using it.

stream
    .parallel()
    .unordered()
    .gather(Gatherers.windowFixed(BATCH_SIZE))
    .forEach(eachList -> System.out.println(eachList.getFirst()));

As you can see, I am just splitting the file into batches and printing out
the first element of each batch. This is purely for example's sake, of
course. I had planned on building more functionality on top of this, but I
couldn't even get past this example.

At any rate, not a single batch was printed before the failure, which leads
me to believe that the Gatherer is pulling in every element before it emits
any windows.

I can get it to run successfully if I go sequentially, but not in parallel.
Running in parallel gives me an OutOfMemoryError.
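
Concretely, this variant of the pipeline (the same as above, minus
parallel() and unordered()) finishes for me:

// Sequential version -- completes with bounded memory use.
stream
    .gather(Gatherers.windowFixed(BATCH_SIZE))
    .forEach(eachList -> System.out.println(eachList.getFirst()));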

Is there any way to have the Gatherer NOT pull in everything while the
stream remains parallel and unordered?

Thank you for your time and help.
David Alayachew