<p dir="ltr">Thanks for the workaround. It's running beautifully.</p>
<p dir="ltr">Is there a future where this island concept is extended to the rest of streams? Tbh, I don't fully understand it.</p>
<br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Nov 11, 2024, 9:59 AM Viktor Klang <<a href="mailto:viktor.klang@oracle.com">viktor.klang@oracle.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Hi David,<br>
<br>
This is the effect of how parallel streams are implemented, where different stages, which are not representible as a join-less Spliterator are executed as a series of "islands" where the next isn't started until the former has completed.</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
If you think about it, parallelization of a Stream works best when the entire data set can be split amongst a set of worker threads, and that sort of implies that you want eager pre-fetch of data, so if your dataset does not fit in memory, that is likely to
lead to less desirable outcomes.</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
What I was able to do for Gatherers is to implement "gather(…) + collect(…)"-fusion so any number of consecutive gather(…)-operations immediately followed by a collect(…) is run in the same "island".<br>
<br>
</div>
<div style="direction:ltr;text-align:left;text-indent:0px;line-height:normal;background-color:rgb(255,255,255);margin:0px;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
So with that said, you could try something like the following:<br>
<br>
</div>
<div style="direction:ltr;text-align:left;text-indent:0px;line-height:normal;background-color:rgb(255,255,255);margin:0px;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
static <T> Collector<T, ?, Void> <b>forEach</b>(Consumer<? <u>super</u> T> <b>each</b>) {</div>
<div style="direction:ltr;text-align:left;text-indent:0px;line-height:normal;background-color:rgb(255,255,255);margin:0px;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<u>return</u> Collector.of(() -> null, (<b>v</b>, <b>e</b>) -> each.accept(e), (<b>l</b>,
<b>r</b>) -> l, (<b>v</b>) -> null, Collector.Characteristics.IDENTITY_FINISH);<br>
}</div>
<div style="direction:ltr;text-align:left;text-indent:0px;margin:0px;color:rgb(0,0,0)">
<span style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt"><br>
<br>
</span><span style="font-family:"Segoe UI","Segoe UI Web (West European)",-apple-system,"system-ui",Roboto,"Helvetica Neue",sans-serif;font-size:15px">stream</span></div>
<div style="direction:ltr;text-align:left;text-indent:0px;background-color:rgb(255,255,255);margin:0px;font-family:"Segoe UI","Segoe UI Web (West European)",-apple-system,"system-ui",Roboto,"Helvetica Neue",sans-serif;font-size:15px;color:rgb(0,0,0)">
.parallel()</div>
<div style="direction:ltr;text-align:left;text-indent:0px;background-color:rgb(255,255,255);margin:0px;font-family:"Segoe UI","Segoe UI Web (West European)",-apple-system,"system-ui",Roboto,"Helvetica Neue",sans-serif;font-size:15px;color:rgb(0,0,0)">
.unordered()</div>
<div style="direction:ltr;text-align:left;text-indent:0px;background-color:rgb(255,255,255);margin:0px;font-family:"Segoe UI","Segoe UI Web (West European)",-apple-system,"system-ui",Roboto,"Helvetica Neue",sans-serif;font-size:15px;color:rgb(0,0,0)">
.gather(Gatherers.windowFixed(BATCH_SIZE))</div>
<div style="direction:ltr;text-align:left;text-indent:0px;background-color:rgb(255,255,255);margin:0px;font-family:"Segoe UI","Segoe UI Web (West European)",-apple-system,"system-ui",Roboto,"Helvetica Neue",sans-serif;font-size:15px;color:rgb(0,0,0)">
.collect(forEach(eachList -> println(eachList.getFirst())));</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div id="m_7527700827726692131Signature" style="color:inherit">
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Cheers,<br>
√</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<b><br>
</b></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<b>Viktor Klang</b></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Software Architect, Java Platform Group<br>
Oracle</div>
</div>
<div id="m_7527700827726692131appendonsend"></div>
<hr style="display:inline-block;width:98%">
<div id="m_7527700827726692131divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> core-libs-dev <<a href="mailto:core-libs-dev-retn@openjdk.org" target="_blank" rel="noreferrer">core-libs-dev-retn@openjdk.org</a>> on behalf of David Alayachew <<a href="mailto:davidalayachew@gmail.com" target="_blank" rel="noreferrer">davidalayachew@gmail.com</a>><br>
<b>Sent:</b> Monday, 11 November 2024 14:52<br>
<b>To:</b> core-libs-dev <<a href="mailto:core-libs-dev@openjdk.org" target="_blank" rel="noreferrer">core-libs-dev@openjdk.org</a>><br>
<b>Subject:</b> Re: Question about Streams, Gatherers, and fetching too many elements</font>
<div> </div>
</div>
<div>
<div dir="auto">
<div>And just to avoid the obvious question, I can hold about 30 batches in memory before the Out of Memory error occurs. So this is not an issue of my batch size being too high.</div>
<div dir="auto"><br>
</div>
<div dir="auto">But just to confirm, I set the batch size to 1, and it still ran into an out of memory error. So I feel fairly confident saying that the Gatherer is trying to grab all available data before sending any of it downstream.<br>
<br>
<div dir="auto">
<div dir="ltr">On Mon, Nov 11, 2024, 8:46 AM David Alayachew <<a href="mailto:davidalayachew@gmail.com" target="_blank" rel="noreferrer">davidalayachew@gmail.com</a>> wrote:<br>
</div>
<blockquote style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="auto">Hello Core Libs Dev Team,
<div dir="auto"><br>
</div>
<div dir="auto">I was trying out Gatherers for a project at work, and ran into a rather sad scenario.</div>
<div dir="auto"><br>
</div>
<div dir="auto">I need to process a large file in batches. Each batch is small enough that I can hold it in memory, but I cannot hold the entire file (and thus, all of the batches) in memory at once.</div>
<div dir="auto"><br>
</div>
<div dir="auto">Looking at the Gatherers API, I saw windowFixed and thought that it would be a great match for my use case.</div>
<div dir="auto"><br>
</div>
<div dir="auto">However, when trying it out, I was disappointed to see that it ran out of memory very quickly. Here is my attempt at using it.</div>
<div dir="auto"><br>
</div>
<div dir="auto">stream</div>
<div dir="auto">.parallel()</div>
<div dir="auto">.unordered()</div>
<div dir="auto">.gather(Gatherers.windowFixed(BATCH_SIZE))</div>
<div dir="auto">.forEach(eachList -> println(eachList.getFirst()))</div>
<div dir="auto">;</div>
<div dir="auto"><br>
</div>
<div dir="auto">As you can see, I am just splitting the file into batches, and printing out the first of each batch. This is purely for example's sake, of course. I had planned on building even more functionality on top of this, but I couldn't even get past
this example.</div>
<div dir="auto"><br>
</div>
<div dir="auto">But anyways, not even a single one of them printed out. Which leads me to believe that it's pulling all of them in the Gatherer.</div>
<div dir="auto">
<div dir="auto"><br>
</div>
<div dir="auto">I can get it to run successfully if I go sequentially, but not parallel. Parallel gives me that out of memory error.</div>
<div dir="auto"><br>
</div>
</div>
<div dir="auto">Is there any way for me to be able to have the Gatherer NOT pull in everything while still remaining parallel and unordered?</div>
<div dir="auto"><br>
</div>
<div dir="auto">Thank you for your time and help.</div>
<div dir="auto">David Alayachew</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</div>
</blockquote></div>