<p dir="ltr">Thanks Viktor, this was what I was looking for.</p>
<p dir="ltr">OK, so given the existing difficulties in translating push-style streams into pull-style constructs, it makes sense that this wouldn't work.</p>
<p dir="ltr">I really look forward to upgrading my codebase soon. The fixed-window gatherer (Gatherers.windowFixed) seems to eliminate this problem entirely. At least, it appears to.</p>
<br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Oct 21, 2024, 6:54 AM Viktor Klang <<a href="mailto:viktor.klang@oracle.com">viktor.klang@oracle.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Hi David,<br>
<br>
</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Stream::spliterator() and Stream::iterator() suffer from inherent limitations (see
<a href="https://bugs.openjdk.org/browse/JDK-8268483" id="m_4661393143754468959LPlnk723584" target="_blank" rel="noreferrer">https://bugs.openjdk.org/browse/JDK-8268483</a>) because they attempt to convert push-style streams into pull-style constructs (Iterator, Spliterator). The only way to know whether there's
something to pull is for something to get pushed, but who is doing the pushing? (Answer: the same party that is trying to do the pulling.)<br>
<br>
</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
For sequential Streams this is often less problematic, but for parallel Streams it's not the caller that evaluates the stream but rather a task tree submitted to a ForkJoinPool. You can see the implementation here:
<a href="https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/stream/StreamSpliterators.java#L272" id="m_4661393143754468959LPlnk" target="_blank" rel="noreferrer">
https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/stream/StreamSpliterators.java#L272</a></div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
As an example, Stream::flatMap(…) had corner cases where its use of Stream::spliterator() would fail to terminate; this was fixed in Java 23:
<a href="https://github.com/openjdk/jdk/pull/18625" id="m_4661393143754468959LPlnk" target="_blank" rel="noreferrer">https://github.com/openjdk/jdk/pull/18625</a></div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div id="m_4661393143754468959Signature" style="color:inherit">
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Cheers,<br>
√</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<b><br>
</b></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<b>Viktor Klang</b></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Software Architect, Java Platform Group<br>
Oracle</div>
</div>
<div id="m_4661393143754468959appendonsend"></div>
<hr style="display:inline-block;width:98%">
<div id="m_4661393143754468959divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> core-libs-dev <<a href="mailto:core-libs-dev-retn@openjdk.org" target="_blank" rel="noreferrer">core-libs-dev-retn@openjdk.org</a>> on behalf of David Alayachew <<a href="mailto:davidalayachew@gmail.com" target="_blank" rel="noreferrer">davidalayachew@gmail.com</a>><br>
<b>Sent:</b> Saturday, 19 October 2024 07:54<br>
<b>To:</b> core-libs-dev <<a href="mailto:core-libs-dev@openjdk.org" target="_blank" rel="noreferrer">core-libs-dev@openjdk.org</a>><br>
<b>Subject:</b> Streams, parallelization, and OOME.</font>
<div> </div>
</div>
<div>
<div dir="auto">
<div dir="auto">Hello Core Libs Dev Team,</div>
<div dir="auto"><br>
</div>
<div dir="auto">I have a file that I am streaming from a service, and I am trying to split it into multiple parts based on a certain attribute found on each line. I am sending each part up to a different service.</div>
<div dir="auto"><br>
</div>
<div dir="auto">I am using BufferedReader.lines(). However, I cannot read the whole file into memory because it is larger than the machine's RAM. So, since I don't have access to Java 22's preview Gatherers.windowFixed, I called the
iterator() method on my stream, wrapped that in another iterator that grabs one batch's worth of data at a time, built a spliterator from that, and used the spliterator to create a new stream. In short, this wrapper iterator isn't an Iterator&lt;T&gt;; it's an Iterator&lt;List&lt;T&gt;&gt;.</div>
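<div dir="auto">A minimal sketch of the wrapper described above, in Java 8-compatible code. The original code isn't shown, so the names Batching, BatchIterator and batches are hypothetical: an Iterator&lt;List&lt;T&gt;&gt; pulls up to batchSize elements from the source iterator on each call to next(), and is then wrapped with Spliterators.spliteratorUnknownSize and StreamSupport.stream. Consumed sequentially, it reads one batch at a time.</div>

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class Batching {

    // Hypothetical wrapper: hands out consecutive chunks of up to batchSize elements.
    static final class BatchIterator<T> implements Iterator<List<T>> {
        private final Iterator<T> source;
        private final int batchSize;

        BatchIterator(Iterator<T> source, int batchSize) {
            if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
            this.source = source;
            this.batchSize = batchSize;
        }

        @Override
        public boolean hasNext() {
            return source.hasNext();
        }

        @Override
        public List<T> next() {
            if (!source.hasNext()) throw new NoSuchElementException();
            List<T> batch = new ArrayList<>(batchSize);
            // Pull elements one at a time until the batch is full or the source runs dry.
            while (batch.size() < batchSize && source.hasNext()) {
                batch.add(source.next());
            }
            return batch;
        }
    }

    // Turns a Stream<T> into a Stream<List<T>>; the last batch may be shorter.
    public static <T> Stream<List<T>> batches(Stream<T> stream, int batchSize) {
        Iterator<List<T>> it = new BatchIterator<>(stream.iterator(), batchSize);
        Spliterator<List<T>> split =
            Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED | Spliterator.NONNULL);
        // Closing the batched stream also closes the underlying one.
        return StreamSupport.stream(split, false).onClose(stream::close);
    }

    public static void main(String[] args) {
        BufferedReader reader = new BufferedReader(new StringReader("a\nb\nc\nd\ne\nf\ng"));
        List<Integer> sizes = batches(reader.lines(), 3)
            .map(List::size)
            .collect(Collectors.toList());
        System.out.println(sizes); // prints [3, 3, 1]
    }
}
```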
<div dir="auto"><br>
</div>
<div dir="auto">When I ran this sequentially, everything worked well. However, CPU utilization was low, and we definitely have a performance problem: our team needs this number as fast as we can get it. Plus, we had plenty of network bandwidth to spare, so I had (IMO)
good reason to use parallelism.</div>
<div dir="auto"><br>
</div>
<div dir="auto">As soon as I turned on parallelism, the stream's behaviour changed completely. Instead of fetching a batch and processing it, it started grabbing SEVERAL BATCHES and processing NONE OF THEM. Or, at the very least, it grabbed so many batches that
it ran out of memory before it could get to processing any of them.</div>
<div dir="auto"><br>
</div>
<div dir="auto">To give some numbers, this is a 4-core machine, and we can safely hold about 30-40 batches' worth of data in memory before crashing. But again, when running sequentially, this thing grabs only 1 batch, processes that one batch, sends out the
results, and then starts the next one, all as expected. I thought that adding parallelism would simply make this happen 4 or 8 times at once.</div>
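<div dir="auto">The behaviour expected here (4 or 8 batches in progress at once, each read only when a worker is ready for it) can be sketched on Java 8 without parallel streams. This is one common java.util.concurrent pattern, not something proposed in this thread, and the names BoundedBatches and processAll are hypothetical: the calling thread reads batches sequentially from an Iterator&lt;List&lt;T&gt;&gt; (such as the wrapper described above), and a Semaphore with maxInFlight permits keeps it from reading ahead of the workers.</div>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

public class BoundedBatches {

    // Hypothetical helper: reads batches on the calling thread and runs up to maxInFlight
    // of them concurrently. A batch is read only after a permit is acquired, and the permit
    // is released when work on that batch finishes, so at most maxInFlight batches are held.
    public static <T> void processAll(Iterator<List<T>> batches, int maxInFlight,
                                      Consumer<List<T>> work) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(maxInFlight);
        Semaphore slots = new Semaphore(maxInFlight);
        AtomicReference<Throwable> failure = new AtomicReference<>();
        try {
            while (failure.get() == null && batches.hasNext()) {
                slots.acquire();                // wait for a free slot before reading on
                List<T> batch = batches.next(); // the next batch enters memory only now
                pool.execute(() -> {
                    try {
                        work.accept(batch);     // e.g. send the batch to a downstream service
                    } catch (Throwable t) {
                        failure.compareAndSet(null, t);
                    } finally {
                        slots.release();
                    }
                });
            }
        } finally {
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.DAYS);
        }
        Throwable err = failure.get();
        if (err != null) {
            throw new RuntimeException("batch processing failed", err);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<List<Integer>> batches = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            batches.add(Collections.singletonList(i));
        }
        AtomicInteger processed = new AtomicInteger();
        processAll(batches.iterator(), 4, batch -> processed.addAndGet(batch.size()));
        System.out.println(processed.get()); // prints 20
    }
}
```

<div dir="auto">Here the work is a placeholder; in the scenario above it would send the batch to the downstream service. With maxInFlight set to 4 or 8, at most that many batches are held at once, well under the 30-40 that fit in memory.</div>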
<div dir="auto"><br>
</div>
<div dir="auto">After a very long period of digging, I managed to find this link.</div>
<div dir="auto"><br>
</div>
<div dir="auto"><a href="https://stackoverflow.com/questions/30825708/java-8-using-parallel-in-a-stream-causes-oom-error" target="_blank" rel="noreferrer">https://stackoverflow.com/questions/30825708/java-8-using-parallel-in-a-stream-causes-oom-error</a></div>
<div dir="auto"><br>
</div>
<div dir="auto">Tagir Valeev gives an answer that doesn't go very deep into the "why" at all, and it is aimed more at that user's specific question than at solving this general problem.</div>
<div dir="auto"><br>
</div>
<div dir="auto">After digging through a bunch of other solutions (plus my own testing), it seems that the engine that parallelizes Streams tries to grab a large enough "buffer" before doing any parallel processing. I could be
wrong, and as for how large that buffer is, I have no idea.</div>
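<div dir="auto">The "buffer" guessed at here can be observed directly. A small experiment, assuming the batch stream is built with Spliterators.spliteratorUnknownSize as described above (the names SplitBuffering and elementsPulledByFirstSplit are hypothetical): a parallel stream asks its source spliterator to split so the pieces can be handed to ForkJoinPool tasks, and splitting an iterator-backed spliterator of unknown size copies a run of elements into an array first. On OpenJDK the first split takes 1024 elements and later splits take more; that is an implementation detail, not a documented guarantee. When each element is a whole batch, a single split already exceeds the 30-40 batches that fit in memory.</div>

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.concurrent.atomic.AtomicInteger;

public class SplitBuffering {

    // Returns how many elements a single trySplit() drains from an iterator of unknown size.
    public static int elementsPulledByFirstSplit(int available) {
        AtomicInteger pulled = new AtomicInteger();
        Iterator<List<String>> batches = new Iterator<List<String>>() {
            @Override
            public boolean hasNext() {
                return pulled.get() < available;
            }

            @Override
            public List<String> next() {
                if (!hasNext()) throw new NoSuchElementException();
                pulled.incrementAndGet();
                return Collections.singletonList("batch"); // stand-in for a large batch
            }
        };
        Spliterator<List<String>> split =
            Spliterators.spliteratorUnknownSize(batches, Spliterator.ORDERED);
        // Parallel evaluation splits the source like this to distribute work;
        // the split buffers its elements into an array before returning.
        split.trySplit();
        return pulled.get();
    }

    public static void main(String[] args) {
        System.out.println(elementsPulledByFirstSplit(100_000));
    }
}
```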
<div dir="auto"><br>
</div>
<div dir="auto">Regardless, that's about where I gave up and went sequential, since the clock was ticking.</div>
<div dir="auto"><br>
</div>
<div dir="auto">But I still have a performance problem. How would you suggest going about this in Java 8?</div>
<div dir="auto"><br>
</div>
<div dir="auto">Thank you for your time and help.</div>
<div dir="auto">David Alayachew</div>
<div dir="auto"><br>
</div>
</div>
</div>
</div>
</blockquote></div>