Parallel processing of lines in a file (was RE: Basic functional style question)
Paul Sandoz
paul.sandoz at oracle.com
Thu Nov 28 02:52:36 PST 2013
Hi,
On Nov 28, 2013, at 11:24 AM, "Millies, Sebastian" <Sebastian.Millies at softwareag.com> wrote:
> Hello there,
>
> Assuming very large log files, how well does that approach parallelize?
> Can/should I use "Files.lines(path).parallel()" ?
>
You can, but you should measure. Since the log is very large there is a good chance of some speed-up, especially if the cost of processing each element through the pipeline is high; the result may also depend on the terminal operation and on whether encounter order is important.
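A minimal sketch of the approach under discussion. The file name and filter predicate here are hypothetical, chosen only to illustrate `Files.lines(path).parallel()` with an order-insensitive terminal operation:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class ParallelLines {
    public static void main(String[] args) throws IOException {
        // Hypothetical log file for illustration: one entry per line.
        Path log = Files.createTempFile("app", ".log");
        Files.write(log, List.of(
                "INFO start", "ERROR disk full", "INFO done", "ERROR timeout"));

        // Parallel pipeline: count() does not depend on encounter order,
        // so ordering imposes no extra cost on the parallel execution.
        long errors = Files.lines(log).parallel()
                           .filter(line -> line.startsWith("ERROR"))
                           .count();

        System.out.println(errors); // prints 2
        Files.delete(log);
    }
}
```

Whether the parallel version actually beats the sequential one depends on file size and per-line work, which is why measuring matters.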
> I assume that the JRE would buffer blocks of lines in an array, the elements of which would then
> be processed in parallel. Is that right?
Yes.
> Is there any way to tweak the buffer size etc.?
No, unless you write your own Spliterator (see below).
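To make that concrete, here is a hedged sketch of what such a custom Spliterator could look like. `FixedBatchSpliterator` is a hypothetical name, and a fixed batch size is used instead of the JDK's arithmetic progression purely for illustration:

```java
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.StreamSupport;

// Hypothetical sketch: a Spliterator over an Iterator that splits off
// fixed-size batches chosen by the caller.
class FixedBatchSpliterator<T> implements Spliterator<T> {
    private final Iterator<T> it;
    private final int batchSize;

    FixedBatchSpliterator(Iterator<T> it, int batchSize) {
        this.it = it;
        this.batchSize = batchSize;
    }

    @Override
    public Spliterator<T> trySplit() {
        if (!it.hasNext()) return null;
        // Copy a prefix of up to batchSize elements into an array and
        // return a Spliterator over that array.
        Object[] batch = new Object[batchSize];
        int n = 0;
        while (n < batchSize && it.hasNext()) batch[n++] = it.next();
        return Spliterators.spliterator(batch, 0, n, characteristics());
    }

    @Override
    public boolean tryAdvance(Consumer<? super T> action) {
        if (!it.hasNext()) return false;
        action.accept(it.next());
        return true;
    }

    @Override
    public long estimateSize() { return Long.MAX_VALUE; } // size unknown

    @Override
    public int characteristics() { return ORDERED | NONNULL; }

    public static void main(String[] args) {
        // Stand-in for lines read from a file.
        List<String> lines = IntStream.range(0, 10_000)
                                      .mapToObj(i -> "line-" + i)
                                      .collect(Collectors.toList());
        long count = StreamSupport.stream(
                new FixedBatchSpliterator<>(lines.iterator(), 256), true)
            .filter(s -> s.endsWith("0"))
            .count();
        System.out.println(count); // prints 1000
    }
}
```

The stream framework only calls `trySplit` from the thread that owns the spliterator, so the shared underlying Iterator needs no extra synchronization here.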
> And will the next
> block be read in parallel to processing the previous one, or after processing of each block has finished?
>
In parallel, with at most one thread copying lines into an array at any time.
The problem can be reduced to splitting from an Iterator. Splitting copies a prefix of elements from the Iterator into an array and returns a Spliterator wrapping that array. Here is some internal documentation from one of the Spliterator-from-Iterator implementations:
/*
* Split into arrays of arithmetically increasing batch
* sizes. This will only improve parallel performance if
* per-element Consumer actions are more costly than
* transferring them into an array. The use of an
* arithmetic progression in split sizes provides overhead
* vs parallelism bounds that do not particularly favor or
* penalize cases of lightweight vs heavyweight element
* operations, across combinations of #elements vs #cores,
* whether or not either are known. We generate
* O(sqrt(#elements)) splits, allowing O(sqrt(#cores))
* potential speedup.
*/
The initial batch size is 1024 and increases arithmetically by 1024 for each split. Originally the initial size was 1 and the size increased arithmetically by 1, but this created much larger right-balanced trees that unduly affected the performance of reduction operations. Note that many collection implementations also use this technique (such as LinkedList or ConcurrentLinkedQueue).
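A small back-of-the-envelope illustration of that progression, assuming batch sizes of 1024, 2048, 3072, and so on, as described above: counting how many splits it takes to consume N elements shows the O(sqrt(N)) behaviour.

```java
public class BatchCount {
    public static void main(String[] args) {
        // With batch sizes growing arithmetically by 1024, count how many
        // splits are needed to consume n elements. After k splits the total
        // consumed is 1024 * k * (k + 1) / 2, so k grows like sqrt(n).
        long n = 1_000_000;
        int batch = 1024;
        int splits = 0;
        long consumed = 0;
        while (consumed < n) {
            consumed += batch;
            batch += 1024;
            splits++;
        }
        System.out.println(splits); // prints 44, roughly sqrt(2 * n / 1024)
    }
}
```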
So if you have <= 1024 elements you are unfortunately out of luck and will not observe any speed-up. We could improve matters here if we knew the size of the input (which for processing lines is unlikely to be the case, but will be for LinkedList), or in the future by exposing some control over the arithmetic progression. The smaller the input, the higher the per-element processing cost needs to be for parallelism to pay off. In general it is a very tricky problem to automate.
Paul.
More information about the lambda-dev mailing list