Regex Point Lambdafication Patch

Sun Mar 10 12:45:05 PDT 2013

That might work in some cases, but it puts you on the wrong side of Amdahl's Law.

If you can't start any downstream work until all the matches are done,
you dramatically increase the serial fraction, limiting your potential
speedup.  In the example quoted below, we can start keeping cores busy
immediately, since we'll fork a task to do the downstream map/forEach as
soon as we find the first match, while we're still looking for the second
match.  The sooner we start using all the cores, the sooner we finish.
Whatever we gain from more efficient splitting (because the matches were
first collected into an array) we can easily lose, and more, by starting
work on the first element much later.
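
To put rough numbers on this: Amdahl's Law bounds the speedup on n cores
at 1 / (s + (1 - s) / n), where s is the serial fraction.  The sketch
below just evaluates that formula.  The two serial fractions are made-up
values for illustration, not measurements of any regex workload, and the
class and method names are hypothetical.

```java
public class AmdahlSketch {
    /** Upper bound on speedup for a given serial fraction and core count (Amdahl's Law). */
    public static double speedup(double serialFraction, int cores) {
        return 1.0 / (serialFraction + (1.0 - serialFraction) / cores);
    }

    public static void main(String[] args) {
        int cores = 8;
        // Hypothetical fractions, chosen only to show how the bound moves.
        double streaming = 0.10;   // downstream work overlaps with matching
        double splitFirst = 0.40;  // all matching finishes before downstream work starts
        System.out.printf("serial fraction %.2f -> at most %.2fx%n",
                streaming, speedup(streaming, cores));
        System.out.printf("serial fraction %.2f -> at most %.2fx%n",
                splitFirst, speedup(splitFirst, cores));
    }
}
```

With those assumed fractions the bound on 8 cores drops from roughly 4.7x
to roughly 2.1x, which is the kind of loss that can swamp a cheaper split.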

On 3/10/2013 3:26 PM, Remi Forax wrote:
> On 03/10/2013 07:26 PM, Brian Goetz wrote:
>> Because you're not overriding trySplit, this spliterator will not allow
>> streams to be parallelized at all, even if the downstream operations
>> could benefit from such, such as in:
>>
>>      pattern.splitAsStream(bigString)
>>             .parallel()
>>             .map(expensiveTransform)
>>             .forEach(...);
>>
>> Even though the above pipeline will be limited by the sequential regex
>> splitting at its source, if the downstream operations are expensive
>> enough, they could still benefit from parallelization.  But the
>> spliterator, as written, won't permit that -- it is strictly sequential.
>
> I wonder if in that case, when the transformation is more expensive than
> the pattern matching, it is not better to wrap the result of
> pattern.split() in a stream with Arrays.stream().
>
> Rémi
>
>
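
For reference, the two alternatives discussed in this exchange can be
written side by side.  This is a minimal sketch, not code from the patch:
the class name, the method names, and the String::toUpperCase stand-in for
expensiveTransform are assumptions for illustration.  Both pipelines
produce the same result; they differ only in when downstream work can
begin, and how much the streaming form actually parallelizes depends on
the spliterator behind splitAsStream.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class SplitPipelines {
    // Streaming source: elements can flow downstream as matches are found.
    public static List<String> streaming(Pattern p, CharSequence input,
                                         Function<String, String> transform) {
        return p.splitAsStream(input)
                .parallel()
                .map(transform)
                .collect(Collectors.toList());
    }

    // Split-first source: the whole array is built before any downstream work starts.
    public static List<String> splitFirst(Pattern p, CharSequence input,
                                          Function<String, String> transform) {
        return Arrays.stream(p.split(input))
                .parallel()
                .map(transform)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Pattern comma = Pattern.compile(",");
        // Cheap stand-in for an expensive transform (assumption for the sketch).
        Function<String, String> expensiveTransform = String::toUpperCase;
        System.out.println(streaming(comma, "a,b,c,d", expensiveTransform));
        System.out.println(splitFirst(comma, "a,b,c,d", expensiveTransform));
    }
}
```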