RFR 8075307 Pipeline calculating inconsistent flag state for parallel stateful ops

Paul Sandoz paul.sandoz at oracle.com
Tue Mar 24 07:57:30 UTC 2015


Hi Stuart,

Many thanks for looking at this gnarly code.

On Mar 24, 2015, at 2:41 AM, Stuart Marks <stuart.marks at oracle.com> wrote:

> Hi Paul,
> 
> After reading your notes here, and in the bug reports, and the comments in the code, and banging my head against the code (before and after) for a while, I'm starting to see that this is on the right track. Sorry to hedge a bit but I have to admit that I don't fully understand the code.
> 
> However, I do see the comment you referred to (old version, line 460) and that the new code is essentially a merge of the old parallelPrepare() code into sourceSpliterator(). And I do see the crucial addition of the flag modification based on the spliterator's SIZED characteristic. So that seems right, and if the tests pass, so much the better.
> 
> A few comments for future maintenance/cleanups in this area.
> 

All good points i will add some of my own below.


> * This package seems like a curious mixture of abstraction of bit-twiddling into small methods (e.g., combineOpFlags), and bit-twiddling in open code (evalation of Spliterator.SIZED into thisOpFlags, new version lines 459ff). While not incorrect, this is jarring.
> 

Yeah, in more common cases i optimized but it is probably not necessary.


> * The loop variables over the pipeline stages are hard to follow. The variable 'p' is used in the loop at line 413 a the "current" pipeline stage, whereas 'p' is used for the "next" stage in the loop at 434, and 'u' is the "current" stage. This is confusing, and I don't know what 'u' means.
> 

"p" is consistent between both loops for initialization and the next stage, it's the termination that is different (the first loop never needs to process the source stage and keep track of the previous stage). There is a reason for this due to the optimisation in toArray.

"u" = I forget :-)


> * Also, in the same loop, 'e' is initialized to 'this' and is checked as the loop exit condition (I guess that's what 'e' stands for) but it's not used elsewhere, so it's not clear to me how much value this adds.

e == "end of pipeline".

Additionally we don't need the perform the first loop if there are no terminal flags.


> 
> Stepping back from this a bit, this is clearly an area of some complexity, and it might benefit from some additional testing. The streams library is overall well-tested, but mostly from a functional level, i.e. running a bunch of streams in various combinations to ensure the right output is produced. For this case it might be helpful to assemble (but not execute) a bunch of combinations of pipelines and then make sure that each stage ends up with the right flags. (I didn't find such a test when I went looking, but I might have missed it.)
> 

There are some. Follow the trail of StatelessTestOp. What we lack is a systematic way of performing combinatorial testing derived from a more formal description of the operations and their flag manipulation.

In this case, as per the comment in the old code, there was a deliberate limitation so testing in this case would not have helped.


> In any case I think it's reasonble to proceed with this patch as it stands instead of tinkering with it. Some additional cleanups are warranted at some point but we should keep an eye on these for the future.
> 

Thanks,
Paul.



More information about the core-libs-dev mailing list