Code review request: JDK-8008670 (partial java.util.stream implementation)

Tue Mar 12 08:24:00 PDT 2013

Hi Mike,

On Mar 12, 2013, at 2:29 AM, Mike Duigou <mike.duigou at oracle.com> wrote:
> AbstractTask:: 
> 
> - "that describes a portion of the input" -> describes the he portion of the input associated with the subtree rooted at this task.
> 
> - setRawResult() : needs better docs. Why does getRawResult return local (which may be null)?
> 

I need to look into this more closely. I recall some subtle things related to CountedCompleter.complete/onCompletion methods and short-circuiting operations in terms of the order of invocation that caused us to side step get/setRawResult with get/setLocalResult.

> - Are trees of abstract task always homogeneous?

Yes.

> Perhaps an assert parent.getClass == class ? since the T can't be verified to be the same as the class.
> 
> - onCompletion does not clear numChildren
> 

Currently it is designed to clear stuff to help GC, once the task is completed it should not be reused. So we should state something about reuse.

> 
> 
> ForEachOps::
> 
> - BooleanProvider -> BooleanSupplier
> 
> - in whatever Thread -> on whatever thread
> 
> - "<p/>There is no guarantee that additional elements.." should be before @apiNote.
> 
> - Wouldn't it be nice if there as a way to template the primitive impls. :-)
> 

I am not sure it is worth it for 3 primitive types. There is also a cost of developing with templates, including that of the template system itself. Ideally i want a template system whereby one can implement the int version and annotate with descriptions that say how code can be substituted to produce long/double versions and we use javac with an annotation processor to generate the source as part of the compilation process.

> 
> IntermediateOp::
> 
> - "An intermediate operation has an input type and an output type". These aren't reified or introspectable in contrast to the shape.
> 

Tricky. How would we do that for say:

  Stream<List<String>> s = ...
  s.flatMap(s::stream)..   // List<String> -> String

?

> - Perhaps defer more about statefulness to StatefulOp
> 

StatefulOp is a helper interface. It's not required to to implement a stateful operation, so perhaps we should clarify that point.

> 
> StatefulOp::
> 
> - Should include a sentence about how a stateful op knows when it has received it's last element.
> 
> - Perhaps re-iterate wrapSink() to provide documentation about completion and when elements are written to output. ie. Sink.end()
> 

Not all stateful ops accumulate state and wait until sink.end() to push that accumulated state. SortedOp does that but DistinctOp and SliceOp do not, as they accumulate "side-state" to determine when elements are pushed downstream or not.

> 
> Sink::
> 
> - ::end() "If the {@code Sink} buffers any results from previous values, they should dump their contents downstream and clear any stored state." -> If the {@code Sink} is stateful it should send all of it's stored state downstream.
> 
> - :: I wonder about the suggestion to clear stored state. Why is this better than just letting it be GCed? If there is value to explicit clearing we should explain why (at least partially).
> 

The idea being is Sink's can be resued. We don't currently do so, but we can consider an optimization later on whereby we cache a sink per fork/join thread.

> - ::cancellationRequested() -> :: accepting()? Cancellation sounds like an exceptional state. 
> 
> - ::cancellationRequested() - document default. @implNote
> 
> - why does accept(T) not merit an ISE default?
> 

Notice that for the primitive values the corresponding accept is re-abstracted. So Sink accepts references, Sink.OfInt accepts ints etc. so the important method to implement remains abstract.

> - ChainedReference/ChainedInt/etc : Should mention in the first sentence that they are abstract. "Abstract skeleton {@code Sink} implementation for creating chains of sinks."
> 
> 
> Tripwire::
> 
> - "Turned off for production." -> "Normally this should be turned off for production systems."
> 
> 
> Node::
> 
> - I don't find the defaults particularly helpful. I would probably prefer them to be fully abstract so that I remember to implement them.

I have found them helpful since the majority of Node implementations are leaf nodes.

> 
> - ::forEach() - worth mentioning that the traversal order is possibly unspecified?
> 

Its a bit like that for Collection. We could say something like "in the order elements are returned by a spliterator".

> - ::toArray() warning regarding shared array and mutation is not sufficiently dire.
> 

Suggestions?

> - ::toArray() generator is unexplained. relation between generator needs to be clear. ie. generator might not be called even for unshared.
> 

OK. A copy should always be made, using and returning the generated array, if a reference cannot be returned.

> - ::copyInto() doesn't offer the ability to get some subset of node elements. count() returns long but arrays can only hold INTEGER.MAX_VALUE  (less actually) elements. Either there has to be a way to get partial results or we might as well make count an int.
> 

The idea here is that we have a Node tree that reflects some parallel computation and we want to produce an array from that tree, in parallel, using F/J tasks. First we create an array from the size at the root, so it is gonna fail fast. We keep track of the offset into the array as nodes are processed down the tree. When we hit a leaf node we copy the contents of the node into that array at the correct offset.

So there is no general requirement to copy a subset.

I agree there is some odd inconsistency though. The idea about using a long for count() (and the same for Spliterator.estimateSize) is that at some point the JVM may support larger array sizes.

> - ::Builder - is it necessary to say that it is a flat node?
> 

I think so, a builder should not be building a Node that has children.

> - That's a lotta template code!
> 

Yes.

> 
> 
> StreamOpFlag::
> 
> - tables could be a proper HTML table. If you want me to rework it as a proper 508 compliant table, just ask. :-)
> 

I am asking :-)

Thanks,
Paul.