Code review request: JDK-8008670 (partial java.util.stream implementation)
At: http://cr.openjdk.java.net/~briangoetz/jdk-8008670/webrev/ there is a webrev of a subset of the implementation of java.util.stream, for review. These are all new files. None of these are public classes; they are internal interfaces for the Stream library, as well as some implementation classes. This is about half the classes in the Streams package. (The webrev includes changes to Map, but those are being reviewed separately, so just ignore those.) We're asking people to focus their review on both the quality of API specification for these internal APIs, and the quality of implementation of the specified classes. It may not be obvious how some of these work until you've seen the rest of it, but do what you can.
I've posted an updated webrev incorporating comments received to date: http://cr.openjdk.java.net/~briangoetz/jdk-8008670.2/webrev/ On 2/21/2013 2:16 PM, Brian Goetz wrote:
At: http://cr.openjdk.java.net/~briangoetz/jdk-8008670/webrev/
there is a webrev of a subset of the implementation of java.util.stream, for review. These are all new files. None of these are public classes; they are internal interfaces for the Stream library, as well as some implementation classes. This is about half the classes in the Streams package.
(The webrev includes changes to Map, but those are being reviewed separately, so just ignore those.)
We're asking people to focus their review on both the quality of API specification for these internal APIs, and the quality of implementation of the specified classes. It may not be obvious how some of these work until you've seen the rest of it, but do what you can.
My comments... 1) TerminalOp: When default methods are very simple (i.e., a one-liner), you are sometimes (not always) putting the body on a new line. I think you should use a consistent coding style and always start the body on the next line. 2) I find X_IN and X_OUT type parameters to be disruptive to reading. Sticking to a simple "I" and "O" would be much easier to read. 3) StreamShape: Javadoc should be formatted to be within 80 characters. 4) There are plentiful of if() statements that are not using braces. An if() should always use braces per the published Java coding conventions. Paul On Mon, Mar 11, 2013 at 4:55 PM, Brian Goetz <brian.goetz@oracle.com> wrote:
I've posted an updated webrev incorporating comments received to date:
http://cr.openjdk.java.net/~briangoetz/jdk-8008670.2/webrev/
On 2/21/2013 2:16 PM, Brian Goetz wrote:
At: http://cr.openjdk.java.net/~briangoetz/jdk-8008670/webrev/
there is a webrev of a subset of the implementation of java.util.stream, for review. These are all new files. None of these are public classes; they are internal interfaces for the Stream library, as well as some implementation classes. This is about half the classes in the Streams package.
(The webrev includes changes to Map, but those are being reviewed separately, so just ignore those.)
We're asking people to focus their review on both the quality of API specification for these internal APIs, and the quality of implementation of the specified classes. It may not be obvious how some of these work until you've seen the rest of it, but do what you can.
2) I find X_IN and X_OUT type parameters to be disruptive to reading. Sticking to a simple "I" and "O" would be much easier to read.
I'll answer this one because we've gotten it from a few places. This is a nice idea, and it works fine up to a certain level of complexity. But when you have a generic method of a generic class that between them have five type variables, the single-letter scheme hits the wall, and it crashes and burns. (This isn't supposition; the earlier versions of this code base stuck to the single letter scheme and people told us they couldn't keep track of which type variables are which. For example, in the interaction between AbstractPipeline and PipelineHelper, both have in and out parameters, and they're not the same, so I and O are completely confusing in that context.) But one also wants to be consistent. So when you cross the multi-letter boundary, you likely have to stay there even in many cases where one letter would do.
On Mar 12, 2013, at 12:13 AM, Paul Benedict <pbenedict@apache.org> wrote:
3) StreamShape: Javadoc should be formatted to be within 80 characters.
Hmm... that is linked to an old version. It should be the same as: http://hg.openjdk.java.net/lambda/lambda/jdk/raw-file/9ccdccc3c754/src/share... Paul.
Some notes: - Copyrights update. - The order of notes on files is the order that I read the files. It seems like not a bad order to review them. - package docs? Coming next? java.util.Map:: - @since 1.8 missing for defaults - <tt></tt> should be replace with {@code} - default methods need the notice that the default implementations do not provide atomicity. (copy from putIfAbsent) - ::forEach() - Rename the parameter from block? - ::computeIfAbsent() - "and enters it into this map" -> inserts? - I wonder if compute if absent should replace the original get(key) with containsKey(key) and replace get() with containsKey() in the pseudo-code documentation. If implementations know that the map can't contain null then they can optimize to get(). Demonstrated with get() would seem to produce incorrect results with a present null value. - ::computeIfAbsent()/computeIfPresent()/compute() - "key with which the specified value is to be associated" -> derived value. - ::computeIfAbsent()/computeIfPresent()/compute() - @throws RuntimeException or Error These should be separate. java.util.Collections:: - Map.synchronizedMap needs extension to provide synchronized implementations of Map default methods. java.util.Hashtable:: - synchronized implementations of Map default methods are needed. AbstractTask:: - "that describes a portion of the input" -> describes the he portion of the input associated with the subtree rooted at this task. - setRawResult() : needs better docs. Why does getRawResult return local (which may be null)? - Are trees of abstract task always homogeneous? Perhaps an assert parent.getClass == class ? since the T can't be verified to be the same as the class. - onCompletion does not clear numChildren ForEachOps:: - BooleanProvider -> BooleanSupplier - in whatever Thread -> on whatever thread - "<p/>There is no guarantee that additional elements.." should be before @apiNote. - Wouldn't it be nice if there as a way to template the primitive impls. :-) IntermediateOp:: - "An intermediate operation has an input type and an output type". These aren't reified or introspectable in contrast to the shape. - Perhaps defer more about statefulness to StatefulOp StatefulOp:: - Should include a sentence about how a stateful op knows when it has received it's last element. - Perhaps re-iterate wrapSink() to provide documentation about completion and when elements are written to output. ie. Sink.end() Sink:: - ::end() "If the {@code Sink} buffers any results from previous values, they should dump their contents downstream and clear any stored state." -> If the {@code Sink} is stateful it should send all of it's stored state downstream. - :: I wonder about the suggestion to clear stored state. Why is this better than just letting it be GCed? If there is value to explicit clearing we should explain why (at least partially). - ::cancellationRequested() -> :: accepting()? Cancellation sounds like an exceptional state. - ::cancellationRequested() - document default. @implNote - why does accept(T) not merit an ISE default? - ChainedReference/ChainedInt/etc : Should mention in the first sentence that they are abstract. "Abstract skeleton {@code Sink} implementation for creating chains of sinks." Tripwire:: - "Turned off for production." -> "Normally this should be turned off for production systems." Node:: - I don't find the defaults particularly helpful. I would probably prefer them to be fully abstract so that I remember to implement them. - ::forEach() - worth mentioning that the traversal order is possibly unspecified? - ::toArray() warning regarding shared array and mutation is not sufficiently dire. - ::toArray() generator is unexplained. relation between generator needs to be clear. ie. generator might not be called even for unshared. - ::copyInto() doesn't offer the ability to get some subset of node elements. count() returns long but arrays can only hold INTEGER.MAX_VALUE (less actually) elements. Either there has to be a way to get partial results or we might as well make count an int. - ::Builder - is it necessary to say that it is a flat node? - That's a lotta template code! StreamOpFlag:: - tables could be a proper HTML table. If you want me to rework it as a proper 508 compliant table, just ask. :-) On Mar 11 2013, at 14:55 , Brian Goetz wrote:
I've posted an updated webrev incorporating comments received to date:
http://cr.openjdk.java.net/~briangoetz/jdk-8008670.2/webrev/
On 2/21/2013 2:16 PM, Brian Goetz wrote:
At: http://cr.openjdk.java.net/~briangoetz/jdk-8008670/webrev/
there is a webrev of a subset of the implementation of java.util.stream, for review. These are all new files. None of these are public classes; they are internal interfaces for the Stream library, as well as some implementation classes. This is about half the classes in the Streams package.
(The webrev includes changes to Map, but those are being reviewed separately, so just ignore those.)
We're asking people to focus their review on both the quality of API specification for these internal APIs, and the quality of implementation of the specified classes. It may not be obvious how some of these work until you've seen the rest of it, but do what you can.
On 03/12/2013 02:29 AM, Mike Duigou wrote:
Some notes:
- Copyrights update.
- The order of notes on files is the order that I read the files. It seems like not a bad order to review them.
- package docs? Coming next?
java.util.Map::
- @since 1.8 missing for defaults
- <tt></tt> should be replace with {@code}
- default methods need the notice that the default implementations do not provide atomicity. (copy from putIfAbsent)
- ::forEach() - Rename the parameter from block?
- ::computeIfAbsent() - "and enters it into this map" -> inserts?
- I wonder if compute if absent should replace the original get(key) with containsKey(key) and replace get() with containsKey() in the pseudo-code documentation. If implementations know that the map can't contain null then they can optimize to get(). Demonstrated with get() would seem to produce incorrect results with a present null value.
- ::computeIfAbsent()/computeIfPresent()/compute() - "key with which the specified value is to be associated" -> derived value.
- ::computeIfAbsent()/computeIfPresent()/compute() - @throws RuntimeException or Error These should be separate.
Hi, I would just like to point out to a discussion a couple of months ago: http://mail.openjdk.java.net/pipermail/lambda-dev/2013-January/007509.html ... at least Map.compute() still has this anomaly when executed against HashMap. Regards, Peter
java.util.Collections::
- Map.synchronizedMap needs extension to provide synchronized implementations of Map default methods.
java.util.Hashtable::
- synchronized implementations of Map default methods are needed.
AbstractTask::
- "that describes a portion of the input" -> describes the he portion of the input associated with the subtree rooted at this task.
- setRawResult() : needs better docs. Why does getRawResult return local (which may be null)?
- Are trees of abstract task always homogeneous? Perhaps an assert parent.getClass == class ? since the T can't be verified to be the same as the class.
- onCompletion does not clear numChildren
ForEachOps::
- BooleanProvider -> BooleanSupplier
- in whatever Thread -> on whatever thread
- "<p/>There is no guarantee that additional elements.." should be before @apiNote.
- Wouldn't it be nice if there as a way to template the primitive impls. :-)
IntermediateOp::
- "An intermediate operation has an input type and an output type". These aren't reified or introspectable in contrast to the shape.
- Perhaps defer more about statefulness to StatefulOp
StatefulOp::
- Should include a sentence about how a stateful op knows when it has received it's last element.
- Perhaps re-iterate wrapSink() to provide documentation about completion and when elements are written to output. ie. Sink.end()
Sink::
- ::end() "If the {@code Sink} buffers any results from previous values, they should dump their contents downstream and clear any stored state." -> If the {@code Sink} is stateful it should send all of it's stored state downstream.
- :: I wonder about the suggestion to clear stored state. Why is this better than just letting it be GCed? If there is value to explicit clearing we should explain why (at least partially).
- ::cancellationRequested() -> :: accepting()? Cancellation sounds like an exceptional state.
- ::cancellationRequested() - document default. @implNote
- why does accept(T) not merit an ISE default?
- ChainedReference/ChainedInt/etc : Should mention in the first sentence that they are abstract. "Abstract skeleton {@code Sink} implementation for creating chains of sinks."
Tripwire::
- "Turned off for production." -> "Normally this should be turned off for production systems."
Node::
- I don't find the defaults particularly helpful. I would probably prefer them to be fully abstract so that I remember to implement them.
- ::forEach() - worth mentioning that the traversal order is possibly unspecified?
- ::toArray() warning regarding shared array and mutation is not sufficiently dire.
- ::toArray() generator is unexplained. relation between generator needs to be clear. ie. generator might not be called even for unshared.
- ::copyInto() doesn't offer the ability to get some subset of node elements. count() returns long but arrays can only hold INTEGER.MAX_VALUE (less actually) elements. Either there has to be a way to get partial results or we might as well make count an int.
- ::Builder - is it necessary to say that it is a flat node?
- That's a lotta template code!
StreamOpFlag::
- tables could be a proper HTML table. If you want me to rework it as a proper 508 compliant table, just ask. :-)
On Mar 11 2013, at 14:55 , Brian Goetz wrote:
I've posted an updated webrev incorporating comments received to date:
http://cr.openjdk.java.net/~briangoetz/jdk-8008670.2/webrev/
On 2/21/2013 2:16 PM, Brian Goetz wrote:
At: http://cr.openjdk.java.net/~briangoetz/jdk-8008670/webrev/
there is a webrev of a subset of the implementation of java.util.stream, for review. These are all new files. None of these are public classes; they are internal interfaces for the Stream library, as well as some implementation classes. This is about half the classes in the Streams package.
(The webrev includes changes to Map, but those are being reviewed separately, so just ignore those.)
We're asking people to focus their review on both the quality of API specification for these internal APIs, and the quality of implementation of the specified classes. It may not be obvious how some of these work until you've seen the rest of it, but do what you can.
On Mar 12, 2013, at 2:29 AM, Mike Duigou <mike.duigou@oracle.com> wrote:
Some notes:
- Copyrights update.
Pushed to lambda repo: http://hg.openjdk.java.net/lambda/lambda/jdk/rev/40c0a71455d7
- The order of notes on files is the order that I read the files. It seems like not a bad order to review them.
- package docs? Coming next?
Soonish :-) there is much left to review/spec.
java.util.Map::
- @since 1.8 missing for defaults
- <tt></tt> should be replace with {@code}
It is mostly consistent with the rest of the Map documentation. We should do a global replace in that case?
- default methods need the notice that the default implementations do not provide atomicity. (copy from putIfAbsent)
I have shuffled stuff around and used @implSpec.
- ::forEach() - Rename the parameter from block?
I updated that to use similar language as Iterator/Iterable.forEach.
- ::computeIfAbsent() - "and enters it into this map" -> inserts?
- I wonder if compute if absent should replace the original get(key) with containsKey(key) and replace get() with containsKey() in the pseudo-code documentation. If implementations know that the map can't contain null then they can optimize to get(). Demonstrated with get() would seem to produce incorrect results with a present null value.
- ::computeIfAbsent()/computeIfPresent()/compute() - "key with which the specified value is to be associated" -> derived value.
Perhaps the above three merit a separate discussion. Raise in a separate email thread?
- ::computeIfAbsent()/computeIfPresent()/compute() - @throws RuntimeException or Error These should be separate.
Do we really need to declare that? It would apply for every use of lambdas and it is previously stated: "If the function itself throws an (unchecked) exception, the exception is rethrown, and the current mapping is left unchanged." I have removed them. http://hg.openjdk.java.net/lambda/lambda/jdk/rev/36664688aa52
java.util.Collections::
- Map.synchronizedMap needs extension to provide synchronized implementations of Map default methods.
http://hg.openjdk.java.net/lambda/lambda/jdk/rev/077efaf92c8c You know off any tests that can be reused?
java.util.Hashtable::
- synchronized implementations of Map default methods are needed.
Good point. I think it may help to address that and some other things together. Checked and unmodifiable lists should also be updated to be efficient for appropriate default methods. There is also the synchronized collections/lists. We need to consider the spliterator/stream/parallelStream methods, especially how the spliterator should behave. Paul.
On Mar 12 2013, at 06:51 , Paul Sandoz wrote:
- <tt></tt> should be replace with {@code}
It is mostly consistent with the rest of the Map documentation. We should do a global replace in that case?
We are incrementally updating the source with this change. Nobody is going out of their way to fix it but "while you're in the neighborhood.
- ::computeIfAbsent()/computeIfPresent()/compute() - "key with which the specified value is to be associated" -> derived value.
Perhaps the above three merit a separate discussion. Raise in a separate email thread?
I had misunderstood that the Map changes were part of this review (no harm done). We can address this as part of the review of JDK-8010122
http://hg.openjdk.java.net/lambda/lambda/jdk/rev/077efaf92c8c
You know off any tests that can be reused?
I started to look for ConcurrentMap tests but found not enough. We have a gap here. Mike
Hi Mike, On Mar 12, 2013, at 2:29 AM, Mike Duigou <mike.duigou@oracle.com> wrote:
AbstractTask::
- "that describes a portion of the input" -> describes the he portion of the input associated with the subtree rooted at this task.
- setRawResult() : needs better docs. Why does getRawResult return local (which may be null)?
I need to look into this more closely. I recall some subtle things related to CountedCompleter.complete/onCompletion methods and short-circuiting operations in terms of the order of invocation that caused us to side step get/setRawResult with get/setLocalResult.
- Are trees of abstract task always homogeneous?
Yes.
Perhaps an assert parent.getClass == class ? since the T can't be verified to be the same as the class.
- onCompletion does not clear numChildren
Currently it is designed to clear stuff to help GC, once the task is completed it should not be reused. So we should state something about reuse.
ForEachOps::
- BooleanProvider -> BooleanSupplier
- in whatever Thread -> on whatever thread
- "<p/>There is no guarantee that additional elements.." should be before @apiNote.
- Wouldn't it be nice if there as a way to template the primitive impls. :-)
I am not sure it is worth it for 3 primitive types. There is also a cost of developing with templates, including that of the template system itself. Ideally i want a template system whereby one can implement the int version and annotate with descriptions that say how code can be substituted to produce long/double versions and we use javac with an annotation processor to generate the source as part of the compilation process.
IntermediateOp::
- "An intermediate operation has an input type and an output type". These aren't reified or introspectable in contrast to the shape.
Tricky. How would we do that for say: Stream<List<String>> s = ... s.flatMap(s::stream).. // List<String> -> String ?
- Perhaps defer more about statefulness to StatefulOp
StatefulOp is a helper interface. It's not required to to implement a stateful operation, so perhaps we should clarify that point.
StatefulOp::
- Should include a sentence about how a stateful op knows when it has received it's last element.
- Perhaps re-iterate wrapSink() to provide documentation about completion and when elements are written to output. ie. Sink.end()
Not all stateful ops accumulate state and wait until sink.end() to push that accumulated state. SortedOp does that but DistinctOp and SliceOp do not, as they accumulate "side-state" to determine when elements are pushed downstream or not.
Sink::
- ::end() "If the {@code Sink} buffers any results from previous values, they should dump their contents downstream and clear any stored state." -> If the {@code Sink} is stateful it should send all of it's stored state downstream.
- :: I wonder about the suggestion to clear stored state. Why is this better than just letting it be GCed? If there is value to explicit clearing we should explain why (at least partially).
The idea being is Sink's can be resued. We don't currently do so, but we can consider an optimization later on whereby we cache a sink per fork/join thread.
- ::cancellationRequested() -> :: accepting()? Cancellation sounds like an exceptional state.
- ::cancellationRequested() - document default. @implNote
- why does accept(T) not merit an ISE default?
Notice that for the primitive values the corresponding accept is re-abstracted. So Sink accepts references, Sink.OfInt accepts ints etc. so the important method to implement remains abstract.
- ChainedReference/ChainedInt/etc : Should mention in the first sentence that they are abstract. "Abstract skeleton {@code Sink} implementation for creating chains of sinks."
Tripwire::
- "Turned off for production." -> "Normally this should be turned off for production systems."
Node::
- I don't find the defaults particularly helpful. I would probably prefer them to be fully abstract so that I remember to implement them.
I have found them helpful since the majority of Node implementations are leaf nodes.
- ::forEach() - worth mentioning that the traversal order is possibly unspecified?
Its a bit like that for Collection. We could say something like "in the order elements are returned by a spliterator".
- ::toArray() warning regarding shared array and mutation is not sufficiently dire.
Suggestions?
- ::toArray() generator is unexplained. relation between generator needs to be clear. ie. generator might not be called even for unshared.
OK. A copy should always be made, using and returning the generated array, if a reference cannot be returned.
- ::copyInto() doesn't offer the ability to get some subset of node elements. count() returns long but arrays can only hold INTEGER.MAX_VALUE (less actually) elements. Either there has to be a way to get partial results or we might as well make count an int.
The idea here is that we have a Node tree that reflects some parallel computation and we want to produce an array from that tree, in parallel, using F/J tasks. First we create an array from the size at the root, so it is gonna fail fast. We keep track of the offset into the array as nodes are processed down the tree. When we hit a leaf node we copy the contents of the node into that array at the correct offset. So there is no general requirement to copy a subset. I agree there is some odd inconsistency though. The idea about using a long for count() (and the same for Spliterator.estimateSize) is that at some point the JVM may support larger array sizes.
- ::Builder - is it necessary to say that it is a flat node?
I think so, a builder should not be building a Node that has children.
- That's a lotta template code!
Yes.
StreamOpFlag::
- tables could be a proper HTML table. If you want me to rework it as a proper 508 compliant table, just ask. :-)
I am asking :-) Thanks, Paul.
I have updated this webrev, now at: http://cr.openjdk.java.net/~briangoetz/JDK-8008670.3/webrev/ These are all nonpublic implementation-only files (no public APIs) that do not depend on the public Stream APIs. These can go back as soon as the changes to Map and Spliterator are put back. On 2/21/2013 2:16 PM, Brian Goetz wrote:
At: http://cr.openjdk.java.net/~briangoetz/jdk-8008670/webrev/
there is a webrev of a subset of the implementation of java.util.stream, for review. These are all new files. None of these are public classes; they are internal interfaces for the Stream library, as well as some implementation classes. This is about half the classes in the Streams package.
(The webrev includes changes to Map, but those are being reviewed separately, so just ignore those.)
We're asking people to focus their review on both the quality of API specification for these internal APIs, and the quality of implementation of the specified classes. It may not be obvious how some of these work until you've seen the rest of it, but do what you can.
Single new source file at: http://hg.openjdk.java.net/lambda/lambda/jdk/file/76ac19e61df1/src/share/cla... Doc at: http://cr.openjdk.java.net/~briangoetz/JDK-8008682/api/java/util/stream/Coll... for JDK-8011917 (Collectors class in java.util.stream). (No webrev as there's just one new file.)
Brian, this is low-hanging fruit, but all descriptions after @param and @return should start with lowercase per published JavaDoc conventions. On Wed, Apr 17, 2013 at 2:26 PM, Brian Goetz <brian.goetz@oracle.com> wrote:
Single new source file at:
http://hg.openjdk.java.net/lambda/lambda/jdk/file/76ac19e61df1/src/share/cla...
Doc at:
http://cr.openjdk.java.net/~briangoetz/JDK-8008682/api/java/util/stream/Coll...
for JDK-8011917 (Collectors class in java.util.stream).
(No webrev as there's just one new file.)
Updated spec for Collection.spliterator; default methods for Collection.stream() and parallelStream(). Specialized implementations in subtypes will come in a separate putback. Webrev at: http://cr.openjdk.java.net/~briangoetz/JDK-8012542/webrev/
participants (5)
-
Brian Goetz
-
Mike Duigou
-
Paul Benedict
-
Paul Sandoz
-
Peter Levart