RFR: JDK-8205461 Create Collector which merges results of two other collectors
Please review and sponsor: https://bugs.openjdk.java.net/browse/JDK-8205461 http://cr.openjdk.java.net/~tvaleev/webrev/8205461/r1/ See also previous discussion thread at [1]. It seems that we did not reach the final conclusion about the collector name, so in this review it's still named as "pairing" (proposed by me). Other name proposals: bisecting - by Paul Sandoz (sounds like input is split by two parts like in partitioningBy, which is not the case) tee or teeing - by Brian Goetz (IIUC from the T shape, it's assymmetrical operation, while our collector is symmetrical) duplex(ing) - by Jacob Glickman (well, probably ok?) bifurcate - by James Laskey (or bifurcating?) replicator - by James Laskey (as in DNA) replicating - by Peter Levart fanout or fanningOut - Chris Hegarty (sounds cryptic to me, checked Wikipedia [2], does not look like suitable) distributing - by Brian Goetz (like in distributive law a(b+c) = ab+ac; but common meaning of "distributing" is different) tapping - by Kirk Pepperdine (I have no associations; Google images shows some acupuncture procedures) split - by Kirk Pepperdine (may be confused with Spliterator) unzipping - by John Rose biMapping - by Zheka Kozlov (but he also suggests to change signature which is unnecessary) toBoth - by Peter Levart (but usually toXyz shows target container like toArray/toList/toSet, but there's not "Both" container) collectionToBothAndThen - by Zheka Kozlov (but too long) collectingToBoth - by Zheka Kozlov collectingTo - by Brian Goetz (isn't collect(collectingTo(...)) a little bit repititive?) biCollecting - by Zheka Kozlov expanding - by Paul Sandoz (doesn't sound very descriptive to me) forking - by Victor Williams Stafusa da Silva (could be thought that something is parallelized as forking is usually something connected to parallel computations) I think that it's better to concentrate not on the "splitting" part (each element is passed to two collectors), but on the "joining" part (results of two collectors are combined together). So my preference now is "merging" or "combining". If limiting the selection to the suggested options I'd prefer "duplexing" or "bifurcating" or to stay with my original version "pairing". Of course original Stream API author voices have more weight here, so if you insist on particular version, I will follow. By the way looking into CollectorsTest.java I found some minor things to cleanup: 1. `.map(mapper::apply)` and `.flatMap(mapper::apply)` can be replaced with simple `.map(mapper)` and `.flatMap(mapper)` respectively 2. In many methods redundant `throws ReflectiveOperationException` is declared while exception is never thrown 3. The `if (!Map.class.isAssignableFrom(map.getClass()))` looks useless as `map` is already of `Map<Boolean, D>` type, so this check is always false (we would never enter this method if it's true). Similarly `if (!List.class.isAssignableFrom(value.getClass()))` 4. Deprecated `clazz.newInstance()` is used 5. Test named `testGroupubgByWithReducing` should be renamed to `testGroupingByWithReducing` Should I fix some or all of these issues? As separate changeset or within this one? With best regards, Tagir Valeev. [1] http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-June/053718.html [2] https://en.wikipedia.org/wiki/Fan-out
Ping! Could you please review and sponsor this changeset? I updated version tag from since 11 to since 12: http://cr.openjdk.java.net/~tvaleev/webrev/8205461/r2/ Thanks in advance! With best regards, Tagir Valeev. On Thu, Jun 21, 2018 at 11:33 AM Tagir Valeev <amaembo@gmail.com> wrote:
Please review and sponsor:
https://bugs.openjdk.java.net/browse/JDK-8205461 http://cr.openjdk.java.net/~tvaleev/webrev/8205461/r1/
See also previous discussion thread at [1]. It seems that we did not reach the final conclusion about the collector name, so in this review it's still named as "pairing" (proposed by me). Other name proposals:
bisecting - by Paul Sandoz (sounds like input is split by two parts like in partitioningBy, which is not the case) tee or teeing - by Brian Goetz (IIUC from the T shape, it's assymmetrical operation, while our collector is symmetrical) duplex(ing) - by Jacob Glickman (well, probably ok?) bifurcate - by James Laskey (or bifurcating?) replicator - by James Laskey (as in DNA) replicating - by Peter Levart fanout or fanningOut - Chris Hegarty (sounds cryptic to me, checked Wikipedia [2], does not look like suitable) distributing - by Brian Goetz (like in distributive law a(b+c) = ab+ac; but common meaning of "distributing" is different) tapping - by Kirk Pepperdine (I have no associations; Google images shows some acupuncture procedures) split - by Kirk Pepperdine (may be confused with Spliterator) unzipping - by John Rose biMapping - by Zheka Kozlov (but he also suggests to change signature which is unnecessary) toBoth - by Peter Levart (but usually toXyz shows target container like toArray/toList/toSet, but there's not "Both" container) collectionToBothAndThen - by Zheka Kozlov (but too long) collectingToBoth - by Zheka Kozlov collectingTo - by Brian Goetz (isn't collect(collectingTo(...)) a little bit repititive?) biCollecting - by Zheka Kozlov expanding - by Paul Sandoz (doesn't sound very descriptive to me) forking - by Victor Williams Stafusa da Silva (could be thought that something is parallelized as forking is usually something connected to parallel computations)
I think that it's better to concentrate not on the "splitting" part (each element is passed to two collectors), but on the "joining" part (results of two collectors are combined together). So my preference now is "merging" or "combining". If limiting the selection to the suggested options I'd prefer "duplexing" or "bifurcating" or to stay with my original version "pairing". Of course original Stream API author voices have more weight here, so if you insist on particular version, I will follow.
By the way looking into CollectorsTest.java I found some minor things to cleanup: 1. `.map(mapper::apply)` and `.flatMap(mapper::apply)` can be replaced with simple `.map(mapper)` and `.flatMap(mapper)` respectively 2. In many methods redundant `throws ReflectiveOperationException` is declared while exception is never thrown 3. The `if (!Map.class.isAssignableFrom(map.getClass()))` looks useless as `map` is already of `Map<Boolean, D>` type, so this check is always false (we would never enter this method if it's true). Similarly `if (!List.class.isAssignableFrom(value.getClass()))` 4. Deprecated `clazz.newInstance()` is used 5. Test named `testGroupubgByWithReducing` should be renamed to `testGroupingByWithReducing`
Should I fix some or all of these issues? As separate changeset or within this one?
With best regards, Tagir Valeev.
[1] http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-June/053718.html [2] https://en.wikipedia.org/wiki/Fan-out
Hi Tagir, Unless you have already got a sponsor (can't remember if somebody already offered you a sponsorship), I can volunteer. Regarding the code, I only have one comment about the naming of the last parameter: "finisher". To avoid confusion, it would be good to name it differently from the Collector's functions. "finisher" and "combiner" are already taken by Collector API. You describe the parameter in javadoc as "the function which merges two results into the single one". So what about "merger" or "result[s]Merger" ? ...and one comment about handling of IDENTITY_FINISH: You correctly remove IDENTITY_FINISH from the resulting collector's characteristics, but you don't honor the IDENTITY_FINISH characteristics of the two parameter collectors. It should work nevertheless, but suppose some "sloppy programmed" collector doesn't bother to return the identity function when it announces that it has IDENTITY_FINISH characteristic... Regards, Peter On 08/07/2018 07:52 AM, Tagir Valeev wrote:
Ping! Could you please review and sponsor this changeset? I updated version tag from since 11 to since 12: http://cr.openjdk.java.net/~tvaleev/webrev/8205461/r2/
Thanks in advance!
With best regards, Tagir Valeev.
On Thu, Jun 21, 2018 at 11:33 AM Tagir Valeev <amaembo@gmail.com> wrote:
Please review and sponsor:
https://bugs.openjdk.java.net/browse/JDK-8205461 http://cr.openjdk.java.net/~tvaleev/webrev/8205461/r1/
See also previous discussion thread at [1]. It seems that we did not reach the final conclusion about the collector name, so in this review it's still named as "pairing" (proposed by me). Other name proposals:
bisecting - by Paul Sandoz (sounds like input is split by two parts like in partitioningBy, which is not the case) tee or teeing - by Brian Goetz (IIUC from the T shape, it's assymmetrical operation, while our collector is symmetrical) duplex(ing) - by Jacob Glickman (well, probably ok?) bifurcate - by James Laskey (or bifurcating?) replicator - by James Laskey (as in DNA) replicating - by Peter Levart fanout or fanningOut - Chris Hegarty (sounds cryptic to me, checked Wikipedia [2], does not look like suitable) distributing - by Brian Goetz (like in distributive law a(b+c) = ab+ac; but common meaning of "distributing" is different) tapping - by Kirk Pepperdine (I have no associations; Google images shows some acupuncture procedures) split - by Kirk Pepperdine (may be confused with Spliterator) unzipping - by John Rose biMapping - by Zheka Kozlov (but he also suggests to change signature which is unnecessary) toBoth - by Peter Levart (but usually toXyz shows target container like toArray/toList/toSet, but there's not "Both" container) collectionToBothAndThen - by Zheka Kozlov (but too long) collectingToBoth - by Zheka Kozlov collectingTo - by Brian Goetz (isn't collect(collectingTo(...)) a little bit repititive?) biCollecting - by Zheka Kozlov expanding - by Paul Sandoz (doesn't sound very descriptive to me) forking - by Victor Williams Stafusa da Silva (could be thought that something is parallelized as forking is usually something connected to parallel computations)
I think that it's better to concentrate not on the "splitting" part (each element is passed to two collectors), but on the "joining" part (results of two collectors are combined together). So my preference now is "merging" or "combining". If limiting the selection to the suggested options I'd prefer "duplexing" or "bifurcating" or to stay with my original version "pairing". Of course original Stream API author voices have more weight here, so if you insist on particular version, I will follow.
By the way looking into CollectorsTest.java I found some minor things to cleanup: 1. `.map(mapper::apply)` and `.flatMap(mapper::apply)` can be replaced with simple `.map(mapper)` and `.flatMap(mapper)` respectively 2. In many methods redundant `throws ReflectiveOperationException` is declared while exception is never thrown 3. The `if (!Map.class.isAssignableFrom(map.getClass()))` looks useless as `map` is already of `Map<Boolean, D>` type, so this check is always false (we would never enter this method if it's true). Similarly `if (!List.class.isAssignableFrom(value.getClass()))` 4. Deprecated `clazz.newInstance()` is used 5. Test named `testGroupubgByWithReducing` should be renamed to `testGroupingByWithReducing`
Should I fix some or all of these issues? As separate changeset or within this one?
With best regards, Tagir Valeev.
[1] http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-June/053718.html [2] https://en.wikipedia.org/wiki/Fan-out
Hi, As an API addition, it will need a CSR as well and it should have a couple of reviewers that are fully aware of Streams design. Regards, Roger On 8/7/18 4:08 AM, Peter Levart wrote:
Hi Tagir,
Unless you have already got a sponsor (can't remember if somebody already offered you a sponsorship), I can volunteer.
Regarding the code, I only have one comment about the naming of the last parameter: "finisher". To avoid confusion, it would be good to name it differently from the Collector's functions. "finisher" and "combiner" are already taken by Collector API. You describe the parameter in javadoc as "the function which merges two results into the single one". So what about "merger" or "result[s]Merger" ?
...and one comment about handling of IDENTITY_FINISH: You correctly remove IDENTITY_FINISH from the resulting collector's characteristics, but you don't honor the IDENTITY_FINISH characteristics of the two parameter collectors. It should work nevertheless, but suppose some "sloppy programmed" collector doesn't bother to return the identity function when it announces that it has IDENTITY_FINISH characteristic...
Regards, Peter
On 08/07/2018 07:52 AM, Tagir Valeev wrote:
Ping! Could you please review and sponsor this changeset? I updated version tag from since 11 to since 12: http://cr.openjdk.java.net/~tvaleev/webrev/8205461/r2/
Thanks in advance!
With best regards, Tagir Valeev.
On Thu, Jun 21, 2018 at 11:33 AM Tagir Valeev <amaembo@gmail.com> wrote:
Please review and sponsor:
https://bugs.openjdk.java.net/browse/JDK-8205461 http://cr.openjdk.java.net/~tvaleev/webrev/8205461/r1/
See also previous discussion thread at [1]. It seems that we did not reach the final conclusion about the collector name, so in this review it's still named as "pairing" (proposed by me). Other name proposals:
bisecting - by Paul Sandoz (sounds like input is split by two parts like in partitioningBy, which is not the case) tee or teeing - by Brian Goetz (IIUC from the T shape, it's assymmetrical operation, while our collector is symmetrical) duplex(ing) - by Jacob Glickman (well, probably ok?) bifurcate - by James Laskey (or bifurcating?) replicator - by James Laskey (as in DNA) replicating - by Peter Levart fanout or fanningOut - Chris Hegarty (sounds cryptic to me, checked Wikipedia [2], does not look like suitable) distributing - by Brian Goetz (like in distributive law a(b+c) = ab+ac; but common meaning of "distributing" is different) tapping - by Kirk Pepperdine (I have no associations; Google images shows some acupuncture procedures) split - by Kirk Pepperdine (may be confused with Spliterator) unzipping - by John Rose biMapping - by Zheka Kozlov (but he also suggests to change signature which is unnecessary) toBoth - by Peter Levart (but usually toXyz shows target container like toArray/toList/toSet, but there's not "Both" container) collectionToBothAndThen - by Zheka Kozlov (but too long) collectingToBoth - by Zheka Kozlov collectingTo - by Brian Goetz (isn't collect(collectingTo(...)) a little bit repititive?) biCollecting - by Zheka Kozlov expanding - by Paul Sandoz (doesn't sound very descriptive to me) forking - by Victor Williams Stafusa da Silva (could be thought that something is parallelized as forking is usually something connected to parallel computations)
I think that it's better to concentrate not on the "splitting" part (each element is passed to two collectors), but on the "joining" part (results of two collectors are combined together). So my preference now is "merging" or "combining". If limiting the selection to the suggested options I'd prefer "duplexing" or "bifurcating" or to stay with my original version "pairing". Of course original Stream API author voices have more weight here, so if you insist on particular version, I will follow.
By the way looking into CollectorsTest.java I found some minor things to cleanup: 1. `.map(mapper::apply)` and `.flatMap(mapper::apply)` can be replaced with simple `.map(mapper)` and `.flatMap(mapper)` respectively 2. In many methods redundant `throws ReflectiveOperationException` is declared while exception is never thrown 3. The `if (!Map.class.isAssignableFrom(map.getClass()))` looks useless as `map` is already of `Map<Boolean, D>` type, so this check is always false (we would never enter this method if it's true). Similarly `if (!List.class.isAssignableFrom(value.getClass()))` 4. Deprecated `clazz.newInstance()` is used 5. Test named `testGroupubgByWithReducing` should be renamed to `testGroupingByWithReducing`
Should I fix some or all of these issues? As separate changeset or within this one?
With best regards, Tagir Valeev.
[1] http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-June/053718.html
Hello, Peter! Yes, I would be happy, though see also Roger note.
Regarding the code, I only have one comment about the naming of the last parameter: "finisher"
Sounds reasonable, renamed to "merger".
...and one comment about handling of IDENTITY_FINISH
I think this would complicate the code unnecessarily adding unpleasant unchecked casts and more branches. I think that returning non-identity finisher while reporting IDENTITY_FINISH characteristic is a contract violation. See, for example, that the existing collectingAndThen collector relies on the finisher implementation just like I do. It simply uses downstream.finisher().andThen(finisher) even if finisher is identity. With best regards, Tagir Valeev. On Tue, Aug 7, 2018 at 3:08 PM Peter Levart <peter.levart@gmail.com> wrote:
Hi Tagir,
Unless you have already got a sponsor (can't remember if somebody already offered you a sponsorship), I can volunteer.
Regarding the code, I only have one comment about the naming of the last parameter: "finisher". To avoid confusion, it would be good to name it differently from the Collector's functions. "finisher" and "combiner" are already taken by Collector API. You describe the parameter in javadoc as "the function which merges two results into the single one". So what about "merger" or "result[s]Merger" ?
...and one comment about handling of IDENTITY_FINISH: You correctly remove IDENTITY_FINISH from the resulting collector's characteristics, but you don't honor the IDENTITY_FINISH characteristics of the two parameter collectors. It should work nevertheless, but suppose some "sloppy programmed" collector doesn't bother to return the identity function when it announces that it has IDENTITY_FINISH characteristic...
Regards, Peter
On 08/07/2018 07:52 AM, Tagir Valeev wrote:
Ping! Could you please review and sponsor this changeset? I updated version tag from since 11 to since 12:http://cr.openjdk.java.net/~tvaleev/webrev/8205461/r2/
Thanks in advance!
With best regards, Tagir Valeev.
On Thu, Jun 21, 2018 at 11:33 AM Tagir Valeev <amaembo@gmail.com> <amaembo@gmail.com> wrote:
Please review and sponsor: https://bugs.openjdk.java.net/browse/JDK-8205461http://cr.openjdk.java.net/~...
See also previous discussion thread at [1]. It seems that we did not reach the final conclusion about the collector name, so in this review it's still named as "pairing" (proposed by me). Other name proposals:
bisecting - by Paul Sandoz (sounds like input is split by two parts like in partitioningBy, which is not the case) tee or teeing - by Brian Goetz (IIUC from the T shape, it's assymmetrical operation, while our collector is symmetrical) duplex(ing) - by Jacob Glickman (well, probably ok?) bifurcate - by James Laskey (or bifurcating?) replicator - by James Laskey (as in DNA) replicating - by Peter Levart fanout or fanningOut - Chris Hegarty (sounds cryptic to me, checked Wikipedia [2], does not look like suitable) distributing - by Brian Goetz (like in distributive law a(b+c) = ab+ac; but common meaning of "distributing" is different) tapping - by Kirk Pepperdine (I have no associations; Google images shows some acupuncture procedures) split - by Kirk Pepperdine (may be confused with Spliterator) unzipping - by John Rose biMapping - by Zheka Kozlov (but he also suggests to change signature which is unnecessary) toBoth - by Peter Levart (but usually toXyz shows target container like toArray/toList/toSet, but there's not "Both" container) collectionToBothAndThen - by Zheka Kozlov (but too long) collectingToBoth - by Zheka Kozlov collectingTo - by Brian Goetz (isn't collect(collectingTo(...)) a little bit repititive?) biCollecting - by Zheka Kozlov expanding - by Paul Sandoz (doesn't sound very descriptive to me) forking - by Victor Williams Stafusa da Silva (could be thought that something is parallelized as forking is usually something connected to parallel computations)
I think that it's better to concentrate not on the "splitting" part (each element is passed to two collectors), but on the "joining" part (results of two collectors are combined together). So my preference now is "merging" or "combining". If limiting the selection to the suggested options I'd prefer "duplexing" or "bifurcating" or to stay with my original version "pairing". Of course original Stream API author voices have more weight here, so if you insist on particular version, I will follow.
By the way looking into CollectorsTest.java I found some minor things to cleanup: 1. `.map(mapper::apply)` and `.flatMap(mapper::apply)` can be replaced with simple `.map(mapper)` and `.flatMap(mapper)` respectively 2. In many methods redundant `throws ReflectiveOperationException` is declared while exception is never thrown 3. The `if (!Map.class.isAssignableFrom(map.getClass()))` looks useless as `map` is already of `Map<Boolean, D>` type, so this check is always false (we would never enter this method if it's true). Similarly `if (!List.class.isAssignableFrom(value.getClass()))` 4. Deprecated `clazz.newInstance()` is used 5. Test named `testGroupubgByWithReducing` should be renamed to `testGroupingByWithReducing`
Should I fix some or all of these issues? As separate changeset or within this one?
With best regards, Tagir Valeev.
[1]http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-June/053718.html [2] https://en.wikipedia.org/wiki/Fan-out
On 6/21/2018 12:33 AM, Tagir Valeev wrote:
Please review and sponsor:
https://bugs.openjdk.java.net/browse/JDK-8205461 http://cr.openjdk.java.net/~tvaleev/webrev/8205461/r1/
See also previous discussion thread at [1]. It seems that we did not reach the final conclusion about the collector name, so in this review it's still named as "pairing" (proposed by me). Other name proposals:
None of the names so far are great. The essential challenge in naming here is that this Collector does two (or maybe three) things: duplicate the stream into two identical streams ("tee"), sends each element to the two collectors ("collecting"), and then combines the results ("finishing"). So all the one-word names (pairing, teeing, unzipping, biMapping) only emphasize one half of the operation, and names that capture the full workflow accurately (teeingAndCollectingAndThen) are unwieldy. Of the three phases, teeing is the most important and least obvious, so I think something that includes that in the name is going to be helpful. Perhaps "teeingAndThen" is more evocative and not totally unwieldy. Names along the lines of "merging" may incorrectly give the idea that the merge is happening elementwise, rather than duplicating the streams, collecting, and merging the results.
By the way looking into CollectorsTest.java I found some minor things to cleanup: 1. `.map(mapper::apply)` and `.flatMap(mapper::apply)` can be replaced with simple `.map(mapper)` and `.flatMap(mapper)` respectively Does IntelliJ have an inspection for eliminating such locutions? 2. In many methods redundant `throws ReflectiveOperationException` is declared while exception is never thrown For test code where a significant fraction of test cases are going to throw something, we often do this, since its easier to just uniformly tag such methods rather than thinking about which test methods actually throw the exception and which don't. So I think this is harmless (though cleaning it up is harmless too.)
You may want to optimize the EnumSet mechanics for the case where neither collector has interesting characteristics.
Hello, Brian!
Of the three phases, teeing is the most important and least obvious, so I think something that includes that in the name is going to be helpful. Perhaps "teeingAndThen" is more evocative and not totally unwieldy.
Ok, sounds acceptable to me. Renamed pairing to teeingAndThen.
By the way looking into CollectorsTest.java I found some minor things to cleanup: 1. `.map(mapper::apply)` and `.flatMap(mapper::apply)` can be replaced with simple `.map(mapper)` and `.flatMap(mapper)` respectively Does IntelliJ have an inspection for eliminating such locutions?
Sure, that's how I found them. Well, I took the liberty to fix these two things.
2. In many methods redundant `throws ReflectiveOperationException` is declared while exception is never thrown For test code where a significant fraction of test cases are going to throw something, we often do this, since its easier to just uniformly tag such methods rather than thinking about which test methods actually throw the exception and which don't. So I think this is harmless (though cleaning it up is harmless too.)
I'm not thinking about this, because my IDE thinks for me :-) Ok, I'll leave them as is for now.
You may want to optimize the EnumSet mechanics for the case where neither collector has interesting characteristics.
Added a special case when reported characteristics for either of collectors are empty or IDENTITY_FINISH only. I think this should be a common case. The updated webrev is posted here (along with Peter suggestion to rename finisher to merger): http://cr.openjdk.java.net/~tvaleev/webrev/8205461/r3/ Also copyright year is updated With best regards, Tagir Valeev
Hi Tagir, I think this looks very good. It just needs a CSR. Will you file it? Regards, Peter On 08/19/2018 11:24 AM, Tagir Valeev wrote:
Hello, Brian!
Of the three phases, teeing is the most important and least obvious, so I think something that includes that in the name is going to be helpful. Perhaps "teeingAndThen" is more evocative and not totally unwieldy. Ok, sounds acceptable to me. Renamed pairing to teeingAndThen.
By the way looking into CollectorsTest.java I found some minor things to cleanup: 1. `.map(mapper::apply)` and `.flatMap(mapper::apply)` can be replaced with simple `.map(mapper)` and `.flatMap(mapper)` respectively Does IntelliJ have an inspection for eliminating such locutions? Sure, that's how I found them. Well, I took the liberty to fix these two things.
2. In many methods redundant `throws ReflectiveOperationException` is declared while exception is never thrown For test code where a significant fraction of test cases are going to throw something, we often do this, since its easier to just uniformly tag such methods rather than thinking about which test methods actually throw the exception and which don't. So I think this is harmless (though cleaning it up is harmless too.) I'm not thinking about this, because my IDE thinks for me :-) Ok, I'll leave them as is for now.
You may want to optimize the EnumSet mechanics for the case where neither collector has interesting characteristics. Added a special case when reported characteristics for either of collectors are empty or IDENTITY_FINISH only. I think this should be a common case.
The updated webrev is posted here (along with Peter suggestion to rename finisher to merger): http://cr.openjdk.java.net/~tvaleev/webrev/8205461/r3/ Also copyright year is updated
With best regards, Tagir Valeev
Hello! A CSR is created: https://bugs.openjdk.java.net/browse/JDK-8209685 (this is my first CSR, hopefully I did it correctly) With best regards, Tagir Valeev. On Mon, Aug 20, 2018 at 2:06 PM Peter Levart <peter.levart@gmail.com> wrote:
Hi Tagir,
I think this looks very good. It just needs a CSR. Will you file it?
Regards, Peter
On 08/19/2018 11:24 AM, Tagir Valeev wrote:
Hello, Brian!
Of the three phases, teeing is the most important and least obvious, so I think something that includes that in the name is going to be helpful. Perhaps "teeingAndThen" is more evocative and not totally unwieldy.
Ok, sounds acceptable to me. Renamed pairing to teeingAndThen.
By the way looking into CollectorsTest.java I found some minor things to cleanup: 1. `.map(mapper::apply)` and `.flatMap(mapper::apply)` can be replaced with simple `.map(mapper)` and `.flatMap(mapper)` respectively
Does IntelliJ have an inspection for eliminating such locutions?
Sure, that's how I found them. Well, I took the liberty to fix these two things.
2. In many methods redundant `throws ReflectiveOperationException` is declared while exception is never thrown
For test code where a significant fraction of test cases are going to throw something, we often do this, since its easier to just uniformly tag such methods rather than thinking about which test methods actually throw the exception and which don't. So I think this is harmless (though cleaning it up is harmless too.)
I'm not thinking about this, because my IDE thinks for me :-) Ok, I'll leave them as is for now.
You may want to optimize the EnumSet mechanics for the case where neither collector has interesting characteristics.
Added a special case when reported characteristics for either of collectors are empty or IDENTITY_FINISH only. I think this should be a common case.
The updated webrev is posted here (along with Peter suggestion to rename finisher to merger): http://cr.openjdk.java.net/~tvaleev/webrev/8205461/r3/ Also copyright year is updated
With best regards, Tagir Valeev
Hi, I think this is a very good addition to the Collectors API. Whereas somehow specific, it will come in very handy when needed. But the name... well, honestly, teeingAndThen doesn't tell me anything from the perspective of an user of the API. What is 'teeing'? What does 'tee' actually mean? Or perhaps... Ah! They've finally added teeing to the API! Great! Now let's check what 'tee' actually means for these guys. I'm not sure, maybe it sounds a little bit too technical, or too implementation-specific to me. If I forget for a minute about the 'how' and focus on the 'what', I see that the operation either delegates to two collectors and then merges the results, or that it is aggregating two collectors. So, if it's not too late for new proposals, here are my 2 options: - delegateAndMerge (not gerund, but I think that delegatingAndMerging ends up beeing too long and too gerund-ish) - aggregating (I admit that I've though about these names once I read the comments of the operations/classes in the linked resources). Thanks, fps.- El lun., 20 ago. 2018 a las 10:49, Tagir Valeev (<amaembo@gmail.com>) escribió:
Hello!
A CSR is created: https://bugs.openjdk.java.net/browse/JDK-8209685 (this is my first CSR, hopefully I did it correctly)
With best regards, Tagir Valeev. On Mon, Aug 20, 2018 at 2:06 PM Peter Levart <peter.levart@gmail.com> wrote:
Hi Tagir,
I think this looks very good. It just needs a CSR. Will you file it?
Regards, Peter
On 08/19/2018 11:24 AM, Tagir Valeev wrote:
Hello, Brian!
Of the three phases, teeing is the most important and least obvious, so I think something that includes that in the name is going to be helpful. Perhaps "teeingAndThen" is more evocative and not totally unwieldy.
Ok, sounds acceptable to me. Renamed pairing to teeingAndThen.
By the way looking into CollectorsTest.java I found some minor things to cleanup: 1. `.map(mapper::apply)` and `.flatMap(mapper::apply)` can be replaced
with
simple `.map(mapper)` and `.flatMap(mapper)` respectively
Does IntelliJ have an inspection for eliminating such locutions?
Sure, that's how I found them. Well, I took the liberty to fix these two things.
2. In many methods redundant `throws ReflectiveOperationException` is declared while exception is never thrown
For test code where a significant fraction of test cases are going to throw something, we often do this, since its easier to just uniformly tag such methods rather than thinking about which test methods actually throw the exception and which don't. So I think this is harmless (though cleaning it up is harmless too.)
I'm not thinking about this, because my IDE thinks for me :-) Ok, I'll leave them as is for now.
You may want to optimize the EnumSet mechanics for the case where neither collector has interesting characteristics.
Added a special case when reported characteristics for either of collectors are empty or IDENTITY_FINISH only. I think this should be a common case.
The updated webrev is posted here (along with Peter suggestion to rename finisher to merger): http://cr.openjdk.java.net/~tvaleev/webrev/8205461/r3/ Also copyright year is updated
With best regards, Tagir Valeev
I note that the only place that R appears is in the output of the merger. So the "? extends" is not needed there; it can be just BiFunction<? super R1, ? super R2, R>. On 8/20/2018 4:48 AM, Tagir Valeev wrote:
Hello!
A CSR is created: https://bugs.openjdk.java.net/browse/JDK-8209685 (this is my first CSR, hopefully I did it correctly)
With best regards, Tagir Valeev. On Mon, Aug 20, 2018 at 2:06 PM Peter Levart <peter.levart@gmail.com> wrote:
Hi Tagir,
I think this looks very good. It just needs a CSR. Will you file it?
Regards, Peter
On 08/19/2018 11:24 AM, Tagir Valeev wrote:
Hello, Brian!
Of the three phases, teeing is the most important and least obvious, so I think something that includes that in the name is going to be helpful. Perhaps "teeingAndThen" is more evocative and not totally unwieldy.
Ok, sounds acceptable to me. Renamed pairing to teeingAndThen.
By the way looking into CollectorsTest.java I found some minor things to cleanup: 1. `.map(mapper::apply)` and `.flatMap(mapper::apply)` can be replaced with simple `.map(mapper)` and `.flatMap(mapper)` respectively
Does IntelliJ have an inspection for eliminating such locutions?
Sure, that's how I found them. Well, I took the liberty to fix these two things.
2. In many methods redundant `throws ReflectiveOperationException` is declared while exception is never thrown
For test code where a significant fraction of test cases are going to throw something, we often do this, since its easier to just uniformly tag such methods rather than thinking about which test methods actually throw the exception and which don't. So I think this is harmless (though cleaning it up is harmless too.)
I'm not thinking about this, because my IDE thinks for me :-) Ok, I'll leave them as is for now.
You may want to optimize the EnumSet mechanics for the case where neither collector has interesting characteristics.
Added a special case when reported characteristics for either of collectors are empty or IDENTITY_FINISH only. I think this should be a common case.
The updated webrev is posted here (along with Peter suggestion to rename finisher to merger): http://cr.openjdk.java.net/~tvaleev/webrev/8205461/r3/ Also copyright year is updated
With best regards, Tagir Valeev
Hi Tagir, Thanks for working on this. This looks really cool! And thanks Peter for agreeing to sponsor it. I can help out with the CSR. My first bit of advice about the CSR process is to hold off until the specification is complete. :-) I think the intent of the API is fine, but I think some details of the returned collector need to be ironed out first. 1. The spec doesn't say what the returned collector's supplier, accumulator, combiner, and finisher do. On the one hand, we don't necessarily want to describe the actual implementation. On the other hand, we need to specify how the thing actually behaves. One can certainly deduce the intended behavior from the description, but this really needs to be specified, and it mustn't rely on the reader having to derive the required behaviors. Since the actual implementation is fairly simple, the spec might end up being rather close to the implementation, but that might be unavoidable. I'm envisioning something like this: - supplier: creates a result container that contains result containers obtained by calling each collector's supplier - accumulator: calls each collector's accumulator with its result container and the input element ... and similar for the combiner and finisher functions. 2. Characteristics - UNORDERED: should the returned collector be UNORDERED if *either* of the provided collectors is UNORDERED? (Current draft says *both*.) - CONCURRENT: current draft seems correct - IDENTITY_FINISH: clearly not present; perhaps this should be specified 3. Parameter naming The collector parameters are referred to as "specified collectors" or "supplied collectors". Other "higher-order" collectors refer to their underlying collectors as "downstream" collectors. I think it would be useful to work the "downstream" concept into this collector. This would enable the opening summary sentence of the doc to be something like, "Returns a collector that is a composite of two downstream collectors" or some such. (But see naming below.) 4. Naming Sigh, naming is hard, and I know there was a fair amount of discussion in the previous thread and earlier in this one, but it seems like there's still some dissatisfaction with the name. (And I'm not particularly thrilled with teeingAndThen myself.) In a few minutes I've managed to come up with a few more names that (mostly) don't seem to have been proposed before, and so here they are for your consideration: - compound - composite - conjoined - bonded - fused - duplex Discussion: A "composite" evokes function composition; this might be good, though it might be odd in that collectors can't be composed in the same way that functions are. "Compound" might be a useful alternative. In chemistry, two substances are combined (or bonded, or fused) to form a single substance, which is a compound. "Conjoin" seems to adequately describe the structure of the two collectors, but it lacks somewhat the connotation of unifying them. In an earlier discussion, Brian had pushed back on names related to split/fork/merge/join since those are currently in use in streams regarding splitting of input elements and merging of results. In describing how the current proposal differs, he said that elements are "multiplexed" to the different collectors. Since we're doing this with two collectors, how about "duplex"? (I note that Jacob Glickman also had suggested "duplex".) s'marks On 8/20/18 1:48 AM, Tagir Valeev wrote:
Hello!
A CSR is created: https://bugs.openjdk.java.net/browse/JDK-8209685 (this is my first CSR, hopefully I did it correctly)
With best regards, Tagir Valeev. On Mon, Aug 20, 2018 at 2:06 PM Peter Levart <peter.levart@gmail.com> wrote:
Hi Tagir,
I think this looks very good. It just needs a CSR. Will you file it?
Regards, Peter
On 08/19/2018 11:24 AM, Tagir Valeev wrote:
Hello, Brian!
Of the three phases, teeing is the most important and least obvious, so I think something that includes that in the name is going to be helpful. Perhaps "teeingAndThen" is more evocative and not totally unwieldy.
Ok, sounds acceptable to me. Renamed pairing to teeingAndThen.
By the way looking into CollectorsTest.java I found some minor things to cleanup: 1. `.map(mapper::apply)` and `.flatMap(mapper::apply)` can be replaced with simple `.map(mapper)` and `.flatMap(mapper)` respectively
Does IntelliJ have an inspection for eliminating such locutions?
Sure, that's how I found them. Well, I took the liberty to fix these two things.
2. In many methods redundant `throws ReflectiveOperationException` is declared while exception is never thrown
For test code where a significant fraction of test cases are going to throw something, we often do this, since its easier to just uniformly tag such methods rather than thinking about which test methods actually throw the exception and which don't. So I think this is harmless (though cleaning it up is harmless too.)
I'm not thinking about this, because my IDE thinks for me :-) Ok, I'll leave them as is for now.
You may want to optimize the EnumSet mechanics for the case where neither collector has interesting characteristics.
Added a special case when reported characteristics for either of collectors are empty or IDENTITY_FINISH only. I think this should be a common case.
The updated webrev is posted here (along with Peter suggestion to rename finisher to merger): http://cr.openjdk.java.net/~tvaleev/webrev/8205461/r3/ Also copyright year is updated
With best regards, Tagir Valeev
Hi Stuart, On 08/21/2018 07:43 AM, Stuart Marks wrote:
2. Characteristics
- UNORDERED: should the returned collector be UNORDERED if *either* of the provided collectors is UNORDERED? (Current draft says *both*.)
I think *both* is the right behavior. If you are collecting: teeingAndThen( Collectors.toList(), Collectors.toSet(), Map::entry ) ...then you might want the returned List part of result to respect encounter order regardless of the returned Set part which doesn't. Regards, Peter
On 8/21/18 12:04 AM, Peter Levart wrote:
- UNORDERED: should the returned collector be UNORDERED if *either* of the provided collectors is UNORDERED? (Current draft says *both*.)
I think *both* is the right behavior. If you are collecting:
teeingAndThen( Collectors.toList(), Collectors.toSet(), Map::entry )
...then you might want the returned List part of result to respect encounter order regardless of the returned Set part which doesn't. OK, this makes sense. Seems like it should be "both" then.
s'marks
I just note that sometimes naming is hard. Not because there would be no suitable name to choose from but because there are too many. In such situations it becomes apparent that every individual's brain works slightly differently. That said, I must admit that teeingAndThen is not my favorite either. The "tee" UNIX command is not "symmetrical". It interposes itself into the pipeline as a pass-through element (stdin -> stdout) and has (one ore more) additional outputs in the form of files. The name "tee" comes from T-splitter used in plumbing, where the pass-through direction is a straight line (the horizontal on letter T), while the additional output is the vertical: https://en.wikipedia.org/wiki/Piping_and_plumbing_fitting#Tee Out collector is symmetrical though. On 08/21/2018 07:43 AM, Stuart Marks wrote:
a few more names that (mostly) don't seem to have been proposed before, and so here they are for your consideration:
- compound - composite - conjoined - bonded - fused - duplex
Discussion:
A "composite" evokes function composition; this might be good, though it might be odd in that collectors can't be composed in the same way that functions are.
"Compound" might be a useful alternative. In chemistry, two substances are combined (or bonded, or fused) to form a single substance, which is a compound.
"Conjoin" seems to adequately describe the structure of the two collectors, but it lacks somewhat the connotation of unifying them.
In an earlier discussion, Brian had pushed back on names related to split/fork/merge/join since those are currently in use in streams regarding splitting of input elements and merging of results. In describing how the current proposal differs, he said that elements are "multiplexed" to the different collectors. Since we're doing this with two collectors, how about "duplex"? (I note that Jacob Glickman also had suggested "duplex".)
My brain likes "duplex". duplex, duplexing or duplexingAndThen ? I think "duplexing". AndThen suffix is not needed. Even if there is a standard Pair class to come in the future, overloaded "duplexing" method could be added with no ambiguity (mental or javac). Regards, Peter
Hello, Stuart and Peter! Thank you for valuable comments. I updated the webrev: http://cr.openjdk.java.net/~tvaleev/webrev/8205461/r4/ 1. I renamed "teeingAndThen" to "duplexing". Brian insisted that "-ing" suffix shall present and I agree. Hopefully it's final name. 2. I updated the spec as Stuart suggested. No changes in implementation since r3 revision. Please check. With best regards, Tagir Valeev. On Tue, Aug 21, 2018 at 12:43 PM Stuart Marks <stuart.marks@oracle.com> wrote:
Hi Tagir,
Thanks for working on this. This looks really cool! And thanks Peter for agreeing to sponsor it.
I can help out with the CSR. My first bit of advice about the CSR process is to hold off until the specification is complete. :-)
I think the intent of the API is fine, but I think some details of the returned collector need to be ironed out first.
1. The spec doesn't say what the returned collector's supplier, accumulator, combiner, and finisher do. On the one hand, we don't necessarily want to describe the actual implementation. On the other hand, we need to specify how the thing actually behaves. One can certainly deduce the intended behavior from the description, but this really needs to be specified, and it mustn't rely on the reader having to derive the required behaviors. Since the actual implementation is fairly simple, the spec might end up being rather close to the implementation, but that might be unavoidable.
I'm envisioning something like this:
- supplier: creates a result container that contains result containers obtained by calling each collector's supplier
- accumulator: calls each collector's accumulator with its result container and the input element
... and similar for the combiner and finisher functions.
2. Characteristics
- UNORDERED: should the returned collector be UNORDERED if *either* of the provided collectors is UNORDERED? (Current draft says *both*.)
- CONCURRENT: current draft seems correct
- IDENTITY_FINISH: clearly not present; perhaps this should be specified
3. Parameter naming
The collector parameters are referred to as "specified collectors" or "supplied collectors". Other "higher-order" collectors refer to their underlying collectors as "downstream" collectors. I think it would be useful to work the "downstream" concept into this collector. This would enable the opening summary sentence of the doc to be something like, "Returns a collector that is a composite of two downstream collectors" or some such. (But see naming below.)
4. Naming
Sigh, naming is hard, and I know there was a fair amount of discussion in the previous thread and earlier in this one, but it seems like there's still some dissatisfaction with the name. (And I'm not particularly thrilled with teeingAndThen myself.) In a few minutes I've managed to come up with a few more names that (mostly) don't seem to have been proposed before, and so here they are for your consideration:
- compound - composite - conjoined - bonded - fused - duplex
Discussion:
A "composite" evokes function composition; this might be good, though it might be odd in that collectors can't be composed in the same way that functions are.
"Compound" might be a useful alternative. In chemistry, two substances are combined (or bonded, or fused) to form a single substance, which is a compound.
"Conjoin" seems to adequately describe the structure of the two collectors, but it lacks somewhat the connotation of unifying them.
In an earlier discussion, Brian had pushed back on names related to split/fork/merge/join since those are currently in use in streams regarding splitting of input elements and merging of results. In describing how the current proposal differs, he said that elements are "multiplexed" to the different collectors. Since we're doing this with two collectors, how about "duplex"? (I note that Jacob Glickman also had suggested "duplex".)
s'marks
On 8/20/18 1:48 AM, Tagir Valeev wrote:
Hello!
A CSR is created: https://bugs.openjdk.java.net/browse/JDK-8209685 (this is my first CSR, hopefully I did it correctly)
With best regards, Tagir Valeev. On Mon, Aug 20, 2018 at 2:06 PM Peter Levart <peter.levart@gmail.com> wrote:
Hi Tagir,
I think this looks very good. It just needs a CSR. Will you file it?
Regards, Peter
On 08/19/2018 11:24 AM, Tagir Valeev wrote:
Hello, Brian!
Of the three phases, teeing is the most important and least obvious, so I think something that includes that in the name is going to be helpful. Perhaps "teeingAndThen" is more evocative and not totally unwieldy.
Ok, sounds acceptable to me. Renamed pairing to teeingAndThen.
By the way looking into CollectorsTest.java I found some minor things to cleanup: 1. `.map(mapper::apply)` and `.flatMap(mapper::apply)` can be replaced with simple `.map(mapper)` and `.flatMap(mapper)` respectively
Does IntelliJ have an inspection for eliminating such locutions?
Sure, that's how I found them. Well, I took the liberty to fix these two things.
2. In many methods redundant `throws ReflectiveOperationException` is declared while exception is never thrown
For test code where a significant fraction of test cases are going to throw something, we often do this, since its easier to just uniformly tag such methods rather than thinking about which test methods actually throw the exception and which don't. So I think this is harmless (though cleaning it up is harmless too.)
I'm not thinking about this, because my IDE thinks for me :-) Ok, I'll leave them as is for now.
You may want to optimize the EnumSet mechanics for the case where neither collector has interesting characteristics.
Added a special case when reported characteristics for either of collectors are empty or IDENTITY_FINISH only. I think this should be a common case.
The updated webrev is posted here (along with Peter suggestion to rename finisher to merger): http://cr.openjdk.java.net/~tvaleev/webrev/8205461/r3/ Also copyright year is updated
With best regards, Tagir Valeev
Hi Tagir, I like duplexing more than teeingAndThen. If consensus can be established about the name, I think you will then want to update the CSR draft to reflect new name. Then we'll kindly ask Stuart if he has any more advice before submitting the CSR... Regards, Peter On 09/14/2018 10:41 AM, Tagir Valeev wrote:
Hello, Stuart and Peter!
Thank you for valuable comments. I updated the webrev: http://cr.openjdk.java.net/~tvaleev/webrev/8205461/r4/
1. I renamed "teeingAndThen" to "duplexing". Brian insisted that "-ing" suffix shall present and I agree. Hopefully it's final name. 2. I updated the spec as Stuart suggested.
No changes in implementation since r3 revision. Please check.
With best regards, Tagir Valeev.
On Tue, Aug 21, 2018 at 12:43 PM Stuart Marks <stuart.marks@oracle.com> wrote:
Hi Tagir,
Thanks for working on this. This looks really cool! And thanks Peter for agreeing to sponsor it.
I can help out with the CSR. My first bit of advice about the CSR process is to hold off until the specification is complete. :-)
I think the intent of the API is fine, but I think some details of the returned collector need to be ironed out first.
1. The spec doesn't say what the returned collector's supplier, accumulator, combiner, and finisher do. On the one hand, we don't necessarily want to describe the actual implementation. On the other hand, we need to specify how the thing actually behaves. One can certainly deduce the intended behavior from the description, but this really needs to be specified, and it mustn't rely on the reader having to derive the required behaviors. Since the actual implementation is fairly simple, the spec might end up being rather close to the implementation, but that might be unavoidable.
I'm envisioning something like this:
- supplier: creates a result container that contains result containers obtained by calling each collector's supplier
- accumulator: calls each collector's accumulator with its result container and the input element
... and similar for the combiner and finisher functions.
2. Characteristics
- UNORDERED: should the returned collector be UNORDERED if *either* of the provided collectors is UNORDERED? (Current draft says *both*.)
- CONCURRENT: current draft seems correct
- IDENTITY_FINISH: clearly not present; perhaps this should be specified
3. Parameter naming
The collector parameters are referred to as "specified collectors" or "supplied collectors". Other "higher-order" collectors refer to their underlying collectors as "downstream" collectors. I think it would be useful to work the "downstream" concept into this collector. This would enable the opening summary sentence of the doc to be something like, "Returns a collector that is a composite of two downstream collectors" or some such. (But see naming below.)
4. Naming
Sigh, naming is hard, and I know there was a fair amount of discussion in the previous thread and earlier in this one, but it seems like there's still some dissatisfaction with the name. (And I'm not particularly thrilled with teeingAndThen myself.) In a few minutes I've managed to come up with a few more names that (mostly) don't seem to have been proposed before, and so here they are for your consideration:
- compound - composite - conjoined - bonded - fused - duplex
Discussion:
A "composite" evokes function composition; this might be good, though it might be odd in that collectors can't be composed in the same way that functions are.
"Compound" might be a useful alternative. In chemistry, two substances are combined (or bonded, or fused) to form a single substance, which is a compound.
"Conjoin" seems to adequately describe the structure of the two collectors, but it lacks somewhat the connotation of unifying them.
In an earlier discussion, Brian had pushed back on names related to split/fork/merge/join since those are currently in use in streams regarding splitting of input elements and merging of results. In describing how the current proposal differs, he said that elements are "multiplexed" to the different collectors. Since we're doing this with two collectors, how about "duplex"? (I note that Jacob Glickman also had suggested "duplex".)
s'marks
On 8/20/18 1:48 AM, Tagir Valeev wrote:
Hello!
A CSR is created: https://bugs.openjdk.java.net/browse/JDK-8209685 (this is my first CSR, hopefully I did it correctly)
With best regards, Tagir Valeev. On Mon, Aug 20, 2018 at 2:06 PM Peter Levart <peter.levart@gmail.com> wrote:
Hi Tagir,
I think this looks very good. It just needs a CSR. Will you file it?
Regards, Peter
On 08/19/2018 11:24 AM, Tagir Valeev wrote:
Hello, Brian!
Of the three phases, teeing is the most important and least obvious, so I think something that includes that in the name is going to be helpful. Perhaps "teeingAndThen" is more evocative and not totally unwieldy.
Ok, sounds acceptable to me. Renamed pairing to teeingAndThen.
By the way looking into CollectorsTest.java I found some minor things to cleanup: 1. `.map(mapper::apply)` and `.flatMap(mapper::apply)` can be replaced with simple `.map(mapper)` and `.flatMap(mapper)` respectively
Does IntelliJ have an inspection for eliminating such locutions?
Sure, that's how I found them. Well, I took the liberty to fix these two things.
2. In many methods redundant `throws ReflectiveOperationException` is declared while exception is never thrown
For test code where a significant fraction of test cases are going to throw something, we often do this, since its easier to just uniformly tag such methods rather than thinking about which test methods actually throw the exception and which don't. So I think this is harmless (though cleaning it up is harmless too.)
I'm not thinking about this, because my IDE thinks for me :-) Ok, I'll leave them as is for now.
You may want to optimize the EnumSet mechanics for the case where neither collector has interesting characteristics.
Added a special case when reported characteristics for either of collectors are empty or IDENTITY_FINISH only. I think this should be a common case.
The updated webrev is posted here (along with Peter suggestion to rename finisher to merger): http://cr.openjdk.java.net/~tvaleev/webrev/8205461/r3/ Also copyright year is updated
With best regards, Tagir Valeev
Hi, I'd like to remind three little things to which there was no response (AFAIK): 1) Brian Goetz' suggestion of changing "? extends R" into "R": - http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-August/054947.html 2) Stuart Marks' suggestion about renaming "c1" and "c2" to "downstream1" and "downstream2": - http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-August/054949.html 3) my suggestion about renaming "merger" to "biFinisher" because "merger" means BinaryOperator everywhere else: - http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-September/055235.h... As to the name, I still think a name related to a well-understood concept (like the composite pattern [1]) would be better. Note that "composite" is also a verb [2], but "Collectors.compositing" looks a bit strange. "Collectors.composing" seems much better to me, but - as far as I understand - there was some concern that the users could misunderstand it as element-wise composition, is that right? Regards, Tomasz Linkowski [1] https://en.wikipedia.org/wiki/Composite_pattern [2] https://en.wiktionary.org/wiki/composite#Verb On Fri, Sep 14, 2018 at 11:23 AM, Peter Levart <peter.levart@gmail.com> wrote:
Hi Tagir,
I like duplexing more than teeingAndThen. If consensus can be established about the name, I think you will then want to update the CSR draft to reflect new name. Then we'll kindly ask Stuart if he has any more advice before submitting the CSR...
Regards, Peter
On 09/14/2018 10:41 AM, Tagir Valeev wrote:
Hello, Stuart and Peter!
Thank you for valuable comments. I updated the webrev: http://cr.openjdk.java.net/~tvaleev/webrev/8205461/r4/
1. I renamed "teeingAndThen" to "duplexing". Brian insisted that "-ing" suffix shall present and I agree. Hopefully it's final name. 2. I updated the spec as Stuart suggested.
No changes in implementation since r3 revision. Please check.
With best regards, Tagir Valeev.
On Tue, Aug 21, 2018 at 12:43 PM Stuart Marks <stuart.marks@oracle.com> wrote:
Hi Tagir,
Thanks for working on this. This looks really cool! And thanks Peter for agreeing to sponsor it.
I can help out with the CSR. My first bit of advice about the CSR process is to hold off until the specification is complete. :-)
I think the intent of the API is fine, but I think some details of the returned collector need to be ironed out first.
1. The spec doesn't say what the returned collector's supplier, accumulator, combiner, and finisher do. On the one hand, we don't necessarily want to describe the actual implementation. On the other hand, we need to specify how the thing actually behaves. One can certainly deduce the intended behavior from the description, but this really needs to be specified, and it mustn't rely on the reader having to derive the required behaviors. Since the actual implementation is fairly simple, the spec might end up being rather close to the implementation, but that might be unavoidable.
I'm envisioning something like this:
- supplier: creates a result container that contains result containers obtained by calling each collector's supplier
- accumulator: calls each collector's accumulator with its result container and the input element
... and similar for the combiner and finisher functions.
2. Characteristics
- UNORDERED: should the returned collector be UNORDERED if *either* of the provided collectors is UNORDERED? (Current draft says *both*.)
- CONCURRENT: current draft seems correct
- IDENTITY_FINISH: clearly not present; perhaps this should be specified
3. Parameter naming
The collector parameters are referred to as "specified collectors" or "supplied collectors". Other "higher-order" collectors refer to their underlying collectors as "downstream" collectors. I think it would be useful to work the "downstream" concept into this collector. This would enable the opening summary sentence of the doc to be something like, "Returns a collector that is a composite of two downstream collectors" or some such. (But see naming below.)
4. Naming
Sigh, naming is hard, and I know there was a fair amount of discussion in the previous thread and earlier in this one, but it seems like there's still some dissatisfaction with the name. (And I'm not particularly thrilled with teeingAndThen myself.) In a few minutes I've managed to come up with a few more names that (mostly) don't seem to have been proposed before, and so here they are for your consideration:
- compound - composite - conjoined - bonded - fused - duplex
Discussion:
A "composite" evokes function composition; this might be good, though it might be odd in that collectors can't be composed in the same way that functions are.
"Compound" might be a useful alternative. In chemistry, two substances are combined (or bonded, or fused) to form a single substance, which is a compound.
"Conjoin" seems to adequately describe the structure of the two collectors, but it lacks somewhat the connotation of unifying them.
In an earlier discussion, Brian had pushed back on names related to split/fork/merge/join since those are currently in use in streams regarding splitting of input elements and merging of results. In describing how the current proposal differs, he said that elements are "multiplexed" to the different collectors. Since we're doing this with two collectors, how about "duplex"? (I note that Jacob Glickman also had suggested "duplex".)
s'marks
On 8/20/18 1:48 AM, Tagir Valeev wrote:
Hello!
A CSR is created: https://bugs.openjdk.java.net/browse/JDK-8209685 (this is my first CSR, hopefully I did it correctly)
With best regards, Tagir Valeev. On Mon, Aug 20, 2018 at 2:06 PM Peter Levart <peter.levart@gmail.com> wrote:
Hi Tagir,
I think this looks very good. It just needs a CSR. Will you file it?
Regards, Peter
On 08/19/2018 11:24 AM, Tagir Valeev wrote:
Hello, Brian!
Of the three phases, teeing is the most important and least obvious, so I think something that includes that in the name is going to be helpful. Perhaps "teeingAndThen" is more evocative and not totally unwieldy.
Ok, sounds acceptable to me. Renamed pairing to teeingAndThen.
By the way looking into CollectorsTest.java I found some minor things to cleanup: 1. `.map(mapper::apply)` and `.flatMap(mapper::apply)` can be replaced with simple `.map(mapper)` and `.flatMap(mapper)` respectively
Does IntelliJ have an inspection for eliminating such locutions?
Sure, that's how I found them. Well, I took the liberty to fix these two things.
2. In many methods redundant `throws ReflectiveOperationException` is declared while exception is never thrown
For test code where a significant fraction of test cases are going to throw something, we often do this, since its easier to just uniformly tag such methods rather than thinking about which test methods actually throw the exception and which don't. So I think this is harmless (though cleaning it up is harmless too.)
I'm not thinking about this, because my IDE thinks for me :-) Ok, I'll leave them as is for now.
You may want to optimize the EnumSet mechanics for the case where neither collector has interesting characteristics.
Added a special case when reported characteristics for either of collectors are empty or IDENTITY_FINISH only. I think this should be a common case.
The updated webrev is posted here (along with Peter suggestion to rename finisher to merger): http://cr.openjdk.java.net/~tvaleev/webrev/8205461/r3/ Also copyright year is updated
With best regards, Tagir Valeev
-- Pozdrawiam, Tomasz Linkowski
Hi Tagir, thanks for the update. Also thanks Tomasz for keeping everybody honest on the open issues. First, naming. I think "duplex" as the root word wins! Using "duplexing" to conform to many of other collectors is fine; so, "duplexing" is good. Unfortunately "duplex" is not really a verb. For some reason "duplexing" sounds OK. This probably follows in the long tradition of the tech industry's "verbing" of nouns and adjectives. However, I can't quite bring myself to recommend using this in the javadoc summary sentence, e.g., Returns a {@code Collector} that duplexes stream elements to two downstream collectors. Ugh. I think the current summary sentence is fine Returns a {@code Collector} that is a composite of two downstream collectors. even though it uses "composite" which is not the name of this collector. But I think it's a more descriptive term and it reads more smoothly. Turning to the issues mentioned by Tomasz:
1) Brian Goetz' suggestion of changing "? extends R" into "R": - http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-August/054947.html Yes, I think this should be done. 2) Stuart Marks' suggestion about renaming "c1" and "c2" to "downstream1" and "downstream2": - http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-August/054949.html Yes. It's a small thing, but the parameter names do appear in the javadoc. The text refers to "downstream" collectors, and naming the parameters "downstream" strengthens the association to the text. Otherwise, the reader has to think "what are c1 and c2? Oh, they're the downstream collectors." 3) my suggestion about renaming "merger" to "biFinisher" because "merger" means BinaryOperator everywhere else: - http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-September/055235.h... I'm ok with "merger". I don't feel that "merge" has a strong implication that the result must be the same type as the two inputs. I'm ok with the notion of "merge" taking two inputs (which might be of different types) and producing a single output (which might be a different type from either input). The difficulty with "biFinisher" is that it doesn't seem to imply "take two things and produce one". A finisher takes one input and produces one output, so would a "biFinisher" take two inputs and produce two outputs? As to the name, I still think a name related to a well-understood concept (like the composite pattern [1]) would be better. Note that "composite" is also a verb [2], but "Collectors.compositing" looks a bit strange. "Collectors.composing" seems much better to me, but - as far as I understand - there was some concern that the users could misunderstand it as element-wise composition, is that right? The difficulty I have with "compose" and "composition" is that with function composition, one function is applied, and the second function is applied to the result of the first. Of course that isn't what happens here. The notion of "duplexing" is that things are happening side-by-side, which applies well here, I think.
** Tagir, overall, looks good! Let me know when you've finished updating the CSR. Thanks, s'marks
Regards, Tomasz Linkowski
[1] https://en.wikipedia.org/wiki/Composite_pattern [2] https://en.wiktionary.org/wiki/composite#Verb
On Fri, Sep 14, 2018 at 11:23 AM, Peter Levart <peter.levart@gmail.com <mailto:peter.levart@gmail.com>> wrote:
Hi Tagir,
I like duplexing more than teeingAndThen. If consensus can be established about the name, I think you will then want to update the CSR draft to reflect new name. Then we'll kindly ask Stuart if he has any more advice before submitting the CSR...
Regards, Peter
On 09/14/2018 10:41 AM, Tagir Valeev wrote:
Hello, Stuart and Peter!
Thank you for valuable comments. I updated the webrev: http://cr.openjdk.java.net/~tvaleev/webrev/8205461/r4/ <http://cr.openjdk.java.net/%7Etvaleev/webrev/8205461/r4/>
1. I renamed "teeingAndThen" to "duplexing". Brian insisted that "-ing" suffix shall present and I agree. Hopefully it's final name. 2. I updated the spec as Stuart suggested.
No changes in implementation since r3 revision. Please check.
With best regards, Tagir Valeev.
On Tue, Aug 21, 2018 at 12:43 PM Stuart Marks <stuart.marks@oracle.com <mailto:stuart.marks@oracle.com>> wrote:
Hi Tagir,
Thanks for working on this. This looks really cool! And thanks Peter for agreeing to sponsor it.
I can help out with the CSR. My first bit of advice about the CSR process is to hold off until the specification is complete. :-)
I think the intent of the API is fine, but I think some details of the returned collector need to be ironed out first.
1. The spec doesn't say what the returned collector's supplier, accumulator, combiner, and finisher do. On the one hand, we don't necessarily want to describe the actual implementation. On the other hand, we need to specify how the thing actually behaves. One can certainly deduce the intended behavior from the description, but this really needs to be specified, and it mustn't rely on the reader having to derive the required behaviors. Since the actual implementation is fairly simple, the spec might end up being rather close to the implementation, but that might be unavoidable.
I'm envisioning something like this:
- supplier: creates a result container that contains result containers obtained by calling each collector's supplier
- accumulator: calls each collector's accumulator with its result container and the input element
... and similar for the combiner and finisher functions.
2. Characteristics
- UNORDERED: should the returned collector be UNORDERED if *either* of the provided collectors is UNORDERED? (Current draft says *both*.)
- CONCURRENT: current draft seems correct
- IDENTITY_FINISH: clearly not present; perhaps this should be specified
3. Parameter naming
The collector parameters are referred to as "specified collectors" or "supplied collectors". Other "higher-order" collectors refer to their underlying collectors as "downstream" collectors. I think it would be useful to work the "downstream" concept into this collector. This would enable the opening summary sentence of the doc to be something like, "Returns a collector that is a composite of two downstream collectors" or some such. (But see naming below.)
4. Naming
Sigh, naming is hard, and I know there was a fair amount of discussion in the previous thread and earlier in this one, but it seems like there's still some dissatisfaction with the name. (And I'm not particularly thrilled with teeingAndThen myself.) In a few minutes I've managed to come up with a few more names that (mostly) don't seem to have been proposed before, and so here they are for your consideration:
- compound - composite - conjoined - bonded - fused - duplex
Discussion:
A "composite" evokes function composition; this might be good, though it might be odd in that collectors can't be composed in the same way that functions are.
"Compound" might be a useful alternative. In chemistry, two substances are combined (or bonded, or fused) to form a single substance, which is a compound.
"Conjoin" seems to adequately describe the structure of the two collectors, but it lacks somewhat the connotation of unifying them.
In an earlier discussion, Brian had pushed back on names related to split/fork/merge/join since those are currently in use in streams regarding splitting of input elements and merging of results. In describing how the current proposal differs, he said that elements are "multiplexed" to the different collectors. Since we're doing this with two collectors, how about "duplex"? (I note that Jacob Glickman also had suggested "duplex".)
s'marks
On 8/20/18 1:48 AM, Tagir Valeev wrote:
Hello!
A CSR is created: https://bugs.openjdk.java.net/browse/JDK-8209685 <https://bugs.openjdk.java.net/browse/JDK-8209685> (this is my first CSR, hopefully I did it correctly)
With best regards, Tagir Valeev. On Mon, Aug 20, 2018 at 2:06 PM Peter Levart <peter.levart@gmail.com <mailto:peter.levart@gmail.com>> wrote:
Hi Tagir,
I think this looks very good. It just needs a CSR. Will you file it?
Regards, Peter
On 08/19/2018 11:24 AM, Tagir Valeev wrote:
Hello, Brian!
Of the three phases, teeing is the most important and least obvious, so I think something that includes that in the name is going to be helpful. Perhaps "teeingAndThen" is more evocative and not totally unwieldy.
Ok, sounds acceptable to me. Renamed pairing to teeingAndThen.
By the way looking into CollectorsTest.java I found some minor things to cleanup: 1. `.map(mapper::apply)` and `.flatMap(mapper::apply)` can be replaced with simple `.map(mapper)` and `.flatMap(mapper)` respectively
Does IntelliJ have an inspection for eliminating such locutions?
Sure, that's how I found them. Well, I took the liberty to fix these two things.
2. In many methods redundant `throws ReflectiveOperationException` is declared while exception is never thrown
For test code where a significant fraction of test cases are going to throw something, we often do this, since its easier to just uniformly tag such methods rather than thinking about which test methods actually throw the exception and which don't. So I think this is harmless (though cleaning it up is harmless too.)
I'm not thinking about this, because my IDE thinks for me :-) Ok, I'll leave them as is for now.
You may want to optimize the EnumSet mechanics for the case where neither collector has interesting characteristics.
Added a special case when reported characteristics for either of collectors are empty or IDENTITY_FINISH only. I think this should be a common case.
The updated webrev is posted here (along with Peter suggestion to rename finisher to merger): http://cr.openjdk.java.net/~tvaleev/webrev/8205461/r3/ <http://cr.openjdk.java.net/%7Etvaleev/webrev/8205461/r3/> Also copyright year is updated
With best regards, Tagir Valeev
-- Pozdrawiam, Tomasz Linkowski
Hello! Tomasz, Peter, Stuart, Remi, thank you for review and comments. I updated the webrev: http://cr.openjdk.java.net/~tvaleev/webrev/8205461/r5/ 1. ? extends R -> R 2. parameter names c1, c2 -> downstream1, downstream2; Objects.requireNonNull messages updated correspondingly merger is left as is. I think it's fine.
Returns a {@code Collector} that is a composite of two downstream collectors.
even though it uses "composite" which is not the name of this collector. But I think it's a more descriptive term and it reads more smoothly.
I also think that using a different word in description is fine. It could be helpful for non-native speakers (some may better understand 'duplex', others are more familiar with 'composite'). CSR is also updated: https://bugs.openjdk.java.net/projects/JDK/issues/JDK-8209685 With best regards, Tagir Valeev. On Sat, Sep 15, 2018 at 12:09 AM Stuart Marks <stuart.marks@oracle.com> wrote:
Hi Tagir, thanks for the update.
Also thanks Tomasz for keeping everybody honest on the open issues.
First, naming. I think "duplex" as the root word wins! Using "duplexing" to conform to many of other collectors is fine; so, "duplexing" is good.
Unfortunately "duplex" is not really a verb. For some reason "duplexing" sounds OK. This probably follows in the long tradition of the tech industry's "verbing" of nouns and adjectives. However, I can't quite bring myself to recommend using this in the javadoc summary sentence, e.g.,
Returns a {@code Collector} that duplexes stream elements to two downstream collectors.
Ugh. I think the current summary sentence is fine
Returns a {@code Collector} that is a composite of two downstream collectors.
even though it uses "composite" which is not the name of this collector. But I think it's a more descriptive term and it reads more smoothly.
Turning to the issues mentioned by Tomasz:
1) Brian Goetz' suggestion of changing "? extends R" into "R": - http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-August/054947.html
Yes, I think this should be done.
2) Stuart Marks' suggestion about renaming "c1" and "c2" to "downstream1" and "downstream2": - http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-August/054949.html
Yes. It's a small thing, but the parameter names do appear in the javadoc. The text refers to "downstream" collectors, and naming the parameters "downstream" strengthens the association to the text. Otherwise, the reader has to think "what are c1 and c2? Oh, they're the downstream collectors."
3) my suggestion about renaming "merger" to "biFinisher" because "merger" means BinaryOperator everywhere else: - http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-September/055235.h...
I'm ok with "merger". I don't feel that "merge" has a strong implication that the result must be the same type as the two inputs. I'm ok with the notion of "merge" taking two inputs (which might be of different types) and producing a single output (which might be a different type from either input). The difficulty with "biFinisher" is that it doesn't seem to imply "take two things and produce one". A finisher takes one input and produces one output, so would a "biFinisher" take two inputs and produce two outputs?
As to the name, I still think a name related to a well-understood concept (like the composite pattern [1]) would be better. Note that "composite" is also a verb [2], but "Collectors.compositing" looks a bit strange. "Collectors.composing" seems much better to me, but - as far as I understand - there was some concern that the users could misunderstand it as element-wise composition, is that right?
The difficulty I have with "compose" and "composition" is that with function composition, one function is applied, and the second function is applied to the result of the first. Of course that isn't what happens here. The notion of "duplexing" is that things are happening side-by-side, which applies well here, I think.
**
Tagir, overall, looks good! Let me know when you've finished updating the CSR.
Thanks,
s'marks
Regards, Tomasz Linkowski
[1] https://en.wikipedia.org/wiki/Composite_pattern [2] https://en.wiktionary.org/wiki/composite#Verb
On Fri, Sep 14, 2018 at 11:23 AM, Peter Levart <peter.levart@gmail.com> wrote:
Hi Tagir,
I like duplexing more than teeingAndThen. If consensus can be established about the name, I think you will then want to update the CSR draft to reflect new name. Then we'll kindly ask Stuart if he has any more advice before submitting the CSR...
Regards, Peter
On 09/14/2018 10:41 AM, Tagir Valeev wrote:
Hello, Stuart and Peter!
Thank you for valuable comments. I updated the webrev: http://cr.openjdk.java.net/~tvaleev/webrev/8205461/r4/
1. I renamed "teeingAndThen" to "duplexing". Brian insisted that "-ing" suffix shall present and I agree. Hopefully it's final name. 2. I updated the spec as Stuart suggested.
No changes in implementation since r3 revision. Please check.
With best regards, Tagir Valeev.
On Tue, Aug 21, 2018 at 12:43 PM Stuart Marks <stuart.marks@oracle.com> wrote:
Hi Tagir,
Thanks for working on this. This looks really cool! And thanks Peter for agreeing to sponsor it.
I can help out with the CSR. My first bit of advice about the CSR process is to hold off until the specification is complete. :-)
I think the intent of the API is fine, but I think some details of the returned collector need to be ironed out first.
1. The spec doesn't say what the returned collector's supplier, accumulator, combiner, and finisher do. On the one hand, we don't necessarily want to describe the actual implementation. On the other hand, we need to specify how the thing actually behaves. One can certainly deduce the intended behavior from the description, but this really needs to be specified, and it mustn't rely on the reader having to derive the required behaviors. Since the actual implementation is fairly simple, the spec might end up being rather close to the implementation, but that might be unavoidable.
I'm envisioning something like this:
- supplier: creates a result container that contains result containers obtained by calling each collector's supplier
- accumulator: calls each collector's accumulator with its result container and the input element
... and similar for the combiner and finisher functions.
2. Characteristics
- UNORDERED: should the returned collector be UNORDERED if *either* of the provided collectors is UNORDERED? (Current draft says *both*.)
- CONCURRENT: current draft seems correct
- IDENTITY_FINISH: clearly not present; perhaps this should be specified
3. Parameter naming
The collector parameters are referred to as "specified collectors" or "supplied collectors". Other "higher-order" collectors refer to their underlying collectors as "downstream" collectors. I think it would be useful to work the "downstream" concept into this collector. This would enable the opening summary sentence of the doc to be something like, "Returns a collector that is a composite of two downstream collectors" or some such. (But see naming below.)
4. Naming
Sigh, naming is hard, and I know there was a fair amount of discussion in the previous thread and earlier in this one, but it seems like there's still some dissatisfaction with the name. (And I'm not particularly thrilled with teeingAndThen myself.) In a few minutes I've managed to come up with a few more names that (mostly) don't seem to have been proposed before, and so here they are for your consideration:
- compound - composite - conjoined - bonded - fused - duplex
Discussion:
A "composite" evokes function composition; this might be good, though it might be odd in that collectors can't be composed in the same way that functions are.
"Compound" might be a useful alternative. In chemistry, two substances are combined (or bonded, or fused) to form a single substance, which is a compound.
"Conjoin" seems to adequately describe the structure of the two collectors, but it lacks somewhat the connotation of unifying them.
In an earlier discussion, Brian had pushed back on names related to split/fork/merge/join since those are currently in use in streams regarding splitting of input elements and merging of results. In describing how the current proposal differs, he said that elements are "multiplexed" to the different collectors. Since we're doing this with two collectors, how about "duplex"? (I note that Jacob Glickman also had suggested "duplex".)
s'marks
On 8/20/18 1:48 AM, Tagir Valeev wrote:
Hello!
A CSR is created: https://bugs.openjdk.java.net/browse/JDK-8209685 (this is my first CSR, hopefully I did it correctly)
With best regards, Tagir Valeev. On Mon, Aug 20, 2018 at 2:06 PM Peter Levart <peter.levart@gmail.com> wrote:
Hi Tagir,
I think this looks very good. It just needs a CSR. Will you file it?
Regards, Peter
On 08/19/2018 11:24 AM, Tagir Valeev wrote:
Hello, Brian!
Of the three phases, teeing is the most important and least obvious, so I think something that includes that in the name is going to be helpful. Perhaps "teeingAndThen" is more evocative and not totally unwieldy.
Ok, sounds acceptable to me. Renamed pairing to teeingAndThen.
By the way looking into CollectorsTest.java I found some minor things to cleanup: 1. `.map(mapper::apply)` and `.flatMap(mapper::apply)` can be replaced with simple `.map(mapper)` and `.flatMap(mapper)` respectively
Does IntelliJ have an inspection for eliminating such locutions?
Sure, that's how I found them. Well, I took the liberty to fix these two things.
2. In many methods redundant `throws ReflectiveOperationException` is declared while exception is never thrown
For test code where a significant fraction of test cases are going to throw something, we often do this, since its easier to just uniformly tag such methods rather than thinking about which test methods actually throw the exception and which don't. So I think this is harmless (though cleaning it up is harmless too.)
I'm not thinking about this, because my IDE thinks for me :-) Ok, I'll leave them as is for now.
You may want to optimize the EnumSet mechanics for the case where neither collector has interesting characteristics.
Added a special case when reported characteristics for either of collectors are empty or IDENTITY_FINISH only. I think this should be a common case.
The updated webrev is posted here (along with Peter suggestion to rename finisher to merger): http://cr.openjdk.java.net/~tvaleev/webrev/8205461/r3/ Also copyright year is updated
With best regards, Tagir Valeev
-- Pozdrawiam, Tomasz Linkowski
tl;dr: "Duplexing" is an OK name, though I think `teeing` is less likely to be a name we regret, for reasons outlined below. The behavior of this Collector is: - duplicate the stream into two identical streams - collect the two streams with two collectors, yielding two results - merge the two results into a single result Obviously, a name like `duplexingAndCollectingAndThenMerging`, which, entirely accurate and explanatory, is "a bit" unwieldy. So the questions are: - how much can we drop and still be accurate - which parts are best to drop. When we pick names, we are not just trying to pick the best name for now, but we should imagine all the possible operations one might ever want to do in the future (names in the JDK are forever) and make a reasonable attempt to imagine whether this could cause confusion or regret in the future. To evaluate "duplexing" here (which seems the most important thing to keep), I'd ask: is there any other reasonable way to imagine a `duplexing` collect operation, now or in the future? One could imagine wanting an operation that takes a stream and produces two streams whose contents are that of the original stream. And "duplex" is a good name for that. But, it is not a Collector; it would be a stream transform, like concat. So that doesn't seem a conflict; a duplexing collector and a duplexing stream transform are sort of from "different namespaces." Can one imagine a "duplexing" Collector that doesn't do any collection? I cannot. Something that returns a pair of streams would not be a Collector, but something else. So dropping AndCollecting seems justified. What about "AndThenMerging"? The purpose of collect is to reduce the stream into a summary description. Can we imagine a duplexing operation that doesn't merge the two results, but instead just returns a tuple of the results? Yes, I can totally imagine this, especially once we have value types and records, which makes returning ad-hoc tuples cheaper (syntactically, heap-wise, CPU-wise.) So I think this is quite a reasonable possibility. But, I would have no problem with an overload that didn't take a merger and returned a tuple of the result, and was still called `duplexing`. So I'm fine with dropping all the extra AndThisAndThat. Finally, there's one other obvious direction we might extend this -- more than two collectors. There's no reason why we can only do two; we could take a (likely homogeneous) varargs of Collectors, and return a List of results -- which itself could then be streamed into another collector. This actually sounds pretty useful (though I'm not suggesting doing this right now.) And, I think it would be silly if this were not called the same thing as the two-collector version (just as it would be silly to have separate names for "concat two" and "concat n".) And, this is where I think "duplexing" runs out of gas -- duplex implies "two". Pedantic argue-for-the-sake-of-argument folks might observe that "tee" also has bilateral symmetry, but I don't think you could reasonably argue that a four-way "tee" is not less of an arity abuse than a four-way "duplex", and the plumbing industry would agree: https://www.amazon.com/Way-Tee-PVC-Fitting-Furniture/dp/B017AO2WCM So, for these reasons, I still think "teeing" has a better balance of being both evocative what it does and likely to stand the test of time. On 9/14/2018 1:09 PM, Stuart Marks wrote:
First, naming. I think "duplex" as the root word wins! Using "duplexing" to conform to many of other collectors is fine; so, "duplexing" is good.
Hello, Brian! Regarding more than two collectors. Some libraries definitely have analogs (e.g. [1]) which combine more than two collectors. To my opinion combining two collectors this way is an upper limit for readable code. Especially if you are going to collect to the list, you will have a list of untyped and unnamed results which positionally correspond to the collectors. If you have more than two collectors to combine, writing a separate accumulator class with accept/combine methods and creating a collector from the scratch would be much easier to read and support. A good example is IntSummaryStatistics and the corresponding summarizingInt collector. It could be emulated combining four collectors (maxBy, minBy, summingInt, counting), but having a dedicated class IntSummaryStatistics which does all four things explicitly is much better. It could be easily reused outside of Stream API context, it has well-named and well-typed accessor methods and it may contain other domain-specific methods like average(). Imagine if it were a List of four elements and you had to call summary.get(1) to get a maximum. So I think that supporting more than two collectors would encourage obscure programming. With best regards, Tagir Valeev [1] https://github.com/jOOQ/jOOL/blob/889d87c85ca57bafd4eddd78e0f7ae2804d2ee86/j... (don't ask me why!) On Sat, Sep 15, 2018 at 10:36 PM Brian Goetz <brian.goetz@oracle.com> wrote:
tl;dr: "Duplexing" is an OK name, though I think `teeing` is less likely to be a name we regret, for reasons outlined below.
The behavior of this Collector is: - duplicate the stream into two identical streams - collect the two streams with two collectors, yielding two results - merge the two results into a single result
Obviously, a name like `duplexingAndCollectingAndThenMerging`, which, entirely accurate and explanatory, is "a bit" unwieldy. So the questions are: - how much can we drop and still be accurate - which parts are best to drop.
When we pick names, we are not just trying to pick the best name for now, but we should imagine all the possible operations one might ever want to do in the future (names in the JDK are forever) and make a reasonable attempt to imagine whether this could cause confusion or regret in the future.
To evaluate "duplexing" here (which seems the most important thing to keep), I'd ask: is there any other reasonable way to imagine a `duplexing` collect operation, now or in the future?
One could imagine wanting an operation that takes a stream and produces two streams whose contents are that of the original stream. And "duplex" is a good name for that. But, it is not a Collector; it would be a stream transform, like concat. So that doesn't seem a conflict; a duplexing collector and a duplexing stream transform are sort of from "different namespaces."
Can one imagine a "duplexing" Collector that doesn't do any collection? I cannot. Something that returns a pair of streams would not be a Collector, but something else. So dropping AndCollecting seems justified.
What about "AndThenMerging"? The purpose of collect is to reduce the stream into a summary description. Can we imagine a duplexing operation that doesn't merge the two results, but instead just returns a tuple of the results? Yes, I can totally imagine this, especially once we have value types and records, which makes returning ad-hoc tuples cheaper (syntactically, heap-wise, CPU-wise.) So I think this is quite a reasonable possibility. But, I would have no problem with an overload that didn't take a merger and returned a tuple of the result, and was still called `duplexing`.
So I'm fine with dropping all the extra AndThisAndThat.
Finally, there's one other obvious direction we might extend this -- more than two collectors. There's no reason why we can only do two; we could take a (likely homogeneous) varargs of Collectors, and return a List of results -- which itself could then be streamed into another collector. This actually sounds pretty useful (though I'm not suggesting doing this right now.) And, I think it would be silly if this were not called the same thing as the two-collector version (just as it would be silly to have separate names for "concat two" and "concat n".)
And, this is where I think "duplexing" runs out of gas -- duplex implies "two". Pedantic argue-for-the-sake-of-argument folks might observe that "tee" also has bilateral symmetry, but I don't think you could reasonably argue that a four-way "tee" is not less of an arity abuse than a four-way "duplex", and the plumbing industry would agree:
https://www.amazon.com/Way-Tee-PVC-Fitting-Furniture/dp/B017AO2WCM
So, for these reasons, I still think "teeing" has a better balance of being both evocative what it does and likely to stand the test of time.
On 9/14/2018 1:09 PM, Stuart Marks wrote:
First, naming. I think "duplex" as the root word wins! Using "duplexing" to conform to many of other collectors is fine; so, "duplexing" is good.
I agree with Tagir that supporting more than two Collectors sounds risky. I especially agree that well-typed and well-named accessors are important. I use the quoted library (jOOL), but I: - either avoid all those tuple-based functions, - or I use only Tuple2/Tuple3 and I map the tuple to a dedicated result type immediately (with Collectors.collectingAndThen) so that I get well-named accessors. Note that if you need to combine more than two (generally, N) collectors, you can just call duplexing() N-1 times and use intermediate result holders, like I did for N=3 in [1]. It may be a bit of boilerplate, but the only *other* way to do it without tuples in a well-typed manner for N=3 would be to introduce a new functional interface like TriFunction<T,U,V,R> as a merger. That said, I found Brian's line of reasoning about dropping name parts very convincing, and I really liked the analogy to a 4-way tee in plumbing. Finally, here's a summary of the characteristics of the possible results types for n-ary *heterogeneous* Collector composition: - List<?> => well-typed: NO, well-named: NO - n-ary tuple => well-typed: YES, well-named: NO - custom result holder => well-typed: YES, well-named: YES Personally, I don't find n-ary *homogeneous* Collector composition that much useful, but if it were to be added, I agree List<T> would be the best result type. Regards, Tomasz Linkowski [1] https://stackoverflow.com/a/52211175/2032415 On Sun, Sep 16, 2018 at 11:23 AM, Tagir Valeev <amaembo@gmail.com> wrote:
Hello, Brian!
Regarding more than two collectors. Some libraries definitely have analogs (e.g. [1]) which combine more than two collectors. To my opinion combining two collectors this way is an upper limit for readable code. Especially if you are going to collect to the list, you will have a list of untyped and unnamed results which positionally correspond to the collectors. If you have more than two collectors to combine, writing a separate accumulator class with accept/combine methods and creating a collector from the scratch would be much easier to read and support. A good example is IntSummaryStatistics and the corresponding summarizingInt collector. It could be emulated combining four collectors (maxBy, minBy, summingInt, counting), but having a dedicated class IntSummaryStatistics which does all four things explicitly is much better. It could be easily reused outside of Stream API context, it has well-named and well-typed accessor methods and it may contain other domain-specific methods like average(). Imagine if it were a List of four elements and you had to call summary.get(1) to get a maximum. So I think that supporting more than two collectors would encourage obscure programming.
With best regards, Tagir Valeev
[1] https://github.com/jOOQ/jOOL/blob/889d87c85ca57bafd4eddd78e0f7ae 2804d2ee86/jOOL/src/main/java/org/jooq/lambda/tuple/Tuple.java#L1282 (don't ask me why!)
On Sat, Sep 15, 2018 at 10:36 PM Brian Goetz <brian.goetz@oracle.com> wrote:
tl;dr: "Duplexing" is an OK name, though I think `teeing` is less likely to be a name we regret, for reasons outlined below.
The behavior of this Collector is: - duplicate the stream into two identical streams - collect the two streams with two collectors, yielding two results - merge the two results into a single result
Obviously, a name like `duplexingAndCollectingAndThenMerging`, which, entirely accurate and explanatory, is "a bit" unwieldy. So the questions are: - how much can we drop and still be accurate - which parts are best to drop.
When we pick names, we are not just trying to pick the best name for now, but we should imagine all the possible operations one might ever want to do in the future (names in the JDK are forever) and make a reasonable attempt to imagine whether this could cause confusion or regret in the future.
To evaluate "duplexing" here (which seems the most important thing to keep), I'd ask: is there any other reasonable way to imagine a `duplexing` collect operation, now or in the future?
One could imagine wanting an operation that takes a stream and produces two streams whose contents are that of the original stream. And "duplex" is a good name for that. But, it is not a Collector; it would be a stream transform, like concat. So that doesn't seem a conflict; a duplexing collector and a duplexing stream transform are sort of from "different namespaces."
Can one imagine a "duplexing" Collector that doesn't do any collection? I cannot. Something that returns a pair of streams would not be a Collector, but something else. So dropping AndCollecting seems justified.
What about "AndThenMerging"? The purpose of collect is to reduce the stream into a summary description. Can we imagine a duplexing operation that doesn't merge the two results, but instead just returns a tuple of the results? Yes, I can totally imagine this, especially once we have value types and records, which makes returning ad-hoc tuples cheaper (syntactically, heap-wise, CPU-wise.) So I think this is quite a reasonable possibility. But, I would have no problem with an overload that didn't take a merger and returned a tuple of the result, and was still called `duplexing`.
So I'm fine with dropping all the extra AndThisAndThat.
Finally, there's one other obvious direction we might extend this -- more than two collectors. There's no reason why we can only do two; we could take a (likely homogeneous) varargs of Collectors, and return a List of results -- which itself could then be streamed into another collector. This actually sounds pretty useful (though I'm not suggesting doing this right now.) And, I think it would be silly if this were not called the same thing as the two-collector version (just as it would be silly to have separate names for "concat two" and "concat n".)
And, this is where I think "duplexing" runs out of gas -- duplex implies "two". Pedantic argue-for-the-sake-of-argument folks might observe that "tee" also has bilateral symmetry, but I don't think you could reasonably argue that a four-way "tee" is not less of an arity abuse than a four-way "duplex", and the plumbing industry would agree:
https://www.amazon.com/Way-Tee-PVC-Fitting-Furniture/dp/B017AO2WCM
So, for these reasons, I still think "teeing" has a better balance of being both evocative what it does and likely to stand the test of time.
On 9/14/2018 1:09 PM, Stuart Marks wrote:
First, naming. I think "duplex" as the root word wins! Using "duplexing" to conform to many of other collectors is fine; so, "duplexing" is good.
-- Pozdrawiam, Tomasz Linkowski
The example of ISS is a good one. It is analogous to the question of "when is it right to write a class, and when it is right to write a function?" And the answer is, of course, "it depends." ISS was an obvious grouping, but even there there was significant disagreement during its design about what it should support and not (especially with regard to sum-of-squares calculations), and extra work done to make it extensible. If you're writing from scratch, you might well consider writing something like ISS. But ... the whole motivation for having "teeing" _at all_ is that you have some existing collectors you want to reuse! It seems a little silly to claim "I definitely will want to reuse two collectors, so much so that we need a new method, but can't imagine ever wanting to reuse three." So, while I am not saying we have to solve the N-way problem now, but I think we'd be silly to pick a naming scheme that falls apart when we try to go past two. So I'm still at "teeing". It works for two, and it works for larger numbers as well. On 9/16/2018 5:23 AM, Tagir Valeev wrote:
Hello, Brian!
Regarding more than two collectors. Some libraries definitely have analogs (e.g. [1]) which combine more than two collectors. To my opinion combining two collectors this way is an upper limit for readable code. Especially if you are going to collect to the list, you will have a list of untyped and unnamed results which positionally correspond to the collectors. If you have more than two collectors to combine, writing a separate accumulator class with accept/combine methods and creating a collector from the scratch would be much easier to read and support. A good example is IntSummaryStatistics and the corresponding summarizingInt collector. It could be emulated combining four collectors (maxBy, minBy, summingInt, counting), but having a dedicated class IntSummaryStatistics which does all four things explicitly is much better. It could be easily reused outside of Stream API context, it has well-named and well-typed accessor methods and it may contain other domain-specific methods like average(). Imagine if it were a List of four elements and you had to call summary.get(1) to get a maximum. So I think that supporting more than two collectors would encourage obscure programming.
With best regards, Tagir Valeev
[1] https://github.com/jOOQ/jOOL/blob/889d87c85ca57bafd4eddd78e0f7ae2804d2ee86/j... (don't ask me why!)
On Sat, Sep 15, 2018 at 10:36 PM Brian Goetz <brian.goetz@oracle.com> wrote:
tl;dr: "Duplexing" is an OK name, though I think `teeing` is less likely to be a name we regret, for reasons outlined below.
The behavior of this Collector is: - duplicate the stream into two identical streams - collect the two streams with two collectors, yielding two results - merge the two results into a single result
Obviously, a name like `duplexingAndCollectingAndThenMerging`, which, entirely accurate and explanatory, is "a bit" unwieldy. So the questions are: - how much can we drop and still be accurate - which parts are best to drop.
When we pick names, we are not just trying to pick the best name for now, but we should imagine all the possible operations one might ever want to do in the future (names in the JDK are forever) and make a reasonable attempt to imagine whether this could cause confusion or regret in the future.
To evaluate "duplexing" here (which seems the most important thing to keep), I'd ask: is there any other reasonable way to imagine a `duplexing` collect operation, now or in the future?
One could imagine wanting an operation that takes a stream and produces two streams whose contents are that of the original stream. And "duplex" is a good name for that. But, it is not a Collector; it would be a stream transform, like concat. So that doesn't seem a conflict; a duplexing collector and a duplexing stream transform are sort of from "different namespaces."
Can one imagine a "duplexing" Collector that doesn't do any collection? I cannot. Something that returns a pair of streams would not be a Collector, but something else. So dropping AndCollecting seems justified.
What about "AndThenMerging"? The purpose of collect is to reduce the stream into a summary description. Can we imagine a duplexing operation that doesn't merge the two results, but instead just returns a tuple of the results? Yes, I can totally imagine this, especially once we have value types and records, which makes returning ad-hoc tuples cheaper (syntactically, heap-wise, CPU-wise.) So I think this is quite a reasonable possibility. But, I would have no problem with an overload that didn't take a merger and returned a tuple of the result, and was still called `duplexing`.
So I'm fine with dropping all the extra AndThisAndThat.
Finally, there's one other obvious direction we might extend this -- more than two collectors. There's no reason why we can only do two; we could take a (likely homogeneous) varargs of Collectors, and return a List of results -- which itself could then be streamed into another collector. This actually sounds pretty useful (though I'm not suggesting doing this right now.) And, I think it would be silly if this were not called the same thing as the two-collector version (just as it would be silly to have separate names for "concat two" and "concat n".)
And, this is where I think "duplexing" runs out of gas -- duplex implies "two". Pedantic argue-for-the-sake-of-argument folks might observe that "tee" also has bilateral symmetry, but I don't think you could reasonably argue that a four-way "tee" is not less of an arity abuse than a four-way "duplex", and the plumbing industry would agree:
https://www.amazon.com/Way-Tee-PVC-Fitting-Furniture/dp/B017AO2WCM
So, for these reasons, I still think "teeing" has a better balance of being both evocative what it does and likely to stand the test of time.
On 9/14/2018 1:09 PM, Stuart Marks wrote:
First, naming. I think "duplex" as the root word wins! Using "duplexing" to conform to many of other collectors is fine; so, "duplexing" is good.
Ok, teeing. Webrev updated: http://cr.openjdk.java.net/~tvaleev/webrev/8205461/r6/ CSR updated accordingly: https://bugs.openjdk.java.net/browse/JDK-8209685 With best regards, Tagir Valeev. On Fri, Sep 21, 2018 at 8:26 PM Brian Goetz <brian.goetz@oracle.com> wrote:
The example of ISS is a good one. It is analogous to the question of "when is it right to write a class, and when it is right to write a function?" And the answer is, of course, "it depends." ISS was an obvious grouping, but even there there was significant disagreement during its design about what it should support and not (especially with regard to sum-of-squares calculations), and extra work done to make it extensible. If you're writing from scratch, you might well consider writing something like ISS.
But ... the whole motivation for having "teeing" _at all_ is that you have some existing collectors you want to reuse! It seems a little silly to claim "I definitely will want to reuse two collectors, so much so that we need a new method, but can't imagine ever wanting to reuse three."
So, while I am not saying we have to solve the N-way problem now, but I think we'd be silly to pick a naming scheme that falls apart when we try to go past two. So I'm still at "teeing". It works for two, and it works for larger numbers as well.
On 9/16/2018 5:23 AM, Tagir Valeev wrote:
Hello, Brian!
Regarding more than two collectors. Some libraries definitely have analogs (e.g. [1]) which combine more than two collectors. To my opinion combining two collectors this way is an upper limit for readable code. Especially if you are going to collect to the list, you will have a list of untyped and unnamed results which positionally correspond to the collectors. If you have more than two collectors to combine, writing a separate accumulator class with accept/combine methods and creating a collector from the scratch would be much easier to read and support. A good example is IntSummaryStatistics and the corresponding summarizingInt collector. It could be emulated combining four collectors (maxBy, minBy, summingInt, counting), but having a dedicated class IntSummaryStatistics which does all four things explicitly is much better. It could be easily reused outside of Stream API context, it has well-named and well-typed accessor methods and it may contain other domain-specific methods like average(). Imagine if it were a List of four elements and you had to call summary.get(1) to get a maximum. So I think that supporting more than two collectors would encourage obscure programming.
With best regards, Tagir Valeev
[1] https://github.com/jOOQ/jOOL/blob/889d87c85ca57bafd4eddd78e0f7ae2804d2ee86/j... (don't ask me why!)
On Sat, Sep 15, 2018 at 10:36 PM Brian Goetz <brian.goetz@oracle.com> wrote:
tl;dr: "Duplexing" is an OK name, though I think `teeing` is less likely to be a name we regret, for reasons outlined below.
The behavior of this Collector is: - duplicate the stream into two identical streams - collect the two streams with two collectors, yielding two results - merge the two results into a single result
Obviously, a name like `duplexingAndCollectingAndThenMerging`, which, entirely accurate and explanatory, is "a bit" unwieldy. So the questions are: - how much can we drop and still be accurate - which parts are best to drop.
When we pick names, we are not just trying to pick the best name for now, but we should imagine all the possible operations one might ever want to do in the future (names in the JDK are forever) and make a reasonable attempt to imagine whether this could cause confusion or regret in the future.
To evaluate "duplexing" here (which seems the most important thing to keep), I'd ask: is there any other reasonable way to imagine a `duplexing` collect operation, now or in the future?
One could imagine wanting an operation that takes a stream and produces two streams whose contents are that of the original stream. And "duplex" is a good name for that. But, it is not a Collector; it would be a stream transform, like concat. So that doesn't seem a conflict; a duplexing collector and a duplexing stream transform are sort of from "different namespaces."
Can one imagine a "duplexing" Collector that doesn't do any collection? I cannot. Something that returns a pair of streams would not be a Collector, but something else. So dropping AndCollecting seems justified.
What about "AndThenMerging"? The purpose of collect is to reduce the stream into a summary description. Can we imagine a duplexing operation that doesn't merge the two results, but instead just returns a tuple of the results? Yes, I can totally imagine this, especially once we have value types and records, which makes returning ad-hoc tuples cheaper (syntactically, heap-wise, CPU-wise.) So I think this is quite a reasonable possibility. But, I would have no problem with an overload that didn't take a merger and returned a tuple of the result, and was still called `duplexing`.
So I'm fine with dropping all the extra AndThisAndThat.
Finally, there's one other obvious direction we might extend this -- more than two collectors. There's no reason why we can only do two; we could take a (likely homogeneous) varargs of Collectors, and return a List of results -- which itself could then be streamed into another collector. This actually sounds pretty useful (though I'm not suggesting doing this right now.) And, I think it would be silly if this were not called the same thing as the two-collector version (just as it would be silly to have separate names for "concat two" and "concat n".)
And, this is where I think "duplexing" runs out of gas -- duplex implies "two". Pedantic argue-for-the-sake-of-argument folks might observe that "tee" also has bilateral symmetry, but I don't think you could reasonably argue that a four-way "tee" is not less of an arity abuse than a four-way "duplex", and the plumbing industry would agree:
https://www.amazon.com/Way-Tee-PVC-Fitting-Furniture/dp/B017AO2WCM
So, for these reasons, I still think "teeing" has a better balance of being both evocative what it does and likely to stand the test of time.
On 9/14/2018 1:09 PM, Stuart Marks wrote:
First, naming. I think "duplex" as the root word wins! Using "duplexing" to conform to many of other collectors is fine; so, "duplexing" is good.
Webrev looks good. In the CSR, I updated the webrev link to point to the latest, I set the fix-version to 12, and I set the scope to SE. I've marked the CSR reviewed. The next thing is for you to mark the CSR as Finalized. Thanks, s'marks On 9/24/18 3:39 AM, Tagir Valeev wrote:
Ok, teeing. Webrev updated: http://cr.openjdk.java.net/~tvaleev/webrev/8205461/r6/ CSR updated accordingly: https://bugs.openjdk.java.net/browse/JDK-8209685
With best regards, Tagir Valeev. On Fri, Sep 21, 2018 at 8:26 PM Brian Goetz <brian.goetz@oracle.com> wrote:
The example of ISS is a good one. It is analogous to the question of "when is it right to write a class, and when it is right to write a function?" And the answer is, of course, "it depends." ISS was an obvious grouping, but even there there was significant disagreement during its design about what it should support and not (especially with regard to sum-of-squares calculations), and extra work done to make it extensible. If you're writing from scratch, you might well consider writing something like ISS.
But ... the whole motivation for having "teeing" _at all_ is that you have some existing collectors you want to reuse! It seems a little silly to claim "I definitely will want to reuse two collectors, so much so that we need a new method, but can't imagine ever wanting to reuse three."
So, while I am not saying we have to solve the N-way problem now, but I think we'd be silly to pick a naming scheme that falls apart when we try to go past two. So I'm still at "teeing". It works for two, and it works for larger numbers as well.
On 9/16/2018 5:23 AM, Tagir Valeev wrote:
Hello, Brian!
Regarding more than two collectors. Some libraries definitely have analogs (e.g. [1]) which combine more than two collectors. To my opinion combining two collectors this way is an upper limit for readable code. Especially if you are going to collect to the list, you will have a list of untyped and unnamed results which positionally correspond to the collectors. If you have more than two collectors to combine, writing a separate accumulator class with accept/combine methods and creating a collector from the scratch would be much easier to read and support. A good example is IntSummaryStatistics and the corresponding summarizingInt collector. It could be emulated combining four collectors (maxBy, minBy, summingInt, counting), but having a dedicated class IntSummaryStatistics which does all four things explicitly is much better. It could be easily reused outside of Stream API context, it has well-named and well-typed accessor methods and it may contain other domain-specific methods like average(). Imagine if it were a List of four elements and you had to call summary.get(1) to get a maximum. So I think that supporting more than two collectors would encourage obscure programming.
With best regards, Tagir Valeev
[1] https://github.com/jOOQ/jOOL/blob/889d87c85ca57bafd4eddd78e0f7ae2804d2ee86/j... (don't ask me why!)
On Sat, Sep 15, 2018 at 10:36 PM Brian Goetz <brian.goetz@oracle.com> wrote:
tl;dr: "Duplexing" is an OK name, though I think `teeing` is less likely to be a name we regret, for reasons outlined below.
The behavior of this Collector is: - duplicate the stream into two identical streams - collect the two streams with two collectors, yielding two results - merge the two results into a single result
Obviously, a name like `duplexingAndCollectingAndThenMerging`, which, entirely accurate and explanatory, is "a bit" unwieldy. So the questions are: - how much can we drop and still be accurate - which parts are best to drop.
When we pick names, we are not just trying to pick the best name for now, but we should imagine all the possible operations one might ever want to do in the future (names in the JDK are forever) and make a reasonable attempt to imagine whether this could cause confusion or regret in the future.
To evaluate "duplexing" here (which seems the most important thing to keep), I'd ask: is there any other reasonable way to imagine a `duplexing` collect operation, now or in the future?
One could imagine wanting an operation that takes a stream and produces two streams whose contents are that of the original stream. And "duplex" is a good name for that. But, it is not a Collector; it would be a stream transform, like concat. So that doesn't seem a conflict; a duplexing collector and a duplexing stream transform are sort of from "different namespaces."
Can one imagine a "duplexing" Collector that doesn't do any collection? I cannot. Something that returns a pair of streams would not be a Collector, but something else. So dropping AndCollecting seems justified.
What about "AndThenMerging"? The purpose of collect is to reduce the stream into a summary description. Can we imagine a duplexing operation that doesn't merge the two results, but instead just returns a tuple of the results? Yes, I can totally imagine this, especially once we have value types and records, which makes returning ad-hoc tuples cheaper (syntactically, heap-wise, CPU-wise.) So I think this is quite a reasonable possibility. But, I would have no problem with an overload that didn't take a merger and returned a tuple of the result, and was still called `duplexing`.
So I'm fine with dropping all the extra AndThisAndThat.
Finally, there's one other obvious direction we might extend this -- more than two collectors. There's no reason why we can only do two; we could take a (likely homogeneous) varargs of Collectors, and return a List of results -- which itself could then be streamed into another collector. This actually sounds pretty useful (though I'm not suggesting doing this right now.) And, I think it would be silly if this were not called the same thing as the two-collector version (just as it would be silly to have separate names for "concat two" and "concat n".)
And, this is where I think "duplexing" runs out of gas -- duplex implies "two". Pedantic argue-for-the-sake-of-argument folks might observe that "tee" also has bilateral symmetry, but I don't think you could reasonably argue that a four-way "tee" is not less of an arity abuse than a four-way "duplex", and the plumbing industry would agree:
https://www.amazon.com/Way-Tee-PVC-Fitting-Furniture/dp/B017AO2WCM
So, for these reasons, I still think "teeing" has a better balance of being both evocative what it does and likely to stand the test of time.
On 9/14/2018 1:09 PM, Stuart Marks wrote:
First, naming. I think "duplex" as the root word wins! Using "duplexing" to conform to many of other collectors is fine; so, "duplexing" is good.
In the CSR, I updated the webrev link to point to the latest, I set the fix-version to 12, and I set the scope to SE. I've marked the CSR reviewed.
The next thing is for you to mark the CSR as Finalized.
Done, thanks! With best regards, Tagir Valeev
Thanks,
s'marks
On 9/24/18 3:39 AM, Tagir Valeev wrote:
Ok, teeing. Webrev updated: http://cr.openjdk.java.net/~tvaleev/webrev/8205461/r6/ CSR updated accordingly: https://bugs.openjdk.java.net/browse/JDK-8209685
With best regards, Tagir Valeev. On Fri, Sep 21, 2018 at 8:26 PM Brian Goetz <brian.goetz@oracle.com> wrote:
The example of ISS is a good one. It is analogous to the question of "when is it right to write a class, and when it is right to write a function?" And the answer is, of course, "it depends." ISS was an obvious grouping, but even there there was significant disagreement during its design about what it should support and not (especially with regard to sum-of-squares calculations), and extra work done to make it extensible. If you're writing from scratch, you might well consider writing something like ISS.
But ... the whole motivation for having "teeing" _at all_ is that you have some existing collectors you want to reuse! It seems a little silly to claim "I definitely will want to reuse two collectors, so much so that we need a new method, but can't imagine ever wanting to reuse three."
So, while I am not saying we have to solve the N-way problem now, but I think we'd be silly to pick a naming scheme that falls apart when we try to go past two. So I'm still at "teeing". It works for two, and it works for larger numbers as well.
On 9/16/2018 5:23 AM, Tagir Valeev wrote:
Hello, Brian!
Regarding more than two collectors. Some libraries definitely have analogs (e.g. [1]) which combine more than two collectors. To my opinion combining two collectors this way is an upper limit for readable code. Especially if you are going to collect to the list, you will have a list of untyped and unnamed results which positionally correspond to the collectors. If you have more than two collectors to combine, writing a separate accumulator class with accept/combine methods and creating a collector from the scratch would be much easier to read and support. A good example is IntSummaryStatistics and the corresponding summarizingInt collector. It could be emulated combining four collectors (maxBy, minBy, summingInt, counting), but having a dedicated class IntSummaryStatistics which does all four things explicitly is much better. It could be easily reused outside of Stream API context, it has well-named and well-typed accessor methods and it may contain other domain-specific methods like average(). Imagine if it were a List of four elements and you had to call summary.get(1) to get a maximum. So I think that supporting more than two collectors would encourage obscure programming.
With best regards, Tagir Valeev
[1] https://github.com/jOOQ/jOOL/blob/889d87c85ca57bafd4eddd78e0f7ae2804d2ee86/j... (don't ask me why!)
On Sat, Sep 15, 2018 at 10:36 PM Brian Goetz <brian.goetz@oracle.com> wrote:
tl;dr: "Duplexing" is an OK name, though I think `teeing` is less likely to be a name we regret, for reasons outlined below.
The behavior of this Collector is: - duplicate the stream into two identical streams - collect the two streams with two collectors, yielding two results - merge the two results into a single result
Obviously, a name like `duplexingAndCollectingAndThenMerging`, which, entirely accurate and explanatory, is "a bit" unwieldy. So the questions are: - how much can we drop and still be accurate - which parts are best to drop.
When we pick names, we are not just trying to pick the best name for now, but we should imagine all the possible operations one might ever want to do in the future (names in the JDK are forever) and make a reasonable attempt to imagine whether this could cause confusion or regret in the future.
To evaluate "duplexing" here (which seems the most important thing to keep), I'd ask: is there any other reasonable way to imagine a `duplexing` collect operation, now or in the future?
One could imagine wanting an operation that takes a stream and produces two streams whose contents are that of the original stream. And "duplex" is a good name for that. But, it is not a Collector; it would be a stream transform, like concat. So that doesn't seem a conflict; a duplexing collector and a duplexing stream transform are sort of from "different namespaces."
Can one imagine a "duplexing" Collector that doesn't do any collection? I cannot. Something that returns a pair of streams would not be a Collector, but something else. So dropping AndCollecting seems justified.
What about "AndThenMerging"? The purpose of collect is to reduce the stream into a summary description. Can we imagine a duplexing operation that doesn't merge the two results, but instead just returns a tuple of the results? Yes, I can totally imagine this, especially once we have value types and records, which makes returning ad-hoc tuples cheaper (syntactically, heap-wise, CPU-wise.) So I think this is quite a reasonable possibility. But, I would have no problem with an overload that didn't take a merger and returned a tuple of the result, and was still called `duplexing`.
So I'm fine with dropping all the extra AndThisAndThat.
Finally, there's one other obvious direction we might extend this -- more than two collectors. There's no reason why we can only do two; we could take a (likely homogeneous) varargs of Collectors, and return a List of results -- which itself could then be streamed into another collector. This actually sounds pretty useful (though I'm not suggesting doing this right now.) And, I think it would be silly if this were not called the same thing as the two-collector version (just as it would be silly to have separate names for "concat two" and "concat n".)
And, this is where I think "duplexing" runs out of gas -- duplex implies "two". Pedantic argue-for-the-sake-of-argument folks might observe that "tee" also has bilateral symmetry, but I don't think you could reasonably argue that a four-way "tee" is not less of an arity abuse than a four-way "duplex", and the plumbing industry would agree:
https://www.amazon.com/Way-Tee-PVC-Fitting-Furniture/dp/B017AO2WCM
So, for these reasons, I still think "teeing" has a better balance of being both evocative what it does and likely to stand the test of time.
On 9/14/2018 1:09 PM, Stuart Marks wrote:
First, naming. I think "duplex" as the root word wins! Using "duplexing" to conform to many of other collectors is fine; so, "duplexing" is good.
Hi, I've already sent an email to this list regarding this issue, but I'm not sure if it finally made its way. So here I'm sending something again w.r.t the name... duplexing might be OK, but collectingBothAndThen is hard to beat. Arguments: - Existing collectingAndThen method is the simpler version of this guy, why introducing other verbs? - While being short (for all the things this operation does), it is yet an extremely descriptive name; this method does exactly what its name says, i.e. it really collects the stream using two collectors and then applies a function to transform both collected results into the final result - It reads naturally when used along with other existing methods of the Collectors API, especially when used as a downstream collector, i.e. filtering( n -> n > 0, collectingBothAndThen( summingDouble(e -> e.getDouble()), counting(), (sum, count) -> sum / count)); Just my 2 cents. Thanks, fps.- El 14 set. 2018 2:44 p.m., "Tomasz Linkowski" <t.linkowski@gmail.com> escribió: Hi, I'd like to remind three little things to which there was no response (AFAIK): 1) Brian Goetz' suggestion of changing "? extends R" into "R": - http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-August/054947.html 2) Stuart Marks' suggestion about renaming "c1" and "c2" to "downstream1" and "downstream2": - http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-August/054949.html 3) my suggestion about renaming "merger" to "biFinisher" because "merger" means BinaryOperator everywhere else: - http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-September/055235.h... As to the name, I still think a name related to a well-understood concept (like the composite pattern [1]) would be better. Note that "composite" is also a verb [2], but "Collectors.compositing" looks a bit strange. "Collectors.composing" seems much better to me, but - as far as I understand - there was some concern that the users could misunderstand it as element-wise composition, is that right? Regards, Tomasz Linkowski [1] https://en.wikipedia.org/wiki/Composite_pattern [2] https://en.wiktionary.org/wiki/composite#Verb On Fri, Sep 14, 2018 at 11:23 AM, Peter Levart <peter.levart@gmail.com> wrote:
Hi Tagir,
I like duplexing more than teeingAndThen. If consensus can be established about the name, I think you will then want to update the CSR draft to reflect new name. Then we'll kindly ask Stuart if he has any more advice before submitting the CSR...
Regards, Peter
On 09/14/2018 10:41 AM, Tagir Valeev wrote:
Hello, Stuart and Peter!
Thank you for valuable comments. I updated the webrev: http://cr.openjdk.java.net/~tvaleev/webrev/8205461/r4/
1. I renamed "teeingAndThen" to "duplexing". Brian insisted that "-ing" suffix shall present and I agree. Hopefully it's final name. 2. I updated the spec as Stuart suggested.
No changes in implementation since r3 revision. Please check.
With best regards, Tagir Valeev.
On Tue, Aug 21, 2018 at 12:43 PM Stuart Marks <stuart.marks@oracle.com> wrote:
Hi Tagir,
Thanks for working on this. This looks really cool! And thanks Peter for agreeing to sponsor it.
I can help out with the CSR. My first bit of advice about the CSR process is to hold off until the specification is complete. :-)
I think the intent of the API is fine, but I think some details of the returned collector need to be ironed out first.
1. The spec doesn't say what the returned collector's supplier, accumulator, combiner, and finisher do. On the one hand, we don't necessarily want to describe the actual implementation. On the other hand, we need to specify how the thing actually behaves. One can certainly deduce the intended behavior from the description, but this really needs to be specified, and it mustn't rely on the reader having to derive the required behaviors. Since the actual implementation is fairly simple, the spec might end up being rather close to the implementation, but that might be unavoidable.
I'm envisioning something like this:
- supplier: creates a result container that contains result containers obtained by calling each collector's supplier
- accumulator: calls each collector's accumulator with its result container and the input element
... and similar for the combiner and finisher functions.
2. Characteristics
- UNORDERED: should the returned collector be UNORDERED if *either* of the provided collectors is UNORDERED? (Current draft says *both*.)
- CONCURRENT: current draft seems correct
- IDENTITY_FINISH: clearly not present; perhaps this should be specified
3. Parameter naming
The collector parameters are referred to as "specified collectors" or "supplied collectors". Other "higher-order" collectors refer to their underlying collectors as "downstream" collectors. I think it would be useful to work the "downstream" concept into this collector. This would enable the opening summary sentence of the doc to be something like, "Returns a collector that is a composite of two downstream collectors" or some such. (But see naming below.)
4. Naming
Sigh, naming is hard, and I know there was a fair amount of discussion in the previous thread and earlier in this one, but it seems like there's still some dissatisfaction with the name. (And I'm not particularly thrilled with teeingAndThen myself.) In a few minutes I've managed to come up with a few more names that (mostly) don't seem to have been proposed before, and so here they are for your consideration:
- compound - composite - conjoined - bonded - fused - duplex
Discussion:
A "composite" evokes function composition; this might be good, though it might be odd in that collectors can't be composed in the same way that functions are.
"Compound" might be a useful alternative. In chemistry, two substances are combined (or bonded, or fused) to form a single substance, which is a compound.
"Conjoin" seems to adequately describe the structure of the two collectors, but it lacks somewhat the connotation of unifying them.
In an earlier discussion, Brian had pushed back on names related to split/fork/merge/join since those are currently in use in streams regarding splitting of input elements and merging of results. In describing how the current proposal differs, he said that elements are "multiplexed" to the different collectors. Since we're doing this with two collectors, how about "duplex"? (I note that Jacob Glickman also had suggested "duplex".)
s'marks
On 8/20/18 1:48 AM, Tagir Valeev wrote:
Hello!
A CSR is created: https://bugs.openjdk.java.net/browse/JDK-8209685 (this is my first CSR, hopefully I did it correctly)
With best regards, Tagir Valeev. On Mon, Aug 20, 2018 at 2:06 PM Peter Levart <peter.levart@gmail.com> wrote:
Hi Tagir,
I think this looks very good. It just needs a CSR. Will you file it?
Regards, Peter
On 08/19/2018 11:24 AM, Tagir Valeev wrote:
Hello, Brian!
Of the three phases, teeing is the most important and least obvious, so I think something that includes that in the name is going to be helpful. Perhaps "teeingAndThen" is more evocative and not totally unwieldy.
Ok, sounds acceptable to me. Renamed pairing to teeingAndThen.
By the way looking into CollectorsTest.java I found some minor things to cleanup: 1. `.map(mapper::apply)` and `.flatMap(mapper::apply)` can be replaced with simple `.map(mapper)` and `.flatMap(mapper)` respectively
Does IntelliJ have an inspection for eliminating such locutions?
Sure, that's how I found them. Well, I took the liberty to fix these two things.
2. In many methods redundant `throws ReflectiveOperationException` is declared while exception is never thrown
For test code where a significant fraction of test cases are going to throw something, we often do this, since its easier to just uniformly tag such methods rather than thinking about which test methods actually throw the exception and which don't. So I think this is harmless (though cleaning it up is harmless too.)
I'm not thinking about this, because my IDE thinks for me :-) Ok, I'll leave them as is for now.
You may want to optimize the EnumSet mechanics for the case where neither collector has interesting characteristics.
Added a special case when reported characteristics for either of collectors are empty or IDENTITY_FINISH only. I think this should be a common case.
The updated webrev is posted here (along with Peter suggestion to rename finisher to merger): http://cr.openjdk.java.net/~tvaleev/webrev/8205461/r3/ Also copyright year is updated
With best regards, Tagir Valeev
-- Pozdrawiam, Tomasz Linkowski
Hi all, Stuart, thanks for your response and explanations, I really appreciate it! Good point about function composition - didn't cross my mind, but it could be confusing indeed. As for "merger" - good point about "biFinisher", but I'm still unconvinced about "merger", because usually, you won't *merge* the two results (like summing ints or concatenating strings) - instead, you'll usually *compose* them into another object (yeah, and a moment before I said "composition" could be confusing). But how about the name "duplexer" in the meaning of "to make into a duplex"? As for "duplex" not being a verb - according to some dictionaries, "duplex" IS a verb: - https://en.wiktionary.org/wiki/duplex#Verb - https://www.dictionary.com/browse/duplex - https://www.merriam-webster.com/dictionary/duplex That said, I certainly wouldn't object against the word "composite" finding its way into the JavaDoc. Federico, since I proposed "collectingBothAndThen", I obviously agree with you. To be honest, though, the major drawback of this name would be its length (both wordwise and letterwise). On the other hand, it seems that "collectingBothAndThen" could be much easier to understand than "duplexing", especially for non-natives. Regards, Tomasz Linkowski On Fri, Sep 14, 2018 at 9:11 PM, Federico Peralta Schaffner < federico.peralta@gmail.com> wrote:
Hi,
I've already sent an email to this list regarding this issue, but I'm not sure if it finally made its way. So here I'm sending something again w.r.t the name...
duplexing might be OK, but
collectingBothAndThen
is hard to beat.
Arguments:
- Existing collectingAndThen method is the simpler version of this guy, why introducing other verbs?
- While being short (for all the things this operation does), it is yet an extremely descriptive name; this method does exactly what its name says, i.e. it really collects the stream using two collectors and then applies a function to transform both collected results into the final result
- It reads naturally when used along with other existing methods of the Collectors API, especially when used as a downstream collector, i.e.
filtering( n -> n > 0, collectingBothAndThen( summingDouble(e -> e.getDouble()), counting(), (sum, count) -> sum / count));
Just my 2 cents.
Thanks, fps.-
El 14 set. 2018 2:44 p.m., "Tomasz Linkowski" <t.linkowski@gmail.com> escribió:
Hi,
I'd like to remind three little things to which there was no response (AFAIK):
1) Brian Goetz' suggestion of changing "? extends R" into "R": - http://mail.openjdk.java.net/pipermail/core-libs-dev/2018- August/054947.html
2) Stuart Marks' suggestion about renaming "c1" and "c2" to "downstream1" and "downstream2": - http://mail.openjdk.java.net/pipermail/core-libs-dev/2018- August/054949.html
3) my suggestion about renaming "merger" to "biFinisher" because "merger" means BinaryOperator everywhere else: - http://mail.openjdk.java.net/pipermail/core-libs-dev/2018- September/055235.html
As to the name, I still think a name related to a well-understood concept (like the composite pattern [1]) would be better. Note that "composite" is also a verb [2], but "Collectors.compositing" looks a bit strange. "Collectors.composing" seems much better to me, but - as far as I understand - there was some concern that the users could misunderstand it as element-wise composition, is that right?
Regards, Tomasz Linkowski
[1] https://en.wikipedia.org/wiki/Composite_pattern [2] https://en.wiktionary.org/wiki/composite#Verb
On Fri, Sep 14, 2018 at 11:23 AM, Peter Levart <peter.levart@gmail.com>
wrote:
Hi Tagir,
I like duplexing more than teeingAndThen. If consensus can be established about the name, I think you will then want to update the CSR draft to reflect new name. Then we'll kindly ask Stuart if he has any more advice before submitting the CSR...
Regards, Peter
On 09/14/2018 10:41 AM, Tagir Valeev wrote:
Hello, Stuart and Peter!
Thank you for valuable comments. I updated the webrev: http://cr.openjdk.java.net/~tvaleev/webrev/8205461/r4/
1. I renamed "teeingAndThen" to "duplexing". Brian insisted that "-ing" suffix shall present and I agree. Hopefully it's final name. 2. I updated the spec as Stuart suggested.
No changes in implementation since r3 revision. Please check.
With best regards, Tagir Valeev.
On Tue, Aug 21, 2018 at 12:43 PM Stuart Marks <stuart.marks@oracle.com> wrote:
Hi Tagir,
Thanks for working on this. This looks really cool! And thanks Peter for agreeing to sponsor it.
I can help out with the CSR. My first bit of advice about the CSR process is to hold off until the specification is complete. :-)
I think the intent of the API is fine, but I think some details of the returned collector need to be ironed out first.
1. The spec doesn't say what the returned collector's supplier, accumulator, combiner, and finisher do. On the one hand, we don't necessarily want to describe the actual implementation. On the other hand, we need to specify how the thing actually behaves. One can certainly deduce the intended behavior from the description, but this really needs to be specified, and it mustn't rely on the reader having to derive the required behaviors. Since the actual implementation is fairly simple, the spec might end up being rather close to the implementation, but that might be unavoidable.
I'm envisioning something like this:
- supplier: creates a result container that contains result containers obtained by calling each collector's supplier
- accumulator: calls each collector's accumulator with its result container and the input element
... and similar for the combiner and finisher functions.
2. Characteristics
- UNORDERED: should the returned collector be UNORDERED if *either* of the provided collectors is UNORDERED? (Current draft says *both*.)
- CONCURRENT: current draft seems correct
- IDENTITY_FINISH: clearly not present; perhaps this should be specified
3. Parameter naming
The collector parameters are referred to as "specified collectors" or "supplied collectors". Other "higher-order" collectors refer to their underlying collectors as "downstream" collectors. I think it would be useful to work the "downstream" concept into this collector. This would enable the opening summary sentence of the doc to be something like, "Returns a collector that is a composite of two downstream collectors" or some such. (But see naming below.)
4. Naming
Sigh, naming is hard, and I know there was a fair amount of discussion in the previous thread and earlier in this one, but it seems like there's still some dissatisfaction with the name. (And I'm not particularly thrilled with teeingAndThen myself.) In a few minutes I've managed to come up with a few more names that (mostly) don't seem to have been proposed before, and so here they are for your consideration:
- compound - composite - conjoined - bonded - fused - duplex
Discussion:
A "composite" evokes function composition; this might be good, though it might be odd in that collectors can't be composed in the same way that functions are.
"Compound" might be a useful alternative. In chemistry, two substances are combined (or bonded, or fused) to form a single substance, which is a compound.
"Conjoin" seems to adequately describe the structure of the two collectors, but it lacks somewhat the connotation of unifying them.
In an earlier discussion, Brian had pushed back on names related to split/fork/merge/join since those are currently in use in streams regarding splitting of input elements and merging of results. In describing how the current proposal differs, he said that elements are "multiplexed" to the different collectors. Since we're doing this with two collectors, how about "duplex"? (I note that Jacob Glickman also had suggested "duplex".)
s'marks
On 8/20/18 1:48 AM, Tagir Valeev wrote:
Hello!
A CSR is created: https://bugs.openjdk.java.net/browse/JDK-8209685 (this is my first CSR, hopefully I did it correctly)
With best regards, Tagir Valeev. On Mon, Aug 20, 2018 at 2:06 PM Peter Levart <peter.levart@gmail.com> wrote:
Hi Tagir,
I think this looks very good. It just needs a CSR. Will you file it?
Regards, Peter
On 08/19/2018 11:24 AM, Tagir Valeev wrote:
Hello, Brian!
Of the three phases, teeing is the most important and least obvious, so I think something that includes that in the name is going to be helpful. Perhaps "teeingAndThen" is more evocative and not totally unwieldy.
Ok, sounds acceptable to me. Renamed pairing to teeingAndThen.
By the way looking into CollectorsTest.java I found some minor things to cleanup: 1. `.map(mapper::apply)` and `.flatMap(mapper::apply)` can be replaced with simple `.map(mapper)` and `.flatMap(mapper)` respectively
Does IntelliJ have an inspection for eliminating such locutions?
Sure, that's how I found them. Well, I took the liberty to fix these two things.
2. In many methods redundant `throws ReflectiveOperationException` is declared while exception is never thrown
For test code where a significant fraction of test cases are going to throw something, we often do this, since its easier to just uniformly tag such methods rather than thinking about which test methods actually throw the exception and which don't. So I think this is harmless (though cleaning it up is harmless too.)
I'm not thinking about this, because my IDE thinks for me :-) Ok, I'll leave them as is for now.
You may want to optimize the EnumSet mechanics for the case where neither collector has interesting characteristics.
Added a special case when reported characteristics for either of collectors are empty or IDENTITY_FINISH only. I think this should be a common case.
The updated webrev is posted here (along with Peter suggestion to rename finisher to merger): http://cr.openjdk.java.net/~tvaleev/webrev/8205461/r3/ Also copyright year is updated
With best regards, Tagir Valeev
-- Pozdrawiam, Tomasz Linkowski
-- Pozdrawiam, Tomasz Linkowski
I'm neither Stuart nor Peter but this looks good to me. Rémi ----- Mail original -----
De: "Tagir Valeev" <amaembo@gmail.com> À: "Stuart Marks" <stuart.marks@oracle.com> Cc: "core-libs-dev" <core-libs-dev@openjdk.java.net> Envoyé: Vendredi 14 Septembre 2018 10:41:53 Objet: Re: RFR: JDK-8205461 Create Collector which merges results of two other collectors
Hello, Stuart and Peter!
Thank you for valuable comments. I updated the webrev: http://cr.openjdk.java.net/~tvaleev/webrev/8205461/r4/
1. I renamed "teeingAndThen" to "duplexing". Brian insisted that "-ing" suffix shall present and I agree. Hopefully it's final name. 2. I updated the spec as Stuart suggested.
No changes in implementation since r3 revision. Please check.
With best regards, Tagir Valeev.
On Tue, Aug 21, 2018 at 12:43 PM Stuart Marks <stuart.marks@oracle.com> wrote:
Hi Tagir,
Thanks for working on this. This looks really cool! And thanks Peter for agreeing to sponsor it.
I can help out with the CSR. My first bit of advice about the CSR process is to hold off until the specification is complete. :-)
I think the intent of the API is fine, but I think some details of the returned collector need to be ironed out first.
1. The spec doesn't say what the returned collector's supplier, accumulator, combiner, and finisher do. On the one hand, we don't necessarily want to describe the actual implementation. On the other hand, we need to specify how the thing actually behaves. One can certainly deduce the intended behavior from the description, but this really needs to be specified, and it mustn't rely on the reader having to derive the required behaviors. Since the actual implementation is fairly simple, the spec might end up being rather close to the implementation, but that might be unavoidable.
I'm envisioning something like this:
- supplier: creates a result container that contains result containers obtained by calling each collector's supplier
- accumulator: calls each collector's accumulator with its result container and the input element
... and similar for the combiner and finisher functions.
2. Characteristics
- UNORDERED: should the returned collector be UNORDERED if *either* of the provided collectors is UNORDERED? (Current draft says *both*.)
- CONCURRENT: current draft seems correct
- IDENTITY_FINISH: clearly not present; perhaps this should be specified
3. Parameter naming
The collector parameters are referred to as "specified collectors" or "supplied collectors". Other "higher-order" collectors refer to their underlying collectors as "downstream" collectors. I think it would be useful to work the "downstream" concept into this collector. This would enable the opening summary sentence of the doc to be something like, "Returns a collector that is a composite of two downstream collectors" or some such. (But see naming below.)
4. Naming
Sigh, naming is hard, and I know there was a fair amount of discussion in the previous thread and earlier in this one, but it seems like there's still some dissatisfaction with the name. (And I'm not particularly thrilled with teeingAndThen myself.) In a few minutes I've managed to come up with a few more names that (mostly) don't seem to have been proposed before, and so here they are for your consideration:
- compound - composite - conjoined - bonded - fused - duplex
Discussion:
A "composite" evokes function composition; this might be good, though it might be odd in that collectors can't be composed in the same way that functions are.
"Compound" might be a useful alternative. In chemistry, two substances are combined (or bonded, or fused) to form a single substance, which is a compound.
"Conjoin" seems to adequately describe the structure of the two collectors, but it lacks somewhat the connotation of unifying them.
In an earlier discussion, Brian had pushed back on names related to split/fork/merge/join since those are currently in use in streams regarding splitting of input elements and merging of results. In describing how the current proposal differs, he said that elements are "multiplexed" to the different collectors. Since we're doing this with two collectors, how about "duplex"? (I note that Jacob Glickman also had suggested "duplex".)
s'marks
On 8/20/18 1:48 AM, Tagir Valeev wrote:
Hello!
A CSR is created: https://bugs.openjdk.java.net/browse/JDK-8209685 (this is my first CSR, hopefully I did it correctly)
With best regards, Tagir Valeev. On Mon, Aug 20, 2018 at 2:06 PM Peter Levart <peter.levart@gmail.com> wrote:
Hi Tagir,
I think this looks very good. It just needs a CSR. Will you file it?
Regards, Peter
On 08/19/2018 11:24 AM, Tagir Valeev wrote:
Hello, Brian!
Of the three phases, teeing is the most important and least obvious, so I think something that includes that in the name is going to be helpful. Perhaps "teeingAndThen" is more evocative and not totally unwieldy.
Ok, sounds acceptable to me. Renamed pairing to teeingAndThen.
By the way looking into CollectorsTest.java I found some minor things to cleanup: 1. `.map(mapper::apply)` and `.flatMap(mapper::apply)` can be replaced with simple `.map(mapper)` and `.flatMap(mapper)` respectively
Does IntelliJ have an inspection for eliminating such locutions?
Sure, that's how I found them. Well, I took the liberty to fix these two things.
2. In many methods redundant `throws ReflectiveOperationException` is declared while exception is never thrown
For test code where a significant fraction of test cases are going to throw something, we often do this, since its easier to just uniformly tag such methods rather than thinking about which test methods actually throw the exception and which don't. So I think this is harmless (though cleaning it up is harmless too.)
I'm not thinking about this, because my IDE thinks for me :-) Ok, I'll leave them as is for now.
You may want to optimize the EnumSet mechanics for the case where neither collector has interesting characteristics.
Added a special case when reported characteristics for either of collectors are empty or IDENTITY_FINISH only. I think this should be a common case.
The updated webrev is posted here (along with Peter suggestion to rename finisher to merger): http://cr.openjdk.java.net/~tvaleev/webrev/8205461/r3/ Also copyright year is updated
With best regards, Tagir Valeev
participants (8)
-
Brian Goetz
-
Federico Peralta Schaffner
-
Peter Levart
-
Remi Forax
-
Roger Riggs
-
Stuart Marks
-
Tagir Valeev
-
Tomasz Linkowski