RFR-8148838: Stream.flatMap(...).spliterator() cannot properly split after tryAdvance()
Please review and sponsor this fix: https://bugs.openjdk.java.net/browse/JDK-8148838 http://cr.openjdk.java.net/~tvaleev/webrev/8148838/r1/ When buffer traversal is already started, and split is requested, then the existing buffer should be carefully transferred to the prefix part. The only problem I see here is the testing time. Due to permuteFunctions() call it increases significantly what I added the flatMap() test. If this becomes an issue, I can write new simple test which tests flatMap() only (without permutations with other intermediate operations). With best regards, Tagir Valeev.
Hi Tagir, Nice find. My inclination is to keep this simple and not support splitting after partial traversal. Sometimes splitting after partial traversal might be more complex not support (e.g. ArrayList). In other cases it’s more complex to support and in such cases i would argue it is not worth it since this kind of functionality is an edge case. Testing wise you are right to be concerned about the increase in test execution time. The lack of flatMap is definitely an omission. In this case I think it is ok to replace map with flatMap, as both result in stuff going into the holding buffer, and we can use the smaller data sets, e.g. "StreamTestData<Integer>.small”, that were recently added. Paul.
On 2 Feb 2016, at 09:24, Tagir F. Valeev <amaembo@gmail.com> wrote:
Please review and sponsor this fix:
https://bugs.openjdk.java.net/browse/JDK-8148838 http://cr.openjdk.java.net/~tvaleev/webrev/8148838/r1/
When buffer traversal is already started, and split is requested, then the existing buffer should be carefully transferred to the prefix part.
The only problem I see here is the testing time. Due to permuteFunctions() call it increases significantly what I added the flatMap() test. If this becomes an issue, I can write new simple test which tests flatMap() only (without permutations with other intermediate operations).
With best regards, Tagir Valeev.
Hello! Here's updated webrev: http://cr.openjdk.java.net/~tvaleev/webrev/8148838/r2/ PS> My inclination is to keep this simple and not support splitting after partial traversal. PS> Sometimes splitting after partial traversal might be more complex PS> not support (e.g. ArrayList). In other cases it’s more complex to PS> support and in such cases i would argue it is not worth it since PS> this kind of functionality is an edge case. I would still like that this problem is fixed (i.e. support splitting after advance, not just return null). This is probably an edge case for JDK, but might be crucial for library writers. As a library writer I actually had bad times working around this issue. While returning null instead of incorrect splitting would indeed be helpful, it will unnecessarily remove parallelization capabilities in some cases. For example, sometimes it's desired to extract the first element from the stream and create the stream from the tail. Returning null here would mean that the resulting stream will never split after that. To my opinion this fix is not very complex. It just adds several lines into single method (trySplit()). I added a comment which clarifies the intention. Other changes like extraction of initPusher() do not add complexity. Actually they reduce the complexity as four identical lambdas are merged into one. This patch actually reduces the size of created class files. With javac 9-ea+99 StreamSpliterators.java is compiled into 79004 bytes after my patch and 79472 bytes before my patch. Also this patch does not require primitive specializations and adds no overhead for common case when traversal is not started. PS> Testing wise you are right to be concerned about the increase in PS> test execution time. The lack of flatMap is definitely an PS> omission. In this case I think it is ok to replace map with PS> flatMap, as both result in stuff going into the holding buffer, PS> and we can use the smaller data sets, e.g. PS> "StreamTestData<Integer>.small”, that were recently added. I replaced all the datasets with .small alternatives (I think it's ok here as WrappingSpliterator behavior does not differ much for various sources) and removed all .map tests. Now the test works faster (like 25 sec -> 17 sec on by box). Please check. With best regards, Tagir Valeev. PS. PS> Paul.
On 2 Feb 2016, at 09:24, Tagir F. Valeev <amaembo@gmail.com> wrote:
Please review and sponsor this fix:
https://bugs.openjdk.java.net/browse/JDK-8148838 http://cr.openjdk.java.net/~tvaleev/webrev/8148838/r1/
When buffer traversal is already started, and split is requested, then the existing buffer should be carefully transferred to the prefix part.
The only problem I see here is the testing time. Due to permuteFunctions() call it increases significantly what I added the flatMap() test. If this becomes an issue, I can write new simple test which tests flatMap() only (without permutations with other intermediate operations).
With best regards, Tagir Valeev.
Hi Tagir, Test updates look good, thanks, that should reduce the test times on some of our test machines. I still disagree and pushing back on the support for splitting after partial traversal. It’s not a pattern i think we should encourage and propagate so such behaviour can be generally relied upon, and predominantly for edge cases. That’s where the complexity string is being pulled on. While your fix in isolation is not terribly complex, it is more complex than the alternative (which was actually the intent of the current impl, we just forget to include the check). Paul.
On 3 Feb 2016, at 12:19, Tagir F. Valeev <amaembo@gmail.com> wrote:
Hello!
Here's updated webrev: http://cr.openjdk.java.net/~tvaleev/webrev/8148838/r2/
PS> My inclination is to keep this simple and not support splitting after partial traversal.
PS> Sometimes splitting after partial traversal might be more complex PS> not support (e.g. ArrayList). In other cases it’s more complex to PS> support and in such cases i would argue it is not worth it since PS> this kind of functionality is an edge case.
I would still like that this problem is fixed (i.e. support splitting after advance, not just return null). This is probably an edge case for JDK, but might be crucial for library writers. As a library writer I actually had bad times working around this issue. While returning null instead of incorrect splitting would indeed be helpful, it will unnecessarily remove parallelization capabilities in some cases. For example, sometimes it's desired to extract the first element from the stream and create the stream from the tail. Returning null here would mean that the resulting stream will never split after that.
To my opinion this fix is not very complex. It just adds several lines into single method (trySplit()). I added a comment which clarifies the intention. Other changes like extraction of initPusher() do not add complexity. Actually they reduce the complexity as four identical lambdas are merged into one. This patch actually reduces the size of created class files. With javac 9-ea+99 StreamSpliterators.java is compiled into 79004 bytes after my patch and 79472 bytes before my patch. Also this patch does not require primitive specializations and adds no overhead for common case when traversal is not started.
PS> Testing wise you are right to be concerned about the increase in PS> test execution time. The lack of flatMap is definitely an PS> omission. In this case I think it is ok to replace map with PS> flatMap, as both result in stuff going into the holding buffer, PS> and we can use the smaller data sets, e.g. PS> "StreamTestData<Integer>.small”, that were recently added.
I replaced all the datasets with .small alternatives (I think it's ok here as WrappingSpliterator behavior does not differ much for various sources) and removed all .map tests. Now the test works faster (like 25 sec -> 17 sec on by box). Please check.
With best regards, Tagir Valeev.
PS.
PS> Paul.
On 2 Feb 2016, at 09:24, Tagir F. Valeev <amaembo@gmail.com> wrote:
Please review and sponsor this fix:
https://bugs.openjdk.java.net/browse/JDK-8148838 http://cr.openjdk.java.net/~tvaleev/webrev/8148838/r1/
When buffer traversal is already started, and split is requested, then the existing buffer should be carefully transferred to the prefix part.
The only problem I see here is the testing time. Due to permuteFunctions() call it increases significantly what I added the flatMap() test. If this becomes an issue, I can write new simple test which tests flatMap() only (without permutations with other intermediate operations).
With best regards, Tagir Valeev.
Hello! PS> I still disagree and pushing back on the support for splitting PS> after partial traversal. It’s not a pattern i think we should PS> encourage and propagate so such behaviour can be generally relied PS> upon, and predominantly for edge cases. That’s where the PS> complexity string is being pulled on. While your fix in isolation PS> is not terribly complex, it is more complex than the alternative PS> (which was actually the intent of the current impl, we just forget to include the check). I still don't like doing this, but as Brian agreed with you [1], seems I have no other choice. Here's updated webrev: http://cr.openjdk.java.net/~tvaleev/webrev/8148838/r3/ With best regards, Tagir Valeev. [1] https://twitter.com/BrianGoetz/status/694985109628387328 PS> Paul.
On 3 Feb 2016, at 12:19, Tagir F. Valeev <amaembo@gmail.com> wrote:
Hello!
Here's updated webrev: http://cr.openjdk.java.net/~tvaleev/webrev/8148838/r2/
PS> My inclination is to keep this simple and not support splitting after partial traversal.
PS> Sometimes splitting after partial traversal might be more complex PS> not support (e.g. ArrayList). In other cases it’s more complex to PS> support and in such cases i would argue it is not worth it since PS> this kind of functionality is an edge case.
I would still like that this problem is fixed (i.e. support splitting after advance, not just return null). This is probably an edge case for JDK, but might be crucial for library writers. As a library writer I actually had bad times working around this issue. While returning null instead of incorrect splitting would indeed be helpful, it will unnecessarily remove parallelization capabilities in some cases. For example, sometimes it's desired to extract the first element from the stream and create the stream from the tail. Returning null here would mean that the resulting stream will never split after that.
To my opinion this fix is not very complex. It just adds several lines into single method (trySplit()). I added a comment which clarifies the intention. Other changes like extraction of initPusher() do not add complexity. Actually they reduce the complexity as four identical lambdas are merged into one. This patch actually reduces the size of created class files. With javac 9-ea+99 StreamSpliterators.java is compiled into 79004 bytes after my patch and 79472 bytes before my patch. Also this patch does not require primitive specializations and adds no overhead for common case when traversal is not started.
PS> Testing wise you are right to be concerned about the increase in PS> test execution time. The lack of flatMap is definitely an PS> omission. In this case I think it is ok to replace map with PS> flatMap, as both result in stuff going into the holding buffer, PS> and we can use the smaller data sets, e.g. PS> "StreamTestData<Integer>.small”, that were recently added.
I replaced all the datasets with .small alternatives (I think it's ok here as WrappingSpliterator behavior does not differ much for various sources) and removed all .map tests. Now the test works faster (like 25 sec -> 17 sec on by box). Please check.
With best regards, Tagir Valeev.
PS.
PS> Paul.
On 2 Feb 2016, at 09:24, Tagir F. Valeev <amaembo@gmail.com> wrote:
Please review and sponsor this fix:
https://bugs.openjdk.java.net/browse/JDK-8148838 http://cr.openjdk.java.net/~tvaleev/webrev/8148838/r1/
When buffer traversal is already started, and split is requested, then the existing buffer should be carefully transferred to the prefix part.
The only problem I see here is the testing time. Due to permuteFunctions() call it increases significantly what I added the flatMap() test. If this becomes an issue, I can write new simple test which tests flatMap() only (without permutations with other intermediate operations).
With best regards, Tagir Valeev.
On 6 Feb 2016, at 14:29, Tagir F. Valeev <amaembo@gmail.com> wrote:
Hello!
PS> I still disagree and pushing back on the support for splitting PS> after partial traversal. It’s not a pattern i think we should PS> encourage and propagate so such behaviour can be generally relied PS> upon, and predominantly for edge cases. That’s where the PS> complexity string is being pulled on. While your fix in isolation PS> is not terribly complex, it is more complex than the alternative PS> (which was actually the intent of the current impl, we just forget to include the check).
I still don't like doing this, but as Brian agreed with you [1], seems I have no other choice.
Thanks for accommodating.
Here's updated webrev: http://cr.openjdk.java.net/~tvaleev/webrev/8148838/r3/
188 public Spliterator<P_OUT> trySplit() { 189 if (isParallel && !finished) { 190 init(); 191 192 if (buffer != null && buffer.count() > 0) // partial traversal started 193 return null; Why don’t you check if "buffer == null” at #189? i.e. similar to forEachRemaining: @Override public void forEachRemaining(Consumer<? super P_OUT> consumer) { if (buffer == null && !finished) { For clarity and consistency we should key off the "buffer == null” partial traversal guard. The state of "buffer != null" and “buffer.count() == 0” will be when traversal has completed i.e. "finished == true" (see fillBuffer). Paul.
Hello! PS> 188 public Spliterator<P_OUT> trySplit() { PS> 189 if (isParallel && !finished) { PS> 190 init(); PS> 191 PS> 192 if (buffer != null && buffer.count() > 0) // partial traversal started PS> 193 return null; PS> Why don’t you check if "buffer == null” at #189? i.e. similar to forEachRemaining: That would make minimal behavioral change to fix this issue (fix flatMap keys only, but not affect other intermediate ops which were working correctly). Well, if buffer == null check is enough, here's update: http://cr.openjdk.java.net/~tvaleev/webrev/8148838/r4/ (test unchanged) With best regards, Tagir Valeev.
On 8 Feb 2016, at 14:53, Tagir F. Valeev <amaembo@gmail.com> wrote:
Hello!
PS> 188 public Spliterator<P_OUT> trySplit() { PS> 189 if (isParallel && !finished) { PS> 190 init(); PS> 191 PS> 192 if (buffer != null && buffer.count() > 0) // partial traversal started PS> 193 return null;
PS> Why don’t you check if "buffer == null” at #189? i.e. similar to forEachRemaining:
That would make minimal behavioral change to fix this issue (fix flatMap keys only, but not affect other intermediate ops which were working correctly). Well, if buffer == null check is enough, here's update:
Many thanks, yes it’s sufficient (finished == true when buffer != null && buffer.count() == 0). Would you mind updating the date in the license headers and the test with the bug id. Then i will push. I forgot to tell you about those little things, see the following for one of your fixes i pushed with such updates (to avoid any email/review latency): http://hg.openjdk.java.net/jdk9/dev/jdk/rev/4a497e746019 <http://hg.openjdk.java.net/jdk9/dev/jdk/rev/4a497e746019> Paul.
http://cr.openjdk.java.net/~tvaleev/webrev/8148838/r4/ (test unchanged)
With best regards, Tagir Valeev.
Hello! PS> Would you mind updating the date in the license headers and the PS> test with the bug id. Then i will push. I forgot to tell you about PS> those little things, see the following for one of your fixes i PS> pushed with such updates (to avoid any email/review latency): Ah, sorry. I saw this, but always forget to add to my patch. Will take care about these things in future. Updated: http://cr.openjdk.java.net/~tvaleev/webrev/8148838/r5/ Also updated 8147505 webrev correspondingly (seems it's also not pushed yet): http://cr.openjdk.java.net/~tvaleev/webrev/8147505/r3/ With best regards, Tagir Valeev.
On 8 Feb 2016, at 15:23, Tagir F. Valeev <amaembo@gmail.com> wrote:
Hello!
PS> Would you mind updating the date in the license headers and the PS> test with the bug id. Then i will push. I forgot to tell you about PS> those little things, see the following for one of your fixes i PS> pushed with such updates (to avoid any email/review latency):
Ah, sorry. I saw this, but always forget to add to my patch. Will take care about these things in future. Updated:
Thanks. Pushed.
Also updated 8147505 webrev correspondingly (seems it's also not pushed yet): http://cr.openjdk.java.net/~tvaleev/webrev/8147505/r3/
Right not pushed yet, i was pondering if i should submit an internal CCC as it is arguably a minor behavioural change. I'll sit on it for another day. BTW we usually place multiple bugs ids on the same line using under one @bug annotation. Paul.
2016-02-08 15:05 GMT+01:00 Paul Sandoz <paul.sandoz@oracle.com>:
On 8 Feb 2016, at 14:53, Tagir F. Valeev <amaembo@gmail.com> wrote:
PS> Why don’t you check if "buffer == null” at #189? i.e. similar to forEachRemaining:
That would make minimal behavioral change to fix this issue (fix flatMap keys only, but not affect other intermediate ops which were working correctly). Well, if buffer == null check is enough, here's update:
Many thanks, yes it’s sufficient (finished == true when buffer != null && buffer.count() == 0).
Hi all, Question: will this get backported to Java 8 ? Regards, Stefan
participants (3)
-
Paul Sandoz
-
Stefan Zobel
-
Tagir F. Valeev