RFR: 8265029: Preserve SIZED characteristics on slice operations (skip, limit)
Vladimir Sitnikov
vsitnikov at openjdk.java.net
Sat Apr 10 14:22:36 UTC 2021
On Sat, 10 Apr 2021 06:30:33 GMT, Tagir F. Valeev <tvaleev at openjdk.org> wrote:
> With the introduction of `toList()`, preserving the SIZED characteristics in more cases becomes more important. This patch preserves SIZED on `skip()` and `limit()` operations, so now every combination of `map/mapToX/boxed/asXyzStream/skip/limit/sorted` preserves size, and `toList()`, `toArray()` and `count()` may benefit from this. E. g., `LongStream.range(0, 10_000_000_000L).skip(1).count()` returns result instantly with this patch.
>
> Some microbenchmarks added that confirm the reduced memory allocation in `toList()` and `toArray()` cases. Before patch:
> ref.SliceToList.seq_baseline:·gc.alloc.rate.norm 10000 thrpt 10 40235,534 ± 0,984 B/op
> ref.SliceToList.seq_limit:·gc.alloc.rate.norm 10000 thrpt 10 106431,101 ± 0,198 B/op
> ref.SliceToList.seq_skipLimit:·gc.alloc.rate.norm 10000 thrpt 10 106544,977 ± 1,983 B/op
> value.SliceToArray.seq_baseline:·gc.alloc.rate.norm 10000 thrpt 10 40121,878 ± 0,247 B/op
> value.SliceToArray.seq_limit:·gc.alloc.rate.norm 10000 thrpt 10 106317,693 ± 1,083 B/op
> value.SliceToArray.seq_skipLimit:·gc.alloc.rate.norm 10000 thrpt 10 106430,954 ± 0,136 B/op
>
> After patch:
> ref.SliceToList.seq_baseline:·gc.alloc.rate.norm 10000 thrpt 10 40235,648 ± 1,354 B/op
> ref.SliceToList.seq_limit:·gc.alloc.rate.norm 10000 thrpt 10 40355,784 ± 1,288 B/op
> ref.SliceToList.seq_skipLimit:·gc.alloc.rate.norm 10000 thrpt 10 40476,032 ± 2,855 B/op
> value.SliceToArray.seq_baseline:·gc.alloc.rate.norm 10000 thrpt 10 40121,830 ± 0,308 B/op
> value.SliceToArray.seq_limit:·gc.alloc.rate.norm 10000 thrpt 10 40242,554 ± 0,443 B/op
> value.SliceToArray.seq_skipLimit:·gc.alloc.rate.norm 10000 thrpt 10 40363,674 ± 1,576 B/op
>
> Time improvements are less exciting. It's likely that inlining and vectorizing dominate in these tests over array allocations and unnecessary copying. Still, I notice a significant improvement in SliceToArray.seq_limit case (2x) and mild improvement (+12..16%) in other slice tests. No significant change in parallel execution time, though its performance is much less stable and I didn't run enough tests.
>
> Before patch:
> Benchmark (size) Mode Cnt Score Error Units
> ref.SliceToList.par_baseline 10000 thrpt 30 14876,723 ± 99,770 ops/s
> ref.SliceToList.par_limit 10000 thrpt 30 14856,841 ± 215,089 ops/s
> ref.SliceToList.par_skipLimit 10000 thrpt 30 9555,818 ± 991,335 ops/s
> ref.SliceToList.seq_baseline 10000 thrpt 30 23732,290 ± 444,162 ops/s
> ref.SliceToList.seq_limit 10000 thrpt 30 14894,040 ± 176,496 ops/s
> ref.SliceToList.seq_skipLimit 10000 thrpt 30 10646,929 ± 36,469 ops/s
> value.SliceToArray.par_baseline 10000 thrpt 30 25093,141 ± 376,402 ops/s
> value.SliceToArray.par_limit 10000 thrpt 30 24798,889 ± 760,762 ops/s
> value.SliceToArray.par_skipLimit 10000 thrpt 30 16456,310 ± 926,882 ops/s
> value.SliceToArray.seq_baseline 10000 thrpt 30 69669,787 ± 494,562 ops/s
> value.SliceToArray.seq_limit 10000 thrpt 30 21097,081 ± 117,338 ops/s
> value.SliceToArray.seq_skipLimit 10000 thrpt 30 15522,871 ± 112,557 ops/s
>
> After patch:
> Benchmark (size) Mode Cnt Score Error Units
> ref.SliceToList.par_baseline 10000 thrpt 30 14793,373 ± 64,905 ops/s
> ref.SliceToList.par_limit 10000 thrpt 30 13301,024 ± 1300,431 ops/s
> ref.SliceToList.par_skipLimit 10000 thrpt 30 11131,698 ± 1769,932 ops/s
> ref.SliceToList.seq_baseline 10000 thrpt 30 24101,048 ± 263,528 ops/s
> ref.SliceToList.seq_limit 10000 thrpt 30 16872,168 ± 76,696 ops/s
> ref.SliceToList.seq_skipLimit 10000 thrpt 30 11953,253 ± 105,231 ops/s
> value.SliceToArray.par_baseline 10000 thrpt 30 25442,442 ± 455,554 ops/s
> value.SliceToArray.par_limit 10000 thrpt 30 23111,730 ± 2246,086 ops/s
> value.SliceToArray.par_skipLimit 10000 thrpt 30 17980,750 ± 2329,077 ops/s
> value.SliceToArray.seq_baseline 10000 thrpt 30 66512,898 ± 1001,042 ops/s
> value.SliceToArray.seq_limit 10000 thrpt 30 41792,549 ± 1085,547 ops/s
> value.SliceToArray.seq_skipLimit 10000 thrpt 30 18007,613 ± 141,716 ops/s
>
> I also modernized SliceOps a little bit, using switch expression (with no explicit default!) and diamonds on anonymous classes.
test/jdk/java/util/stream/test/org/openjdk/tests/java/util/stream/CountTest.java line 195:
> 193: assertEquals(Stream.of(1, 2, 3, 4).peek(e -> ai.getAndIncrement())
> 194: .parallel().skip(1).limit(2).skip(1).count(), 1);
> 195: assertEquals(ai.get(), 0);
Does it make sense to extract the method here and launch it with different input streams?
test/jdk/java/util/stream/test/org/openjdk/tests/java/util/stream/SliceOpTest.java line 370:
> 368: assertNotNull(prefix);
> 369: assertTrue(parSpliterator.hasCharacteristics(Spliterator.SIZED));
> 370: assertTrue(parSpliterator.hasCharacteristics(Spliterator.SUBSIZED));
It would be great to integrate the code comment to the assertion message. Then test failure would print the message making it easier to understand the failure.
It would be great to extract `assertHasCharacteristics` method so the failure printed the actual characteristics rather than "expected true got false"
test/micro/org/openjdk/bench/java/util/stream/ops/ref/SliceToList.java line 50:
> 48: @Benchmark
> 49: public List<String> seq_baseline() {
> 50: return IntStream.range(0, size)
Typically you want to move all the constants to state fields to avoid constant folding by the compiler.
The compiler might accidentally use the fact that range start is always 0 and produce a dedicated optimized code for it.
See https://shipilev.net/blog/2014/java-scala-divided-we-fail/
-------------
PR: https://git.openjdk.java.net/jdk/pull/3427
More information about the core-libs-dev
mailing list