RFR: 8265029: Preserve SIZED characteristics on slice operations (skip, limit) [v3]

Sat Apr 17 01:55:26 UTC 2021

> With the introduction of `toList()`, preserving the SIZED characteristics in more cases becomes more important. This patch preserves SIZED on `skip()` and `limit()` operations, so now every combination of `map/mapToX/boxed/asXyzStream/skip/limit/sorted` preserves size, and `toList()`, `toArray()` and `count()` may benefit from this. E. g., `LongStream.range(0, 10_000_000_000L).skip(1).count()` returns result instantly with this patch.
> 
> Some microbenchmarks added that confirm the reduced memory allocation in `toList()` and `toArray()` cases. Before patch:
> 
> ref.SliceToList.seq_baseline:·gc.alloc.rate.norm                    10000  thrpt   10   40235,534 ±     0,984    B/op
> ref.SliceToList.seq_limit:·gc.alloc.rate.norm                       10000  thrpt   10  106431,101 ±     0,198    B/op
> ref.SliceToList.seq_skipLimit:·gc.alloc.rate.norm                   10000  thrpt   10  106544,977 ±     1,983    B/op
> value.SliceToArray.seq_baseline:·gc.alloc.rate.norm                 10000  thrpt   10   40121,878 ±     0,247    B/op
> value.SliceToArray.seq_limit:·gc.alloc.rate.norm                    10000  thrpt   10  106317,693 ±     1,083    B/op
> value.SliceToArray.seq_skipLimit:·gc.alloc.rate.norm                10000  thrpt   10  106430,954 ±     0,136    B/op
> 
> 
> After patch:
> 
> ref.SliceToList.seq_baseline:·gc.alloc.rate.norm                    10000  thrpt   10  40235,648 ±     1,354    B/op
> ref.SliceToList.seq_limit:·gc.alloc.rate.norm                       10000  thrpt   10  40355,784 ±     1,288    B/op
> ref.SliceToList.seq_skipLimit:·gc.alloc.rate.norm                   10000  thrpt   10  40476,032 ±     2,855    B/op
> value.SliceToArray.seq_baseline:·gc.alloc.rate.norm                 10000  thrpt   10  40121,830 ±     0,308    B/op
> value.SliceToArray.seq_limit:·gc.alloc.rate.norm                    10000  thrpt   10  40242,554 ±     0,443    B/op
> value.SliceToArray.seq_skipLimit:·gc.alloc.rate.norm                10000  thrpt   10  40363,674 ±     1,576    B/op
> 
> 
> Time improvements are less exciting. It's likely that inlining and vectorizing dominate in these tests over array allocations and unnecessary copying. Still, I notice a significant improvement in SliceToArray.seq_limit case (2x) and mild improvement (+12..16%) in other slice tests. No significant change in parallel execution time, though its performance is much less stable and I didn't run enough tests.
> 
> Before patch:
> 
> Benchmark                         (size)   Mode  Cnt      Score     Error  Units
> ref.SliceToList.par_baseline       10000  thrpt   30  14876,723 ±  99,770  ops/s
> ref.SliceToList.par_limit          10000  thrpt   30  14856,841 ± 215,089  ops/s
> ref.SliceToList.par_skipLimit      10000  thrpt   30   9555,818 ± 991,335  ops/s
> ref.SliceToList.seq_baseline       10000  thrpt   30  23732,290 ± 444,162  ops/s
> ref.SliceToList.seq_limit          10000  thrpt   30  14894,040 ± 176,496  ops/s
> ref.SliceToList.seq_skipLimit      10000  thrpt   30  10646,929 ±  36,469  ops/s
> value.SliceToArray.par_baseline    10000  thrpt   30  25093,141 ± 376,402  ops/s
> value.SliceToArray.par_limit       10000  thrpt   30  24798,889 ± 760,762  ops/s
> value.SliceToArray.par_skipLimit   10000  thrpt   30  16456,310 ± 926,882  ops/s
> value.SliceToArray.seq_baseline    10000  thrpt   30  69669,787 ± 494,562  ops/s
> value.SliceToArray.seq_limit       10000  thrpt   30  21097,081 ± 117,338  ops/s
> value.SliceToArray.seq_skipLimit   10000  thrpt   30  15522,871 ± 112,557  ops/s
> 
> 
> After patch:
> 
> Benchmark                         (size)   Mode  Cnt      Score      Error  Units
> ref.SliceToList.par_baseline       10000  thrpt   30  14793,373 ±   64,905  ops/s
> ref.SliceToList.par_limit          10000  thrpt   30  13301,024 ± 1300,431  ops/s
> ref.SliceToList.par_skipLimit      10000  thrpt   30  11131,698 ± 1769,932  ops/s
> ref.SliceToList.seq_baseline       10000  thrpt   30  24101,048 ±  263,528  ops/s
> ref.SliceToList.seq_limit          10000  thrpt   30  16872,168 ±   76,696  ops/s
> ref.SliceToList.seq_skipLimit      10000  thrpt   30  11953,253 ±  105,231  ops/s
> value.SliceToArray.par_baseline    10000  thrpt   30  25442,442 ±  455,554  ops/s
> value.SliceToArray.par_limit       10000  thrpt   30  23111,730 ± 2246,086  ops/s
> value.SliceToArray.par_skipLimit   10000  thrpt   30  17980,750 ± 2329,077  ops/s
> value.SliceToArray.seq_baseline    10000  thrpt   30  66512,898 ± 1001,042  ops/s
> value.SliceToArray.seq_limit       10000  thrpt   30  41792,549 ± 1085,547  ops/s
> value.SliceToArray.seq_skipLimit   10000  thrpt   30  18007,613 ±  141,716  ops/s
> 
> 
> I also modernized SliceOps a little bit, using switch expression (with no explicit default!) and diamonds on anonymous classes.

Tagir F. Valeev has updated the pull request incrementally with four additional commits since the last revision:

 - More benchmarks
 - adjustSize -> exactOutputSize

   Co-authored-by: Paul Sandoz <paul.d.sandoz at googlemail.com>
 - adjustedLimit -> normalizedLimit

   Co-authored-by: Paul Sandoz <paul.d.sandoz at googlemail.com>
 - adjustSize -> exactOutputSize

   Co-authored-by: Paul Sandoz <paul.d.sandoz at googlemail.com>

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3427/files
  - new: https://git.openjdk.java.net/jdk/pull/3427/files/25755e14..a63e25d0

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3427&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3427&range=01-02

  Stats: 319 lines in 4 files changed: 308 ins; 1 del; 10 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3427.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3427/head:pull/3427

PR: https://git.openjdk.java.net/jdk/pull/3427