Integrated: 8265029: Preserve SIZED characteristics on slice operations (skip, limit)

Fri May 28 02:00:56 UTC 2021

On Sat, 10 Apr 2021 06:30:33 GMT, Tagir F. Valeev <tvaleev at openjdk.org> wrote:

> With the introduction of `toList()`, preserving the SIZED characteristics in more cases becomes more important. This patch preserves SIZED on `skip()` and `limit()` operations, so now every combination of `map/mapToX/boxed/asXyzStream/skip/limit/sorted` preserves size, and `toList()`, `toArray()` and `count()` may benefit from this. E. g., `LongStream.range(0, 10_000_000_000L).skip(1).count()` returns result instantly with this patch.
> 
> Some microbenchmarks added that confirm the reduced memory allocation in `toList()` and `toArray()` cases. Before patch:
> 
> ref.SliceToList.seq_baseline:·gc.alloc.rate.norm                    10000  thrpt   10   40235,534 ±     0,984    B/op
> ref.SliceToList.seq_limit:·gc.alloc.rate.norm                       10000  thrpt   10  106431,101 ±     0,198    B/op
> ref.SliceToList.seq_skipLimit:·gc.alloc.rate.norm                   10000  thrpt   10  106544,977 ±     1,983    B/op
> value.SliceToArray.seq_baseline:·gc.alloc.rate.norm                 10000  thrpt   10   40121,878 ±     0,247    B/op
> value.SliceToArray.seq_limit:·gc.alloc.rate.norm                    10000  thrpt   10  106317,693 ±     1,083    B/op
> value.SliceToArray.seq_skipLimit:·gc.alloc.rate.norm                10000  thrpt   10  106430,954 ±     0,136    B/op
> 
> 
> After patch:
> 
> ref.SliceToList.seq_baseline:·gc.alloc.rate.norm                    10000  thrpt   10  40235,648 ±     1,354    B/op
> ref.SliceToList.seq_limit:·gc.alloc.rate.norm                       10000  thrpt   10  40355,784 ±     1,288    B/op
> ref.SliceToList.seq_skipLimit:·gc.alloc.rate.norm                   10000  thrpt   10  40476,032 ±     2,855    B/op
> value.SliceToArray.seq_baseline:·gc.alloc.rate.norm                 10000  thrpt   10  40121,830 ±     0,308    B/op
> value.SliceToArray.seq_limit:·gc.alloc.rate.norm                    10000  thrpt   10  40242,554 ±     0,443    B/op
> value.SliceToArray.seq_skipLimit:·gc.alloc.rate.norm                10000  thrpt   10  40363,674 ±     1,576    B/op
> 
> 
> Time improvements are less exciting. It's likely that inlining and vectorizing dominate in these tests over array allocations and unnecessary copying. Still, I notice a significant improvement in SliceToArray.seq_limit case (2x) and mild improvement (+12..16%) in other slice tests. No significant change in parallel execution time, though its performance is much less stable and I didn't run enough tests.
> 
> Before patch:
> 
> Benchmark                         (size)   Mode  Cnt      Score     Error  Units
> ref.SliceToList.par_baseline       10000  thrpt   30  14876,723 ±  99,770  ops/s
> ref.SliceToList.par_limit          10000  thrpt   30  14856,841 ± 215,089  ops/s
> ref.SliceToList.par_skipLimit      10000  thrpt   30   9555,818 ± 991,335  ops/s
> ref.SliceToList.seq_baseline       10000  thrpt   30  23732,290 ± 444,162  ops/s
> ref.SliceToList.seq_limit          10000  thrpt   30  14894,040 ± 176,496  ops/s
> ref.SliceToList.seq_skipLimit      10000  thrpt   30  10646,929 ±  36,469  ops/s
> value.SliceToArray.par_baseline    10000  thrpt   30  25093,141 ± 376,402  ops/s
> value.SliceToArray.par_limit       10000  thrpt   30  24798,889 ± 760,762  ops/s
> value.SliceToArray.par_skipLimit   10000  thrpt   30  16456,310 ± 926,882  ops/s
> value.SliceToArray.seq_baseline    10000  thrpt   30  69669,787 ± 494,562  ops/s
> value.SliceToArray.seq_limit       10000  thrpt   30  21097,081 ± 117,338  ops/s
> value.SliceToArray.seq_skipLimit   10000  thrpt   30  15522,871 ± 112,557  ops/s
> 
> 
> After patch:
> 
> Benchmark                         (size)   Mode  Cnt      Score      Error  Units
> ref.SliceToList.par_baseline       10000  thrpt   30  14793,373 ±   64,905  ops/s
> ref.SliceToList.par_limit          10000  thrpt   30  13301,024 ± 1300,431  ops/s
> ref.SliceToList.par_skipLimit      10000  thrpt   30  11131,698 ± 1769,932  ops/s
> ref.SliceToList.seq_baseline       10000  thrpt   30  24101,048 ±  263,528  ops/s
> ref.SliceToList.seq_limit          10000  thrpt   30  16872,168 ±   76,696  ops/s
> ref.SliceToList.seq_skipLimit      10000  thrpt   30  11953,253 ±  105,231  ops/s
> value.SliceToArray.par_baseline    10000  thrpt   30  25442,442 ±  455,554  ops/s
> value.SliceToArray.par_limit       10000  thrpt   30  23111,730 ± 2246,086  ops/s
> value.SliceToArray.par_skipLimit   10000  thrpt   30  17980,750 ± 2329,077  ops/s
> value.SliceToArray.seq_baseline    10000  thrpt   30  66512,898 ± 1001,042  ops/s
> value.SliceToArray.seq_limit       10000  thrpt   30  41792,549 ± 1085,547  ops/s
> value.SliceToArray.seq_skipLimit   10000  thrpt   30  18007,613 ±  141,716  ops/s
> 
> 
> I also modernized SliceOps a little bit, using switch expression (with no explicit default!) and diamonds on anonymous classes.

This pull request has now been integrated.

Changeset: 0c9daa7e
Author:    Tagir F. Valeev <tvaleev at openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/0c9daa7ed579cd82343f37a68964876ebc48122e
Stats:     719 lines in 12 files changed: 623 ins; 8 del; 88 mod

8265029: Preserve SIZED characteristics on slice operations (skip, limit)

Reviewed-by: psandoz

-------------

PR: https://git.openjdk.java.net/jdk/pull/3427