[Lambda]parallel sort stream slow than series sort

Simon Roberts simon at dancingcloudservices.com
Fri Sep 25 17:56:35 UTC 2020


Tests like this are rarely meaningful. In particular, you have a random
number generator in there. Random number generators are often built on
single-threaded code protected by mutual exclusion. That in itself prevents
the code from ever running faster than sequential code, and in fact usually
makes it slower because of the additional overhead of inter-thread
communication.
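
To illustrate the contention point, here is a minimal sketch. It is not
taken from your attachment: the class and method names are hypothetical.
It runs the same parallel stream twice, once drawing from a single shared
java.util.Random and once from ThreadLocalRandom. java.util.Random is
thread-safe, but every thread has to update the same seed, so the shared
version may well be the slower one on a multi-core machine. Timings vary
from run to run, so treat the numbers as indicative only.

```java
import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.IntStream;

final class RandomContentionSketch {
    // One generator shared by every worker thread. It is thread-safe,
    // but all threads compete to update its single seed.
    private static final Random SHARED = new Random();

    // Sums n random values in [0, 100) using the shared generator.
    static long sumWithSharedRandom(int n) {
        return IntStream.range(0, n).parallel()
                .mapToLong(i -> SHARED.nextInt(100))
                .sum();
    }

    // Same work, but each thread draws from its own generator.
    static long sumWithThreadLocalRandom(int n) {
        return IntStream.range(0, n).parallel()
                .mapToLong(i -> ThreadLocalRandom.current().nextInt(100))
                .sum();
    }

    public static void main(String[] args) {
        int n = 5_000_000; // arbitrary size chosen for illustration

        // Untimed warm-up so that JIT compilation does not dominate.
        sumWithSharedRandom(n);
        sumWithThreadLocalRandom(n);

        long t0 = System.nanoTime();
        long shared = sumWithSharedRandom(n);
        long t1 = System.nanoTime();
        long local = sumWithThreadLocalRandom(n);
        long t2 = System.nanoTime();

        System.out.printf("shared Random:     %d ms (sum %d)%n",
                (t1 - t0) / 1_000_000, shared);
        System.out.printf("ThreadLocalRandom: %d ms (sum %d)%n",
                (t2 - t1) / 1_000_000, local);
    }
}
```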

Further, your example seems to be a test of a database and its driver, and
has no obvious relationship to either the Streams API or anything in the
core Java libraries. Perhaps I am missing something.
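
If the goal is to measure the Streams API itself, one option is to take
the database and driver out of the picture entirely. The sketch below is
only an assumption about how that might look: the names are hypothetical,
and the data are plain ints in 0..100 rather than your rows. It builds the
array in memory before any timing starts, does a few warm-up runs, and
reports the best of several timed runs for the sequential and the parallel
sort. A proper harness such as JMH would be more trustworthy than this
hand-rolled loop.

```java
import java.util.Arrays;
import java.util.SplittableRandom;
import java.util.function.UnaryOperator;
import java.util.stream.IntStream;

final class SortTimingSketch {
    // Builds n values in [0, 100] up front, outside any timed region.
    static int[] makeData(int n, long seed) {
        return new SplittableRandom(seed).ints(n, 0, 101).toArray();
    }

    static int[] sortSequential(int[] data) {
        return IntStream.of(data).sorted().toArray();
    }

    static int[] sortParallel(int[] data) {
        return IntStream.of(data).parallel().sorted().toArray();
    }

    // Runs the sorter several times and returns the best time in ms.
    static long bestMillis(UnaryOperator<int[]> sorter, int[] data, int runs) {
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            int[] sorted = sorter.apply(data);
            long elapsed = System.nanoTime() - start;
            if (sorted.length != data.length) {
                throw new IllegalStateException("unexpected result size");
            }
            best = Math.min(best, elapsed);
        }
        return best / 1_000_000;
    }

    public static void main(String[] args) {
        int[] data = makeData(5_000_000, 42L); // size and seed are arbitrary

        // Warm-up runs, results discarded.
        bestMillis(SortTimingSketch::sortSequential, data, 3);
        bestMillis(SortTimingSketch::sortParallel, data, 3);

        long seq = bestMillis(SortTimingSketch::sortSequential, data, 5);
        long par = bestMillis(SortTimingSketch::sortParallel, data, 5);

        // Both variants must agree on the sorted result.
        boolean same = Arrays.equals(sortSequential(data), sortParallel(data));
        System.out.printf("sequential: %d ms, parallel: %d ms, same result: %b%n",
                seq, par, same);
    }
}
```

Sorting primitive ints is a simplification. Sorting row objects with a
comparator, as a result-set test presumably does, may behave differently.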


On Fri, Sep 25, 2020 at 4:09 AM tonytao <tonytao0505 at outlook.com> wrote:

> hi,
>
> I wrote a test case to measure stream performance, but the parallel sort
> is always slower than the sequential sort. I tested data sizes of 20,000,
> 5,000,000, 10,000,000, and 20,000,000.
>
> Attached is the test case source code.
>
> JDK version:
>
> openjdk version "11.0.8" 2020-07-14
> OpenJDK Runtime Environment (build 11.0.8+10-post-Debian-1deb10u1)
> OpenJDK 64-Bit Server VM (build 11.0.8+10-post-Debian-1deb10u1, mixed
> mode, sharing)
>
> JVM arguments:
>
> -ea -Xms256m -Xmx8192m
>
> Machine:
>
> CPU: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
>
> Memory: 16GB
>
> The test results are shown below:
>
> 20000:
>
> sorted execute time:9ms, resultset rows 20000, 2222222 rows/sec
> parallel sorted execute time:24ms, resultset rows 20000, 833333 rows/sec
>
> 5000000:
>
> sorted execute time:245ms, resultset rows 5000000, 20408163 rows/sec
> parallel sorted execute time:402ms, resultset rows 5000000, 12437810 rows/sec
>
> 10000000:
>
> sorted execute time:577ms, resultset rows 10000000, 17331022 rows/sec
> parallel sorted execute time:1230ms, resultset rows 10000000, 8130081 rows/sec
>
> 20000000:
>
> sorted execute time:1079ms, resultset rows 20000000, 18535681 rows/sec
> parallel sorted execute time:1790ms, resultset rows 20000000, 11173184 rows/sec
>
>
> This is a sample of the test data:
>
> hdb=> select * from testdata limit 10;
>    id    |           uptime           | x  | y  |               cmt
> ---------+----------------------------+----+----+----------------------------------
>  1340417 | 2023-02-22 07:30:34.391207 | 33 |  9 | 4bf16d4c4b638d84b56893de2451c407
>  1340418 | 2023-02-22 07:31:34.391207 | 10 | 91 | c9b78bfbd6b684e62605e96d2d8237a0
>  1340419 | 2023-02-22 07:32:34.391207 | 66 | 24 | 968e5d19ca3a2ddae5d2a366ba06cf16
>  1340420 | 2023-02-22 07:33:34.391207 |  4 | 42 | bdcf7d764121fc9b0039f80eadea1310
>  1340421 | 2023-02-22 07:34:34.391207 | 27 | 45 | 06520ac5e508f15f09672fa751d5c17a
>  1340422 | 2023-02-22 07:35:34.391207 | 36 | 11 | 5bede83b54dfe76f4a249308d8033691
>  1340423 | 2023-02-22 07:36:34.391207 | 41 | 92 | 37f4b34988c0e1387940177a8cc9d83a
>  1340424 | 2023-02-22 07:37:34.391207 | 29 | 59 | 416459b54ae00c95e118c93605a40d43
>  1340425 | 2023-02-22 07:38:34.391207 |  9 | 46 | 46339b8eeae99c7e922003ed87b9d417
>  1340426 | 2023-02-22 07:39:34.391207 | 21 | 29 | 7ede63cdb2a6a86c63534fe5fcfb2f97
> (10 rows)
>
>
> It was generated by this SQL:
>
> create table testdata(
>     id int,
>     uptime timestamp,
>     x int,
>     y int,
>     cmt text
> );
> insert into testdata
>     select
>         id,
>         uptime,
>         round(random()*100),
>         round(random()*100),
>         md5(uptime::text)
>     from (
>         select
>             generate_series id,
>             current_timestamp + make_interval(mins => generate_series) uptime
>         from generate_series(1,100000000)
>         ) t;
>
>
> Could you please help me to find the problem?
>
> Thanks a lot.
>
>
>
>

-- 
Simon Roberts
(303) 249 3613


More information about the core-libs-dev mailing list