[Lambda] parallel stream sort slower than sequential sort

tonytao tonytao0505 at outlook.com
Sun Sep 27 08:24:28 UTC 2020


Hi Roberts,

Thank you for your kind help.
I stepped through the parallel stream in a debugger. In JDK 11 and 14 it
appears to use an ArrayListSpliterator to split the list into pieces, not
mutual exclusion. The test arrays were originally retrieved from PostgreSQL,
but all of the data was loaded before the test ran. I have now rewritten the
data-generation code in Java.
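
To illustrate the splitting I saw in the debugger, here is a minimal sketch that splits an ArrayList's spliterator once and reports the two estimated sizes. The class and method names (SpliteratorSplitDemo, splitOnce) are made up for this example; only List.spliterator(), Spliterator.trySplit() and Spliterator.estimateSize() are real JDK calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;

class SpliteratorSplitDemo {
    /** Splits the list's spliterator once and returns {prefixSize, remainingSize}. */
    static long[] splitOnce(List<Integer> list) {
        Spliterator<Integer> rest = list.spliterator();
        // trySplit() hands off a prefix of the elements; it may return null
        // when the spliterator cannot be split any further.
        Spliterator<Integer> prefix = rest.trySplit();
        long prefixSize = (prefix == null) ? 0 : prefix.estimateSize();
        return new long[] { prefixSize, rest.estimateSize() };
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < 1_000; i++) {
            list.add(i);
        }
        long[] sizes = splitOnce(list);
        System.out.println("prefix=" + sizes[0] + ", rest=" + sizes[1]);
    }
}
```

The two sizes add up to the list size, which is consistent with the list being divided into independent index ranges rather than being shared behind a lock.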

Hi Vladimir,

I tested on JDK 14:
openjdk version "14.0.2" 2020-07-14
OpenJDK Runtime Environment (build 14.0.2+12-46)
OpenJDK 64-Bit Server VM (build 14.0.2+12-46, mixed mode, sharing)

The parallel sort is still slower than the sequential sort:

sorted execute time:274ms, resultset rows 5000000, 18248175 rows/sec
parallel sorted execute time:627ms, resultset rows 5000000, 7974481 rows/sec
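
These are single-shot timings, so they include JIT warm-up and GC noise. A rough sketch of a fairer measurement, with untimed warm-up runs followed by the best of several timed runs, might look like the following. SortTiming and bestMillis are hypothetical names, and the random Integer data is an assumption made for the sketch, not the data used above; a harness such as JMH would be the more rigorous choice.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;
import java.util.function.Supplier;
import java.util.stream.Collectors;

class SortTiming {
    /** Runs the task `warmups` times untimed, then returns the best of `runs` timed runs in ms. */
    static long bestMillis(Supplier<?> task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            task.get(); // give the JIT a chance to compile the hot paths
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long t0 = System.nanoTime();
            task.get();
            best = Math.min(best, (System.nanoTime() - t0) / 1_000_000);
        }
        return best;
    }

    public static void main(String[] args) {
        // Assumption for this sketch: random input with a fixed seed.
        Random rnd = new Random(42);
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            data.add(rnd.nextInt());
        }
        Comparator<Integer> cmp = Comparator.naturalOrder();
        long seq = bestMillis(() -> data.stream().sorted(cmp).collect(Collectors.toList()), 5, 5);
        long par = bestMillis(() -> data.parallelStream().sorted(cmp).collect(Collectors.toList()), 5, 5);
        System.out.println("sequential best: " + seq + " ms, parallel best: " + par + " ms");
    }
}
```

Taking the best of several runs is a deliberate simplification: it filters out one-off pauses, but it says nothing about variance.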

Hi Roberts & Peter,

Here is the test code:

// Imports needed: java.time.LocalDateTime, java.util.*, java.util.stream.Collectors,
// org.apache.commons.codec.digest.DigestUtils
int rowCount = 5000000;
LocalDateTime d = LocalDateTime.now();
List<Object[]> data = new ArrayList<>();
List<Object[]> data2 = new ArrayList<>();
long startTime = System.currentTimeMillis();
for (int i = 0; i < rowCount; i++) {
    Object[] row = new Object[3];
    row[0] = i;
    d = d.plusMinutes(1);
    row[1] = java.sql.Date.valueOf(d.toLocalDate());
    row[2] = DigestUtils.md5Hex(row[1].toString());
    data.add(row);
    data2.add(row);
}
long endTime = System.currentTimeMillis();
// Cast to long: rowCount * 1000 overflows int for 5,000,000 rows.
System.out.println("read data execute time:" + (endTime - startTime) + "ms, resultset rows " + rowCount + ", "
        + (long) rowCount * 1000 / (endTime - startTime) + " rows/sec");

// Print the type and value of each column in the first row.
for (int i = 0; i < 1; i++) {
    for (Object x : data.get(i)) {
        System.out.println("data type: " + x.getClass() + ", value: " + x);
    }
}

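// Compares rows column by column; every column value is assumed to be a
// non-null Comparable of the same type in both rows.
Comparator<Object[]> comparator = new Comparator<Object[]>() {
    @Override
    @SuppressWarnings({"unchecked", "rawtypes"})
    public int compare(Object[] o1, Object[] o2) {
        int res = 0;
        for (int i = 0; i < o1.length; i++) {
            res = ((Comparable) o1[i]).compareTo(o2[i]);
            if (res != 0) {
                break;
            }
        }
        return res;
    }
};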

List<Object[]> list;
List<Object[]> l = Collections.unmodifiableList(data);  // not used below

// Sequential sort
startTime = System.currentTimeMillis();
list = data.stream().sorted(comparator).collect(Collectors.toList());
endTime = System.currentTimeMillis();
System.out.println("sorted execute time:" + (endTime - startTime) + "ms, resultset rows " + list.size() + ", "
        + (long) list.size() * 1000 / (endTime - startTime) + " rows/sec");

// Parallel sort
startTime = System.currentTimeMillis();
list = data2.stream().parallel().sorted(comparator).collect(Collectors.toList());
endTime = System.currentTimeMillis();
System.out.println("parallel sorted execute time:" + (endTime - startTime) + "ms, resultset rows " + list.size()
        + ", " + (long) list.size() * 1000 / (endTime - startTime) + " rows/sec");
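
One thing that may be worth checking in the test above: row[0] is the loop counter, so the rows are already in ascending order under this comparator. As far as I know, the sequential sort ends up in TimSort, which detects existing runs and needs only about n comparisons on sorted input, so there would be very little work left for a parallel sort to speed up. The sketch below counts comparator calls for ascending and shuffled input. It uses Integer elements instead of Object[] rows to stay short, and PresortedInputDemo and countComparisons are names invented for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Collectors;

class PresortedInputDemo {
    /** Sorts the list with a sequential stream and returns how many comparisons were made. */
    static long countComparisons(List<Integer> input) {
        AtomicLong calls = new AtomicLong();
        Comparator<Integer> counting = (a, b) -> {
            calls.incrementAndGet();
            return a.compareTo(b);
        };
        input.stream().sorted(counting).collect(Collectors.toList());
        return calls.get();
    }

    public static void main(String[] args) {
        List<Integer> ascending = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            ascending.add(i);
        }
        // Same elements in random order (fixed seed, so runs are repeatable).
        List<Integer> shuffled = new ArrayList<>(ascending);
        Collections.shuffle(shuffled, new Random(1));

        System.out.println("ascending input: " + countComparisons(ascending) + " comparisons");
        System.out.println("shuffled input:  " + countComparisons(shuffled) + " comparisons");
    }
}
```

If the ascending count comes out close to the element count while the shuffled count is many times larger, shuffling data and data2 before sorting would probably give a more representative comparison.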


You can run it in your IDE. It depends on:
         <!-- DigestUtils.md5Hex comes from Apache Commons Codec, not commons-lang3 -->
         <dependency>
             <groupId>commons-codec</groupId>
             <artifactId>commons-codec</artifactId>
             <version>1.15</version>
         </dependency>


Thanks again!

Tao Jin.


On 9/26/20 1:56 AM, Simon Roberts wrote:
> Tests like this are rarely meaningful. In particular you have a random 
> number generator in there. They are often built on single threaded 
> code protected by mutual exclusion. That of itself prevents the code 
> ever going faster than sequential code, and in fact usually makes it 
> slower due to the additional overhead of inter-thread communication.
>
> Further, your example seems to be a test of a database and its driver, 
> and has no obvious relationship to either the Streams API or anything 
> in the core Java libraries. Perhaps I miss something.
>
>
> On Fri, Sep 25, 2020 at 4:09 AM tonytao <tonytao0505 at outlook.com 
> <mailto:tonytao0505 at outlook.com>> wrote:
>
>     hi,
>
>     I wrote a test case to measure stream performance, but the parallel
>     sort is always slower than the sequential sort. I tested data sizes
>     of 20,000, 5,000,000, 10,000,000 and 20,000,000.
>
>     Attached is the test case source code.
>
>     JDK version:
>
>     openjdk version "11.0.8" 2020-07-14
>     OpenJDK Runtime Environment (build 11.0.8+10-post-Debian-1deb10u1)
>     OpenJDK 64-Bit Server VM (build 11.0.8+10-post-Debian-1deb10u1, mixed
>     mode, sharing)
>
>     JVM arguments:
>
>     -ea -Xms256m -Xmx8192m
>
>     Machine:
>
>     CPU: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
>
>     Memory: 16GB
>
>     The test results are shown below:
>
>     20000:
>
>     sorted execute time:9ms, resultset rows 20000, 2222222 rows/sec
>     parallel sorted execute time:24ms, resultset rows 20000, 833333 rows/sec
>
>     5000000:
>
>     sorted execute time:245ms, resultset rows 5000000, 20408163 rows/sec
>     parallel sorted execute time:402ms, resultset rows 5000000, 12437810 rows/sec
>
>     10000000:
>
>     sorted execute time:577ms, resultset rows 10000000, 17331022 rows/sec
>     parallel sorted execute time:1230ms, resultset rows 10000000, 8130081 rows/sec
>
>     20000000:
>
>     sorted execute time:1079ms, resultset rows 20000000, 18535681 rows/sec
>     parallel sorted execute time:1790ms, resultset rows 20000000, 11173184 rows/sec
>
>
>     This is a sample of the test data:
>
>     hdb=> select * from testdata limit 10;
>        id    |           uptime           | x  | y  |               cmt
>     ---------+----------------------------+----+----+----------------------------------
>      1340417 | 2023-02-22 07:30:34.391207 | 33 |  9 | 4bf16d4c4b638d84b56893de2451c407
>      1340418 | 2023-02-22 07:31:34.391207 | 10 | 91 | c9b78bfbd6b684e62605e96d2d8237a0
>      1340419 | 2023-02-22 07:32:34.391207 | 66 | 24 | 968e5d19ca3a2ddae5d2a366ba06cf16
>      1340420 | 2023-02-22 07:33:34.391207 |  4 | 42 | bdcf7d764121fc9b0039f80eadea1310
>      1340421 | 2023-02-22 07:34:34.391207 | 27 | 45 | 06520ac5e508f15f09672fa751d5c17a
>      1340422 | 2023-02-22 07:35:34.391207 | 36 | 11 | 5bede83b54dfe76f4a249308d8033691
>      1340423 | 2023-02-22 07:36:34.391207 | 41 | 92 | 37f4b34988c0e1387940177a8cc9d83a
>      1340424 | 2023-02-22 07:37:34.391207 | 29 | 59 | 416459b54ae00c95e118c93605a40d43
>      1340425 | 2023-02-22 07:38:34.391207 |  9 | 46 | 46339b8eeae99c7e922003ed87b9d417
>      1340426 | 2023-02-22 07:39:34.391207 | 21 | 29 | 7ede63cdb2a6a86c63534fe5fcfb2f97
>     (10 rows)
>
>
>     It was generated by sql:
>
>     create table testdata(
>         id int,
>         uptime timestamp,
>         x int,
>         y int,
>         cmt text
>     );
>     insert into testdata
>         select
>             id,
>             uptime,
>             round(random()*100),
>             round(random()*100),
>             md5(uptime::text)
>         from (
>             select
>                 generate_series id,
>                 current_timestamp + make_interval(mins => generate_series) uptime
>             from generate_series(1,100000000)
>             ) t;
>
>
>     Could you please help me to find the problem?
>
>     Thanks a lot.
>
>
>
>
>
> -- 
> Simon Roberts
> (303) 249 3613
>



More information about the core-libs-dev mailing list