[Lambda] parallel stream sort slower than sequential sort
tonytao
tonytao0505 at outlook.com
Sun Sep 27 08:24:28 UTC 2020
Hi Roberts,
Thanks for your kind help.
I debugged into the parallel stream code; it seems a parallel stream uses an
ArrayListSpliterator to split the list into pieces in both JDK 11 and 14, not
mutual exclusion. The test arrays were retrieved from PostgreSQL, but all data
was ready before the test ran. I have rewritten the data-generation code in Java.
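For example, the halving split is easy to observe directly with trySplit(); a minimal sketch (the class name is my own, not from the JDK):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;

// Demo (hypothetical class name): ArrayList's spliterator partitions the
// index range in half on each trySplit(), with no locking involved.
public class SplitDemo {
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            list.add(i);
        }
        Spliterator<Integer> right = list.spliterator();
        Spliterator<Integer> left = right.trySplit(); // left gets the first half
        System.out.println("left size:  " + left.estimateSize());  // 500
        System.out.println("right size: " + right.estimateSize()); // 500
    }
}
```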
Hi Vladimir,
I tested on JDK 14:
openjdk version "14.0.2" 2020-07-14
OpenJDK Runtime Environment (build 14.0.2+12-46)
OpenJDK 64-Bit Server VM (build 14.0.2+12-46, mixed mode, sharing)
Parallel sort is still slower than sequential sort:
sorted execute time:274ms, resultset rows 5000000, 18248175 rows/sec
parallel sorted execute time:627ms, resultset rows 5000000, 7974481 rows/sec
Hi Roberts & Peter,
Here is the test code:
int rowCount = 5000000;
LocalDateTime d = LocalDateTime.now();
List<Object[]> data = new ArrayList<>();
List<Object[]> data2 = new ArrayList<>();
long startTime = System.currentTimeMillis();
for (int i = 0; i < rowCount; i++) {
    Object[] row = new Object[3];
    row[0] = i;
    d = d.plusMinutes(1);
    row[1] = java.sql.Date.valueOf(d.toLocalDate());
    row[2] = DigestUtils.md5Hex(row[1].toString());
    data.add(row);
    data2.add(row);
}
long endTime = System.currentTimeMillis();
// cast to long to avoid int overflow for large row counts
System.out.println("read data execute time:" + (endTime - startTime) + "ms, resultset rows " + rowCount + ", "
        + (long) rowCount * 1000 / (endTime - startTime) + " rows/sec");
// print the column types of the first row
for (Object x : data.get(0)) {
    System.out.println("data type: " + x.getClass() + ", value: " + x);
}
// compare rows column by column, stopping at the first difference
Comparator<Object[]> comparator = new Comparator<Object[]>() {
    @Override
    public int compare(Object[] o1, Object[] o2) {
        int res = 0;
        for (int i = 0; i < o1.length; i++) {
            res = ((Comparable) o1[i]).compareTo(o2[i]);
            if (res != 0) {
                break;
            }
        }
        return res;
    }
};
List<Object[]> list;
List<Object[]> l = Collections.unmodifiableList(data); // unused in this test
startTime = System.currentTimeMillis();
list = data.stream().sorted(comparator).collect(Collectors.toList());
endTime = System.currentTimeMillis();
System.out.println("sorted execute time:" + (endTime - startTime) + "ms, resultset rows " + list.size() + ", "
        + (long) list.size() * 1000 / (endTime - startTime) + " rows/sec");
startTime = System.currentTimeMillis();
list = data2.stream().parallel().sorted(comparator).collect(Collectors.toList());
endTime = System.currentTimeMillis();
System.out.println("parallel sorted execute time:" + (endTime - startTime) + "ms, resultset rows " + list.size()
        + ", " + (long) list.size() * 1000 / (endTime - startTime) + " rows/sec");
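As a side note, the loop-based row comparator can also be written with Comparator chaining; a sketch assuming the three columns are Integer, java.sql.Date, and String, as in my generated data (class and field names are my own):

```java
import java.sql.Date;
import java.util.Comparator;

// Sketch (my naming): the same column-by-column ordering expressed with
// Comparator.comparing / thenComparing instead of a hand-written loop.
public class RowComparators {
    static final Comparator<Object[]> BY_COLUMNS =
            Comparator.comparing((Object[] r) -> (Integer) r[0])
                      .thenComparing((Object[] r) -> (Date) r[1])
                      .thenComparing((Object[] r) -> (String) r[2]);

    public static void main(String[] args) {
        Object[] a = {1, Date.valueOf("2020-09-27"), "aa"};
        Object[] b = {1, Date.valueOf("2020-09-27"), "ab"};
        // first two columns tie, so the String column decides
        System.out.println(BY_COLUMNS.compare(a, b) < 0); // true
    }
}
```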
You can run it in your IDE; it depends on:
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
<version>3.11</version>
</dependency>
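A rough harness sketch along the same lines, in case it helps reproduce the numbers: report the common pool's parallelism and time both sorts over several iterations so the JIT has a chance to warm up (timings are illustrative only; JMH would be more rigorous):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.Collectors;

// Sketch (my naming): times sequential vs. parallel stream sort on random
// Integers, after printing how many workers the common pool actually has.
public class SortTimer {
    public static void main(String[] args) {
        System.out.println("common pool parallelism: "
                + ForkJoinPool.commonPool().getParallelism());

        List<Integer> data = new ArrayList<>();
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        for (int i = 0; i < 1_000_000; i++) {
            data.add(rnd.nextInt());
        }

        Comparator<Integer> cmp = Comparator.naturalOrder();
        for (int iter = 0; iter < 5; iter++) { // early iterations warm the JIT
            long t0 = System.nanoTime();
            data.stream().sorted(cmp).collect(Collectors.toList());
            long t1 = System.nanoTime();
            data.parallelStream().sorted(cmp).collect(Collectors.toList());
            long t2 = System.nanoTime();
            System.out.printf("iter %d: sequential %d ms, parallel %d ms%n",
                    iter, (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
        }
    }
}
```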
Thanks again!
Tao Jin.
On 9/26/20 1:56 AM, Simon Roberts wrote:
> Tests like this are rarely meaningful. In particular you have a random
> number generator in there. They are often built on single threaded
> code protected by mutual exclusion. That of itself prevents the code
> ever going faster than sequential code, and in fact usually makes it
> slower due to the additional overhead of inter-thread communication.
>
> Further, your example seems to be a test of a database and its driver,
> and has no obvious relationship to either the Streams API or anything
> in the core Java libraries. Perhaps I miss something.
>
>
> On Fri, Sep 25, 2020 at 4:09 AM tonytao <tonytao0505 at outlook.com
> <mailto:tonytao0505 at outlook.com>> wrote:
>
> hi,
>
> I wrote a test case to test stream performance, but the parallel sort
> is always slower than the sequential sort. I tested data sizes of
> 20,000, 5,000,000, 10,000,000, and 20,000,000.
>
> Attached is the test-case source code.
>
> jdk version:
>
> openjdk version "11.0.8" 2020-07-14
> OpenJDK Runtime Environment (build 11.0.8+10-post-Debian-1deb10u1)
> OpenJDK 64-Bit Server VM (build 11.0.8+10-post-Debian-1deb10u1, mixed
> mode, sharing)
>
> jvm arguments:
>
> -ea -Xms256m -Xmx8192m
>
> machine:
>
> cpu: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
>
> memory: 16GB
>
> Test result shows as below:
>
> 20000:
>
> sorted execute time:9ms, resultset rows 20000, 2222222 rows/sec
> parallel sorted execute time:24ms, resultset rows 20000, 833333
> rows/sec
>
> 5000000:
>
> sorted execute time:245ms, resultset rows 5000000, 20408163 rows/sec
> parallel sorted execute time:402ms, resultset rows 5000000, 12437810
> rows/sec
>
> 10000000:
>
> sorted execute time:577ms, resultset rows 10000000, 17331022 rows/sec
> parallel sorted execute time:1230ms, resultset rows 10000000, 8130081
> rows/sec
>
> 20000000:
>
> sorted execute time:1079ms, resultset rows 20000000, 18535681 rows/sec
> parallel sorted execute time:1790ms, resultset rows 20000000,
> 11173184
> rows/sec
>
>
> this is the test data sample:
>
> hdb=> select * from testdata limit 10;
>    id    |           uptime           | x  | y  |               cmt
> ---------+----------------------------+----+----+----------------------------------
>  1340417 | 2023-02-22 07:30:34.391207 | 33 |  9 | 4bf16d4c4b638d84b56893de2451c407
>  1340418 | 2023-02-22 07:31:34.391207 | 10 | 91 | c9b78bfbd6b684e62605e96d2d8237a0
>  1340419 | 2023-02-22 07:32:34.391207 | 66 | 24 | 968e5d19ca3a2ddae5d2a366ba06cf16
>  1340420 | 2023-02-22 07:33:34.391207 |  4 | 42 | bdcf7d764121fc9b0039f80eadea1310
>  1340421 | 2023-02-22 07:34:34.391207 | 27 | 45 | 06520ac5e508f15f09672fa751d5c17a
>  1340422 | 2023-02-22 07:35:34.391207 | 36 | 11 | 5bede83b54dfe76f4a249308d8033691
>  1340423 | 2023-02-22 07:36:34.391207 | 41 | 92 | 37f4b34988c0e1387940177a8cc9d83a
>  1340424 | 2023-02-22 07:37:34.391207 | 29 | 59 | 416459b54ae00c95e118c93605a40d43
>  1340425 | 2023-02-22 07:38:34.391207 |  9 | 46 | 46339b8eeae99c7e922003ed87b9d417
>  1340426 | 2023-02-22 07:39:34.391207 | 21 | 29 | 7ede63cdb2a6a86c63534fe5fcfb2f97
> (10 rows)
>
>
> It was generated by sql:
>
> create table testdata (
>     id int,
>     uptime timestamp,
>     x int,
>     y int,
>     cmt text
> );
> insert into testdata
> select
> id,
> uptime,
> round(random()*100),
> round(random()*100),
> md5(uptime::text)
> from (
> select
> generate_series id,
> current_timestamp + make_interval(mins=>
> generate_series) uptime
> from generate_series(1,100000000)
> ) t;
>
>
> Could you please help me to find the problem?
>
> Thanks a lot.
>
>
>
>
>
> --
> Simon Roberts
> (303) 249 3613
>