[Lambda]parallel sort stream slow than series sort

Mon Sep 28 14:46:11 UTC 2020

Hi Tony,

At this point, I would venture

1) This is still the wrong forum. I would think that stackoverflow is a far
more appropriate place for discussions about how to use an api. This one,
as best I understand it, is about *developing* those libraries.

2) Concurrency inevitably *increases* the total amount of CPU labor
required to perform a given piece of work. There's management and
coordination required to do this. If the amount of work done in the "job"
is small enough, you can find that the overhead swamps the work, and things
go badly.

3) Observation 2 is compounded by the fact that there's also
*communication* that has to happen between threads. Streams do a good job
of minimizing this, but the workload must be sharded across the parallel
threads, and then brought back together again. In some situations, this can
create so much serialization that the whole thing slows down. A really
vicious form of this can arise when performing vast amounts of memory
allocation in multiple threads; a TLAB that runs out of capacity has to get
more memory from main heap, and that's (inevitably I think) a critical
section.

4) This is a "micro-benchmark", and they're essentially never meaningful in
Java. At the very least, if you were to observe the JIT compilations (use
-XX:+PrintCompilation) you're almost certain to see that it's still
optimizing the code by the time the sub-second exercise is over. There are
many routes to more valid benchmarks, starting with simply ensuring the
process runs for "long enough" to amortize those costs.

5) If you want to see this thing shine, create a Stream that calculates
points on a Mandelbrot set, then parallelize that. You'll see that there's
nothing wrong with the implementation.

Other than that, I really think you should ask this in stackoverflow, not
here.

On Sun, Sep 27, 2020 at 2:24 AM tonytao <tonytao0505 at outlook.com> wrote:

> Hi Roberts,
>
> Thank for your warmly help.
> I debugged into stream  parallel it seems parallel stream use a
> ArrayListSpliterator split  list into pieces in jdk11 and 14,not a  mutual
> exclusion.Test arrays was retrieved  from postgresql,but all data was ready
> before running test .I rewriteed the data-generate code in java.
>
> hi Vladimir,
>
> I tested in jdk14:
> openjdk version "14.0.2" 2020-07-14
> OpenJDK Runtime Environment (build 14.0.2+12-46)
> OpenJDK 64-Bit Server VM (build 14.0.2+12-46, mixed mode, sharing)
>
> parallel sort still slower than series sort.
>
> sorted execute time:274ms, resultset rows 5000000, 18248175 rows/sec
> parallel sorted execute time:627ms, resultset rows 5000000, 7974481
> rows/sec
>
> hi Roberts & Peter,
>
> Here is the test code:
>
> int rowCount = 5000000;LocalDateTime d = LocalDateTime.now();List<Object[]> data = new ArrayList<>();List<Object[]> data2 = new ArrayList<>();long startTime = System.currentTimeMillis();for (int i = 0; i < rowCount; i++) {
>     Object[] row = new Object[3];
>     row[0] = i;
>     d = d.plusMinutes(1);
>     row[1] = java.sql.Date.valueOf(d.toLocalDate());
>     row[2] = DigestUtils.md5Hex(row[1].toString());
>     data.add(row);
>     data2.add(row);}long endTime = System.currentTimeMillis();System.out.println("read data execute time:" + (endTime - startTime) + "ms, resultset rows " + rowCount + ", "
>         + rowCount * 1000 / (endTime - startTime) + " rows/sec");
> for (int i = 0; i < 1; i++) {
>     for (Object x : data.get(i)) {
>         System.out.println("data type: " + x.getClass() + ", value: " + x);
>     }}
> Comparator<Object[]> comparator = new Comparator<Object[]>() {
>     @Override
>     public int compare(Object[] o1, Object[] o2) {
>         int res = 0;
>         for (int i = 0; i < o1.length; i++) {
>             res = ((Comparable) o1[i]).compareTo(o2[i]);
>             if (res != 0) {
>                 break;
>             }
>         }
>         return res;
>     }};
> List<Object[]> list;List<Object[]> l = Collections.unmodifiableList(data);
>
>
> startTime = System.currentTimeMillis();
> list = data.stream().sorted(comparator).collect(Collectors.toList());
> endTime = System.currentTimeMillis();System.out.println("sorted execute time:" + (endTime - startTime) + "ms, resultset rows " + list.size() + ", "
>         + (long) list.size() * 1000 / (endTime - startTime) + " rows/sec");
>
> startTime = System.currentTimeMillis();
> list = data2.stream().parallel().sorted(comparator).collect(Collectors.toList());
> endTime = System.currentTimeMillis();System.out.println("parallel sorted execute time:" + (endTime - startTime) + "ms, resultset rows " + list.size()
>         + ", " + (long) list.size() * 1000 / (endTime - startTime) + " rows/sec");
>
>
> You could run it in you ide,it depened on:
>         <dependency>
>             <groupId>org.apache.commons</groupId>
>             <artifactId>commons-lang3</artifactId>
>             <version>3.11</version>
>         </dependency>
>
>
> Thanks again!
>
> Tao Jin.
>
>
> On 9/26/20 1:56 AM, Simon Roberts wrote:
>
> Tests like this are rarely meaningful. In particular you have a random
> number generator in there. They are often built on single threaded code
> protected by mutual exclusion. That of itself prevents the code ever going
> faster than sequential code, and in fact usually makes it slower due to the
> additional overhead of inter-thread communication.
>
> Further, your example seems to be a test of a database and its driver, and
> has no obvious relationship to either the Streams API or anything in the
> core Java libraries. Perhaps I miss something.
>
>
> On Fri, Sep 25, 2020 at 4:09 AM tonytao <tonytao0505 at outlook.com> wrote:
>
>> hi,
>>
>> I wrote a test case to test stream performance,but the parallel sort
>> always slow than the series sort.I test the data size in : 20,000 ,
>> 5,000,000, 10,000,000 , 20,000,000 .
>>
>> attatched is the test case source code.
>>
>> jdk version :
>>
>> openjdk version "11.0.8" 2020-07-14
>> OpenJDK Runtime Environment (build 11.0.8+10-post-Debian-1deb10u1)
>> OpenJDK 64-Bit Server VM (build 11.0.8+10-post-Debian-1deb10u1, mixed
>> mode, sharing)
>>
>> jvm argument:
>>
>> -ea -Xms256m -Xmx8192m
>>
>> macheine:
>>
>> cpu:Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
>>
>> memory: 16GB
>>
>> Test  result shows as below:
>>
>> 20000:
>>
>> sorted execute time:9ms, resultset rows 20000, 2222222 rows/sec
>> parallel sorted execute time:24ms, resultset rows 20000, 833333 rows/sec
>>
>> 5000000:
>>
>> sorted execute time:245ms, resultset rows 5000000, 20408163 rows/sec
>> parallel sorted execute time:402ms, resultset rows 5000000, 12437810
>> rows/sec
>>
>> 10000000:
>>
>> sorted execute time:577ms, resultset rows 10000000, 17331022 rows/sec
>> parallel sorted execute time:1230ms, resultset rows 10000000, 8130081
>> rows/sec
>>
>> 20000000:
>>
>> sorted execute time:1079ms, resultset rows 20000000, 18535681 rows/sec
>> parallel sorted execute time:1790ms, resultset rows 20000000, 11173184
>> rows/sec
>>
>>
>> this is the test data sample:
>>
>> hdb=> select * from testdata limit 10;
>>     id    |           uptime           | x  | y  | cmt
>>
>> ---------+----------------------------+----+----+----------------------------------
>>   1340417 | 2023-02-22 07:30:34.391207 | 33 |  9 |
>> 4bf16d4c4b638d84b56893de2451c407
>>   1340418 | 2023-02-22 07:31:34.391207 | 10 | 91 |
>> c9b78bfbd6b684e62605e96d2d8237a0
>>   1340419 | 2023-02-22 07:32:34.391207 | 66 | 24 |
>> 968e5d19ca3a2ddae5d2a366ba06cf16
>>   1340420 | 2023-02-22 07:33:34.391207 |  4 | 42 |
>> bdcf7d764121fc9b0039f80eadea1310
>>   1340421 | 2023-02-22 07:34:34.391207 | 27 | 45 |
>> 06520ac5e508f15f09672fa751d5c17a
>>   1340422 | 2023-02-22 07:35:34.391207 | 36 | 11 |
>> 5bede83b54dfe76f4a249308d8033691
>>   1340423 | 2023-02-22 07:36:34.391207 | 41 | 92 |
>> 37f4b34988c0e1387940177a8cc9d83a
>>   1340424 | 2023-02-22 07:37:34.391207 | 29 | 59 |
>> 416459b54ae00c95e118c93605a40d43
>>   1340425 | 2023-02-22 07:38:34.391207 |  9 | 46 |
>> 46339b8eeae99c7e922003ed87b9d417
>>   1340426 | 2023-02-22 07:39:34.391207 | 21 | 29 |
>> 7ede63cdb2a6a86c63534fe5fcfb2f97
>> (10 rows)
>>
>>
>> It was generated by sql:
>>
>> create  table  testdata(
>>      idint,
>>      uptimetimestamp,
>>      xint,
>>      yint,
>>      cmttext
>> );
>> insert  into  testdata
>>      select
>>          id,
>>          uptime,
>>          round(random()*100),
>>          round(random()*100),
>>          md5(uptime::text)
>>      from  (
>>          select
>>              generate_series id,
>>              current_timestamp  +  make_interval(mins=>
>> generate_series)  uptime
>>          from  generate_series(1,100000000)
>>          )  t;
>>
>>
>> Could you please help me to find the problem?
>>
>> Thanks a lot.
>>
>>
>>
>>
>
> --
> Simon Roberts
> (303) 249 3613
>
>
>

-- 
Simon Roberts
(303) 249 3613