RFR 8243491: Implementation of Foreign-Memory Access API (Second Incubator)
Peter Levart
peter.levart at gmail.com
Wed Apr 29 19:19:40 UTC 2020
Hi Maurizio,
On 4/29/20 2:41 AM, Maurizio Cimadamore wrote:
> The current implementation has performances that are on par with the
> previous acquire-based implementation, and also on par with what can
> be achieved with Unsafe. We do have a micro benchmark in the patch
> (see ParallelSum (**)) which tests this, and I get _identical_ numbers
> even if I _comment_ the body of acquire/release - so that no
> contention can happen; so, I'm a bit skeptical overall that contention
> on acquire/release is the main factor at play here - but perhaps we
> need more targeted benchmarks.
So I modified your benchmark (just took out the relevant parts) and
added some benchmarks that exhibit Stream.findAny() and
Stream.findFirst(). As I anticipated, the results for parallel stream
variants were slowing the benchmark down, so I had to reduce the number
of elements by a factor of 16 to get results in reasonable time:
http://cr.openjdk.java.net/~plevart/jdk-dev/8243491_MemoryScope/ParallelSum.java
I then swapped-in the alternative implementation of MemoryScope (note
that this is not a whole implementation - the dup() method is missing):
http://cr.openjdk.java.net/~plevart/jdk-dev/8243491_MemoryScope/MemoryScope.java
...and I got these results:
i7 2600K (4 cores / 8 threads)
with proposed MemoryScope:
Benchmark Mode Cnt Score Error Units
ParallelSum.find_any_stream_parallel avgt 10 1332.687 ± 733.535 ms/op
ParallelSum.find_any_stream_serial avgt 10 440.260 ± 3.110 ms/op
ParallelSum.find_first_loop_serial avgt 10 5.809 ± 0.044 ms/op
ParallelSum.find_first_stream_parallel avgt 10 2070.318 ± 41.072 ms/op
ParallelSum.find_first_stream_serial avgt 10 440.034 ± 4.672 ms/op
ParallelSum.sum_loop_serial avgt 10 5.647 ± 0.055 ms/op
ParallelSum.sum_stream_parallel avgt 10 5.314 ± 0.294 ms/op
ParallelSum.sum_stream_serial avgt 10 19.179 ± 0.136 ms/op
with alternative MemoryScope:
Benchmark Mode Cnt Score Error Units
ParallelSum.find_any_stream_parallel avgt 10 80.280 ± 13.183 ms/op
ParallelSum.find_any_stream_serial avgt 10 317.388 ± 2.787 ms/op
ParallelSum.find_first_loop_serial avgt 10 5.790 ± 0.038 ms/op
ParallelSum.find_first_stream_parallel avgt 10 117.925 ± 1.747 ms/op
ParallelSum.find_first_stream_serial avgt 10 315.076 ± 5.725 ms/op
ParallelSum.sum_loop_serial avgt 10 5.652 ± 0.042 ms/op
ParallelSum.sum_stream_parallel avgt 10 4.881 ± 0.053 ms/op
ParallelSum.sum_stream_serial avgt 10 19.143 ± 0.035 ms/op
So here it is. The proof that contention does occur.
Regards, Peter
More information about the core-libs-dev
mailing list