RFR 8243491: Implementation of Foreign-Memory Access API (Second Incubator)
Peter Levart
peter.levart at gmail.com
Thu Apr 30 00:07:56 UTC 2020
On 4/29/20 10:23 PM, Maurizio Cimadamore wrote:
> Many thanks Peter,
> my preference would be to add the benchmark now and come back to fix
> this after integration. The MemoryScope code is finicky, and gaining
> confidence that it is race-free takes time, so I'd prefer not to
> change it during the course of this RFR.
>
> Your approach seems promising, so let's keep working on it and
> thinking about it on the side (and perhaps try it in the panama repo
> first, to make sure we don't add regressions).
>
> Sounds like a plan?
I agree. There's still plenty of time until 15 ships...
Regards, Peter
>
> Cheers
> Maurizio
>
> On 29/04/2020 20:19, Peter Levart wrote:
>> Hi Maurizio,
>>
>> On 4/29/20 2:41 AM, Maurizio Cimadamore wrote:
>>> The current implementation performs on par with the previous
>>> acquire-based implementation, and also on par with what can be
>>> achieved with Unsafe. We do have a micro benchmark in the patch
>>> (see ParallelSum (**)) which tests this, and I get _identical_
>>> numbers even if I _comment out_ the body of acquire/release, so that
>>> no contention can happen. So I'm a bit skeptical overall that
>>> contention on acquire/release is the main factor at play here, but
>>> perhaps we need more targeted benchmarks.
>>
>> So I modified your benchmark (keeping just the relevant parts) and
>> added some benchmarks that exercise Stream.findAny() and
>> Stream.findFirst(). As I anticipated, the parallel stream variants of
>> these were very slow, so I had to reduce the number of elements by a
>> factor of 16 to get results in reasonable time:
>>
>> http://cr.openjdk.java.net/~plevart/jdk-dev/8243491_MemoryScope/ParallelSum.java
>>
>>
>> I then swapped-in the alternative implementation of MemoryScope (note
>> that this is not a whole implementation - the dup() method is missing):
>>
>> http://cr.openjdk.java.net/~plevart/jdk-dev/8243491_MemoryScope/MemoryScope.java
>>
>>
>> ...and I got these results:
>>
>> i7 2600K (4 cores / 8 threads)
>>
>> with proposed MemoryScope:
>>
>> Benchmark                               Mode  Cnt     Score     Error  Units
>> ParallelSum.find_any_stream_parallel    avgt   10  1332.687 ± 733.535  ms/op
>> ParallelSum.find_any_stream_serial      avgt   10   440.260 ±   3.110  ms/op
>> ParallelSum.find_first_loop_serial      avgt   10     5.809 ±   0.044  ms/op
>> ParallelSum.find_first_stream_parallel  avgt   10  2070.318 ±  41.072  ms/op
>> ParallelSum.find_first_stream_serial    avgt   10   440.034 ±   4.672  ms/op
>> ParallelSum.sum_loop_serial             avgt   10     5.647 ±   0.055  ms/op
>> ParallelSum.sum_stream_parallel         avgt   10     5.314 ±   0.294  ms/op
>> ParallelSum.sum_stream_serial           avgt   10    19.179 ±   0.136  ms/op
>>
>> with alternative MemoryScope:
>>
>> Benchmark                               Mode  Cnt     Score     Error  Units
>> ParallelSum.find_any_stream_parallel    avgt   10    80.280 ±  13.183  ms/op
>> ParallelSum.find_any_stream_serial      avgt   10   317.388 ±   2.787  ms/op
>> ParallelSum.find_first_loop_serial      avgt   10     5.790 ±   0.038  ms/op
>> ParallelSum.find_first_stream_parallel  avgt   10   117.925 ±   1.747  ms/op
>> ParallelSum.find_first_stream_serial    avgt   10   315.076 ±   5.725  ms/op
>> ParallelSum.sum_loop_serial             avgt   10     5.652 ±   0.042  ms/op
>> ParallelSum.sum_stream_parallel         avgt   10     4.881 ±   0.053  ms/op
>> ParallelSum.sum_stream_serial           avgt   10    19.143 ±   0.035  ms/op
>>
>>
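>> So here it is: the proof that contention does occur. With the
>> alternative MemoryScope, find_any_stream_parallel drops from about
>> 1333 ms/op to about 80 ms/op and find_first_stream_parallel from about
>> 2070 ms/op to about 118 ms/op, while the sum_* results are
>> essentially unchanged.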
>>
>> Regards, Peter
>>
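
For readers wondering why findAny()/findFirst() could stress acquire/release so much more than sum(): short-circuiting terminal operations drive the source spliterator through tryAdvance() one element at a time, while sum() traverses each leaf with a single forEachRemaining() call. The following is only a minimal sketch of that effect, not the MemoryScope code from the patch. CountingScope and GuardedSpliterator are hypothetical names, and it is an assumption here that the scope is acquired once per traversal call.

```java
import java.util.Spliterator;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.IntConsumer;
import java.util.stream.StreamSupport;

public class ScopeContentionSketch {

    /** Hypothetical stand-in for a scope guarded by shared atomic state. */
    static final class CountingScope {
        final AtomicLong acquires = new AtomicLong(); // total acquire() calls
        final AtomicLong active = new AtomicLong();   // acquisitions currently held

        void acquire() { active.incrementAndGet(); acquires.incrementAndGet(); }
        void release() { active.decrementAndGet(); }
    }

    /** Spliterator over an int[] that acquires the scope around each traversal call. */
    static final class GuardedSpliterator implements Spliterator.OfInt {
        private final int[] data;
        private final int fence;
        private final CountingScope scope;
        private int index;

        GuardedSpliterator(int[] data, int origin, int fence, CountingScope scope) {
            this.data = data;
            this.index = origin;
            this.fence = fence;
            this.scope = scope;
        }

        @Override
        public boolean tryAdvance(IntConsumer action) {
            if (index >= fence) return false;
            scope.acquire();                       // one acquire per element
            try { action.accept(data[index++]); } finally { scope.release(); }
            return true;
        }

        @Override
        public void forEachRemaining(IntConsumer action) {
            scope.acquire();                       // one acquire per bulk traversal
            try {
                while (index < fence) action.accept(data[index++]);
            } finally {
                scope.release();
            }
        }

        @Override
        public Spliterator.OfInt trySplit() {
            int mid = (index + fence) >>> 1;
            if (mid <= index) return null;
            GuardedSpliterator prefix = new GuardedSpliterator(data, index, mid, scope);
            index = mid;
            return prefix;
        }

        @Override public long estimateSize() { return fence - index; }
        @Override public int characteristics() { return ORDERED | SIZED | SUBSIZED | IMMUTABLE; }
    }

    /** Number of acquire() calls made by a parallel sum() over the data. */
    static long acquiresForSum(int[] data) {
        CountingScope scope = new CountingScope();
        StreamSupport.intStream(new GuardedSpliterator(data, 0, data.length, scope), true).sum();
        return scope.acquires.get();
    }

    /** Number of acquire() calls made by a parallel findAny() that never matches. */
    static long acquiresForFindAny(int[] data) {
        CountingScope scope = new CountingScope();
        StreamSupport.intStream(new GuardedSpliterator(data, 0, data.length, scope), true)
                .filter(x -> x < 0)                // no match, so every element is visited
                .findAny();
        return scope.acquires.get();
    }

    public static void main(String[] args) {
        int[] data = new int[1 << 16];             // all zeros
        System.out.println("sum acquires:     " + acquiresForSum(data));
        System.out.println("findAny acquires: " + acquiresForFindAny(data));
    }
}
```

In this sketch findAny() acquires once per element, while sum() acquires once per leaf task, so any cost attached to acquire/release is multiplied accordingly in the short-circuiting case. It only counts calls and does not measure contention itself. For timings, the JMH benchmark linked above is the thing to run.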