RFR 8243491: Implementation of Foreign-Memory Access API (Second Incubator)

Thu Apr 30 00:07:56 UTC 2020

On 4/29/20 10:23 PM, Maurizio Cimadamore wrote:
> Many thanks Peter,
> my preference would be for adding the benchmark now, and come back to 
> fix this post integration. The MemoryScope code is finicky and getting 
> confidence that the code is race-free takes time and I'd prefer not to 
> change that during the course of this RFR.
>
> Your approach seems promising, so let's keep working and thinking on 
> it on the side (and perhaps let's maybe go for it in the panama repo 
> first, to make sure we don't add regressions).
>
> Sounds like a plan?

I agree. There's still plenty of time until 15 ships...

Regards, Peter

>
> Cheers
> Maurizio
>
> On 29/04/2020 20:19, Peter Levart wrote:
>> Hi Maurizio,
>>
>> On 4/29/20 2:41 AM, Maurizio Cimadamore wrote:
>>> The current implementation has performances that are on par with the 
>>> previous acquire-based implementation, and also on par with what can 
>>> be achieved with Unsafe. We do have a micro benchmark in the patch 
>>> (see ParallelSum (**)) which tests this, and I get _identical_ 
>>> numbers even if I _comment_ the body of acquire/release - so that no 
>>> contention can happen; so, I'm a bit skeptical overall that 
>>> contention on acquire/release is the main factor at play here - but 
>>> perhaps we need more targeted benchmarks. 
>>
>> So I modified your benchmark (just took out the relevant parts) and 
>> added some benchmarks that exhibit Stream.findAny() and 
>> Stream.findFirst(). As I anticipated, the results for parallel stream 
>> variants were slowing the benchmark down, so I had to reduce the 
>> number of elements by a factor of 16 to get results in reasonable time:
>>
>> http://cr.openjdk.java.net/~plevart/jdk-dev/8243491_MemoryScope/ParallelSum.java 
>>
>>
>> I then swapped-in the alternative implementation of MemoryScope (note 
>> that this is not a whole implementation - the dup() method is missing):
>>
>> http://cr.openjdk.java.net/~plevart/jdk-dev/8243491_MemoryScope/MemoryScope.java 
>>
>>
>> ...and I got these results:
>>
>> i7 2600K (4 cores / 8 threads)
>>
>> with proposed MemoryScope:
>>
>> Benchmark                               Mode  Cnt Score Error Units
>> ParallelSum.find_any_stream_parallel    avgt   10  1332.687 ± 
>> 733.535  ms/op
>> ParallelSum.find_any_stream_serial      avgt   10   440.260 ± 3.110  
>> ms/op
>> ParallelSum.find_first_loop_serial      avgt   10     5.809 ± 0.044  
>> ms/op
>> ParallelSum.find_first_stream_parallel  avgt   10  2070.318 ± 41.072  
>> ms/op
>> ParallelSum.find_first_stream_serial    avgt   10   440.034 ± 4.672  
>> ms/op
>> ParallelSum.sum_loop_serial             avgt   10     5.647 ± 0.055  
>> ms/op
>> ParallelSum.sum_stream_parallel         avgt   10     5.314 ± 0.294  
>> ms/op
>> ParallelSum.sum_stream_serial           avgt   10    19.179 ± 0.136  
>> ms/op
>>
>> with alternative MemoryScope:
>>
>> Benchmark                               Mode  Cnt Score    Error Units
>> ParallelSum.find_any_stream_parallel    avgt   10   80.280 ± 13.183  
>> ms/op
>> ParallelSum.find_any_stream_serial      avgt   10  317.388 ± 2.787  
>> ms/op
>> ParallelSum.find_first_loop_serial      avgt   10    5.790 ± 0.038  
>> ms/op
>> ParallelSum.find_first_stream_parallel  avgt   10  117.925 ± 1.747  
>> ms/op
>> ParallelSum.find_first_stream_serial    avgt   10  315.076 ± 5.725  
>> ms/op
>> ParallelSum.sum_loop_serial             avgt   10    5.652 ± 0.042  
>> ms/op
>> ParallelSum.sum_stream_parallel         avgt   10    4.881 ± 0.053  
>> ms/op
>> ParallelSum.sum_stream_serial           avgt   10   19.143 ± 0.035  
>> ms/op
>>
>>
>> So here it is. The proof that contention does occur.
>>
>> Regards, Peter
>>