[concurrency-interest] RFR [8011215] optimization of CopyOnWriteArrayList.addIfAbsent()

Doug Lea dl at cs.oswego.edu
Wed Apr 3 10:35:27 UTC 2013


A few quick notes while travelling:

This was designed to perform best in the case of possibly contended
updates when the element is absent, by avoiding retraversal, and
thus minimizing lock hold times in the common case. (When not common,
it can be guarded by a contains check.) However even in this case,
it is possible that a retraversal via arraycopy could be faster
because it can use optimized cheaper writes (fewer card marks).
I'll recheck this on enough machines to make a better assessment
of impact withing a few days.

-Doug


> CPU will predict but compiler may not optimize loop aggressively with the
> branch there; e.g. if you unroll, you'll bloat code size by having a
> branch
> on each element, and then you can hit icache misses.
>
> Arrays.copyOf is intrinsic so should have perf similar to
> System.arrayCopy.
>
> Sent from my phone
> On Apr 2, 2013 7:51 PM, "Stanimir Simeonoff" <stanimir at riflexo.com> wrote:
>
>>
>> On Wed, Apr 3, 2013 at 1:47 AM, Martin Buchholz
>> <martinrb at google.com>wrote:
>>
>>>
>>>
>>>
>>> On Tue, Apr 2, 2013 at 3:45 PM, Ivan Gerasimov
>>> <ivan.gerasimov at oracle.com
>>> > wrote:
>>>
>>>> Thank you, Ulf!
>>>>
>>>>
>>>>  maybe the old code wins for looong arrays, so there could be a
>>>>> threshold to decide between old and new code:
>>>>>
>>>>
>>>> I've modified the benchmark code to test arrays with 90'000 to 100'000
>>>> elements. (Previously was testing 1 to 100 elements.)
>>>> The performance gain turns out to be even more significant.
>>>> On my machine tests show that with that many elements the new code
>>>> runs
>>>> 40% faster.
>>>>
>>>> Honestly, I didn't expect that. I thought my code might be a bit
>>>> slower
>>>> and hoped that not much slower.
>>>>
>>>
>>> Yeah, that's a bit surprising.  Perhaps because you're avoiding the
>>> branch of testing object for null on each iteration?
>>>
>>> The branch would be perfectly predicted by the CPU, so it cannot be
>>> that.
>> System.arraycopy would be much better for larger arrays as it uses SSE
>> extensions to copy but normally COW structures would not have so many
>> elements.
>> btw, the 2 pass version benefits of low memory footprint equals() (and
>> all
>> in L2 cache). Basically if you don't get cache trashing Arrays.copy
>> would
>> be the better one.
>>
>> Stanimir
>>
>>
>>> _______________________________________________
>>> Concurrency-interest mailing list
>>> Concurrency-interest at cs.oswego.edu
>>> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>>>
>>>
>>
>> _______________________________________________
>> Concurrency-interest mailing list
>> Concurrency-interest at cs.oswego.edu
>> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>>
>>
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest at cs.oswego.edu
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>





More information about the core-libs-dev mailing list