RFR: 8186838: Generalize Atomic::inc/dec with templates
coleen.phillimore at oracle.com
coleen.phillimore at oracle.com
Fri Sep 1 15:52:01 UTC 2017
The only Atomic::inc* that I found in product code that wasn't printing
statistics or exception cases was mostly in G1 and one interesting case
in objectMonitor and safepointing, where a lot of other CAS operations
already have been done. I'm willing to bet this platform specific
optimization has no value. I would vote removal, pending examination
of these places.
share/vm/gc/g1/dirtyCardQueue.cpp
if (result) {
assert_fully_consumed(node, buffer_size());
Atomic::inc(&_processed_buffers_mut);
}
...
Atomic::inc(&_processed_buffers_rs_thread);*
*
share/vm/gc/g1/heapRegionRemSet.cpp
Atomic::inc(&_occupied);
Atomic::inc(&_n_coarsenings);
share/vm/runtime/objectMonitor.cpp
ObjectMonitor::enter()
// Prevent deflation at STW-time. See deflate_idle_monitors() and
is_busy().
// Ensure the object-monitor relationship remains stable while
there's contention.
Atomic::inc(&_count);
share/vm/runtime/safepoint.cpp
if (is_synchronizing()) {
Atomic::inc (&TryingToBlock) ;
}
share/vm/code/nmethod.cpp
nmethodLocker
Atomic::inc(&nm->_lock_count);
Coleen
On 9/1/17 11:23 AM, Erik Österlund wrote:
> Hi Coleen,
>
> On 2017-09-01 16:42, coleen.phillimore at oracle.com wrote:
>>
>>
>> On 9/1/17 10:15 AM, Erik Österlund wrote:
>>> Hi Andrew,
>>>
>>> On 2017-09-01 15:41, Andrew Haley wrote:
>>>> On 31/08/17 13:45, Erik Österlund wrote:
>>>>> Hi everyone,
>>>>>
>>>>> Bug ID:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8186838
>>>>>
>>>>> Webrev:
>>>>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/
>>>>>
>>>>> The time has come for the next step in generalizing Atomic with
>>>>> templates. Today I will focus on Atomic::inc/dec.
>>>>>
>>>>> I have tried to mimic the new Kim style that seems to have been
>>>>> universally accepted. Like Atomic::add and Atomic::cmpxchg, the
>>>>> structure looks like this:
>>>>>
>>>>> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function object
>>>>> that performs some basic type checks.
>>>>> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can
>>>>> define
>>>>> the operation arbitrarily for a given platform. The default
>>>>> implementation if not specialized for a platform is to call
>>>>> Atomic::add.
>>>>> So only platforms that want to do something different than that as an
>>>>> optimization have to provide a specialization.
>>>>> Layer 3) Platforms that decide to specialize
>>>>> PlatformInc/PlatformDec to
>>>>> be more optimized may inherit from a helper class
>>>>> IncUsingConstant/DecUsingConstant. This helper helps performing the
>>>>> necessary computation what the increment/decrement should be after
>>>>> pointer scaling using CRTP. The PlatformInc/PlatformDec operation
>>>>> then
>>>>> only needs to define an inc/dec member function, and will then get
>>>>> all
>>>>> the context information necessary to generate a more optimized
>>>>> implementation. Easy peasy.
>>>> I wanted to say something nice, but I honestly can't. I am dismayed.
>>>
>>> Okay.
>>>
>>>> I hoped that inc/dec would turn out to be much simpler than the
>>>> cmpxchg functions: I think they should, because they don't have to
>>>> deal with the complexity of potentially three different types.
>>>> Instead we have, again, a large and complex patch.
>>>>
>>>> Even on AArch64, which should be the simplest case because Atomic::inc
>>>> can be defined as
>>>>
>>>> template <typename T1>
>>>> inc(T1 *dest) {
>>>> return __sync_add_and_fetch(dest, 1);
>>>> }
>>>
>>> AArch64 is indeed the simplest case. It does not have a
>>> specialization in my patch. It simply expresses Atomic::inc in terms
>>> of Atomic::add.
>>>
>>>> or something similar, we have
>>>>
>>>> Atomic::inc<int>
>>>> Atomic::IncImpl<int, void>::operator()
>>>> Atomic::PlatformInc<4ul, IntegralConstant<int, 1> >::operator()
>>>> Atomic::add<int, int>
>>>> Atomic::AddImpl<int, int, void>::operator()
>>>> Atomic::AddAndFetch<Atomic::PlatformAdd<4ul> >::operator()<int, int>
>>>> Atomic::PlatformAdd<4ul>::add_and_fetch<int, int>
>>>> __sync_add_and_fetch
>>>>
>>>> I quite understand that it isn't so easy on some systems, and they
>>>> need a generic form that explodes into four different calls, one for
>>>> each size of integer. I completely accept that it will be more
>>>> complex for everything else. But is it necessary to have so much code
>>>> for something so simple? This is a 1400 line patch. Granted, much of
>>>> it is simply moving stuff around, but despite the potential of
>>>> template code to simplify the implementation we have a more complex
>>>> solution than we had before.
>>>>
>>>> I ask you, is this the simplest solution that you believe is possible?
>>>
>>> It is not the simplest solution I can think of. The simplest
>>> solution I can think of is to remove all specialized versions of
>>> Atomic::inc/dec and just have it call Atomic::add directly. That
>>> would remove the optimizations we have today, for whatever reason we
>>> have them. It would lead to slightly more conservative fencing on
>>> PPC/S390, and would lead to slightly less optimal machine encoding
>>> on x86 (without immediate values in the instructions). But it would
>>> be simpler for sure. I did not put any judgement into whether our
>>> existing optimizations are worthwhile or not. But if you want to
>>> prioritize simplicity, removing those optimizations is one possible
>>> solution. Would you prefer that?
>>
>> I wonder if you could remove the linux x86 asm code for inc/dec,
>> recode it to use add, and do a dev submit run against your patch?
>> While we're discussing this.
>
> Okay, I will try that.
>
> /Erik
>
>> thanks,
>> Coleen
>>
>>>
>>> Thanks,
>>> /Erik
>>
>
More information about the hotspot-dev
mailing list