[External] : Re: Why is mark read as MO_RELAXED in read_stable_mark()?
Roman Kennke
rkennke at redhat.com
Mon Jul 19 10:05:16 UTC 2021
Hi David,
>>>>> I am messing a little bit with read_stable_mark() in
>>>>> synchronizer.cpp in Lilliput project (because I want to load the
>>>>> Klass* from the header).
>>>>>
>>>>> I notice that, in read_stable_mark(), we are loading the header
>>>>> with MO_RELAXED memory ordering, even though the INFLATING header
>>>>> is stored either via CAS or (re-)stored with MO_RELEASE. Wouldn't
>>>>> it be more consequential to load the header with MO_ACQUIRE instead?
>>>> For the monitor case I think we already have an address dependency
>>>> in the reader so there is no need for the acquire. So we first read
>>>> the monitor address and then use that address to access the _header
>>>> field(FastHashCode() case) or _owner
>>>> field(current_thread_holds_lock() and get_lock_owner() cases). The
>>>> writer in turn orders the stores to _owner/_header and the
>>>> publishing of the monitor address by using a release store in
>>>> inflate() (I think this is the one you are referring to).
>>>>
>>>
>>> Thanks Patricio,
>>>
>>> I am not quite sure that I follow you.
>>>
>>> On the reader side, given an object, when a thread calls into
>>> FastHashCode() or any locking that causes inflation, it needs to read
>>> the object header, and that is always done with a plain read. In
>>> particular, read_stable_mark() does that, while waiting for the
>>> header to become != INFLATING. On the writer side, we use CAS (using
>>> MO_SEQ_CST) to temporarily turn on INFLATING(with release_store() to
>>> write back the original header, and store with MO_RELEASE to write
>>> back the original header.
>>>
>>> It probably doesn't matter all that much.
>>>
>>> I am wondering about this because I am currently moving the Klass*
>>> into the object header, which means that for klass() implementation,
>>> we need to load the header similar to how we do it for hashcode(). In
>>> particular it involves read_stable_mark(). Now, there is a variant of
>>> klass-accessor that implies acquire-semantics:
>>> oopDesc::klass_or_null_acquire(). I am not quite sure why exactly we
>>> need acquire semantics there, but in order to implement it, I believe
>>> we would have to make a variant of read_stable_mark() that uses
>>> acquire semantics to load the header, instead of plain reads.
>>>
>>> But I'm thinking it may be cleaner overall to always use acquire
>>> semantics in read_stable_mark() (mirroring the release-store and
>>> seq-cst-CAS), which would make the acquire-accessors for klass()
>>> superfluous.
>>>
>>> What do you think?
>> It definitely doesn't break correctness if you add it and probably
>> doesn't change anything in terms of performance. Now, looking back in
>> history, klass_or_null_acquire() was added in 8166583 to protect
>> against reading a partially allocated object. The description of the
>> issue actually argues to explicitly have klass_or_null_acquire()
>> instead of making klass_or_null() quietly have acquire semantics. If
>> that is the case then I think you would still need to have a version
>> with a plain load without the extra fence. You could probably have two
>> versions of read_stable_mark() by refactoring everything except the
>> load of the mark and check into a separate shared function. Also by
>> removing the uneeded extra code before the 'for' loop will make them
>> even shorter. Another option, although more conservative in terms of
>> memory ordering on certain cpus, is to keep read_stable_mark() as is
>> and have klass_or_null_acquire() use a standalone
>> OrderAccess::acquire() fence before returning. I think any solution
>> would work, but maybe wait for somebody else to chime in to hear other
>> thoughts.
>
> Looking at the uses of read_stable_mark() I would say that FastHashCode
> and get_lock_owner() should have acquire semantics to match the implied
> release semantics used when there is inflation and those methods have to
> read fields of the monitor. I'm not convinced we can rely on any address
> dependency for semantics in shared code - and it seems too subtle a
> point even if we can.
Ok, I agree.
> The other uses don't need acquire semantics
Why do you think so?
I see a use in current_thread_holds_lock(), and how is that different
from get_lock_owner() ? It also reads a field of the monitor after
loading the header word.
And then I see another use in ::inflate(), which seems to me the primary
use where acquire is needed. It certainly reads from the monitor after
loading the header. I'd actually go a little further here, and also give
acquire semantics to the first plain header-loads in that method.
(In-fact, why not restructure the retry-loop to do read_stable_mark()
once at the loop-entry?)
, so I'd argue for adding a
> memory order parameter to read_stable_mark so that it can be controlled
> explicitly. That memory ordering parameter would then need to be passed
> through to the call to obj->mark() - either as a memory order parameter
> or by calling mark() or a new mark_acquire() (adding mark_acquire()
> seems more in keeping with current oopDesc API design style).
In my local prototype, I have added a template parameter to
read_stable_mark and pass that to (a new) oopDesc::mark() that also
takes the MO template parameter to load the header word.
I filed:
https://bugs.openjdk.java.net/browse/JDK-8270894
Thanks,
Roman
More information about the hotspot-runtime-dev
mailing list