[External] : Re: Why is mark read as MO_RELAXED in read_stable_mark()?

Mon Jul 19 10:05:16 UTC 2021

Hi David,

>>>>> I am messing a little bit with read_stable_mark() in 
>>>>> synchronizer.cpp in Lilliput project (because I want to load the 
>>>>> Klass* from the header).
>>>>>
>>>>> I notice that, in read_stable_mark(), we are loading the header 
>>>>> with MO_RELAXED memory ordering, even though the INFLATING header 
>>>>> is stored either via CAS or (re-)stored with MO_RELEASE. Wouldn't 
>>>>> it be more consequential to load the header with MO_ACQUIRE instead?
>>>> For the monitor case I think we already have an address dependency 
>>>> in the reader so there is no need for the acquire. So we first read 
>>>> the monitor address and then use that address to access the _header 
>>>> field(FastHashCode() case) or _owner 
>>>> field(current_thread_holds_lock() and get_lock_owner() cases). The 
>>>> writer in turn orders the stores to _owner/_header and the 
>>>> publishing of the monitor address by using a release store in 
>>>> inflate() (I think this is the one you are referring to).
>>>>
>>>
>>> Thanks Patricio,
>>>
>>> I am not quite sure that I follow you.
>>>
>>> On the reader side, given an object, when a thread calls into 
>>> FastHashCode() or any locking that causes inflation, it needs to read 
>>> the object header, and that is always done with a plain read. In 
>>> particular, read_stable_mark() does that, while waiting for the 
>>> header to become != INFLATING. On the writer side, we use CAS (using 
>>> MO_SEQ_CST) to temporarily turn on INFLATING(with release_store() to 
>>> write back the original header, and store with MO_RELEASE to write 
>>> back the original header.
>>>
>>> It probably doesn't matter all that much.
>>>
>>> I am wondering about this because I am currently moving the Klass* 
>>> into the object header, which means that for klass() implementation, 
>>> we need to load the header similar to how we do it for hashcode(). In 
>>> particular it involves read_stable_mark(). Now, there is a variant of 
>>> klass-accessor that implies acquire-semantics: 
>>> oopDesc::klass_or_null_acquire(). I am not quite sure why exactly we 
>>> need acquire semantics there, but in order to implement it, I believe 
>>> we would have to make a variant of read_stable_mark() that uses 
>>> acquire semantics to load the header, instead of plain reads.
>>>
>>> But I'm thinking it may be cleaner overall to always use acquire 
>>> semantics in read_stable_mark() (mirroring the release-store and 
>>> seq-cst-CAS), which would make the acquire-accessors for klass() 
>>> superfluous.
>>>
>>> What do you think?
>> It definitely doesn't break correctness if you add it and probably 
>> doesn't change anything in terms of performance. Now, looking back in 
>> history, klass_or_null_acquire() was added in 8166583 to protect 
>> against reading a partially allocated object. The description of the 
>> issue actually argues to explicitly have klass_or_null_acquire() 
>> instead of making klass_or_null() quietly have acquire semantics. If 
>> that is the case then I think you would still need to have a version 
>> with a plain load without the extra fence. You could probably have two 
>> versions of read_stable_mark() by refactoring everything except the 
>> load of the mark and check into a separate shared function. Also by 
>> removing the uneeded extra code before the 'for' loop will make them 
>> even shorter. Another option, although more conservative in terms of 
>> memory ordering on certain cpus, is to keep read_stable_mark() as is 
>> and have klass_or_null_acquire() use a standalone 
>> OrderAccess::acquire() fence before returning. I think any solution 
>> would work, but maybe wait for somebody else to chime in to hear other 
>> thoughts.
> 
> Looking at the uses of read_stable_mark() I would say that FastHashCode 
> and get_lock_owner() should have acquire semantics to match the implied 
> release semantics used when there is inflation and those methods have to 
> read fields of the monitor. I'm not convinced we can rely on any address 
> dependency for semantics in shared code - and it seems too subtle a 
> point even if we can.

Ok, I agree.

> The other uses don't need acquire semantics

Why do you think so?

I see a use in current_thread_holds_lock(), and how is that different 
from get_lock_owner() ? It also reads a field of the monitor after 
loading the header word.

And then I see another use in ::inflate(), which seems to me the primary 
use where acquire is needed. It certainly reads from the monitor after 
loading the header. I'd actually go a little further here, and also give 
acquire semantics to the first plain header-loads in that method. 
(In-fact, why not restructure the retry-loop to do read_stable_mark() 
once at the loop-entry?)

, so I'd argue for adding a
> memory order parameter to read_stable_mark so that it can be controlled 
> explicitly. That memory ordering parameter would then need to be passed 
> through to the call to obj->mark() - either as a memory order parameter 
> or by calling mark() or a new mark_acquire() (adding mark_acquire() 
> seems more in keeping with current oopDesc API design style).

In my local prototype, I have added a template parameter to 
read_stable_mark and pass that to (a new) oopDesc::mark() that also 
takes the MO template parameter to load the header word.

I filed:
https://bugs.openjdk.java.net/browse/JDK-8270894

Thanks,
Roman