RFR: 8199417: Modularize interpreter GC barriers
Erik Österlund
erik.osterlund at oracle.com
Wed Apr 4 21:03:52 UTC 2018
Hi Coleen,
On 2018-04-04 22:05, coleen.phillimore at oracle.com wrote:
>
> Thank you for keeping load_heap_oop in the macroAssembler, I like it
> and think it should be consistent with the rest of the platforms.
I'm glad you liked it.
> http://cr.openjdk.java.net/~eosterlund/8199417/webrev.01/src/hotspot/cpu/x86/templateTable_x86.cpp.udiff.html
>
>
> + do_oop_store(_masm, Address(rdx, 0), rax, IN_HEAP_ARRAY);
>
>
> What happens if someone misses IN_HEAP_ARRAY and puts IN_HEAP instead?
Then we will miss performing precise card marking, which we expect
should happen when either 1) storing into an array, or 2) store using
unsafe, which in turn could be into an array.
> http://cr.openjdk.java.net/~eosterlund/8199417/webrev.01/src/hotspot/cpu/aarch64/templateTable_aarch64.cpp.udiff.html
>
>
> + do_oop_store(_masm, element_address, noreg, IN_HEAP | IN_HEAP_ARRAY);
>
>
> Here it has both for aarch64.
do_oop_store in the templateTable code is always IN_HEAP. So that seems
redundant.
> I reviewed the x86 interpreter code, and will review the rest when/if
> people say that load_heap_oop is good.
Okay, thanks. In that case, I will start to polish the SPARC code with
similar wrappers meanwhile.
Thanks,
/Erik
> Thanks,
> Coleen
>
>
> On 4/4/18 9:38 AM, Erik Österlund wrote:
>> Hi Kim,
>>
>> Thank you for reviewing this.
>>
>> I have made a new webrev with proposed changes to the x86/x64 code
>> reflecting the concerns you and Coleen have.
>> I thought we should settle for something we like in the x86/x64 code
>> before going ahead and changing the other platforms too much (I don't
>> want to bug Martin and Roman too much before).
>>
>> New full webrev:
>> http://cr.openjdk.java.net/~eosterlund/8199417/webrev.01/
>>
>> Incremental:
>> http://cr.openjdk.java.net/~eosterlund/8199417/webrev.00_01/
>>
>> On 2018-04-01 05:12, Kim Barrett wrote:
>>>> On Mar 26, 2018, at 5:31 AM, Erik Österlund
>>>> <erik.osterlund at oracle.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> The GC barriers for the interpreter are not as modular as they
>>>> could be. They currently use switch statements to check which GC
>>>> barrier set is being used, and call this or that barrier based on
>>>> that, in a way that assumes GCs only use write barriers.
>>>>
>>>> This patch modularizes this by generating accesses in the
>>>> interpreter with declarative semantics. Accesses to the heap may
>>>> now use store_at and load_at functions of the BarrierSetAssembler,
>>>> passing along appropriate arguments and decorators. Each concrete
>>>> BarrierSetAssembler can override the access completely or sprinkle
>>>> some appropriate GC barriers as necessary.
>>>>
>>>> Big thanks go to Martin Doerr and Roman Kennke, who helped plugging
>>>> this into S390, PPC and AArch64 respectively.
>>>>
>>>> Webrev:
>>>> http://cr.openjdk.java.net/~eosterlund/8199417/webrev.00/
>>>>
>>>> Bug:
>>>> https://bugs.openjdk.java.net/browse/JDK-8199417
>>>>
>>>> Thanks,
>>>> /Erik
>>> I've only looked at the x86 and generic code so far. I have some
>>> comments, but also a bunch of questions. I think I'm missing some
>>> things somewhere. I'm sending what I've got so far anyway, and will
>>> continue studying.
>>>
>>> ------------------------------------------------------------------------------
>>>
>>> src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp
>>> 354 // flatten object address if needed
>>> 355 // We do it regardless of precise because we need the registers
>>>
>>> There is no "precise" in this context.
>>
>> I thought "precise" was referring to the concept of precise card
>> marking, as opposed to the variable 'precise' that used to exist. I
>> have reworded the comment to reflect that.
>>
>>> ------------------------------------------------------------------------------
>>>
>>> src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp
>>> 364 #ifndef _LP64
>>> 365 InterpreterMacroAssembler *imasm =
>>> static_cast<InterpreterMacroAssembler*>(masm);
>>> 366 #endif
>>>
>>> In the old code, the function received an InterpreterMacroAssembler*.
>>>
>>> What is there about this context that lets us know the downcast is
>>> safe here? Is oop_store_at only ever called with an IMA*? If so,
>>> then why isn't that the parameter type. If not, then what?
>>
>> Yes, this is indeed only used by the InterpreterMacroAssembler. But I
>> wanted the interface to use MacroAssembler. Why?
>>
>> 1) It's only 32 bit x86 that relies on this property, and I hope it
>> will go away soon, and the save bcp hack with it.
>> 2) previous load_heap_oop is used not only in the
>> InterpreterMacroAssembler, and I wanted the same assembler in load_at
>> and store_at.
>>
>> So for now it is only used in InterpreterMacroAssembler, and I doubt
>> that will change any time soon. I am hoping 32 bit support will be
>> dropped before that, and the hack will go away. For now, I have added
>> a virtual is_interpreter_masm() call that I use in an assert to make
>> sure this is not accidentally used in the wrong context until 32 bit
>> support is gone.
>>
>>> ------------------------------------------------------------------------------
>>>
>>> src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp
>>> 753 bs->load_at(_masm, IN_HEAP | ON_WEAK_OOP_REF, T_OBJECT,
>>> 754 rax, field_address, /*tmp1*/ rbx, /*tmp_thread*/
>>> rdx);
>>>
>>> This needs to distinguish between WEAK and PHANTOM references, else it
>>> will break ZGC's concurrent reference processing. (At least so long as
>>> we still have to deal with finalization.)
>>
>> It does. This is ON_WEAK_OOP_REF. The get intrinsic is generated for
>> WeakReference (and SoftReference) only. PhantomReference does not
>> have a getter. Therefore, in all contexts this is intrinsified, it
>> will refer to an ON_WEAK_OOP_REF.
>>
>>> ------------------------------------------------------------------------------
>>>
>>> src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp
>>> 698 address
>>> TemplateInterpreterGenerator::generate_Reference_get_entry(void) {
>>>
>>> Almost all the code here (and much of the commentary) was/is specific
>>> to G1 (or SATB collectors). Shouldn't it all be in the barrier_set's
>>> load_at? As it is now, if (say) we're using Parallel GC then I think
>>> there's a bunch of additional code being generated and executed here
>>> that wasn't present before.
>>
>> It is true that interpreter Reference.get() intrinsics with
>> Parallel/CMS/Serial currently run a few instructions more than they
>> need to pay for the abstraction. There is no doubt in my mind that
>> this abstraction is worth the cost. Reference.get in the interpreter
>> is not a hot path, and does not need to optimize every cycle.
>>
>> I changed the G1 comments to more general comments instead.
>>
>>> ------------------------------------------------------------------------------
>>>
>>> src/hotspot/cpu/x86/methodHandles_x86.cpp
>>> 170 __ load_heap_oop(method_temp, Address(recv,
>>> NONZERO(java_lang_invoke_MethodHandle::form_offset_in_bytes())));
>>> =>
>>> 175 bs->load_at(_masm, IN_HEAP, T_OBJECT, method_temp,
>>> Address(recv,
>>> NONZERO(java_lang_invoke_MethodHandle::form_offset_in_bytes())), t
>>>
>>> This doesn't look like an improvement in code readability to me. Why
>>> can't bs->load_at be the implementation of "__ load_heap_oop" ?
>>>
>>> And the lines are getting really long; so long that webrev is cutting
>>> off the right end, so I can't review the new versions of the lines.
>>
>> Due to popular demand, I have done what you (and Coleen) both
>> proposed: retained the load_heap_oop abstraction.
>>
>> Before:
>> * load_heap_oop was the "raw" variant, similar to
>> oopDesc::load_decode_heap_oop, which no longer exists.
>>
>> Now:
>> * load_heap_oop calls MacroAssembler::access_load_at
>> * access_load_at grabs the active BarrierSetAssembler and calls
>> load_at on it.
>> ... same idea for stores.
>> * What used to be the raw versions for oop loads and stores, has
>> moved to BarrierSetAssembler::load_at/store_at.
>>
>> These accessors now optionally take temporary register and decorator
>> parameters than you don't have to supply:
>> * supplying temprary register is polite to the GC so its accesses can
>> be a bit better
>> * supplying decorators can be done if it is not a default access. For
>> example, AS_RAW can now be used to perform a raw load instead.
>>
>>> ------------------------------------------------------------------------------
>>>
>>> src/hotspot/cpu/x86/methodHandles_x86.cpp
>>> 363 __ load_heap_oop(temp2_defc, member_clazz);
>>> =>
>>> 368 BarrierSetAssembler* bs =
>>> BarrierSet::barrier_set()->barrier_set_assembler();
>>> 369 Register rthread = LP64_ONLY(r15_thread) NOT_LP64(noreg);
>>> 370 bs->load_at(_masm, IN_HEAP, T_OBJECT, temp2_defc,
>>> member_clazz, noreg, rthread);
>>>
>>> Similarly, this doesn't seem like an improvement to me. And there are
>>> more like this.
>>
>> Fixed.
>>
>>> ------------------------------------------------------------------------------
>>>
>>> src/hotspot/cpu/x86/macroAssembler_x86.hpp
>>>
>>> It would reduce the clutter in this change to leave the as_Address
>>> declarations where they were, since that place is now public, rather
>>> than moving them to a different public section.
>>
>> Fixed.
>>
>>> ------------------------------------------------------------------------------
>>>
>>>
>>> Generally, it seems like it might be nice for the MacroAssembler class
>>> to have a reference to the barrier_set, or maybe even the bs's
>>> assembler, rather than looking them up in various places, and in
>>> different ways. For example, I see each of
>>>
>>> Universe::heap()->barrier_set()
>>> BarrerSet::barrier_set()
>>>
>>> used in various different places *in new code*. Some consistency
>>> would be nice.
>>
>> I think it seems fine to just fetch them only in the
>> access_load/store_at wrappers.
>>
>>> ------------------------------------------------------------------------------
>>>
>>> src/hotspot/cpu/x86/macroAssembler_x86.cpp
>>> 5249 bs->load_at(this, IN_ROOT | ON_PHANTOM_OOP_REF, T_OBJECT,
>>> 5250 value, Address(value,
>>> -JNIHandles::weak_tag_value), tmp, thread);
>>> 5251 verify_oop(value);
>>>
>>> This puts the verify_oop after the G1 pre-barrier, where in the old
>>> code verify_oop was before the pre-barrier. I remember the old order
>>> being intentional, though don't off-hand recall how important that
>>> was. If for no other reason, it seems like the new order could rarely
>>> (due to bad timing) have a bad oop reported somewhere far away during
>>> SATB buffer processing, rather than at the point of the load.
>>>
>>> With the new version's order there could be one verify after the done
>>> label. That's just a minor nit.
>>
>> Conversely, I think the new order is better and more GC agnostic.
>> It avoids, for example, sending in bad oops for verification with
>> ZGC. The more GC-agnostic interface is that after the load_at
>>
>>> ------------------------------------------------------------------------------
>>>
>>> src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp
>>> 230 #ifdef _LP64
>>> 231 if (c_rarg1 != thread) {
>>> 232 __ mov(c_rarg1, thread);
>>> 233 }
>>> 234 if (c_rarg0 != pre_val) {
>>> 235 __ mov(c_rarg0, pre_val);
>>> 236 }
>>> 237 #else
>>> 238 __ push(thread);
>>> 239 __ push(pre_val);
>>> 240 #endif
>>>
>>> Old code used:
>>>
>>> __ pass_arg1(thread);
>>> __ pass_arg0(pre_val);
>>>
>>> Why is this being changed? Oh, pass_argN are static helpers not
>>> available outside macroAssembler_x86.cpp. Maybe they should be
>>> exposed in some way, like public helpers in the x86 version of the
>>> MacroAssembler class, rather than copied inline here (and maybe other
>>> places?).
>>
>> Fixed. (added a wrapper for this to MacroAssembler, as suggested)
>>
>>> ------------------------------------------------------------------------------
>>>
>>> src/hotspot/cpu/x86/interp_masm_x86.cpp
>>> Deleted these lines:
>>> 519 // The resulting oop is null if the reference is not yet
>>> resolved.
>>> 520 // It is Universe::the_null_sentinel() if the reference
>>> resolved to NULL via condy.
>>>
>>> I don't think those comment lines should be deleted.
>>
>> Fixed.
>>
>>> ------------------------------------------------------------------------------
>>>
>>> src/hotspot/cpu/x86/gc/shared/modRefBarrierSetAssembler_x86.cpp
>>> 86 void ModRefBarrierSetAssembler::store_at(MacroAssembler* masm,
>>> DecoratorSet decorators, BasicType type,
>>> 87 Address dst, Register val,
>>> Register tmp1, Register tmp2) {
>>> 88 if (type == T_OBJECT || type == T_ARRAY) {
>>> 89 oop_store_at(masm, decorators, type, dst, val, tmp1, tmp2);
>>> 90 } else {
>>> 91 BarrierSetAssembler::store_at(masm, decorators, type, dst,
>>> val, tmp1, tmp2);
>>> 92 }
>>> 93 }
>>>
>>> Doesn't the conditional conversion from store_at to oop_store_at
>>> actually indicate a bug in the caller? That is, calling store_at with
>>> a oop (T_OBJECT or T_ARRAY) value seems like a bug, and should be
>>> asserted against, rather than translating.
>>>
>>> And oop_store_at should assert that the value *is* an oop value.
>>>
>>> And should there be any verification that the decorators and the type
>>> are consistent?
>>>
>>> But maybe I'm not understanding the protocol around oop_store_at?
>>> There being no documentation, it seems entirely possible that I've
>>> misunderstood something here. Maybe I'm being confused by
>>> similarities in naming with Access:: that don't hold?
>>
>> The confusion roots in that Access and BarrierSetAssembler are not
>> structured exactly the same.
>>
>> Access provides store_at and oop_store_at. The reason for providing
>> both of these, is that I could not safely infer whether the passed in
>> parameters were oops or not without changing oop and narrowOop to
>> always be value objects, which I did not dare to do as the generated
>> code was worse.
>>
>> In BarrierSetAssembler, there is no such restriction. The type for
>> the access is always passed in to store_at/load_at as BasicType. It
>> is the responsibility of ModRefBarrierSetAssembler to filter away
>> non-oop references, and call oop_store_at for relevant accesses for
>> ModRef. This follows the same pattern that the arraycopy stubs used.
>> This allows, e.g. CardTableModRef to only override oop_store_at. This
>> is similar to how ModRefBarrierSet::AccessBarrier<decorators,
>> BarrierSetT>::oop_store_in_heap calls the concrete barriersets for
>> help generating that access. I hope this helps understanding the
>> structuring.
>>
>> This is also the reason why G1BarrierSetAssembler::load_at checks
>> whether it is an oop that is loaded or not, because loads are not
>> riding on the ModRef layer, which only knows about oop stores.
>>
>>> ------------------------------------------------------------------------------
>>>
>>> src/hotspot/cpu/x86/gc/shared/cardTableBarrierSetAssembler_x86.cpp
>>> 88 void CardTableBarrierSetAssembler::store_check(MacroAssembler*
>>> masm, Register obj, Address dst) {
>>> 89 // Does a store check for the oop in register obj. The content of
>>> 90 // register obj is destroyed afterwards.
>>>
>>> The comment states obj is an oop, but when called by the precise case
>>> of oop_store_at, it is not an oop, but rather a derived pointer. I
>>> think it's the comment that's wrong.
>>
>> I changed the comment to refer to obj as an oop* instead of an oop.
>>
>>> ------------------------------------------------------------------------------
>>>
>>>
>>> I was surprised that oop_store_at is initially declared at the
>>> ModRefBarrierSetAssembler level, rather than at the
>>> BarrierSetAssembler level.
>>>
>>> I was also surprised to not see an oop_load_at.
>>>
>>> In Access we have _at suffixed operations paired with non-_at suffixed
>>> operations, the difference being the _at suffixed operations have a
>>> base object, to allow dealing with the brooks-ptr. I'm not finding any
>>> such distinction in the xxxAssembler API. Presumably I've missed
>>> something somewhere. How is that handled?
>>
>> I think I previously already explained this one. ModRef does not know
>> about oop loads, only stores. Therefore, G1 must override load_at,
>> and not oop_load_at.
>>
>> Thanks,
>> /Erik
>>
>>> ------------------------------------------------------------------------------
>>>
>>>
>>>
>>
>
More information about the hotspot-dev
mailing list