RFR: 8199417: Modularize interpreter GC barriers

Roman Kennke rkennke at redhat.com
Wed Apr 4 20:52:29 UTC 2018


Hi Erik,

the changes make sense to me, and look good.

I am thinking ahead a little bit and am wondering how to fit primitive
heap access into it. Will we have to add one load_heap_X for each
primitive type X, or call access_load_at() directly from all the places
in the interpreter?


Thanks,
Roman

> Hi Kim,
> 
> Thank you for reviewing this.
> 
> I have made a new webrev with proposed changes to the x86/x64 code
> reflecting the concerns you and Coleen have.
> I thought we should settle for something we like in the x86/x64 code
> before going ahead and changing the other platforms too much (I don't
> want to bug Martin and Roman too much before).
> 
> New full webrev:
> http://cr.openjdk.java.net/~eosterlund/8199417/webrev.01/
> 
> Incremental:
> http://cr.openjdk.java.net/~eosterlund/8199417/webrev.00_01/
> 
> On 2018-04-01 05:12, Kim Barrett wrote:
>>> On Mar 26, 2018, at 5:31 AM, Erik Österlund
>>> <erik.osterlund at oracle.com> wrote:
>>>
>>> Hi,
>>>
>>> The GC barriers for the interpreter are not as modular as they could
>>> be. They currently use switch statements to check which GC barrier
>>> set is being used, and call this or that barrier based on that, in a
>>> way that assumes GCs only use write barriers.
>>>
>>> This patch modularizes this by generating accesses in the interpreter
>>> with declarative semantics. Accesses to the heap may now use store_at
>>> and load_at functions of the BarrierSetAssembler, passing along
>>> appropriate arguments and decorators. Each concrete
>>> BarrierSetAssembler can override the access completely or sprinkle
>>> some appropriate GC barriers as necessary.
>>>
>>> Big thanks go to Martin Doerr and Roman Kennke, who helped plugging
>>> this into S390, PPC and AArch64 respectively.
>>>
>>> Webrev:
>>> http://cr.openjdk.java.net/~eosterlund/8199417/webrev.00/
>>>
>>> Bug:
>>> https://bugs.openjdk.java.net/browse/JDK-8199417
>>>
>>> Thanks,
>>> /Erik
>> I've only looked at the x86 and generic code so far. I have some
>> comments, but also a bunch of questions. I think I'm missing some
>> things somewhere. I'm sending what I've got so far anyway, and will
>> continue studying.
>>
>> ------------------------------------------------------------------------------
>>
>> src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp
>> 354   // flatten object address if needed
>> 355   // We do it regardless of precise because we need the registers
>>
>> There is no "precise" in this context.
> 
> I thought "precise" was referring to the concept of precise card
> marking, as opposed to the variable 'precise' that used to exist. I have
> reworded the comment to reflect that.
> 
>> ------------------------------------------------------------------------------
>>
>> src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp
>> 364 #ifndef _LP64
>> 365   InterpreterMacroAssembler *imasm =
>> static_cast<InterpreterMacroAssembler*>(masm);
>> 366 #endif
>>
>> In the old code, the function received an InterpreterMacroAssembler*.
>>
>> What is there about this context that lets us know the downcast is
>> safe here?  Is oop_store_at only ever called with an IMA*?  If so,
>> then why isn't that the parameter type.  If not, then what?
> 
> Yes, this is indeed only used by the InterpreterMacroAssembler. But I
> wanted the interface to use MacroAssembler. Why?
> 
> 1) It's only 32 bit x86 that relies on this property, and I hope it will
> go away soon, and the save bcp hack with it.
> 2) previous load_heap_oop is used not only in the
> InterpreterMacroAssembler, and I wanted the same assembler in load_at
> and store_at.
> 
> So for now it is only used in InterpreterMacroAssembler, and I doubt
> that will change any time soon. I am hoping 32 bit support will be
> dropped before that, and the hack will go away. For now, I have added a
> virtual is_interpreter_masm() call that I use in an assert to make sure
> this is not accidentally used in the wrong context until 32 bit support
> is gone.
> 
>> ------------------------------------------------------------------------------
>>
>> src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp
>> 753   bs->load_at(_masm, IN_HEAP | ON_WEAK_OOP_REF, T_OBJECT,
>> 754               rax, field_address, /*tmp1*/ rbx, /*tmp_thread*/ rdx);
>>
>> This needs to distinguish between WEAK and PHANTOM references, else it
>> will break ZGC's concurrent reference processing.  (At least so long as
>> we still have to deal with finalization.)
> 
> It does. This is ON_WEAK_OOP_REF. The get intrinsic is generated for
> WeakReference (and SoftReference) only. PhantomReference does not have a
> getter. Therefore, in all contexts this is intrinsified, it will refer
> to an ON_WEAK_OOP_REF.
> 
>> ------------------------------------------------------------------------------
>>
>> src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp
>> 698 address
>> TemplateInterpreterGenerator::generate_Reference_get_entry(void) {
>>
>> Almost all the code here (and much of the commentary) was/is specific
>> to G1 (or SATB collectors).  Shouldn't it all be in the barrier_set's
>> load_at?  As it is now, if (say) we're using Parallel GC then I think
>> there's a bunch of additional code being generated and executed here
>> that wasn't present before.
> 
> It is true that interpreter Reference.get() intrinsics with
> Parallel/CMS/Serial currently run a few instructions more than they need
> to pay for the abstraction. There is no doubt in my mind that this
> abstraction is worth the cost. Reference.get in the interpreter is not a
> hot path, and does not need to optimize every cycle.
> 
> I changed the G1 comments to more general comments instead.
> 
>> ------------------------------------------------------------------------------
>>
>> src/hotspot/cpu/x86/methodHandles_x86.cpp
>> 170   __ load_heap_oop(method_temp, Address(recv,
>> NONZERO(java_lang_invoke_MethodHandle::form_offset_in_bytes())));
>> =>
>> 175   bs->load_at(_masm, IN_HEAP, T_OBJECT, method_temp, Address(recv,
>> NONZERO(java_lang_invoke_MethodHandle::form_offset_in_bytes())), t
>>
>> This doesn't look like an improvement in code readability to me.  Why
>> can't bs->load_at be the implementation of "__ load_heap_oop" ?
>>
>> And the lines are getting really long; so long that webrev is cutting
>> off the right end, so I can't review the new versions of the lines.
> 
> Due to popular demand, I have done what you (and Coleen) both proposed:
> retained the load_heap_oop abstraction.
> 
> Before:
> * load_heap_oop was the "raw" variant, similar to
> oopDesc::load_decode_heap_oop, which no longer exists.
> 
> Now:
> * load_heap_oop calls MacroAssembler::access_load_at
> * access_load_at grabs the active BarrierSetAssembler and calls load_at
> on it.
> ... same idea for stores.
> * What used to be the raw versions for oop loads and stores, has moved
> to BarrierSetAssembler::load_at/store_at.
> 
> These accessors now optionally take temporary register and decorator
> parameters than you don't have to supply:
> * supplying temprary register is polite to the GC so its accesses can be
> a bit better
> * supplying decorators can be done if it is not a default access. For
> example, AS_RAW can now be used to perform a raw load instead.
> 
>> ------------------------------------------------------------------------------
>>
>> src/hotspot/cpu/x86/methodHandles_x86.cpp
>> 363         __ load_heap_oop(temp2_defc, member_clazz);
>> =>
>> 368         BarrierSetAssembler* bs =
>> BarrierSet::barrier_set()->barrier_set_assembler();
>> 369         Register rthread = LP64_ONLY(r15_thread) NOT_LP64(noreg);
>> 370         bs->load_at(_masm, IN_HEAP, T_OBJECT, temp2_defc,
>> member_clazz, noreg, rthread);
>>
>> Similarly, this doesn't seem like an improvement to me.  And there are
>> more like this.
> 
> Fixed.
> 
>> ------------------------------------------------------------------------------
>>
>> src/hotspot/cpu/x86/macroAssembler_x86.hpp
>>
>> It would reduce the clutter in this change to leave the as_Address
>> declarations where they were, since that place is now public, rather
>> than moving them to a different public section.
> 
> Fixed.
> 
>> ------------------------------------------------------------------------------
>>
>>
>> Generally, it seems like it might be nice for the MacroAssembler class
>> to have a reference to the barrier_set, or maybe even the bs's
>> assembler, rather than looking them up in various places, and in
>> different ways.  For example, I see each of
>>
>>    Universe::heap()->barrier_set()
>>    BarrerSet::barrier_set()
>>
>> used in various different places *in new code*.  Some consistency
>> would be nice.
> 
> I think it seems fine to just fetch them only in the
> access_load/store_at wrappers.
> 
>> ------------------------------------------------------------------------------
>>
>> src/hotspot/cpu/x86/macroAssembler_x86.cpp
>> 5249   bs->load_at(this, IN_ROOT | ON_PHANTOM_OOP_REF, T_OBJECT,
>> 5250               value, Address(value, -JNIHandles::weak_tag_value),
>> tmp, thread);
>> 5251   verify_oop(value);
>>
>> This puts the verify_oop after the G1 pre-barrier, where in the old
>> code verify_oop was before the pre-barrier. I remember the old order
>> being intentional, though don't off-hand recall how important that
>> was.  If for no other reason, it seems like the new order could rarely
>> (due to bad timing) have a bad oop reported somewhere far away during
>> SATB buffer processing, rather than at the point of the load.
>>
>> With the new version's order there could be one verify after the done
>> label.  That's just a minor nit.
> 
> Conversely, I think the new order is better and more GC agnostic.
> It avoids, for example, sending in bad oops for verification with ZGC.
> The more GC-agnostic interface is that after the load_at
> 
>> ------------------------------------------------------------------------------
>>
>> src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp
>> 230 #ifdef _LP64
>> 231     if (c_rarg1 != thread) {
>> 232       __ mov(c_rarg1, thread);
>> 233     }
>> 234     if (c_rarg0 != pre_val) {
>> 235       __ mov(c_rarg0, pre_val);
>> 236     }
>> 237 #else
>> 238     __ push(thread);
>> 239     __ push(pre_val);
>> 240 #endif
>>
>> Old code used:
>>
>>    __ pass_arg1(thread);
>>    __ pass_arg0(pre_val);
>>
>> Why is this being changed?  Oh, pass_argN are static helpers not
>> available outside macroAssembler_x86.cpp.  Maybe they should be
>> exposed in some way, like public helpers in the x86 version of the
>> MacroAssembler class, rather than copied inline here (and maybe other
>> places?).
> 
> Fixed. (added a wrapper for this to MacroAssembler, as suggested)
> 
>> ------------------------------------------------------------------------------
>>
>> src/hotspot/cpu/x86/interp_masm_x86.cpp
>> Deleted these lines:
>> 519   // The resulting oop is null if the reference is not yet resolved.
>> 520   // It is Universe::the_null_sentinel() if the reference resolved
>> to NULL via condy.
>>
>> I don't think those comment lines should be deleted.
> 
> Fixed.
> 
>> ------------------------------------------------------------------------------
>>
>> src/hotspot/cpu/x86/gc/shared/modRefBarrierSetAssembler_x86.cpp
>> 86 void ModRefBarrierSetAssembler::store_at(MacroAssembler* masm,
>> DecoratorSet decorators, BasicType type,
>> 87                                        Address dst, Register val,
>> Register tmp1, Register tmp2) {
>> 88   if (type == T_OBJECT || type == T_ARRAY) {
>> 89     oop_store_at(masm, decorators, type, dst, val, tmp1, tmp2);
>> 90   } else {
>> 91     BarrierSetAssembler::store_at(masm, decorators, type, dst, val,
>> tmp1, tmp2);
>> 92   }
>> 93 }
>>
>> Doesn't the conditional conversion from store_at to oop_store_at
>> actually indicate a bug in the caller? That is, calling store_at with
>> a oop (T_OBJECT or T_ARRAY) value seems like a bug, and should be
>> asserted against, rather than translating.
>>
>> And oop_store_at should assert that the value *is* an oop value.
>>
>> And should there be any verification that the decorators and the type
>> are consistent?
>>
>> But maybe I'm not understanding the protocol around oop_store_at?
>> There being no documentation, it seems entirely possible that I've
>> misunderstood something here.  Maybe I'm being confused by
>> similarities in naming with Access:: that don't hold?
> 
> The confusion roots in that Access and BarrierSetAssembler are not
> structured exactly the same.
> 
> Access provides store_at and oop_store_at. The reason for providing both
> of these, is that I could not safely infer whether the passed in
> parameters were oops or not without changing oop and narrowOop to always
> be value objects, which I did not dare to do as the generated code was
> worse.
> 
> In BarrierSetAssembler, there is no such restriction. The type for the
> access is always passed in to store_at/load_at as BasicType. It is the
> responsibility of ModRefBarrierSetAssembler to filter away non-oop
> references, and call oop_store_at for relevant accesses for ModRef. This
> follows the same pattern that the arraycopy stubs used. This allows,
> e.g. CardTableModRef to only override oop_store_at. This is similar to
> how ModRefBarrierSet::AccessBarrier<decorators,
> BarrierSetT>::oop_store_in_heap calls the concrete barriersets for help
> generating that access. I hope this helps understanding the structuring.
> 
> This is also the reason why G1BarrierSetAssembler::load_at checks
> whether it is an oop that is loaded or not, because loads are not riding
> on the ModRef layer, which only knows about oop stores.
> 
>> ------------------------------------------------------------------------------
>>
>> src/hotspot/cpu/x86/gc/shared/cardTableBarrierSetAssembler_x86.cpp
>> 88 void CardTableBarrierSetAssembler::store_check(MacroAssembler*
>> masm, Register obj, Address dst) {
>> 89   // Does a store check for the oop in register obj. The content of
>> 90   // register obj is destroyed afterwards.
>>
>> The comment states obj is an oop, but when called by the precise case
>> of oop_store_at, it is not an oop, but rather a derived pointer.  I
>> think it's the comment that's wrong.
> 
> I changed the comment to refer to obj as an oop* instead of an oop.
> 
>> ------------------------------------------------------------------------------
>>
>>
>> I was surprised that oop_store_at is initially declared at the
>> ModRefBarrierSetAssembler level, rather than at the
>> BarrierSetAssembler level.
>>
>> I was also surprised to not see an oop_load_at.
>>
>> In Access we have _at suffixed operations paired with non-_at suffixed
>> operations, the difference being the _at suffixed operations have a
>> base object, to allow dealing with the brooks-ptr. I'm not finding any
>> such distinction in the xxxAssembler API. Presumably I've missed
>> something somewhere. How is that handled?
> 
> I think I previously already explained this one. ModRef does not know
> about oop loads, only stores. Therefore, G1 must override load_at, and
> not oop_load_at.
> 
> Thanks,
> /Erik
> 
>> ------------------------------------------------------------------------------
>>
>>
>>
> 




More information about the hotspot-dev mailing list