RFR: 8199417: Modularize interpreter GC barriers

Wed Apr 4 20:05:41 UTC 2018

Thank you for keeping load_heap_oop in the macroAssembler, I like it and 
think it should be consistent with the rest of the platforms.

http://cr.openjdk.java.net/~eosterlund/8199417/webrev.01/src/hotspot/cpu/x86/templateTable_x86.cpp.udiff.html

+ do_oop_store(_masm, Address(rdx, 0), rax, IN_HEAP_ARRAY);

What happens if someone misses IN_HEAP_ARRAY and puts IN_HEAP instead?

http://cr.openjdk.java.net/~eosterlund/8199417/webrev.01/src/hotspot/cpu/aarch64/templateTable_aarch64.cpp.udiff.html

+ do_oop_store(_masm, element_address, noreg, IN_HEAP | IN_HEAP_ARRAY);

Here it has both for aarch64.

I reviewed the x86 interpreter code, and will review the rest when/if 
people say that load_heap_oop is good.

Thanks,
Coleen

On 4/4/18 9:38 AM, Erik Österlund wrote:
> Hi Kim,
>
> Thank you for reviewing this.
>
> I have made a new webrev with proposed changes to the x86/x64 code 
> reflecting the concerns you and Coleen have.
> I thought we should settle for something we like in the x86/x64 code 
> before going ahead and changing the other platforms too much (I don't 
> want to bug Martin and Roman too much before).
>
> New full webrev:
> http://cr.openjdk.java.net/~eosterlund/8199417/webrev.01/
>
> Incremental:
> http://cr.openjdk.java.net/~eosterlund/8199417/webrev.00_01/
>
> On 2018-04-01 05:12, Kim Barrett wrote:
>>> On Mar 26, 2018, at 5:31 AM, Erik Österlund 
>>> <erik.osterlund at oracle.com> wrote:
>>>
>>> Hi,
>>>
>>> The GC barriers for the interpreter are not as modular as they could 
>>> be. They currently use switch statements to check which GC barrier 
>>> set is being used, and call this or that barrier based on that, in a 
>>> way that assumes GCs only use write barriers.
>>>
>>> This patch modularizes this by generating accesses in the 
>>> interpreter with declarative semantics. Accesses to the heap may now 
>>> use store_at and load_at functions of the BarrierSetAssembler, 
>>> passing along appropriate arguments and decorators. Each concrete 
>>> BarrierSetAssembler can override the access completely or sprinkle 
>>> some appropriate GC barriers as necessary.
>>>
>>> Big thanks go to Martin Doerr and Roman Kennke, who helped plugging 
>>> this into S390, PPC and AArch64 respectively.
>>>
>>> Webrev:
>>> http://cr.openjdk.java.net/~eosterlund/8199417/webrev.00/
>>>
>>> Bug:
>>> https://bugs.openjdk.java.net/browse/JDK-8199417
>>>
>>> Thanks,
>>> /Erik
>> I've only looked at the x86 and generic code so far. I have some
>> comments, but also a bunch of questions. I think I'm missing some
>> things somewhere. I'm sending what I've got so far anyway, and will
>> continue studying.
>>
>> ------------------------------------------------------------------------------ 
>>
>> src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp
>> 354   // flatten object address if needed
>> 355   // We do it regardless of precise because we need the registers
>>
>> There is no "precise" in this context.
>
> I thought "precise" was referring to the concept of precise card 
> marking, as opposed to the variable 'precise' that used to exist. I 
> have reworded the comment to reflect that.
>
>> ------------------------------------------------------------------------------ 
>>
>> src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp
>> 364 #ifndef _LP64
>> 365   InterpreterMacroAssembler *imasm = 
>> static_cast<InterpreterMacroAssembler*>(masm);
>> 366 #endif
>>
>> In the old code, the function received an InterpreterMacroAssembler*.
>>
>> What is there about this context that lets us know the downcast is
>> safe here?  Is oop_store_at only ever called with an IMA*?  If so,
>> then why isn't that the parameter type.  If not, then what?
>
> Yes, this is indeed only used by the InterpreterMacroAssembler. But I 
> wanted the interface to use MacroAssembler. Why?
>
> 1) It's only 32 bit x86 that relies on this property, and I hope it 
> will go away soon, and the save bcp hack with it.
> 2) previous load_heap_oop is used not only in the 
> InterpreterMacroAssembler, and I wanted the same assembler in load_at 
> and store_at.
>
> So for now it is only used in InterpreterMacroAssembler, and I doubt 
> that will change any time soon. I am hoping 32 bit support will be 
> dropped before that, and the hack will go away. For now, I have added 
> a virtual is_interpreter_masm() call that I use in an assert to make 
> sure this is not accidentally used in the wrong context until 32 bit 
> support is gone.
>
>> ------------------------------------------------------------------------------ 
>>
>> src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp
>> 753   bs->load_at(_masm, IN_HEAP | ON_WEAK_OOP_REF, T_OBJECT,
>> 754               rax, field_address, /*tmp1*/ rbx, /*tmp_thread*/ rdx);
>>
>> This needs to distinguish between WEAK and PHANTOM references, else it
>> will break ZGC's concurrent reference processing.  (At least so long as
>> we still have to deal with finalization.)
>
> It does. This is ON_WEAK_OOP_REF. The get intrinsic is generated for 
> WeakReference (and SoftReference) only. PhantomReference does not have 
> a getter. Therefore, in all contexts this is intrinsified, it will 
> refer to an ON_WEAK_OOP_REF.
>
>> ------------------------------------------------------------------------------ 
>>
>> src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp
>> 698 address 
>> TemplateInterpreterGenerator::generate_Reference_get_entry(void) {
>>
>> Almost all the code here (and much of the commentary) was/is specific
>> to G1 (or SATB collectors).  Shouldn't it all be in the barrier_set's
>> load_at?  As it is now, if (say) we're using Parallel GC then I think
>> there's a bunch of additional code being generated and executed here
>> that wasn't present before.
>
> It is true that interpreter Reference.get() intrinsics with 
> Parallel/CMS/Serial currently run a few instructions more than they 
> need to pay for the abstraction. There is no doubt in my mind that 
> this abstraction is worth the cost. Reference.get in the interpreter 
> is not a hot path, and does not need to optimize every cycle.
>
> I changed the G1 comments to more general comments instead.
>
>> ------------------------------------------------------------------------------ 
>>
>> src/hotspot/cpu/x86/methodHandles_x86.cpp
>> 170   __ load_heap_oop(method_temp, Address(recv, 
>> NONZERO(java_lang_invoke_MethodHandle::form_offset_in_bytes())));
>> =>
>> 175   bs->load_at(_masm, IN_HEAP, T_OBJECT, method_temp, 
>> Address(recv, 
>> NONZERO(java_lang_invoke_MethodHandle::form_offset_in_bytes())), t
>>
>> This doesn't look like an improvement in code readability to me.  Why
>> can't bs->load_at be the implementation of "__ load_heap_oop" ?
>>
>> And the lines are getting really long; so long that webrev is cutting
>> off the right end, so I can't review the new versions of the lines.
>
> Due to popular demand, I have done what you (and Coleen) both 
> proposed: retained the load_heap_oop abstraction.
>
> Before:
> * load_heap_oop was the "raw" variant, similar to 
> oopDesc::load_decode_heap_oop, which no longer exists.
>
> Now:
> * load_heap_oop calls MacroAssembler::access_load_at
> * access_load_at grabs the active BarrierSetAssembler and calls 
> load_at on it.
> ... same idea for stores.
> * What used to be the raw versions for oop loads and stores, has moved 
> to BarrierSetAssembler::load_at/store_at.
>
> These accessors now optionally take temporary register and decorator 
> parameters than you don't have to supply:
> * supplying temprary register is polite to the GC so its accesses can 
> be a bit better
> * supplying decorators can be done if it is not a default access. For 
> example, AS_RAW can now be used to perform a raw load instead.
>
>> ------------------------------------------------------------------------------ 
>>
>> src/hotspot/cpu/x86/methodHandles_x86.cpp
>> 363         __ load_heap_oop(temp2_defc, member_clazz);
>> =>
>> 368         BarrierSetAssembler* bs = 
>> BarrierSet::barrier_set()->barrier_set_assembler();
>> 369         Register rthread = LP64_ONLY(r15_thread) NOT_LP64(noreg);
>> 370         bs->load_at(_masm, IN_HEAP, T_OBJECT, temp2_defc, 
>> member_clazz, noreg, rthread);
>>
>> Similarly, this doesn't seem like an improvement to me.  And there are
>> more like this.
>
> Fixed.
>
>> ------------------------------------------------------------------------------ 
>>
>> src/hotspot/cpu/x86/macroAssembler_x86.hpp
>>
>> It would reduce the clutter in this change to leave the as_Address
>> declarations where they were, since that place is now public, rather
>> than moving them to a different public section.
>
> Fixed.
>
>> ------------------------------------------------------------------------------ 
>>
>>
>> Generally, it seems like it might be nice for the MacroAssembler class
>> to have a reference to the barrier_set, or maybe even the bs's
>> assembler, rather than looking them up in various places, and in
>> different ways.  For example, I see each of
>>
>>    Universe::heap()->barrier_set()
>>    BarrerSet::barrier_set()
>>
>> used in various different places *in new code*.  Some consistency
>> would be nice.
>
> I think it seems fine to just fetch them only in the 
> access_load/store_at wrappers.
>
>> ------------------------------------------------------------------------------ 
>>
>> src/hotspot/cpu/x86/macroAssembler_x86.cpp
>> 5249   bs->load_at(this, IN_ROOT | ON_PHANTOM_OOP_REF, T_OBJECT,
>> 5250               value, Address(value, 
>> -JNIHandles::weak_tag_value), tmp, thread);
>> 5251   verify_oop(value);
>>
>> This puts the verify_oop after the G1 pre-barrier, where in the old
>> code verify_oop was before the pre-barrier. I remember the old order
>> being intentional, though don't off-hand recall how important that
>> was.  If for no other reason, it seems like the new order could rarely
>> (due to bad timing) have a bad oop reported somewhere far away during
>> SATB buffer processing, rather than at the point of the load.
>>
>> With the new version's order there could be one verify after the done
>> label.  That's just a minor nit.
>
> Conversely, I think the new order is better and more GC agnostic.
> It avoids, for example, sending in bad oops for verification with ZGC. 
> The more GC-agnostic interface is that after the load_at
>
>> ------------------------------------------------------------------------------ 
>>
>> src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp
>> 230 #ifdef _LP64
>> 231     if (c_rarg1 != thread) {
>> 232       __ mov(c_rarg1, thread);
>> 233     }
>> 234     if (c_rarg0 != pre_val) {
>> 235       __ mov(c_rarg0, pre_val);
>> 236     }
>> 237 #else
>> 238     __ push(thread);
>> 239     __ push(pre_val);
>> 240 #endif
>>
>> Old code used:
>>
>>    __ pass_arg1(thread);
>>    __ pass_arg0(pre_val);
>>
>> Why is this being changed?  Oh, pass_argN are static helpers not
>> available outside macroAssembler_x86.cpp.  Maybe they should be
>> exposed in some way, like public helpers in the x86 version of the
>> MacroAssembler class, rather than copied inline here (and maybe other
>> places?).
>
> Fixed. (added a wrapper for this to MacroAssembler, as suggested)
>
>> ------------------------------------------------------------------------------ 
>>
>> src/hotspot/cpu/x86/interp_masm_x86.cpp
>> Deleted these lines:
>> 519   // The resulting oop is null if the reference is not yet resolved.
>> 520   // It is Universe::the_null_sentinel() if the reference 
>> resolved to NULL via condy.
>>
>> I don't think those comment lines should be deleted.
>
> Fixed.
>
>> ------------------------------------------------------------------------------ 
>>
>> src/hotspot/cpu/x86/gc/shared/modRefBarrierSetAssembler_x86.cpp
>> 86 void ModRefBarrierSetAssembler::store_at(MacroAssembler* masm, 
>> DecoratorSet decorators, BasicType type,
>> 87                                        Address dst, Register val, 
>> Register tmp1, Register tmp2) {
>> 88   if (type == T_OBJECT || type == T_ARRAY) {
>> 89     oop_store_at(masm, decorators, type, dst, val, tmp1, tmp2);
>> 90   } else {
>> 91     BarrierSetAssembler::store_at(masm, decorators, type, dst, 
>> val, tmp1, tmp2);
>> 92   }
>> 93 }
>>
>> Doesn't the conditional conversion from store_at to oop_store_at
>> actually indicate a bug in the caller? That is, calling store_at with
>> a oop (T_OBJECT or T_ARRAY) value seems like a bug, and should be
>> asserted against, rather than translating.
>>
>> And oop_store_at should assert that the value *is* an oop value.
>>
>> And should there be any verification that the decorators and the type
>> are consistent?
>>
>> But maybe I'm not understanding the protocol around oop_store_at?
>> There being no documentation, it seems entirely possible that I've
>> misunderstood something here.  Maybe I'm being confused by
>> similarities in naming with Access:: that don't hold?
>
> The confusion roots in that Access and BarrierSetAssembler are not 
> structured exactly the same.
>
> Access provides store_at and oop_store_at. The reason for providing 
> both of these, is that I could not safely infer whether the passed in 
> parameters were oops or not without changing oop and narrowOop to 
> always be value objects, which I did not dare to do as the generated 
> code was worse.
>
> In BarrierSetAssembler, there is no such restriction. The type for the 
> access is always passed in to store_at/load_at as BasicType. It is the 
> responsibility of ModRefBarrierSetAssembler to filter away non-oop 
> references, and call oop_store_at for relevant accesses for ModRef. 
> This follows the same pattern that the arraycopy stubs used. This 
> allows, e.g. CardTableModRef to only override oop_store_at. This is 
> similar to how ModRefBarrierSet::AccessBarrier<decorators, 
> BarrierSetT>::oop_store_in_heap calls the concrete barriersets for 
> help generating that access. I hope this helps understanding the 
> structuring.
>
> This is also the reason why G1BarrierSetAssembler::load_at checks 
> whether it is an oop that is loaded or not, because loads are not 
> riding on the ModRef layer, which only knows about oop stores.
>
>> ------------------------------------------------------------------------------ 
>>
>> src/hotspot/cpu/x86/gc/shared/cardTableBarrierSetAssembler_x86.cpp
>> 88 void CardTableBarrierSetAssembler::store_check(MacroAssembler* 
>> masm, Register obj, Address dst) {
>> 89   // Does a store check for the oop in register obj. The content of
>> 90   // register obj is destroyed afterwards.
>>
>> The comment states obj is an oop, but when called by the precise case
>> of oop_store_at, it is not an oop, but rather a derived pointer.  I
>> think it's the comment that's wrong.
>
> I changed the comment to refer to obj as an oop* instead of an oop.
>
>> ------------------------------------------------------------------------------ 
>>
>>
>> I was surprised that oop_store_at is initially declared at the
>> ModRefBarrierSetAssembler level, rather than at the
>> BarrierSetAssembler level.
>>
>> I was also surprised to not see an oop_load_at.
>>
>> In Access we have _at suffixed operations paired with non-_at suffixed
>> operations, the difference being the _at suffixed operations have a
>> base object, to allow dealing with the brooks-ptr. I'm not finding any
>> such distinction in the xxxAssembler API. Presumably I've missed
>> something somewhere. How is that handled?
>
> I think I previously already explained this one. ModRef does not know 
> about oop loads, only stores. Therefore, G1 must override load_at, and 
> not oop_load_at.
>
> Thanks,
> /Erik
>
>> ------------------------------------------------------------------------------ 
>>
>>
>>
>